Interview with CTO John Kelvie


What was your motivation behind starting Bespoken Tools?

Bespoken is an outgrowth of XAPPmedia, where we have been on the cutting edge of interactive audio for some time. And we see platforms like Amazon Alexa and Google Home as natural extensions of what we have already been doing in the mobile space.

And so our goal with Bespoken has been to make creating Voice-First experiences as easy as possible, for developers as well as content creators. This led to the creation of the Bespoken Tools offering, as well as our new diagnostics offering, Logless.

What problem do you solve?

In short, our own. We simply looked at the issues we encountered in building and operating Voice experiences, both technically and from a UX perspective. We then looked to remove these barriers and to adopt best practices for content creation and ongoing development of Alexa Skills.

The tools have now been in use for a while, and we have had a chance to refine them further based on user feedback. The upshot is a toolkit that we believe is essential if you are a developer working with Alexa. It makes the entire process faster and more accurate, and eliminates a number of typical hassles.

You seem to have put out a number of tools for Alexa development. Can you give our readers some perspective on how they all fit together?

Yes, we help across the entire Skill creation process. For example, our proxy tool forwards requests/responses directly to a developer’s laptop for rapid development and debugging. That means you don’t have to upload your Lambda code to see every change. It will perform locally as you’d expect a deployed system to perform.

And when you are ready to deploy it, we also have a tool for that – it pushes out Lambdas to AWS. In addition, our new Logless diagnostics help with troubleshooting and monitoring skills, both during testing and in production. Finally, we are working on making our “code-free” tools, which we use internally to create skills, available to a wider audience.

How does this come together in concrete terms? A great example is our proxy and logging tool. Using the two during development, one can easily look back at the payloads that have come from Alexa, as well as the debug output from your own code, across the history of interactions with a skill. It really helps! Once the skill is ready for launch, it can be deployed to AWS using our deploy tools. User interactions in production can continue to be monitored via our logging. It is all meant to flow together and keep your work moving.

I noticed that you have written an extensive developer diary about using Audio Streaming in Alexa development. Why does this merit so much attention, and how is it different from what people may expect?

Yes, we were very excited about the release of the AudioPlayer interface. We are also excited about our first Alexa Skill that uses it, We Study Billionaires, which is the Alexa version of a top 20 business podcast, The Investors Podcast. Besides doing the expected podcast things, it also has a Scan feature that allows the user to browse through podcasts based on snippets and then jump into one that sounds interesting.

We also built the skill with the intention that it be a technical showcase. Look for more blogs from us on our learnings about the API, as well as the best practices we have developed for creating skills with it.

We think this is an innovative user experience that is far superior to the limitations offered through the Flash Briefing API. It shows some of the possibilities the AudioPlayer opens up. We look forward to doing more innovative things with it. Keep in mind our background in interactive audio. Stay tuned!
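For context on what the AudioPlayer interface involves: a skill starts long-form playback by returning an AudioPlayer.Play directive in its response. A minimal sketch of building that directive is below; the helper name is ours for illustration (not part of any SDK), and the URL and token in the usage note are placeholders.

```python
def build_play_directive(stream_url: str, token: str, offset_ms: int = 0) -> dict:
    """Build an AudioPlayer.Play directive as Alexa expects it in a
    skill response. REPLACE_ALL clears the queue and starts this stream."""
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "url": stream_url,              # must be an HTTPS URL
                "token": token,                 # opaque identifier for this stream
                "offsetInMilliseconds": offset_ms,
            }
        },
    }
```

A Scan-style feature like the one described above could, for example, play short snippet streams and then issue a fresh Play directive for the full episode once the user picks one.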

Are your tools specific to Amazon Alexa or can they be applied to other voice service platforms?

We built them initially for Amazon Alexa, but we have kept an eye on the broader market. We are obviously paying close attention to what Google is doing with Google Home and are participating in the developer program. We are planning to make our tools available there as appropriate. The symmetry between the platforms is remarkable. We do think we can help out there a lot as well.

Have you worked with the Google Home interface yet? How is it different than Alexa?

No comment on this today. We will have more to say about Google Home in the near future.

How is developing for voice-first and voice-only interfaces different from work you have done in the past?

Well, as I mentioned at the outset, our background is in interactive audio on mobile. So, it’s a space we are familiar with. In that way, it’s not such a departure. What is different is the absence of any visual UI. On mobile, we focused on voice, but there was also typically a visual component. We like being able to focus just on the voice and audio experience.

Crafting great user experiences that are purely audio is challenging. We are seeing best practices begin to emerge, and indeed we have helped define a few of them. The use of repetition, the use of clarification, far more open-ended interaction mechanisms than developers are typically used to, and accommodating new and experienced skill users at once: all of these practices differentiate voice from the visual world that preceded it. It’s also important to keep in mind that as quickly as these best practices emerge, the platforms also evolve. As new capabilities come out, such as the AudioPlayer, taking advantage of them can make a big difference for your users.

What should a developer new to working with voice services be aware of before they start?

It’s a totally different world than apps! I love the whole new development model with the voice-first platforms, and chatbots more broadly. It is all webhook-driven, meaning that the core assistant, whether it be Alexa, Google Assistant, Facebook, or another, calls out to your service, which is running on its own infrastructure.

Having dealt with the pain of building apps that run on user devices, where an app is launched out into the world onto thousands of permutations of device models, operating system versions, and configurations, I find this a refreshingly simple model. But it takes some getting used to. We recommend people check out the free Bespoken tools, as our goal is to make things as easy as possible for this new webhook-centric model.
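The webhook model described above can be sketched in a few lines: the assistant POSTs a JSON request to your service, and your service returns a JSON response telling it what to say. This is an illustrative sketch, not Bespoken’s code; the request and response shapes follow Alexa’s published JSON format, and the function name is hypothetical.

```python
def handle_alexa_request(event: dict) -> dict:
    """Minimal webhook handler: Alexa sends a JSON request envelope;
    we return a response envelope with the speech to render."""
    request_type = event.get("request", {}).get("type")

    if request_type == "LaunchRequest":
        text = "Welcome! Ask me to play an episode."
    elif request_type == "IntentRequest":
        intent_name = event["request"]["intent"]["name"]
        text = f"You invoked the {intent_name} intent."
    else:
        # e.g. SessionEndedRequest
        text = "Goodbye."

    # Alexa expects this envelope in the HTTP response body
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": request_type not in ("LaunchRequest", "IntentRequest"),
        },
    }
```

In production this function would sit behind an HTTPS endpoint or an AWS Lambda, but the essential shape is just this: JSON in, JSON out, with your logic running on your own infrastructure.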

There are a lot of companies, large and small, jockeying for position as the consumer voice assistant of choice. How do you expect the market to play out?

If you work in technology, I think this has the potential to be the heavyweight battle of the century. So, I like this question.

Amazon took the early lead with the Echo device, which is the first voice experience people really love. But Google is obviously not taking it lightly with the launch of the Home device. I’m amazed at how quickly Google was able to launch it, and how much breadth it looks to have already. Clearly, this space is a priority for them.

A showdown is shaping up, not just between two tech Goliaths, but pitting their great strengths against each other. It is Google’s AI versus Amazon’s API. No one is better at AI than Google, and no one is better at building platforms than Amazon. And each now needs to do what the other has traditionally excelled at. So who will win? That is obviously unknown, but one thing that interests me is how fast Amazon has pushed out all aspects of their API. It is not just about Skill development, but also about creating Alexa-enabled devices – in some cases the devices even look like direct Echo competitors. How long before we see it as a primary interface in cars? And being built into new homes? The breadth of use cases they are covering is amazing, and clearly they are pushing very, very hard to achieve ubiquity.

So, will that be enough to overcome what almost every reviewer acknowledges is Google’s superior AI? Are Google’s advantages with voice recognition, natural language processing, and contextual semantics simply insurmountable? These are just questions at this point, but they are the ones I am asking and keeping an eye on to gauge the outcome.

What is the biggest issue facing the voice assistant and voice-first market today?

Voice recognition is still very much a challenge. As I mentioned, Echo may be the voice interface people really love, but it still presents lots of challenges for developers, and perhaps even more so for the platform creators. Ideally, you would be able to interact with core capabilities as well as third-party functions seamlessly and intuitively. But the limits of voice recognition do get in the way. That is why early users became accustomed to, and frustrated by, saying “Alexa, tell Spotify to play Positively Fourth Street by Bob Dylan.” It is a mouthful, hard to remember, and Alexa did not always get it right.

Amazon alleviated this pain point by allowing Spotify to be selected as the default music service. The experience is much better now. But for non-native skills, this continues to be a challenge as well as a place for innovation.

What is the biggest opportunity in this space?

The long-form audio opportunities enabled by the AudioPlayer are very exciting. From radio to podcasts to audio books, there is a lot to be done here, and lots of possibilities for people who already have audio content.

And more broadly, one of those giant seismic shifts is happening as content providers across the board try to adjust to these new platforms. I know you talk about the Voice Web, and I think the analogy is apt in that existing content publishers are challenged to figure out how they are going to interact with their users via Alexa and Google Home. If the devices get the sort of traction we expect and hope for, it has the potential to be a tremendously disruptive event, comparable to the rise of the Web or mobile.

What is your favorite voice-enabled skill, application or utility?

I really like EarPlay. I think their backgrounds as game designers and content creators are compelling. They look like people who are in this for the long haul, out to really marry the creative side with the new possibilities the technology creates. They are creating choose-your-own-adventure-style skills, but that is a bit reductive. “Interactive audio stories” is, I believe, the label they have used.

I think a question they may be asking themselves, and everyone else can as well, is “What would Douglas Adams do?” He embraced new media platforms as an author going back to the 80s, not just writing timeless books but also less well-known, stellar games and interactive experiences for computers. He was able to quickly push past the mechanical capabilities and constraints and wring the art from these new machines. He would have a field day with this new technology.