Is it Better to Build or Buy a Voice App? Ask Yourself 3 Questions.

The incredible adoption of voice-activated devices is driving brands to move quickly toward building engaging voice applications. In addition to identifying a voice-first use case to build, brands must also decide which tools and techniques their team should use to achieve this vision. Over the past seven years of building conversational experiences, we’ve seen it all, and we want to share our key learnings and considerations for beginning your journey with voice.

Voice App Development Challenges

Organizations early in the voice journey have had to identify and rely upon highly technical skill sets. These often include AI/machine learning scientists, conversational UI designers, API designers, database administrators, and DevOps engineers. This approach has proven expensive and has taken longer than expected, and once an app is deployed, the maintenance and improvements needed to drive engagement and success continue to require high investment.

It’s common to see a team working with a variety of tools and frameworks, ranging from technical teams writing custom code in languages such as Python or Node.js, to creative professionals designing conversations and scripting lines using spreadsheets! While a creative use for spreadsheets, it turns out they really are optimized for financial calculations and forecasts, not dialogue writing and design.

These hurdles are the result of inefficient approaches to the new challenges presented by building great, modern voice experiences. Here are the considerations and learnings we’ve encountered in our own journey, which has led to some of the most exciting voice apps in market, such as HBO’s Westworld, Fremantle’s Match Game, iHeartRadio’s Rising Star Competition, and many others.

1. Do you have the resources available to build from the ground up?

When building voice apps from the ground up, organizations will need to ensure they have the capability to:

  • Identify the technical resources required to build your voice app
  • Schedule available technical resources to adequately staff the project
  • Train your programmers on the Voice Platform SDK (such as the Amazon Alexa Skills Kit) and conversational AI concepts such as ASR, intent recognition, slot filling, etc.
  • Train your programmers on cloud services (such as Amazon Web Services) and on writing code for them (such as Lambda functions)
  • Create and maintain a process for collaboration between technical resources and non-technical content creators
  • Deploy, monitor, and maintain your voice application’s infrastructure
  • Write the voice app code, including all logic for managing the conversational state and dialogue flow
  • Test, iterate, re-deploy, and repeat as needed, based on user feedback
  • Extract lines of dialogue contextual to user input from source code if other divisions of your company need to review the content of what your skill says to users
  • Publish code, intents, and voice app
  • Monitor voice app performance and experience
  • Analyze data and iterate with technical team

For important voice apps, you’ll need several developers to perform the work. You will also need a project manager, creatives, hardware, software, and computing resources. If you have an internal team (or agency partner) and the capabilities to handle this already, that’s great! With the right voice-centric tools and systems, you will find your team can become highly productive.

However, organizations that build voice apps from scratch commonly find that, post launch, the next steps begin to weigh heavily. Next steps could include expanding to another voice platform (e.g. dual deploying an Alexa Skill as a Google Assistant Action) or starting development on the next voice app. Either is very likely to require refactoring the code (lest you accrue technical debt): generalizing interactions across both platforms in the first case, or extracting the conversational mechanics shared by both voice apps in the second. Either way, you are at the beginning stages of building a conversational AI engine. A next step could also be iterating on the existing voice app, and even something as simple as a small wording change may require a designer to update a spreadsheet, a developer to ingest the changes into code, DevOps to re-deploy the code base, QA to smoke test the voice app, and finally a project manager to give the green light to publish to production.
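To make the refactoring concrete, here is a minimal Python sketch of one way a team might separate dialogue lines from code and generalize replies across platforms. It is an illustration under stated assumptions, not any particular product's design: the names LINES, Reply, say, render_alexa, render_other and all_lines are hypothetical. The render_alexa output follows the general shape of an Alexa Skills Kit JSON response, while render_other is a deliberate placeholder and does not reflect any real platform's format.

```python
from dataclasses import dataclass

# Dialogue lines live in data, keyed by ID, so a wording change touches no logic.
# In practice this table might be loaded from a file that writers own.
LINES = {
    "welcome": "Welcome back, {name}. Ready to play?",
    "goodbye": "Thanks for playing. Goodbye!",
}


@dataclass
class Reply:
    """Platform-neutral description of what the app says next."""
    text: str
    end_session: bool = False


def say(line_id: str, end_session: bool = False, **values: str) -> Reply:
    """Look up a line by ID and fill in its placeholders."""
    return Reply(LINES[line_id].format(**values), end_session)


def render_alexa(reply: Reply) -> dict:
    """Shape a reply like an Alexa Skills Kit JSON response."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": reply.text},
            "shouldEndSession": reply.end_session,
        },
    }


def render_other(reply: Reply) -> dict:
    """Placeholder for a second platform; NOT a real platform's format."""
    return {"speech": reply.text, "expect_user_response": not reply.end_session}


def all_lines() -> list[str]:
    """List every line of dialogue so non-developers can review the content."""
    return [f"{line_id}: {text}" for line_id, text in sorted(LINES.items())]
```

With this split, the conversation logic calls say("welcome", name="Ana") once, and each platform gets its own renderer. A wording change becomes an edit to LINES, and all_lines() gives reviewers the full script without reading source code.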

At this point, the reality and cost of voice app development begin to set in. Ask yourself: is developing voice app infrastructure a core part of your business, or is it only one part of your voice strategy? If it is not core, be aware that with voice app development, switching costs are high and it may make more sense to take on the journey with a trusted partner.

2. What tools should these resources use?

Assuming you’ve got the talent inside your organization to begin building compelling voice apps, the next series of efforts will focus on the user experience of interacting with your voice app. From a design perspective, a variety of tools have been repurposed to help define user flows and interaction models. These tools include Microsoft Visio, Excel, PowerPoint, or design products from Adobe, InVision, or Sketch. None of these tools directly integrate with the voice platforms being designed for, so there are several steps, requiring a developer, to go from a concept to an actual experience one can interact with. This introduces some additional friction and delay, compounding with each iteration, in building the most compelling solution.

This is common in organizations where the primary focus is on the engineering aspect of voice development.  This is understandably important, but the best voice apps go beyond this.  The best voice apps bring real value to users and are pleasant to converse with. These are compellingly brought to life by people who have some combination of product sense, a way with words, and an interaction background, be it human–computer interaction (HCI) or improv acting. Looking at the evolution of web and mobile development, there was a marked leap in quality with the rise of What You See Is What You Get (WYSIWYG) tools empowering designers, especially those that were well integrated with the technical stack.

While there is less “seeing” and more “hearing” in voice app development, the same principle holds true: voice app quality increases when conversation designers have tools that keep the whole voice interaction iteration loop in one place. A tool that puts as much focus on the conversation aspect of voice development as on the engineering, and that embraces the natural tension between the two, will enable cross-functional teams to rapidly iterate through the design process and build a richer and deeper experience. The more that teams can focus on the actual conversation and business logic of the voice app and not on boilerplate infrastructure, the better the voice ecosystem will become.

3. How much time can you afford to get your voice app to market?

In our journey with organizations over the past several years, we’ve seen the value in quickly placing experiences in market to test, validate, and learn. As more and more brands aggressively enter the market, deploying quickly and iterating constantly to improve your voice app experiences is increasingly important. We call this the “crawl, walk, run” approach. In other words, because the market is evolving so quickly, getting to market fast with a “crawl” voice app is better than slowly trying to perfect a “run” experience.

As we’ve covered above, building voice apps from the ground up requires developers to become familiar with the various platform components (e.g. SDKs and APIs) and deploy a set of services on platforms like AWS; write boilerplate conversational AI code for things like state management, slot types, interaction models, and content handling; and then record and upload media assets (like sound effects or custom audio files). That takes a lot of time and resources, and if you don’t already have experience building voice applications, this overhead can easily push you over budget and past deadlines.
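For a sense of what that boilerplate looks like, here is a small hand-rolled Python sketch of state management and slot filling for a single intent. Everything in it is an assumption made for illustration: the "OrderPizza" and "Cancel" intents, the size and topping slots, the prompts, and the handle_turn function are hypothetical, and a real voice platform SDK would supply its own request and session objects.

```python
# Hypothetical slots that the "OrderPizza" intent needs before it can finish.
REQUIRED_SLOTS = ("size", "topping")
PROMPTS = {
    "size": "What size would you like?",
    "topping": "Which topping would you like?",
}


def handle_turn(session: dict, intent: str, slots: dict) -> str:
    """Advance the conversation by one turn and return the next line to speak.

    `session` is the state carried between turns; the caller must persist it.
    """
    if intent == "Cancel":
        session.clear()
        return "Okay, cancelled."
    if intent != "OrderPizza":
        return "Sorry, I didn't get that."

    # Merge newly heard slot values into what we already know.
    filled = session.setdefault("slots", {})
    filled.update({k: v for k, v in slots.items() if k in REQUIRED_SLOTS and v})

    # Ask for the first required slot that is still missing.
    for name in REQUIRED_SLOTS:
        if name not in filled:
            return PROMPTS[name]

    # All slots are filled: confirm the order and reset the state.
    order = f"a {filled['size']} {filled['topping']} pizza"
    session.clear()
    return f"Got it, {order}. Your order is placed."
```

Even this toy version has to remember partial answers across turns, decide what to ask next, and reset itself when the task completes. Multiply that by every intent, platform, and edge case in a production voice app and the overhead described above becomes clear.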

Balancing the Interplay of Resources, Tools and Time

After weighing the cost of onboarding and training developers, the development time required, and the cost of computing resources, a number of leading brands determined that PullString’s voice technology platform offered the best approach for creative-led teams to rapidly design, prototype, and publish voice applications that augment their digital strategy.

In conclusion, organizations that are looking to differentiate their brands from competitors by engaging customers through voice platforms need to consider the best use of their resources, tools and time. Whether you decide to buy a voice app solution or build your own, it is important to follow key principles for voice design, which are very different from web or mobile design. The opportunity here is the ability to have good conversations at scale. What conversations do you want to have and what tools will allow you to design and build them? Best of luck on your voice journey and we look forward to hearing about your innovative ideas! 

Patrick Ku is head of solutions at PullString