Does ChatGPT Mark the End of the Voice Assistant Era or is it a False Comparison?
ChatGPT is great, but what about voice assistants? Is the voice era over, or are we on the precipice of a rebirth?
It is logical to conclude we are enmeshed in an era of text and images spurred by ChatGPT and Midjourney. However, ChatGPT now talks, Google Assistant is getting Bard, and Alexa is getting an LLM treatment. The new digital assistants look different, but the leading general-purpose solutions are becoming conversational.
Against this backdrop, Tobias Dengel’s new book, The Sound of the Future, launched earlier this month. One of the things that struck me when I first spoke with Tobias several years ago was his observation about affective trust (making an emotional connection) versus cognitive trust (reliably serving the intended user need) in relation to voice assistants. He contended that Amazon Alexa and Apple Siri spent too much time on the former while neglecting the latter.
Note that ChatGPT, Midjourney, and others are all blockbuster hits that deliver well on the cognitive trust scale and are seemingly devoid of the personality and empathy that define affective trust. No one is using ChatGPT as a surrogate relationship. Do you remember all of the Alexa marriage proposals?
Voice Tech Benefits Persist
The Sound of the Future explains why voice-interactive solution adoption is not going away, and may start expanding.
80 percent of those now using voice tools describe themselves as satisfied with their performance. Surveys also show that over 70 percent of consumers say they prefer using voice to conduct online searches whenever possible, rather than relying primarily on their keyboards…Today, the uses of voice tools are rapidly expanding beyond the basics–checking sports scores or the weather, or dictating an email. A growing number of employees around the world are using voice tech on the job in some way, from warehouse workers locating items in stock to field technicians delivering and receiving reports from remote client sites. In fact, one study finds that 76 percent of businesses report they have experienced ‘quantifiable benefits’ from voice tech initiatives, such as incorporating voice bots into their processes.
Dengel follows this by laying out the benefits of voice, such as speed, safety, knowledge, inclusion, engagement, and transformation. Voice interactive capabilities extend the value of many existing use cases by offering a better interface for certain contexts. These benefits are universal needs. They are not specific to a technology. Voice interaction happens to make them easier to achieve in many situations. This is a core reason why voice user interfaces will be a more significant part of our future than our present.
The Convenience Imperative
It is worth noting that voice assistants were largely adopted for the convenience they provided users. Alexa, Bixby, Google Assistant, and Siri did not introduce new capabilities like mobile did for ride-sharing. There were existing means to accomplish the same user intents that voice assistants fulfilled. The voice user interface made them faster or more easily accessible.
Of course, voice-and-audio-only interactions sometimes make the digital experience worse. You don’t want to listen to the description of a chart. You just want to view it. There are also times when a single tap on a screen is more efficient than making a spoken request. However, those instances are the exceptions. More commonly, speaking is the most efficient and versatile means of communication.
Assistants vs Interface vs Features
Adam Cheyer, the co-founder of the companies whose solutions became Apple’s Siri and Samsung’s Bixby, stressed that voice is a user interface, not a platform. It facilitates digital interactions. The assistants are the platform. This is an important distinction, and Dengel’s thesis is more about the interface angle.
ChatGPT’s system notes refer to it as an assistant. It offers natural language inputs and outputs, much like a voice assistant. While ChatGPT began with a text interface, you can now use voice interaction in the mobile app. It has extended text interaction to voice and sound as well as images. The revolution is using natural language to control digital interactions and, where appropriate, receive answers. Voice is the sound of the future as it is slowly replacing the keyboard, mouse, and touch interface.
Another consideration is the terminology. We saw this with the rise in voice interactive technology such as natural language processing, and it is true again with generative AI. There is the voice user interface (UI) and the voice assistant. The former is a UI feature, while the latter is a capability, or application if you prefer. There are also generative AI features, such as writing assistance or image generation in Canva, and there are generative AI assistants, such as ChatGPT and Google Bard. It is an assistant if it’s called a Copilot or a similar term. Almost every other instance is simply a bolt-on feature to augment an existing application’s value.
A key point here is that generative AI assistants may be replacing or substituting for voice assistants, but they are also incorporating voice user interfaces. Some of them even have pseudo-conversational abilities. At the same time, voice assistants such as Alexa and Google Assistant are adding generative AI capabilities. It is time to abandon the technology (generative AI) and user interface (voice) modifiers before the term assistant. These are just assistants, or maybe more precisely, they are digital assistants.
Adoption and Stickiness
This brings us back to cognitive trust. Users return to applications that work reliably. Personality and affective trust may enhance the perceived value, but they won’t overcome an unreliable experience. This is why many people employ Alexa as a command-and-control interface with single-turn interactions. Those features generally work well. The more extended conversations do not.
Conversation-as-entertainment is working for new generative AI applications, such as Character.ai. In those instances, affective trust is more important than it is in the assistant paradigm. Still, conversational interactions also need the cognitive trust element. This is true whether you are talking about an assistant, user interface, or feature. Technology adoption and stickiness rely on reliability.
What vs How
The technology I’m describing does depend, in part, on the latest developments of artificial intelligence (AI), machine learning, and computing power. But it draws much of its magical-seeming power from the world’s most ancient and ubiquitous human innovation–the 100,000-year-old ‘technology’ known as human speech…
For all of the power of modern digital computers, we’ve been forced to communicate with them using keyboards, mice, and touch screens–all more or less awkward, slow, error-prone, dreadfully unnatural, and for some people’s purposes, practically impossible. For the last 100-plus years, we’ve used our hands to communicate with machines via buttons, knobs, pedals, levers, and keyboards. Voice technology promises to liberate us from these clumsy tools and return to the innate form of communication we humans have known for thousands of years. It’s the ultimate interface that will make everything we do with technology easier, faster, more accurate, more fun, and in the end, more human.
These points by Dengel are worth remembering. It is unclear whether voice assistants will survive the latest technology upheaval or be replaced by generative AI assistants. What is sure to remain is that voice will be a beneficial user interface for interacting with technology. The assistant is the what. That value delivery mechanism will evolve and change outright. The voice user interface is the how, and it is the most natural way for humans to communicate their intent to other humans or machines. The critical factor is that the machines can understand spoken communication and have the capabilities to fulfill those intents.
This is probably why Dengel’s book is striking a chord with business readers. It just hit #3 on the Wall Street Journal’s bestselling business books for the week and #14 on USA Today’s national bestseller list. The voice assistant era is in transition, but the voice era is just getting started. In fact, generative AI is poised to offer it another boost.
I recommend you check out The Sound of the Future, which is available in hardcover and audiobook. It reinforces some core concepts around the value and application of voice technology and clarifies how voice user interfaces will transcend the evolving technology platforms because spoken interactions are often the most convenient and practical for humans.