Amazon Gives Alexa a Generative AI Makeover
Amazon has given its Alexa voice assistant a full generative AI overhaul, unveiling a host of new and improved features and functions at this year’s showcase of new devices and services. The tech giant claims the teased preview represents the biggest leap forward yet for Alexa and one that will be shared with every device where Alexa is accessible.
Amazon executives demonstrated several ways Alexa will leverage a new variation of Amazon’s custom large language models (LLMs) to encourage more intuitive and useful interactions. The company hedged its bets by acknowledging that Alexa “will make mistakes” as generative AI abilities remain imperfect. It stressed that these shortcomings will diminish over time, just as with past Alexa upgrades, and remains very bullish on generative AI overall.
“Our latest model has been specifically optimized for voice and the things we know our customers love—like having access to real-time information, efficiently controlling their smart home, and getting the most out of their home entertainment,” senior vice president of Amazon devices and services Dave Limp explained at the event. “I often say that Alexa is the best personal AI out there, but I will tell you it’s always been a little bit more transactional than we would like, but that is a limit of the technology, not the vision. And thanks to our latest LLM, you can now have a near human-like conversation with Alexa.”
Amazon outlined five ‘foundational capabilities’ around which Alexa has been reimagined: Conversational, Real-world applications, Personalization, Personality, and Trust. Alexa will now use sensory input from Echo devices to infer non-verbal cues for more natural conversations. New speech recognition abilities let it comprehend pauses and hesitations in real-time speech, so users don’t need to constantly repeat the wake word to keep a conversation going. The changes also minimize latency, cutting response time by as much as 40%. Alexa’s responses will be more personalized as well, drawing on user preferences and previous conversations to change how it responds to each user. As usual, Amazon stressed that privacy protections remain a priority and that users will have oversight of their data and interactions.
“What we’re working on now is also a new model which we call speech to speech model, which instead of all these discrete tasks of first converting your audio into text through speech recognition and then having an LLM convert that text into a natural response and then render the response in audio form. All of these tasks are unified into a single model such that the interaction becomes far more natural,” senior vice president and head scientist of Amazon Artificial General Intelligence Rohit Prasad said in his presentation. “In fact, Alexa can now exhibit attributes like laughter, surprise, and even others and ‘uh-huhs’ to continue the conversation just like you and I do.”
The voice assistant will also have more of a personality of its own, with opinions and ideas unique to each user as a way to make exchanges more engaging. That includes new text-to-speech advancements to enable more expressive and conversational vocal tones.
“Having been a scientist for more than 2.5 decades in AI, every five to ten years, we witnessed a seismic shift. Ten years back, it was deep learning and we brought, with Alexa, deep learning into the hands of customers at that time,” Prasad said. “With generative AI, I feel like we are doing the same. We are remaking an assistant that is not only the world’s best but making it truly indispensable for our customers using the powers of generative AI.”