The 2022 Voice Tech and Conversational AI Trends That Mattered Most: Deepfakes and Deep Cuts to Alexa and Google Assistant
The voice and conversational AI worlds look much different at the end of 2022 than they did 12 months ago. Technology advances and business practice changes have led to a very different view of what AI can do and how it should be used. Google and Amazon took a look at their voice AI efforts and decided to shake things up in staff and resource allocation, even as businesses of all stripes made similar technology a default feature for their work. Deepfake technology hit critical mass for text, images, and videos, and excitement is warring with concern among many who are only beginning to probe its limits. Here’s what 2022 meant for Voicebot’s coverage, with a year of trends to look back on.
Google Assistant Conversational Action Sunset
Google’s overall strategy around AI this year centered on Android. That includes all of its Google Assistant work. As a result, the company decided to sunset Google Conversational Actions in favor of Android apps. Google prefers to bring voice services to Android devices, including mobile, wearables, and cars. That includes allowing the voice assistant to connect users to App Actions even if they don’t say the name of the service. Instead, users will be sent to the Play Store to install the app. Google’s vision is to help developers streamline their efforts by keeping it all based on Android. In other words, Google has ended third-party voice experiences on smart speakers and smart displays. And the company recently ended support for the web version of Duplex, the Google Assistant service for automated reservations and ticket purchases, sealing the choice of Android over everything.
Alexa Team Cuts
Meanwhile, Amazon’s Alexa Skills world remains intact, but the tech giant took its own approach to trimming its voice AI in November with major layoffs in the devices division, where Alexa and its associated products reside. Alexa Ambient and Health and Wellness are to be shut down, while many Alexa divisions saw at least some reductions in size. Amazon’s cuts included a few products as well, with 2022 marking the end of the Amazon Microwave, the Amazon Glow interactive projector and video call screen for kids, and the Cloud Cam, an early Alexa smart home device since overtaken by Blink and Ring. All of this happened as Amazon promoted senior vice president and Alexa head scientist Rohit Prasad to lead all of Alexa, making him the first AI scientist in the role. Prasad took up the reins of Alexa following the departure of Alexa COO Debra Chrapaty and the retirement of Alexa senior vice president Tom Taylor this summer, but he had been intimately involved in developing the technology behind voice assistants, particularly Alexa, before being put in charge of it. He has said his vision for Alexa’s future focuses on making the assistant more proactive, with universal access through “ambient AI” and a multi-modal interface where Alexa can understand visual as well as vocal interactions.
Generative AI and Text-to-Image Creation
The power of text-to-image tools, like those used for the cookbook, was a close competitor for the biggest generative AI story of the year. The release and opening of tools like DALL-E, Midjourney, and Stable Diffusion have drawn huge crowds interested in seeing just how flexible such AI tools can be and whether they can give visual life to brief text prompts. Plenty of controversy accompanies the technology. A project for such art was banned from Kickstarter, and furious debates broke out over whether AI art could be copyrighted, sold on a stock art platform, or allowed to win an art contest.
Synthetic media advances hit a particular high point this year with deepfakes, including voice clones and virtual humans. Deepfakes even reached the finals of America’s Got Talent (AGT), where synthetic media developer Metaphysic showed off a series of deepfake performances, culminating with Elvis Presley taking the stage. The finale performance went beyond visuals and included an audio performance enhanced by synthetic voice startup Respeecher, best known for de-aging and synthesizing the voices of Darth Vader and Luke Skywalker for recent Star Wars TV shows. Investment and acquisition money poured into startups with deepfake capabilities, including Deep Voodoo, from the creators of South Park, and Korean synthetic voice startup Supertone. Concerns about synthetic media crystallized in deepfake-centered stories like The Capture, even as real-life media stars enjoyed playing with the possibilities. Fully virtual beings with synthetic visuals, voices, and personalities are quickly becoming a trend too, aided by celebrities like Howie Mandel and set to become massive entertainment engines thanks to Disney working with companies like Inworld AI.