Amazon ReMars 2022

Alexa Deepfakes Deceased Grandmother’s Voice to Read to a Child for Feature Preview


Alexa head scientist Rohit Prasad showcased an upcoming synthetic speech feature at the re:MARS conference with a video of a child having Alexa read a story to him in the voice of his grandmother, who had passed away. According to Prasad, the voice clone requires only a minute of audio, compared to the hours that are the industry standard, opening the door to letting people produce their own personal voices for Alexa.

Custom Alexa Voices

The video demonstrates how a child could prompt Alexa to read a story in a specific person’s voice. He asks the voice assistant whether “grandma could finish reading me The Wizard of Oz.” Alexa presumably had bookmarked where in the book the child had reached and understood the phrasing well enough to activate the deepfake version of the woman’s voice and begin reading. It’s not dissimilar to how AWS produces custom Alexa voices for brands with Amazon Polly, such as giving KFC’s Alexa skill the voice of Colonel Sanders. The time and resources spent, and the audience interested in each specific voice, differ widely, however.

The feature also resembles one in Japanese toy maker Takara Tomy’s new Coemo smart speaker for children. The Coemo can replicate voices so that parents who can’t be home can still read stories to their kids from afar, though Takara Tomy is pitching the device more for homes where both parents work than as a way to hear those who have passed away.

The demonstration finished Prasad’s talk with some metaphorical fireworks, but the feature doesn’t have a name yet, let alone a timeline for rollout. Still, Amazon’s result is impressive, especially if it took only a minute of audio to generate the voice model.

“This required inventions where we had to learn to produce a high-quality voice with less than a minute of recording versus hours of recording in the studio,” Prasad said. “The way we made it happen is by framing the problem as a voice conversion task and not a speech generation path.” He added, “We are unquestionably living in the golden era of AI, where our dreams and science fictions are becoming a reality.”

Past Voices

Channeling the voices of people no longer with us is an obvious consideration now that voice replication has become both increasingly powerful and cheap. Amazon’s experiment is most reminiscent of Hereafter AI, the startup that produces a conversational AI chatbot of a customer’s deceased loved one by gathering a wide range of information and records and then applying it to a chatbot platform.

Synthetic voices of those who have died are mushrooming in art and entertainment as well. For instance, Netflix’s documentary, The Andy Warhol Diaries, includes the digitally produced voice of the artist provided by voice cloning startup Resemble AI. The startup synthesized Warhol’s voice to perform excerpts from his memoir but avoided the controversy surrounding a similar move in last year’s Anthony Bourdain documentary Roadrunner by securing the Andy Warhol Foundation’s approval first. The tech is also increasingly combined with visual avatars so that someone could converse with Albert Einstein about his life and work or talk to (still living but preparing for the future) William Shatner about Star Trek.

“Human attributes of empathy and affect are key for building trust,” Prasad said. “They have become even more important in these times of the ongoing pandemic when so many of us have lost someone we love. While AI can’t eliminate that pain of loss, it can definitely make their memories last.”

  
