Alexa Expands Emotion and Speaking Style Options to New Accents and Languages
Amazon has extended Alexa’s emotional range and speaking styles to new accents and languages, a year after first introducing the feature for its default U.S. English voice. Voice app developers building for the supported accents and languages can now program the voice assistant to speak with more nuance, closer to how a human speaker would say the same thing.
Alexa usually speaks in a very calm, neutral tone, regardless of the language or accent. The new models use Amazon’s neural text-to-speech (NTTS) technology to incorporate human nuances of feeling and occasion. The AI analyzes how humans speak when feeling the appropriate emotion and generates a variation of Alexa’s synthetic voice to mimic it.
“Speaking styles enable you to tailor Alexa’s voice to the respective content being delivered such as news, music, conversations, and long-form content,” Amazon explained in a blog post. “With emotions, you can have Alexa respond in a happy/excited tone when a customer answers a trivia question correctly or wins a game. Similarly, you can have Alexa respond in a disappointed/empathic tone when the weather forecast is gloomy.”
The new options include a conversational style for U.S. English, Japanese, and Italian. Alexa can also talk like a radio DJ, using the music speaking style, with a Canadian or British English accent, and it can add emotion to its speech in Japanese and British English. When the music speaking style debuted last year, Amazon claimed it made the voice assistant sound 84% more natural, and presumably, it has a similar effect in Canadian or British English. As with the U.S. English version, developers can choose between excitement and disappointment, each at one of three levels of intensity.
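For developers, these options are applied by wrapping Alexa's response text in SSML tags. The following is a minimal Python sketch of how a skill might build such a response. The helper names (with_emotion, with_style, speak) are hypothetical and written for illustration. The tag and attribute names (amazon:emotion with name and intensity, amazon:domain with name) reflect Amazon's SSML reference as understood at the time of writing, and which values work depends on the skill's locale, so check the current documentation before relying on them.

```python
from xml.sax.saxutils import escape

# Assumed values, based on the options described in this article.
# Availability differs by accent and language.
EMOTIONS = {"excited", "disappointed"}
INTENSITIES = {"low", "medium", "high"}  # the three levels
STYLES = {"conversational", "long-form", "music", "news"}


def with_emotion(text: str, name: str, intensity: str = "medium") -> str:
    """Wrap text in an emotion tag (hypothetical helper)."""
    if name not in EMOTIONS:
        raise ValueError(f"unknown emotion: {name}")
    if intensity not in INTENSITIES:
        raise ValueError(f"unknown intensity: {intensity}")
    # escape() protects characters such as & and < inside the SSML body.
    return (
        f'<amazon:emotion name="{name}" intensity="{intensity}">'
        f"{escape(text)}</amazon:emotion>"
    )


def with_style(text: str, style: str) -> str:
    """Wrap text in a speaking-style tag (hypothetical helper)."""
    if style not in STYLES:
        raise ValueError(f"unknown speaking style: {style}")
    return f'<amazon:domain name="{style}">{escape(text)}</amazon:domain>'


def speak(*fragments: str) -> str:
    """Join already-wrapped fragments into one SSML document."""
    return "<speak>" + "".join(fragments) + "</speak>"


if __name__ == "__main__":
    # A trivia win followed by a gloomy forecast, as in Amazon's examples.
    print(speak(
        with_emotion("That's correct! You win!", "excited", "high"),
        with_emotion("Rain is expected all weekend.", "disappointed", "low"),
    ))
```

A skill would return the resulting string as its SSML output speech. Validating the emotion, intensity, and style names up front is a design choice in this sketch, intended to surface typos before the response reaches Alexa.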
Though artificial emotions for artificial intelligence might seem inherently awkward, giving Alexa feelings actually raises overall customer satisfaction by 30%. That fits with Voicebot’s collected data, which shows that people prefer human to synthetic voices and remember them better. Voicebot’s survey found that listeners correctly recalled a website or other call-to-action from an audio clip in a synthetic voice only 13.4% of the time. That’s less than half of the 32.5% who correctly recalled it from a clip of similar length spoken by a human.
Notably, the additional accents and languages don’t include the newsreader style that Amazon offered in the feature’s debut last year. Though the newsreader style was rated as 31% more natural sounding than the baseline Alexa voice, Amazon replaced it with the new conversational style, including for U.S. English. The conversational style goes hand-in-hand with the long-form speaking style Amazon added to Alexa in April. The long-form model is supposed to make Alexa sound more like a human when speaking for more than a few sentences. At the time, Amazon described it as useful for when Alexa reads long articles online or tells stories during a voice game. Combined with the warmer tone of the conversational style, Alexa could theoretically carry on long conversations with only minimal indications that it’s an AI speaking.