Amazon Upgrades Alexa’s Long-Form Speech and Vastly Expands Custom Voice Languages and Styles
Amazon is expanding Alexa’s voices and speech style choices for voice app developers. Text-to-speech service Polly has more than a dozen new voices and styles from which to choose, and developers can adjust the voice assistant to sound more natural when speaking for more than a few sentences.
The Alexa Monologues
A lot of interaction with Alexa involves short responses or rote lines. That starts to sound strange when the voice assistant speaks for more than a few seconds. Alexa’s new long-form speaking style is designed to address that disconnect and make using Alexa feel as comfortable as talking to another human. Since people don’t speak the same way when uttering a sentence as they do when expounding for multiple paragraphs, the addition is likely to be popular with voice apps that read magazines, books, or transcribed podcast conversations out loud. For now, this style is only an option for Alexa in the United States.
“For example, you can use this speaking style for customers who want to have the content on a web page read to them or listen to a storytelling section in a game,” Alexa developer Catherine Gao explained in Amazon’s blog post about the new feature. “Powered by a deep-learning text-to-speech model, the long-form speaking style enables Alexa to speak with more natural pauses while going from one paragraph to the next or even from one dialog to another between different characters.”
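For developers wondering what this looks like in practice, a speaking style is typically requested through SSML markup in a skill's response. The Python sketch below is a minimal illustration, not Amazon's own sample code: the helper name long_form_ssml is hypothetical, and the amazon:domain tag with the value "long-form" is an assumption that should be checked against the current Alexa SSML reference.

```python
from xml.sax.saxutils import escape


def long_form_ssml(paragraphs):
    """Wrap paragraphs of text in a long-form speaking style.

    Hypothetical helper. The tag name and the "long-form" value are
    assumptions to verify against the Alexa SSML reference.
    """
    # Escape &, < and > so arbitrary article text cannot break the markup.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<speak>"
        f'<amazon:domain name="long-form">{body}</amazon:domain>'
        "</speak>"
    )


if __name__ == "__main__":
    print(long_form_ssml(["Once upon a time.", "Tom & Jerry met."]))
```

The resulting string would be returned as the SSML output speech of a skill response. Keeping each paragraph in its own p element gives the speech model explicit paragraph boundaries to work with.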
Polly’s Vocal Range
Polly’s menu of voices built with neural text-to-speech (NTTS) technology is now extensive enough that Alexa might need the long-form style just to read it out. The Amazon Web Services (AWS) product already offers several voices, but some of them now offer extra style choices. Depending on what Alexa Skill the developers are creating, they can adjust the Matthew and Joanna voices to speak English conversationally or to sound like a newscaster. The newscaster style of speech is also an option for the Spanish Lupe voice. The new styles apply only to those voices, but Polly is also adding ten entirely new voices in six new languages. Alexa Skills can now be programmed to speak Italian, Canadian French, Brazilian Portuguese, and Spanish as spoken in Spain, Mexico, and the U.S. Brazilian Portuguese and U.S. Spanish each include a few voice options.
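The pairing of voices and styles described above can be sketched as a small lookup plus an SSML builder. This is an illustration under stated assumptions rather than an official API: the function styled_voice_ssml and the table STYLES_BY_VOICE are hypothetical, the table only mirrors the combinations named in this article, and the style identifiers "conversational" and "news" are assumed values for the amazon:domain tag that should be confirmed in the Alexa SSML reference.

```python
from xml.sax.saxutils import escape

# Voice/style pairs as described in the article (assumed identifiers).
STYLES_BY_VOICE = {
    "Matthew": {"conversational", "news"},
    "Joanna": {"conversational", "news"},
    "Lupe": {"news"},
}


def styled_voice_ssml(voice, style, text):
    """Build SSML that speaks `text` in a given voice and style.

    Hypothetical helper. Raises ValueError for a pairing the article
    does not list, instead of emitting markup that may be rejected.
    """
    if style not in STYLES_BY_VOICE.get(voice, set()):
        raise ValueError(f"style {style!r} is not listed for voice {voice!r}")
    return (
        "<speak>"
        f'<voice name="{voice}">'
        f'<amazon:domain name="{style}">{escape(text)}</amazon:domain>'
        "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(styled_voice_ssml("Joanna", "news", "Markets closed higher today."))
```

Validating the pairing up front is a design choice: it surfaces an unsupported combination, such as a conversational style for Lupe, during development rather than at speech time.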
The new Polly voices fit into Amazon’s recent focus on attracting new clients for the product. In February, Polly introduced Brand Voice, which connects clients with Amazon engineers to generate a custom voice for their branded Alexa skill using a spokesperson’s recordings. The feature debuted with voices for an Australian bank and a facsimile of Colonel Sanders for KFC Canada. That’s on top of Amazon’s consumer-facing use of the tech to create the Samuel L. Jackson voice for Alexa.
Amazon is not the only company moving to add more choices for designing voice assistants. The timing of Amazon’s announcement is notable, coming only two weeks after Microsoft added a batch of new voices and emotional styles to Azure Cognitive Services. Azure-based voice apps gained newscast, customer service, and digital assistant styles of speech, along with new English and Chinese voices. As with Polly, the variants are for developers who want to give their voice app an appropriate tone when it interacts with customers. As Microsoft narrows its voice AI interests to enterprise services, Amazon appears to be working to maintain a competitive edge with its new options.