Microsoft Adds New Speech Styles and ‘Lyrical’ Emotion to Azure AI Toolkit
Microsoft is adding new voice styles and emotional variations to the neural text-to-speech (TTS) feature of Azure Cognitive Services. The software tools for voice app and service developers now include newscast, customer service, and digital assistant voices, with emotional options that vary by language.
Developers can apply any of the three new voice options to either English- or Chinese-speaking AIs. According to Microsoft, the names describe the intended uses fairly accurately. The newscast style might be good for reading articles in a way that mimics how a news anchor speaks, for instance. The customer service option aims for the friendlier tone a business might want an employee to use on incoming calls. The digital assistant style comes in two forms, formal and casual, which a client can use wherever each fits.
“With the new styles—newscast, customer service, and digital assistant—developers can tailor the voice of their apps and services to fit their brand or unique scenario,” Microsoft explained in a blog post announcing the new choices. “Built on a powerful base model, our neural TTS voices are very natural, reliable, and expressive. Through transfer learning, the neural TTS model can learn different speaking styles from various speakers, enabling nuanced voices.”
The custom emotional options also aim to make the AI voice sound more human in a way appropriate to its use. For the Chinese voice, there’s now a ‘lyrical’ option, which Microsoft claims sounds heartfelt and can handle prose or poetry. The English voice’s emotions are a little more straightforward: it can speak either empathetically or cheerfully. Microsoft also gave its Brazilian Portuguese voice the cheerful emotional option.
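For developers, a style or emotion is typically selected in the SSML markup sent to the speech service rather than in application logic. The snippet below is a minimal sketch of that idea, not Microsoft's own sample: it builds an SSML string that wraps the text in an `mstts:express-as` element. The helper name `build_styled_ssml`, the voice name `en-US-AriaNeural`, and the style value `newscast` are assumptions for illustration; check the current Azure documentation for the exact voices and style identifiers each language supports.

```python
from xml.sax.saxutils import escape, quoteattr


def build_styled_ssml(text, voice, style, lang="en-US"):
    """Return an SSML document asking a neural voice to speak in a given style.

    The voice and style values are passed through unchecked; which
    combinations are valid depends on the service (assumption: the caller
    has looked them up in the Azure documentation).
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # express-as carries the speaking style or emotion for the enclosed text
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Hypothetical voice and style names, used only to show the shape of the markup.
    print(build_styled_ssml("Markets rose & then fell today.", "en-US-AriaNeural", "newscast"))
```

The resulting string would then be passed to whatever SSML synthesis call the chosen SDK or REST endpoint provides. Switching from a newscast read to a cheerful or empathetic one is, under these assumptions, just a change to the `style` argument.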
The new features fit into Microsoft’s focus on enterprise voice AI. The company has almost finished shifting away from the consumer side of things, ending most of its Cortana voice assistant’s services. The Cortana app for Android and iOS is also now shuttered everywhere except the U.S., where its usefulness for experimentation and its support for the Surface Headphones are keeping it alive. Cortana now acts as an aspect of Office 365, while Microsoft invests in bringing its voice AI tech to other companies that want to use a voice platform to interact with customers.
Microsoft’s new features will need to compete with the rapidly expanding menu of options offered by Google and Amazon. Both companies have created ways for developers to design the voice they want customers to hear on Google Assistant and Alexa, respectively. Amazon recently jumped into the ring with Brand Voice, a service that offers several voices and emotional ranges and works with brands to create signature voices, such as a Colonel Sanders voice for KFC. Amazon has also been showcasing what it can do with custom voices by developing the option for Samuel L. Jackson to voice Alexa.
Google’s TTS service has several dozen voice options, including both standard voices and voices synthesized with its WaveNet technology. The company has also been expanding its voice options in languages besides American English, including German, Korean, Italian, and both British and Indian English. Google Assistant has often been at the forefront of understanding spoken accents, so improving the variety of voices in different languages could lead to even more regional variation.