Microsoft Debuts Synthetic Speech Platform ‘Custom Neural Voice’
Microsoft has unveiled the Custom Neural Voice service and begun accepting applications from customers interested in a synthetic voice built on the neural text-to-speech (TTS) feature of Azure Cognitive Services. Custom Neural Voice is already in use by brands like Warner Bros. and Progressive, but any company can now apply to build its own artificial voice actors.
Custom Neural Voice applies machine learning models to reproduce the prosody of a human voice: the duration, pitch, and tone of each sound. Actors record scripts that the models translate into digital sound data, which the system uses to predict how that voice would produce other kinds of sounds, then integrates those predictions into new scripts. According to Microsoft, the artificial voice sounds more natural because the AI projects prosody while generating the voice.

Microsoft is retaining approval rights for Custom Neural Voice partly to prevent potential abuse of the tech. Voice actors have to agree that they understand what will be done with their voice, and their recorded consent is compared to a synthetic version to make sure it is the real actor agreeing. Customers, in turn, have to abide by a code of conduct, including explicitly disclosing when a voice is synthetic. The next step will be some kind of digital marker built into the synthetic voice itself so that there's never any confusion.
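For developers, a voice like this is typically consumed through Azure's text-to-speech API by naming the voice in an SSML request. The sketch below only builds that SSML payload; the voice name `MyBrandNeuralVoice` is a made-up placeholder, and actually synthesizing audio would require the `azure-cognitiveservices-speech` SDK plus a subscription key and an approved custom voice.

```python
# Sketch: building the SSML request that selects a (hypothetical) custom
# neural voice. The voice name and text are placeholders, not real deployments.

def build_ssml(voice_name: str, text: str) -> str:
    """Wrap text in an SSML document that selects the given voice."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice_name}">{text}</voice>'
        '</speak>'
    )

ssml = build_ssml("MyBrandNeuralVoice", "Thanks for calling. How can I help?")
print(ssml)
```

In the real SDK, a payload like this would be handed to a speech synthesizer (e.g. via `speak_ssml_async`); Microsoft's approval process gates which voice names a given subscription can actually use.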
There are a number of ways high-quality synthetic voices might be used by a brand. Training new hires, audio advertising, and customer service by phone are all likely to take advantage of this technology. Microsoft highlighted some of the ways the tech is already being applied in its announcement. The AT&T Experience Store in Dallas now hosts an interactive Bugs Bunny that visitors can converse with and hear respond in the rabbit's voice. Progressive has synthesized the voice of Flo, its long-time spokesperson, to create a vocal chatbot on Facebook Messenger that interacts with customers, creating the illusion of a personal conversation with the face and, more to the point, the voice of the insurance company. Language education app Duolingo, meanwhile, has augmented its curriculum with an entire cast of synthetically voiced characters to make learning languages more engaging and fun. The BBC uses the same technology to create the voice of its evolving Beeb voice assistant.
“One of the things we hear from our customers is they like the idea of communicating with their customers through speech,” Azure AI corporate vice president Eric Boyd said in a statement. “Speech has been very robotic over the years. Neural voice is a big leap forward to make it sound really natural.”
Microsoft has a strong position in synthetic speech but is far from alone in the field. Amazon applied its neural text-to-speech technology to make Samuel L. Jackson an alternate voice for Alexa, while its Brand Voice service enables things like a Colonel Sanders voice for a KFC Alexa skill. Google offers its clients similar functionality through WaveNet. Building a custom voice is also becoming easier and cheaper, and doesn't always require new recordings. VocaliD, which began as a voice prosthesis developer, now offers synthetic voices for call centers and voice apps. Voice cloning startups like Replica Studios and Resemble AI, which recently debuted a home tool for generating an artificial voice from recorded audio, are also crowding in. There are even specialized options, such as car AI developer Cerence's platform that lets people record themselves, or someone else, as the voice of the AI in their vehicle.
“One of the really exciting things about AI is that we’re constantly surprised by the ways that you can use it that are way beyond what we originally envisioned,” Boyd said. “It’s just really exciting to see what people can do with it.”