Google Cloud Adds Custom Voice Feature to Synthetic Speech API
Google Cloud is giving business clients the option to train and deploy their own custom voices synthesized by its text-to-speech (TTS) API. The voice models can be trained on recordings supplied by the client and integrated anywhere the business wants a branded voice AI, such as customer service or phone-based commerce.
Custom Voice Branding
Synthesizing voices with AI lets businesses save time and resources on call centers and other customer interactions. But Google’s library of voice models can’t include an ideal choice for every client that wants a voice identifiable as a kind of audio logo. Now, clients can have a branded voice as part of the conversational AI platform, saying the lines they choose without needing a voice actor or spokesperson to record new lines every time. The custom voice feature started beta testing a year and a half ago as part of the Google Cloud Contact Center AI release. Google appears to have worked out any technical or procedural issues with the feature.
“It’s important for all companies to build a strong identity and brand association with their conversational AI systems and this starts with the synthetic voice,” Google Cloud speech product manager Calum Barnes explained in a blog post. “For businesses looking to build a strong brand identity, establishing a unique voice can help turn mobile app interactions or customer service based on interactive voice responses (IVR) into differentiated customer experiences. Our TTS API has included a speech synthesis service with a static list of voices for some time, but now, with Custom Voice, moving beyond these predefined options is easier than ever.”
Interested businesses can submit recordings of the voice they want to use to Google Cloud, and the AI will train a voice model that then works with the TTS API much as the Google-created options do. The website offers guidance on what to submit, but Google won’t approve the voice model until it confirms that the model doesn’t violate Google’s AI Principles and that the person recording the audio knows what their voice will be used for and approves.
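To give a rough sense of how a trained custom voice might slot into an existing TTS integration, here is a minimal Python sketch that only builds the JSON body for a synthesis request and sends nothing. The field names (`customVoice`, `model`), the helper `build_synthesize_request`, and the placeholder model path are assumptions for illustration, not confirmed details from Google's announcement; check them against the official API reference before relying on them.

```python
import json


def build_synthesize_request(text, language_code, custom_model_path):
    """Build a request body that points at a custom voice model.

    Hypothetical helper: the "customVoice"/"model" field names are assumed,
    not taken from the article.
    """
    return {
        "input": {"text": text},
        "voice": {
            "languageCode": language_code,
            # Assumed field: references the client's trained voice model
            # instead of one of the predefined voice names.
            "customVoice": {"model": custom_model_path},
        },
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


# Placeholder path; a real one would come from the client's own project.
body = build_synthesize_request(
    "Thanks for calling. How can I help you today?",
    "en-US",
    "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
)
print(json.dumps(body, indent=2))
```

The point of the sketch is that only the voice selection changes: the text input and audio settings look the same as they would for a stock voice.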
Google regularly upgrades its transcription and speech synthesis services as enterprise conversational AI continues to grow. The tech giant reworked the visual interface for its speech-to-text API last month to make it easier for non-technical developers to design and manage the API for their business. DeepMind’s WaveNet technology makes transforming audio into text and back feasible, and it’s for more than just customer service. For instance, Wendy’s and Google Cloud are adding AI and voice tools to the restaurant chain’s drive-thrus, so that customer speech will be transcribed into on-screen text and anything the Wendy’s AI says will sound like whatever voice the brand chooses, presumably the restaurant’s fictional namesake.