New Nvidia Riva Synthesizes Custom AI Brand Voices
At this week’s 2021 GPU Technology Conference (GTC), Nvidia revealed a new synthetic voice generator designed to produce a realistic-sounding human voice from as little as half an hour of recorded speech. Riva Custom Voice offers a set of tools for companies looking to add an AI-powered “brand voice” to their marketing, customer relations, and other services.
Riva Custom Voice is part of the larger Riva speech AI software development kit (SDK), which includes speech recognition and text-to-speech functions, scalable to an enterprise level. Nvidia boasted of more than a quarter-million downloads for its conversational AI software since it debuted three years ago and described the Custom Voice facet as the next stage of its evolution. Riva’s most significant selling point for brands is how quickly it can produce a custom voice, and how little data it needs to do so.
Nvidia laid out how Riva Custom Voice could support a branded virtual assistant with its own unique voice. The voice AI could then be integrated into call centers to talk to customers in half a dozen languages or augment apps for those with hearing or other disabilities. A car company could use it to give a unique voice to the Nvidia-built virtual assistant that autonomously parks cars. It’s also applicable to the virtual humans in Nvidia’s new Omniverse platform. Avatars for companies could rely on it to speak to human visitors much as the AI would talk to them on a phone call.
“Human-like interactions have long been one of the greatest challenges of artificial intelligence, especially for companies with industry-specific jargon,” Nvidia vice president of product management for AI software Kari Briski said. “Now these companies can use speech AI to listen and respond to customers with an expressive voice that’s unique to their brand and that drives more engaging and delightful interactions.”
The latest Riva tool relies on previous improvements Nvidia has showcased over the last year. More natural-sounding speech already arrived when Nvidia debuted its RAD-TTS model, which lets users train a voice AI to speak like them by mimicking how they talk. The AI learns how to speak from audio recordings, eventually learning to match the intonation and rhythm of the human while reading a new script. The model even won a National Association of Broadcasters competition for its realism. Nvidia’s pride in its synthetic voice production led the company to create an entirely virtual version of CEO Jensen Huang to deliver a small part, 14 seconds, of a major speech at its conference earlier this year.
Nvidia’s eye on the enterprise aspect and connection with its many other products help it stand out, but using synthetic voices is becoming a very popular idea. Companies like Veritone and Lovo have been rolling out new variations on the technology, while startups are attracting big investments, including $3.6 million for Supertone and $10 million for Wellsaid. As companies find more ways to implement artificial voices, Nvidia is making sure it has a spot among the choices.