Nvidia Upgrades Speech AI to Pursue Enterprise Ambitions
Nvidia has unveiled a revamped slate of AI services, including a new version of the Riva speech AI SDK. Riva 2.0 upgrades the speech recognition and text-to-speech models and arrives in tandem with Riva Enterprise for companies looking to deploy the software on a larger scale.
The new version of Riva adds to and improves on the features of the original. Automatic speech recognition works in seven languages and offers domain-specific customization. The software package also includes a neural text-to-speech service for creating lifelike human voices. Introduced as Riva Custom Voice in November, it’s designed to give companies an AI-powered “brand voice” for their marketing and customer relations. The improved version has been trained with voice actors so the AI sounds even more like a real person. According to Nvidia, Riva and related products were developed to meet rising demand for speech AI, especially for large-scale deployments.
“We see a huge range of speech applications and hundreds of thousands of companies interested,” Nvidia’s head of AI and deep learning product marketing Siddharth Sharma told Voicebot in an interview. “That has spurred Nvidia over the last couple of years to invest in speech and speech applications.”
Sharma pointed to RingCentral, which uses Riva for speech recognition and transcription, as one enterprise example. Social media developer Snap also uses Riva, specifically as a tool for creating speech-based experiences on its developer platform. That’s notable because Snap uses SoundHound’s AI platform for user-facing features like automated video captioning.
“On Snapchat, our community plays with Lenses over 6 billion times per day,” Snap head of conversational AI Alan Bekker said in a statement. “Snap is using Nvidia Riva to optimize our AI-based speech capabilities and offer them to Lens Studio creators to build a new generation of compelling AR experiences.”
The Riva news was part of a massive blitz of new and updated AI announcements from Nvidia at its GTC event this year. The company showcased a new version of its NeMo Megatron framework for training large language models with trillions of parameters. The models are used for building conversational AI, recommenders, and genomics tools. The Megatron framework has a considerable history at Nvidia, including the Bio-Megatron medical speech transcription service, a healthcare-focused cousin of NeMo Megatron. Nvidia’s audio AI also appears in the new version of its Maxine SDK, which enhances real-time communications with AI-powered audio resolution and echo cancellation.
Riva and its enterprise option are clearly the centerpieces of Nvidia’s AI work, however. Sharma outlined how Riva will become a pipeline for new languages and other features in the near future, along with ongoing accuracy improvements. The ability to operate within other platforms, both in the cloud and on devices, further cements its utility.
“Over the last three years, the models have become four times more accurate. Riva can run on any cloud, [while] others might be more restrictive,” Sharma said. “[It’s] more attractive to maintain flexibility and deploy on any cloud.”