Nvidia Invests $1.5M in Mozilla Common Voice Open-Source Project
Nvidia is partnering with Mozilla to advance voice AI and speech recognition, beginning with a $1.5 million investment in the Mozilla Common Voice program. The open-source database of cataloged and transcribed voice recordings supports voice tech developers who don’t have access to the tech giants’ proprietary data.
Uncommonly Common Access
Mozilla Common Voice began in 2017 as an alternative to major corporations for sourcing the huge amount of data needed to train a voice AI. Common Voice was also conceived as a resource for making voice assistants more accessible to people whose languages and speech habits may be difficult for widely used assistants like Google Assistant and Alexa to understand. The database continues to grow at a rapid pace. The 1,400 hours in 18 languages sorted in February 2019 grew to 7,226 hours in 54 languages by last summer, when Mozilla released a massive data set of voices, as Voicebot reported. It now stands at more than 9,000 hours in 60 different languages, the largest public domain voice dataset in the world. Mozilla said it plans to use the new money to further expand the database and its staff while making it easier for people to donate voice recordings to the project. Part of that growth includes moving Common Voice more fully under the Mozilla Foundation, where it will contribute to the group of projects working on more trustworthy AI.
“Language is a powerful part of who we are, and people, not profit-making companies, are the right guardians of how language appears in our digital lives,” Mozilla Foundation executive director Mark Surman said. “By making it easy to donate voice data, Common Voice empowers people to play a direct role in creating technology that helps rather than harms humanity. Mozilla and NVIDIA both see voice as a prime opportunity where people can take back control of technology and unlock its full potential.”
The investment isn’t a lot of money by Mozilla and Nvidia standards, but it does serve as a starting point for a larger arrangement between the two organizations. Mozilla announced the partnership in tandem with Nvidia’s general release of its conversational AI framework Jarvis. First released for testing almost a year ago, Jarvis uses Nvidia’s GPU hardware to create pre-trained AI models for interactive experiences. It can understand and translate speech, shifting between text and spoken words. It can even transform an audio recording into a virtual human face or other 3D animated image that mimics the mouth movements someone saying those words would make. The potential benefits of Common Voice to Jarvis, and vice versa, are clear.
“The demand for conversational AI is growing, with chatbots and virtual assistants impacting nearly every industry,” Nvidia senior director of accelerated computing product management Kari Briski said in a statement. “With Common Voice’s large and open datasets, we’re able to develop pre-trained models and offer them back to the community for free. Together, we’re working toward a shared goal of supporting and building communities — particularly for under-resourced and under-served languages.”
It is worth noting that Mozilla has always presented Common Voice as a companion to its DeepSpeech toolkit of voice and text models. With the new partnership, DeepSpeech appears to be somewhat redundant to Mozilla, which has decided to stop developing it and act only as an advisor for the product, as first reported by VentureBeat. The company won’t try to take down the open-source software but instead plans to provide grants to developers who come up with particularly great applications for DeepSpeech. Mozilla sees its voice AI destiny in Nvidia instead.
“We launched Common Voice to teach machines how real people speak in their unique languages, accents and speech patterns,” Surman said. “Nvidia and Mozilla have a common vision of democratizing voice technology — and ensuring that it reflects the rich diversity of people and voices that make up the internet.”