Mozilla Common Voice Launches Grant for Kiswahili Language Voice Tech
Mozilla Common Voice has begun a $400,000 grant program for developing voice technology in the Kiswahili language of East Africa. The open-source voice database organization will offer chosen projects in the region up to $50,000 to leverage the Kiswahili data for voice tech that can help people in East Africa.
Mozilla Common Voice began five years ago as an alternative to major corporations for those looking for the data needed to train a voice AI. The dataset includes material that the giants in voice tech might not have, such as audio of people with speech impairments and languages that the bigger voice assistants might not speak yet. About 100 million people speak Kiswahili, but it hasn’t yet been added as an option to the best-known voice assistants. Applicants for the award must have some base in Kenya, Tanzania, or the Democratic Republic of Congo (DRC), though Kiswahili speakers exist all over the world. For the grant, Mozilla Common Voice said it’s looking for agriculture- and finance-related voice services, ideally ones that help more marginalized groups and regions. Applications are due March 22.
“These grants will fuel products powered by open voice data and technology, and built with local community involvement and engagement,” Mozilla Foundation Special Advisor for Africa Innovation Mradi Chenai Chair explained. “And in the long-term, they can help shift power and improve social and economic opportunities for marginalized groups, particularly women, in Kenya, Tanzania, and Kiswahili-speaking DRC.”
The grant announcement comes almost a year after Nvidia made a $1.5 million investment in Mozilla Common Voice and nine months after a coalition of international non-profits invested $3.4 million in boosting Mozilla Common Voice’s Kiswahili efforts. The money was aimed at making Kiswahili a significant part of the project by hiring people to focus specifically on the Kiswahili database and by providing the outreach and technical assistance needed to make it easier for volunteers to contribute their voice samples.
“Language is a powerful part of who we are, and people, not profit-making companies, are the right guardians of how language appears in our digital lives,” Chair said at the time. “By making it easy to donate voice data in Kiswahili, Common Voice will support East Africans to play a direct role in creating technology that helps rather than harms their communities. We are thrilled to join with partners who share Mozilla’s vision for helping more people in more places to access voice technology.”
The Common Voice database is already enormous, with more than 9,000 hours of audio in 60 different languages. It’s the largest public domain voice dataset in the world. Common Voice has also proven it can grow its datasets quickly: it stood at 7,226 hours in 54 languages in 2020, when Mozilla released a massive dataset of voices. The Kiswahili grant program may also set a template for other languages not yet supported by the major voice assistants.