Facebook Makes Multilingual Speech Recognition Model Open Source
Facebook AI has released Multilingual LibriSpeech (MLS), a massive open-source speech recognition data set and accompanying training resources. MLS combines more than 50,000 hours of audio in eight languages, drawn from public domain audiobooks, with pre-trained language models and other data useful for developing automatic speech recognition (ASR) systems.
Read and Speak
MLS draws its audio from the LibriVox audiobook project, using the original LibriSpeech corpus as a foundation and widening it with more English audio plus speakers in German, Dutch, French, Spanish, Italian, Portuguese, and Polish. Because the audiobooks are in the public domain, there are no licensing issues involved. Facebook organizes this data and augments it with language models, training data, and baselines for comparing speech recognition models. The data set’s English segment alone is about 47 times larger than the original LibriSpeech, which held 1,000 hours of recorded audiobooks. According to the researchers, a model trained on MLS’s English data improves on LibriSpeech’s error rate by 20%.
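The 20% figure describes a relative reduction in word error rate (WER), the standard ASR metric. As a rough sketch of how such a comparison works, the snippet below implements a word-level WER and the relative-reduction calculation; the `wer`, `edit_distance`, and `relative_reduction` functions and the example numbers are illustrative, not taken from Facebook's evaluation.

```python
# Illustrative sketch of word error rate (WER) and relative reduction.
# All function names and example figures here are hypothetical.

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # deletions to reach empty hypothesis
    for j in range(n + 1):
        d[0][j] = j  # insertions from empty reference
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[m][n]

def wer(reference, hypothesis):
    """WER = word-level edits divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def relative_reduction(baseline_wer, new_wer):
    """Relative improvement: 0.20 means a 20% reduction in error rate."""
    return (baseline_wer - new_wer) / baseline_wer

# One wrong word out of four gives a 25% WER.
print(wer("the cat sat down", "the cat sat up"))            # → 0.25
# A baseline WER of 5.0% dropping to 4.0% is the kind of
# 20% relative reduction the researchers describe.
print(round(relative_reduction(0.050, 0.040), 3))           # → 0.2
```

In practice, researchers use established tools (such as the `jiwer` library) rather than hand-rolled edit distance, but the arithmetic of a "20% improvement" is the same.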
“Open data sets and benchmarks have been key drivers of recent advances across AI. MLS provides a valuable resource for research in large-scale training of ASR systems,” the researchers behind the project explained in a blog post. “While there are data sets and benchmarks for non-English languages, they are often relatively small or scattered around different places and rarely available under an open, permissive license. We believe that by providing a large multilingual data set with a nonrestrictive license and establishing a common benchmark, MLS will promote open and collaborative research in multilingual ASR and improve speech recognition systems in more languages around the world.”
Facebook Open Source
MLS is Facebook’s latest open-source AI project. In October, Facebook released M2M-100, a translation model that can translate directly between any pair of 100 languages. Notably, it was built without relying on English as an intermediary, making it unique among machine translation models yet, according to Facebook, able to outperform standard English-centric rivals. M2M-100 and MLS are both likely to benefit from the ASR model Facebook released last summer, which understands 51 languages and was built on more than 16,000 hours of voice recordings. The goal is for voice assistants to grasp both what someone is saying and what language they are speaking, and to do so reliably no matter who is speaking. All of these AI projects could make it simpler for independent developers to do just that.