Google Demonstrates New Speech Recognition Model for Mobile Use
Google researchers have designed an improved voice recognition system that can identify and isolate a specific speaker in real time when multiple people are talking, by comparing the audio to an existing sample of that speaker's voice. According to a new academic paper, the VoiceFilter-Lite model slims down the existing VoiceFilter platform, speeding up speech recognition while reducing the computing power required.
Listen and Filter
When VoiceFilter came out in 2018, it enabled Google Assistant to record and remember a person's voice, allowing for more personalization. But it isn't a perfect feature: mismatched voices are a possibility, and the amount of power it used limited how widely it could be integrated into mobile devices. VoiceFilter-Lite solves that power issue by filtering out sounds not made by the account holder's voice in real time, before the audio is processed for recognition, rather than processing the audio first and cleaning it up afterward. This shrinks the model significantly, making it suitable for devices that aren't plugged into an outlet while improving how well the AI understands what the user is saying.
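Google hasn't published VoiceFilter-Lite's internals alongside this paper, but the core idea above — compare incoming audio against an enrolled voiceprint and suppress everything that doesn't match, before recognition runs — can be illustrated with a toy sketch. The hard cosine-similarity mask below is a stand-in for the neural soft mask the real model predicts, and all function names here are hypothetical:

```python
import numpy as np

def enroll_speaker(frames):
    """Average a speaker's frame features into a fixed-size voiceprint
    (in the spirit of a d-vector; hypothetical toy version)."""
    v = frames.mean(axis=0)
    return v / np.linalg.norm(v)

def filter_audio(frames, voiceprint, threshold=0.5):
    """Zero out frames whose features don't resemble the enrolled speaker.
    The real model predicts a soft mask with a small neural network; this
    toy stand-in thresholds cosine similarity instead."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    sims = (frames / norms) @ voiceprint           # cosine similarity per frame
    mask = (sims > threshold).astype(float)        # hard 0/1 mask (toy stand-in)
    return frames * mask[:, None]

# Toy demo: the enrolled speaker's features cluster in one direction,
# an interfering talker's in another.
rng = np.random.default_rng(0)
speaker = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(20, 2))
other = rng.normal(loc=[0.0, 1.0], scale=0.05, size=(20, 2))
voiceprint = enroll_speaker(speaker)
mixed = np.vstack([speaker, other])
cleaned = filter_audio(mixed, voiceprint)
# The interfering talker's frames are zeroed; the target speaker's survive.
```

Because the filtering happens before the recognizer ever sees the audio, the downstream speech model only has to transcribe one voice — which is what lets the whole pipeline stay small enough to run on-device.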
“Importantly, this model can be easily integrated with existing on-device speech recognition applications, allowing the user to access voice assistive features under boisterous conditions even if an internet connection is unavailable,” the researchers explained. “Our experiments show that a 2.2MB VoiceFilter-Lite model provides a 25.1% improvement to the word error rate (WER) on overlapping speech.”
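The word error rate cited by the researchers is the standard metric for speech recognition accuracy: the word-level edit distance between the system's transcript and a reference transcript, divided by the reference length. A minimal implementation (not from the paper) looks like this:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via word-level edit distance (Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("turn on the lights", "turn on the light"))  # 0.25
```

A 25.1% improvement in this paper's sense is relative: the filtered model makes roughly a quarter fewer word errors on overlapping speech than the unfiltered baseline.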
Keeping the process on the device is a natural extension of the way Google has been expanding Voice Match. Google started adding the feature to every device with Google Assistant in June, not long after enhancing the setup process to make it more secure and accurate. The first version of the feature only required saying "Hey, Google" a few times to generate a voice model; now, it takes four sentences that work as potential commands. The improved Voice Match profile no longer needs the cloud, so Google Assistant on a device can identify a user even without internet access, just like the mobile-friendly model described in the new paper.
Noise and Voice
Google and Amazon both look to third-party creations as well as their own research to improve the voice recognition capabilities of their voice assistants, especially on mobile devices. That's why Amazon qualifies products like DSP Concepts' noise-filtering software TalkTo for Alexa built-in devices: manufacturers can more easily add Alexa to their products while ensuring they meet Amazon's standard for filtering sounds so that Alexa can understand what someone is saying.
Helping voice assistants isn't the only reason to build improved voice recognition models into AI platforms. Video and audio communications platforms can be difficult to use when not seated in a quiet area or when connecting from a mobile device. That's been especially true during the ongoing COVID-19 health crisis, when lockdowns keep many people home together doing disparate, sometimes loud activities at the same time. Every conference call platform has to address the problem in some way. Amazon created the Voice Focus feature for its Chime platform to spot and counter irrelevant sounds during meetings. Similarly, the De-Noiser feature for Google Meet is designed to suppress sounds users don't want to hear by teaching an AI to distinguish voices from other audio input. The same need may also help explain Apple's acquisition of edge-based artificial intelligence startup Xnor.ai in January for a reported $200 million.