Amazon Brings AI-Based Noise Suppression to Chime Video Conferencing Platform
Amazon is applying artificial intelligence to tackle unwanted noise during audio and video calls on its Chime platform. Amazon Voice Focus uses machine learning to detect and counter irrelevant sounds during Chime meetings, similar to the de-noiser feature Google Meet added in April to suppress sounds users didn’t want to hear.
Amazon Voice Focus essentially cancels out non-speech sounds during meetings on the Chime platform. It can identify environmental and other noises and reduce their volume without affecting speech. Unlike conventional static noise suppression, Voice Focus uses AI to adapt to noises as they arise, and it works on both real-time conversations and recorded audio. Amazon entered the technology in the Deep Noise Suppression Challenge at this year’s Interspeech conference, where Voice Focus took first place in the non-real-time track and second place in the real-time track of the 19-team competition. The same technology is already in use for Alexa’s announcements and Drop In Everywhere features.
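To make the idea concrete, here is a minimal sketch of the classical technique that deep-learning systems like Voice Focus improve upon: spectral subtraction, which estimates the noise floor from a speech-free stretch of audio and subtracts it from every frame's magnitude spectrum. This is purely illustrative and is not Amazon's algorithm; the function name and parameters are hypothetical.

```python
import numpy as np

def suppress_noise(signal, frame_len=256, noise_frames=4):
    """Toy spectral-subtraction noise suppressor (not Amazon's Voice Focus).

    Estimates the noise magnitude spectrum from the first few frames
    (assumed to contain no speech), then subtracts that estimate from
    every frame's magnitude spectrum while preserving phase.
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    # Average magnitude over the leading noise-only frames.
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    # Subtract the noise estimate, flooring at zero (spectral subtraction).
    mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    cleaned = np.fft.irfft(mag * np.exp(1j * np.angle(spectra)),
                           n=frame_len, axis=1)
    return cleaned.reshape(-1)

# Synthetic demo: a 440 Hz "voice" tone buried in white noise,
# with a noise-only lead-in for the estimator.
rng = np.random.default_rng(0)
t = np.arange(8192) / 8000.0
voice = np.sin(2 * np.pi * 440 * t)
voice[:1024] = 0.0  # first 4 frames are noise only
noisy = voice + 0.5 * rng.standard_normal(t.size)
clean = suppress_noise(noisy)
```

A static scheme like this struggles when the noise changes over time, which is exactly the gap an adaptive, learned model is meant to close.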
Though Amazon built Voice Focus in-house, it has opened the door for other developers to offer similar options in third-party devices that use Alexa. For instance, Amazon recently qualified TalkTo, noise-filtering software created by audio technology startup DSP Concepts, for use with Alexa built-in devices. The certification makes it easier for manufacturers that want Alexa in their devices to include TalkTo as a feature, helping the voice assistant understand what people say. Since Amazon decides which devices are allowed to include Alexa, the improved audio quality that comes with intelligent noise filtering can be a helpful boost.
Amazon launched Chime as a digital meeting platform a few years ago, but usage has spiked since the COVID-19 health crisis began earlier this year. Alternatives like Zoom, Microsoft Teams, and Google Meet have all rolled out upgrades and improvements, likely in response to their own rising numbers. That includes Meet’s ability to filter out noise. The AI isn’t perfectly discerning, which is why it can be turned on and off in the platform’s settings; in Google’s case, the shortcomings included an inability to recognize that musical instruments should not be muted.
“Speech enhancement requires not only extracting the original speech from the noise and reverberation but doing so in a way that the human ear perceives as natural and pleasant. This makes automated regression testing difficult and complicates the design of deep-learning speech enhancement systems,” Amazon Web Services’ principal scientist Arvindh Krishnaswamy explained in a blog post about the feature. “The general consensus is that deep learning is finally having a profound impact on audio processing. There are still many challenges — including data augmentation, perceptually relevant loss functions, and dealing with unseen conditions — but the future looks very exciting.”