Audio Analytic’s Sound Recognition Lands in Iliad’s Freebox Delta
Sound Recognition vs Speech Recognition
When we deal with all the sounds in the universe – babies crying, dogs barking, windows being broken – none are governed by language models. Recognizing them requires very different types of artificial intelligence, and it is a very new field.
Audio Analytic breaks down the difference between speech and sound recognition on their website, pointing out that speech recognition is limited to the types of sounds that the human mouth can make. While extensive, these sounds can be mapped thoroughly because they are constrained by the communicative structure of language. Music is similarly conditioned by the conventions of its genres. Because machines were originally designed to process repeatable tasks, teaching a machine to recognize speech or music benefits greatly from these predefined rules and this prior knowledge.
Sound is a different story. It is far more diverse and unstructured than speech or music. Take the sound of a window being smashed: there are countless ways that glass shards can hit the floor, and countless ways an object can strike the window. The variability is tremendous.
Audio Analytic Offers Sound Recognition Beyond Speech and Music
Audio Analytic, founded in 2010, is described by Bloomberg as “An AI start-up like no other…a Shazam for real-world sounds.” In pursuing sound recognition, the company built its expertise by collecting sound data in-house. Audio Analytic’s focus on sound recognition is distinctive; it first entered the connected home market with products including a light bulb that detects the sound of windows being smashed.
The company has also worked on home security cameras, speakers, and even headphones, embedding listening software that can alert pedestrians and cyclists to hazards such as an approaching emergency vehicle. Integrations with self-driving cars and with health devices that listen for a patient’s audible symptoms, for example coughs and sneezes, are in development right now. Audio Analytic develops its software by first deciding which sounds it wants to be able to recognize. Six common sounds are currently licensed to clients, and fifteen more are in development.
Amazon recently filed a patent application related to identifying sounds from Alexa users that indicate illness, such as a cough. However, Mitchell told Voicebot in a telephone interview that he is skeptical of the application design. Coughs and other audible indicators of illness are not always present after a wake word such as “Alexa” is spoken. For the Alexa application to work, a user would have to cough after waking Alexa and while addressing the device with a request. By contrast, Audio Analytic’s solution could run continuously in a health monitoring device and identify coughs and other sounds that are far more likely to occur when the user is not addressing a smart speaker.
Broadening the Voice Assistant Horizon
It is easy to see the potential in sound recognition. For example, after recognizing that a baby is crying, a voice assistant could notify parents and begin playing Twinkle, Twinkle, Little Star. When the sound of a window breaking is recognized, Dr. Mitchell says a voice assistant could announce that it will “turn on lights, play the loud alarm, notify owners, and set lasers to stun.”
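The responses Dr. Mitchell describes follow a simple event-to-action pattern, which can be sketched as a dispatch table. This is an illustrative toy, not Audio Analytic’s API; the labels and action names are invented:

```python
# Toy dispatch table mapping recognized sound labels to assistant actions.
# All labels and action names here are hypothetical, for illustration only.
def handle_sound_event(label: str) -> list[str]:
    """Return the actions a voice assistant might take for a sound label."""
    actions = {
        "baby_cry": ["notify_parents", "play_lullaby"],
        "glass_break": ["turn_on_lights", "sound_alarm", "notify_owners"],
        "smoke_alarm": ["notify_owners", "call_emergency_contact"],
    }
    return actions.get(label, [])  # unrecognized sounds trigger nothing

print(handle_sound_event("glass_break"))
# → ['turn_on_lights', 'sound_alarm', 'notify_owners']
```

In a real product the table would be replaced by configurable automation rules, but the core idea is the same: the recognizer emits a label, and downstream logic decides what the assistant does with it.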
When a device responds to these sounds, they suddenly take on less inanimate properties. They tend to [feel] more caring, more understanding…more of the things you want from modern consumer electronics.
Speech is just one category of sound, and anything involving sound requires artificial intelligence models built for it. Sound recognition can enhance context detection in a noisy room, detect whether a user is stressed or ill, and determine whether the windows are open or closed. Its added value is the opportunity to customize an interaction based on situational context. Dr. Mitchell commented:
We do this so context is delivered continuously and it is all captured on the device so no privacy issues. We provide the ability to tell the device what sound has happened. The audio does not need to leave your house for means of classification.
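The on-device flow in the quote above can be illustrated with a toy classifier that computes simple acoustic features locally and reports only a label, so the raw audio never leaves the device. The features (RMS energy, zero-crossing rate), thresholds, and labels are invented for illustration and have nothing to do with Audio Analytic’s actual models:

```python
import math

def classify_frame(samples: list[float]) -> str:
    """Toy on-device classifier: only the returned label leaves the device.
    Features, thresholds, and labels are illustrative assumptions."""
    # Root-mean-square energy: how loud the frame is overall.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Zero-crossing rate: sharp, noisy events cross zero far more often
    # than sustained tonal sounds.
    zcr = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    ) / (len(samples) - 1)
    if rms > 0.5 and zcr > 0.3:
        return "impact_like"   # loud and spectrally noisy
    if rms > 0.5:
        return "loud_tonal"    # loud but steady
    return "background"

# Only the label string would be reported upstream; the samples stay local.
```

Production systems use learned models rather than hand-set thresholds, but the privacy argument is the same: classification runs on the device, and only the resulting label needs to be shared.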