How Loudspeakers Can Hijack Voice Assistants With Disguised Audio Commands
Ordinary audio files can be altered to carry secret commands to voice assistants over standard speakers without humans noticing, according to a new paper published by the Horst Görtz Institute for IT Security at Ruhr-Universität Bochum. The fact that speech recognition software can detect and respond to these hidden commands is a potential security problem that developers will need to address.
Secret Code in Open Air
Last year, the researchers first demonstrated that they could break down a message and hide it from human ears within an audio file. Those hidden commands had to be fed directly to the software as data in order to be processed. Now, any speaker playing the altered audio can deliver the secret commands over the air. The altered audio sounds only slightly distorted to human ears, as in the example video at the top. But Kaldi, the open-source speech recognition system used for the experiment and integrated into the Amazon Alexa voice assistant, hears and understands the message underneath.
The researchers hid the message by applying psychoacoustics, the study of how people perceive and understand what they hear. The human ear and brain are capable of amazing feats, but they have limits that the altered message exploits. While the ear is processing a sound at a certain frequency, there are a few milliseconds during which it ignores quieter sounds. Machines lack that limitation. If the secret command plays at those frequencies at the right moments in an audio file, people hear only a little static, while the software hears an altered message. The main audio is irrelevant and could be natural sounds or an orchestra as easily as a human voice. The effect is the same.
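The masking idea can be illustrated with a very rough sketch. The Python below is not the researchers' psychoacoustic model. It assumes a single fixed margin (a hypothetical 20 dB) in place of real, frequency-dependent hearing thresholds, and the function name and frame size are made up for illustration. It simply reports what share of time-frequency bins hold a perturbation that is much quieter than the main audio.

```python
import numpy as np

def fraction_masked(carrier, perturbation, frame=512, margin_db=20.0):
    """Share of time-frequency bins where the perturbation sits at least
    margin_db below the carrier. A fixed margin is an assumption here;
    real psychoacoustic models use frequency-dependent thresholds."""
    n_frames = min(len(carrier), len(perturbation)) // frame
    window = np.hanning(frame)
    eps = 1e-12  # avoids division by zero in silent bins
    hidden = 0
    total = 0
    for i in range(n_frames):
        seg = slice(i * frame, (i + 1) * frame)
        # Magnitude spectrum of each signal in this short frame.
        c = np.abs(np.fft.rfft(carrier[seg] * window))
        p = np.abs(np.fft.rfft(perturbation[seg] * window))
        # How many dB louder the carrier is than the perturbation, per bin.
        diff_db = 20.0 * np.log10((c + eps) / (p + eps))
        hidden += np.count_nonzero(diff_db >= margin_db)
        total += diff_db.size
    return hidden / total if total else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    music = rng.standard_normal(16000)          # stand-in for the main audio
    noise = 0.05 * rng.standard_normal(16000)   # stand-in for the hidden part
    print(f"share of bins under the margin: {fraction_masked(music, noise):.2f}")
```

A value near 1 would suggest the added signal stays well below the main audio almost everywhere, which is the condition the attack tries to satisfy, though a real listener's thresholds are more complicated than one number.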
On top of incorporating all of the psychoacoustic elements into the altered audio, the researchers needed to consider how space shapes sound to keep the message intact when broadcast. They developed programs that adjust the audio to work in specific rooms before coming up with a way to pass along the secret message over the air regardless of the shape of the space where the loudspeaker is placed.
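One common way to model how a room shapes sound is to convolve the audio with a room impulse response, the pattern of echoes between loudspeaker and microphone. The Python sketch below shows that general idea only. The impulse response here is synthetic (decaying random noise with an assumed reverberation time), and the function names and parameters are hypothetical rather than taken from the paper.

```python
import numpy as np

def synthetic_room_impulse_response(sample_rate, rt60, length_s=0.3, seed=0):
    """Toy impulse response: random noise whose amplitude drops by 60 dB
    after rt60 seconds. An assumption for illustration, not a measured room."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * length_s)
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # factor of 1000 (60 dB) at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path from loudspeaker to microphone
    return rir / np.max(np.abs(rir))

def play_in_room(audio, rir):
    """Simulate over-the-air playback by convolving the audio with the
    impulse response, trimmed to the original length."""
    return np.convolve(audio, rir)[: len(audio)]

if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    # Trying several assumed reverberation times mimics different rooms.
    for rt60 in (0.2, 0.4, 0.8):
        heard = play_in_room(tone, synthetic_room_impulse_response(sr, rt60))
        print(rt60, float(np.sqrt(np.mean(heard ** 2))))
```

Testing altered audio against many simulated rooms like this, rather than one fixed setup, is the kind of check that would show whether a hidden message survives a change of space.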
“Compared to prior work on this topic, which used a fixed setup only, our approach takes the characteristics of the room and the position of the microphone and the loudspeaker into account,” the researchers wrote in the paper. “[W]e can create robust adversarial examples, which can be played over the air. The examples can be tailored to specific rooms, but also work, if a more general setup is used or the room situation does change.”
Hidden Audio Attacks
Calling the hidden messages adversarial reflects how much they resemble a computer virus: a malicious command slipped into a program that goes unnoticed by human users until it is too late. That is exactly how hackers might use this technology. Ordering a voice assistant to send private information to a hacker and committing fraud through a voice app are both feasible crimes with the correct ‘static.’
This vulnerability could be significantly worse than previously discovered voice assistant security gaps. The hack that allowed Alexa developers to transcribe audio spoken around a smart speaker for an extra 16 seconds was quickly closed, while the DolphinAttack, which showed that commands in ultrasonic frequencies could activate and partially control voice assistants including Siri, Alexa, and Google Assistant, only works when the attacker is already close to the device.
To combat the security problem of messages hidden beyond human hearing, the researchers developed potential countermeasures in tandem with the audio manipulation research. One potential defense is to have the voice assistant turn all of the audio it hears into an MP3. MP3 compression discards the parts of the audio that most humans cannot hear. When the researchers compressed the manipulated audio into an MP3, the speech recognition system could no longer understand the hidden code. The only way to transmit the secret message as an MP3 is to fit it into the narrower range of frequencies used by the format. When the hidden message is crammed into an MP3, however, the ‘static’ in the background is noticeable to the point that anyone listening would know something is off about the audio. Imposing the limits of human hearing on machines might actually be the best way to free voice assistants from hidden orders.
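The idea of stripping audio down before the recognizer hears it can be sketched in a few lines of Python. This is not MP3 encoding, which relies on a full psychoacoustic model. The sketch is a much cruder stand-in, a hard low-pass filter with an assumed cutoff, and the function name is hypothetical. It only shows the shape of the defense: filter first, recognize second.

```python
import numpy as np

def band_limit(audio, sample_rate, cutoff_hz):
    """Zero out every frequency component above cutoff_hz.
    A crude stand-in for lossy compression, not an MP3 encoder."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(audio))

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    speech_like = np.sin(2 * np.pi * 1000 * t)        # content to keep
    hidden_like = 0.1 * np.sin(2 * np.pi * 7000 * t)  # content to strip
    cleaned = band_limit(speech_like + hidden_like, sr, cutoff_hz=4000)
    # The cleaned signal should be close to the 1 kHz tone alone.
    print(float(np.max(np.abs(cleaned - speech_like))))
```

In a real pipeline the filtered signal, rather than the raw microphone input, would be handed to the speech recognizer. Whether a simple filter like this would stop the attack described in the paper is not something the sketch can show.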