Resemble AI Creates Synthetic Audio Watermark to Tag Deepfake Speech
Synthetic speech and voice cloning startup Resemble AI has introduced an “audio watermark” to tag AI-generated speech without compromising sound quality. The new PerTh (Perceptual Threshold) Watermarker embeds the sonic signature of Resemble’s synthetic media engine into a recording to mark its AI origin. The mark is designed to survive later audio manipulation while remaining subtle enough that no human can hear it.
Visual watermarking hides one image within another; on particularly high-security documents, the mark can be invisible without a computer scanner. Audio watermarks apply the same principle: the mark is a very soft sound that people won’t notice, encoded with information that a computer can decipher. The concept isn’t new, but Resemble has used its audio AI to make PerTh more reliable without compromising the realism of its synthetic speech.
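The article does not publish PerTh’s internals, so the following is only a minimal sketch of the general idea of a quiet, machine-readable mark, using classic spread-spectrum watermarking rather than Resemble’s method. Each payload bit is spread across thousands of samples as a faint pseudo-random pattern derived from a secret key, and a detector holding the same key recovers the bit by correlation. The function names, key, block size, and `alpha` amplitude are illustrative assumptions, and the “speech” is just two steady tones.

```python
import math
import random


def chip_sequence(key: int, length: int) -> list[float]:
    """Pseudo-random +1/-1 sequence; embedder and detector rebuild it from the shared key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(length)]


def embed(host: list[float], bits: list[int], key: int,
          block: int = 8192, alpha: float = 0.02) -> list[float]:
    """Spread each bit over `block` samples as a faint +/-alpha pseudo-noise pattern.

    A 1 adds the chip pattern and a 0 adds its negation. The same chips are reused
    for every block to keep this illustrative sketch short.
    """
    if len(bits) * block > len(host):
        raise ValueError("host audio is too short for this payload")
    chips = chip_sequence(key, block)
    marked = list(host)
    for b, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        start = b * block
        for i, c in enumerate(chips):
            marked[start + i] += alpha * sign * c
    return marked


def detect(audio: list[float], n_bits: int, key: int, block: int = 8192) -> list[int]:
    """Blind detection: correlate each block with the chips; the sign gives the bit.

    The watermark contributes +/- alpha * block to each correlation, while the host
    contributes a term that behaves like zero-mean noise, so the sign survives.
    """
    chips = chip_sequence(key, block)
    bits = []
    for b in range(n_bits):
        start = b * block
        corr = sum(audio[start + i] * c for i, c in enumerate(chips))
        bits.append(1 if corr > 0 else 0)
    return bits


if __name__ == "__main__":
    sr = 16000
    payload = [1, 0, 1, 1, 0, 0, 1, 0]
    n = len(payload) * 8192
    # Stand-in for speech: two steady tones, far louder than the watermark.
    host = [0.3 * math.sin(2 * math.pi * 220 * t / sr)
            + 0.2 * math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
    marked = embed(host, payload, key=42)
    print(detect(marked, len(payload), key=42) == payload)
```

Spreading each bit over many samples is what lets a very quiet pattern be read back: the watermark’s share of the correlation grows with block length, while the host’s share grows only with its square root. This toy is not robust to time-shifting or time-stretching, two of the attacks Resemble says PerTh withstands, because detection assumes the block boundaries are aligned.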
Quiet sounds are usually easy to obliterate, but Resemble found a way to hide its identification tones within the sounds of speech itself. Because people talking is the whole point of Resemble’s services, a watermark woven into the speech is much more likely to come through an edit unscathed. Resemble takes advantage of two features of human hearing: people tend to focus on specific frequencies, and louder sounds can mask quieter sounds that are close in frequency. Together, these effects keep people from noticing the watermark or being able to extract it. Resemble’s machine learning model determines where to embed the quiet sonic tag, generates the appropriate sound, and puts it in place. The diagram below illustrates how the watermark hides in plain sight, or rather in plain sound.
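To make the frequency-masking idea concrete, here is a toy sketch, not Resemble’s machine learning model. It finds the loudest frequency in a short audio frame (the “masker”), then hides one bit as a quiet tone a couple of frequency bins above or below it, 20 dB weaker. A real system would use a full psychoacoustic model and many maskers per frame; the frame size, the 20 dB offset, the two-bin spacing, and all function names here are hypothetical choices for illustration.

```python
import cmath
import math


def dft_bin(frame: list[float], k: int) -> complex:
    """Single DFT coefficient X[k] of the frame (a plain sum, fine for short frames)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))


def dominant_bin(frame: list[float]) -> int:
    """Index of the loudest non-DC bin below Nyquist: the masker."""
    n = len(frame)
    mags = [abs(dft_bin(frame, k)) for k in range(1, n // 2)]
    return 1 + mags.index(max(mags))


def embed_bit(frame: list[float], bit: int,
              offset_db: float = 20.0, spacing: int = 2) -> list[float]:
    """Add a tone `spacing` bins above (bit 1) or below (bit 0) the masker, offset_db quieter."""
    n = len(frame)
    k0 = dominant_bin(frame)
    k = k0 + spacing if bit else k0 - spacing
    if not 1 <= k < n // 2:
        raise ValueError("watermark bin falls outside the usable spectrum")
    masker_amp = 2 * abs(dft_bin(frame, k0)) / n  # peak amplitude of an on-bin sinusoid
    wm_amp = masker_amp * 10 ** (-offset_db / 20)
    return [x + wm_amp * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame)]


def detect_bit(frame: list[float], spacing: int = 2) -> int:
    """Re-find the masker, then compare energy just above versus just below it."""
    k0 = dominant_bin(frame)
    above = abs(dft_bin(frame, k0 + spacing))
    below = abs(dft_bin(frame, k0 - spacing))
    return 1 if above > below else 0


if __name__ == "__main__":
    n = 256  # 32 ms at an assumed 8 kHz sample rate
    # Loud 500 Hz component (bin 16) plus a quieter one at bin 40 (1250 Hz).
    host = [0.5 * math.sin(2 * math.pi * 16 * t / n)
            + 0.1 * math.sin(2 * math.pi * 40 * t / n) for t in range(n)]
    for bit in (0, 1):
        print(bit, detect_bit(embed_bit(host, bit)))
```

The design mirrors the article’s reasoning: a tag tucked just beside the speech’s loudest component is hard to hear, and it leaves that component dominant, so the detector can find it again. This sketch does not model real codecs or measured human hearing thresholds.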
“PerTh is built for protecting synthetic voices from data manipulation. It embeds imperceptible data into the speech and provides a way to verify genuine content,” Resemble AI CEO Zohaib Ahmed explained on Twitter. “The Watermarker has been tested against various ‘attacks,’ such as resampling, re-encoding, adding audible noise, and applying time-stretching and time-shifting. The watermark resists such attacks and maintains a nearly 100% data recovery rate. This transparency has no effect on customer experience as the watermark is embedded in an imperceptible manner. We’ll be rolling this out to all users shortly and also open sourcing the model for the community!”
Resemble AI’s speech software for replicating or synthesizing voices has expanded quickly since it launched in 2019. Improved technology has led to better-sounding voices from smaller recording samples, and the company has branched into applying its voice cloning to translation. The company’s ambitions for the entertainment industry have borne fruit as well. Last year’s Netflix documentary, The Andy Warhol Diaries, included Resemble’s AI-generated voice of the artist reading excerpts from his memoir for the film.