Synthesia and ElevenLabs Team Up to Augment Deepfake Videos With Generative AI Voice Models
Synthetic media startup Synthesia has teamed up with generative AI voice platform ElevenLabs to augment its deepfake video hosts with voice clones and AI-produced audio. Synthesia uses generative AI to produce and realistically mimic people’s appearances, which ElevenLabs’ artificial audio can further enhance.
Synthesia’s platform employs AI to analyze real people in videos and then generate simulacra, or entirely imaginary humans, that perform from a script, with background details and other elements fine-tuned by the user. While the platform has always included speech and other audio, the partnership will see Synthesia integrate ElevenLabs’ more advanced generative voice models for businesses subscribed to its Enterprise plan. Synthesia users will be able to produce high-quality, natural-sounding vocals for their video projects, through a process that is also faster and more efficient than the standard options.
“This collaboration represents a step forward in our ongoing efforts to add more innovative features to our generative AI video platform, and an example of how technology companies in the United Kingdom are pioneering adoption of responsible generative AI to unlock business value for businesses around the world,” Synthesia head of product Guillaume Boniface-Chang explained in a blog post. “A custom avatar’s voice, including cloned ones, will also be able to speak multiple languages, which has been a frequently requested feature from businesses wishing to use the Synthesia platform to localize their content for a global audience.”
As Boniface-Chang noted, multilingual support for custom avatars is aimed at businesses seeking to localize their content for international audiences, a capability that has been a frequent request from Synthesia’s clients.
Synthesia also outlined how it will address the heightened risk of deepfakes impersonating real people with malicious intent. Customers looking to create custom avatars and use voice cloning must comply with Synthesia’s rules around consent and control. That means avatars can only be created with the explicit permission of the individuals whose likenesses are used, and users retain control over how their avatars are deployed. Additionally, Synthesia promises to honor any request to remove user data and likenesses from its databases.
It’s not just a theoretical concern. ElevenLabs’ tech is widely in demand for its fidelity to real voices. Pakistan’s former Prime Minister Imran Khan used ElevenLabs to deliver speeches from prison, both during his campaign and in a victory speech. And robocalls to New Hampshire voters earlier this year used ElevenLabs to create a deepfake of President Biden in an attempt to suppress turnout in the state’s primary election, a use that violates ElevenLabs’ own rules. An investigation traced the calls back to a telecom provider, Lingo, which transmitted them on behalf of Life Corporation. The FCC issued a cease and desist over the incident, followed by an outright ban on deepfake robocalls. That has spurred a rush to build deepfake detectors, both by companies like Pindrop and internally at ElevenLabs and others.