Generative AI Speech Startup ElevenLabs Raises $19M and Releases Deepfake Voice Detector
Synthetic voice AI startup ElevenLabs has raised $19 million in a Series A funding round co-led by Andreessen Horowitz and independent venture capitalists Nat Friedman and Daniel Gross. The year-old generative AI-powered voice startup claims to have signed up more than a million people since launching its platform and raising $2 million in January.
ElevenLabs uses generative AI voice models to convert text into speech. The startup offers both clones of existing voices and the capacity to design new ones. All of them are built to mimic human emotion, relying on context clues to judge the mood of the piece and automatically adjusting tone and inflection to match. The startup boasts that the voices are all but identical to real humans, with under a second of latency. ElevenLabs has enticed plenty of partners with its ambitions, including audiobook publisher Storytel, digital content host TheSoul Publishing, and game companies Embark Studios and Paradox Interactive.
“Over the last five months, we’ve seen our technology embraced by millions of creators, companies, and curious minds. We are right at the beginning of this journey and now with Nat, Daniel, and Andreessen Horowitz joining, we have the best partners as we continue on the ambitious path forward,” ElevenLabs CEO Mati Staniszewski said.
The duo joining Andreessen Horowitz in leading the round has clear connections to the world of AI. Friedman is the former CEO of GitHub, while Gross founded the Apple-acquired search engine Cue. Several other big names in tech participated, including a handful of generative AI notables like DeepMind and Inflection co-founder Mustafa Suleyman, Runway co-founder Siqi Chen, Perplexity AI co-founder Aravind Srinivas, and Vercel founder Guillermo Rauch. ElevenLabs already has more features and products in the works that the new money will support. The startup is testing an AI-dubbing tool for translating audio tracks, as well as a long-form text-to-speech platform that it claims will record entire audiobooks in minutes.
Concerns about deepfakes and synthetic media indistinguishable from real people have prompted ElevenLabs to come up with ways to tell when people are hearing voices produced by its models. The platform’s debut didn’t have many guardrails, leading to heavy use on 4chan and other forums, very much in violation of the startup’s terms and conditions. ElevenLabs clamped down, but the story is cemented as a meme. The new AI Speech Classifier processes audio uploaded by the user and hunts for telltale signs of synthetic media that may be too subtle for humans to hear.
Though the ElevenLabs tool is still in early testing, the idea of a synthetic speech identification tool is rapidly becoming more popular among generative AI developers. Just last week, Meta unveiled a generative AI speech engine named Voicebox. Meta’s experiments and the investment in ElevenLabs are signals that the accelerating synthetic voice market isn’t vanishing any time soon. The list of AI-generated voices for songs and singers, fake podcast episodes, parody commercials, and even romantic partners only gets longer.
Like ElevenLabs, Meta built its own generative AI audio detective to spot the use of its models. Meanwhile, Resemble AI has designed an audio watermark that could be inserted into any deepfake audio. ElevenLabs is very bullish about the AI Speech Classifier, but a voluntary API is not necessarily going to appease those worried about identity theft. That’s before even touching the issues raised by voice actors and other audio-based performers.
“Our mission is to be the ultimate tool for storytelling, dissolving language barriers and putting all audiences in the reach of all content creators in a safe and responsible way,” Staniszewski said. “With an incredible, growing team and these exceptional investors, ElevenLabs is now ever closer to realizing its long-term goal of making all content universally accessible in any language and in any voice.”