Resemble AI Turns Short Recordings into Synthetic Voices

Synthetic speech technology startup Resemble AI can put words in your mouth. In fact, using its proprietary deep learning software, Resemble can put words in anyone’s mouth given just a few minutes of recorded speech. The Toronto- and San Francisco-based company can customize speech based on real voices and synthesize entirely new ones, with potential applications across a wide range of industries.

Vox Populi

“People are surprised as to how much we can make it sound like them,” said Resemble AI co-founder Zohaib Ahmed in an interview with Voicebot. “There’s sheer astonishment that we can make it sound like them.”

With a recording of someone speaking, Resemble can create new audio in the same voice. The more audio the company has, the more natural the resulting voice sounds. But creating the synthetic voice doesn’t require a very long recording, Resemble AI’s other co-founder, Saqib Muhammad, told Voicebot.

“We can make it with very little data. It can sound very real with less than ten minutes recorded.”

While the output is not yet indistinguishable from natural speech, the founders said quality improves as more data is added. Each voice is broken down into a list of identifiers, a kind of vocal fingerprint, which can then be applied to words other than the ones in the original recording.

Ahmed and Muhammad said that the recordings can come from anywhere. Even very old audio, recorded when sound quality was much worse, can be used. The synthesized speech can even be made to sound as if it comes from the same time period as the original. With enough audio, Resemble can also create entirely new voices by combining different elements from its voice database.

Vocal Talent

Resemble has already started attracting investors, including AET and Craft Ventures, and will soon close a seed round for an undisclosed amount. The company was also chosen for the Betaworks Ventures Synthetic Camp accelerator program in New York this year, which came with a $200,000 investment.

The co-founders see major enterprise clients as the main users of their platform, for applications such as voice acting in movies, television, and video games.

“We think gaming is the gold standard of the technology. It needs high-quality synthetic speech and high-quality emotions,” Ahmed said. “But, it’s a shift in paradigm for [all] voice acting. It’s like how with advances in visual technology, actors do less work.”

Synthetic Symphony

Of course, Resemble isn’t the only company working in this space. VocaliD began as a voice prosthesis developer but now offers synthetic voices for call centers and voice apps, while Baidu has been investigating what it calls voice cloning for some time. Resemble has its own ways of standing out, however.

“They can use [the platform] with very little technical knowledge. That allowed us to get a lot of traction,” Ahmed said. “But, one of the things that make us very unique is that we are able to control the emotion. We have an engine of tunable emotions. It’s not just buckets of happy or sad, because there’s not just one way of being happy or one way of being sad. You can be happy and excited, or sad and nervous, or so many other combinations.”

Resemble AI is still a young startup, but it plans to expand quickly once the seed round is closed. Ahmed and Muhammad aim to reach ten team members by the end of the year. With the basic platform in place, the next steps are to refine and adjust it for what they hope will be a long list of interested clients.

“We can capture the essence of how someone speaks,” Ahmed said. “And it can be shared in brand new ways.”

