Resemble AI Can Now Clone Your Voice to Speak New Languages
Synthetic voice clones from speech technology startup Resemble AI can now be taught to speak a half-dozen languages. The new Localize function translates the digital voice to speak in other languages, much as it replicates the voice to speak new sentences in the original speaker’s native tongue.
Resemble AI developed a platform for generating synthetic speech profiles with minimal data, theoretically turning even five minutes of a person’s voice into an AI capable of saying whatever the user wants as though it were a new recording. Usually, a voice assistant or similar platform may be able to translate what someone says into a different language but will still use the standard artificial voice. Localize layers the translation over the artificial voice generator so that you might say a sentence in English, then hear your own voice say it in French a moment later.
“We have a lot of customers who are multinational and extremely interested in translating languages,” Resemble CEO Zohaib Ahmed said in an interview with Voicebot. “We started building different voices in different languages, but we were getting challenged with words in English appearing in other languages, like Burger King in a sentence in Spanish. Burger King is English; you don’t want it to be in a Spanish accent. So we started building out this functionality.”
Ahmed explained that voice talent translation usually takes a few months and costs a lot of money. For those who want translations but lack the time, the budget, or both, the Localize feature might be an attractive offer. Right now, Resemble can translate among English, French, German, Dutch, Italian, and Spanish. Ahmed said other languages are in the works, with Japanese, Korean, and Mandarin next in line. The translation works as an extension of the existing platform, which simplifies matters, even though it takes more audio data than building a synthetic voice in one language.
“It’s baked into the platform, but the process is slightly different,” Ahmed said. “It’s not the five-minute promise. It takes a little more than an hour of recording.”
Translating artificial voices is of interest to the same industries that Resemble has already targeted with its platform. The entertainment industry, in particular, should be interested, Ahmed said. Adding local languages means movies or games could be dubbed with the original actor’s voice, keeping the voice’s flavor without the actor needing to learn a new language. The platform’s ability to tune the emotions of the voice comes into play there too. Since the platform doesn’t need much data or technical knowledge to operate, the potential customer base is extensive, Ahmed explained. The more powerful translation features offered by Amazon, Google, and other tech giants, meanwhile, are still geared more toward text, such as translating websites or converting audio into translated text on a display.
“The localization of speech is extremely nascent, but text localization is very mature. We’re trying to follow that path,” Ahmed said. “The goal is still to minimize the amount of data needed and make it easy for the developer to control.”