New Neosapience Tool Synthesizes Any Text into Emotion for Virtual Actor Speeches – Exclusive
Synthetic media startup Neosapience has introduced a new AI-powered tool allowing users to write out the emotion they want virtual actors to use when speaking. AI-powered video and audio built with the startup’s Typecast platform won’t be limited to a fixed list of emotional options; users can write more complex and subtle feelings into the text box. The emotion can even be contradictory, like “angry but a little happy,” or reference a specific kind of event, such as “sad, like his friend just passed away.” Typecast’s AI will use natural language processing to attempt to imbue the speech with the requested emotion.
Typecast Emotional Speech
Neosapience’s Typecast platform supports the creation of synthetic audio and video based on models of a couple of hundred actors, newscasters, and other professionals. As with other synthetic media services, Typecast’s virtual beings mimic humans in speech and movement using uploaded scripts. The South Korean startup has offered around a dozen standard emotions as options for how the AI speaks. The new emotional style control opens up far more possibilities by letting users write emotions in parentheses directly into the script the AI will speak. The AI adjusts its speed, tone, emphasis, and other elements to match the description, drawing on a growing database that helps define the subtle distinctions between emotional terms.
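Neosapience hasn’t published the exact syntax Typecast uses, but based on the description above (free-form emotions written in parentheses inside the script), a hypothetical parser for such a script might look like this. The function name, data shape, and annotation format here are all assumptions for illustration, not Typecast’s actual implementation.

```python
import re

# Hypothetical sketch: splits each script line into an optional
# parenthetical emotion annotation and the text the virtual actor
# actually speaks, per the behavior described in the article.
EMOTION_PATTERN = re.compile(r"^\((?P<emotion>[^)]+)\)\s*(?P<text>.*)$")

def parse_script(script: str) -> list[dict]:
    """Return one {'emotion', 'text'} dict per non-empty script line."""
    entries = []
    for raw in script.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        match = EMOTION_PATTERN.match(raw)
        if match:
            entries.append({"emotion": match["emotion"], "text": match["text"]})
        else:
            # Lines without an annotation fall back to a neutral delivery.
            entries.append({"emotion": None, "text": raw})
    return entries

script = """
(angry but a little happy) You finally showed up.
(sad, like his friend just passed away) I heard the news this morning.
Plain narration with no annotation.
"""

for entry in parse_script(script):
    print(entry)
```

The emotion strings pass through unaltered, which reflects the key point of the feature: the description is free-form natural language for a model to interpret, not a keyword matched against a fixed list.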
“We previously only offered tone setting. Now, it’s free-form emotional description, very similar to how a real human actor would act when receiving emotional descriptions [from a director],” Neosapience CEO Taesu Kim told Voicebot in an exclusive interview. “We’ve developed emotional speech synthesis over the last four years, with different types of emotional controls and rhythms. Now, [users] can basically control and adjust the prosody of each syllable.”
The text-based emotional style control is available in English and Korean, though the company plans to extend it to the other languages in development for the platform. There’s a lot more for the AI to learn about emotional language just within English and Korean, Kim said, but the models are continually improving and broadening their understanding of how emotions and language interconnect.
“[The AI] now learns the relationship between natural language and emotion style in speech data,” Kim explained. “Our [AI] can learn emotional language [terms], then generate speech from the emotional description.”
The explosion of interest in synthetic media is drawing plenty of attention to companies that can offer ever-more realistic mimics of human speech and movement. That includes virtual clones like those built by Hour One, video dubbing and translation from Papercup, and D-ID, which now synthesizes videos from still pictures for corporate training. Neosapience has seen its own success in the space as well. The company nabbed $21.5 million in investment in February, while the top Korean YouTube shorts channel exclusively uses Typecast for dubbing. Neosapience has also picked up significant business from major book publishers looking to use AI to read audiobooks.