Synthetic Video Producer D-ID Unites Stable Diffusion and GPT-3 on Multimodal Generative AI Platform
Synthetic media startup D-ID has combined generative AI models for text and images to produce videos through its Creative Reality Studio. D-ID’s platform offers Stable Diffusion’s text-to-image generator alongside an AI script composer built with OpenAI’s GPT-3 model and D-ID’s proprietary tool that transforms the static image into an animated character reciting the provided script.
D-ID’s upgraded Creative Reality Studio offers users the chance to design their video avatar from an image produced by Stable Diffusion from a text prompt. The image of the avatar is then matched to a script written by the user or composed by OpenAI’s GPT-3 text generator, found within the platform, as seen in the video above. The user can decide the language, voice, and manner of speaking they want the synthetic speech to replicate. D-ID then brings the image to life and puts the spoken script into the avatar’s mouth using Amazon’s text-to-speech toolkit. The platform could already make videos from a single photo and text script, but generative AI is now more deeply embedded in the Creative Reality Studio experience.
“We are only beginning to see the potential of creative Generative AI technologies and they are set to take the world by storm,” D-ID CEO Gil Perry said. “We are incredibly proud to be at the bleeding edge of the emergent generative AI scene, introducing the first platform to offer image and text generation together with animation, providing unlimited creative possibilities and potential. We are very much looking forward to seeing our users’ results.”
The company first garnered attention for transforming old photographs into videos under the Deep Nostalgia brand name. Since then, D-ID synthetic videos have been used for a range of training, marketing, and other business ventures. The startup claims its tech has generated more than 107 million videos in total. Synthetic video makes it a lot cheaper to produce multiple variations of a video, including versions in more than one language, and the output is limited only by the images and text that a company generates.
Synthetic video is attracting a lot of interest, especially in tandem with other generative AI. Rephrase.ai recently closed a $10.6 million round to pursue its own take on the market, which counts several industry-specific startups among its rivals. For instance, virtual training startup Virti began specifically as a training tool for medical professionals but has since widened its scope to all kinds of businesses. Microsoft’s enterprise software arm is now investing heavily in simulated workspaces with AI assistance, and there is a growing number of synthetic media startups with applicable technology, including Synthesia, Neosapience, Hour One, and Papercup. D-ID’s push for a multimodal generative AI setup may soon be the standard for its competitors.