Google Debuts Imagen Video AI Text-to-Video Generator Rival for Meta’s Make-A-Video
Google has revealed a text-to-video AI engine called Imagen Video in a research paper. The synthetic media tool, which translates written prompts into silent videos, arrives just days after Meta showcased its own text-to-video AI, Make-A-Video.
Imagen’s videos, like the example above, use AI to interpret the words of a prompt for both content and style. The AI can pattern the video after famous artists, create three-dimensional objects, and render text in different animation formats. The multi-step process uses seven diffusion models to first produce a low-resolution video draft before upgrading the resolution repeatedly. The AI was trained on the public LAION-400M image-text dataset, along with 60 million pairings of images with descriptive text and 14 million pairings of videos with text.
Alongside Imagen Video, Google researchers published information on a different text-to-video model named Phenaki. In comparison to Imagen Video, Phenaki’s focus is on creating longer videos that follow the instructions of longer, more detailed prompts. For instance, the prompt for the Imagen video at the top was “Teddy bear skating in Times Square,” while the Phenaki video below used the prompt series, “A teddy bear diving in the ocean, A teddy bear emerges from the water, A teddy bear walks on the beach, Camera zooms out to the teddy bear in the campfire by the beach.”
“This is a dream solution for social media creators. The core content for many creators is verbal-first, while the channels they are distributed on are visual-first. And, all of the platforms are all biasing towards video because that is what drives views, view time, and engagement,” Voicebot founder Bret Kinsella explained in his Synthedia newsletter. “Given that these creators are often looking to spice up the visuals in their streams, the text-to-video generators will be a way to do that without incurring royalty expenses, without a significant time investment, and with a custom message depicted in the visuals that aligns with the segment.”
Despite the evident success of Imagen Video at producing interesting videos from text prompts, Google won’t make it available for the public to use just yet out of concerns for how the technology might be employed. Like Meta’s Make-A-Video, Imagen Video will remain in the hands of its researchers for now.
“The potential risks of misuse raise concerns regarding responsible open-sourcing of code and demos,” Google’s researchers explained. “At this time we have decided not to release code or a public demo. In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access.”