Meta Jumps into Synthetic Media With Text-to-Video AI Generator ‘Make-A-Video’
Meta has introduced a new AI-powered synthetic media engine that can turn text prompts into short videos. The new Make-A-Video tool works like Midjourney or OpenAI's DALL·E, except that instead of a still image, Meta's platform generates silent video clips up to five seconds long.
Synthetic Video AI
You can see a few examples of Make-A-Video at the top. For context, the prompts that produced those clips were "A dog wearing a superhero cape flying through the sky," "A spaceship landing on Mars," "An artist's brush painting on a canvas close up, highly detailed," and "A horse drinking water." The resulting clips clearly match their prompts, despite some blurring and distortion.
“Make-A-Video builds on Meta AI’s recent progress in generative technology research and has the potential to open new opportunities for creators and artists. The system learns what the world looks like from paired text-image data and how the world moves from video footage with no associated text,” Meta explained in a blog post. “Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content. With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors, characters, and landscapes. The system can also create videos from images or take existing videos and create new ones that are similar.”
The videos do well at matching some wildly different prompts. Meta hasn't yet let anyone outside the company experiment with Make-A-Video, so in theory it could be showing off only its very best results. Still, Meta has said it will start sharing its research with developers at some point soon, which would quickly dispel that kind of suspicion. And the researchers aren't shy about admitting to the system's technical limits, or to its tendency to inadvertently generate problematic videos, a consequence of training data pulled from hundreds of thousands of hours of footage, much of it scraped from the web. That training approach also exposes how the AI can't make the logical leaps a human makes instinctively when watching a video.
“This is pretty amazing progress. It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time,” Meta CEO Mark Zuckerberg wrote in a Facebook post about the examples above. “Make-A-Video solves this by adding a layer of unsupervised learning that enables the system to understand motion in the physical world and apply it to traditional text-to-image generation.”