Stability AI Debuts New Text-to-Image Model DeepFloyd IF
Synthetic media startup Stability AI has introduced a new text-to-image generative AI model called DeepFloyd IF that doesn't use the Stable Diffusion model the company is best known for. The "cascaded pixel diffusion model" arrives on the heels of Stability's release of the open-source LLM StableLM, with an open-source version of DeepFloyd IF also in the works.
Rather than building on Stable Diffusion, DeepFloyd IF relies on Google's T5-XXL-1.1 language model to interpret text prompts. Drawing on that model gives DeepFloyd IF more features, and it often outperforms the standard version of Stability's better-known model. For instance, it can generate legible text in various forms and fonts, and it produces more photorealistic images than many current text-to-image engines. Users can also specify non-standard aspect ratios in the text prompt instead of always starting with a square. DeepFloyd IF is also designed for image-to-image manipulation. The model resizes the initial image, then deliberately adds noise before processing the new prompt, altering the style and completing the modification without repeated fine-tuning and tinkering.
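The image-to-image approach described above follows a general pattern in diffusion-based editing: resize the source image, push it partway toward pure noise, then let the model denoise it under the new prompt. The minimal sketch below illustrates only the first two steps in plain Python, assuming the standard DDPM forward-diffusion formula x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with a linear noise schedule. The function names, the schedule values (1,000 steps, betas from 1e-4 to 0.02) and the nearest-neighbour resizing are illustrative assumptions, not DeepFloyd IF's actual code, and the prompt-guided denoising stage is omitted.

```python
import math
import random


def resize_nearest(image, new_h, new_w):
    """Nearest-neighbour resize of a 2-D grayscale image stored as a list of rows."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta) over the first t steps.

    Assumes a DDPM-style linear beta schedule; these values are hypothetical
    and are not taken from DeepFloyd IF.
    """
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(image, t, rng):
    """Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps, eps ~ N(0, 1)."""
    a_bar = alpha_bar(t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [[signal * px + noise * rng.gauss(0.0, 1.0) for px in row] for row in image]


# Usage: upscale a toy 2x2 image to 4x4, then noise it halfway through the schedule.
# A real editing pipeline would now denoise `noisy` while conditioning on the new prompt.
rng = random.Random(0)
original = [[0.0, 1.0], [1.0, 0.0]]
resized = resize_nearest(original, 4, 4)
noisy = add_noise(resized, t=500, rng=rng)
```

Choosing a larger `t` keeps less of the original image, which gives the new prompt more room to change the style; a smaller `t` preserves more of the source composition.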
“DeepFloyd IF is a state-of-the-art text-to-image model released on a non-commercial, research-permissible license that provides an opportunity for research labs to examine and experiment with advanced text-to-image generation approaches,” Stability AI explained in its announcement. “Incorporating the intelligence of the T5 model, DeepFloyd IF generates coherent and clear text alongside objects of different properties appearing in various spatial relations. Until now, these use cases have been challenging for most text-to-image models.”
Although DeepFloyd IF's features are ahead of the current consumer version of Stable Diffusion, the model appears to be the seed of an open-source version of Stability AI's enterprise-focused Stable Diffusion XL (SDXL) model unveiled last month. SDXL likewise promises readable embedded text and a high degree of photorealism. Stability's expanding generative AI research is fueled in part by the $101 million it raised last year. Stability has also made acquisitions part of its strategy, starting with the company behind AI image manipulation service Clipdrop, and it worked with digital collectible platform Revel.xyz to release an image-to-animation tool called Animai.