OpenAI Releases New Text-to-3D Model Shap-E

OpenAI has shared Shap-E, its latest exploration in turning text prompts into three-dimensional objects. The generative AI tool offers a new way of producing 3D objects, with results that are more detailed and accurate than those of the Point-E model released last year.

Shap-E 3D

OpenAI built Shap-E as a text-to-3D generator capable of making fine-grained textures and complex, detailed shapes. While Point-E produces 3D point clouds based on text prompts, Shap-E directly generates the parameters of implicit functions that can be rendered as textured meshes or as neural radiance fields (NeRFs), which helps it overcome the fuzziness of the earlier model. NeRFs are the same technology used in virtual and augmented reality to make a three-dimensional scene look like a photorealistic environment. Shap-E pairs that technique with the more common diffusion models to produce the shape and texture of the object suggested by the text prompt. The process is also significantly faster than Point-E. Each Shap-E sample in the collection at the top took about 13 seconds to generate on a single NVIDIA V100 GPU, whereas Point-E could take as much as two minutes to render a sample on the same hardware.
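To put the timing figures in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the two numbers quoted above (about 13 seconds for Shap-E and up to two minutes for Point-E); the function names are hypothetical, and because the Point-E figure is an upper bound, the resulting ratio is a best case rather than a measured benchmark.

```python
# Figures quoted in the article. The Point-E value is an upper bound
# ("as much as two minutes"), so the ratio below is a best case.
SHAP_E_SECONDS = 13
POINT_E_SECONDS_MAX = 2 * 60


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Return how many times faster the fast process is than the slow one."""
    if fast_seconds <= 0:
        raise ValueError("fast_seconds must be positive")
    return slow_seconds / fast_seconds


def samples_per_hour(seconds_per_sample: float) -> float:
    """Return how many samples fit in one hour of sequential generation."""
    if seconds_per_sample <= 0:
        raise ValueError("seconds_per_sample must be positive")
    return 3600 / seconds_per_sample


if __name__ == "__main__":
    ratio = speedup(POINT_E_SECONDS_MAX, SHAP_E_SECONDS)
    print(f"Shap-E is up to about {ratio:.1f}x faster per sample")
    print(f"Shap-E samples per hour: about {samples_per_hour(SHAP_E_SECONDS):.0f}")
    print(f"Point-E samples per hour: at least {samples_per_hour(POINT_E_SECONDS_MAX):.0f}")
```

On these numbers, a single GPU that produces roughly 30 Point-E samples in an hour could produce several hundred Shap-E samples in the same time.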

“We find that Shap·E matches or outperforms a similar explicit generative model given the same dataset, model architecture, and training compute,” the researchers explained. “We also find that our pure text-conditional models can generate diverse, interesting objects without relying on images as an intermediate representation. These results highlight the potential of generating implicit representations, especially in domains like 3D where they can offer more flexibility than explicit representations.”
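The distinction the researchers draw between explicit and implicit representations can be illustrated with a toy example. The Python sketch below is not Shap-E's code or its actual representation: it uses a hand-written signed distance function for a unit sphere as a stand-in for a learned implicit function, and all names are hypothetical. An explicit point cloud is a fixed list of coordinates, while an implicit function can be queried at any coordinate, which is where the extra flexibility comes from.

```python
import math


def sphere_point_cloud(n: int) -> list[tuple[float, float, float]]:
    """Explicit representation: n fixed points spread over a unit sphere.

    Uses a Fibonacci lattice so the points are roughly evenly spaced.
    """
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n      # height, strictly between -1 and 1
        r = math.sqrt(1 - z * z)       # radius of the circle at that height
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def sphere_sdf(x: float, y: float, z: float, radius: float = 1.0) -> float:
    """Implicit representation: signed distance to the sphere's surface.

    Negative inside, zero on the surface, positive outside. A learned
    implicit function would replace this formula with a neural network.
    """
    return math.sqrt(x * x + y * y + z * z) - radius


def is_inside(x: float, y: float, z: float) -> bool:
    """Query the implicit function at an arbitrary coordinate."""
    return sphere_sdf(x, y, z) < 0


if __name__ == "__main__":
    cloud = sphere_point_cloud(1000)
    print(f"Explicit: {len(cloud)} stored points, detail is fixed at creation")
    print(f"Implicit: inside at origin? {is_inside(0.0, 0.0, 0.0)}")
    print(f"Implicit: inside at (2, 0, 0)? {is_inside(2.0, 0.0, 0.0)}")
```

The point cloud carries only as much detail as the number of points stored, whereas the implicit function can be sampled as densely as a renderer needs.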

Shap-E’s developers acknowledge that the computational power needed for large-scale use might be rather high compared to the point cloud approach of Point-E. The AI also still struggles to understand how to make some complex objects, but the overall results are notable for their successes.

