Meta’s New Segment Anything Model Can Identify and Remove Objects in Images and Videos
Meta has introduced a new AI model capable of picking out and extracting individual objects within an image or video. The Segment Anything Model (SAM) can divide an image or video into its component objects, even objects it was not previously trained to recognize. Meta also released what it calls the largest dataset of image annotations ever made to accompany the model.
SAM can carve up an image into separate objects and lift each one individually from the whole. Users can even search for specific objects by text, meaning finding Waldo could be as simple as typing “red and white striped shirt” into the tool. The “segmenting” AI essentially determines which pixels belong together as a single object and tags them accordingly. SAM resembles a more robust version of the technology Meta uses to identify individuals, flag forbidden content, and gauge what content might interest Facebook and Instagram users. However, Meta envisions a much more ambitious set of potential uses for segmenting AI, especially since the model comes with an enormous image dataset and the ability to expand upon it. SAM could process a webpage full of images, analyzing their separate elements as well as what they mean as a whole.
“Reducing the need for task-specific modeling expertise, training compute, and custom data annotation for image segmentation is at the core of the Segment Anything project. To realize this vision, our goal was to build a foundation model for image segmentation: a promptable model that is trained on diverse data and that can adapt to specific tasks, analogous to how prompting is used in natural language processing models,” Meta explained in a blog post about SAM. “In the AR/VR domain, SAM could enable selecting an object based on a user’s gaze and then ‘lifting’ it into 3D. For content creators, SAM can improve creative applications such as extracting image regions for collages or video editing. SAM could also be used to aid scientific study of natural occurrences on Earth or even in space, for example, by localizing animals or objects to study and track in video. We believe the possibilities are broad, and we are excited by the many potential use cases we haven’t even imagined yet.”
Voicebot founder Bret Kinsella tried out a demo of SAM, but there isn’t an official product yet. In that respect, SAM is much like Make-A-Video, the generative AI text-to-video tool Meta demonstrated last year. That said, the public demo and dataset suggest Meta wants developers to start building with the technology.
“Making it more accessible helps everyone understand better what Meta is trying to accomplish. That may influence researchers to try it out today and start looking at the model instead of waiting until a more convenient time,” Kinsella pointed out in the Synthedia newsletter. “It also will drive better understanding among other observers of AI technologies, which is very broad today, given the popularity of generative AI solutions. That, in turn, will help promote the story and awareness of Meta’s activity in the field.”