Segment Anything 2
A story of making objects from pixels
Primitive Vision
Segment Anything 2 (SAM 2) is an AI model that enables computers to see in a way closer to how humans do: it lets objects in a video be selected and separated from their surroundings. We may take for granted that a horse and the sky behind it are not the same thing, but to a computer they are just pixels of different colors. SAM 2 changes this, letting almost any object be precisely selected and tracked throughout a video, in real time.
So What?
When we began work on SAM 2, we did not know what about it would resonate most with people beyond the scientific achievement. I was responsible for researching what the model should prioritize making possible, and what story we would tell. This took shape through studying the roles video plays in people's lives today, and imagining which experiences would both capture imaginations and demonstrate what was new about SAM 2. I identified a need for simple tools, for both personal expression and data annotation, that lower the barriers posed by the complexity of video editing.
Competing Priorities
The core challenge of designing an experience around SAM 2 lay in the tension between showcasing novel research and building a relatable tool. Concretely, I had to study closely how people conceive of objects: when looking at the fish above, should a click on a white stripe select the stripe, the fish, or something else? In some cases an object part is preferred; in others, the whole object. The ability to select object parts was a strong technical achievement, but a seemingly less frequent need. We designed interactions to balance these competing priorities, highlighting both the innovation and the utility of SAM 2.
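One way to resolve this ambiguity is for the model to propose several candidate masks for a single click, at different granularities, each with a confidence score, and for the interface to default to the most confident one while keeping the alternatives a tap away. The sketch below illustrates that idea in plain Python; the class names, scores, and ranking logic are hypothetical illustrations, not SAM 2's actual API.

```python
# Hypothetical sketch: one click yields several candidate masks at
# different granularities (a stripe vs. the whole fish), each scored.
# Names and numbers here are illustrative, not SAM 2's real interface.
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMask:
    label: str          # human-readable granularity, e.g. "stripe"
    pixels: frozenset   # (row, col) pixels covered by the mask
    score: float        # model confidence that this is "an object"


def rank_candidates(candidates):
    """Order candidates by confidence so a UI can show the best guess
    first while leaving the other granularities one tap away."""
    return sorted(candidates, key=lambda m: m.score, reverse=True)


# A click on a white stripe produces both a part and a whole-object mask.
stripe = CandidateMask("stripe", frozenset({(3, 4), (3, 5)}), score=0.71)
fish = CandidateMask(
    "whole fish",
    frozenset((r, c) for r in range(2, 6) for c in range(1, 9)),
    score=0.94,
)

ranked = rank_candidates([stripe, fish])
best = ranked[0]  # default selection shown to the user
```

Defaulting to the highest-scoring candidate favors the common case (whole objects) without hiding the rarer but technically impressive part-selection behavior.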
Signs of Success
Model releases at FAIR largely realize success through open-sourcing: new models need both to advance science and to produce value for people. For SAM 2, as with many other projects, this required not only testing demo concepts but also advocating for better methods of model interaction, improved data curation, and human-aligned metrics with which to evaluate success. Painting a picture of how new technology can positively fit into the world is one of my favorite parts of working on frontier AI research.