Instance segmentation is a computer vision task that lets you find the exact location of objects in an image.
Whereas object detection models return a box that corresponds to the region in which an object appears in an image, instance segmentation models return a pixel-level "mask" that precisely encapsulates a specific object in an image.
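To make the distinction concrete, here is a minimal NumPy sketch, using a made-up mask, showing that a bounding box can always be derived from a mask but carries less information: the box covers pixels that do not belong to the object.

```python
import numpy as np

# Hypothetical 8x8 binary mask for a single object (True = object pixel).
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 3:7] = True  # a small rectangular object
mask[5, 2] = True      # plus one protruding pixel

# The tightest bounding box is fully determined by the mask...
ys, xs = np.nonzero(mask)
x_min, y_min, x_max, y_max = xs.min(), ys.min(), xs.max(), ys.max()

# ...but the reverse is not true: the box covers more pixels than the object.
box_area = (x_max - x_min + 1) * (y_max - y_min + 1)
mask_area = int(mask.sum())

print((x_min, y_min, x_max, y_max))  # (2, 2, 6, 5)
print(box_area, mask_area)           # 20 17
```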
Instance segmentation models are ideal for use cases where you need to know the precise location of an object in an image. For example, if you are analyzing the color of objects in an image, or comparing the length or width of an object to a reference, segmentation models are essential.
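For instance, measuring an object against a reference of known size reduces to pixel arithmetic on the masks. A minimal sketch with made-up masks and an assumed 20 mm reference object:

```python
import numpy as np

# Hypothetical binary masks from a segmentation model: a reference object
# of known real-world width, and the object we want to measure.
reference_mask = np.zeros((100, 100), dtype=bool)
reference_mask[10:20, 10:50] = True  # reference: 40 px wide
object_mask = np.zeros((100, 100), dtype=bool)
object_mask[40:60, 30:90] = True     # object: 60 px wide

def mask_width_px(mask: np.ndarray) -> int:
    """Width of a mask in pixels (extent along the x axis)."""
    xs = np.nonzero(mask)[1]
    return int(xs.max() - xs.min() + 1)

REFERENCE_WIDTH_MM = 20.0  # assumed known dimension of the reference
mm_per_px = REFERENCE_WIDTH_MM / mask_width_px(reference_mask)
object_width_mm = mask_width_px(object_mask) * mm_per_px

print(object_width_mm)  # 30.0
```

A box-only detector could not support this kind of measurement reliably, since a box's width includes background pixels around the object.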
There are a wide variety of instance segmentation models available. A popular choice is to use a YOLO model and fine-tune it for your use case. YOLO models such as YOLOv8 and YOLO11 are common choices because they are fast and, when fine-tuned with a high-quality dataset, accurate.
Fine-tuned models are well suited to edge deployment. In production, for example, you could run a YOLO model on an NVIDIA Jetson to monitor feeds from a camera positioned over an assembly line in real time.
Here is an example of a YOLO segmentation model that identifies debris in an image of coffee beans:
The purple highlighted regions correspond with stones that were found in the image, a defect that needs to be detected before coffee is packaged and sent to customers.
There is another category of instance segmentation models that are slower but can be used without any prior training: zero-shot models.
Zero-shot segmentation models let you calculate segmentation masks around objects without training a model for your use case. An example of such a model is Meta's Segment Anything model series. With that said, zero-shot models often return only masks, not the class associated with an object. Thus, zero-shot models are commonly used as annotation assistants, where the model draws a mask given a point or a bounding box prompt.
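SAM-style models typically return several candidate masks for a single prompt, each with a confidence score, and the annotation tool then picks one. A sketch of that selection step, using made-up candidate masks rather than real model output:

```python
import numpy as np

# Hypothetical output of a zero-shot model for one click prompt:
# candidate masks at different granularities, each with a confidence score.
h, w = 64, 64
whole_object = np.zeros((h, w), dtype=bool)
whole_object[10:50, 10:50] = True  # the full object
sub_part = np.zeros((h, w), dtype=bool)
sub_part[10:30, 10:30] = True      # just one part of the object
candidates = [(whole_object, 0.91), (sub_part, 0.88)]

click = (20, 20)  # (row, col) where the annotator clicked

def pick_mask(candidates, click):
    """Keep candidates that contain the click, then take the highest score."""
    containing = [(m, s) for m, s in candidates if m[click]]
    return max(containing, key=lambda pair: pair[1])[0]

chosen = pick_mask(candidates, click)
print(int(chosen.sum()))  # 1600 -> the whole-object mask was chosen
```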
Here is an example showing Meta's latest SAM-2.1 model segmenting regions in an image:
All of the segmentation masks above, indicated by the different colors with which objects are filled in the images, were calculated by SAM-2.1 without specific training on those images.
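Rendering overlays like these is straightforward: alpha-blend a fill color into each mask's pixels. A minimal sketch with a made-up image and masks:

```python
import numpy as np

# Hypothetical RGB image and two instance masks.
image = np.full((4, 4, 3), 200, dtype=np.uint8)  # light-gray image
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[:2, :2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[2:, 2:] = True

def overlay(image, masks, colors, alpha=0.5):
    """Alpha-blend a fill color over each mask's pixels."""
    out = image.astype(np.float64)
    for mask, color in zip(masks, colors):
        out[mask] = (1 - alpha) * out[mask] + alpha * np.array(color)
    return out.astype(np.uint8)

result = overlay(image, [mask_a, mask_b], [(255, 0, 0), (0, 0, 255)])
print(result[0, 0])  # blended toward red:  [227 100 100]
print(result[3, 3])  # blended toward blue: [100 100 227]
```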
You may be wondering: how practical is it to run image segmentation models in production? How do you start deploying a model? Great questions!
Image segmentation models can run at several frames per second on modern hardware. For image segmentation model deployment on the edge, NVIDIA Jetsons and NVIDIA GPUs are common choices. You can also deploy your model on a Raspberry Pi, in the browser, and more.
For deploying a model to the edge, you can use Roboflow Inference, an open source computer vision deployment solution. With Inference, you can deploy your model to a wide range of devices without having to worry about dependency management and environment setup.
Inference lets you run your model through a Python SDK. You can also use the Python SDK with a Docker Inference container, ideal for projects where you want to containerize your model deployments.
Use the search engine below to explore image segmentation models across a range of use cases. This search engine is powered by Roboflow Universe, a community of over 200,000 computer vision datasets and 50,000 trained models.
Evaluate the ripeness of coffee beans on a tree.
Find cracks in concrete.
Find the rims of pipes.
Below, we have curated a list of resources you can use to explore more image segmentation model architectures and get started in training your first model.
Explore model architectures from YOLOv8 to YOLO11 to SAM that are commonly used for segmentation.
Learn how to train a segmentation model on your own data.
supervision provides a range of utilities for working with predictions from segmentation models.