Segment Anything Model: Promptable image segmentation
Transformer-based promptable segmentation with zero-shot generalization.
Segment Anything Model (SAM) is a vision foundation model for promptable segmentation. It uses a transformer-based architecture with three components: an image encoder that embeds the input image once, a prompt encoder, and a lightweight mask decoder that combines the two to produce segmentation masks. Trained on 11 million images with 1.1 billion masks, it generalizes zero-shot to diverse segmentation tasks and visual domains. It accepts multiple prompt types, including points, boxes, and masks, and can run either interactively with user prompts or automatically to segment all objects in an image.
Large-scale training dataset
Trained on 11 million images and 1.1 billion masks, providing broad coverage across visual domains and enabling zero-shot generalization to new segmentation tasks without task-specific fine-tuning.
Flexible prompt interface
Accepts multiple prompt types, including points, bounding boxes, and masks, supporting both interactive segmentation guided by user input and fully automatic mask generation for entire images; a sketch of the automatic mode follows the interactive example below.
ONNX export capability
The lightweight mask decoder can be exported to ONNX format for deployment across different runtime environments, including browser-based applications, without requiring the full model infrastructure; an export sketch appears after the code examples below.
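The snippet below shows the interactive path: load an image, run the image encoder once with set_image, then request masks from a single foreground point.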
from segment_anything import SamPredictor, sam_model_registry
import numpy as np
import cv2

# Load the image and convert from OpenCV's BGR ordering to the RGB input SAM expects.
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Build the ViT-H variant from a downloaded checkpoint and wrap it in a predictor.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
predictor = SamPredictor(sam)

# Run the heavy image encoder once; subsequent prompts reuse the cached embedding.
predictor.set_image(image)

# One foreground point (label 1) at pixel coordinates (x=500, y=375).
input_point = np.array([[500, 375]])
input_label = np.array([1])

# Returns candidate masks (three by default) as (N, H, W) booleans, per-mask
# quality scores, and low-resolution logits that can seed a follow-up prompt.
masks, scores, logits = predictor.predict(point_coords=input_point, point_labels=input_label)
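The fully automatic mode needs no prompts at all. Below is a minimal sketch using the package's SamAutomaticMaskGenerator with its default sampling and filtering settings, reusing the sam model and image from the example above.

from segment_anything import SamAutomaticMaskGenerator

# Samples a grid of point prompts across the image and keeps the masks that
# pass the generator's quality and stability filters.
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)

# Each entry is a dict holding the binary mask ("segmentation") plus metadata
# such as "area", "bbox", and "predicted_iou".
print(len(masks), masks[0]["segmentation"].shape, masks[0]["predicted_iou"])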
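For deployment, the decoder can be exported on its own. The sketch below is modeled on the repository's scripts/export_onnx_model.py; the output filename sam_decoder.onnx and the dummy input shapes are illustrative choices, not fixed by the API.

import torch
from segment_anything import sam_model_registry
from segment_anything.utils.onnx import SamOnnxModel

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
# Wraps the prompt encoder and mask decoder in an ONNX-exportable module.
onnx_model = SamOnnxModel(sam, return_single_mask=True)

embed_dim = sam.prompt_encoder.embed_dim
embed_size = sam.prompt_encoder.image_embedding_size
mask_input_size = [4 * x for x in embed_size]
# Dummy inputs pin down the tensor shapes the exporter traces through the model.
dummy_inputs = {
    "image_embeddings": torch.randn(1, embed_dim, *embed_size, dtype=torch.float),
    "point_coords": torch.randint(low=0, high=1024, size=(1, 5, 2), dtype=torch.float),
    "point_labels": torch.randint(low=0, high=4, size=(1, 5), dtype=torch.float),
    "mask_input": torch.randn(1, 1, *mask_input_size, dtype=torch.float),
    "has_mask_input": torch.tensor([1], dtype=torch.float),
    "orig_im_size": torch.tensor([1500, 2250], dtype=torch.float),
}
with open("sam_decoder.onnx", "wb") as f:
    torch.onnx.export(
        onnx_model,
        tuple(dummy_inputs.values()),
        f,
        input_names=list(dummy_inputs.keys()),
        output_names=["masks", "iou_predictions", "low_res_masks"],
        # The number of prompt points can vary at runtime.
        dynamic_axes={"point_coords": {1: "num_points"}, "point_labels": {1: "num_points"}},
    )

At inference time the image embedding still comes from the PyTorch image encoder (via SamPredictor.get_image_embedding()); only the prompt encoder and mask decoder run under the ONNX runtime or in the browser.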
Related Repositories
Discover similar tools and frameworks used by developers
DeepSpeed
PyTorch library for training billion-parameter models efficiently.
evo2
Foundation model for DNA sequence generation and scoring.
AutoGPT
Block-based visual editor for autonomous AI agents.
gemini-cli
Command-line interface for Google's Gemini models, supporting text, image, and multimodal interactions from the terminal.
InvokeAI
Node-based workflow interface for local Stable Diffusion deployment.