Mask2Former: Transformer-based universal image segmentation
Unified transformer architecture for multi-task image segmentation.
Mask2Former (Masked-attention Mask Transformer) is a computer vision model that performs image segmentation with a transformer-based architecture built around masked attention. A backbone encoder extracts image features, and a transformer decoder refines a set of queries using cross-attention restricted to the region covered by each query's predicted mask. It handles three segmentation tasks (panoptic, instance, and semantic) with a single architecture rather than task-specific variants. The codebase supports training and inference on major segmentation benchmarks including ADE20K, Cityscapes, COCO, and Mapillary Vistas, with additional support for video instance segmentation.
Unified multi-task architecture
A single model handles panoptic, instance, and semantic segmentation without task-specific modifications. This contrasts with prior approaches that typically required separate models or significant architectural changes per task.
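To make the "single model" idea concrete, the sketch below assumes the model emits, for each query, a class distribution and a per-pixel mask probability, and shows how the same outputs could be read out as either a semantic map or a list of instances. The function names, the flat pixel layout, and the 0.5 threshold are illustrative assumptions, and the readout is simplified relative to the repository's actual post-processing.

```python
def semantic_map(class_probs, mask_probs):
    """Read per-query outputs as a semantic segmentation.

    class_probs: Q lists of C class probabilities (no-object entry already dropped).
    mask_probs:  Q lists of P per-pixel mask probabilities (image flattened to P pixels).
    Returns one class index per pixel.
    """
    num_classes = len(class_probs[0])
    num_pixels = len(mask_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over queries of P(class c) * P(pixel in mask).
        scores = [
            sum(cp[c] * mp[p] for cp, mp in zip(class_probs, mask_probs))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


def instance_list(class_probs, mask_probs, threshold=0.5):
    """Read the same outputs as instances: one (label, score, binary mask) per query."""
    instances = []
    for cp, mp in zip(class_probs, mask_probs):
        label = max(range(len(cp)), key=cp.__getitem__)
        binary_mask = [m >= threshold for m in mp]
        instances.append((label, cp[label], binary_mask))
    return instances


# Two queries, two classes, three pixels (toy values).
class_probs = [[0.9, 0.1], [0.2, 0.8]]
mask_probs = [[0.9, 0.8, 0.1], [0.1, 0.3, 0.9]]
semantic = semantic_map(class_probs, mask_probs)
instances = instance_list(class_probs, mask_probs)
```

Only the readout step differs between the two tasks here; the per-query predictions are shared.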
Masked attention mechanism
The transformer decoder restricts each query's cross-attention to the foreground region of the mask predicted for that query by the previous decoder layer, instead of attending over the whole feature map. Focusing attention on localized features speeds up convergence during training and improves segmentation quality.
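One way to picture the constraint is as cross-attention in which locations outside a query's mask are excluded before the softmax. The sketch below is a minimal pure-Python illustration for a single query, assuming the mask is that query's mask prediction from the previous layer binarized at 0.5, and assuming a fallback to full attention when the mask is empty. The function name `masked_attention` and its parameters are hypothetical and are not the repository's API.

```python
import math


def masked_attention(query, keys, values, prev_mask_probs, threshold=0.5):
    """Cross-attention for one query, restricted to the foreground of its mask.

    query:           list of d floats.
    keys, values:    N lists of floats, one entry per image location.
    prev_mask_probs: N floats in [0, 1], the query's previous mask prediction.
    Returns (output vector, attention weights).
    """
    allowed = [p >= threshold for p in prev_mask_probs]
    # Assumed fallback: an empty mask leaves nothing to attend to,
    # so attend to every location instead.
    if not any(allowed):
        allowed = [True] * len(keys)

    scale = 1.0 / math.sqrt(len(query))
    # Masked-out locations get a score of -inf, i.e. zero weight after softmax.
    scores = [
        scale * sum(q * k for q, k in zip(query, key)) if ok else -math.inf
        for key, ok in zip(keys, allowed)
    ]

    # Numerically stable softmax over the locations.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


# Three locations; the third lies outside the previous mask and is ignored.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
output, weights = masked_attention([1.0, 0.0], keys, values, [0.9, 0.8, 0.1])
```

In this toy call the third location receives exactly zero weight, so its large value vector does not influence the output.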
Multi-dataset support
The framework includes configurations for several major segmentation datasets and benchmarks, with pre-trained models available in the Model Zoo. Video instance segmentation is also supported and is described in a separate technical report.