
Mask2Former: Transformer-based universal image segmentation

Unified transformer architecture for multi-task image segmentation.

Stars: 3.2K · Forks: 490 · 7-day stars: +4 · 7-day forks: 0

Learn more about Mask2Former

Mask2Former is a computer vision model that performs image segmentation using a transformer-based architecture with masked attention. An image passes through a backbone encoder, and a transformer decoder then applies attention constrained by predicted masks to generate the segmentation outputs. A single architecture handles three segmentation tasks (panoptic, instance, and semantic) instead of relying on task-specific variants. The codebase supports training and inference on major segmentation benchmarks including ADE20K, Cityscapes, COCO, and Mapillary Vistas, with additional support for video instance segmentation.


1. Unified multi-task architecture

A single model handles panoptic, instance, and semantic segmentation without task-specific modifications. This contrasts with prior approaches that typically required separate models or significant architectural changes per task.
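One way to see why a single model can serve several tasks: the network emits a set of queries, each with a class distribution and a mask, and the task only changes how those outputs are combined. The sketch below shows the semantic-segmentation case in plain Python, assuming the mask-classification formulation in which the last class slot means "no object". The function names and the flat pixel layout are illustrative assumptions, not the repository's API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def semantic_from_queries(class_logits, mask_logits):
    """Combine per-query outputs into one class label per pixel.

    class_logits: [num_queries][num_classes + 1]; last slot is "no object" (assumption).
    mask_logits:  [num_queries][num_pixels]; pixels are flattened for simplicity.
    """
    num_classes = len(class_logits[0]) - 1
    num_pixels = len(mask_logits[0])
    scores = [[0.0] * num_pixels for _ in range(num_classes)]

    for cls_row, mask_row in zip(class_logits, mask_logits):
        # Drop the "no object" probability, keep the real classes.
        probs = softmax(cls_row)[:num_classes]
        for c, p in enumerate(probs):
            for i, m in enumerate(mask_row):
                # Each query votes for class c at pixel i, weighted by its mask.
                scores[c][i] += p * sigmoid(m)

    # Pick the highest-scoring class at every pixel.
    return [max(range(num_classes), key=lambda c: scores[c][i])
            for i in range(num_pixels)]
```

Instance and panoptic outputs would start from the same per-query classes and masks but keep the queries separate instead of summing them per class.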

2. Masked attention mechanism

The transformer decoder constrains cross-attention to the foreground region of each predicted mask, so a query attends to localized features instead of the full image. The authors report that this leads to faster convergence and better segmentation quality than standard cross-attention.
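The masking idea can be illustrated for a single query with a minimal, pure-Python sketch. It assumes the mask from the previous decoder layer is binarized at 0.5, that masked-out pixels get a score of negative infinity before the softmax, and that an empty mask falls back to attending everywhere. The function name, the scaling by the square root of the dimension, and the flat pixel layout are illustrative assumptions and not the repository's implementation.

```python
import math


def masked_attention(query, keys, values, prev_mask_probs, threshold=0.5):
    """Attention for one query, restricted to the foreground of a predicted mask.

    query:           list of d floats.
    keys, values:    one vector per pixel.
    prev_mask_probs: per-pixel mask probability from the previous layer (assumption).
    Returns (output_vector, attention_weights).
    """
    allowed = [p >= threshold for p in prev_mask_probs]
    # If the predicted mask is empty, attend to every pixel instead.
    if not any(allowed):
        allowed = [True] * len(keys)

    scale = 1.0 / math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) * scale if ok else -math.inf
        for key, ok in zip(keys, allowed)
    ]

    # Softmax; masked-out pixels contribute exp(-inf) == 0.0.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights
```

In this sketch the scores for masked-out pixels are still enumerated, so the benefit shown is localization of the attended features, not a smaller attention computation.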

3. Multi-dataset support

The framework includes implementations for several major segmentation datasets and benchmarks, with pre-trained models available in the Model Zoo. Video instance segmentation is also supported and is described in an accompanying technical report.


pip install git+https://github.com/facebookresearch/Mask2Former.git

