MMDetection: PyTorch object detection toolbox
Modular PyTorch framework for object detection research and deployment.
MMDetection is a PyTorch-based detection framework developed as part of the OpenMMLab project. The codebase uses a modular architecture where detection pipelines are constructed by combining interchangeable components for backbones, necks, heads, and loss functions. It supports multiple detection paradigms including two-stage detectors (Faster R-CNN, Cascade R-CNN), single-stage detectors (RetinaNet, SSD, YOLO variants), and transformer-based approaches (DETR, Grounding DINO). The framework is commonly used for research, benchmarking detection algorithms, and deploying detection models in production applications.
Modular Component Architecture
Detection pipelines are built by combining independent modules for backbones, feature pyramids, detection heads, and loss functions. This enables customization and experimentation without modifying core framework code or maintaining forks.
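In MMDetection's Python-based config system, a detector is assembled declaratively from these components. The sketch below shows the general shape of such a config as a plain dictionary; the keys and values are illustrative, modeled on a typical Faster R-CNN config rather than copied from a released file.

```python
# Illustrative sketch of MMDetection's modular config style:
# a detector is described as nested dicts naming interchangeable components.
model = dict(
    type='FasterRCNN',                         # two-stage detector
    backbone=dict(type='ResNet', depth=50),    # feature extractor
    neck=dict(type='FPN',                      # feature pyramid
              in_channels=[256, 512, 1024, 2048],
              out_channels=256),
    rpn_head=dict(type='RPNHead'),             # region proposals
    roi_head=dict(type='StandardRoIHead'),     # per-region classification/regression
)

# Swapping a component is a one-line change, e.g. a deeper backbone:
model['backbone'] = dict(type='ResNet', depth=101)
print(model['backbone']['depth'])  # → 101
```

Because each component is named by `type` and looked up in a registry, a new backbone or head can be dropped in without touching the rest of the pipeline.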
Multi-Task Detection Framework
Handles object detection, instance segmentation, panoptic segmentation, and semi-supervised learning within a single unified framework, eliminating the need to switch between separate codebases for different vision tasks.
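One way this unification shows up is in the result objects: a single per-image container can carry boxes, labels, and optional masks, so downstream code stays the same across tasks. The sketch below is a simplified stand-in for that idea, not MMDetection's actual `DetDataSample` implementation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DetResult:
    """Simplified stand-in for a per-image result in a unified framework."""
    bboxes: List[List[float]]             # [x1, y1, x2, y2] per instance
    labels: List[int]                     # class index per instance
    masks: Optional[List[object]] = None  # present only for segmentation models

def summarize(result: DetResult) -> str:
    # The same code path handles plain detection and instance segmentation.
    task = 'instance segmentation' if result.masks is not None else 'detection'
    return f'{task}: {len(result.bboxes)} instance(s)'

print(summarize(DetResult(bboxes=[[0, 0, 10, 10]], labels=[3])))
# → detection: 1 instance(s)
```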
Comprehensive Model Zoo
Includes implementations of diverse detector architectures spanning two-stage methods, single-stage detectors, and transformer-based approaches. Ships with pre-trained weights for immediate benchmarking and transfer learning without training from scratch.
# Load a config and its matching pre-trained checkpoint, then run inference.
from mmdet.apis import init_detector, inference_detector

config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'
# Build the model from the config and load the weights onto the first GPU.
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# Run inference on a single image and inspect the detection results.
result = inference_detector(model, 'demo/demo.jpg')
print(result)

MMDetection v3.3.0 Release
- MM Grounding DINO: An Open and Comprehensive Pipeline for Unified Object Grounding and Detection. Grounding DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).
- Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications.
MMDetection v3.2.0 Release
- Highlight: v3.2.0 was released on 12/10/2023.
- Detection Transformer SOTA Model Collection: supported four updated and stronger SOTA Transformer models: DDQ, CO-DETR, AlignDETR, and H-DINO.
MMDetection v3.1.0 Release
- Supports ViTDet.
- Provides a Gradio demo for image-based MMDetection tasks, making it easy for users to try the framework.
- `select_first`: use the first text in the text list as the description for an instance.
- `original`: use all texts in the text list as the description for an instance.
- `concat`: concatenate all texts in the text list into a single description for an instance.
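These three modes describe how a list of candidate texts is reduced to a per-instance description. A minimal sketch of that logic, assuming the texts arrive as a plain list of strings (the function name and separator below are illustrative, not MMDetection's actual API):

```python
def build_description(texts, mode='select_first', sep=' '):
    """Reduce a list of candidate texts to one instance description.

    Illustrative sketch of the three documented modes; not MMDetection's API.
    """
    if mode == 'select_first':
        return texts[0]        # keep only the first candidate text
    if mode == 'original':
        return list(texts)     # keep all candidate texts unchanged
    if mode == 'concat':
        return sep.join(texts) # merge all candidates into one string
    raise ValueError(f'unknown mode: {mode!r}')

texts = ['a red car', 'a vehicle']
print(build_description(texts, 'select_first'))  # → a red car
print(build_description(texts, 'concat'))        # → a red car a vehicle
```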
Related Repositories
Discover similar tools and frameworks used by developers
segment-anything
Transformer-based promptable segmentation with zero-shot generalization.
Pica
Unified API platform connecting AI agents to 150+ integrations with auth and tool building.
crewAI
Python framework for autonomous multi-agent AI collaboration.
gym
Standard API for reinforcement learning environment interfaces.
Triton
Domain-specific language and compiler for writing GPU deep learning primitives with higher productivity than CUDA.