MMDetection: PyTorch object detection toolbox
Modular PyTorch framework for object detection research and deployment.
Learn more about MMDetection
MMDetection is a PyTorch-based detection framework developed as part of the OpenMMLab project. The codebase uses a modular architecture where detection pipelines are constructed by combining interchangeable components for backbones, necks, heads, and loss functions. It supports multiple detection paradigms including two-stage detectors (Faster R-CNN, Cascade R-CNN), single-stage detectors (RetinaNet, SSD, YOLO variants), and transformer-based approaches (DETR, Grounding DINO). The framework is commonly used for research, benchmarking detection algorithms, and deploying detection models in production applications.
Modular Component Architecture
Detection pipelines are built by combining independent modules for backbones, feature pyramids, detection heads, and loss functions. Components can be swapped or customized through configuration, so experiments do not require modifying core framework code or maintaining a fork.
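The config-driven composition described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: the `COMPONENTS` registry, the `register` and `build` helpers, and the `Toy*` classes are stand-ins written for this example and are not MMDetection's actual API. The real framework uses a richer registry mechanism, but the underlying idea of instantiating components from nested dicts keyed by `type` is similar.

```python
# Hypothetical mini-registry illustrating config-driven composition.
# None of these names come from MMDetection; they are stand-ins.
COMPONENTS = {}


def register(cls):
    """Record a class under its own name so configs can refer to it."""
    COMPONENTS[cls.__name__] = cls
    return cls


def build(cfg):
    """Instantiate a component from a config dict that has a 'type' key."""
    args = dict(cfg)  # shallow copy so the caller's config is not mutated
    cls = COMPONENTS[args.pop("type")]
    return cls(**args)


@register
class ToyBackbone:
    def __init__(self, depth):
        self.depth = depth


@register
class ToyNeck:
    def __init__(self, in_channels, out_channels):
        self.in_channels = in_channels
        self.out_channels = out_channels


@register
class ToyHead:
    def __init__(self, num_classes):
        self.num_classes = num_classes


@register
class ToyDetector:
    def __init__(self, backbone, neck, head):
        # Each sub-config is built independently, so any part can be swapped.
        self.backbone = build(backbone)
        self.neck = build(neck)
        self.head = build(head)


model_cfg = dict(
    type="ToyDetector",
    backbone=dict(type="ToyBackbone", depth=50),
    neck=dict(type="ToyNeck", in_channels=[256, 512, 1024, 2048], out_channels=256),
    head=dict(type="ToyHead", num_classes=80),
)
detector = build(model_cfg)

# Swapping the backbone only touches the config, not the detector class.
deeper_cfg = dict(model_cfg, backbone=dict(type="ToyBackbone", depth=101))
deeper = build(deeper_cfg)
print(detector.backbone.depth, deeper.backbone.depth)
```

In this sketch, changing the backbone depth means editing one entry in the config dict; the `ToyDetector` class itself stays untouched, which is the property the modular design aims for.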
Multi-Task Detection Framework
Handles object detection, instance segmentation, panoptic segmentation, and semi-supervised learning within a single framework, so there is no need to switch between separate tools or codebases for different vision tasks.
Comprehensive Model Zoo
Includes implementations of diverse detector architectures spanning two-stage methods, single-stage detectors, and transformer-based approaches. Ships with pre-trained weights for immediate benchmarking and transfer learning without training from scratch.
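from mmdet.apis import init_detector, inference_detector

# Paths are relative to the MMDetection repository root; config file names can differ between releases.
config_file = 'configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'
# Pre-trained weights, downloaded separately from the model zoo.
checkpoint_file = 'checkpoints/faster_rcnn_r50_fpn_1x_coco.pth'
# Build the detector and load the checkpoint (use device='cpu' if no GPU is available).
model = init_detector(config_file, checkpoint_file, device='cuda:0')
# Run inference on a single image.
result = inference_detector(model, 'demo/demo.jpg')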
print(result)
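A common next step after inference is to keep only confident detections. The sketch below is a plain-Python illustration that does not call MMDetection; `filter_detections` and the sample values are hypothetical. The structure of `result` depends on the installed release: in MMDetection 3.x it is a data sample whose `pred_instances` field holds `bboxes`, `scores`, and `labels` tensors, which could be converted to lists (for example with `.tolist()`) before being passed to a helper like this one.

```python
def filter_detections(bboxes, scores, labels, score_thr=0.3):
    """Keep detections with score >= score_thr, sorted by descending score.

    Hypothetical helper: takes parallel lists of boxes ([x1, y1, x2, y2]),
    confidence scores, and integer class labels.
    """
    kept = [
        (box, score, label)
        for box, score, label in zip(bboxes, scores, labels)
        if score >= score_thr
    ]
    kept.sort(key=lambda det: det[1], reverse=True)
    return kept


# Made-up values standing in for real detector output.
bboxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0], [0.0, 0.0, 30.0, 30.0]]
scores = [0.92, 0.15, 0.47]
labels = [0, 2, 1]

confident = filter_detections(bboxes, scores, labels, score_thr=0.3)
for box, score, label in confident:
    print(f"label={label} score={score:.2f} box={box}")
```

The threshold of 0.3 is an arbitrary choice for illustration; a suitable value depends on the model and on how the application trades off missed detections against false positives.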