Grounding DINO: Open-set object detection with vision-language grounding

Zero-shot object detection from text prompts.

Rankings (live, 06:11 AM, steady)
Overall: #163 (90)
AI & ML: #67 (30)
Stars: 9.5K
Forks: 983
Downloads: 4
7-day stars: +9
7-day forks: 0

Learn more about GroundingDINO

Grounding DINO is a vision-language transformer model for object detection that extends DINO with grounding capabilities through pre-training on image-text pairs. It uses a transformer architecture that jointly processes visual features and language embeddings to align object regions with textual descriptions. The model supports zero-shot detection by accepting arbitrary class names as text input, enabling detection of objects outside its training distribution. Common applications include open-world object detection, automated dataset annotation, and integration with segmentation models for instance-level tasks.
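As a rough illustration of the input and output sides of a text-prompted detection call, the sketch below builds a prompt from arbitrary class names and converts a predicted box into pixel corners, the form that annotation tools and segmentation models commonly expect. It assumes the period-separated prompt convention and the normalized (center x, center y, width, height) box format used by the reference implementation; the helper names `build_caption` and `cxcywh_to_xyxy` are hypothetical and are not part of the package.

```python
from typing import Sequence, Tuple


def build_caption(class_names: Sequence[str]) -> str:
    """Join class names into one lowercase, period-separated prompt.

    Assumption: categories are separated by " . " and the prompt ends with
    a period, following the convention in the reference implementation.
    """
    cleaned = [name.strip().lower() for name in class_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty class name is required")
    return " . ".join(cleaned) + " ."


def cxcywh_to_xyxy(
    box: Sequence[float], image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Assumption: box values lie in [0, 1] relative to the image size.
    """
    cx, cy, w, h = box
    x1 = (cx - w / 2) * image_width
    y1 = (cy - h / 2) * image_height
    x2 = (cx + w / 2) * image_width
    y2 = (cy + h / 2) * image_height
    return (x1, y1, x2, y2)
```

For example, `build_caption(["Chair", "person", "dog"])` yields `"chair . person . dog ."`, and the resulting pixel boxes can be written to an annotation file or passed as box prompts to a segmentation model.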


1. Vision-language alignment

Integrates DINO's detection backbone with grounded pre-training to directly map image regions to natural language descriptions, enabling detection based on arbitrary text queries rather than fixed class sets.

2. Zero-shot detection capability

Detects object classes not present in training data by leveraging language understanding, allowing the model to generalize to novel categories specified at inference time.

3. Transformer-based architecture

Uses a transformer encoder-decoder design that processes both visual and textual information jointly, enabling flexible reasoning about object-language relationships without separate classification heads per class.
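The three points above share one idea: a region is labeled by how well it matches the prompt tokens, not by a fixed per-class output layer. The toy sketch below illustrates that idea with hand-written feature vectors. It is a simplified stand-in, not the model's actual computation: the function `match_regions_to_tokens`, the feature values, and the default thresholds (0.35 for boxes, 0.25 for text) are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def match_regions_to_tokens(
    region_feats: Sequence[Sequence[float]],
    token_feats: Sequence[Sequence[float]],
    tokens: Sequence[str],
    box_threshold: float = 0.35,   # assumed default, for illustration only
    text_threshold: float = 0.25,  # assumed default, for illustration only
) -> List[Dict[str, object]]:
    """Score every region against every prompt token and keep confident matches.

    A region is kept when its best token score exceeds box_threshold; its
    label is formed from all tokens scoring above text_threshold. Adding a
    new category only adds a token vector, not a new classification head.
    """
    detections: List[Dict[str, object]] = []
    for index, region in enumerate(region_feats):
        scores = [sigmoid(dot(region, token)) for token in token_feats]
        best = max(scores)
        if best <= box_threshold:
            continue  # no prompt token matches this region well enough
        phrase = " ".join(
            tok for tok, score in zip(tokens, scores) if score > text_threshold
        )
        detections.append({"region": index, "score": best, "phrase": phrase})
    return detections
```

With two tokens, `"dog"` and `"chair"`, and one region feature aligned to each, the function returns one detection per aligned region and drops regions that match neither token.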


pip install groundingdino-py

v0.1.0-alpha2

Alpha release reports 56.7 AP on COCO using multi-dataset training; release notes do not specify breaking changes, requirements, or upgrade steps.

  • Verify model compatibility if upgrading from v0.1.0-alpha; no migration guidance is provided in the notes.
  • Expect models trained on O365, VG, RefCOCO, COCO, OpenImage, Cap4M, and ODinW-35 datasets with reported benchmark performance.
v0.1.0-alpha

Initial alpha release provides a pretrained Grounding DINO checkpoint; release notes do not specify breaking changes, requirements, or setup steps.

  • Download the released checkpoint to begin using Grounding DINO for object detection with text prompts.
  • Expect API instability and potential breaking changes in future releases given alpha maturity status.

