
Grounding DINO: Open-set object detection with vision-language grounding

Zero-shot object detection from text prompts.

LIVE RANKINGS • 10:20 AM • STEADY
Overall rank: #255 · AI & ML rank: #82
Stars: 9.8K (+29 over the last 7 days)
Forks: 992 (-4 over the last 7 days)

Learn more about GroundingDINO

Grounding DINO is a vision-language transformer model for object detection that extends DINO with grounding capabilities through pre-training on image-text pairs. It uses a transformer architecture that jointly processes visual features and language embeddings to align object regions with textual descriptions. The model supports zero-shot detection by accepting arbitrary class names as text input, enabling detection of objects outside its training distribution. Common applications include open-world object detection, automated dataset annotation, and integration with segmentation models for instance-level tasks.
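For the dataset-annotation use case mentioned above, the free-text phrases that Grounding DINO returns usually need to be mapped back onto the fixed label set that was queried. A minimal sketch of such a mapping follows; `phrases_to_class_ids` is a hypothetical helper, not part of the GroundingDINO library, and it uses simple substring matching as an illustrative assumption.

```python
def phrases_to_class_ids(phrases, class_names):
    """Map predicted free-text phrases back to indices in the queried
    class list. Phrases matching no queried class are assigned -1.

    Hypothetical helper: substring matching is an assumption, not the
    library's own behavior.
    """
    lowered = [name.lower() for name in class_names]
    ids = []
    for phrase in phrases:
        match = -1
        for i, name in enumerate(lowered):
            if name in phrase.lower():
                match = i
                break
        ids.append(match)
    return ids

# Example: two phrases map to queried classes, one does not
print(phrases_to_class_ids(["cat", "a dog", "tree"], ["Cat", "Dog"]))
# [0, 1, -1]
```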

GroundingDINO key features:

1. Vision-language alignment: Integrates DINO's detection backbone with grounded pre-training to directly map image regions to natural language descriptions, enabling detection based on arbitrary text queries rather than fixed class sets.

2. Zero-shot detection capability: Detects object classes not present in training data by leveraging language understanding, allowing the model to generalize to novel categories specified at inference time.

3. Transformer-based architecture: Uses a transformer encoder-decoder design that processes both visual and textual information jointly, enabling flexible reasoning about object-language relationships without separate classification heads per class.
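Because classes are passed as a single text caption, it helps to build that caption programmatically. The sketch below assumes the commonly used caption convention for Grounding DINO (lowercase class names separated by " . ", ending with a period); `build_text_prompt` is a hypothetical helper, not a library function.

```python
def build_text_prompt(class_names):
    """Join class names into a ' . '-separated, lowercase caption,
    the format commonly used as Grounding DINO's text query.

    Hypothetical helper illustrating the caption convention.
    """
    caption = " . ".join(name.strip().lower() for name in class_names)
    # End with a period so the final class is delimited like the others.
    if not caption.endswith("."):
        caption += "."
    return caption

print(build_text_prompt(["Cat", "Dog", "Fire Hydrant"]))
# cat . dog . fire hydrant.
```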


import cv2
import torch
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Load model configuration and weights
model_config_path = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
model_checkpoint_path = "weights/groundingdino_swint_ogc.pth"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model
model = load_model(model_config_path, model_checkpoint_path, device=device)

# Load and preprocess the image
image_path = "path/to/your/image.jpg"
image_source, image = load_image(image_path)

# Classes are given as one caption: lowercase names separated by " . "
text_prompt = "cat . dog . person"
box_threshold = 0.35   # minimum box confidence
text_threshold = 0.25  # minimum phrase-token confidence

# Run zero-shot detection
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=text_prompt,
    box_threshold=box_threshold,
    text_threshold=text_threshold,
    device=device,
)

# annotate() returns a NumPy (OpenCV) image array, not a PIL image,
# so save it with cv2.imwrite rather than PIL's .save()
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("annotated_result.jpg", annotated_frame)
print(f"Detected {len(boxes)} objects: {phrases}")
