Grounding DINO: Open-set object detection with vision-language grounding
Zero-shot object detection from text prompts.
Grounding DINO is a vision-language transformer model for object detection that extends DINO with grounding capabilities through pre-training on image-text pairs. It uses a transformer architecture that jointly processes visual features and language embeddings to align object regions with textual descriptions. The model supports zero-shot detection by accepting arbitrary class names as text input, enabling detection of objects outside its training distribution. Common applications include open-world object detection, automated dataset annotation, and integration with segmentation models for instance-level tasks.
Vision-language alignment
Integrates DINO's detection backbone with grounded pre-training to directly map image regions to natural language descriptions, enabling detection based on arbitrary text queries rather than fixed class sets.
Zero-shot detection capability
Detects object classes not present in training data by leveraging language understanding, allowing the model to generalize to novel categories specified at inference time.
Transformer-based architecture
Uses a transformer encoder-decoder design that processes both visual and textual information jointly, enabling flexible reasoning about object-language relationships without separate classification heads per class.
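Because detection is driven by a caption rather than a fixed label set, the text prompt is just a single string of lowercase class names separated by " . " (the library's preprocessing also appends a trailing period). A small helper for building such captions might look like this; the function name is ours, not part of the library:

```python
def build_text_prompt(classes):
    """Join class names into a Grounding DINO-style caption.

    The model expects lowercase class names separated by " . ".
    """
    return " . ".join(c.strip().lower() for c in classes)

print(build_text_prompt(["Cat", "Dog", "person"]))  # -> "cat . dog . person"
```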
import torch
from PIL import Image
from groundingdino.util.inference import load_model, load_image, predict, annotate
# Load model configuration and weights
model_config_path = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
model_checkpoint_path = "weights/groundingdino_swint_ogc.pth"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the model
model = load_model(model_config_path, model_checkpoint_path, device=device)
# Load and process image
image_path = "path/to/your/image.jpg"
image_source, image = load_image(image_path)
# Define text prompt: lowercase class names separated by " . "
text_prompt = "cat . dog . person"
box_threshold = 0.35   # minimum box confidence
text_threshold = 0.25  # minimum text-to-region match confidence
# Perform prediction
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption=text_prompt,
    box_threshold=box_threshold,
    text_threshold=text_threshold,
    device=device,
)
# Annotate and save results (annotate returns a BGR NumPy array, not a PIL image)
annotated_frame = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
Image.fromarray(annotated_frame[..., ::-1]).save("annotated_result.jpg")  # flip BGR -> RGB for PIL
print(f"Detected {len(boxes)} objects: {phrases}")
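`predict` returns boxes in normalized (cx, cy, w, h) form relative to the image size, so drawing or cropping usually requires converting to absolute (x0, y0, x1, y1) pixel coordinates. A minimal, library-free sketch of that conversion (the helper name is ours):

```python
def cxcywh_norm_to_xyxy(box, img_w, img_h):
    """Convert one normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    )

# A box centered in a 640x480 image, covering half of each dimension:
print(cxcywh_norm_to_xyxy((0.5, 0.5, 0.5, 0.5), 640, 480))  # -> (160.0, 120.0, 480.0, 360.0)
```

The same math is what `annotate` applies internally before drawing; doing it by hand is useful when exporting detections to other tools.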
Related Repositories
Discover similar tools and frameworks used by developers
Docling
Fast document parser for RAG and AI workflows.
DINOv2
PyTorch vision transformers pretrained on 142M unlabeled images.
LangChain
Modular framework for chaining LLMs with external data.
FAISS
Efficient approximate nearest neighbor search for billion-scale vectors.
ONNX Runtime
Cross-platform engine for optimized ONNX model execution.