DINOv2: Self-supervised visual feature learning
PyTorch vision transformers pretrained on 142M unlabeled images.
import torch
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
embedding = model(torch.randn(1, 3, 224, 224))
Large-scale unsupervised pretraining
Models are pretrained on 142 million unlabeled images with no manual annotations, producing features that transfer across domains without fine-tuning.
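One common way to use frozen features without fine-tuning is nearest-neighbor lookup over stored embeddings. The sketch below is dependency-free and only illustrative: the three-dimensional vectors are toy stand-ins for real embeddings, and `cosine_similarity`, `nearest_label`, and `gallery` are hypothetical names, not part of dinov2.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_label(query, gallery):
    """Return the label of the gallery embedding most similar to the query.

    gallery is a list of (embedding, label) pairs computed once from a
    frozen backbone; no weights are updated at any point.
    """
    best_embedding, best_label = max(
        gallery, key=lambda item: cosine_similarity(query, item[0])
    )
    return best_label


# Toy vectors standing in for image embeddings (assumption for illustration).
gallery = [([1.0, 0.0, 0.0], "cat"), ([0.0, 1.0, 0.0], "dog")]
predicted = nearest_label([0.9, 0.1, 0.0], gallery)
```

In practice the vectors would come from the model's image-level output, and a batched tensor operation would replace the Python loops.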
Multiple model scales with registers
Provides four model sizes (ViT-S/14, ViT-B/14, ViT-L/14, ViT-g/14), each with an optional register-token variant that reduces artifacts in attention and feature maps.
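The sizes and register variants map onto `torch.hub` entry point names that follow a regular pattern. The helper below is a small sketch of that pattern; `hub_entrypoint` and `EMBED_DIMS` are hypothetical names introduced here, and the names and embedding widths should be checked against the repository's `hubconf.py` before relying on them.

```python
# Embedding width for each backbone size (assumed from the standard
# ViT-S/B/L/g configurations; verify against the repository).
EMBED_DIMS = {"s": 384, "b": 768, "l": 1024, "g": 1536}


def hub_entrypoint(size, registers=False):
    """Build a hub entry point name such as 'dinov2_vits14' or 'dinov2_vitl14_reg'."""
    if size not in EMBED_DIMS:
        raise ValueError(f"size must be one of {sorted(EMBED_DIMS)}, got {size!r}")
    name = f"dinov2_vit{size}14"
    return name + "_reg" if registers else name


small = hub_entrypoint("s")
large_with_registers = hub_entrypoint("l", registers=True)
```

The returned string can be passed as the second argument to `torch.hub.load('facebookresearch/dinov2', ...)`, in place of the hard-coded `'dinov2_vits14'`.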
Patch-level feature extraction
Generates both image-level and per-patch visual features that enable pixel-level tasks like segmentation and depth estimation alongside image classification.
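Per-patch features correspond to a grid of 14x14-pixel patches, so the input height and width determine how many patch features come back. The following sketch works out that grid; `snap_to_patch_multiple` and `patch_grid_shape` are hypothetical helpers written for illustration, not functions from the repository.

```python
PATCH_SIZE = 14  # the "/14" in ViT-S/14, ViT-B/14, ViT-L/14, ViT-g/14


def snap_to_patch_multiple(size, patch_size=PATCH_SIZE):
    """Round a side length down to the nearest multiple of the patch size."""
    if size < patch_size:
        raise ValueError(f"side length {size} is smaller than one patch")
    return (size // patch_size) * patch_size


def patch_grid_shape(height, width, patch_size=PATCH_SIZE):
    """Return (rows, cols) of the patch grid for an input of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of the patch size")
    return height // patch_size, width // patch_size


rows, cols = patch_grid_shape(224, 224)
num_patches = rows * cols  # one feature vector per patch
```

Given per-patch features of shape (batch, rows * cols, dim), reshaping to (batch, rows, cols, dim) recovers the spatial layout that a segmentation or depth head would consume.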
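import torch
from PIL import Image
from torchvision import transforms
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()  # disable training-only behavior for inference
# Resize, then center-crop so both sides are 224 (a multiple of the 14-pixel patch size)
transform = transforms.Compose([transforms.Resize(224), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])  # ImageNet statistics
image = Image.open('photo.jpg').convert('RGB')  # force three channels
input_tensor = transform(image).unsqueeze(0)  # add a batch dimension
with torch.no_grad():  # no gradients needed for feature extraction
    features = model(input_tensor)  # image-level embedding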
Related Repositories
Discover similar tools and frameworks used by developers
Mask2Former
Unified transformer architecture for multi-task image segmentation.
crewAI
Python framework for autonomous multi-agent AI collaboration.
openpose
Multi-person 135-keypoint anatomical detection in C++.
open_clip
PyTorch library for contrastive language-image pretraining.
AutoGPT
Block-based visual editor for autonomous AI agents.