DINOv2: Self-supervised visual feature learning
PyTorch vision transformers pretrained on 142M unlabeled images.
import torch
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
embedding = model(torch.randn(1, 3, 224, 224))
Large-scale unsupervised pretraining
Models are pretrained on 142 million images without any labels or manual annotations, producing features that generalize across domains and can be used directly, without fine-tuning.
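A common way to use frozen features without fine-tuning is a k-nearest-neighbour classifier on L2-normalized embeddings. The sketch below assumes the embeddings have already been computed with a DINOv2 backbone; random clustered tensors stand in for them here, and `knn_predict` is a hypothetical helper, not part of the DINOv2 API.

```python
import torch
import torch.nn.functional as F

def knn_predict(train_feats, train_labels, query_feats, k=5):
    """Hypothetical helper: majority vote over the k most cosine-similar training embeddings."""
    train = F.normalize(train_feats, dim=1)  # unit-length rows, so dot product = cosine similarity
    query = F.normalize(query_feats, dim=1)
    sims = query @ train.T  # (num_queries, num_train) cosine similarities
    topk = sims.topk(k, dim=1).indices  # indices of the k nearest training samples
    neighbour_labels = train_labels[topk]  # (num_queries, k)
    return torch.mode(neighbour_labels, dim=1).values  # most frequent label per query

# Stand-in data: two well-separated clusters of 384-d "embeddings" (ViT-S/14 feature width)
torch.manual_seed(0)
centers = torch.randn(2, 384)
train_feats = torch.cat([
    centers[0] + 0.1 * torch.randn(10, 384),
    centers[1] + 0.1 * torch.randn(10, 384),
])
train_labels = torch.tensor([0] * 10 + [1] * 10)
query_feats = centers + 0.1 * torch.randn(2, 384)
preds = knn_predict(train_feats, train_labels, query_feats, k=5)
```

With real data, `train_feats` and `query_feats` would be the outputs of `model(...)` on preprocessed images, and only the labels of the training set are needed; the backbone itself stays frozen.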
Multiple model scales with registers
Provides four model sizes (ViT-S/14, ViT-B/14, ViT-L/14, ViT-g/14), each with an optional register-token variant that suppresses high-norm artifact tokens, yielding cleaner attention and feature maps.
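The torch.hub entry points follow a regular naming scheme (`dinov2_vits14`, `dinov2_vitb14`, `dinov2_vitl14`, `dinov2_vitg14`, plus a `_reg` suffix for the register variants). The sketch below builds those names; `hub_name`, `load_backbone`, and `EMBED_DIMS` are hypothetical helpers for illustration, and only `load_backbone` touches the network.

```python
import torch

# Embedding width (class-token dimension) of each DINOv2 backbone size
EMBED_DIMS = {"s": 384, "b": 768, "l": 1024, "g": 1536}

def hub_name(size: str, registers: bool = False) -> str:
    """Hypothetical helper: build the hub entry-point name, e.g. 'dinov2_vitb14_reg'."""
    if size not in EMBED_DIMS:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(EMBED_DIMS)}")
    return f"dinov2_vit{size}14" + ("_reg" if registers else "")

def load_backbone(size: str, registers: bool = False) -> torch.nn.Module:
    """Hypothetical helper: download weights on first use and return the model in eval mode."""
    return torch.hub.load("facebookresearch/dinov2", hub_name(size, registers)).eval()
```

For example, `load_backbone("l", registers=True)` would load ViT-L/14 with registers, whose image embeddings are 1024-dimensional. Larger backbones cost more memory and compute per image, so ViT-S/14 is a sensible default for quick experiments.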
Patch-level feature extraction
Generates both an image-level embedding and per-patch visual features, so the same backbone supports dense, pixel-level tasks such as segmentation and depth estimation as well as image classification.
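Per-patch features come out as a flat token sequence; dense heads usually want a spatial grid. A minimal sketch, assuming the tokens are taken from the `x_norm_patchtokens` entry of the dict returned by `model.forward_features(...)` (true of the current repository, but check your version) and that they are in row-major order over the patch grid. `patch_tokens_to_map` is a hypothetical helper, and random tokens stand in for real model output.

```python
import torch

PATCH_SIZE = 14  # DINOv2 backbones split images into 14x14-pixel patches

def patch_tokens_to_map(patch_tokens: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Hypothetical helper: reshape (B, N, D) patch tokens into a (B, D, H/14, W/14) feature map."""
    b, n, d = patch_tokens.shape
    h, w = height // PATCH_SIZE, width // PATCH_SIZE
    assert n == h * w, f"expected {h * w} patch tokens, got {n}"
    # Assumes row-major (raster) token order: token index = row * w + col
    return patch_tokens.reshape(b, h, w, d).permute(0, 3, 1, 2)

# With a real model (downloads weights), the tokens would come from:
#   out = model.forward_features(images)  # images: (B, 3, H, W), H and W multiples of 14
#   fmap = patch_tokens_to_map(out["x_norm_patchtokens"], H, W)
# Stand-in tokens for a 224x224 input to ViT-S/14: 16 * 16 = 256 patches of width 384
tokens = torch.randn(1, 256, 384)
fmap = patch_tokens_to_map(tokens, 224, 224)
```

The resulting map can be fed to a lightweight segmentation or depth head, or upsampled to the input resolution with `torch.nn.functional.interpolate`.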
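import torch
from PIL import Image
from torchvision import transforms
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14').eval()  # inference mode
# Resize, then center-crop to 224x224 so both sides are multiples of the 14-pixel patch size;
# normalize with the ImageNet mean/std used by DINOv2
transform = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
image = Image.open('photo.jpg').convert('RGB')  # ensure 3 channels (drops alpha/grayscale)
input_tensor = transform(image).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)
with torch.no_grad():
    features = model(input_tensor)  # image-level embedding, shape (1, 384) for ViT-S/14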
Related Repositories
Discover similar tools and frameworks used by developers
Qwen
Alibaba Cloud's pretrained LLMs supporting Chinese/English with up to 32K context length.
Stable Diffusion
CLIP-conditioned latent diffusion model for text-to-image synthesis.
Goose
LLM-powered agent automating local software engineering workflows.
tiktoken
Fast BPE tokenizer for OpenAI language models.
Continue
Multi-LLM coding agent with interactive and automated modes.