OpenCLIP: Open source CLIP implementation
PyTorch library for contrastive language-image pretraining.
OpenCLIP is a PyTorch-based library implementing contrastive language-image pretraining, a technique that learns joint embeddings of images and text by maximizing similarity between matched pairs while minimizing similarity between unmatched pairs. The implementation supports multiple vision encoders (ViT, ConvNeXt, SigLIP) and text encoders, trained on datasets ranging from LAION-400M to DataComp-1B. The library provides pretrained model checkpoints with documented zero-shot performance across 38 datasets and enables inference through simple APIs for encoding images and text into comparable embedding spaces. Common applications include zero-shot image classification, image-text retrieval, and transfer learning for vision tasks without task-specific labeled data.
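The contrastive objective described above can be illustrated with plain tensors. This is a minimal sketch using random stand-ins for the encoder outputs, not OpenCLIP's actual training loop; the temperature value of 100.0 is a typical scale (open_clip exposes the learned value as `model.logit_scale`).

```python
import torch
import torch.nn.functional as F

# Toy batch: 4 image embeddings paired with 4 text embeddings (dim 8).
# In CLIP these come from the vision and text encoders; here they are
# random stand-ins just to show the contrastive-loss mechanics.
torch.manual_seed(0)
image_features = F.normalize(torch.randn(4, 8), dim=-1)
text_features = F.normalize(torch.randn(4, 8), dim=-1)

# Pairwise cosine similarities, scaled by a temperature factor.
logits = 100.0 * image_features @ text_features.T

# Matched pairs lie on the diagonal; the symmetric cross-entropy loss
# pushes diagonal similarities up and off-diagonal similarities down.
targets = torch.arange(4)
loss = (F.cross_entropy(logits, targets) +
        F.cross_entropy(logits.T, targets)) / 2
```

At inference time the same similarity matrix, passed through a softmax over the text axis, yields the zero-shot classification probabilities used later in this page's example.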
Reproducible scaling studies
The project includes detailed research on scaling laws for contrastive language-image learning, with models trained across different compute budgets and dataset sizes to document how performance scales with training data and model capacity.
Multiple architecture support
Supports diverse vision encoders including Vision Transformers, ConvNeXt, and SigLIP variants, along with different training datasets, allowing users to select models optimized for specific accuracy-efficiency trade-offs.
Comprehensive model collection
Provides access to numerous pretrained models through a unified interface, with model cards on Hugging Face Hub and documented zero-shot results across 38 datasets for transparent performance comparison.
import torch
import open_clip
from PIL import Image


class CLIPClassifier:
    """Zero-shot image classifier that ensembles several OpenCLIP models."""

    def __init__(self, model_configs):
        self.models = []
        for config in model_configs:
            # create_model_and_transforms returns (model, train_preprocess,
            # val_preprocess); only the eval-time preprocess is needed here.
            model, _, preprocess = open_clip.create_model_and_transforms(
                config['arch'], pretrained=config['pretrained']
            )
            model.eval()
            tokenizer = open_clip.get_tokenizer(config['arch'])
            self.models.append({
                'model': model, 'preprocess': preprocess,
                'tokenizer': tokenizer, 'name': config['name']
            })

    def classify_image(self, image_path, labels):
        results = {}
        # Convert to RGB so grayscale/RGBA inputs match the preprocess pipeline.
        image = Image.open(image_path).convert('RGB')
        for model_info in self.models:
            processed_image = model_info['preprocess'](image).unsqueeze(0)
            text_tokens = model_info['tokenizer'](labels)
            with torch.no_grad():
                image_features = model_info['model'].encode_image(processed_image)
                text_features = model_info['model'].encode_text(text_tokens)
                # L2-normalize so the dot product is cosine similarity.
                image_features /= image_features.norm(dim=-1, keepdim=True)
                text_features /= text_features.norm(dim=-1, keepdim=True)
                probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
            results[model_info['name']] = dict(zip(labels, probs[0].tolist()))
        return results
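The classifier above might be configured and invoked as follows. The checkpoint pair `('ViT-B-32', 'laion2b_s34b_b79k')` is a real OpenCLIP checkpoint, but any pair from `open_clip.list_pretrained()` works; `'example.jpg'` is a placeholder path, and the actual calls are commented out because instantiation downloads model weights.

```python
# One entry per model to ensemble; 'name' is just a display key.
model_configs = [
    {'arch': 'ViT-B-32', 'pretrained': 'laion2b_s34b_b79k', 'name': 'vit_b32'},
]

# CLIP zero-shot prompts typically work better as short captions
# ("a photo of a ...") than as bare class names.
labels = ['a photo of a dog', 'a photo of a cat', 'a photo of a car']

# classifier = CLIPClassifier(model_configs)   # downloads weights on first use
# results = classifier.classify_image('example.jpg', labels)
# for name, probs in results.items():
#     print(name, max(probs, key=probs.get))
```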