OpenCLIP: Open source CLIP implementation
PyTorch library for contrastive language-image pretraining.
Learn more about open_clip
OpenCLIP is a PyTorch-based library implementing contrastive language-image pretraining, a technique that learns joint embeddings of images and text by maximizing similarity between matched pairs while minimizing similarity between unmatched pairs. The implementation supports multiple vision encoders (ViT, ConvNeXt, SigLIP) and text encoders, trained on datasets ranging from LAION-400M to DataComp-1B. The library provides pretrained model checkpoints with documented zero-shot performance across 38 datasets and enables inference through simple APIs for encoding images and text into comparable embedding spaces. Common applications include zero-shot image classification, image-text retrieval, and transfer learning for vision tasks without task-specific labeled data.
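The matching step described above — normalize both embeddings, score every image-text pair by cosine similarity, and push matched pairs apart from unmatched ones — can be sketched in plain Python. This is an illustrative toy (the function name, the 2-D vectors, and the `scale=100.0` temperature are assumptions for the example), not OpenCLIP's internal code:

```python
import math

def zero_shot_probs(image_vec, text_vecs, scale=100.0):
    """Temperature-scaled softmax over cosine similarities,
    mirroring the comparison CLIP-style models use at inference."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    img = normalize(image_vec)
    # Cosine similarity of the image embedding against each label embedding.
    sims = [scale * sum(a * b for a, b in zip(img, normalize(t)))
            for t in text_vecs]
    # Softmax turns the scaled similarities into a probability distribution.
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-D embeddings: the image aligns with the first label, so the
# first probability dominates.
probs = zero_shot_probs([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
```

In the real library the embeddings come from `encode_image` and `encode_text` and live in a shared high-dimensional space, but the normalize-score-softmax pipeline is the same.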
Reproducible scaling studies
The project includes detailed research on scaling laws for contrastive language-image learning, with models trained across different compute budgets and dataset sizes to document how performance scales with training data and model capacity.
Multiple architecture support
Supports diverse vision encoders including Vision Transformers, ConvNeXt, and SigLIP variants, along with different training datasets, allowing users to select models optimized for specific accuracy-efficiency trade-offs.
Comprehensive model collection
Provides access to numerous pretrained models through a unified interface, with model cards on Hugging Face Hub and documented zero-shot results across 38 datasets for transparent performance comparison.
import torch
import open_clip
from PIL import Image


class CLIPClassifier:
    """Zero-shot image classifier backed by one or more OpenCLIP models."""

    def __init__(self, model_configs):
        self.models = []
        for config in model_configs:
            # Load the model weights and the matching image preprocessing.
            model, _, preprocess = open_clip.create_model_and_transforms(
                config['arch'], pretrained=config['pretrained']
            )
            model.eval()
            tokenizer = open_clip.get_tokenizer(config['arch'])
            self.models.append({
                'model': model, 'preprocess': preprocess,
                'tokenizer': tokenizer, 'name': config['name']
            })

    def classify_image(self, image_path, labels):
        results = {}
        image = Image.open(image_path).convert('RGB')
        for model_info in self.models:
            processed_image = model_info['preprocess'](image).unsqueeze(0)
            text_tokens = model_info['tokenizer'](labels)
            with torch.no_grad():
                image_features = model_info['model'].encode_image(processed_image)
                text_features = model_info['model'].encode_text(text_tokens)
                # L2-normalize so the dot product below is cosine similarity.
                image_features /= image_features.norm(dim=-1, keepdim=True)
                text_features /= text_features.norm(dim=-1, keepdim=True)
                # Temperature-scaled softmax over image-text similarities.
                probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
            results[model_info['name']] = dict(zip(labels, probs[0].tolist()))
        return results

Related Repositories
Video2X
ML-powered video upscaling, frame interpolation, and restoration with multiple backend support.
Civitai
Community platform for sharing Stable Diffusion models, embeddings, and AI generation assets.
Real-ESRGAN
PyTorch framework for blind super-resolution using GANs.
FAISS
Efficient approximate nearest neighbor search for billion-scale vectors.
ControlNet
Dual-branch architecture for conditional diffusion model control.