OpenCLIP: Open source CLIP implementation
PyTorch library for contrastive language-image pretraining.
Learn more about open_clip
OpenCLIP is a PyTorch-based library implementing contrastive language-image pretraining, a technique that learns joint embeddings of images and text by maximizing similarity between matched pairs while minimizing similarity between unmatched pairs. The implementation supports multiple vision encoders (ViT, ConvNeXt, SigLIP) and text encoders, trained on datasets ranging from LAION-400M to DataComp-1B. The library provides pretrained model checkpoints with documented zero-shot performance across 38 datasets and enables inference through simple APIs for encoding images and text into comparable embedding spaces. Common applications include zero-shot image classification, image-text retrieval, and transfer learning for vision tasks without task-specific labeled data.
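A minimal sketch of that encoding API for zero-shot classification, assuming the ViT-B-32 architecture with the laion2b_s34b_b79k pretrained tag (any registered architecture/tag pair works) and a placeholder image path:

```python
import torch
from PIL import Image
import open_clip

# Assumed architecture/pretrained tag; substitute any registered pair
model, _, preprocess = open_clip.create_model_and_transforms(
    'ViT-B-32', pretrained='laion2b_s34b_b79k')
model.eval()
tokenizer = open_clip.get_tokenizer('ViT-B-32')

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Higher probability means a better image-text match
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(text_probs)
```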
Reproducible scaling studies
The project includes detailed research on scaling laws for contrastive language-image learning, with models trained across different compute budgets and dataset sizes to document how performance scales with training data and model capacity.
Multiple architecture support
Supports diverse vision encoders including Vision Transformers, ConvNeXt, and SigLIP variants, along with different training datasets, allowing users to select models optimized for specific accuracy-efficiency trade-offs.
Comprehensive model collection
Provides access to numerous pretrained models through a unified interface, with model cards on Hugging Face Hub and documented zero-shot results across 38 datasets for transparent performance comparison.
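One way to browse that collection programmatically is the registry helper, as in this short sketch (the full list is long; output is illustrative):

```python
import open_clip

# Each entry is an (architecture, pretrained_tag) pair usable with create_model_and_transforms
for arch, tag in open_clip.list_pretrained():
    print(arch, tag)
```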
`pip install open_clip_torch`

Removes the invalid MetaCLIP 2 L/14 checkpoint and adds MobileCLIP2 model configs with pretrained weights.
- Remove references to the non-existent MetaCLIP 2 L/14 checkpoint if your code depends on it.
- Use the new MobileCLIP2 model configs and pretrained weights now available in this release, as sketched below.
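A loading sketch under the assumption that this release registers a config named 'MobileCLIP2-S0' with a 'datacompdr' tag; the exact MobileCLIP2 config and pretrained-tag names should be checked against `open_clip.list_pretrained()`:

```python
import open_clip

# 'MobileCLIP2-S0' and 'datacompdr' are assumed names; verify the exact
# MobileCLIP2 entries this release registers via open_clip.list_pretrained()
model, _, preprocess = open_clip.create_model_and_transforms(
    'MobileCLIP2-S0', pretrained='datacompdr')
tokenizer = open_clip.get_tokenizer('MobileCLIP2-S0')
```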
Adds MetaCLIP2 WorldWide model support, fixes CoCa generation masking bug, and introduces unified text-locking across CLIP variants.
- Use the MetaCLIP2 WorldWide models now available in the model registry for improved multilingual capabilities.
- Upgrade if you use CoCa generation; mask handling was corrected to prevent inference errors (see the generation sketch below).
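A minimal CoCa captioning sketch, assuming the coca_ViT-L-14 config and mscoco_finetuned_laion2B-s13B-b90k pretrained tag that the library has shipped, plus a placeholder image path:

```python
import torch
from PIL import Image
import open_clip

# Assumed CoCa config/pretrained tag; any registered CoCa pair should work
model, _, transform = open_clip.create_model_and_transforms(
    'coca_ViT-L-14', pretrained='mscoco_finetuned_laion2B-s13B-b90k')
model.eval()

image = transform(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path

with torch.no_grad():
    generated = model.generate(image)

# Decode generated token ids back into a caption string
caption = open_clip.decode(generated[0])
print(caption.split("<end_of_text>")[0].replace("<start_of_text>", ""))
```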
Major release that raises the minimum supported Python version to 3.10 and adds local model loading, custom tokenizers, and configurable attention blocks.
- Upgrade to Python 3.10 or later; the minimum supported version has changed.
- Use the `local-dir:` schema to load models and tokenizers from local folders instead of remote sources, as in the sketch below.
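A sketch of the local loading path, assuming (based only on the release notes) that the `local-dir:` schema is passed in place of a hub reference and points at a folder containing the saved config, checkpoint, and tokenizer files; the path and exact folder layout here are placeholders:

```python
import open_clip

# '/path/to/model_dir' is a placeholder for a local folder with model config,
# weights, and tokenizer files; the 'local-dir:' prefix (usage assumed from the
# release notes) tells open_clip to resolve from disk rather than a remote hub.
model, _, preprocess = open_clip.create_model_and_transforms('local-dir:/path/to/model_dir')
tokenizer = open_clip.get_tokenizer('local-dir:/path/to/model_dir')
```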
Related Repositories
Discover similar tools and frameworks used by developers
OpenHands
LLM agent framework automating development in sandboxed containers.
text-generation-webui
Feature-rich Gradio-based UI for running and interacting with LLMs locally, supporting multiple model formats and extensions.
mlx
Lazy-evaluated NumPy-like arrays optimized for Apple silicon.
openpose
Multi-person 135-keypoint anatomical detection in C++.
ByteTrack
Multi-object tracker associating low-confidence detections across frames.