
OpenCLIP: Open source CLIP implementation

PyTorch library for contrastive language-image pretraining.

OVERALL RANK: #239 • AI & ML RANK: #77
STARS: 13.4K (+39 in 7 days) • FORKS: 1.2K (+3 in 7 days)

Learn more about open_clip

OpenCLIP is a PyTorch-based library implementing contrastive language-image pretraining, a technique that learns joint embeddings of images and text by maximizing similarity between matched pairs while minimizing similarity between unmatched pairs. The implementation supports multiple vision encoders (ViT, ConvNeXt, SigLIP) and text encoders, trained on datasets ranging from LAION-400M to DataComp-1B. The library provides pretrained model checkpoints with documented zero-shot performance across 38 datasets and enables inference through simple APIs for encoding images and text into comparable embedding spaces. Common applications include zero-shot image classification, image-text retrieval, and transfer learning for vision tasks without task-specific labeled data.
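The core mechanism described above — scoring an image against candidate labels by comparing L2-normalized embeddings in a shared space — can be sketched in plain NumPy. The embeddings here are random stand-ins for real encoder outputs, and the helper name is illustrative, not part of the open_clip API:

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Score one image embedding against label embeddings, CLIP-style.

    image_emb: (d,) vector; text_embs: (n_labels, d) matrix.
    Both are L2-normalized so dot products are cosine similarities.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = logit_scale * text_embs @ image_emb   # (n_labels,) cosine sims
    exp = np.exp(logits - logits.max())            # numerically stable softmax
    return exp / exp.sum()

# Toy check: the image embedding is a noisy copy of label 0's embedding,
# so label 0 should receive the highest probability.
rng = np.random.default_rng(0)
label_embs = rng.normal(size=(3, 8))
image_emb = label_embs[0] + 0.1 * rng.normal(size=8)
probs = zero_shot_probs(image_emb, label_embs)
```

The same normalize-then-dot-product pattern appears in the full open_clip example further down; only the embeddings there come from trained encoders rather than random vectors.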

1. Reproducible scaling studies

The project includes detailed research on scaling laws for contrastive language-image learning, with models trained across different compute budgets and dataset sizes to document how performance scales with training data and model capacity.

2. Multiple architecture support

Supports diverse vision encoders including Vision Transformers, ConvNeXt, and SigLIP variants, along with different training datasets, allowing users to select models optimized for specific accuracy-efficiency trade-offs.

3. Comprehensive model collection

Provides access to numerous pretrained models through a unified interface, with model cards on Hugging Face Hub and documented zero-shot results across 38 datasets for transparent performance comparison.
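Scaling studies like those in the first point are typically summarized with power-law fits of the form error ≈ a·C^(−b), where C is training compute. A minimal sketch of such a fit on synthetic data — the constants below are made up for illustration, not OpenCLIP's published numbers:

```python
import numpy as np

# Synthetic "error vs. compute" points generated from error = a * C**(-b).
# a_true and b_true are illustrative constants, not values from the paper.
a_true, b_true = 2.0, 0.15
compute = np.logspace(18, 24, num=7)   # stand-in compute budgets (e.g. FLOPs)
error = a_true * compute ** (-b_true)

# A power law is linear in log-log space: log(error) = log(a) - b * log(C),
# so an ordinary least-squares line fit recovers the exponent and prefactor.
slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
b_fit, a_fit = -slope, np.exp(intercept)
```

Fitting in log-log space is the standard trick: the exponent b falls out as the (negated) slope of a straight line, which is how scaling-law plots are usually read.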


import torch
import open_clip
from PIL import Image


class CLIPClassifier:
    """Zero-shot image classifier that ensembles multiple OpenCLIP models."""

    def __init__(self, model_configs):
        self.models = []
        for config in model_configs:
            model, _, preprocess = open_clip.create_model_and_transforms(
                config['arch'], pretrained=config['pretrained']
            )
            model.eval()
            tokenizer = open_clip.get_tokenizer(config['arch'])
            self.models.append({
                'model': model, 'preprocess': preprocess,
                'tokenizer': tokenizer, 'name': config['name']
            })

    def classify_image(self, image_path, labels):
        results = {}
        # Convert to RGB so grayscale/RGBA inputs match the preprocess pipeline
        image = Image.open(image_path).convert('RGB')

        for model_info in self.models:
            processed_image = model_info['preprocess'](image).unsqueeze(0)
            text_tokens = model_info['tokenizer'](labels)

            with torch.no_grad():
                image_features = model_info['model'].encode_image(processed_image)
                text_features = model_info['model'].encode_text(text_tokens)

                # L2-normalize so the dot product below is cosine similarity
                image_features /= image_features.norm(dim=-1, keepdim=True)
                text_features /= text_features.norm(dim=-1, keepdim=True)

                # Scale similarities (100 is the typical CLIP logit scale),
                # then softmax over labels to get per-class probabilities
                probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
                results[model_info['name']] = dict(zip(labels, probs[0].tolist()))

        return results

