mlfoundations/OpenCLIP: an open-source CLIP implementation for vision-language representation learning
Top 3.9% on SourcePulse
This repository provides an open-source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training) model, enabling researchers and developers to train and utilize powerful vision-language models. It offers a comprehensive suite of tools for training, fine-tuning, and evaluating CLIP-style models on large datasets, with pre-trained models achieving state-of-the-art zero-shot accuracy on benchmarks like ImageNet.
How It Works
OpenCLIP implements the contrastive language-image pre-training objective, learning to align image and text embeddings. It supports various vision backbones (e.g., ViT, ConvNeXt, SigLIP) and text encoders, allowing for flexible model architectures. The codebase is optimized for large-scale distributed training, featuring efficient data loading (WebDataset), gradient accumulation, and mixed-precision training.
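To make the contrastive objective concrete, here is a minimal sketch of a symmetric CLIP-style loss. It is an illustration of the technique, not OpenCLIP's own ClipLoss class; the function name and signature are my own.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_features: torch.Tensor,
                          text_features: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_features, text_features: (N, D) embeddings for N image-text pairs.
    logit_scale: learned temperature (already exponentiated).
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (N, N) similarity matrix; entry (i, j) scores image i against text j.
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()

    # The matching caption for image i sits on the diagonal.
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2
```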
Quick Start & Requirements
Install via pip install open_clip_torch. Additional dependencies: timm (latest version recommended) and transformers (only if using transformer tokenizers). A GPU with CUDA is highly recommended for training and inference.
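The sketch below shows zero-shot classification with a pre-trained checkpoint. The model name and pretrained tag are examples; the checkpoints available in your installed version can be listed with open_clip.list_pretrained(), and the image path is hypothetical.

```python
import torch
from PIL import Image
import open_clip

# Assumption: example model/pretrained tag; check open_clip.list_pretrained().
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # hypothetical image path
text = tokenizer(["a photo of a cat", "a photo of a dog", "a diagram"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Softmax over scaled cosine similarities gives per-caption probabilities.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```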
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Models pretrained with the original OpenAI weights require the QuickGELU activation, which is less efficient than native torch.nn.GELU; newer models default to nn.GELU.
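For reference, QuickGELU is the sigmoid-based GELU approximation used by the original OpenAI CLIP weights. A minimal sketch is shown below; in OpenCLIP the checkpoints that need it are exposed under model names with a "-quickgelu" suffix (exact names may vary by release).

```python
import torch
import torch.nn as nn

class QuickGELU(nn.Module):
    """Sigmoid-based approximation of GELU: x * sigmoid(1.702 * x).
    Matches the activation used by the original OpenAI CLIP weights,
    but is less efficient than torch.nn.GELU."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(1.702 * x)

# Example (assumption: exact model names may vary by open_clip release):
# open_clip.create_model_and_transforms("ViT-B-32-quickgelu", pretrained="openai")
```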