open_clip by mlfoundations

OpenCLIP: open-source CLIP implementation for vision-language representation learning

created 4 years ago · 12,297 stars
Top 4.1% on sourcepulse

Project Summary

This repository provides an open-source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training), enabling researchers and developers to train and use powerful vision-language models. It offers a comprehensive suite of tools for training, fine-tuning, and evaluating CLIP-style models on large datasets, with pretrained models achieving state-of-the-art zero-shot accuracy on benchmarks such as ImageNet.

How It Works

OpenCLIP implements the contrastive language-image pre-training objective, learning to align image and text embeddings. It supports various vision backbones (e.g., ViT, ConvNeXt, SigLIP) and text encoders, allowing for flexible model architectures. The codebase is optimized for large-scale distributed training, featuring efficient data loading (WebDataset), gradient accumulation, and mixed-precision training.
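The objective itself is a symmetric cross-entropy over the batch's image-text similarity matrix. The sketch below is a minimal illustration of that idea, assuming [batch, dim] feature tensors; it is not the repository's actual ClipLoss class, which additionally handles details like gathering features across distributed workers.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(image_features, text_features, logit_scale):
        # Minimal sketch of the CLIP objective, not open_clip's ClipLoss.
        # Both tensors hold a batch of matched (image, text) pairs.
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)

        # Cosine-similarity logits between every image and every text in the batch.
        logits_per_image = logit_scale * image_features @ text_features.t()
        logits_per_text = logits_per_image.t()

        # The i-th image matches the i-th text, so targets lie on the diagonal.
        targets = torch.arange(image_features.shape[0], device=image_features.device)
        return (F.cross_entropy(logits_per_image, targets)
                + F.cross_entropy(logits_per_text, targets)) / 2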

Quick Start & Requirements

  • Install: pip install open_clip_torch
  • Requirements: PyTorch, timm (latest recommended), transformers (if using transformer tokenizers). GPU with CUDA is highly recommended for training and inference.
  • Usage examples and pretrained model loading details are available in the README; a minimal zero-shot sketch follows this list.
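The sketch below follows the README's zero-shot classification pattern. The 'ViT-B-32' / 'laion2b_s34b_b79k' pair is one of many tags reported by open_clip.list_pretrained(), and "cat.jpg" is a placeholder path:

    import torch
    from PIL import Image
    import open_clip

    # Load a pretrained model and its matching preprocessing transform.
    model, _, preprocess = open_clip.create_model_and_transforms(
        'ViT-B-32', pretrained='laion2b_s34b_b79k')
    model.eval()
    tokenizer = open_clip.get_tokenizer('ViT-B-32')

    image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # placeholder image path
    text = tokenizer(["a diagram", "a dog", "a cat"])

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        text_features /= text_features.norm(dim=-1, keepdim=True)
        text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    print("Label probs:", text_probs)  # for a cat photo, "a cat" should dominate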

Highlighted Details

  • Offers a wide range of pre-trained models with varying architectures and training datasets (LAION-2B, DataComp-1B), achieving high zero-shot ImageNet accuracy.
  • Supports training of CoCa (Contrastive Captioner) models for generative tasks such as image captioning (see the captioning sketch after this list).
  • Includes robust distributed training capabilities, tested up to 1024 A100 GPUs, with native SLURM support.
  • Features advanced training techniques like patch dropout for faster training and int8 support for inference/training speedups.
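As a sketch of the CoCa support noted above, caption generation follows the same loading pattern plus a generate() call. The checkpoint tag mirrors the README; verify currently available tags with open_clip.list_pretrained(), and "cat.jpg" is again a placeholder:

    import torch
    from PIL import Image
    import open_clip

    model, _, transform = open_clip.create_model_and_transforms(
        'coca_ViT-L-14', pretrained='mscoco_finetuned_laion2b_s13b_b90k')

    im = transform(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        generated = model.generate(im)  # autoregressive caption decoding

    # Strip the special tokens around the decoded caption.
    caption = (open_clip.decode(generated[0])
               .split("<end_of_text>")[0]
               .replace("<start_of_text>", ""))
    print(caption)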

Maintenance & Community

  • Led by prominent researchers in the field (Ross Wightman, Romain Beaumont, etc.).
  • Acknowledges contributions from various institutions and individuals.
  • Encourages community contributions via issues and pull requests.

Licensing & Compatibility

  • The repository is primarily licensed under the MIT License, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

  • Some older checkpoints (notably the original OpenAI weights) use QuickGELU, which is less efficient than native torch.nn.GELU; newer models default to nn.GELU (see the loading sketch after this list).
  • Beta support for int8 training is available, with potential for further optimization.
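For the QuickGELU caveat above, open_clip ships explicit '-quickgelu' model configs so the activation matches the checkpoint. A minimal sketch, assuming the 'ViT-B-32-quickgelu' config that ships with the library:

    import open_clip

    # The original OpenAI weights were trained with QuickGELU, so pair them
    # with the matching '-quickgelu' config to avoid an activation mismatch.
    model, _, preprocess = open_clip.create_model_and_transforms(
        'ViT-B-32-quickgelu', pretrained='openai')

    # Newer checkpoints (e.g. LAION-trained) use plain nn.GELU configs:
    #   open_clip.create_model_and_transforms('ViT-B-32', pretrained='laion2b_s34b_b79k')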

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 10
  • Issues (30d): 3

Star History

  • 713 stars in the last 90 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

Top 0.3% · 1k stars
Transformer library for flexible model development
created 3 years ago · updated 7 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

Top 0.1% · 4k stars
Open-source framework for training large multimodal models
created 2 years ago · updated 11 months ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), and 3 more.

pixel-cnn by openai

Top 0.1% · 2k stars
TensorFlow implementation for PixelCNN++ research paper
created 9 years ago · updated 5 years ago