Inf-CLIP by DAMO-NLP-SG

CLIP training with near-infinite batch size scaling

Created 11 months ago
267 stars

Top 95.9% on SourcePulse

View on GitHub
Project Summary

This repository provides the official training codebase for Inf-CLIP, which implements Inf-CL, a contrastive loss designed to overcome memory limitations and enable near-infinite batch size scaling. It is aimed at researchers and practitioners training large-scale multimodal models and substantially reduces the memory cost of contrastive loss computation.

How It Works

Inf-CLIP implements the Inf-CL loss, which combines tile-wise loss computation and ring communication (Ring-CL) with gradient accumulation and gradient caching so that the full batch-by-batch similarity matrix is never materialized, drastically reducing the memory footprint. This makes effectively massive batch sizes practical for contrastive learning tasks such as CLIP training, without requiring prohibitive hardware resources.
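The core saving comes from never holding the full batch-by-batch similarity matrix at once. Below is a minimal single-device sketch of that tiling idea in plain PyTorch; it is illustrative only and is not the repository's implementation, which fuses the tiles into Triton kernels, splits the columns across GPUs with ring communication (Ring-CL), and recomputes tiles in a custom backward pass. All names in the sketch are made up for illustration.

    import torch
    import torch.nn.functional as F

    # Image-to-text InfoNCE loss computed tile by tile, so the full N x N
    # similarity matrix never exists in one piece (conceptual sketch only,
    # not the repository's fused Triton / multi-GPU implementation).
    def tiled_clip_loss(img, txt, logit_scale=100.0, tile=1024):
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        n = img.shape[0]
        pos = (img * txt).sum(dim=-1) * logit_scale               # positive-pair logits, shape (N,)
        lse = torch.full((n,), float("-inf"), device=img.device)  # running log-sum-exp over all candidates
        for start in range(0, n, tile):
            block = (img @ txt[start:start + tile].T) * logit_scale  # (N, tile) slice of the similarity matrix
            lse = torch.logaddexp(lse, torch.logsumexp(block, dim=-1))
        # InfoNCE: negative log-softmax of each positive logit over all N candidates.
        return (lse - pos).mean()

    # Toy check: 8,192 pairs of 512-d features without an 8192 x 8192 buffer in the forward pass.
    img = torch.randn(8192, 512, requires_grad=True)
    txt = torch.randn(8192, 512, requires_grad=True)
    tiled_clip_loss(img, txt).backward()

Note that plain autograd still stores each tile for the backward pass; avoiding that is exactly what Inf-CL's custom backward and gradient caching address.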

Quick Start & Requirements

  • Installation: pip install inf_cl (from PyPI) or pip install -e . (from a local clone).
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.0, CUDA >= 11.8.
  • Usage: Example training commands are provided for datasets such as CC3M, CC12M, and LAION400M, e.g., bash scripts/cc3m/lit_vit-b-32_bs16k.sh. Evaluation scripts for ImageNet and CLIP benchmarks are also available; a minimal Python usage sketch follows this list.
  • Documentation: arXiv, Hugging Face Papers.
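For a quick API-level check after installation, the loss can be called directly on paired feature tensors. The snippet below follows the usage pattern shown in the repository README, but the function name cal_inf_loss and the scale keyword should be treated as assumptions and verified against the installed version; a CUDA device is required, and the README's example assumes a torch.distributed process group has already been initialized (e.g., via torchrun).

    import torch
    import torch.nn.functional as F
    from inf_cl import cal_inf_loss  # name taken from the repository README; verify for your version

    # Stand-in features for one GPU's shard of the global batch
    # (assumes torch.distributed is already initialized, e.g., launched with torchrun).
    batch, dim = 4096, 512
    image_feats = F.normalize(torch.randn(batch, dim, device="cuda"), dim=-1).requires_grad_()
    text_feats = F.normalize(torch.randn(batch, dim, device="cuda"), dim=-1).requires_grad_()
    logit_scale = (torch.ones([], device="cuda") / 0.07).log().requires_grad_()

    # Contrastive loss computed without materializing the full similarity matrix.
    # (The keyword name "scale" is assumed from the README example.)
    loss = cal_inf_loss(image_feats, text_feats, scale=logit_scale.exp())
    loss.backward()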

Highlighted Details

  • CVPR 2025 Highlight paper: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss".
  • Implements Ring-CL and Inf-CL loss functions using Triton.
  • Supports gradient accumulation and gradient caching for enhanced memory efficiency (a gradient-caching sketch follows this list).
  • Codebase adapted from OpenCLIP, with acknowledgements to OpenAI CLIP, img2dataset, and FlashAttention variants.
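Gradient caching, mentioned above, is a general trick for decoupling the contrastive batch size from encoder activation memory: encode without autograd, compute the loss and the gradients with respect to the features, then re-encode chunk by chunk and backpropagate the cached feature gradients. The sketch below shows the idea in plain PyTorch under the assumption of deterministic encoders (no dropout); it is illustrative and not the repository's implementation.

    import torch
    import torch.nn.functional as F

    def grad_cached_step(img_encoder, txt_encoder, images, texts, loss_fn, chunk=256):
        # 1) First pass: features only, no activations kept.
        with torch.no_grad():
            img_feats = torch.cat([img_encoder(c) for c in images.split(chunk)])
            txt_feats = torch.cat([txt_encoder(c) for c in texts.split(chunk)])

        # 2) Loss on leaf copies to obtain d(loss)/d(features) for the whole batch.
        img_leaf = img_feats.detach().requires_grad_()
        txt_leaf = txt_feats.detach().requires_grad_()
        loss = loss_fn(img_leaf, txt_leaf)
        loss.backward()

        # 3) Second pass: re-encode chunk by chunk and push the cached feature
        #    gradients through each encoder, accumulating parameter gradients.
        for c, g in zip(images.split(chunk), img_leaf.grad.split(chunk)):
            img_encoder(c).backward(gradient=g)
        for c, g in zip(texts.split(chunk), txt_leaf.grad.split(chunk)):
            txt_encoder(c).backward(gradient=g)
        return loss.detach()

    # Toy usage with linear "encoders" and a plain symmetric CLIP loss.
    def clip_loss(a, b, scale=100.0):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.T * scale
        labels = torch.arange(a.shape[0])
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    img_enc, txt_enc = torch.nn.Linear(64, 32), torch.nn.Linear(64, 32)
    grad_cached_step(img_enc, txt_enc, torch.randn(1024, 64), torch.randn(1024, 64), clip_loss)

Peak activation memory then scales with the chunk size rather than with the full batch, which is why gradient caching pairs naturally with the tiled loss computation above.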

Maintenance & Community

The project is maintained by DAMO-NLP-SG. Links to related social media (Twitter/X) and community discussion (Zhihu) are provided.

Licensing & Compatibility

Released under the Apache 2.0 license; however, the project is intended for non-commercial use only, subject to the model license of CLIP, OpenAI's data terms, and LAION's terms.

Limitations & Caveats

Usage is restricted to non-commercial purposes by the underlying data and model licenses. The stated prerequisites (Python >= 3.8, PyTorch >= 2.0.0, CUDA >= 11.8) must also be met.

Health Check
  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Guy Gur-Ari (Cofounder of Augment), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

pytorch_image_classification by hysts

0% · 1k stars
PyTorch image classification for various datasets (CIFAR, MNIST, ImageNet)
Created 7 years ago
Updated 3 years ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2% · 28k stars
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 2 months ago