Inf-CLIP by DAMO-NLP-SG

CLIP training with near-infinite batch size scaling

Created 11 months ago
267 stars

Top 95.9% on SourcePulse

View on GitHub
Project Summary

This repository provides the official training codebase for Inf-CLIP, which implements Inf-CL, a contrastive loss designed to overcome memory limitations and enable near-infinite batch size scaling. It is aimed at researchers and practitioners training large-scale multimodal models and substantially reduces the memory cost of contrastive loss computation.

How It Works

Inf-CLIP implements the Inf-CL loss, which combines tile-wise loss computation and ring communication (Ring-CL) with gradient accumulation and gradient caching so that the full batch-by-batch similarity matrix is never materialized, drastically reducing the memory footprint. This makes effectively massive batch sizes practical for contrastive learning tasks such as CLIP training, without requiring prohibitive hardware resources.
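The core saving comes from never holding the full batch-by-batch similarity matrix at once. Below is a minimal single-device sketch of that tiling idea in plain PyTorch; it is illustrative only and is not the repository's implementation, which fuses the tiles into Triton kernels, splits the columns across GPUs with ring communication (Ring-CL), and recomputes tiles in a custom backward pass. All names in the sketch are made up for illustration.

    import torch
    import torch.nn.functional as F

    # Image-to-text InfoNCE loss computed tile by tile, so the full N x N
    # similarity matrix never exists in one piece (conceptual sketch only,
    # not the repository's fused Triton / multi-GPU implementation).
    def tiled_clip_loss(img, txt, logit_scale=100.0, tile=1024):
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        n = img.shape[0]
        pos = (img * txt).sum(dim=-1) * logit_scale               # positive-pair logits, shape (N,)
        lse = torch.full((n,), float("-inf"), device=img.device)  # running log-sum-exp over all candidates
        for start in range(0, n, tile):
            block = (img @ txt[start:start + tile].T) * logit_scale  # (N, tile) slice of the similarity matrix
            lse = torch.logaddexp(lse, torch.logsumexp(block, dim=-1))
        # InfoNCE: negative log-softmax of each positive logit over all N candidates.
        return (lse - pos).mean()

    # Toy check: 8,192 pairs of 512-d features without an 8192 x 8192 buffer in the forward pass.
    img = torch.randn(8192, 512, requires_grad=True)
    txt = torch.randn(8192, 512, requires_grad=True)
    tiled_clip_loss(img, txt).backward()

Note that plain autograd still stores each tile for the backward pass; avoiding that is exactly what Inf-CL's custom backward and gradient caching address.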

Quick Start & Requirements

  • Installation: pip install inf_cl (from PyPI) or pip install -e . (from a local clone).
  • Prerequisites: Python >= 3.8, PyTorch >= 2.0.0, CUDA >= 11.8.
  • Usage: Example training commands are provided for datasets such as CC3M, CC12M, and LAION400M, e.g., bash scripts/cc3m/lit_vit-b-32_bs16k.sh. Evaluation scripts for ImageNet and CLIP benchmarks are also available; a minimal Python usage sketch follows this list.
  • Documentation: arXiv, Hugging Face Papers.
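For a quick API-level check after installation, the loss can be called directly on paired feature tensors. The snippet below follows the usage pattern shown in the repository README, but the function name cal_inf_loss and the scale keyword should be treated as assumptions and verified against the installed version; a CUDA device is required, and the README's example assumes a torch.distributed process group has already been initialized (e.g., via torchrun).

    import torch
    import torch.nn.functional as F
    from inf_cl import cal_inf_loss  # name taken from the repository README; verify for your version

    # Stand-in features for one GPU's shard of the global batch
    # (assumes torch.distributed is already initialized, e.g., launched with torchrun).
    batch, dim = 4096, 512
    image_feats = F.normalize(torch.randn(batch, dim, device="cuda"), dim=-1).requires_grad_()
    text_feats = F.normalize(torch.randn(batch, dim, device="cuda"), dim=-1).requires_grad_()
    logit_scale = (torch.ones([], device="cuda") / 0.07).log().requires_grad_()

    # Contrastive loss computed without materializing the full similarity matrix.
    # (The keyword name "scale" is assumed from the README example.)
    loss = cal_inf_loss(image_feats, text_feats, scale=logit_scale.exp())
    loss.backward()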

Highlighted Details

  • CVPR 2025 Highlight paper: "Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss".
  • Implements Ring-CL and Inf-CL loss functions using Triton.
  • Supports gradient accumulation and gradient caching for enhanced memory efficiency (a gradient-caching sketch follows this list).
  • Codebase adapted from OpenCLIP, with acknowledgements to OpenAI CLIP, img2dataset, and FlashAttention variants.
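Gradient caching, mentioned above, is a general trick for decoupling the contrastive batch size from encoder activation memory: encode without autograd, compute the loss and the gradients with respect to the features, then re-encode chunk by chunk and backpropagate the cached feature gradients. The sketch below shows the idea in plain PyTorch under the assumption of deterministic encoders (no dropout); it is illustrative and not the repository's implementation.

    import torch
    import torch.nn.functional as F

    def grad_cached_step(img_encoder, txt_encoder, images, texts, loss_fn, chunk=256):
        # 1) First pass: features only, no activations kept.
        with torch.no_grad():
            img_feats = torch.cat([img_encoder(c) for c in images.split(chunk)])
            txt_feats = torch.cat([txt_encoder(c) for c in texts.split(chunk)])

        # 2) Loss on leaf copies to obtain d(loss)/d(features) for the whole batch.
        img_leaf = img_feats.detach().requires_grad_()
        txt_leaf = txt_feats.detach().requires_grad_()
        loss = loss_fn(img_leaf, txt_leaf)
        loss.backward()

        # 3) Second pass: re-encode chunk by chunk and push the cached feature
        #    gradients through each encoder, accumulating parameter gradients.
        for c, g in zip(images.split(chunk), img_leaf.grad.split(chunk)):
            img_encoder(c).backward(gradient=g)
        for c, g in zip(texts.split(chunk), txt_leaf.grad.split(chunk)):
            txt_encoder(c).backward(gradient=g)
        return loss.detach()

    # Toy usage with linear "encoders" and a plain symmetric CLIP loss.
    def clip_loss(a, b, scale=100.0):
        a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
        logits = a @ b.T * scale
        labels = torch.arange(a.shape[0])
        return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

    img_enc, txt_enc = torch.nn.Linear(64, 32), torch.nn.Linear(64, 32)
    grad_cached_step(img_enc, txt_enc, torch.randn(1024, 64), torch.randn(1024, 64), clip_loss)

Peak activation memory then scales with the chunk size rather than with the full batch, which is why gradient caching pairs naturally with the tiled loss computation above.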

Maintenance & Community

The project is maintained by DAMO-NLP-SG. Links to related social media (Twitter/X) and community discussion (Zhihu) are provided.

Licensing & Compatibility

Released under the Apache 2.0 license; however, the project is intended for non-commercial use only, subject to the model license of CLIP, OpenAI's data terms, and LAION's terms.

Limitations & Caveats

Usage is restricted to non-commercial purposes by the underlying data and model licenses. The stated prerequisites (Python >= 3.8, PyTorch >= 2.0.0, CUDA >= 11.8) must also be met.

Health Check
  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

3.4% · 1k stars
Framework for scaling multimodal model training across accelerators
Created 5 months ago
Updated 3 weeks ago
Starred by Guy Gur-Ari (Cofounder of Augment), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

pytorch_image_classification by hysts

0% · 1k stars
PyTorch image classification for various datasets (CIFAR, MNIST, ImageNet)
Created 7 years ago
Updated 3 years ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2% · 28k stars
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 2 months ago