CLIP data curation and training code for the MetaCLIP research paper
MetaCLIP provides code, metadata, and pre-trained models for a novel approach to CLIP data curation, focusing on preserving signal and mitigating noise rather than aggressive filtering. It targets researchers and practitioners in vision-language modeling seeking more transparent and scalable data preparation methods, offering improved performance through its curated datasets.
How It Works
MetaCLIP formalizes data curation as a scalable algorithm, processing vast amounts of image-text pairs from sources like CommonCrawl. Unlike methods relying on pre-trained models for filtering, MetaCLIP's approach emphasizes preserving data signal and managing noise through techniques like substring matching and balancing. This method is designed to be simpler, more scalable, and results in a more transparent data distribution, as evidenced by its "data card" releases.
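As a rough illustration of the curation idea, the sketch below shows substring matching followed by per-entry balancing. It is a minimal sketch, not the official pipeline: the helper names, the threshold value, and the simple per-entry sampling are assumptions; the released code additionally handles deduplication and pair-level sampling at scale.

```python
from collections import defaultdict
import random

def curate(pairs, metadata, t=20_000):
    """Illustrative sketch of MetaCLIP-style curation (not the official code).

    pairs: list of (image_url, text) candidates, e.g. drawn from CommonCrawl.
    metadata: list of query strings (e.g. WordNet synsets, Wikipedia terms).
    t: per-entry cap used for balancing head vs. tail entries (assumed value).
    """
    # 1) Substring matching: keep a pair only if its text contains at least one
    #    metadata entry, and record which entries it matched.
    entry_to_pairs = defaultdict(list)
    for pair in pairs:
        _, text = pair
        for entry in metadata:
            if entry in text:
                entry_to_pairs[entry].append(pair)

    # 2) Balancing: cap each entry at t pairs so frequent head entries
    #    (e.g. "photo") do not dominate, while tail entries are kept in full.
    curated = []
    for entry, matched in entry_to_pairs.items():
        if len(matched) > t:
            matched = random.sample(matched, t)
        curated.extend(matched)
    return curated
```

The balancing step is what flattens the naturally long-tailed distribution of web text, which is the property the released "data cards" make visible.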
Quick Start & Requirements
Pre-trained models can be loaded via Hugging Face transformers (facebook/metaclip-b32-400m) or used directly with OpenCLIP (ViT-B-32-quickgelu with pretrained='metaclip_400m'). Additional dependencies include submitit, tqdm, ftfy, braceexpand, regex, and pandas. CUDA 11.7 is recommended for the provided conda environment.
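A minimal loading sketch for both routes is shown below; the image path and prompt strings are placeholders, and the calls follow the standard transformers and open_clip APIs rather than anything specific to this repository.

```python
import torch
from PIL import Image

# Option 1: Hugging Face transformers
from transformers import CLIPModel, CLIPProcessor

hf_model = CLIPModel.from_pretrained("facebook/metaclip-b32-400m")
processor = CLIPProcessor.from_pretrained("facebook/metaclip-b32-400m")

image = Image.open("cat.jpg")  # placeholder local image
inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = hf_model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity

# Option 2: OpenCLIP
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="metaclip_400m")
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")
image_t = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text_t = tokenizer(["a photo of a cat", "a photo of a dog"])
with torch.no_grad():
    image_features = model.encode_image(image_t)
    text_features = model.encode_text(text_t)
```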
Maintenance & Community
The project is actively developed by Meta AI researchers, with recent updates including new model versions (v1.2 with Altogether synthetic captions) and accepted papers at CVPR 2024 and EMNLP 2024. Contact: Hu Xu (huxu@meta.com).
Licensing & Compatibility
The majority of MetaCLIP is licensed under CC-BY-NC (Non-Commercial). The underlying open_clip
codebase is available under its own permissive license. The CC-BY-NC license restricts commercial use.
Limitations & Caveats
The primary limitation is the CC-BY-NC license, which prohibits commercial applications. The codebase is customized from OpenCLIP and maintained separately, potentially leading to divergence.