MetaCLIP by facebookresearch

CLIP data curation and training code for the MetaCLIP research paper

created 1 year ago
1,538 stars

Top 27.5% on sourcepulse

Project Summary

MetaCLIP provides code, metadata, and pre-trained models for a novel approach to CLIP data curation, focusing on preserving signal and mitigating noise rather than aggressive filtering. It targets researchers and practitioners in vision-language modeling seeking more transparent and scalable data preparation methods, offering improved performance through its curated datasets.

How It Works

MetaCLIP formalizes data curation as a scalable algorithm that processes vast pools of image-text pairs from sources such as CommonCrawl. Unlike methods that rely on a pre-trained model to filter data, MetaCLIP preserves signal and manages noise through substring matching against a metadata query list followed by per-entry balancing. The result is a simpler, more scalable curation process and a more transparent data distribution, documented in the released "data cards".
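Conceptually, the curation reduces to two steps: substring-match each caption against a list of metadata queries, then balance the matched pool by downsampling entries that occur more often than a per-entry threshold. The sketch below illustrates that logic on in-memory lists; the names curate, metadata, and threshold are illustrative rather than the repository's API, and the real pipeline shards the work across CommonCrawl with much faster matching.

    import random
    from collections import Counter

    def curate(captions, metadata, threshold=20_000, seed=0):
        """Sketch of metadata-based curation: substring matching + balancing."""
        rng = random.Random(seed)

        # Part 1: substring matching -- keep captions that contain at least
        # one metadata entry, remembering which entries each caption matched.
        matched = []
        for text in captions:
            entry_ids = [i for i, entry in enumerate(metadata) if entry in text]
            if entry_ids:
                matched.append((text, entry_ids))

        # How often each metadata entry was matched across the whole pool.
        counts = Counter(i for _, entry_ids in matched for i in entry_ids)

        # Part 2: balancing -- entries matched more than `threshold` times are
        # downsampled; tail entries (count <= threshold) are kept as-is.
        def keep_prob(entry_ids):
            return max(min(1.0, threshold / counts[i]) for i in entry_ids)

        return [text for text, entry_ids in matched
                if rng.random() < keep_prob(entry_ids)]

The keep/drop decision is what flattens the head of the distribution while leaving tail entries untouched, which is the balancing behavior described above.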

Quick Start & Requirements

  • Install/Run: Pre-trained models can be loaded via Hugging Face transformers (facebook/metaclip-b32-400m) or directly through OpenCLIP (ViT-B-32-quickgelu with pretrained='metaclip_400m'); see the usage sketch after this list.
  • Prerequisites: Python 3.10+, PyTorch, submitit, tqdm, ftfy, braceexpand, regex, pandas. CUDA 11.7 is recommended for the provided conda environment.
  • Resources: Training requires significant GPU resources (e.g., 64x V100 for ViT-B-32, 256x A100 for ViT-bigG).
  • Links: Quick Start, Pre-trained Models, Huggingface Space demo, Colab
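
As referenced in the Install/Run bullet, the quickest way to try a checkpoint is through Hugging Face transformers. Below is a minimal zero-shot classification sketch; the sample image URL and candidate captions are placeholders, not part of the project.

    import torch
    import requests
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Load the ViT-B/32 MetaCLIP checkpoint trained on the 400M curated set.
    model = CLIPModel.from_pretrained("facebook/metaclip-b32-400m")
    processor = CLIPProcessor.from_pretrained("facebook/metaclip-b32-400m")

    # Placeholder image and candidate captions.
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)
    texts = ["a photo of a cat", "a photo of a dog"]

    inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-to-text similarity scores, normalized into probabilities.
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(texts, probs[0].tolist())))

The OpenCLIP route is just as short: open_clip.create_model_and_transforms('ViT-B-32-quickgelu', pretrained='metaclip_400m') returns the model together with its preprocessing transforms.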

Highlighted Details

  • Curates data from scratch without prior model filtering, contrasting with many open-source efforts.
  • Releases training data distribution via "data cards" for transparency.
  • Scalable algorithm capable of processing the entire CommonCrawl (300B+ pairs).
  • Achieves strong zero-shot ImageNet performance, with models like ViT-bigG-14 reaching 82.1% accuracy.
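
The zero-shot ImageNet figure in the last bullet follows the standard CLIP evaluation recipe: build a text classifier from prompt-templated class names, then take the argmax of image-text similarity. A small OpenCLIP-based sketch is below, using a toy class list and template set rather than the full ImageNet protocol; the model and checkpoint names match those listed under Quick Start.

    import torch
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32-quickgelu", pretrained="metaclip_400m"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")
    model.eval()

    # Toy class list and prompt templates; the real protocol uses the
    # 1,000 ImageNet classes and a larger template set.
    classnames = ["cat", "dog", "car"]
    templates = ["a photo of a {}.", "a blurry photo of a {}."]

    with torch.no_grad():
        # One text embedding per class, averaged over prompt templates.
        weights = []
        for name in classnames:
            tokens = tokenizer([t.format(name) for t in templates])
            emb = model.encode_text(tokens)
            emb = emb / emb.norm(dim=-1, keepdim=True)
            mean_emb = emb.mean(dim=0)
            weights.append(mean_emb / mean_emb.norm())
        classifier = torch.stack(weights, dim=1)  # (embed_dim, num_classes)

    def predict(image_batch):
        """Predict class indices for a batch built by stacking preprocess(img) outputs."""
        with torch.no_grad():
            feats = model.encode_image(image_batch)
            feats = feats / feats.norm(dim=-1, keepdim=True)
            return (feats @ classifier).argmax(dim=-1)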

Maintenance & Community

The project is actively developed by Meta AI researchers, with recent updates including new model versions (v1.2 with Altogether synthetic captions) and papers accepted at CVPR 2024 and EMNLP 2024. Contact: Hu Xu (huxu@meta.com).

Licensing & Compatibility

The majority of MetaCLIP is licensed under CC-BY-NC, which restricts commercial use. The underlying open_clip codebase remains available under its own permissive license.

Limitations & Caveats

The primary limitation is the CC-BY-NC license, which prohibits commercial applications. The codebase is customized from OpenCLIP and maintained separately, so it may diverge from upstream over time.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 0

Star History

160 stars in the last 90 days

Explore Similar Projects

Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Chenlin Meng (Cofounder of Pika), and 4 more.

clip-retrieval by rom1504

CLIP retrieval system for semantic search
Top 0.3% on sourcepulse · 3k stars · created 4 years ago · updated 1 year ago