CLIP data curation and training code for the MetaCLIP research paper
MetaCLIP provides code, metadata, and pre-trained models for a novel approach to CLIP data curation, focusing on preserving signal and mitigating noise rather than aggressive filtering. It targets researchers and practitioners in vision-language modeling seeking more transparent and scalable data preparation methods, offering improved performance through its curated datasets.
How It Works
MetaCLIP formalizes data curation as a scalable algorithm, processing vast amounts of image-text pairs from sources like CommonCrawl. Unlike methods relying on pre-trained models for filtering, MetaCLIP's approach emphasizes preserving data signal and managing noise through techniques like substring matching and balancing. This method is designed to be simpler, more scalable, and results in a more transparent data distribution, as evidenced by its "data card" releases.
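As a rough illustration of the curation idea, the sketch below shows substring matching followed by per-entry balancing. It is a minimal sketch, not the official pipeline: the helper names, the threshold value, and the simple per-entry sampling are assumptions; the released code additionally handles deduplication and pair-level sampling at scale.

```python
from collections import defaultdict
import random

def curate(pairs, metadata, t=20_000):
    """Illustrative sketch of MetaCLIP-style curation (not the official code).

    pairs: list of (image_url, text) candidates, e.g. drawn from CommonCrawl.
    metadata: list of query strings (e.g. WordNet synsets, Wikipedia terms).
    t: per-entry cap used for balancing head vs. tail entries (assumed value).
    """
    # 1) Substring matching: keep a pair only if its text contains at least one
    #    metadata entry, and record which entries it matched.
    entry_to_pairs = defaultdict(list)
    for pair in pairs:
        _, text = pair
        for entry in metadata:
            if entry in text:
                entry_to_pairs[entry].append(pair)

    # 2) Balancing: cap each entry at t pairs so frequent head entries
    #    (e.g. "photo") do not dominate, while tail entries are kept in full.
    curated = []
    for entry, matched in entry_to_pairs.items():
        if len(matched) > t:
            matched = random.sample(matched, t)
        curated.extend(matched)
    return curated
```

The balancing step is what flattens the naturally long-tailed distribution of web text, which is the property the released "data cards" make visible.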
Quick Start & Requirements
Pre-trained models can be loaded via Hugging Face transformers (facebook/metaclip-b32-400m) or used directly with OpenCLIP (ViT-B-32-quickgelu with pretrained='metaclip_400m'). Additional dependencies include submitit, tqdm, ftfy, braceexpand, regex, and pandas. CUDA 11.7 is recommended for the provided conda environment.
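A minimal loading sketch for both routes is shown below; the image path and prompt strings are placeholders, and the calls follow the standard transformers and open_clip APIs rather than anything specific to this repository.

```python
import torch
from PIL import Image

# Option 1: Hugging Face transformers
from transformers import CLIPModel, CLIPProcessor

hf_model = CLIPModel.from_pretrained("facebook/metaclip-b32-400m")
processor = CLIPProcessor.from_pretrained("facebook/metaclip-b32-400m")

image = Image.open("cat.jpg")  # placeholder local image
inputs = processor(text=["a photo of a cat", "a photo of a dog"],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = hf_model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-text similarity

# Option 2: OpenCLIP
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="metaclip_400m")
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")
image_t = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text_t = tokenizer(["a photo of a cat", "a photo of a dog"])
with torch.no_grad():
    image_features = model.encode_image(image_t)
    text_features = model.encode_text(text_t)
```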
Maintenance & Community
The project is actively developed by Meta AI researchers, with recent updates including new model versions (v1.2 with Altogether synthetic captions) and accepted papers at CVPR 2024 and EMNLP 2024. Contact: Hu Xu (huxu@meta.com).
Licensing & Compatibility
The majority of MetaCLIP is licensed under CC-BY-NC (Non-Commercial). The underlying open_clip
codebase is available under its own permissive license. The CC-BY-NC license restricts commercial use.
Limitations & Caveats
The primary limitation is the CC-BY-NC license, which prohibits commercial applications. The codebase is customized from OpenCLIP and maintained separately, potentially leading to divergence.