metaquery by facebookresearch

MetaQueries enable multimodal transfer learning

Created 4 months ago
261 stars

Top 97.4% on SourcePulse

Project Summary

MetaQuery is a framework for building unified multimodal models that handle both understanding and generation. Researchers and engineers can reach strong performance by fine-tuning existing models rather than training from scratch, which also unlocks capabilities such as visual association and logo design. The resulting systems integrate and process information across modalities.

How It Works

The core idea is a set of learnable "MetaQueries" that transfer knowledge between modalities while the Multimodal Large Language Model (MLLM) stays frozen. This simplifies training, making it comparable to fine-tuning a diffusion model. The project also uses MetaQuery-Instruct-2.4M, a curated instruction-tuning dataset of 2.4 million samples derived from web corpora and MLLM generation. This dataset is key to the reported zero-shot generation performance and to capabilities that go beyond simple content replication.
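
The data flow described above can be illustrated with a toy PyTorch sketch. This is not the project's code: the class name MetaQuerySketch, the small Transformer encoder standing in for the frozen MLLM, the connector layer, and all dimensions are assumptions chosen for illustration. It only shows the general pattern of learnable queries reading from a frozen backbone and producing conditioning features for a downstream generator.

```python
import torch
import torch.nn as nn


class MetaQuerySketch(nn.Module):
    """Toy illustration: learnable queries read out of a frozen backbone."""

    def __init__(self, hidden_dim=64, num_queries=8, cond_dim=32):
        super().__init__()
        # Hypothetical stand-in for the frozen MLLM (a real one is far larger).
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_dim, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for param in self.backbone.parameters():
            param.requires_grad = False  # the backbone receives no updates

        # Learnable query embeddings appended to every input sequence.
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim) * 0.02)
        # Assumed trainable connector mapping query states to the
        # conditioning space of a generator (for example a diffusion model).
        self.connector = nn.Linear(hidden_dim, cond_dim)

    def forward(self, prompt_embeds):
        batch_size = prompt_embeds.shape[0]
        queries = self.queries.unsqueeze(0).expand(batch_size, -1, -1)
        # The backbone sees the prompt followed by the queries.
        hidden = self.backbone(torch.cat([prompt_embeds, queries], dim=1))
        # Keep only the hidden states at the query positions.
        query_states = hidden[:, -self.queries.shape[0]:, :]
        return self.connector(query_states)


model = MetaQuerySketch()
prompt = torch.randn(2, 10, 64)  # (batch, prompt tokens, hidden_dim)
cond = model(prompt)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(cond.shape)
print(trainable)
```

In this sketch only the queries and the connector have requires_grad set, which mirrors the claim that training cost stays close to that of fine-tuning the generator alone.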

Quick Start & Requirements

Installation uses Conda:

    conda env create -f environment.yml
    conda activate metaquery

Training:

    torchrun --nproc-per-node=8 train.py --run_name test --config_file llavaov0p5_sana.yaml --base_dir /path/to/metaquery

For text-to-image pretraining, cc12m is supported out of the box, but reaching paper-level performance may require modifying the code to use other datasets such as BLIP3o.

A demo runs via:

    python app.py --checkpoint_path /path/to/checkpoint
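
For scripted runs, the training command can be assembled programmatically. The sketch below is a convenience wrapper and not part of the repository: the helper name build_train_command and its defaults are hypothetical, and it uses only the flags quoted in the training command above.

```python
import shlex


def build_train_command(
    base_dir,
    run_name="test",
    config_file="llavaov0p5_sana.yaml",
    gpus_per_node=8,
):
    """Return the torchrun invocation as an argument list (hypothetical helper)."""
    return [
        "torchrun",
        f"--nproc-per-node={gpus_per_node}",
        "train.py",
        "--run_name", run_name,
        "--config_file", config_file,
        "--base_dir", base_dir,
    ]


cmd = build_train_command("/path/to/metaquery")
# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) from the repository root inside the activated metaquery environment. Changing gpus_per_node adapts the launch to machines with fewer GPUs, though whether the default config trains well with fewer devices is not stated in the source.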

Highlighted Details

MetaQuery demonstrates zero-shot subject-driven generation and unlocks creative capabilities such as visual association and logo design. Reported results are state-of-the-art across multimodal understanding benchmarks (MME-P, MMB, SEED, MMMU) and the FID image-generation metric, particularly with larger base models like Qwen2.5-VL 7B.

Maintenance & Community

The README gives no specific details about maintenance, community channels, or contributors.

Licensing & Compatibility

The MetaQuery dataset is licensed under CC BY-NC, ODC-BY, and Common Crawl terms. The CC BY-NC license prohibits commercial use. Users should also be aware of potential legal obligations arising from third-party content in the data.

Limitations & Caveats

The CC BY-NC license restricts use to non-commercial applications. Optimal performance may require code modifications to train on datasets beyond cc12m. The citation year of 2025 suggests a very recent or upcoming publication.

Health Check

Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 21 stars in the last 30 days
