multimodal by facebookresearch

PyTorch library for multimodal multi-task model training

created 3 years ago
1,635 stars

Top 26.3% on sourcepulse

Project Summary

TorchMultimodal is a PyTorch library for training state-of-the-art multimodal, multi-task models at scale. It provides modular building blocks, pre-trained model classes, and example scripts for researchers and engineers working with content understanding and generative multimodal tasks. The library aims to facilitate replication of SOTA models and serve as a foundation for future research.

How It Works

The library is structured around composable components, including fusion layers, loss functions, datasets, and utilities. These are used to construct common multimodal architectures like ALBEF, BLIP-2, CLIP, CoCa, DALL-E 2, FLAVA, MAE, and MDETR. The modular design allows users to mix and match components to build custom models or adapt existing ones, leveraging the broader PyTorch ecosystem.
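The mix-and-match idea can be sketched in plain PyTorch. This is a minimal illustration only: `ConcatFusion`, `TwoTowerClassifier`, and `build_toy_model` are hypothetical names invented for this sketch and are not TorchMultimodal classes, and the linear layers stand in for real image and text encoders.

```python
import torch
from torch import nn


class ConcatFusion(nn.Module):
    """Hypothetical fusion layer: concatenates the two embeddings."""

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        return torch.cat([image_emb, text_emb], dim=-1)


class TwoTowerClassifier(nn.Module):
    """Hypothetical model assembled from interchangeable parts."""

    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module,
                 fusion: nn.Module, head: nn.Module) -> None:
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        self.fusion = fusion
        self.head = head

    def forward(self, image: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Encode each modality separately, fuse, then classify.
        fused = self.fusion(self.image_encoder(image), self.text_encoder(text))
        return self.head(fused)


def build_toy_model(image_dim: int = 32, text_dim: int = 16,
                    hidden: int = 8, num_classes: int = 3) -> TwoTowerClassifier:
    # Linear layers are placeholders for real encoders.
    return TwoTowerClassifier(
        image_encoder=nn.Linear(image_dim, hidden),
        text_encoder=nn.Linear(text_dim, hidden),
        fusion=ConcatFusion(),
        head=nn.Linear(2 * hidden, num_classes),  # concat doubles the width
    )


model = build_toy_model()
logits = model(torch.randn(4, 32), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 3])
```

Because each part is passed in as an `nn.Module`, swapping the fusion strategy or an encoder only changes the constructor arguments, not the model class.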

Quick Start & Requirements

  • Installation: pip install torchmultimodal-nightly (Linux only for PyPI). Building from source is also supported.
  • Prerequisites: Python >= 3.8 and a PyTorch build with CUDA support (the CUDA version required depends on the installed PyTorch build).
  • Resources: Training may require large datasets.
  • Links: Getting started, Examples
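
Before installing, the prerequisites above can be checked with a short script. This is a sketch using only the Python standard library; `check_environment` is a hypothetical helper, and it assumes the installed package is importable as `torchmultimodal`.

```python
import importlib.util
import platform
import sys


def check_environment(min_python=(3, 8)):
    """Return simple yes/no facts about the current environment."""
    return {
        # The summary lists Python >= 3.8 as a prerequisite.
        "python_ok": sys.version_info[:2] >= min_python,
        # The PyPI package is described as Linux only.
        "is_linux": platform.system() == "Linux",
        # find_spec returns None when a top-level module is not installed.
        "torch_found": importlib.util.find_spec("torch") is not None,
        # Assumption: the import name is "torchmultimodal".
        "torchmultimodal_found": importlib.util.find_spec("torchmultimodal") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```

If `is_linux` is false, the PyPI package is not an option and building from source is the remaining route mentioned above.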

Highlighted Details

  • Supports a wide range of SOTA multimodal models including ALBEF, BLIP-2, CLIP, CoCa, DALL-E 2, FLAVA, MAE, and MDETR.
  • Offers modular building blocks for custom architecture construction.
  • Includes example scripts for training, fine-tuning, and evaluation on common multimodal tasks.
  • Provides pretrained weights for canonical model configurations.

Maintenance & Community

  • Developed by Facebook Research.
  • Open to community contributions via pull requests and bug reports.
  • CONTRIBUTING file available for guidance.

Licensing & Compatibility

  • BSD licensed.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library is a beta release, so its APIs may change between versions. The PyPI package is available for Linux only.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 52 stars in the last 90 days
