PyTorch library for multimodal multi-task model training
TorchMultimodal is a PyTorch library for training state-of-the-art multimodal, multi-task models at scale. It provides modular building blocks, pre-trained model classes, and example scripts for researchers and engineers working on content understanding and generative multimodal tasks. The library aims to make it easy to reproduce state-of-the-art models and to serve as a foundation for future research.
How It Works
The library is structured around composable components, including fusion layers, loss functions, datasets, and utilities. These are used to construct common multimodal architectures like ALBEF, BLIP-2, CLIP, CoCa, DALL-E 2, FLAVA, MAE, and MDETR. The modular design allows users to mix and match components to build custom models or adapt existing ones, leveraging the broader PyTorch ecosystem.
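To make the composition idea concrete, the following is a minimal plain-PyTorch sketch of the two-tower, CLIP-style pattern that such components implement: separate image and text encoders, projection into a shared embedding space, and a contrastive loss over matched pairs. This is not TorchMultimodal's own API; all class and function names below are illustrative stand-ins, and in the library itself these pieces would be replaced by its pre-built encoders, fusion layers, and loss modules.

# Illustrative sketch of the "compose encoders + fusion + loss" pattern in plain PyTorch.
# Names here are hypothetical and do not come from TorchMultimodal's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTowerModel(nn.Module):
    """CLIP-style late fusion: separate encoders projected into a shared space."""

    def __init__(self, vocab_size: int = 1000, embed_dim: int = 128, proj_dim: int = 64):
        super().__init__()
        # Image tower: a tiny CNN as a stand-in for a vision backbone.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, embed_dim),
        )
        # Text tower: token embedding + mean pooling as a stand-in for a text transformer.
        self.text_embedding = nn.Embedding(vocab_size, embed_dim)
        # Projections into the shared multimodal embedding space.
        self.image_proj = nn.Linear(embed_dim, proj_dim)
        self.text_proj = nn.Linear(embed_dim, proj_dim)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor):
        img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        txt = F.normalize(self.text_proj(self.text_embedding(token_ids).mean(dim=1)), dim=-1)
        return img, txt

def contrastive_loss(img: torch.Tensor, txt: torch.Tensor, temperature: float = 0.07):
    # Symmetric InfoNCE-style loss over matched (image, text) pairs in the batch.
    logits = img @ txt.t() / temperature
    targets = torch.arange(img.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

if __name__ == "__main__":
    model = TwoTowerModel()
    images = torch.randn(4, 3, 32, 32)           # batch of dummy images
    token_ids = torch.randint(0, 1000, (4, 16))   # batch of dummy token ids
    img_emb, txt_emb = model(images, token_ids)
    print("loss:", contrastive_loss(img_emb, txt_emb).item())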
Quick Start & Requirements
pip install torchmultimodal-nightly
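After installing, a quick sanity check is to import the package alongside PyTorch. This minimal sketch assumes only the import name torchmultimodal (the name used by the PyPI distribution) and will raise an ImportError if the install failed.

# Post-install sanity check: verify that PyTorch and TorchMultimodal import cleanly.
import torch
import torchmultimodal  # noqa: F401

print(f"PyTorch {torch.__version__} is available; torchmultimodal imported successfully.")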
The PyPI package is available for Linux only; building from source is also supported.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The library is currently a Beta Release, so APIs and components may still change between versions. PyPI installation is limited to Linux platforms.