multimodal by facebookresearch

PyTorch library for multimodal multi-task model training

Created 3 years ago
1,648 stars

Top 25.6% on SourcePulse

Project Summary

TorchMultimodal is a PyTorch library for training state-of-the-art (SOTA) multimodal, multi-task models at scale. It provides modular building blocks, pre-trained model classes, and example scripts for researchers and engineers working on multimodal content understanding and generation. The library aims to make SOTA models easier to replicate and to serve as a foundation for future research.

How It Works

The library is structured around composable components, including fusion layers, loss functions, datasets, and utilities. These components are used to build common multimodal architectures such as ALBEF, BLIP-2, CLIP, CoCa, DALL-E 2, FLAVA, MAE, and MDETR. Because the design is modular, users can mix and match components to build custom models or adapt existing ones, and the components integrate with the broader PyTorch ecosystem.
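
The "mix and match" idea can be illustrated with a small late-fusion model assembled from interchangeable parts. This is a minimal sketch in plain Python, not TorchMultimodal's API: `LateFusionModel`, the toy encoders, the fusion functions, and `linear_head` are all hypothetical stand-ins for real encoders, fusion layers, and task heads. The point it shows is that you can swap the fusion step without touching the encoders.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class LateFusionModel:
    """Hypothetical composable model: per-modality encoders, a fusion step, and a head."""

    image_encoder: Callable[[Sequence[float]], Vector]
    text_encoder: Callable[[str], Vector]
    fusion: Callable[[Vector, Vector], Vector]
    head: Callable[[Vector], float]

    def __call__(self, image: Sequence[float], text: str) -> float:
        # Encode each modality independently, fuse the embeddings, then score the result.
        return self.head(self.fusion(self.image_encoder(image), self.text_encoder(text)))


def toy_image_encoder(pixels: Sequence[float]) -> Vector:
    # Stand-in for a vision backbone: mean and max pixel intensity.
    return [sum(pixels) / len(pixels), max(pixels)]


def toy_text_encoder(text: str) -> Vector:
    # Stand-in for a text backbone: word count and character count.
    return [float(len(text.split())), float(len(text))]


def concat_fusion(a: Vector, b: Vector) -> Vector:
    # Concatenation fusion: the output dimension is len(a) + len(b).
    return a + b


def sum_fusion(a: Vector, b: Vector) -> Vector:
    # Element-wise sum fusion: requires embeddings of equal dimension.
    if len(a) != len(b):
        raise ValueError("sum fusion needs equal-length embeddings")
    return [x + y for x, y in zip(a, b)]


def linear_head(weights: Vector) -> Callable[[Vector], float]:
    # Returns a fixed (untrained) linear scoring head with no bias term.
    def head(v: Vector) -> float:
        if len(v) != len(weights):
            raise ValueError("head size does not match fused embedding size")
        return sum(w * x for w, x in zip(weights, v))

    return head


# Same encoders, two different fusion strategies (each head sized to its fusion output).
concat_model = LateFusionModel(
    toy_image_encoder, toy_text_encoder, concat_fusion, linear_head([1.0, 1.0, 1.0, 1.0])
)
sum_model = LateFusionModel(
    toy_image_encoder, toy_text_encoder, sum_fusion, linear_head([1.0, 0.0])
)
print(concat_model([0.0, 0.5, 1.0], "a cat"))  # 8.5
print(sum_model([0.0, 0.5, 1.0], "a cat"))  # 2.5
```

In a real PyTorch model, each of these callables would be an `nn.Module`. The composition pattern stays the same.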

Quick Start & Requirements

  • Installation: pip install torchmultimodal-nightly (PyPI packages are Linux-only). Building from source is also supported.
  • Prerequisites: Python >= 3.8 and PyTorch with CUDA support; the CUDA version depends on the PyTorch build you install.
  • Resources: Training may require large datasets.
  • Links: Getting started, Examples

Highlighted Details

  • Supports a wide range of SOTA multimodal models including ALBEF, BLIP-2, CLIP, CoCa, DALL-E 2, FLAVA, MAE, and MDETR.
  • Offers modular building blocks for custom architecture construction.
  • Includes example scripts for training, fine-tuning, and evaluation on common multimodal tasks.
  • Provides pretrained weights for canonical model configurations.
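
Several of the listed models, CLIP in particular, are trained with a contrastive image-text objective. The sketch below shows a symmetric InfoNCE loss of the kind CLIP uses. It is written in plain Python for illustration and is not TorchMultimodal's loss module. The fixed temperature of 0.07 is an assumption; CLIP learns this value during training. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize(v: List[float]) -> List[float]:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cross_entropy(logits: List[float], target: int) -> float:
    # Numerically stable -log(softmax(logits)[target]) via the log-sum-exp trick.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(image_embs: Matrix, text_embs: Matrix, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i is the positive; other items in the batch are negatives."""
    if len(image_embs) != len(text_embs):
        raise ValueError("batch sizes must match")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by the temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row should pick out its matching text.
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column should pick out its matching image.
    text_to_image = sum(cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_style_loss(images, [[1.0, 0.0], [0.0, 1.0]])  # matching pairs: near zero
swapped = clip_style_loss(images, [[0.0, 1.0], [1.0, 0.0]])  # mismatched pairs: large
print(aligned < swapped)  # True
```

Averaging both directions keeps the objective symmetric, so neither modality's encoder is favored.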

Maintenance & Community

  • Developed by Facebook Research.
  • Open to community contributions via pull requests and bug reports.
  • CONTRIBUTING file available for guidance.

Licensing & Compatibility

  • BSD licensed.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The library is currently in beta, so APIs may change between releases. PyPI installation is limited to Linux.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 week
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 10 stars in the last 30 days
