PyTorch implementation for multimodal model research
This repository provides a PyTorch implementation of Transfusion, a multi-modal model capable of predicting the next token and diffusing images. It targets researchers and practitioners working with multi-modal AI, offering a unified architecture for diverse data types. The key benefit is its flexibility in handling various modalities, including text, images, and audio, within a single transformer-based framework.
How It Works
The core of Transfusion is a transformer architecture that unifies different modalities in a single sequence model. Instead of standard denoising diffusion, it uses flow matching, inspired by the success of Flux; this lets the model learn continuous transformations between data representations. Multiple modalities are supported by specifying a separate latent dimension and default shape for each, enabling flexible data integration.
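As a rough illustration, configuring two continuous modalities with different latent dimensions and default shapes might look like the sketch below. The class name Transfusion and the parameter names (dim_latent, modality_default_shape, transformer) are assumptions based on this description, not confirmed details of the package's API.

```python
import torch
from transfusion_pytorch import Transfusion  # assumed import path

# Hypothetical configuration; argument names are assumptions, not verified API.
model = Transfusion(
    num_text_tokens = 256,                  # size of the text vocabulary
    dim_latent = (384, 192),                # one latent dimension per continuous modality
    modality_default_shape = ((4,), (2,)),  # default length of each modality when sampling
    transformer = dict(
        dim = 512,
        depth = 8
    )
)
```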
Quick Start & Requirements
pip install transfusion-pytorch

To run the example scripts, install the optional extras with pip install .[examples], which pulls in diffusers, transformers, accelerate, scipy, ftfy, and safetensors.
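Once installed, a minimal training step might look like the sketch below. It assumes the package exposes a Transfusion class whose forward pass accepts interleaved sequences of discrete text tokens and continuous latents and returns a scalar loss; the exact names and signatures are assumptions based on this summary, not verified API.

```python
import torch
from transfusion_pytorch import Transfusion  # assumed import path

# Hypothetical usage; class and argument names are assumptions, not verified API.
model = Transfusion(
    num_text_tokens = 256,
    dim_latent = 384,               # latent dimension of a single image modality
    modality_default_shape = (4,),  # default number of latents sampled for that modality
    transformer = dict(dim = 512, depth = 8)
)

# One training sample: text tokens interleaved with image latents.
sample = [
    torch.randint(0, 256, (16,)),   # 16 text tokens
    torch.randn(4, 384),            # 4 image latents of dimension 384
    torch.randint(0, 256, (8,)),    # 8 more text tokens
]

loss = model([sample])              # batch containing one interleaved sequence
loss.backward()
```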
Highlighted Details
Maintenance & Community
The project is associated with the original Transfusion paper by Meta AI. No specific community channels or active maintainer information are detailed in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The included citations are for research papers, not project licensing.
Limitations & Caveats
The README does not specify a license, which may impact commercial use or integration into closed-source projects. The project appears to be a research implementation, and stability or production-readiness is not guaranteed.