transfusion-pytorch  by lucidrains

PyTorch implementation for multimodal model research

created 11 months ago
1,183 stars

Top 33.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Transfusion, a multi-modal model capable of predicting the next token and diffusing images. It targets researchers and practitioners working with multi-modal AI, offering a unified architecture for diverse data types. The key benefit is its flexibility in handling various modalities, including text, images, and audio, within a single transformer-based framework.

How It Works

The core of Transfusion is a single transformer that processes all modalities in one sequence. In place of a traditional diffusion objective, the implementation uses flow matching, inspired by the success of Flux: the model learns a velocity field that continuously transports noise to data. Multiple modalities are supported by specifying a latent dimension and a default shape for each one.
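To make the flow matching idea concrete, here is a minimal sketch of a rectified-flow style training loss and an Euler sampler in plain PyTorch. It is not taken from the repository: `TinyVelocityNet`, `flow_matching_loss`, and `euler_sample` are hypothetical names, the small MLP stands in for the transformer, and the convention of noise at t = 0 and data at t = 1 is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinyVelocityNet(nn.Module):
    """Hypothetical stand-in for the transformer: maps (x_t, t) to a velocity."""

    def __init__(self, dim_latent: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_latent + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim_latent),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the per-sample time to every latent token, then append it.
        t = t.expand(*x_t.shape[:-1], 1)
        return self.net(torch.cat([x_t, t], dim=-1))


def flow_matching_loss(model: nn.Module, data: torch.Tensor) -> torch.Tensor:
    """Regress the constant velocity of the straight path from noise to data."""
    noise = torch.randn_like(data)
    # One random time per sample, shaped to broadcast over the remaining dims.
    t = torch.rand(data.shape[0], *([1] * (data.dim() - 1)))
    x_t = (1.0 - t) * noise + t * data  # point on the path at time t
    target = data - noise               # d(x_t)/dt along that path
    return F.mse_loss(model(x_t, t), target)


@torch.no_grad()
def euler_sample(model: nn.Module, shape: tuple, steps: int = 16) -> torch.Tensor:
    """Integrate the learned velocity field from t = 0 (noise) to t = 1."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], *([1] * (len(shape) - 1))), i * dt)
        x = x + dt * model(x, t)
    return x


# Toy batch: 8 samples, 4 latent tokens each, latent dimension 384.
latents = torch.randn(8, 4, 384)
model = TinyVelocityNet(dim_latent=384)
loss = flow_matching_loss(model, latents)
loss.backward()
```

After training, `euler_sample(model, (2, 4, 384))` would draw two new latent sequences; with the untrained toy network above it only demonstrates the shapes involved.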

Quick Start & Requirements

  • Install: pip install transfusion-pytorch
  • Dependencies: PyTorch. To run the examples, install the extras from a local clone with pip install .[examples], which adds diffusers, transformers, accelerate, scipy, ftfy, and safetensors.
  • Usage examples are provided for single and multiple modalities, including image encoding/decoding.
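As an illustration of what a mixed-modality training sample could look like, the sketch below interleaves text token ids with latent tensors tagged by a modality index, and checks each latent against a per-modality dimension. All names here (`DIM_LATENT`, `DEFAULT_SHAPE`, `count_positions`) are hypothetical and chosen for this example; consult the README for the library's actual input format.

```python
import math
import torch

# Hypothetical per-modality settings, indexed by modality id.
DIM_LATENT = (384, 192)       # e.g. image latents and audio latents
DEFAULT_SHAPE = ((4,), (2,))  # fallback shape when none is specified


def count_positions(sample: list) -> int:
    """Validate one interleaved sample and return its total sequence length.

    Text segments are 1-D integer tensors of token ids; modality segments
    are (modality_id, latents) tuples whose last dim is the latent dimension.
    """
    total = 0
    for segment in sample:
        if isinstance(segment, tuple):
            modality_id, latents = segment
            if not latents.is_floating_point():
                raise ValueError("modality latents must be floating point")
            if latents.shape[-1] != DIM_LATENT[modality_id]:
                raise ValueError(
                    f"modality {modality_id} expects latent dim "
                    f"{DIM_LATENT[modality_id]}, got {latents.shape[-1]}"
                )
            # Every leading position contributes one latent token.
            total += math.prod(latents.shape[:-1])
        else:
            if segment.dtype != torch.long:
                raise ValueError("text segments must be integer token ids")
            total += segment.numel()
    return total


# Text, an image (modality 0), more text, then audio (modality 1).
sample = [
    torch.randint(0, 256, (16,)),
    (0, torch.randn(4, 384)),
    torch.randint(0, 256, (8,)),
    (1, torch.randn(6, 192)),
]
length = count_positions(sample)
```

Keeping the latent dimension per modality is what lets text tokens, image latents, and audio latents share one sequence while each is projected to the transformer width separately.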

Highlighted Details

  • Implements Transfusion, a multi-modal model using flow matching instead of diffusion.
  • Supports flexible integration of multiple modalities (text, images, audio) with configurable latent dimensions.
  • Includes optional modality encoders/decoders for direct image processing.
  • Can be pre-trained on text-only data.

Maintenance & Community

The project implements the original Transfusion paper from Meta AI. The README does not list community channels or maintainer information.

Licensing & Compatibility

The README does not explicitly state a license. The citations it includes are for research papers, not project licensing.

Limitations & Caveats

The README does not specify a license, which may impact commercial use or integration into closed-source projects. The project appears to be a research implementation, and stability or production-readiness is not guaranteed.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 109 stars in the last 90 days
