Minimal implementation of rectified flow transformers, based on SD3
Top 55.1% on sourcepulse
This repository provides a minimal, self-contained implementation of Rectified Flow (RF) transformers, inspired by the SD3 approach and LLaMA-DiT architecture. It's designed for researchers and practitioners looking to understand and experiment with scalable RF models, offering simplified code for beginners and advanced features for more experienced users.
How It Works
The project implements Rectified Flow, a diffusion model variant that learns a direct mapping between data distributions by solving an ordinary differential equation (ODE). It utilizes a logit-normal time-sampling strategy for improved training efficiency and scalability, drawing architectural inspiration from LLaMA-DiT. The code is structured to be easily understandable and modifiable, separating model implementation from training logic.
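The training objective and sampling loop described above can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the repository's actual code: it assumes the convention that t=0 is noise and t=1 is data, and a generic `model(x_t, t)` that predicts velocity; the function names `rf_loss` and `sample` are hypothetical.

```python
import torch

def rf_loss(model, x1):
    """One rectified-flow training step: regress the straight-line velocity x1 - x0."""
    x0 = torch.randn_like(x1)                  # noise endpoint of the path
    # Logit-normal time sampling (as in SD3): t = sigmoid(n), n ~ N(0, 1),
    # which concentrates training on intermediate timesteps.
    t = torch.sigmoid(torch.randn(x1.shape[0], device=x1.device))
    t_ = t.view(-1, *([1] * (x1.dim() - 1)))   # broadcast t over data dims
    xt = (1 - t_) * x0 + t_ * x1               # linear interpolation between endpoints
    v_pred = model(xt, t)                      # model predicts the velocity field
    return ((v_pred - (x1 - x0)) ** 2).mean()  # MSE against the constant target velocity

@torch.no_grad()
def sample(model, shape, steps=50, device="cpu"):
    """Euler integration of the learned ODE from noise (t=0) toward data (t=1)."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * model(x, t)               # one Euler step along the predicted velocity
    return x
```

Because the target velocity `x1 - x0` is constant along each straight-line path, sampling reduces to solving a simple ODE, which is why few-step Euler integration works comparatively well for rectified flow.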
Quick Start & Requirements
Install dependencies with pip install torch torchvision pillow, then run python rf.py (or python rf.py --cifar to train on CIFAR instead). For ImageNet, install hf_transfer (pip install hf_transfer), download the dataset via cd advanced && bash download.sh, then run bash run.sh.
Highlighted Details
Maintenance & Community
The project is maintained by Simo Ryu. Further community or roadmap information is not detailed in the README.
Licensing & Compatibility
The repository does not explicitly state a license. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The "Massive Rectified Flow" section is marked for "gigachads" and requires downloading a custom ImageNet dataset, suggesting a higher barrier to entry for advanced features. The project is presented as a minimal implementation, implying potential missing features or optimizations found in more comprehensive RF libraries.