FastGen by NVlabs

Fast generation from diffusion models

Created 1 month ago
594 stars

Top 54.9% on SourcePulse

Project Summary

NVIDIA FastGen is a PyTorch-based framework designed to accelerate generative models through various distillation and acceleration techniques. It targets researchers and engineers working with large-scale generative AI, offering a flexible platform for developing and training models for diverse tasks like text-to-image and video generation, with a focus on speed and efficiency.

How It Works

FastGen employs a modular design, supporting multiple distillation methods such as consistency models, distribution matching, and self-forcing, alongside other acceleration techniques. This approach allows for efficient training of generative models, including those with over 10 billion parameters. The framework is built to be agnostic to specific network architectures and datasets, enabling users to integrate their own custom components.
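As a toy illustration of the distillation idea described above (this is not FastGen's actual API; every name below is hypothetical), the sketch regresses a one-step "student" generator onto the output of a multi-step "teacher" sampler, which is the core pattern shared by the distillation methods FastGen supports:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_sample(z, n_steps=8):
    # Hypothetical multi-step teacher: iteratively refines the input
    # toward a fixed target (stand-in for an iterative diffusion sampler).
    x = z.copy()
    for _ in range(n_steps):
        x = x + 0.5 * (1.0 - x)  # toy denoising update toward 1.0
    return x

# One-step student: a single linear map pred = w * z + b,
# trained to match the teacher's multi-step output in one shot.
w, b = rng.normal(), rng.normal()
lr = 0.05
losses = []
for step in range(500):
    z = rng.normal(size=64)          # batch of latent noise
    target = teacher_sample(z)       # expensive multi-step teacher output
    pred = w * z + b                 # cheap single-step student output
    err = pred - target
    losses.append(np.mean(err ** 2))
    # Gradient descent on the distillation (matching) loss.
    w -= lr * np.mean(2 * err * z)
    b -= lr * np.mean(2 * err)
```

The student ends up reproducing in one step what the teacher needs eight steps for; real distillation methods (CM, DMD, etc.) replace the simple MSE matching loss with more sophisticated objectives, but the teacher-to-student structure is the same.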

Quick Start & Requirements

  • Installation: The recommended setup uses the provided Docker container. Alternatively, create a conda environment (conda create -y -n fastgen python=3.12.3 pip; conda activate fastgen), clone the repository, navigate into it, and run pip install -e . from the repository root.
  • Prerequisites: Python 3.12.3. A Weights & Biases API key is needed only if you enable W&B logging (optional).
  • Data/Models: Download CIFAR-10 and pretrained EDM models using python scripts/download_data.py --dataset cifar10. Further details on datasets and models are in fastgen/networks/README.md and fastgen/datasets/README.md.
  • Links: Detailed documentation is available per component, including Methods (fastgen/methods/README.md), Networks (fastgen/networks/README.md), Configs (fastgen/configs/README.md), Datasets (fastgen/datasets/README.md), Callbacks (fastgen/callbacks/README.md), and Inference (scripts/README.md).
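The installation and data-download steps above can be condensed into a short script. Note that the clone URL is assumed from the project name and should be verified against the GitHub page:

```shell
# Conda-based setup, per the Quick Start (Docker is the recommended route).
conda create -y -n fastgen python=3.12.3 pip
conda activate fastgen

# Clone URL assumed from the project/org names; verify before use.
git clone https://github.com/NVlabs/FastGen.git
cd FastGen
pip install -e .

# Fetch CIFAR-10 and the pretrained EDM teacher models.
python scripts/download_data.py --dataset cifar10
```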

Highlighted Details

  • Supports large-scale training with ≥10B parameters.
  • Handles multiple tasks: Text-to-Image (T2I), Image-to-Video (I2V), and Video-to-Video (V2V).
  • Implements a wide range of distillation methods: Consistency Models (CM, sCM, TCM, MeanFlow), Distribution Matching (DMD2, f-Distill, LADD, CausVid, Self-Forcing), Fine-Tuning (SFT, CausalSFT), and Knowledge Distillation (KD, CausalKD).
  • Provides implementations for various network architectures including EDM, DiT, SD 1.5, SDXL, Flux, WAN, CogVideoX, and Cosmos Predict2.

Maintenance & Community

Core contributors include Weili Nie, Julius Berner, and Chao Liu, with Arash Vahdat as the project lead. Specific community channels like Discord or Slack are not detailed in the README.

Licensing & Compatibility

This project is licensed under the Apache License 2.0, which is generally permissive for commercial use and integration into closed-source projects. Third-party licenses are documented separately.

Limitations & Caveats

Not all combinations of supported methods and networks are guaranteed to be functional. The project plans to release distilled student checkpoints for CIFAR-10 and ImageNet in the future.

Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
7
Star History
600 stars in the last 30 days

Explore Similar Projects

Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 10 more.

consistency_models by openai

6k stars
PyTorch code for the consistency models research paper
Created 3 years ago · Updated 1 year ago