FastGen by NVlabs

Fast generation from diffusion models

Created 1 month ago
594 stars

Top 54.9% on SourcePulse

Project Summary

NVIDIA FastGen is a PyTorch-based framework designed to accelerate generative models through various distillation and acceleration techniques. It targets researchers and engineers working with large-scale generative AI, offering a flexible platform for developing and training models for diverse tasks like text-to-image and video generation, with a focus on speed and efficiency.

How It Works

FastGen employs a modular design, supporting multiple distillation methods such as consistency models, distribution matching, and self-forcing, alongside other acceleration techniques. This approach allows for efficient training of generative models, including those with over 10 billion parameters. The framework is built to be agnostic to specific network architectures and datasets, enabling users to integrate their own custom components.
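As a toy illustration of the distillation idea described above (this is not FastGen's actual API; every name below is hypothetical), the sketch regresses a one-step "student" generator onto the output of a multi-step "teacher" sampler, which is the core pattern shared by the distillation methods FastGen supports:

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_sample(z, n_steps=8):
    # Hypothetical multi-step teacher: iteratively refines the input
    # toward a fixed target (stand-in for an iterative diffusion sampler).
    x = z.copy()
    for _ in range(n_steps):
        x = x + 0.5 * (1.0 - x)  # toy denoising update toward 1.0
    return x

# One-step student: a single linear map pred = w * z + b,
# trained to match the teacher's multi-step output in one shot.
w, b = rng.normal(), rng.normal()
lr = 0.05
losses = []
for step in range(500):
    z = rng.normal(size=64)          # batch of latent noise
    target = teacher_sample(z)       # expensive multi-step teacher output
    pred = w * z + b                 # cheap single-step student output
    err = pred - target
    losses.append(np.mean(err ** 2))
    # Gradient descent on the distillation (matching) loss.
    w -= lr * np.mean(2 * err * z)
    b -= lr * np.mean(2 * err)
```

The student ends up reproducing in one step what the teacher needs eight steps for; real distillation methods (CM, DMD, etc.) replace the simple MSE matching loss with more sophisticated objectives, but the teacher-to-student structure is the same.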

Quick Start & Requirements

  • Installation: The recommended setup uses the provided Docker container. Alternatively, create a conda environment (conda create -y -n fastgen python=3.12.3 pip; conda activate fastgen), clone the repository, navigate into it, and run pip install -e . from the repository root.
  • Prerequisites: Python 3.12.3. A Weights & Biases API key is needed only if you enable W&B logging (optional).
  • Data/Models: Download CIFAR-10 and pretrained EDM models using python scripts/download_data.py --dataset cifar10. Further details on datasets and models are in fastgen/networks/README.md and fastgen/datasets/README.md.
  • Links: Detailed documentation is available per component, including Methods (fastgen/methods/README.md), Networks (fastgen/networks/README.md), Configs (fastgen/configs/README.md), Datasets (fastgen/datasets/README.md), Callbacks (fastgen/callbacks/README.md), and Inference (scripts/README.md).
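The installation and data-download steps above can be condensed into a short script. Note that the clone URL is assumed from the project name and should be verified against the GitHub page:

```shell
# Conda-based setup, per the Quick Start (Docker is the recommended route).
conda create -y -n fastgen python=3.12.3 pip
conda activate fastgen

# Clone URL assumed from the project/org names; verify before use.
git clone https://github.com/NVlabs/FastGen.git
cd FastGen
pip install -e .

# Fetch CIFAR-10 and the pretrained EDM teacher models.
python scripts/download_data.py --dataset cifar10
```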

Highlighted Details

  • Supports large-scale training with ≥10B parameters.
  • Handles multiple tasks: Text-to-Image (T2I), Image-to-Video (I2V), and Video-to-Video (V2V).
  • Implements a wide range of distillation methods: Consistency Models (CM, sCM, TCM, MeanFlow), Distribution Matching (DMD2, f-Distill, LADD, CausVid, Self-Forcing), Fine-Tuning (SFT, CausalSFT), and Knowledge Distillation (KD, CausalKD).
  • Provides implementations for various network architectures including EDM, DiT, SD 1.5, SDXL, Flux, WAN, CogVideoX, and Cosmos Predict2.

Maintenance & Community

Core contributors include Weili Nie, Julius Berner, and Chao Liu, with Arash Vahdat as the project lead. Specific community channels like Discord or Slack are not detailed in the README.

Licensing & Compatibility

This project is licensed under the Apache License 2.0, which is generally permissive for commercial use and integration into closed-source projects. Third-party licenses are documented separately.

Limitations & Caveats

Not all combinations of supported methods and networks are guaranteed to be functional. The project plans to release distilled student checkpoints for CIFAR-10 and ImageNet in the future.

Health Check
Last Commit

5 days ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
7
Star History
600 stars in the last 30 days

Explore Similar Projects

Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 10 more.

consistency_models by openai

6k stars
PyTorch code for the consistency models research paper
Created 3 years ago · Updated 1 year ago