InstaFlow by gnobitab

One-step image generator using Rectified Flow (ICLR 2024)

created 1 year ago
1,340 stars

Top 30.6% on sourcepulse

Project Summary

InstaFlow offers ultra-fast, one-step text-to-image generation, achieving quality comparable to Stable Diffusion while drastically reducing inference time. It is designed for researchers and users seeking efficient high-quality image synthesis, leveraging the Rectified Flow technique.

How It Works

InstaFlow uses Rectified Flow, which trains probability flows to follow straight trajectories. Nearly straight trajectories can be followed with very few integration steps, which makes a direct single-step mapping from noise to image feasible and avoids the iterative sampling that conventional diffusion models require. Training has three stages: generating (text, noise, image) triplets from pre-trained Stable Diffusion, applying text-conditioned reflow to straighten the generative flow, and distilling the straightened flow into a one-step model.
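Why straightness matters for step count can be illustrated with a toy example. The sketch below is not InstaFlow's code: it uses plain 2-D points in place of images, a hand-written constant velocity in place of a learned text-conditioned network, and hypothetical helper names (`euler_sample`, `make_straight_velocity`, `curved_velocity`). It only shows that a fixed-step Euler integrator reaches the endpoint of a straight path in one step, while a curved path needs many steps.

```python
import math


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_straight_velocity(x0, x1):
    """Velocity of the straight path x_t = (1 - t) * x0 + t * x1 (constant x1 - x0).

    Toy stand-in: a real model would predict this velocity with a network.
    """
    v = [b - a for a, b in zip(x0, x1)]
    return lambda x, t: v


def curved_velocity(x, t):
    """Rotation field: trajectories are quarter-circle arcs over t in [0, 1]."""
    w = math.pi / 2
    return [-w * x[1], w * x[0]]


def distance(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    noise, image = [0.5, -1.0], [2.0, 3.0]
    straight = make_straight_velocity(noise, image)
    print("straight path, 1 step, error:",
          distance(euler_sample(straight, noise, 1), image))

    start, target = [1.0, 0.0], [0.0, 1.0]  # exact endpoint of the rotation
    for n in (1, 100):
        print(f"curved path, {n} step(s), error:",
              distance(euler_sample(curved_velocity, start, n), target))
```

In this toy setting the straight path has essentially zero one-step error, while the curved path only gets close to its target with many steps. Reflow aims to move a learned flow toward the first case.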

Quick Start & Requirements

  • Install/Run: Pre-trained models and inference code are available. A Colab notebook is provided for easy experimentation.
  • Prerequisites: A GPU is required (benchmarks were run on an A100). Pre-trained LoRAs and ControlNets are supported, and ONNX support is available.
  • Resources: Inference time is approximately 0.1 seconds on an A100 GPU.
  • Links: Paper, Hugging Face Demo, Colab Notebook

Highlighted Details

  • Achieves ~90% inference time reduction compared to standard Stable Diffusion.
  • Generates images with FID scores comparable to state-of-the-art GANs like StyleGAN-T.
  • Compatible with pre-trained LoRAs and ControlNets.
  • Offers ONNX support for broader deployment.
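
To put the "~90% reduction" figure in more familiar terms, the short sketch below converts a fractional time reduction into a speedup factor. The function names are hypothetical, and the implied baseline is only an inference from the two numbers quoted on this page (about 0.1 seconds on an A100 and a roughly 90% reduction), not a measured Stable Diffusion timing.

```python
def speedup_from_reduction(reduction_fraction):
    """Convert a fractional time reduction (0.9 means 90%) into a speedup factor."""
    if not 0.0 <= reduction_fraction < 1.0:
        raise ValueError("reduction_fraction must be in [0, 1)")
    return 1.0 / (1.0 - reduction_fraction)


def implied_baseline_seconds(fast_seconds, reduction_fraction):
    """Baseline time implied by a faster time and the claimed reduction."""
    return fast_seconds * speedup_from_reduction(reduction_fraction)


if __name__ == "__main__":
    # Figures quoted on this page; the baseline below is derived, not measured.
    print("speedup factor:", round(speedup_from_reduction(0.9), 2))
    print("implied baseline (s):", round(implied_baseline_seconds(0.1, 0.9), 2))
```

Under these numbers, a 90% reduction corresponds to roughly a tenfold speedup, which would put the implied baseline at about one second per image.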

Maintenance & Community

The project is associated with ICLR 2024 and has contributions from researchers at multiple institutions. Updates include extensions to text-to-3D and image editing, as well as new few-step models. The project relies heavily on the 🤗 Diffusers library.

Licensing & Compatibility

The project's README does not explicitly state a license. However, it mentions that training scripts are modified from Diffusers examples, which typically use Apache 2.0. Compatibility for commercial use is not specified.

Limitations & Caveats

Training InstaFlow-0.9B required 199 A100 GPU days, so training custom models carries a significant computational cost. Although one-step generation is the headline feature, compatibility with a specific pre-trained model or fine-tuning technique should be verified before relying on it.
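
For budgeting, the 199 A100 GPU days can be converted into GPU hours, wall-clock time, and a rough cost. The sketch below is back-of-the-envelope arithmetic only: the function names are hypothetical, the GPU count and hourly price are placeholder assumptions rather than figures from the project, and the wall-clock estimate assumes perfect scaling across GPUs.

```python
def gpu_hours(gpu_days):
    """Convert GPU days into GPU hours."""
    return gpu_days * 24


def wall_clock_days(gpu_days, num_gpus):
    """Wall-clock days on num_gpus GPUs, assuming perfect linear scaling."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_days / num_gpus


def estimated_cost(gpu_days, price_per_gpu_hour):
    """Rough cost given an assumed (hypothetical) price per GPU hour."""
    return gpu_hours(gpu_days) * price_per_gpu_hour


if __name__ == "__main__":
    reported_gpu_days = 199      # figure reported for InstaFlow-0.9B
    assumed_num_gpus = 8         # assumption: a single 8-GPU node
    assumed_hourly_price = 2.0   # assumption: placeholder price, not a quote

    print("GPU hours:", gpu_hours(reported_gpu_days))
    print("wall-clock days:", wall_clock_days(reported_gpu_days, assumed_num_gpus))
    print("estimated cost:", estimated_cost(reported_gpu_days, assumed_hourly_price))
```

Swapping in a real GPU count and price gives a quick sense of whether retraining is practical compared with using the released checkpoints.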

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 41 stars in the last 90 days
