piFlow by Lakonik

Policy-based flow models for fast, high-quality image generation

Created 3 months ago
256 stars

Top 98.7% on SourcePulse

Project Summary

pi-Flow is a policy-based flow model framework for efficient few-step generation, aimed at researchers and practitioners in generative AI. It accelerates image generation by having the network output a fast policy that guides ODE substeps, which yields high-quality, diverse images in few inference steps while staying faithful to the teacher model.

How It Works

The core idea is that the network predicts a policy rather than a direct denoised state, and this policy drives multiple ODE substeps during generation. Training uses policy-based imitation distillation (pi-ID), a simplified method that needs only an L2 loss against a teacher model and avoids more complex techniques such as Jacobian-vector products (JVPs) or GANs. According to the README, this design balances quality and diversity, handles fine-grained textures well, and scales to large text-to-image models.
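
The idea of "one network call, several cheap ODE substeps" can be illustrated with a toy sketch. This is not the pi-Flow implementation: the names make_toy_policy, rollout, and imitation_l2_loss are hypothetical, the state is a single float, and the value x0_hat stands in for whatever a real network would output. It also assumes a linear interpolation path x_t = (1 - t) * x0 + t * noise, with t = 1 as pure noise and t = 0 as data.

```python
from typing import Callable, List, Tuple

# A velocity field maps (x_t, t) to dx/dt.
Velocity = Callable[[float, float], float]


def make_toy_policy(x0_hat: float) -> Velocity:
    """Hypothetical closed-form policy built from one 'network output' x0_hat.

    Under the assumed path x_t = (1 - t) * x0 + t * noise, the velocity that
    is consistent with a predicted clean sample x0_hat is (x_t - x0_hat) / t.
    Evaluating it needs no further network calls.
    """
    def policy(x_t: float, t: float) -> float:
        return (x_t - x0_hat) / t
    return policy


def rollout(policy: Velocity, x: float, t_start: float, t_end: float,
            substeps: int) -> Tuple[float, List[Tuple[float, float]]]:
    """Euler-integrate the policy from t_start to t_end.

    Returns the final state and the (x, t) pairs where the policy was queried.
    """
    dt = (t_end - t_start) / substeps
    t = t_start
    visited = []
    for _ in range(substeps):
        visited.append((x, t))
        x = x + dt * policy(x, t)
        t = t + dt
    return x, visited


def imitation_l2_loss(policy: Velocity, teacher: Velocity, x: float,
                      t_start: float, t_end: float, substeps: int) -> float:
    """Mean squared velocity gap between policy and teacher.

    The gap is measured at the states visited by the policy's own rollout,
    which is the imitation flavour of the loss in this toy setting.
    """
    _, visited = rollout(policy, x, t_start, t_end, substeps)
    gaps = [(policy(xs, ts) - teacher(xs, ts)) ** 2 for xs, ts in visited]
    return sum(gaps) / len(gaps)


# Toy usage: one "network call" yields x0_hat, then four substeps follow it.
student = make_toy_policy(x0_hat=0.5)
x_end, _ = rollout(student, x=2.0, t_start=1.0, t_end=0.0, substeps=4)
loss_same = imitation_l2_loss(student, make_toy_policy(0.5), 2.0, 1.0, 0.0, 4)
loss_diff = imitation_l2_loss(student, make_toy_policy(1.0), 2.0, 1.0, 0.0, 4)
```

In this toy setting the rollout lands on x0_hat, and the loss is zero only when the policy and the teacher agree on the velocity at every visited state. A real policy would be richer than a single predicted sample, and the teacher would be a full multi-step flow model.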

Quick Start & Requirements

To install, clone the repository and run pip install -e . --no-build-isolation inside a Python 3.10 conda environment. Prerequisites are PyTorch 2.6, Linux (Ubuntu 20+), and ninja. Accessing FLUX models requires huggingface-cli login. Official demos are available on HuggingFace Spaces for pi-Qwen, pi-FLUX, and pi-FLUX.2.
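
Before installing, it may help to confirm that the environment matches the prerequisites listed above. The following is a minimal preflight sketch using only the Python standard library; the function name check_environment and the result keys are hypothetical and are not part of the repository.

```python
import platform
import shutil
import sys
from importlib import metadata


def check_environment() -> dict:
    """Return a {requirement: bool} map for the prerequisites named above.

    This only inspects the current interpreter and PATH; it installs nothing.
    """
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None

    return {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "linux": platform.system() == "Linux",
        # Accepts local build suffixes such as "+cu124" after the 2.6.x part.
        "torch_2_6": torch_version is not None
        and torch_version.startswith("2.6."),
        "ninja_on_path": shutil.which("ninja") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'not satisfied'}")
```

A failed check does not necessarily mean the install will fail; it only flags a deviation from the configuration described in the README.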

Highlighted Details

  • Enables 4-step generation for Qwen-Image and FLUX models, with elastic inference for Qwen-Image and pi-FLUX.2.
  • Scales effectively from ImageNet DiT to 20-billion-parameter models like Qwen-Image.
  • Built on the high-performance LakonLab codebase, featuring optimized distributed training (DDP, FSDP, FSDP2), mixed precision, and weight tying.
  • Supports advanced flow solvers and flexible storage backends (local, S3, HuggingFace, HTTP/S).
  • Demonstrates superior performance in generating fine-grained texture details and mitigating the quality-diversity trade-off.

Maintenance & Community

The project is associated with authors from Stanford University and Adobe Research. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the README, which presents a significant ambiguity for adoption, particularly for commercial use. Windows compatibility is untested.

Limitations & Caveats

The primary limitation is the unstated license, which is a barrier to commercial adoption. Windows support is not guaranteed because it has not been tested. The pinned PyTorch 2.6 requirement may call for a dedicated environment.

Health Check
Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 4
Star History: 23 stars in the last 30 days
