Pyramid-Flow by jy0205

Video generation method based on flow matching

created 10 months ago
3,015 stars

Top 16.2% on sourcepulse

Project Summary

Pyramid Flow, presented at ICLR 2025, is a training-efficient autoregressive video generation method based on Flow Matching. It targets researchers and developers in generative modeling, enabling the creation of high-quality, long-duration videos (up to 10 seconds at 768p and 24 FPS) trained using only open-source datasets, with support for image-to-video generation.

How It Works

Pyramid Flow leverages Flow Matching to interpolate between latents of different resolutions and noise levels. This approach allows for simultaneous generation and decompression, improving computational efficiency compared to models operating solely at full resolution. The framework uses a single DiT (Diffusion Transformer) and is end-to-end optimized, achieving its performance targets within a reasonable training budget.
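The core idea, interpolating between latents at different resolutions and noise levels, can be sketched in toy form. This is an illustration only, not the paper's implementation: it assumes simple average-pooling for the resolution pyramid and the standard linear flow-matching path between noise and data.

```python
import numpy as np

def downsample(x, factor=2):
    """Average-pool a square latent by `factor` along each spatial axis."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def flow_match_point(x0, x1, t):
    """Linear flow-matching path: interpolate between noise x0 and data x1 at time t."""
    return (1.0 - t) * x0 + t * x1

rng = np.random.default_rng(0)
latent = rng.standard_normal((8, 8))  # stand-in for a full-resolution video latent

# Build a resolution pyramid: earlier (noisier) stages work on coarser latents,
# so most denoising steps run at reduced resolution.
stages = [downsample(downsample(latent)), downsample(latent), latent]
for stage, x1 in enumerate(stages):
    x0 = rng.standard_normal(x1.shape)      # noise at this stage's resolution
    xt = flow_match_point(x0, x1, t=0.5)    # midpoint of this stage's flow
    print(f"stage {stage}: resolution {x1.shape}, latent mean {xt.mean():.3f}")
```

In the actual model a single DiT predicts the velocity along these per-stage flows, so generation and upsampling to the next resolution happen within one end-to-end optimized network.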

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (python==3.8.10, pytorch==2.1.2), and install requirements (pip install -r requirements.txt).
  • Model Download: Download checkpoints from Hugging Face (rain1011/pyramid-flow-miniflux or rain1011/pyramid-flow-sd3).
  • Inference: Run python app.py for a Gradio demo or use the provided inference code (video_generation_demo.ipynb).
  • Hardware: Supports multi-GPU inference and CPU offloading for low GPU memory usage (<8GB with model.enable_sequential_cpu_offload()). MPS backend is also supported for Apple Silicon.
  • Links: Paper, Project Page, miniFLUX Model, SD3 Model, Hugging Face Demo.
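The installation and inference steps above can be sketched as shell commands. The repo URL and environment name are assumptions inferred from the project name; the Python and PyTorch versions come from the requirements listed above.

```shell
# Assumed repo URL, based on the project/author names above.
git clone https://github.com/jy0205/Pyramid-Flow.git
cd Pyramid-Flow

# Conda environment with the versions stated in the README summary.
conda create -n pyramid python==3.8.10 -y
conda activate pyramid
pip install -r requirements.txt

# Launch the Gradio demo, or open video_generation_demo.ipynb for scripted inference.
python app.py
```

Checkpoints (rain1011/pyramid-flow-miniflux or rain1011/pyramid-flow-sd3) are downloaded from Hugging Face; on machines with less than 8 GB of GPU memory, the README notes that calling model.enable_sequential_cpu_offload() keeps inference within budget.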

Highlighted Details

  • Generates 10-second videos at 768p resolution and 24 FPS.
  • Supports image-to-video generation.
  • Achieves comparable performance to commercial models like Kling and Gen-3 Alpha on VBench.
  • Offers significant speedups with multi-GPU inference (e.g., 2.5 minutes for 5s, 768p, 24fps on 4 A100s).
  • Low GPU memory inference (<8GB) is possible via CPU offloading.

Maintenance & Community

The project is actively updated, with recent releases including a 768p miniFLUX checkpoint, training code for VAE and DiT finetuning, and improved inference capabilities. Community contributions are acknowledged, and a Hugging Face Space demo is available.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking would require clarification of the license.

Limitations & Caveats

The README notes that only the bf16 dtype is currently supported; fp16 is not yet supported for the model. Training the VAE and finetuning the DiT require substantial hardware resources (at least 8 A100 GPUs).

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 122 stars in the last 90 days

Explore Similar Projects

Starred by Ying Sheng (author of SGLang), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 1 more.

  • Open-Sora-Plan by PKU-YuanGroup: open-source project aiming to reproduce a Sora-like T2V model. Top 0.1%, 12k stars; created 1 year ago, updated 2 weeks ago.