Video generation method based on flow matching
Pyramid Flow is an ICLR 2025 paper presenting a training-efficient autoregressive video generation method based on Flow Matching. It targets researchers and developers in generative modeling, enabling the creation of high-quality, long-duration videos (up to 10 seconds at 768p, 24 FPS) using only open-source datasets, with support for image-to-video generation.
How It Works
Pyramid Flow leverages Flow Matching to interpolate between latents of different resolutions and noise levels. This approach allows for simultaneous generation and decompression, improving computational efficiency compared to models operating solely at full resolution. The framework uses a single DiT (Diffusion Transformer) and is end-to-end optimized, achieving its performance targets within a reasonable training budget.
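To make the idea concrete, below is a minimal conceptual sketch of flow matching over a latent pyramid. It is illustrative only, not the repository's implementation: the names (stage_endpoints, flow_matching_target, shared_dit) and the simple noise-to-downsampled-latent endpoints are assumptions, whereas the actual pipeline links the stages together (e.g. by upsampling and renoising between them).

```python
# Conceptual sketch of flow matching over a latent pyramid (illustrative only;
# not the repository's implementation). All names here are hypothetical.
import torch
import torch.nn.functional as F

def stage_endpoints(clean_latent: torch.Tensor, scale: float):
    """One pyramid stage: a pure-noise start point and a (possibly downsampled) clean end point."""
    low_res = F.interpolate(clean_latent, scale_factor=scale, mode="bilinear", align_corners=False)
    return torch.randn_like(low_res), low_res

def flow_matching_target(x0: torch.Tensor, x1: torch.Tensor, t: torch.Tensor):
    """Linear path x_t = (1 - t) * x0 + t * x1 and its constant velocity x1 - x0."""
    return (1.0 - t) * x0 + t * x1, x1 - x0

# Training sketch: one shared DiT would predict the velocity at a random time t
# for a randomly chosen pyramid stage; coarse stages operate on cheaper latents.
clean_latent = torch.randn(1, 16, 32, 32)   # hypothetical VAE latent
for scale in (0.25, 0.5, 1.0):              # coarse-to-fine resolution stages
    x0, x1 = stage_endpoints(clean_latent, scale)
    t = torch.rand(1).view(1, 1, 1, 1)
    x_t, v_target = flow_matching_target(x0, x1, t)
    # loss = F.mse_loss(shared_dit(x_t, t, scale), v_target)   # shared_dit: the single DiT
```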
Quick Start & Requirements
Create an environment with python==3.8.10 and pytorch==2.1.2, then install the requirements (pip install -r requirements.txt).
Download a model checkpoint from Hugging Face (rain1011/pyramid-flow-miniflux or rain1011/pyramid-flow-sd3).
Run python app.py for a Gradio demo, or use the provided inference code (video_generation_demo.ipynb).
To reduce GPU memory usage, enable CPU offloading with model.enable_sequential_cpu_offload(). The MPS backend is also supported for Apple Silicon.
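For reference, a condensed text-to-video sketch is shown below. It is loosely adapted from the repository's demo, so treat the exact API as an assumption: the constructor and generate() arguments (model_dtype, model_variant, temp, video_guidance_scale, etc.) should be verified against video_generation_demo.ipynb, and the path and prompt are placeholders.

```python
# Text-to-video inference sketch; verify argument names and values against
# video_generation_demo.ipynb for your checkpoint. Path and prompt are placeholders.
import torch
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import export_to_video

model = PyramidDiTForVideoGeneration(
    "PATH/TO/pyramid-flow-miniflux",              # downloaded checkpoint directory
    model_dtype="bf16",                           # bf16 is supported; fp16 is not yet
    model_variant="diffusion_transformer_768p",
)
model.vae.enable_tiling()
model.enable_sequential_cpu_offload()             # trades speed for lower GPU memory

with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    frames = model.generate(
        prompt="A drone shot of waves crashing against rugged cliffs at sunset",
        num_inference_steps=[20, 20, 20],         # per-stage steps for the first frame
        video_num_inference_steps=[10, 10, 10],   # per-stage steps for later frames
        height=768,
        width=1280,
        temp=16,                                  # number of latent frames (~5 s at 24 FPS)
        guidance_scale=9.0,
        video_guidance_scale=5.0,
        output_type="pil",
        save_memory=True,
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)
```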
Highlighted Details
Maintenance & Community
The project is actively updated, with recent releases including a 768p miniFLUX checkpoint, training code for VAE and DiT finetuning, and improved inference capabilities. Community contributions are acknowledged, and a Hugging Face Space demo is available.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking would require clarification of the license.
Limitations & Caveats
The README notes that the bf16 dtype is supported, but fp16 is not yet supported by the model. Training the VAE and finetuning the DiT require substantial hardware (at least 8 A100 GPUs).