Pusa-VidGen by Yaofang-Liu

Video diffusion model with vectorized timestep adaptation

Created 5 months ago
631 stars

Top 52.5% on SourcePulse

Project Summary

Pusa-VidGen introduces a vectorized timestep adaptation (VTA) technique for video diffusion models, enabling fine-grained temporal control and multi-task capabilities. Aimed at researchers and developers in AI video generation, it reports orders-of-magnitude reductions in training cost and dataset size compared to existing state-of-the-art models.

How It Works

Pusa employs frame-level noise control via vectorized timesteps, a departure from traditional scalar timestep methods. This approach, detailed in the FVDM paper, allows for non-destructive adaptation of base models like Wan-Video and Mochi, preserving their original capabilities while enabling new functionalities such as image-to-video, start-end frame generation, video extension, and transitions without task-specific training.
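As a rough illustration of the idea (a minimal sketch, not the repository's actual API: the tensor layout, function names, and the simple interpolation-style noising rule below are assumptions), a vectorized timestep assigns each frame its own noise level, so a clean conditioning frame and fully noised frames can coexist in a single denoising pass:

    import torch

    # Hypothetical sketch: per-frame (vectorized) timesteps instead of one scalar t.
    # Video latents are assumed to be [batch, frames, channels, height, width].
    def add_noise_per_frame(latents, noise, timesteps, sigmas):
        # timesteps: [B, F] integer index per frame; sigmas: [T] noise schedule
        sigma = sigmas[timesteps].view(*timesteps.shape, 1, 1, 1)  # broadcast over C, H, W
        return (1 - sigma) * latents + sigma * noise  # interpolation-style noising (assumed)

    B, F, C, H, W = 1, 16, 4, 32, 32
    latents = torch.randn(B, F, C, H, W)
    noise = torch.randn_like(latents)
    sigmas = torch.linspace(0.0, 1.0, 1000)

    # Image-to-video style conditioning: frame 0 stays clean, all other frames are fully noised.
    t = torch.full((B, F), 999)
    t[:, 0] = 0
    noisy = add_noise_per_frame(latents, noise, t, sigmas)

In this view, image-to-video, start-end frame generation, video extension, and transitions differ only in which frames are held at a clean (low) timestep, which is why no task-specific training is required.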

Quick Start & Requirements

  • Install: Clone the repository, cd into it, and use uv for installation:
    git clone https://github.com/genmoai/models
    cd models
    pip install uv
    uv venv .venv
    source .venv/bin/activate
    uv pip install setuptools
    uv pip install -e . --no-build-isolation
    
    For Flash Attention: uv pip install -e .[flash] --no-build-isolation
  • Weights: Download with the Hugging Face CLI (huggingface-cli download RaphaelLiu/Pusa-V0.5 --local-dir <local_dir>) or directly from Hugging Face; a complete command is shown after this list.
  • Prerequisites: Python, uv, and one or more GPUs (some example scripts expect multiple GPUs).
  • Docs: Pusa V1.0 README
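A full download command with an explicit target directory (the path below is only an example) looks like:

    huggingface-cli download RaphaelLiu/Pusa-V0.5 --local-dir ./Pusa-V0.5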

Highlighted Details

  • Achieves a VBench-I2V score of 87.32%, surpassing Wan-I2V-14B.
  • Training cost: ≤ $500 vs. ≥ $100,000 for Wan-I2V-14B.
  • Dataset size: ≤ 4K samples vs. ≥ 10M samples.
  • Supports Text-to-Video, Image-to-Video, Start-End Frames, Video Extension, and Video Transition.

Maintenance & Community

V1.0, based on Wan-Video models, was released in July 2025 with code, a technical report, and a dataset. V0.5, based on Mochi, was released earlier with inference scripts. The project welcomes collaboration.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Video generation quality is dependent on the base model used (e.g., Wan-T2V-14B for V1.0). The project anticipates further quality improvements with more advanced base models and welcomes community contributions.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 46 stars in the last 30 days

Explore Similar Projects

  • SkyReels-V2 by SkyworkAI — Film generation model for infinite-length videos using diffusion forcing. 4k stars; created 5 months ago, updated 1 month ago. Starred by Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI), Jiaming Song (Chief Scientist at Luma AI), and 1 more.