Video diffusion model training/inference
This repository implements "Self-Forcing," a technique to bridge the train-test distribution gap in autoregressive video diffusion models. It enables real-time, streaming video generation with quality comparable to state-of-the-art models, targeting researchers and developers working on advanced video synthesis.
How It Works
Self-Forcing simulates the inference process during training by performing autoregressive rollouts with KV caching. This approach directly addresses the mismatch between how models are trained and how they are used for generation, leading to more stable and efficient inference.
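A minimal sketch of that training loop in plain PyTorch is shown below. The model interface (init_kv_cache, denoise_step, update_kv_cache, latent_shape) and the loss function are hypothetical stand-ins for illustration, not this repository's actual API; the self_forcing_dmd config name suggests the real objective is a distribution-matching (DMD-style) loss.

import torch

def self_forcing_rollout(model, prompt_emb, num_frames, denoise_steps=4):
    # Generate each latent frame conditioned on the model's OWN previous
    # outputs, reusing a KV cache exactly as it would be used at inference time.
    kv_cache = model.init_kv_cache()                      # hypothetical helper
    frames = []
    for _ in range(num_frames):
        latent = torch.randn(model.latent_shape)          # start from noise
        for _ in range(denoise_steps):
            # each denoising step attends to earlier frames via cached keys/values
            latent = model.denoise_step(latent, prompt_emb, kv_cache)
        kv_cache = model.update_kv_cache(kv_cache, latent)
        frames.append(latent)
    return torch.stack(frames, dim=1)                     # (batch, frames, ...)

def training_step(model, loss_fn, prompt_emb, num_frames):
    # Gradients flow through the rollout, so the generator learns to condition
    # on its own imperfect history instead of ground-truth frames.
    video = self_forcing_rollout(model, prompt_emb, num_frames)
    loss = loss_fn(video, prompt_emb)
    loss.backward()
    return loss.detach()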
Quick Start & Requirements
Create a conda environment (conda create -n self_forcing python=3.10 -y), activate it (conda activate self_forcing), and install dependencies:

pip install -r requirements.txt
pip install flash-attn --no-build-isolation
python setup.py develop

Run the demo with python demo.py, or run batch inference over a prompt list with:

python inference.py --config_path configs/self_forcing_dmd.yaml --output_folder videos/self_forcing_dmd --checkpoint_path checkpoints/self_forcing_dmd.pt --data_path prompts/MovieGenVideoBench_extended.txt --use_ema

The base model and the Self-Forcing checkpoint are downloaded with:

huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
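If you prefer Python over the CLI, the same two downloads can be done with the huggingface_hub API; this simply mirrors the commands above and is not required by the repository.

from huggingface_hub import hf_hub_download, snapshot_download

# Base Wan2.1 text-to-video weights
snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B",
                  local_dir="wan_models/Wan2.1-T2V-1.3B")

# Self-Forcing checkpoint (single file)
hf_hub_download(repo_id="gdhe17/Self-Forcing",
                filename="checkpoints/self_forcing_dmd.pt",
                local_dir=".")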
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Speed can be further improved with torch.compile, TAEHV-VAE, or FP8 Linear layers, with potential quality trade-offs.
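For example, the generator's forward pass could be wrapped with torch.compile before generation; the pipeline and generator attribute names below are assumptions for illustration, not the repository's API, and any speedup comes at the cost of extra compile time.

import torch

def maybe_compile(pipeline, enabled=True):
    # Optionally compile the diffusion generator's forward pass for faster
    # repeated calls; silently skipped on PyTorch versions without torch.compile.
    if enabled and hasattr(torch, "compile"):
        pipeline.generator = torch.compile(pipeline.generator)
    return pipeline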