Causal-Forcing by thu-ml

Real-time, high-quality video generation framework

Created 3 weeks ago

378 stars

Top 75.6% on SourcePulse

Project Summary

Causal Forcing addresses high-quality, real-time interactive video generation with an autoregressive diffusion distillation approach. It targets researchers and engineers working on video synthesis. It reports better visual quality and motion dynamics than prior methods such as Self Forcing, and it supports streaming generation on a single consumer GPU (an RTX 4090).

How It Works

The core innovation is "Causal Forcing," an autoregressive diffusion distillation technique that improves visual fidelity and motion coherence. It ships in two variants: frame-wise models, which favor expressiveness, and chunk-wise models, which favor stability. A more recent addition, causal consistency distillation (Causal CD), is a more data-efficient alternative to ODE distillation. It removes the need for ODE-paired data and so simplifies the training pipeline.
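
One way to picture the frame-wise versus chunk-wise distinction is as a difference in the attention mask used during autoregressive generation. The sketch below is an illustrative assumption and is not taken from the Causal-Forcing code. It builds a block-causal mask in which each frame attends to every frame in its own chunk and in all earlier chunks. With `chunk_size=1` this reduces to a strictly frame-wise causal mask, and a larger `chunk_size` gives a chunk-wise one. The names `block_causal_mask` and `render` are hypothetical.

```python
from typing import List


def block_causal_mask(num_frames: int, chunk_size: int) -> List[List[bool]]:
    """Return mask[q][k] == True when query frame q may attend to key frame k.

    Hypothetical illustration: frames are grouped into chunks of `chunk_size`.
    A frame sees all frames in its own chunk and in every earlier chunk,
    but never frames from a later chunk.
    """
    if num_frames < 1 or chunk_size < 1:
        raise ValueError("num_frames and chunk_size must be positive")
    mask = []
    for q in range(num_frames):
        # Index of the last frame in q's chunk; keys up to it are visible.
        last_visible = (q // chunk_size + 1) * chunk_size - 1
        mask.append([k <= last_visible for k in range(num_frames)])
    return mask


def render(mask: List[List[bool]]) -> str:
    """Render the mask as rows of '1' (attend) and '.' (blocked)."""
    return "\n".join("".join("1" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print("frame-wise (chunk_size=1):")
    print(render(block_causal_mask(6, 1)))
    print("chunk-wise (chunk_size=3):")
    print(render(block_causal_mask(6, 3)))
```

With `chunk_size=3`, frames 0 to 2 attend to one another bidirectionally while still never seeing later chunks. Whether Causal Forcing's chunk-wise variant uses exactly this layout is an assumption here, not something this summary states.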

Quick Start & Requirements

Installation involves creating a Python 3.10 Conda environment, installing the dependencies in requirements.txt, and then installing additional packages such as CLIP and flash-attn. Inference requires downloading the pre-trained frame-wise or chunk-wise checkpoints with the Hugging Face CLI. Training runs in several stages, including autoregressive diffusion training, ODE initialization (or the newer Causal CD), and DMD (Distribution Matching Distillation). It typically uses distributed launches via torchrun and large datasets, with about 300GB of ODE data needed for the ODE-initialization route. Real-time inference is demonstrated on an RTX 4090.
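
The staged training recipe can be summarized as an ordered plan whose middle stage is either ODE initialization or Causal CD. The Python sketch below encodes only what this summary states: the order of the listed stages, the choice between ODE initialization and Causal CD, and the fact that the ODE route consumes ODE-paired data. The `Stage` type, its fields, the stage names, and `training_plan` are hypothetical. They do not mirror the repository's actual scripts or configs, and the flags on the other stages are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure for illustration)."""
    name: str
    needs_ode_pairs: bool  # True if the stage consumes ODE-paired data


def training_plan(use_causal_cd: bool) -> List[Stage]:
    """Return the listed stages in order: AR diffusion -> init -> DMD.

    The initialization stage is either ODE initialization (needs ODE-paired
    data, ~300GB per the project summary) or causal consistency
    distillation (Causal CD), which does not need ODE pairs.
    """
    init = (
        Stage("causal_cd", needs_ode_pairs=False)
        if use_causal_cd
        else Stage("ode_init", needs_ode_pairs=True)
    )
    return [
        # Assumption: neither of these stages consumes ODE-paired data.
        Stage("autoregressive_diffusion", needs_ode_pairs=False),
        init,
        Stage("dmd", needs_ode_pairs=False),
    ]


if __name__ == "__main__":
    for flag in (False, True):
        plan = training_plan(use_causal_cd=flag)
        names = " -> ".join(s.name for s in plan)
        needs_pairs = any(s.needs_ode_pairs for s in plan)
        print(f"use_causal_cd={flag}: {names} (ODE pairs needed: {needs_pairs})")
```

Keeping the initialization stage as a single switch shows why Causal CD simplifies data preparation: choosing it removes the only stage in this sketch that depends on the large ODE-paired dataset.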

Highlighted Details

  • Enables real-time, streaming video generation on a single RTX 4090.
  • Achieves superior visual quality and motion dynamics compared to Self Forcing.
  • Provides distinct frame-wise (high expressiveness) and chunk-wise (high stability) model options.
  • Introduces causal consistency distillation (Causal CD), which removes the need for ODE-paired data and simplifies data preparation.
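
Streaming generation means frames are emitted chunk by chunk as soon as each is produced, rather than after the whole clip has been denoised. The toy Python sketch below shows that loop under stated assumptions. `toy_denoise_step` is a hypothetical stand-in for the distilled generator, "frames" are single floats, and the growing `context` list stands in for whatever cache of past frames a real model keeps (for example, a KV cache). None of these names come from the Causal-Forcing repository.

```python
import random
from typing import Iterator, List


def toy_denoise_step(x: List[float], context: List[float], t: float) -> List[float]:
    """Hypothetical denoiser: pull noisy frames toward the last context frame.

    A real model would be a neural network conditioned on past frames; here
    the 'clean' target is simply the most recent generated frame (or 0.0).
    """
    target = context[-1] if context else 0.0
    # Interpolate toward the target; smaller t means closer to clean.
    return [t * xi + (1.0 - t) * target for xi in x]


def stream_chunks(
    num_chunks: int,
    chunk_size: int,
    timesteps: List[float],
    seed: int = 0,
) -> Iterator[List[float]]:
    """Yield chunks one at a time, each conditioned only on earlier chunks."""
    rng = random.Random(seed)
    context: List[float] = []  # stands in for a cache of past frames
    for _ in range(num_chunks):
        x = [rng.gauss(0.0, 1.0) for _ in range(chunk_size)]  # start from noise
        for t in timesteps:  # a few sampling steps per chunk
            x = toy_denoise_step(x, context, t)
        context.extend(x)  # later chunks can condition on this one
        yield x  # emitted immediately: this is the streaming part


if __name__ == "__main__":
    for i, chunk in enumerate(stream_chunks(3, 3, [0.75, 0.5, 0.25, 0.0])):
        print(i, [round(v, 3) for v in chunk])
```

Because `stream_chunks` is a generator, a caller can display the first chunk before later chunks exist. Setting `chunk_size=1` turns the same loop into a frame-wise rollout.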

Maintenance & Community

The project is associated with Tsinghua University and UT Austin, and its key contributors are listed. A Chinese-language blog and Q&A are available. No explicit community channels (Slack/Discord) or roadmap links are provided.

Licensing & Compatibility

The repository's license is not specified in the README, making its terms for commercial use or closed-source integration indeterminate.

Limitations & Caveats

The recently introduced Causal CD is described as an early-stage preview whose implementation may be suboptimal. Training requires significant computational resources and large datasets. The absence of a clear license is the primary adoption blocker for commercial applications.

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 12
Star History: 382 stars in the last 24 days

Explore Similar Projects

Starred by Zhuohan Li (coauthor of vLLM), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 2 more.

FastVideo by hao-ai-lab

0.5%
3k
Framework for accelerated video generation
Created 1 year ago
Updated 1 day ago