DiT-Extrapolation by thu-ml

Enhancing video diffusion transformers for extended temporal generation

Created 1 year ago

785 stars

Top 44.7% on SourcePulse

Project Summary

This repository provides open-source implementations of RIFLEx and UltraViCo, two methods that address length extrapolation in Video Diffusion Transformers. Both work as plug-and-play enhancements to existing SOTA video generation models, letting them produce longer videos than they were trained on. The target audience is researchers and practitioners who want longer outputs without extensive retraining.

How It Works

RIFLEx achieves length extrapolation by modifying the Rotary Positional Embeddings (RoPE) with a single line of code. The core idea is to lower the intrinsic frequency of RoPE so that the extrapolated sequence still fits within a single period of that component, which prevents repetition artifacts. The change ships as a lightweight, plug-and-play module, offering a "free lunch" for extending video length.
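
The frequency adjustment described above can be illustrated with a minimal NumPy sketch. It is not the repository's code: the function names, the 0-based index `k`, the frame counts, and the `margin` factor are all assumptions chosen for illustration, and how the intrinsic component is actually identified is not covered here.

```python
import numpy as np


def rope_frequencies(dim, theta=10000.0):
    """Standard 1D RoPE frequencies, one per pair of channels."""
    return 1.0 / (theta ** (np.arange(0, dim, 2, dtype=np.float64) / dim))


def lower_intrinsic_frequency(freqs, k, target_len, margin=0.9):
    """Return a copy of `freqs` with component `k` lowered so that
    `target_len` positions fit inside a single period.

    `k` (0-based here) and `margin` are illustrative assumptions.
    """
    adjusted = freqs.copy()
    # The "single line": one period (2*pi / frequency) now exceeds target_len.
    adjusted[k] = margin * 2.0 * np.pi / target_len
    return adjusted


# Hypothetical numbers: a model trained on 33 frames, extended to 66 frames.
train_len, target_len = 33, 66
freqs = rope_frequencies(dim=16)
k = 2  # assumed index of the intrinsic frequency for this toy setup

adjusted = lower_intrinsic_frequency(freqs, k, target_len)
original_period = 2.0 * np.pi / freqs[k]
adjusted_period = 2.0 * np.pi / adjusted[k]

print(f"original period: {original_period:.1f} frames")
print(f"adjusted period: {adjusted_period:.1f} frames")
```

In this toy setup the chosen component completes less than one period over the training length but more than one over the target length, which is the situation in which repetition would appear. After the adjustment its period exceeds the target length, and every other frequency is left untouched.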

Quick Start & Requirements

Installation involves creating a conda environment (python=3.10), installing dependencies with pip install -r requirements.txt, and then running pip install -U bitsandbytes. Example prompts and inference scripts are provided for HunyuanVideo and CogVideoX, covering both single-GPU inference (via Diffusers) and multi-GPU inference. Fine-tuned models are linked on Hugging Face.

Highlighted Details

RIFLEx was accepted to ICML 2025.
Successfully applied to SOTA models like HunyuanVideo (5s -> 11s extrapolation) and CogVideoX-5B (6s -> 12s extrapolation).
Integrated into official repositories (HunyuanVideo-I2V) and community projects (ComfyUI-HunyuanVideoWrapper).
Supports generation of longer videos (e.g., 10.5s at 1280x720 on RTX 4090 with RIFLEx).

Maintenance & Community

The project originates from Tsinghua University, and the repository lists its key contributors. Code for RIFLEx and UltraViCo lives on separate branches. Development is ongoing, with UltraImage support planned. No community channels (Discord, Slack) or roadmap are listed.

Licensing & Compatibility

The repository states that the code is "fully open source" but does not name a specific license (e.g., MIT, Apache 2.0). Commercial use or integration into closed-source projects therefore requires clarification from the authors.

Limitations & Caveats

Support for UltraImage is marked as "to do." Single-GPU inference through Diffusers relies on bitsandbytes, which may affect performance. The absence of a named open-source license remains a significant adoption blocker.
