DiT-Extrapolation by thu-ml

Enhancing video diffusion transformers for extended temporal generation

Created 10 months ago
769 stars

Top 45.4% on SourcePulse

Project Summary

This repository provides open-source implementations of RIFLEx and UltraViCo, which address length extrapolation in video diffusion transformers. Both work as plug-and-play enhancements: they let existing state-of-the-art video generation models produce videos longer than those they were trained on. The repository is aimed at researchers and practitioners who want to extend video generation length without extensive retraining.

How It Works

RIFLEx achieves length extrapolation by modifying the Rotary Position Embedding (RoPE) with a single line of code. The core idea is to lower the intrinsic frequency of RoPE so that the extrapolated sequence stays within a single period, which prevents repetition artifacts. The change is packaged as a lightweight, plug-and-play module, which the authors describe as a "free lunch" for extending video length.
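The frequency adjustment described above can be illustrated with a minimal NumPy sketch. It is not the repository's implementation: the function names `rope_freqs` and `adjust_intrinsic_freq`, the 0-based index `k`, the `margin` value of 0.9, and the sequence length of 66 positions are all assumptions chosen for illustration, and how the intrinsic component is identified for a given model is left outside the sketch.

```python
import numpy as np


def rope_freqs(dim, theta=10000.0):
    """Standard 1-D RoPE angular frequencies, one per pair of channels."""
    return 1.0 / (theta ** (np.arange(0, dim, 2, dtype=np.float64) / dim))


def adjust_intrinsic_freq(freqs, k, target_len, margin=0.9):
    """Return a copy of `freqs` with component `k` (0-based) set so that
    `target_len` positions cover only `margin` of one full period.

    `k` and `margin` are illustrative assumptions, not values from the repo.
    """
    out = freqs.copy()
    out[k] = margin * 2.0 * np.pi / target_len
    return out


# Toy setup (hypothetical numbers): 16 temporal channels, 66 target positions.
freqs = rope_freqs(dim=16)
k = 2                 # assumed index of the intrinsic frequency in this toy setup
target_len = 66       # assumed extrapolated sequence length

old_period = 2.0 * np.pi / freqs[k]       # roughly 62.8 positions: shorter than 66
new_freqs = adjust_intrinsic_freq(freqs, k, target_len)
new_period = 2.0 * np.pi / new_freqs[k]   # roughly 73.3 positions: longer than 66
```

In this toy setup the original component would complete a full period before position 66, so the positional signal would start to repeat. After the adjustment the whole target length fits inside one period, and every other frequency is left unchanged.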

Quick Start & Requirements

Installation has three steps: create a conda environment with python=3.10, install dependencies with pip install -r requirements.txt, and run pip install -U bitsandbytes. Example prompts and inference scripts are provided for HunyuanVideo and CogVideoX, covering both single-GPU inference (via Diffusers) and multi-GPU inference. Links to fine-tuned models are available on Hugging Face.

Highlighted Details

  • RIFLEx was accepted to ICML 2025.
  • Successfully applied to SOTA models like HunyuanVideo (5s -> 11s extrapolation) and CogVideoX-5B (6s -> 12s extrapolation).
  • Integrated into official repositories (HunyuanVideo-I2V) and community projects (ComfyUI-HunyuanVideoWrapper).
  • Supports generation of longer videos (e.g., 10.5s at 1280x720 on RTX 4090 with RIFLEx).

Maintenance & Community

The project originates from Tsinghua University, and the repository lists its key contributors. Code for RIFLEx and UltraViCo is available on separate branches. Development is ongoing, with UltraImage support planned. No community channels (Discord/Slack) or roadmap are listed.

Licensing & Compatibility

The repository states the code is "fully open source," but a specific license type (e.g., MIT, Apache 2.0) is not explicitly mentioned. This lack of explicit licensing requires clarification for commercial use or integration into closed-source projects.

Limitations & Caveats

Support for UltraImage is marked as "to do." Single-GPU inference using Diffusers with bitsandbytes may impact performance. The absence of a clearly defined open-source license is a significant adoption blocker requiring further investigation.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 13 stars in the last 30 days
