thu-ml: Enhancing video diffusion transformers for extended temporal generation
Top 45.4% on SourcePulse
This repository provides open-source implementations of RIFLEx and UltraViCo, two methods that address length extrapolation in Video Diffusion Transformers. Both are plug-and-play enhancements for existing state-of-the-art video generation models, letting them produce videos longer than those they were trained on. The project targets researchers and practitioners who want to extend video generation length without extensive retraining.
How It Works
RIFLEx achieves length extrapolation by modifying the Rotary Positional Embeddings (RoPE) with a single line of code. The core innovation involves adjusting the intrinsic frequency of RoPE to ensure that extrapolated segments remain within a single period, preventing repetition artifacts. This approach is integrated as a lightweight, plug-and-play module, offering a "free lunch" for extending video length.
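The frequency adjustment described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the repository's code: the function names, the component index k, the target length, and the 0.9 safety margin are all hypothetical choices for the example, and how the intrinsic component is identified for a given model is not shown.

```python
import math


def rope_frequencies(dim, theta=10000.0):
    """Standard RoPE frequencies theta^(-2i/dim) for i in [0, dim/2)."""
    return [theta ** (-(2 * i) / dim) for i in range(dim // 2)]


def period(freq):
    """Number of positions after which a rotation at this frequency repeats."""
    return 2 * math.pi / freq


def lower_intrinsic_frequency(freqs, k, target_len, margin=0.9):
    """Return a copy of freqs with component k slowed down.

    Component k (assumed to be the intrinsic one) gets the frequency
    margin * 2*pi / target_len, so its period is target_len / margin,
    which exceeds target_len whenever margin < 1.
    """
    adjusted = list(freqs)
    adjusted[k] = margin * 2 * math.pi / target_len  # the single modified line
    return adjusted


if __name__ == "__main__":
    freqs = rope_frequencies(dim=16)
    k, target_len = 2, 98  # hypothetical component index and target frame count
    adjusted = lower_intrinsic_frequency(freqs, k, target_len)
    print("period before:", period(freqs[k]))
    print("period after: ", period(adjusted[k]))
```

Only one entry of the frequency list changes, which matches the "single line of code" framing: every other component keeps its original value, so the rest of the positional encoding is untouched.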
Quick Start & Requirements
To install, create a conda environment with python=3.10, install the dependencies with pip install -r requirements.txt, then run pip install -U bitsandbytes. Example prompts and inference scripts are provided for HunyuanVideo and CogVideoX, supporting both single-GPU inference (via Diffusers) and multi-GPU inference. Links to fine-tuned models are available on Hugging Face.
Maintenance & Community
The project originates from Tsinghua University, and the repository lists its key contributors. Code for RIFLEx and UltraViCo is available on separate branches. Development is ongoing, with UltraImage support planned. No community channels (Discord/Slack) or roadmap are mentioned.
Licensing & Compatibility
The repository states the code is "fully open source," but a specific license type (e.g., MIT, Apache 2.0) is not explicitly mentioned. This lack of explicit licensing requires clarification for commercial use or integration into closed-source projects.
Limitations & Caveats
Support for UltraImage is marked as "to do." Single-GPU inference using Diffusers with bitsandbytes may impact performance. The absence of a clearly defined open-source license is a significant adoption blocker requiring further investigation.