Research paper for trajectory-oriented video generation using diffusion transformers
Tora is a framework for trajectory-oriented diffusion transformer-based video generation, enabling concurrent control over textual, visual, and motion conditions. It targets researchers and developers in AI video generation seeking precise control over video dynamics and physical movement simulation.
How It Works
Tora integrates a Trajectory Extractor (TE) and a Motion-guidance Fuser (MGF) with a Diffusion Transformer (DiT) architecture. The TE encodes arbitrary trajectories into hierarchical spacetime motion patches using a 3D video compression network. The MGF then fuses these motion patches into DiT blocks, facilitating the generation of videos that adhere to specified trajectories, offering control over duration, aspect ratio, and resolution.
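The section above does not pin down the exact fusion operator, so the following is a minimal PyTorch sketch of one plausible scheme: adaptive-normalization-style modulation with a zero-initialized projection. The class name MotionGuidanceFuser, the dimensions, and the layer choices are illustrative assumptions, not the repository's implementation; the trajectory-to-motion-patch encoding step (the TE) is assumed to have already produced token-aligned motion features.

```python
import torch
import torch.nn as nn

class MotionGuidanceFuser(nn.Module):
    """Illustrative fuser: modulates DiT hidden states with motion patches.

    A sketch of the general idea only; the actual Tora MGF architecture
    and layer names may differ.
    """

    def __init__(self, hidden_dim: int, motion_dim: int):
        super().__init__()
        # Project motion patches to per-token scale/shift (adaptive-norm style).
        self.to_scale_shift = nn.Linear(motion_dim, 2 * hidden_dim)
        # Zero-init so motion guidance starts as an identity mapping.
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, hidden: torch.Tensor, motion: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, tokens, hidden_dim) DiT spacetime tokens
        # motion: (batch, tokens, motion_dim) motion patches aligned to those tokens
        scale, shift = self.to_scale_shift(motion).chunk(2, dim=-1)
        return hidden + self.norm(hidden) * scale + shift

# Toy usage: 2 videos, 1024 spacetime tokens, 1152-dim DiT states, 128-dim motion patches
fuser = MotionGuidanceFuser(hidden_dim=1152, motion_dim=128)
hidden = torch.randn(2, 1024, 1152)
motion = torch.randn(2, 1024, 128)
out = fuser(hidden, motion)  # same shape as hidden
```

Zero-initializing the projection makes the fuser an identity at the start of training, so motion guidance of this kind can be attached to a pretrained DiT without disturbing its existing text-to-video behavior.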
Quick Start & Requirements
Run pip install -e . in the modules/SwissArmyTransformer directory, then pip install -r requirements.txt in the sat directory. Downloaded model checkpoints go in Tora/sat/ckpts. Note that Tora weights require adherence to the CogVideoX License.
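The same steps collected into one sequence, assuming modules/ and sat/ both sit at the repository root (the layout is inferred from the instructions above, not from the project's own scripts):

```bash
# from the repository root
cd modules/SwissArmyTransformer
pip install -e .                  # install SwissArmyTransformer in editable mode

cd ../../sat
pip install -r requirements.txt   # install the sat pipeline's dependencies

# place downloaded Tora / CogVideoX checkpoints under Tora/sat/ckpts
```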
Maintenance & Community
The project is actively updated, with recent releases including Image-to-Video functionality, diffusers integration, and training code. It acknowledges contributions from CogVideo, Open-Sora, and MotionCtrl.
Licensing & Compatibility
Model weights require adherence to the CogVideoX License. The project's licensing for code is not explicitly stated in the README, but its reliance on CogVideoX suggests potential commercial use restrictions.
Limitations & Caveats
The initial release (CogVideoX version) is intended for academic research only, and the authors note that commercial plans may prevent full open-sourcing. They also recommend refining text prompts with GPT-4 for best results.