Video generation with 3D trajectory control for multiple entities
3DTrajMaster enables precise control over multi-entity motion in text-to-video generation by incorporating entity-specific 3D trajectories. It allows users to define complex movements, orientations, and interactions for diverse entities (humans, animals, objects) across various backgrounds, offering fine-grained control over visual attributes and scene composition.
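As an illustration, a per-entity trajectory can be thought of as a sequence of 6-DoF poses (a 3D translation plus a rotation) sampled per frame. The layout below is a hypothetical sketch for exposition, not the repository's actual input schema:

```python
import numpy as np

# Hypothetical layout for entity-specific trajectories (illustrative only;
# not the repository's actual input format). Each entity gets one 6-DoF
# pose per frame: a 3D translation plus a rotation (Euler angles here).
num_frames = 49  # assumed clip length

yaw = np.linspace(0.0, np.pi / 2, num_frames)  # bear slowly turns 90 degrees

trajectories = {
    "a brown bear": {
        "translation": np.linspace([0.0, 0.0, 5.0], [2.0, 0.0, 3.0], num_frames),  # (F, 3)
        "rotation": np.stack([np.zeros(num_frames), yaw, np.zeros(num_frames)], axis=1),  # (F, 3)
    },
    "a red car": {
        "translation": np.linspace([-2.0, 0.0, 4.0], [2.0, 0.0, 4.0], num_frames),
        "rotation": np.zeros((num_frames, 3)),  # car keeps a fixed orientation
    },
}
```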
How It Works
The core innovation is a plug-and-play 3D-motion grounded object injector. This module integrates 6 Degrees of Freedom (DoF) pose embeddings with entity prompts. It employs a gated self-attention mechanism to fuse trajectory and entity information, allowing the diffusion model to condition video generation on specific 3D spatial and temporal movements, enabling complex occlusions and rotations.
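The sketch below shows one way such a gated self-attention injector could look. The layer dimensions, zero-initialized gate, and token layout are assumptions made for exposition, not the official 3DTrajMaster implementation:

```python
import torch
import torch.nn as nn

class GatedSelfAttentionInjector(nn.Module):
    """Illustrative sketch of a plug-and-play injector that fuses 6-DoF pose
    embeddings with entity-prompt embeddings via gated self-attention."""

    def __init__(self, dim: int = 1024, num_heads: int = 16):
        super().__init__()
        self.pose_proj = nn.Linear(6, dim)  # per-frame [x, y, z, rx, ry, rz] -> dim
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Zero-initialized gate: the injector starts as an identity mapping,
        # so the pretrained video backbone is untouched before fine-tuning.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, video_tokens, entity_tokens, poses):
        # video_tokens:  (B, N, dim)   latent tokens from the diffusion backbone
        # entity_tokens: (B, E, dim)   text embeddings of each entity prompt
        # poses:         (B, E, F, 6)  6-DoF pose per entity per frame
        B, E, F, _ = poses.shape
        pose_tokens = self.pose_proj(poses).reshape(B, E * F, -1)
        # Concatenate visual, entity, and pose tokens and self-attend,
        # letting trajectory information ground each entity.
        ctx = self.norm(torch.cat([video_tokens, entity_tokens, pose_tokens], dim=1))
        fused, _ = self.attn(ctx, ctx, ctx)
        # Gated residual update applied only to the video-token slice.
        return video_tokens + torch.tanh(self.gate) * fused[:, : video_tokens.shape[1]]
```

The zero-initialized gate is a common trick for bolt-on conditioning modules: training can open the gate gradually without degrading the base model's generations at the start.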
Quick Start & Requirements
- Create the environment with `conda create -n 3dtrajmaster python=3.10`, then install dependencies with `pip install -r requirements.txt`.
- Download the pretrained checkpoints (CogVideoX-5B base model, LoRA, and injector) and the dataset.
- Run inference with `python 3dtrajmaster_inference.py`, passing the model paths, LoRA scale, and sampling steps.
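Collected into one runnable sequence for convenience; the checkpoint paths and flag names below are placeholders, not the script's documented interface:

```bash
# Environment setup (commands from the README)
conda create -n 3dtrajmaster python=3.10
conda activate 3dtrajmaster
pip install -r requirements.txt

# Inference; --model_path, --lora_path, --lora_scale, and --num_sampling_steps
# are illustrative flag names -- check the script's --help for the real ones.
python 3dtrajmaster_inference.py \
    --model_path ./checkpoints/CogVideoX-5B \
    --lora_path ./checkpoints/lora \
    --lora_scale 1.0 \
    --num_sampling_steps 50
```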
Maintenance & Community
The project accompanies an ICLR 2025 paper from The Chinese University of Hong Kong and Kuaishou Technology. Links to a project page, dataset, and evaluation code are provided.
Licensing & Compatibility
The README does not explicitly state a license for the codebase or models. The dataset is released under an internal license, and some assets remain under internal license review before a broader release.
Limitations & Caveats
The proprietary internal video model is not publicly released. The publicly available codebase is based on CogVideoX-5B, which may differ in performance from the internal model. The full dataset with more entities and scenes is under internal license review.