3DTrajMaster by KwaiVGI

Video generation with 3D trajectory control for multiple entities

created 8 months ago
353 stars


Project Summary

3DTrajMaster enables precise control over multi-entity motion in text-to-video generation by incorporating entity-specific 3D trajectories. It allows users to define complex movements, orientations, and interactions for diverse entities (humans, animals, objects) across various backgrounds, offering fine-grained control over visual attributes and scene composition.

How It Works

The core innovation is a plug-and-play 3D-motion grounded object injector. This module pairs each entity prompt with that entity's 6-degrees-of-freedom (6 DoF) pose embeddings. A gated self-attention mechanism fuses the trajectory and entity information, so the diffusion model conditions video generation on each entity's movement through 3D space over time. This lets it render entities that occlude one another or rotate in place.
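The fusion step can be illustrated with a small NumPy sketch. This is a hypothetical, GLIGEN-style gated self-attention, not the repository's code. The function names, the additive fusion of prompt and pose embeddings in make_entity_tokens, the 12-dim pose vector, and the single-head attention are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over an (n, d) token matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def make_entity_tokens(prompt_embeds, poses, w_pose):
    """Hypothetical fusion: add projected pose vectors to entity prompt embeddings.
    prompt_embeds: (n_entities, d); poses: (n_entities, 12); w_pose: (12, d)."""
    return prompt_embeds + poses @ w_pose

def gated_object_injection(video_tokens, entity_tokens, w_q, w_k, w_v, gamma):
    """Attend over [video; entity] tokens, keep only the video positions,
    and add them back to the video tokens through a tanh(gamma) gate."""
    n_video = video_tokens.shape[0]
    joint = np.concatenate([video_tokens, entity_tokens], axis=0)
    attended = self_attention(joint, w_q, w_k, w_v)[:n_video]
    return video_tokens + np.tanh(gamma) * attended

# Toy usage with random weights (all sizes are illustrative).
rng = np.random.default_rng(0)
d = 8
video = rng.normal(size=(16, d))      # 16 video latent tokens
prompts = rng.normal(size=(2, d))     # 2 entity prompt embeddings
poses = rng.normal(size=(2, 12))      # one 12-dim pose vector per entity
w_pose = rng.normal(size=(12, d)) * 0.1
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

entities = make_entity_tokens(prompts, poses, w_pose)
out = gated_object_injection(video, entities, w_q, w_k, w_v, gamma=0.0)
```

Initializing gamma at 0 closes the gate (tanh(0) = 0), so an untrained injector leaves the pretrained model's video tokens unchanged. This is one common way to make an added module plug-and-play on top of a frozen backbone.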

Quick Start & Requirements

  • Installation: Create an environment with conda create -n 3dtrajmaster python=3.10, activate it, then run pip install -r requirements.txt.
  • Prerequisites: Python 3.10, PyTorch, and the dependencies listed in requirements.txt. You must also download the pretrained checkpoints (CogVideoX-5B, the LoRA weights, and the injector) and the dataset.
  • Inference: Run python 3dtrajmaster_inference.py, specifying the model paths, the LoRA scale, and the number of sampling steps.
  • Resources: Setup involves downloading large model weights and datasets.
  • Docs: Project page and dataset details are linked in the README.

Highlighted Details

  • Controls 6 DoF for entity location and orientation.
  • Supports diverse entities (human, animal, robot, car, abstract) and backgrounds.
  • Handles complex 3D trajectories including occlusion and continuous turns.
  • Offers fine-grained entity prompt control (hair, clothing, gender, etc.).
  • Based on the CogVideoX-5B architecture.
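A 6 DoF pose combines three translation components (location) with three rotation angles (orientation). The sketch below converts a per-frame trajectory into an array of pose vectors. The Z-Y-X Euler convention and the 12-dim layout (flattened 3x3 rotation followed by translation) are assumptions for illustration, not the project's documented input format.

```python
import numpy as np

def euler_to_rotation(yaw, pitch, roll):
    """Z-Y-X (yaw-pitch-roll) Euler angles in radians -> 3x3 rotation matrix."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return rz @ ry @ rx

def pose_to_vector(x, y, z, yaw, pitch, roll):
    """One 6 DoF pose -> hypothetical 12-dim vector: flattened rotation, then translation."""
    r = euler_to_rotation(yaw, pitch, roll)
    return np.concatenate([r.reshape(-1), np.array([x, y, z])])

def trajectory_to_array(poses):
    """Per-frame (x, y, z, yaw, pitch, roll) tuples -> (n_frames, 12) array."""
    return np.stack([pose_to_vector(*p) for p in poses])

# An entity moving along x at depth z = 2 while turning 90 degrees over 5 frames.
n = 5
traj = [(t * 0.5, 0.0, 2.0, t * (np.pi / 2) / (n - 1), 0.0, 0.0) for t in range(n)]
arr = trajectory_to_array(traj)
print(arr.shape)  # (5, 12)
```

Storing orientation as a rotation matrix rather than raw angles avoids the wrap-around discontinuity at plus or minus pi, which matters for the continuous turns listed above.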

Maintenance & Community

The project is associated with ICLR 2025 and includes authors from The Chinese University of Hong Kong and Kuaishou Technology. Links to a project page, dataset, and evaluation code are provided.

Licensing & Compatibility

The README does not explicitly state a license for the codebase or models. The dataset is released under an internal license, and parts of it are still undergoing an internal license check before a broader release.

Limitations & Caveats

The proprietary internal video model is not publicly released. The publicly available codebase is based on CogVideoX-5B, which may differ in performance from the internal model. The full dataset with more entities and scenes is under internal license review.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 23 stars in the last 90 days
