MotionStreamer by zju3dv

Streaming motion generation from text

Created 11 months ago
254 stars

Top 99.0% on SourcePulse

Project Summary

MotionStreamer addresses the challenge of real-time, streaming motion generation by introducing a diffusion-based autoregressive model operating within a causal latent space. Aimed at researchers and engineers in computer vision and animation, this project enables efficient generation of complex human motion sequences from text descriptions, building upon a novel 272-dimensional motion representation.

How It Works

The core innovation is a diffusion-based autoregressive model that operates in a causal latent space, generating motion latents sequentially so that each step conditions only on past outputs. It builds on a specialized 272-dimensional motion representation and requires training intermediate components, including a causal Temporal AutoEncoder (TAE) that maps motions into the causal latent space. The approach processes data in a custom streaming format derived from existing datasets such as BABEL, enabling continuous motion synthesis.
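The autoregressive-diffusion loop described above can be sketched in a few lines. This is a toy illustration, not the project's code: the latent dimension, step count, and the `denoise` function are all stand-ins for a learned, text-conditioned diffusion network, but the causal structure (each latent is denoised conditioned only on previously generated latents) mirrors what makes streaming generation possible.

```python
import numpy as np

LATENT_DIM = 8      # stand-in for the model's causal latent dimension (hypothetical)
DENOISE_STEPS = 10  # iterative refinement steps per latent (hypothetical)

def denoise(noisy, past):
    """Toy denoiser: pulls a noisy latent toward the mean of past latents.
    A real model would run a learned diffusion network conditioned on text."""
    target = past.mean(axis=0) if len(past) else np.zeros(LATENT_DIM)
    x = noisy
    for _ in range(DENOISE_STEPS):
        x = x + 0.3 * (target - x)  # one refinement step toward the target
    return x

def stream_motion(num_latents, seed=0):
    """Generate motion latents autoregressively: each new latent starts as
    noise and is denoised conditioned only on PAST latents, so frames can
    be decoded and streamed as soon as each latent is produced."""
    rng = np.random.default_rng(seed)
    latents = []
    for _ in range(num_latents):
        noise = rng.standard_normal(LATENT_DIM)
        latents.append(denoise(noise, np.asarray(latents)))
    return np.stack(latents)

seq = stream_motion(5)
print(seq.shape)  # (5, 8)
```

The key property illustrated here is causality: because no future latent is ever consulted, generation can run indefinitely and emit motion incrementally rather than producing a fixed-length sequence all at once.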

Quick Start & Requirements

Installation requires creating a Conda environment from environment.yaml and activating it; Python and Conda are the only prerequisites. Data preparation is extensive: processed 272-dim motion representations for the HumanML3D and BABEL datasets are downloaded via huggingface-cli download, and this data is for academic use only. Training is multi-stage: evaluators, Causal TAEs, and text-to-motion models are trained before MotionStreamer itself, and multiple GPUs are required. Links to datasets and checkpoints are available on Hugging Face.
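A minimal sketch of the setup flow described above. The repository URL follows from the zju3dv GitHub organization; the Conda environment name and the Hugging Face dataset id are placeholders to be replaced with the values given in the project README.

```shell
# Clone the repository and enter it
git clone https://github.com/zju3dv/MotionStreamer.git
cd MotionStreamer

# Create and activate the Conda environment from the provided environment.yaml
conda env create -f environment.yaml
conda activate motionstreamer   # environment name is an assumption

# Download the processed 272-dim motion data (academic use only);
# substitute the dataset repo id listed in the README.
huggingface-cli download <hf-dataset-id> --repo-type dataset --local-dir ./data
```

Each training stage (evaluator, Causal TAE, text-to-motion model, MotionStreamer) is then launched with its own script per the README, with GPU counts set to match the multi-GPU requirement.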

Highlighted Details

  • Accepted to ICCV 2025.
  • Utilizes a novel 272-dimensional motion representation.
  • Features a diffusion-based autoregressive model for streaming motion generation.
  • Provides pre-trained checkpoints and demo inference scripts.

Maintenance & Community

The project appears to be a research output with an extensive list of academic authors. No specific community channels (e.g., Discord, Slack) or explicit roadmap links are provided in the README.

Licensing & Compatibility

The processed datasets (HumanML3D, BABEL) are explicitly stated as "solely for academic purposes." The README also directs users to read the AMASS License, suggesting potential restrictions on data usage. The software license for the code itself is not specified.

Limitations & Caveats

The README lists "complete code for MotionStreamer" as a TODO, indicating potential incompleteness. Processed data is restricted to academic use, limiting commercial applications. Setup and training are complex, demanding multiple GPUs and detailed data preparation. A clear software license for the code is not provided.

Health Check
Last Commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
9 stars in the last 30 days
