MotionGPT by OpenMotionLab

Motion-language model for generating human motion and text descriptions

created 2 years ago
1,729 stars

Top 25.3% on sourcepulse

Project Summary

MotionGPT is a unified motion-language generation model designed for researchers and developers working with human motion data. It addresses the challenge of modeling and generating both human motion and natural language descriptions within a single framework, enabling tasks like text-to-motion generation, motion captioning, and motion prediction.

How It Works

MotionGPT treats human motion as a form of language by discretizing 3D motion into "motion tokens" using vector quantization. This "motion vocabulary" is then used in conjunction with text tokens for language modeling. The model leverages a T5 encoder-decoder architecture, pre-trained on a mixture of motion-language data and fine-tuned on prompt-based question-and-answer tasks. This approach allows it to capture semantic couplings between motion and language, benefiting from the generative capabilities of large language models.
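The tokenization idea above can be illustrated with a minimal sketch. This is not MotionGPT's actual VQ-VAE (which learns its codebook during training); it only shows the quantization step itself, with hypothetical shapes: each encoded motion frame is replaced by the index of its nearest codebook entry, and those indices become the "motion tokens" fed to the language model alongside text tokens.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned codebook: 512 "motion words", 64 dimensions each.
codebook = rng.normal(size=(512, 64))

# Hypothetical encoder output: 30 motion frames, 64-dim features each.
motion_features = rng.normal(size=(30, 64))

# Quantize: each frame maps to the index of its nearest codebook entry.
dists = np.linalg.norm(motion_features[:, None, :] - codebook[None, :, :], axis=-1)
motion_tokens = dists.argmin(axis=1)  # shape (30,), integer IDs in [0, 512)
```

In the full model, these integer IDs share a vocabulary with text tokens, so a single T5-style sequence model can read and generate both modalities.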

Quick Start & Requirements

  • Installation: Requires Python 3.10 and PyTorch 2.0. Install dependencies via pip install -r requirements.txt and download necessary models/data using provided bash scripts (prepare/download_smpl_model.sh, prepare/prepare_t5.sh, prepare/download_pretrained_models.sh).
  • Dependencies: SMPL models, T5 models, and specific evaluators for text-to-motion tasks.
  • Demo: A web UI can be launched with python app.py. Batch processing is available via python demo.py.
  • Resources: Setup involves downloading several GBs of data and models. Training requires significant computational resources.
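The setup steps above can be condensed into one shell sequence (script names as given in the README; assumes Python 3.10 and PyTorch 2.0 are already installed, and that you run from the repository root):

```shell
# Install Python dependencies
pip install -r requirements.txt

# Download SMPL body models, T5 weights, and pretrained checkpoints
bash prepare/download_smpl_model.sh
bash prepare/prepare_t5.sh
bash prepare/download_pretrained_models.sh

# Launch the web UI...
python app.py
# ...or run batch processing instead
# python demo.py
```

Note that the download scripts fetch several GBs of data, so a fast connection and ample disk space are advisable.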

Highlighted Details

  • Achieves state-of-the-art performance on multiple motion tasks, outperforming models like MDM and T2M-GPT in several metrics.
  • Demonstrates zero-shot capabilities, understanding unseen words in text prompts.
  • Unified model for text-to-motion, motion-to-text, motion prediction, and motion in-betweening tasks.
  • Utilizes a T5-770M backbone, chosen for its effectiveness in multi-modal tasks.

Maintenance & Community

The project is associated with NeurIPS 2023. Links to HuggingFace demos and the arXiv paper are provided. Further community interaction channels are not explicitly mentioned in the README.

Licensing & Compatibility

The code is distributed under an MIT License. However, it depends on third-party libraries and model assets (SMPL, SMPL-X, PyTorch3D), each of which carries its own license that must also be followed. Commercial use may be restricted by these underlying licenses.

Limitations & Caveats

MotionGPT struggles with generating unseen motions (e.g., gymnastics) even if it understands the text. The model's performance is limited by the size of available motion datasets (HumanML3D, KIT), which are significantly smaller than typical language datasets. VQ-based methods are less suitable for fine-grained body part editing compared to diffusion models.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 103 stars in the last 90 days
