T2M-GPT by Mael-zys

PyTorch implementation of a text-to-motion generation research paper

created 2 years ago
693 stars

Top 50.0% on sourcepulse

Project Summary

T2M-GPT provides a PyTorch implementation for generating human motion from textual descriptions, based on the CVPR 2023 paper "T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations." It targets researchers and developers in animation, robotics, and AI who need to synthesize realistic human movements from natural language prompts. The project offers a novel approach to motion generation by leveraging discrete representations, aiming for improved quality and controllability.

How It Works

The core of T2M-GPT involves a two-stage process. First, a VQ-VAE (Vector Quantized Variational Autoencoder) learns to discretize human motion sequences into a set of learned codes. Second, a GPT (Generative Pre-trained Transformer) model is trained on these discrete motion codes, conditioned on textual descriptions, to generate new motion sequences. This discrete representation approach allows the GPT to model motion as a sequence of tokens, similar to language, enabling more effective generation and potentially better handling of long-range dependencies.
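
A minimal PyTorch sketch of the Stage-1 quantization step is shown below; the module names, layer choices, and sizes are illustrative stand-ins rather than the repository's actual code.

```python
import torch
import torch.nn as nn

# Stage 1 sketch: a VQ-VAE-style quantizer maps continuous pose features to
# indices in a learned codebook. Linear layers stand in for the repository's
# convolutional encoder/decoder; all names and sizes here are illustrative.
class MotionQuantizer(nn.Module):
    def __init__(self, feat_dim=263, code_dim=512, num_codes=512):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, code_dim)
        self.codebook = nn.Embedding(num_codes, code_dim)  # learned discrete codes
        self.decoder = nn.Linear(code_dim, feat_dim)

    def encode(self, motion):                       # motion: (T, feat_dim)
        z = self.encoder(motion)                    # (T, code_dim)
        dists = torch.cdist(z, self.codebook.weight)
        return dists.argmin(dim=-1)                 # (T,) integer "motion tokens"

    def decode(self, tokens):                       # tokens: (T,)
        return self.decoder(self.codebook(tokens))  # back to (T, feat_dim)

quantizer = MotionQuantizer()
motion = torch.randn(64, 263)              # 64 frames of pose features
tokens = quantizer.encode(motion)          # discrete motion tokens
reconstruction = quantizer.decode(tokens)  # Stage-1 reconstruction target

# Stage 2 (not shown): a GPT-style transformer, conditioned on a text embedding,
# autoregressively predicts such token sequences; the Stage-1 decoder then maps
# generated tokens back to continuous motion.
```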

Quick Start & Requirements

  • Installation: Use conda env create -f environment.yml and conda activate T2M-GPT.
  • Prerequisites: Python 3.8, PyTorch 1.8.1, and a single V100 GPU (32GB); a quick environment check is sketched after this list. Requires downloading datasets (HumanML3D, KIT-ML), motion/text feature extractors, and pre-trained models. SMPL mesh rendering requires additional dependencies such as osmesa, shapely, pyrender, and trimesh.
  • Resources: Setup involves downloading several GBs of data and models.
  • Links: Project Page, Paper, Notebook Demo, HuggingFace Space Demo.
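
As a quick sanity check before downloading the larger assets, a snippet along these lines (not part of the repository) can confirm the environment matches the stated requirements:

```python
import sys
import torch

# Minimal environment check against the stated requirements
# (Python 3.8, PyTorch 1.8.1, a CUDA-capable GPU); adjust as needed.
assert sys.version_info[:2] == (3, 8), f"expected Python 3.8, got {sys.version}"
assert torch.__version__.startswith("1.8.1"), f"expected PyTorch 1.8.1, got {torch.__version__}"
assert torch.cuda.is_available(), "a CUDA GPU is expected (the authors used a single 32GB V100)"
print("Environment looks compatible:", torch.cuda.get_device_name(0))
```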

Highlighted Details

  • CVPR 2023 publication.
  • Supports both skeleton and SMPL mesh visualization.
  • Offers pre-trained models for immediate use.
  • The codebase builds upon and acknowledges related projects such as text-to-motion, TM2T, MDM, and MotionDiffuse.

Maintenance & Community

The project was released in February 2023. A Hugging Face Space demo is available, but the README does not link to any community channels.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as a PyTorch implementation for research purposes. Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

The implementation requires a specific older version of PyTorch (1.8.1) and Python 3.8, which may pose compatibility challenges with newer environments. The evaluation process is noted to be time-consuming due to the need to generate multiple motions per text prompt.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 29 stars in the last 90 days
