T2M-GPT by Mael-zys

PyTorch implementation of a text-to-motion generation research paper

created 2 years ago
693 stars

Top 50.0% on sourcepulse

Project Summary

T2M-GPT provides a PyTorch implementation for generating human motion from textual descriptions, based on the CVPR 2023 paper "T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations." It targets researchers and developers in animation, robotics, and AI who need to synthesize realistic human movements from natural language prompts. The project offers a novel approach to motion generation by leveraging discrete representations, aiming for improved quality and controllability.

How It Works

The core of T2M-GPT involves a two-stage process. First, a VQ-VAE (Vector Quantized Variational Autoencoder) learns to discretize human motion sequences into a set of learned codes. Second, a GPT (Generative Pre-trained Transformer) model is trained on these discrete motion codes, conditioned on textual descriptions, to generate new motion sequences. This discrete representation approach allows the GPT to model motion as a sequence of tokens, similar to language, enabling more effective generation and potentially better handling of long-range dependencies.
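
A minimal PyTorch sketch of the Stage-1 quantization step is shown below; the module names, layer choices, and sizes are illustrative stand-ins rather than the repository's actual code.

```python
import torch
import torch.nn as nn

# Stage 1 sketch: a VQ-VAE-style quantizer maps continuous pose features to
# indices in a learned codebook. Linear layers stand in for the repository's
# convolutional encoder/decoder; all names and sizes here are illustrative.
class MotionQuantizer(nn.Module):
    def __init__(self, feat_dim=263, code_dim=512, num_codes=512):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, code_dim)
        self.codebook = nn.Embedding(num_codes, code_dim)  # learned discrete codes
        self.decoder = nn.Linear(code_dim, feat_dim)

    def encode(self, motion):                       # motion: (T, feat_dim)
        z = self.encoder(motion)                    # (T, code_dim)
        dists = torch.cdist(z, self.codebook.weight)
        return dists.argmin(dim=-1)                 # (T,) integer "motion tokens"

    def decode(self, tokens):                       # tokens: (T,)
        return self.decoder(self.codebook(tokens))  # back to (T, feat_dim)

quantizer = MotionQuantizer()
motion = torch.randn(64, 263)              # 64 frames of pose features
tokens = quantizer.encode(motion)          # discrete motion tokens
reconstruction = quantizer.decode(tokens)  # Stage-1 reconstruction target

# Stage 2 (not shown): a GPT-style transformer, conditioned on a text embedding,
# autoregressively predicts such token sequences; the Stage-1 decoder then maps
# generated tokens back to continuous motion.
```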

Quick Start & Requirements

  • Installation: Use conda env create -f environment.yml and conda activate T2M-GPT.
  • Prerequisites: Python 3.8, PyTorch 1.8.1, and a single V100 GPU (32GB); a quick environment check is sketched after this list. Requires downloading datasets (HumanML3D, KIT-ML), motion/text feature extractors, and pre-trained models. SMPL mesh rendering requires additional dependencies such as osmesa, shapely, pyrender, and trimesh.
  • Resources: Setup involves downloading several GBs of data and models.
  • Links: Project Page, Paper, Notebook Demo, HuggingFace Space Demo.
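
As a quick sanity check before downloading the larger assets, a snippet along these lines (not part of the repository) can confirm the environment matches the stated requirements:

```python
import sys
import torch

# Minimal environment check against the stated requirements
# (Python 3.8, PyTorch 1.8.1, a CUDA-capable GPU); adjust as needed.
assert sys.version_info[:2] == (3, 8), f"expected Python 3.8, got {sys.version}"
assert torch.__version__.startswith("1.8.1"), f"expected PyTorch 1.8.1, got {torch.__version__}"
assert torch.cuda.is_available(), "a CUDA GPU is expected (the authors used a single 32GB V100)"
print("Environment looks compatible:", torch.cuda.get_device_name(0))
```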

Highlighted Details

  • CVPR 2023 publication.
  • Supports both skeleton and SMPL mesh visualization.
  • Offers pre-trained models for immediate use.
  • The codebase builds upon and acknowledges related projects such as text-to-motion, TM2T, MDM, and MotionDiffuse.

Maintenance & Community

The project was released in February 2023. A Hugging Face Space demo is available, but the README does not link to any community channels.

Licensing & Compatibility

The repository does not explicitly state a license. The code is presented as a PyTorch implementation for research purposes. Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

The implementation requires a specific older version of PyTorch (1.8.1) and Python 3.8, which may pose compatibility challenges with newer environments. The evaluation process is noted to be time-consuming due to the need to generate multiple motions per text prompt.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 29 stars in the last 90 days
