PyTorch code for text/action-to-human-motion generation via diffusion
This repository provides the official PyTorch implementation of the Human Motion Diffusion Model (MDM) for text-to-motion and action-to-motion synthesis. It targets researchers and developers in animation, robotics, and AI, and offers motion editing, training of custom models, and significantly faster inference than the original release.
How It Works
MDM uses a diffusion model built on a Transformer backbone (encoder-only or encoder-decoder) to generate human motion sequences conditioned on text prompts or action classes. Starting from noise, the model iteratively denoises the motion over a series of diffusion steps. Recent improvements include a 50-step diffusion variant (a 20x sampling speedup) and cached CLIP text embeddings (a further 2x inference boost), bringing generation close to real time.
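As a rough illustration of the idea (not the repository's actual code), the sketch below shows a Transformer denoiser that predicts the clean motion from a noisy sequence plus a text embedding, then runs a short DDPM-style reverse loop. The MotionDenoiser class, dimensions, and noise schedule are invented for this example, and the random tensor stands in for a cached CLIP text embedding.

```python
# Illustrative sketch only: a transformer denoiser predicts the clean motion x0
# from a noisy motion x_t and a text embedding; sampling runs a short reverse loop.
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, n_joints=25, feat_dim=6, latent_dim=256, text_dim=512):
        super().__init__()
        self.input_proj = nn.Linear(n_joints * feat_dim, latent_dim)
        self.text_proj = nn.Linear(text_dim, latent_dim)
        self.time_embed = nn.Embedding(1000, latent_dim)
        layer = nn.TransformerEncoderLayer(latent_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.output_proj = nn.Linear(latent_dim, n_joints * feat_dim)

    def forward(self, x_t, t, text_emb):
        # x_t: (batch, frames, joints*features); t: (batch,); text_emb: (batch, text_dim)
        h = self.input_proj(x_t) + self.time_embed(t)[:, None, :]
        cond = self.text_proj(text_emb)[:, None, :]        # condition token prepended
        h = self.encoder(torch.cat([cond, h], dim=1))
        return self.output_proj(h[:, 1:])                  # predicted clean motion x0

@torch.no_grad()
def sample(model, text_emb, frames=60, n_joints=25, feat_dim=6, steps=50):
    # The text embedding is computed once and reused at every denoising step;
    # this kind of reuse is what the reported CLIP-caching speedup exploits.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_cum = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(text_emb.shape[0], frames, n_joints * feat_dim)
    for i in reversed(range(steps)):
        t = torch.full((x.shape[0],), i, dtype=torch.long)
        x0_pred = model(x, t, text_emb)                    # predict clean motion
        if i > 0:
            # Re-noise the x0 prediction back to the previous timestep (DDPM-style).
            noise = torch.randn_like(x)
            x = alphas_cum[i - 1].sqrt() * x0_pred + (1 - alphas_cum[i - 1]).sqrt() * noise
        else:
            x = x0_pred
    return x

model = MotionDenoiser()
text_emb = torch.randn(1, 512)    # stand-in for a cached CLIP text embedding
motion = sample(model, text_emb)  # (1, 60, 150) generated motion features
```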
Quick Start & Requirements
Create the conda environment from environment.yml, activate it, and install CLIP: pip install git+https://github.com/openai/CLIP.git. Then download the required dependencies and data with the provided scripts (prepare/download_*.sh).
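After setup, a quick sanity check is to encode a text prompt with CLIP. This is a minimal sketch that assumes only CLIP's standard clip.load / clip.tokenize / encode_text API and uses no repository code; the prompt is arbitrary.

```python
# Minimal post-setup check: load CLIP and encode a text prompt.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)       # downloads weights on first run
tokens = clip.tokenize(["a person walks forward and waves"]).to(device)
with torch.no_grad():
    text_emb = model.encode_text(tokens)               # (1, 512) text embedding
print(text_emb.shape)
```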
Highlighted Details
Maintenance & Community
The project is actively maintained, with recent updates including DistilBERT text encoder support, dataset caching, and WandB integration. Follow the project on GitHub.
Licensing & Compatibility
Distributed under the MIT license. However, users must also adhere to the licenses of its dependencies (CLIP, SMPL, PyTorch3D) and of the datasets. Commercial use is permitted, provided all underlying licenses are respected.
Limitations & Caveats
The setup requires downloading several large datasets and pretrained models, which can be time-consuming. While inference is much faster than before, some features, such as motion editing, require the "full" dataset that includes motion-capture data, not just the text-only subset.
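For context on why editing needs real motion data: editing in this family of models is typically done as diffusion inpainting, where frames to keep are re-imposed from a reference motion at every denoising step and only the masked frames are generated. The sketch below illustrates that idea, reusing the hypothetical MotionDenoiser from the How It Works example; edit_motion, keep_mask, and the schedule values are invented for illustration and are not the repository's API.

```python
# Illustrative sketch of inpainting-style motion editing: at each denoising step,
# frames to keep are overwritten with the (noised) reference motion, so the model
# only fills in the masked frames.
import torch

@torch.no_grad()
def edit_motion(model, text_emb, reference, keep_mask, steps=50):
    # reference: (batch, frames, features) ground-truth motion-capture data
    # keep_mask: (batch, frames, 1), 1 where the reference frames are kept
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_cum = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn_like(reference)
    for i in reversed(range(steps)):
        t = torch.full((x.shape[0],), i, dtype=torch.long)
        # Noise the reference to the current timestep and paste it into x.
        noised_ref = alphas_cum[i].sqrt() * reference \
            + (1 - alphas_cum[i]).sqrt() * torch.randn_like(reference)
        x = keep_mask * noised_ref + (1 - keep_mask) * x
        x0_pred = model(x, t, text_emb)                # model predicts the clean motion
        if i > 0:
            noise = torch.randn_like(x)
            x = alphas_cum[i - 1].sqrt() * x0_pred + (1 - alphas_cum[i - 1]).sqrt() * noise
        else:
            x = x0_pred
    # Final paste so kept frames match the reference exactly.
    return keep_mask * reference + (1 - keep_mask) * x
```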