motion-diffusion-model by GuyTevet

PyTorch code for text/action-to-human-motion generation via diffusion

created 2 years ago
3,609 stars

Top 13.6% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation of the Human Motion Diffusion Model (MDM), enabling text-to-motion and action-to-motion synthesis. It targets researchers and developers in animation, robotics, and AI, offering fast inference (see How It Works), motion editing, and training of custom models.

How It Works

MDM uses a diffusion model with a Transformer backbone (encoder-decoder or encoder-only) to generate human motion sequences conditioned on text prompts or action labels. The model learns to denoise motion data over a series of diffusion steps. Recent improvements include a 50-step diffusion model (a roughly 20x sampling speedup) and cached CLIP text embeddings (a further 2x inference boost), bringing generation close to real time.
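
For intuition, here is a minimal, self-contained sketch of that sampling loop. Everything in it (shapes, the timestep embedding, the MotionDenoiser class) is an illustrative assumption, not the repository's actual API; it only shows the structure: a Transformer denoiser predicts the clean motion x0 at each of the 50 steps, conditioned on a CLIP text embedding that is computed once and cached.

```python
# A minimal sketch of MDM-style sampling (all shapes and names are
# illustrative assumptions, not the repository's actual API).
import torch
import torch.nn as nn

T = 50                               # reduced step count behind the reported ~20x speedup
FRAMES, FEATS, DIM = 196, 263, 512   # HumanML3D-style motion shape (assumed)

class MotionDenoiser(nn.Module):
    """Transformer encoder that predicts the clean motion x0 from a noised input."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(FEATS, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=8)
        self.out_proj = nn.Linear(DIM, FEATS)

    def forward(self, x_t, t_emb, text_emb):
        # Prepend two conditioning tokens (timestep, text) to the motion tokens.
        tokens = torch.cat([t_emb, text_emb, self.in_proj(x_t)], dim=1)
        return self.out_proj(self.encoder(tokens)[:, 2:])  # drop conditioning tokens

@torch.no_grad()
def sample(model, text_emb, betas):
    """Simplified DDPM-style loop: predict x0, then re-noise it to the previous step."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(1, FRAMES, FEATS)                # start from pure noise
    for t in reversed(range(T)):
        t_emb = torch.full((1, 1, DIM), t / T)       # toy timestep embedding
        x0 = model(x, t_emb, text_emb)               # denoiser predicts the clean motion
        if t > 0:                                    # stochastic re-noising to step t-1
            noise = torch.randn_like(x0)
            x = alpha_bar[t - 1].sqrt() * x0 + (1.0 - alpha_bar[t - 1]).sqrt() * noise
        else:
            x = x0
    return x

denoiser = MotionDenoiser()
betas = torch.linspace(1e-4, 0.02, T)
text_emb = torch.randn(1, 1, DIM)  # stand-in for a CLIP embedding, computed once and cached
motion = sample(denoiser, text_emb, betas)  # (1, FRAMES, FEATS)
```

Note that text_emb never changes inside the loop, so encoding the prompt once amortizes the CLIP cost across all 50 denoising steps; that is where the reported 2x boost comes from.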

Quick Start & Requirements

  • Install: Create a conda environment using environment.yml, activate it, and install CLIP: pip install git+https://github.com/openai/CLIP.git.
  • Prerequisites: Python 3.7+, a CUDA-capable GPU, and ffmpeg (a sanity-check sketch follows this list).
  • Data & Models: Requires downloading specific datasets (HumanML3D, KIT, UESTC, HumanAct12) and pre-trained models via provided scripts (prepare/download_*.sh).
  • Docs: project webpage
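
Before downloading the large data files, a quick environment check can catch missing prerequisites. This is a hypothetical helper, not part of the repository, and the ViT-B/32 CLIP variant is an assumption:

```python
# Hypothetical environment sanity check; not part of the repository.
import shutil

import torch
import clip  # installed via: pip install git+https://github.com/openai/CLIP.git

assert torch.cuda.is_available(), "MDM expects a CUDA-capable GPU"
assert shutil.which("ffmpeg") is not None, "ffmpeg is required to render result videos"
model, _ = clip.load("ViT-B/32", device="cuda")  # CLIP text encoder (assumed variant)
print("Environment looks OK")
```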

Highlighted Details

  • Achieves ~0.4 sec/sample inference with 50 diffusion steps.
  • Supports text-to-motion, action-to-motion, and unconstrained motion generation.
  • Includes motion editing (in-betweening, upper-body) and text-conditioned editing; see the inpainting sketch after this list.
  • Integrates DiP (Ultra-fast Text-to-motion) and CLoSD (Closing the Loop between Simulation and Diffusion) for advanced control.
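
The in-betweening feature above can be understood as diffusion inpainting: at every denoising step, the frames to keep (e.g., a prefix and suffix) are re-imposed on the model's estimate, so the model only fills the gap. A minimal sketch under the same assumed shapes as the earlier example (the mask convention and function name are illustrative, not the repo's API):

```python
# Sketch of in-betweening as diffusion inpainting (illustrative, not the repo's API).
import torch

DIM = 512  # conditioning-token width, matching the earlier sketch

@torch.no_grad()
def inbetween(model, text_emb, betas, known_motion, known_mask):
    """known_mask is 1.0 on frames to keep (prefix/suffix), 0.0 on frames to fill."""
    T = betas.shape[0]
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn_like(known_motion)               # start the gap from pure noise
    for t in reversed(range(T)):
        t_emb = torch.full((1, 1, DIM), t / T)       # toy timestep embedding
        x0 = model(x, t_emb, text_emb)               # model's clean-motion estimate
        # Re-impose the observed frames before re-noising, so generation
        # stays consistent with the fixed prefix and suffix.
        x0 = known_mask * known_motion + (1.0 - known_mask) * x0
        if t > 0:
            noise = torch.randn_like(x0)
            x = alpha_bar[t - 1].sqrt() * x0 + (1.0 - alpha_bar[t - 1]).sqrt() * noise
        else:
            x = x0
    return x
```

Upper-body editing follows the same masking idea, applied along the feature (joint) axis rather than the time axis.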

Maintenance & Community

The project is actively maintained; recent updates include DistilBERT text-encoder support, dataset caching, and WandB integration.

Licensing & Compatibility

Distributed under the MIT license. However, users must also comply with the licenses of its dependencies (CLIP, SMPL, PyTorch3D) and of the datasets. Commercial use is permitted provided all underlying licenses are respected.

Limitations & Caveats

Setup requires downloading multiple large datasets and pre-trained models, which can be time-consuming. And while inference is much faster than in the original release, some advanced features, such as motion editing, require the full dataset, which includes motion-capture data rather than text alone.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 144 stars in the last 90 days
