motion-diffusion-model by GuyTevet

PyTorch code for text/action-to-human-motion generation via diffusion

created 2 years ago
3,609 stars

Top 13.6% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation of the Human Motion Diffusion Model (MDM), enabling text-to-motion and action-to-motion synthesis. It targets researchers and developers in animation, robotics, and AI, offering fast inference (see How It Works), motion editing, and training of custom models.

How It Works

MDM uses a diffusion model with a Transformer backbone (encoder-decoder or encoder-only) to generate human motion sequences conditioned on text prompts or action labels. The model learns to denoise motion data over a series of diffusion steps. Recent improvements include a 50-step diffusion model (a roughly 20x sampling speedup) and cached CLIP text embeddings (a further 2x inference boost), bringing generation close to real time.
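
For intuition, here is a minimal, self-contained sketch of that sampling loop. Everything in it (shapes, the timestep embedding, the MotionDenoiser class) is an illustrative assumption, not the repository's actual API; it only shows the structure: a Transformer denoiser predicts the clean motion x0 at each of the 50 steps, conditioned on a CLIP text embedding that is computed once and cached.

```python
# A minimal sketch of MDM-style sampling (all shapes and names are
# illustrative assumptions, not the repository's actual API).
import torch
import torch.nn as nn

T = 50                               # reduced step count behind the reported ~20x speedup
FRAMES, FEATS, DIM = 196, 263, 512   # HumanML3D-style motion shape (assumed)

class MotionDenoiser(nn.Module):
    """Transformer encoder that predicts the clean motion x0 from a noised input."""
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(FEATS, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=8)
        self.out_proj = nn.Linear(DIM, FEATS)

    def forward(self, x_t, t_emb, text_emb):
        # Prepend two conditioning tokens (timestep, text) to the motion tokens.
        tokens = torch.cat([t_emb, text_emb, self.in_proj(x_t)], dim=1)
        return self.out_proj(self.encoder(tokens)[:, 2:])  # drop conditioning tokens

@torch.no_grad()
def sample(model, text_emb, betas):
    """Simplified DDPM-style loop: predict x0, then re-noise it to the previous step."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(1, FRAMES, FEATS)                # start from pure noise
    for t in reversed(range(T)):
        t_emb = torch.full((1, 1, DIM), t / T)       # toy timestep embedding
        x0 = model(x, t_emb, text_emb)               # denoiser predicts the clean motion
        if t > 0:                                    # stochastic re-noising to step t-1
            noise = torch.randn_like(x0)
            x = alpha_bar[t - 1].sqrt() * x0 + (1.0 - alpha_bar[t - 1]).sqrt() * noise
        else:
            x = x0
    return x

denoiser = MotionDenoiser()
betas = torch.linspace(1e-4, 0.02, T)
text_emb = torch.randn(1, 1, DIM)  # stand-in for a CLIP embedding, computed once and cached
motion = sample(denoiser, text_emb, betas)  # (1, FRAMES, FEATS)
```

Note that text_emb never changes inside the loop, so encoding the prompt once amortizes the CLIP cost across all 50 denoising steps; that is where the reported 2x boost comes from.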

Quick Start & Requirements

  • Install: Create a conda environment using environment.yml, activate it, and install CLIP: pip install git+https://github.com/openai/CLIP.git.
  • Prerequisites: Python 3.7+, a CUDA-capable GPU, and ffmpeg (a sanity-check sketch follows this list).
  • Data & Models: Requires downloading specific datasets (HumanML3D, KIT, UESTC, HumanAct12) and pre-trained models via provided scripts (prepare/download_*.sh).
  • Docs: project webpage
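
Before downloading the large data files, a quick environment check can catch missing prerequisites. This is a hypothetical helper, not part of the repository, and the ViT-B/32 CLIP variant is an assumption:

```python
# Hypothetical environment sanity check; not part of the repository.
import shutil

import torch
import clip  # installed via: pip install git+https://github.com/openai/CLIP.git

assert torch.cuda.is_available(), "MDM expects a CUDA-capable GPU"
assert shutil.which("ffmpeg") is not None, "ffmpeg is required to render result videos"
model, _ = clip.load("ViT-B/32", device="cuda")  # CLIP text encoder (assumed variant)
print("Environment looks OK")
```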

Highlighted Details

  • Achieves ~0.4 sec/sample inference with 50 diffusion steps.
  • Supports text-to-motion, action-to-motion, and unconstrained motion generation.
  • Includes motion editing (in-betweening, upper-body) and text-conditioned editing; see the inpainting sketch after this list.
  • Integrates DiP (Ultra-fast Text-to-motion) and CLoSD (Closing the Loop between Simulation and Diffusion) for advanced control.
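
The in-betweening feature above can be understood as diffusion inpainting: at every denoising step, the frames to keep (e.g., a prefix and suffix) are re-imposed on the model's estimate, so the model only fills the gap. A minimal sketch under the same assumed shapes as the earlier example (the mask convention and function name are illustrative, not the repo's API):

```python
# Sketch of in-betweening as diffusion inpainting (illustrative, not the repo's API).
import torch

DIM = 512  # conditioning-token width, matching the earlier sketch

@torch.no_grad()
def inbetween(model, text_emb, betas, known_motion, known_mask):
    """known_mask is 1.0 on frames to keep (prefix/suffix), 0.0 on frames to fill."""
    T = betas.shape[0]
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn_like(known_motion)               # start the gap from pure noise
    for t in reversed(range(T)):
        t_emb = torch.full((1, 1, DIM), t / T)       # toy timestep embedding
        x0 = model(x, t_emb, text_emb)               # model's clean-motion estimate
        # Re-impose the observed frames before re-noising, so generation
        # stays consistent with the fixed prefix and suffix.
        x0 = known_mask * known_motion + (1.0 - known_mask) * x0
        if t > 0:
            noise = torch.randn_like(x0)
            x = alpha_bar[t - 1].sqrt() * x0 + (1.0 - alpha_bar[t - 1]).sqrt() * noise
        else:
            x = x0
    return x
```

Upper-body editing follows the same masking idea, applied along the feature (joint) axis rather than the time axis.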

Maintenance & Community

The project is actively maintained; recent updates include DistilBERT text-encoder support, dataset caching, and WandB integration.

Licensing & Compatibility

Distributed under the MIT license. However, users must also comply with the licenses of its dependencies (CLIP, SMPL, PyTorch3D) and of the datasets. Commercial use is permitted provided all underlying licenses are respected.

Limitations & Caveats

Setup requires downloading multiple large datasets and pre-trained models, which can be time-consuming. And while inference is much faster than in the original release, some advanced features, such as motion editing, require the full dataset, which includes motion-capture data rather than text alone.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 144 stars in the last 90 days
