momask-codes by EricGuo5513

Research paper implementation for 3D human motion generation via masked modeling

created 1 year ago
1,089 stars

Top 35.5% on sourcepulse

Project Summary

MoMask is the official implementation of generative masked modeling of 3D human motions, targeting researchers and developers in computer vision and animation. It supports text-to-motion generation and temporal inpainting of motion sequences.

How It Works

MoMask employs a masked modeling strategy inspired by advances in natural language processing and vision transformers. A residual vector-quantized variational autoencoder (VQ-VAE) discretizes motion sequences into layered token sequences; transformer models are then trained to predict masked motion tokens, enabling generative tasks such as text-to-motion synthesis and temporal inpainting. This approach captures both the temporal dependencies and the semantics of human motion.
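
To make the masked-modeling idea concrete, here is a minimal PyTorch sketch of training a transformer to recover randomly masked motion tokens. It is an illustration only: the vocabulary size, sequence length, model dimensions, and masking ratio are assumptions rather than MoMask's actual architecture, and text conditioning is omitted.

```python
import torch
import torch.nn as nn

# Illustrative masked-token-modeling sketch (not MoMask's actual code).
# Assumed setup: a VQ-VAE has already discretized a motion clip into
# token ids in [0, vocab_size); one extra id acts as the [MASK] token.
vocab_size, mask_id, seq_len, d_model = 512, 512, 49, 384

embed = nn.Embedding(vocab_size + 1, d_model)  # +1 slot for [MASK]
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=6, batch_first=True),
    num_layers=4,
)
head = nn.Linear(d_model, vocab_size)  # predicts the original token ids

tokens = torch.randint(0, vocab_size, (2, seq_len))  # stand-in VQ motion tokens
mask = torch.rand(2, seq_len) < 0.5                  # randomly mask ~50% of positions
inputs = tokens.masked_fill(mask, mask_id)

logits = head(encoder(embed(inputs)))                # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # masked positions only
loss.backward()
```

At generation time, a model trained this way can start from a fully masked sequence and fill in tokens over a few iterations, which is how masked-modeling generators typically synthesize new sequences.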

Quick Start & Requirements

  • Installation: Conda environment setup (environment.yml) or pip install (requirements.txt).
  • Dependencies: Python 3.7.13/3.10, PyTorch 1.7.1, CLIP (see the environment check sketch after this list).
  • Models: Download pre-trained models via prepare/download_models.sh.
  • Data: The HumanML3D and KIT-ML datasets are required for training and evaluation.
  • Demos: Hugging Face and Colab demos are available.
  • Resources: A GPU is required for training and generation; the WebUI demo can also run on CPU.
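
As a quick check after installation, a short script like the following can confirm that the stated dependencies import and that a GPU is visible. This is a generic sketch, not part of the repository.

```python
# Generic environment check for the dependencies listed above (not from the repo).
import torch
import clip  # OpenAI CLIP package used by the project

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CLIP models available:", clip.available_models()[:3])
```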

Highlighted Details

  • Official implementation of CVPR 2024 paper "MoMask: Generative Masked Modeling of 3D Human Motions".
  • Supports text-to-motion generation and temporal inpainting.
  • Integrates with Blender as an add-on.
  • Provides visualization tools and retargeting guidance.

Maintenance & Community

The project is maintained by EricGuo5513. Links to the Hugging Face demo and Colab demo are provided.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Depends on libraries like SMPL, SMPL-X, PyTorch3D, and datasets with their own licenses.

Limitations & Caveats

Source motions for temporal inpainting must be provided in the HumanML3D 263-dimensional per-frame feature-vector format. Applying foot IK is noted to fail in some cases.
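
Before running inpainting, it can help to verify that a source motion already matches that layout. The sketch below is a generic shape check with a hypothetical file path, not a utility from the repository.

```python
import numpy as np

# Hypothetical path; point this at your own HumanML3D-format motion file.
motion = np.load("my_source_motion.npy")

# HumanML3D motions are (num_frames, 263) per-frame feature vectors.
assert motion.ndim == 2 and motion.shape[1] == 263, (
    f"expected (num_frames, 263) HumanML3D features, got {motion.shape}"
)
print("frames:", motion.shape[0])
```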

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 84 stars in the last 90 days
