momask-codes by EricGuo5513

Research paper implementation for 3D human motion generation via masked modeling

created 1 year ago
1,089 stars

Top 35.5% on sourcepulse

Project Summary

MoMask is the official implementation of generative masked modeling of 3D human motions, targeting researchers and developers in computer vision and animation. It supports text-to-motion generation and temporal inpainting of motion sequences.

How It Works

MoMask employs a masked modeling strategy inspired by advances in natural language processing and vision transformers. A residual vector-quantized variational autoencoder (VQ-VAE) discretizes motion sequences into layered token sequences; transformer models are then trained to predict masked motion tokens, enabling generative tasks such as text-to-motion synthesis and temporal inpainting. This approach captures both the temporal dependencies and the semantics of human motion.
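
To make the masked-modeling idea concrete, here is a minimal PyTorch sketch of training a transformer to recover randomly masked motion tokens. It is an illustration only: the vocabulary size, sequence length, model dimensions, and masking ratio are assumptions rather than MoMask's actual architecture, and text conditioning is omitted.

```python
import torch
import torch.nn as nn

# Illustrative masked-token-modeling sketch (not MoMask's actual code).
# Assumed setup: a VQ-VAE has already discretized a motion clip into
# token ids in [0, vocab_size); one extra id acts as the [MASK] token.
vocab_size, mask_id, seq_len, d_model = 512, 512, 49, 384

embed = nn.Embedding(vocab_size + 1, d_model)  # +1 slot for [MASK]
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=6, batch_first=True),
    num_layers=4,
)
head = nn.Linear(d_model, vocab_size)  # predicts the original token ids

tokens = torch.randint(0, vocab_size, (2, seq_len))  # stand-in VQ motion tokens
mask = torch.rand(2, seq_len) < 0.5                  # randomly mask ~50% of positions
inputs = tokens.masked_fill(mask, mask_id)

logits = head(encoder(embed(inputs)))                # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # masked positions only
loss.backward()
```

At generation time, a model trained this way can start from a fully masked sequence and fill in tokens over a few iterations, which is how masked-modeling generators typically synthesize new sequences.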

Quick Start & Requirements

  • Installation: Conda environment setup (environment.yml) or pip install (requirements.txt).
  • Dependencies: Python 3.7.13/3.10, PyTorch 1.7.1, CLIP (see the environment check sketch after this list).
  • Models: Download pre-trained models via prepare/download_models.sh.
  • Data: The HumanML3D and KIT-ML datasets are required for training and evaluation.
  • Demos: Hugging Face and Colab demos are available.
  • Resources: A GPU is required for training and generation; the WebUI demo can also run on CPU.
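
As a quick check after installation, a short script like the following can confirm that the stated dependencies import and that a GPU is visible. This is a generic sketch, not part of the repository.

```python
# Generic environment check for the dependencies listed above (not from the repo).
import torch
import clip  # OpenAI CLIP package used by the project

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("CLIP models available:", clip.available_models()[:3])
```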

Highlighted Details

  • Official implementation of CVPR 2024 paper "MoMask: Generative Masked Modeling of 3D Human Motions".
  • Supports text-to-motion generation and temporal inpainting.
  • Integrates with Blender as an add-on.
  • Provides visualization tools and retargeting guidance.

Maintenance & Community

The project is maintained by EricGuo5513. Links to the Hugging Face demo and Colab demo are provided.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Depends on libraries like SMPL, SMPL-X, PyTorch3D, and datasets with their own licenses.

Limitations & Caveats

Source motions for temporal inpainting must be provided in the HumanML3D 263-dimensional per-frame feature-vector format. Applying foot IK is noted to fail in some cases.
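
Before running inpainting, it can help to verify that a source motion already matches that layout. The sketch below is a generic shape check with a hypothetical file path, not a utility from the repository.

```python
import numpy as np

# Hypothetical path; point this at your own HumanML3D-format motion file.
motion = np.load("my_source_motion.npy")

# HumanML3D motions are (num_frames, 263) per-frame feature vectors.
assert motion.ndim == 2 and motion.shape[1] == 263, (
    f"expected (num_frames, 263) HumanML3D features, got {motion.shape}"
)
print("frames:", motion.shape[0])
```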

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 84 stars in the last 90 days
