Research paper implementation for 3D human motion generation via masked modeling
Top 35.5% on sourcepulse
MoMask provides an official implementation for generative masked modeling of 3D human motions, targeting researchers and developers in computer vision and animation. It enables text-to-motion generation and temporal inpainting of motion sequences, offering a novel approach to motion synthesis.
How It Works
MoMask employs a masked modeling strategy, inspired by advancements in natural language processing and vision transformers. It uses a Vector Quantized Variational Autoencoder (VQ-VAE) to discretize motion sequences into tokens. A transformer model is then trained to predict masked motion tokens, allowing for generative tasks like text-to-motion synthesis and inpainting. This approach effectively captures temporal dependencies and semantic meaning in human motion.
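To make the masked-modeling stage concrete, here is a minimal, illustrative sketch (not MoMask's actual code): motion frames are assumed to have already been quantized into discrete codebook indices by the VQ-VAE, a random subset of tokens is replaced with a [MASK] token, and a transformer is trained to recover the original codes. All module names, sizes, and the toy token source below are assumptions.

```python
# Conceptual sketch of masked motion-token modeling (assumed names and sizes).
import torch
import torch.nn as nn

CODEBOOK_SIZE = 512           # number of discrete motion codes (assumed)
MASK_ID = CODEBOOK_SIZE       # extra index reserved for the [MASK] token
SEQ_LEN, DIM = 49, 256        # token sequence length and embedding width (assumed)

class MaskedMotionTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(CODEBOOK_SIZE + 1, DIM)   # +1 for [MASK]
        self.pos_emb = nn.Parameter(torch.zeros(1, SEQ_LEN, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, CODEBOOK_SIZE)             # predict original codes

    def forward(self, tokens):
        x = self.tok_emb(tokens) + self.pos_emb
        return self.head(self.encoder(x))                      # (B, T, CODEBOOK_SIZE)

# Toy training step: mask ~50% of the tokens and predict them back.
tokens = torch.randint(0, CODEBOOK_SIZE, (2, SEQ_LEN))         # stand-in for VQ-VAE output
mask = torch.rand(tokens.shape) < 0.5
inputs = tokens.masked_fill(mask, MASK_ID)

model = MaskedMotionTransformer()
logits = model(inputs)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # loss only on masked positions
loss.backward()
print(float(loss))
```

At inference time, the same idea runs generatively: starting from a fully masked sequence, the model fills in tokens over several iterations, conditioned on the text prompt, and the VQ-VAE decoder then maps the tokens back to a motion sequence.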
Quick Start & Requirements
Environment setup uses conda (environment.yml) or pip install (requirements.txt). Pretrained models are downloaded via prepare/download_models.sh.
Highlighted Details
Maintenance & Community
The project is actively maintained by EricGuo5513. Links to Huggingface Demo and Colab Demo are provided.
Licensing & Compatibility
Limitations & Caveats
Source motions for temporal inpainting must be provided in the HumanML3D dim-263 feature-vector format, and applying foot IK is noted to sometimes fail.
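As a quick sanity check for that input format, the following sketch loads a motion and verifies it is a per-frame array of 263 HumanML3D features; the file name and the assumption that the motion is stored as a NumPy array are hypothetical.

```python
# Hypothetical check that a source motion matches the expected HumanML3D layout.
import numpy as np

motion = np.load("source_motion.npy")   # placeholder path
assert motion.ndim == 2 and motion.shape[1] == 263, (
    f"Expected (num_frames, 263) HumanML3D features, got {motion.shape}"
)
print(f"{motion.shape[0]} frames of dim-263 features loaded")
```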