Text-driven whole-body motion generation
HumanTOMATO addresses the novel task of text-driven whole-body motion generation, aiming to produce synchronized facial expressions, hand gestures, and body movements from textual descriptions. It targets researchers and developers in animation, AI, and robotics seeking to create more natural and expressive character animations. The primary benefit is the generation of high-quality, diverse, and coherent motions that are explicitly aligned with input text, overcoming limitations of prior work that often neglected fine-grained hand and face control.
How It Works
The framework employs a two-pronged approach: (1) a Holistic Hierarchical VQ-VAE (H²VQ) paired with a Hierarchical-GPT, which reconstruct and generate body and hand motions through structured codebooks, and (2) a pre-trained text-motion alignment model that keeps the generated motion faithful to the input text. The hierarchical VQ-VAE design enables fine-grained reconstruction of complex motions, the GPT handles sequential generation over the discrete codes, and the alignment model bridges the gap between semantic text and physical motion.
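To make the two-level codebook idea concrete, below is a minimal PyTorch sketch of hierarchical quantization in the spirit of H²VQ, not the authors' implementation: body features are quantized first, and the hand encoder is conditioned on the quantized body code before the hand codebook is queried. All module names, feature dimensions (263-D body, 90-D hand), and codebook sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour codebook lookup with a straight-through estimator."""
    def __init__(self, num_codes: int = 512, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, time, dim) -> nearest code index per frame
        dist = torch.cdist(z, self.codebook.weight.unsqueeze(0))  # (B, T, num_codes)
        idx = dist.argmin(dim=-1)                                 # (B, T)
        z_q = self.codebook(idx)                                  # (B, T, dim)
        # Straight-through: gradients flow to the encoder, not the lookup.
        return z + (z_q - z).detach()

class H2VQSketch(nn.Module):
    """Two-level quantization: body first, hands conditioned on body codes."""
    def __init__(self, body_dim: int = 263, hand_dim: int = 90, dim: int = 256):
        super().__init__()
        self.body_enc = nn.Linear(body_dim, dim)
        self.body_vq = VectorQuantizer(dim=dim)
        # The hand encoder sees raw hand features plus the quantized body code.
        self.hand_enc = nn.Linear(hand_dim + dim, dim)
        self.hand_vq = VectorQuantizer(dim=dim)
        self.decoder = nn.Linear(2 * dim, body_dim + hand_dim)

    def forward(self, body: torch.Tensor, hand: torch.Tensor) -> torch.Tensor:
        zb = self.body_vq(self.body_enc(body))
        zh = self.hand_vq(self.hand_enc(torch.cat([hand, zb], dim=-1)))
        return self.decoder(torch.cat([zb, zh], dim=-1))

motion = H2VQSketch()(torch.randn(2, 64, 263), torch.randn(2, 64, 90))
print(motion.shape)  # torch.Size([2, 64, 353])
```

The alignment component is described only as a pre-trained text-motion model; a common way to realize such supervision is a CLIP-style symmetric contrastive loss over paired text and motion embeddings, sketched below under that assumption.

```python
import torch
import torch.nn.functional as F

def text_motion_alignment_loss(text_emb: torch.Tensor,
                               motion_emb: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched (text, motion) pairs sit on the diagonal."""
    text_emb = F.normalize(text_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = text_emb @ motion_emb.t() / temperature  # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```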
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
The project is released under the "IDEA LICENSE", which is not a standard open-source license and may carry restrictions on use or distribution.
Limitations & Caveats
The README does not provide specific installation instructions, hardware requirements, or detailed setup guidance.