HumanTOMATO by IDEA-Research

Text-driven whole-body motion generation

created 1 year ago
345 stars

Top 81.4% on sourcepulse

Project Summary

HumanTOMATO addresses the novel task of text-driven whole-body motion generation, aiming to produce synchronized facial expressions, hand gestures, and body movements from textual descriptions. It targets researchers and developers in animation, AI, and robotics seeking to create more natural and expressive character animations. The primary benefit is the generation of high-quality, diverse, and coherent motions that are explicitly aligned with input text, overcoming limitations of prior work that often neglected fine-grained hand and face control.

How It Works

The framework employs a two-pronged approach: (1) a Holistic Hierarchical VQ-VAE (H²VQ) paired with a Hierarchical-GPT for reconstructing and generating body and hand motions using structured codebooks, and (2) a pre-trained text-motion alignment model that enforces explicit alignment between the generated motion and the input text. The hierarchical VQ-VAE design enables fine-grained reconstruction of complex whole-body motion, the GPT model handles sequential generation over the discrete codes, and the alignment model bridges the gap between semantic text and physical motion.
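
To make the codebook hierarchy concrete, here is a minimal PyTorch sketch of the coarse-to-fine idea: body motion is quantized against one codebook, then hand motion is encoded conditioned on the quantized body code and quantized against a second codebook. This is an illustrative sketch, not the released implementation; the class names, feature dimensions (263-D body features as in HumanML3D, 90-D hand features), and codebook sizes are all assumptions.

```python
# Hypothetical sketch of a two-level (body -> hand) vector-quantized
# autoencoder in the spirit of H2VQ. Not the authors' code; all names,
# dimensions, and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Nearest-neighbour quantization against a learned codebook."""
    def __init__(self, num_codes: int, dim: int):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                              # z: (B, T, dim)
        flat = z.reshape(-1, z.shape[-1])              # (B*T, dim)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        idx = idx.reshape(z.shape[:-1])                # discrete codes, (B, T)
        z_q = self.codebook(idx)
        # straight-through estimator: gradients flow back to the encoder
        return z + (z_q - z).detach(), idx

class HierarchicalVQVAE(nn.Module):
    """Body is quantized first; hands are quantized conditioned on the
    body code, giving a coarse-to-fine two-codebook hierarchy."""
    def __init__(self, body_dim=64, hand_dim=32, num_codes=512):
        super().__init__()
        self.body_enc = nn.Linear(263, body_dim)            # body features (assumed 263-D)
        self.hand_enc = nn.Linear(90 + body_dim, hand_dim)  # hands + body context
        self.body_vq = VectorQuantizer(num_codes, body_dim)
        self.hand_vq = VectorQuantizer(num_codes, hand_dim)
        self.dec = nn.Linear(body_dim + hand_dim, 263 + 90)

    def forward(self, body, hand):                     # (B, T, 263), (B, T, 90)
        zb, body_idx = self.body_vq(self.body_enc(body))
        zh, hand_idx = self.hand_vq(self.hand_enc(torch.cat([hand, zb], -1)))
        recon = self.dec(torch.cat([zb, zh], -1))
        return recon, (body_idx, hand_idx)

model = HierarchicalVQVAE()
body, hand = torch.randn(2, 16, 263), torch.randn(2, 16, 90)
recon, (body_idx, hand_idx) = model(body, hand)        # token streams for the GPT stage
```

In the full pipeline it is these discrete (body_idx, hand_idx) token streams, rather than raw poses, that the Hierarchical-GPT would predict autoregressively from the text prompt.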

Quick Start & Requirements

  • Install/Run: Installation and run instructions are not explicitly detailed in the README; setup likely involves a standard Python environment.
  • Prerequisites: Depends on TMR, MLD, T2M-GPT, and HumanML3D, which may have their own dependencies. Specific hardware requirements (e.g., GPU, CUDA) are not listed.
  • Resources: Training and inference resource requirements are not specified.
  • Links: Project release: https://github.com/IDEA-Research/HumanTOMATO. OpenTMA project (text-motion alignment): https://github.com/IDEA-Research/OpenTMA.

Highlighted Details

  • Generates text-aligned whole-body motions covering face, hands, and body (a toy alignment-scoring sketch follows this list).
  • Introduces a "tomato representation" for motion processing.
  • Accepted to ICML 2024.
  • Leverages hierarchical VQ-VAE and GPT for motion generation.
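
As a rough illustration of the explicit text-motion alignment mentioned above (the role played by TMR/OpenTMA-style models), the sketch below embeds a prompt and candidate motions into a shared space and ranks candidates by cosine similarity. The encoders are random stand-ins, not the released models, and the embedding dimension is an arbitrary assumption.

```python
# Toy text-motion alignment scoring: rank candidate motions by cosine
# similarity to the text embedding. Stand-in tensors, not the real encoders.
import torch
import torch.nn.functional as F

def alignment_scores(text_emb: torch.Tensor, motion_embs: torch.Tensor):
    """text_emb: (D,), motion_embs: (N, D) -> (N,) cosine similarities."""
    text_emb = F.normalize(text_emb, dim=-1)
    motion_embs = F.normalize(motion_embs, dim=-1)
    return motion_embs @ text_emb                 # higher = better aligned

text_emb = torch.randn(256)       # stand-in for a text-encoder output
candidates = torch.randn(8, 256)  # stand-in motion-encoder outputs
best = alignment_scores(text_emb, candidates).argmax().item()
```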

Maintenance & Community

  • Project released in October 2023.
  • OpenTMA project released in May 2024.
  • Contact emails provided for inquiries.

Licensing & Compatibility

  • Distributed under an "IDEA LICENSE".
  • Code depends on other libraries and datasets with their own licenses, which must also be followed. Commercial use implications are not detailed.

Limitations & Caveats

The README does not provide specific installation instructions, hardware requirements, or detailed setup guidance. The "IDEA LICENSE" is not a standard open-source license and may have restrictions on use or distribution.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 23 stars in the last 90 days
