Tencent-Hunyuan: Text-to-3D human motion generation models
Top 24.7% on SourcePulse
Summary
HY-Motion 1.0 generates 3D human motion from text prompts and integrates into 3D animation pipelines. It scales Diffusion Transformer (DiT) and Flow Matching models to the billion-parameter level, delivering state-of-the-art instruction following and motion quality that surpass existing open-source alternatives.
How It Works
This project uses Diffusion Transformer (DiT) and Flow Matching architectures for text-to-3D motion generation. Its key innovation is scaling these models to billion-parameter size, significantly improving instruction understanding and motion fidelity. Training proceeds in three stages: large-scale pre-training, high-quality fine-tuning, and reinforcement-learning refinement for improved naturalness and instruction adherence.
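The flow-matching objective mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: `toy_velocity_model` is a hypothetical linear stand-in for the billion-parameter, text-conditioned DiT, and the straight-line probability path is the standard conditional flow-matching formulation.

```python
import numpy as np

def flow_matching_loss(x0, x1, t, velocity_model):
    """Conditional flow-matching loss for one batch.

    x0: noise samples; x1: data (motion) samples; t: per-sample times in [0, 1].
    The model is trained to predict the constant velocity x1 - x0 along the
    straight-line path x_t = (1 - t) * x0 + t * x1.
    """
    t = t[:, None]                   # broadcast time over feature dims
    xt = (1.0 - t) * x0 + t * x1     # point on the interpolation path
    target = x1 - x0                 # ground-truth velocity field
    pred = velocity_model(xt, t)
    return float(np.mean((pred - target) ** 2))

# Toy stand-in for the DiT velocity network (the real model is a large
# transformer conditioned on the text prompt; this is illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1
def toy_velocity_model(xt, t):
    return xt @ W + t                # linear map plus a crude time term

x0 = rng.standard_normal((4, 8))     # noise samples
x1 = rng.standard_normal((4, 8))     # "motion" data samples
t = rng.uniform(0.0, 1.0, size=4)
loss = flow_matching_loss(x0, x1, t, toy_velocity_model)
```

Minimizing this loss over random `t` teaches the network a velocity field that transports noise to data; at inference time, motion is generated by integrating that field from t = 0 to t = 1.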
Quick Start & Requirements
Installation requires cloning the repo, installing PyTorch, ensuring git-lfs is present (git lfs pull), and running pip install -r requirements.txt. The 1.0B-parameter model needs at least 26 GB of VRAM; the 0.46B Lite version requires 24 GB. VRAM usage can be reduced via --num_seeds=1, prompts under 30 words, and motion lengths under 5 seconds; prompt engineering can also be disabled (DISABLE_PROMPT_ENGINEERING=True) to save further VRAM. Local inference scripts and a Gradio app are available.
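The steps above can be condensed into a script. This is a sketch: `<repo-url>` and the inference script name `infer.py` are placeholders (the summary does not name them), while the git-lfs step, requirements file, and the low-VRAM flags come from the description above.

```shell
# Quick-start sketch; <repo-url> and infer.py are placeholders — check the
# project README for the real repository URL and script name.
git clone <repo-url> HY-Motion && cd HY-Motion
git lfs pull                       # fetch model weights tracked by git-lfs
pip install torch                  # install PyTorch (pick the matching CUDA build)
pip install -r requirements.txt

# Low-VRAM inference: one seed, prompt under 30 words, motion under 5 seconds,
# prompt engineering disabled (flags as described in the summary above).
DISABLE_PROMPT_ENGINEERING=True python infer.py \
    --prompt "a person walks forward and waves" \
    --num_seeds=1
```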
Highlighted Details
Maintenance & Community
The README acknowledges contributions from numerous open-source projects. No community channels, roadmap, or contributor details beyond those library acknowledgements are provided.
Licensing & Compatibility
No explicit software license is stated in the README. While the project utilizes and acknowledges many open-source libraries, the specific terms for HY-Motion 1.0's use, modification, and distribution are undefined, potentially impacting commercial adoption.
Limitations & Caveats
Unsupported features include non-humanoid characters, subjective visual attributes (emotions, clothing), environment/camera details, and multi-person interactions. Special modes like seamless loops are also excluded. English prompts are recommended; other languages require a separate prompter module.