Hunyuan3D-Omni by Tencent-Hunyuan

Controllable 3D asset generation from diverse inputs

Created 1 month ago
432 stars

Top 68.7% on SourcePulse

Summary

Hunyuan3D-Omni is a unified framework for controllable 3D asset generation, built on the architecture of Hunyuan3D 2.1. It introduces a unified control encoder that integrates diverse control signals, including point clouds, voxels, skeletons, and bounding boxes. The framework lets users generate 3D models under precise multi-modal conditional control, offering greater flexibility and specificity in asset creation for researchers and developers working in 3D AI.

How It Works

The core innovation lies in its unified control encoder, which processes multiple conditional inputs to guide 3D generation. This multi-modal approach allows for precise control over the output, enabling users to specify desired shapes via bounding boxes, define poses through skeletal structures, or guide generation using point cloud or voxel data. This design facilitates more nuanced and targeted 3D asset creation compared to single-modality approaches.
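
As a rough illustration of that idea, the sketch below maps each supported control signal into a shared conditioning space using a per-modality projection plus a learned modality token. This is a minimal, hypothetical PyTorch sketch: the class name, feature dimensions, and fusion strategy are assumptions and do not reflect the repository's actual implementation; only the four control types (point, voxel, bbox, pose) come from the project.

    # Illustrative only: names, shapes, and fusion strategy below are assumptions.
    import torch
    import torch.nn as nn

    class UnifiedControlEncoder(nn.Module):
        """Maps heterogeneous control signals into one shared conditioning space."""
        def __init__(self, dim=1024):
            super().__init__()
            # One lightweight projection per modality; a learned token marks
            # which modality produced each embedding.
            self.point_proj = nn.Linear(3, dim)   # (N, 3) point-cloud coordinates
            self.voxel_proj = nn.Linear(1, dim)   # occupancy values
            self.bbox_proj = nn.Linear(6, dim)    # (min_xyz, max_xyz) corners
            self.pose_proj = nn.Linear(3, dim)    # skeleton joint positions
            self.modality_token = nn.Embedding(4, dim)

        def forward(self, signal, control_type):
            projections = {"point": (self.point_proj, 0),
                           "voxel": (self.voxel_proj, 1),
                           "bbox": (self.bbox_proj, 2),
                           "pose": (self.pose_proj, 3)}
            proj, idx = projections[control_type]
            tokens = proj(signal) + self.modality_token.weight[idx]
            return tokens  # consumed as extra context by the 3D generator

    # Example: conditioning on an axis-aligned bounding box.
    encoder = UnifiedControlEncoder()
    bbox = torch.tensor([[-0.5, -0.5, -0.5, 0.5, 0.5, 0.5]])
    cond_tokens = encoder(bbox, control_type="bbox")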

Quick Start & Requirements

  • Installation: Requires Python 3.10. Install core dependencies with pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 followed by pip install -r requirements.txt.
  • Prerequisites: CUDA 12.4 is recommended for PyTorch.
  • Resource Footprint: Generation requires at least 10 GB of VRAM.
  • Usage: Inference is performed via python3 inference.py --control_type <control_type>, with supported types point, voxel, bbox, and pose. The --use_ema and --flashvdm options are available for stability and speed, respectively (see the consolidated commands after this list).
  • Documentation: Links to related research papers are available via arXiv: https://arxiv.org/abs/2509.21245, https://arxiv.org/abs/2506.15442, https://arxiv.org/abs/2501.12202, https://arxiv.org/abs/2411.02293.
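
For convenience, the installation and inference commands from the bullets above, collected into one sequence. The specific control type shown in the last line is an illustrative choice; only the individual commands and options themselves are stated in this summary.

    # Environment: Python 3.10; CUDA 12.4 recommended for PyTorch
    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
    pip install -r requirements.txt

    # Inference: --control_type is one of point, voxel, bbox, pose;
    # --use_ema (stability) and --flashvdm (speed) are optional flags
    python3 inference.py --control_type point --use_ema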

Highlighted Details

  • Supports multi-modal conditional control including bounding box, pose, point cloud, and voxel inputs.
  • Features a 3.3 billion parameter model for generation.
  • Offers optimization flags for inference speed (--flashvdm) and stability (--use_ema).

Maintenance & Community

The project acknowledges numerous open-source research efforts, indicating that it builds on a broader AI research ecosystem. Specific community channels and direct contributor details are not provided in the README.

Licensing & Compatibility

No licensing information is specified in the provided README content.

Limitations & Caveats

Generation requires a minimum of 10 GB of VRAM, which may be a barrier for users with less powerful hardware.
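
A short pre-flight check against that requirement can be run with PyTorch before launching inference. The snippet below is illustrative: it only verifies the GPU's total memory against the 10 GB figure quoted above, not the model's actual peak usage.

    import torch

    if not torch.cuda.is_available():
        raise SystemExit("CUDA GPU not detected; Hunyuan3D-Omni inference needs one.")

    # Compare total VRAM on GPU 0 against the ~10 GB minimum cited in this summary.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB VRAM")
    if total_gb < 10:
        print("Warning: less than the 10 GB minimum reported for generation.")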

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 57 stars in the last 30 days
