Hunyuan3D-Omni by Tencent-Hunyuan

Controllable 3D asset generation from diverse inputs

Created 1 month ago
432 stars

Top 68.7% on SourcePulse

Summary

Hunyuan3D-Omni is a unified framework for controllable 3D asset generation, built on the architecture of Hunyuan3D 2.1. It introduces a unified control encoder that integrates diverse control signals, including point clouds, voxels, skeletons, and bounding boxes. The framework lets users generate 3D models under precise multi-modal conditional control, offering greater flexibility and specificity in asset creation for researchers and developers working in 3D AI.

How It Works

The core innovation lies in its unified control encoder, which processes multiple conditional inputs to guide 3D generation. This multi-modal approach allows for precise control over the output, enabling users to specify desired shapes via bounding boxes, define poses through skeletal structures, or guide generation using point cloud or voxel data. This design facilitates more nuanced and targeted 3D asset creation compared to single-modality approaches.
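
As a rough illustration of that idea, the sketch below maps each supported control signal into a shared conditioning space using a per-modality projection plus a learned modality token. This is a minimal, hypothetical PyTorch sketch: the class name, feature dimensions, and fusion strategy are assumptions and do not reflect the repository's actual implementation; only the four control types (point, voxel, bbox, pose) come from the project.

    # Illustrative only: names, shapes, and fusion strategy below are assumptions.
    import torch
    import torch.nn as nn

    class UnifiedControlEncoder(nn.Module):
        """Maps heterogeneous control signals into one shared conditioning space."""
        def __init__(self, dim=1024):
            super().__init__()
            # One lightweight projection per modality; a learned token marks
            # which modality produced each embedding.
            self.point_proj = nn.Linear(3, dim)   # (N, 3) point-cloud coordinates
            self.voxel_proj = nn.Linear(1, dim)   # occupancy values
            self.bbox_proj = nn.Linear(6, dim)    # (min_xyz, max_xyz) corners
            self.pose_proj = nn.Linear(3, dim)    # skeleton joint positions
            self.modality_token = nn.Embedding(4, dim)

        def forward(self, signal, control_type):
            projections = {"point": (self.point_proj, 0),
                           "voxel": (self.voxel_proj, 1),
                           "bbox": (self.bbox_proj, 2),
                           "pose": (self.pose_proj, 3)}
            proj, idx = projections[control_type]
            tokens = proj(signal) + self.modality_token.weight[idx]
            return tokens  # consumed as extra context by the 3D generator

    # Example: conditioning on an axis-aligned bounding box.
    encoder = UnifiedControlEncoder()
    bbox = torch.tensor([[-0.5, -0.5, -0.5, 0.5, 0.5, 0.5]])
    cond_tokens = encoder(bbox, control_type="bbox")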

Quick Start & Requirements

  • Installation: Requires Python 3.10. Install core dependencies with pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124 followed by pip install -r requirements.txt.
  • Prerequisites: CUDA 12.4 is recommended for PyTorch.
  • Resource Footprint: Generation requires at least 10 GB of VRAM.
  • Usage: Inference is performed via python3 inference.py --control_type <control_type>, with supported types point, voxel, bbox, and pose. The --use_ema and --flashvdm options are available for stability and speed, respectively (see the consolidated commands after this list).
  • Documentation: Links to related research papers are available via arXiv: https://arxiv.org/abs/2509.21245, https://arxiv.org/abs/2506.15442, https://arxiv.org/abs/2501.12202, https://arxiv.org/abs/2411.02293.
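
For convenience, the installation and inference commands from the bullets above, collected into one sequence. The specific control type shown in the last line is an illustrative choice; only the individual commands and options themselves are stated in this summary.

    # Environment: Python 3.10; CUDA 12.4 recommended for PyTorch
    pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
    pip install -r requirements.txt

    # Inference: --control_type is one of point, voxel, bbox, pose;
    # --use_ema (stability) and --flashvdm (speed) are optional flags
    python3 inference.py --control_type point --use_ema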

Highlighted Details

  • Supports multi-modal conditional control including bounding box, pose, point cloud, and voxel inputs.
  • Features a 3.3 billion parameter model for generation.
  • Offers optimization flags for inference speed (--flashvdm) and stability (--use_ema).

Maintenance & Community

The project acknowledges numerous open-source research efforts, indicating that it builds on a broader AI research ecosystem. Specific community channels and direct contributor details are not provided in the README.

Licensing & Compatibility

No licensing information is specified in the provided README content.

Limitations & Caveats

Generation requires a minimum of 10 GB of VRAM, which may be a barrier for users with less powerful hardware.
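
A short pre-flight check against that requirement can be run with PyTorch before launching inference. The snippet below is illustrative: it only verifies the GPU's total memory against the 10 GB figure quoted above, not the model's actual peak usage.

    import torch

    if not torch.cuda.is_available():
        raise SystemExit("CUDA GPU not detected; Hunyuan3D-Omni inference needs one.")

    # Compare total VRAM on GPU 0 against the ~10 GB minimum cited in this summary.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {torch.cuda.get_device_name(0)}, {total_gb:.1f} GB VRAM")
    if total_gb < 10:
        print("Warning: less than the 10 GB minimum reported for generation.")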

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 57 stars in the last 30 days
