HunyuanWorld-1.0 by Tencent-Hunyuan

Generate immersive 3D worlds from text or pixels

created 2 weeks ago


1,322 stars

Top 31.0% on sourcepulse

View on GitHub
Project Summary

HunyuanWorld-1.0 generates immersive, explorable, and interactive 3D worlds from text or pixel inputs. It targets developers in virtual reality, game development, and interactive content creation, offering a novel approach to 3D scene generation that combines panoramic proxies with mesh export and disentangled object representations.

How It Works

The framework utilizes a semantically layered 3D mesh representation, leveraging panoramic images as 360° world proxies. This approach facilitates semantic-aware decomposition and reconstruction, enabling the generation of diverse 3D worlds. Key advantages include 360° immersive experiences, mesh export for compatibility with existing graphics pipelines, and disentangled object representations for enhanced interactivity.
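
As a rough mental model of that layered representation, the sketch below describes a generated world as a panoramic proxy plus sky, background, and disentangled foreground-object layers, each backed by its own mesh. These class and field names are illustrative only and are not taken from the HunyuanWorld-1.0 codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative data model only; names are not taken from the HunyuanWorld-1.0 API.
@dataclass
class MeshLayer:
    name: str            # e.g. "sky", "background", "object:tree"
    semantic_label: str  # semantic class assigned during decomposition
    mesh_path: str       # exported mesh file (.ply / .obj / .drc)

@dataclass
class LayeredWorld:
    panorama_path: str                     # 360-degree panoramic proxy image
    sky: Optional[MeshLayer] = None
    background: Optional[MeshLayer] = None
    objects: List[MeshLayer] = field(default_factory=list)  # disentangled, individually editable

    def all_layers(self) -> List[MeshLayer]:
        """Flatten the scene into a render/export order: sky, background, then objects."""
        fixed = [layer for layer in (self.sky, self.background) if layer is not None]
        return fixed + self.objects
```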

Quick Start & Requirements

  • Installation: Requires Python 3.10 and PyTorch 2.5.0+cu124. Setup involves cloning the repository, creating a Conda environment from docker/HunyuanWorld.yaml, installing dependencies including Real-ESRGAN and ZIM, and logging into Hugging Face. Installing Draco is also recommended for exporting meshes in Draco format.
  • Prerequisites: CUDA 12.4. A minimal environment check is sketched after this list.
  • Resources: Model checkpoints are available for download (e.g., HunyuanWorld-PanoDiT-Text: 478MB).
  • Documentation: Examples and a technical report are provided. A modelviewer.html is included for local visualization.
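
Before downloading checkpoints, a quick sanity check like the one below can confirm the environment matches the documented requirements (Python 3.10, PyTorch 2.5.0+cu124, CUDA 12.4). This is a generic snippet, not a script shipped with the repository.

```python
import sys
import torch

# Compare against the versions documented in the quick start.
print(f"Python : {sys.version.split()[0]}")          # expected: 3.10.x
print(f"PyTorch: {torch.__version__}")               # expected: 2.5.0+cu124
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA (torch build): {torch.version.cuda}")  # expected: 12.4
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```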

Highlighted Details

  • Achieves state-of-the-art performance in generating coherent, explorable, and interactive 3D worlds, outperforming baselines in visual quality and geometric consistency across various benchmarks.
  • Supports both text-to-panorama and image-to-panorama generation, followed by scene generation from panoramas.
  • Offers mesh export capabilities and disentangled object representations for interactivity (a mesh-handling sketch follows this list).
  • The open-source version is based on Flux but adaptable to other models like Stable Diffusion.
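
Because the generated worlds export as standard meshes, they can be inspected or converted with off-the-shelf tooling. The sketch below uses the third-party trimesh library (not a dependency listed in the README) to load one exported layer and re-save it as glTF; the file path is a placeholder.

```python
import trimesh  # pip install trimesh

# Placeholder path; point this at a mesh produced by the scene-generation step.
mesh = trimesh.load("world_out/background.ply", force="mesh")

# Quick integrity check on the exported geometry.
print(f"vertices: {len(mesh.vertices)}, faces: {len(mesh.faces)}")

# Convert to glTF binary (.glb) for standard engines and web viewers.
mesh.export("world_out/background.glb")
```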

Maintenance & Community

The project was released on July 26, 2025, with a technical report. Community channels include WeChat, Xiaohongshu, X (formerly Twitter), and Discord.

Licensing & Compatibility

The repository acknowledges contributions from various open-source projects. Specific licensing terms for HunyuanWorld-1.0 are not stated in the README, so suitability for commercial use or closed-source linking would need to be confirmed with the maintainers.

Limitations & Caveats

The README mentions that certain scenes may fail to load in the ModelViewer due to hardware limitations. A TensorRT version and RGBD video diffusion are listed as future open-source plans, indicating these features are not yet available.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 13

Star History

1,459 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 7 more.

stable-dreamfusion by ashawkey

Text-to-3D model using NeRF and diffusion

Top 0.1% · 9k stars · created 2 years ago · updated 1 year ago