WonderWorld by KovenYu

Interactive 3D scene generation from a single image

created 1 year ago
608 stars

Top 54.7% on sourcepulse

Project Summary

WonderWorld enables interactive 3D scene generation and exploration from a single input image. Aimed at researchers and developers in computer graphics and AI, it builds large-scale, explorable 3D environments by iteratively generating new scene components from user interaction and optional LLM-generated descriptions.

How It Works

The system uses a diffusion-based approach to scene generation, guided by depth estimation and user input. It renders with PyTorch3D and integrates models such as Marigold for depth prediction and RepViT for scene understanding. Users define new scene elements interactively: they navigate to a novel viewpoint and supply a text prompt (written manually or generated by GPT-4), and the system iteratively expands and refines the 3D environment.
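The interaction loop described above can be sketched roughly as follows. This is a minimal illustration, not WonderWorld's code: `Scene`, `SceneLayer`, `expand_scene`, and the `describe` callback (a stand-in for a GPT-4 call) are all hypothetical names, and the actual generation step is left as a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# All names below are hypothetical; they are not WonderWorld's API.
Viewpoint = Tuple[float, float, float]  # illustrative camera position only


@dataclass
class SceneLayer:
    viewpoint: Viewpoint
    prompt: str


@dataclass
class Scene:
    layers: List[SceneLayer] = field(default_factory=list)


def expand_scene(
    scene: Scene,
    viewpoint: Viewpoint,
    manual_prompt: Optional[str] = None,
    describe: Optional[Callable[[Viewpoint], str]] = None,
) -> SceneLayer:
    """Add one scene component at a new viewpoint.

    The prompt is the user's text if given; otherwise it comes from
    `describe`, a stand-in for an LLM-generated description.
    """
    if manual_prompt is not None:
        prompt = manual_prompt
    elif describe is not None:
        prompt = describe(viewpoint)
    else:
        raise ValueError("need a manual prompt or a describe() callback")
    layer = SceneLayer(viewpoint=viewpoint, prompt=prompt)
    # Depth estimation, diffusion-based generation, and rendering would run here.
    scene.layers.append(layer)
    return layer


scene = Scene()
expand_scene(scene, (0.0, 0.0, 0.0), manual_prompt="a quiet beach at dusk")
expand_scene(scene, (2.0, 0.0, 0.0), describe=lambda v: "a pier extending into the water")
print(len(scene.layers))
```

Keeping each expansion as its own record mirrors the iterative, viewpoint-by-viewpoint workflow; how the real system stores its scene components is not stated in this summary.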

Quick Start & Requirements

  • Installation: Clone the repository, create a mamba environment, and install the dependencies, including PyTorch (tested with CUDA 12.4), PyTorch3D, and the submodules.
  • Prerequisites: A CUDA-compatible GPU with 48 GB of VRAM is required. An OpenAI API key is needed for GPT-4 integration. The RepViT model must be downloaded.
  • Setup: Installing PyTorch3D can be time-consuming. Initial sky image generation for a new example takes approximately 20 minutes on an A6000 GPU.
  • Links: Website, arXiv
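Before starting a long install, it may help to confirm the prerequisites listed above. The sketch below is an illustration under stated assumptions, not part of the repository: `preflight` and `detect_vram_gib` are hypothetical helpers, the 5% slack on the 48 GB figure is an assumption (drivers typically report slightly less than a card's nominal size), and reading the key from the `OPENAI_API_KEY` environment variable is also an assumption, since the repository may expect it elsewhere.

```python
import os
from typing import List, Optional

NOMINAL_VRAM_GB = 48  # figure quoted in the prerequisites above
# Assumption: allow 5% slack because reported memory is below the nominal size.
MIN_VRAM_GIB = NOMINAL_VRAM_GB * 0.95


def detect_vram_gib() -> Optional[float]:
    """Return total memory of GPU 0 in GiB, or None if it cannot be determined."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 2**30


def preflight(vram_gib: Optional[float], api_key: Optional[str], use_gpt4: bool) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    if vram_gib is None:
        problems.append("no CUDA GPU detected")
    elif vram_gib < MIN_VRAM_GIB:
        problems.append(
            f"GPU reports {vram_gib:.1f} GiB; about {NOMINAL_VRAM_GB} GB is required"
        )
    if use_gpt4 and not api_key:
        problems.append("GPT-4 prompts requested but no OpenAI API key supplied")
    return problems


if __name__ == "__main__":
    # Assumption: the key is exported as OPENAI_API_KEY.
    issues = preflight(detect_vram_gib(), os.environ.get("OPENAI_API_KEY"), use_gpt4=True)
    print("\n".join(issues) or "prerequisites look satisfied")
```

Separating detection from the check keeps `preflight` testable on a machine without a GPU.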

Highlighted Details

  • Interactive scene generation via novel viewpoint selection and prompt-based expansion.
  • Optional LLM integration (GPT-4) for automatic scene description generation.
  • Layer-wise scene generation and ability to load previously generated scenes.
  • Supports local visualization through a web browser via SSH tunneling.
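For the SSH tunneling mentioned in the last item, a local port forward is the usual approach. The sketch below only builds the `ssh` command line; the host name and port `7777` are placeholders chosen for illustration, since this summary does not state which port the visualization server uses.

```python
import shlex
from typing import List


def ssh_tunnel_command(remote_host: str, remote_port: int, local_port: int) -> List[str]:
    """Build an ssh invocation that forwards local_port to remote_port on the remote machine.

    -N: do not run a remote command, only forward ports.
    -L: local port forwarding in the form local_port:host:remote_port.
    """
    return ["ssh", "-N", "-L", f"{local_port}:localhost:{remote_port}", remote_host]


# Placeholder host and port; substitute the real server and the port it listens on.
cmd = ssh_tunnel_command("user@gpu-server", remote_port=7777, local_port=7777)
print(shlex.join(cmd))
```

With the tunnel open, the remote viewer would be reachable in a local browser at `http://localhost:<local_port>`.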

Maintenance & Community

The project is associated with researchers from institutions like MIT and is linked to the authors' personal websites and Twitter. It acknowledges contributions from various open-source projects.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires 48 GB of GPU memory, which limits accessibility. Installation, particularly of PyTorch3D, can be complex and time-consuming. The license must be clarified before any commercial use.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 81 stars in the last 90 days
