zzyunzhi: Programmatic 3D scene generation from language and images
This project introduces the "Scene Language," a novel approach to representing 3D scenes using a combination of programs, natural language, and embeddings. Aimed at researchers and developers in computer vision and graphics, it enables sophisticated text- and image-conditioned 3D scene generation, offering a powerful tool for creating and manipulating complex virtual environments.
How It Works
The core innovation lies in translating high-level scene descriptions into executable programs. The system leverages large language models (the README recommends Claude 3.7 Sonnet) to interpret prompts and generate scene representations. These representations can then be rendered by multiple engines, including Mitsuba for photorealistic output and Minecraft for block-based environments, yielding a flexible pipeline from concept to 3D scene.
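The idea of a program as a scene representation can be illustrated with a minimal sketch. The `Primitive`, `leg`, and `chair` names below are hypothetical and are not the project's actual Scene Language API; they only show how a small program can encode reusable structure (four legs plus a seat) that a renderer could consume.

```python
from dataclasses import dataclass

@dataclass
class Primitive:
    """One renderable shape in the scene (hypothetical, for illustration)."""
    shape: str                          # e.g. "box", "sphere"
    position: tuple                     # (x, y, z) center
    scale: tuple = (1.0, 1.0, 1.0)      # extents along each axis

def leg(x: float, z: float) -> list:
    # A chair leg as a thin vertical box at (x, 0, z).
    return [Primitive("box", (x, 0.0, z), (0.05, 0.5, 0.05))]

def chair(origin=(0.0, 0.0, 0.0)) -> list:
    # A chair is a reusable "macro": four legs plus a seat, placed at origin.
    ox, oy, oz = origin
    parts = []
    for dx in (-0.2, 0.2):
        for dz in (-0.2, 0.2):
            parts += leg(ox + dx, oz + dz)
    parts.append(Primitive("box", (ox, oy + 0.55, oz), (0.5, 0.05, 0.5)))  # seat
    return parts

# The "scene program": two chairs, each expanding to 5 primitives.
scene = chair() + chair(origin=(1.0, 0.0, 0.0))
print(len(scene))  # 10 primitives in total
```

An LLM that emits programs like this, rather than raw geometry, can reuse and edit structure (move one chair, resize all legs) with small, local code changes.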
Quick Start & Requirements
Installation involves creating a Conda environment (python=3.12), cloning the repository, and installing the package with pip install -e . (editable mode). The Minecraft renderer additionally requires spacy and the en_core_web_md model. Users need an Anthropic API key, configured in engine/key.py, for LLM access. Links to an arXiv paper and a project page are mentioned in the README but not directly provided here. A download link for example results, including prompts and LLM responses, is available.
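The steps above can be sketched as a shell session. The environment name, repository URL, and the variable name inside engine/key.py are assumptions (the README summarized here does not give them verbatim), so check the project page before running.

```shell
# Conda environment with the required Python version
conda create -n scene-language python=3.12 -y
conda activate scene-language

# Clone and install in editable mode (URL assumed from the author handle)
git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .

# Extra dependencies for the Minecraft renderer
pip install spacy
python -m spacy download en_core_web_md

# Anthropic API key for LLM access; the exact variable name in
# engine/key.py may differ from this guess.
echo 'ANTHROPIC_API_KEY = "<your-key-here>"' > engine/key.py
```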
Highlighted Details
Scene assets can be exported as .ply meshes.
Maintenance & Community
The project encourages users to report issues by opening GitHub issues or contacting the developers via email. No specific community channels (e.g., Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The provided README text does not specify a software license. This omission requires further investigation to determine usage rights, particularly for commercial applications or integration into closed-source projects.
Limitations & Caveats
The generation pipeline is noted to be sensitive to minor prompt variations, so users should experiment with prompt phrasing for best results. Certain tasks and renderers featured in the associated paper are marked as "coming soon," so the current codebase does not yet cover the full scope of the research.