scenesmith by nepfaff

Agentic text-to-scene generation for simulation

Created 2 months ago
353 stars

Top 79.0% on SourcePulse

Project Summary

Summary

SceneSmith addresses the challenge of fully automated text-to-scene generation for indoor environments, enabling the creation of simulation-ready scenes from natural language prompts. It targets researchers and engineers in robotics and simulation who require realistic, interactive virtual environments. The primary benefit is the ability to generate complex scenes with physically plausible objects and properties directly from text, significantly reducing manual scene creation effort.

How It Works

SceneSmith employs an agentic pipeline that iteratively constructs scenes, starting with floor plan generation, followed by furniture placement, wall-mounted objects, ceiling fixtures, and finally, small manipulable objects. It supports multiple asset generation backends, including the high-fidelity SAM3D (requiring 32GB GPU memory) and the faster but lower-quality Hunyuan3D (24GB GPU memory). The system integrates with curated datasets like ArtVIP for articulated objects and AmbientCG for PBR materials, inferring contextual placement for objects beyond those explicitly mentioned in prompts. Generated objects are separable and include estimated physical properties, making them directly usable in physics simulators.
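The GPU-memory figures above imply a simple backend-selection rule. Below is a minimal sketch of that rule, assuming the numbers from the README (32GB for SAM3D, 24GB for Hunyuan3D); the dictionary and function names are illustrative, not SceneSmith's actual API.

```python
# Minimum GPU memory (GB) per asset generation backend, per the README.
BACKEND_MIN_GPU_GB = {
    "SAM3D": 32,      # high fidelity, heavier
    "Hunyuan3D": 24,  # faster, lower quality
}


def usable_backends(gpu_memory_gb: float) -> list[str]:
    """Return the backends whose minimum VRAM requirement fits the given GPU."""
    return [
        name
        for name, needed_gb in BACKEND_MIN_GPU_GB.items()
        if gpu_memory_gb >= needed_gb
    ]
```

On a 24GB card only Hunyuan3D qualifies; at 32GB or more, both backends are available.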

Quick Start & Requirements

Installation is managed via uv (uv sync) or Docker. Key requirements include an OpenAI API key for agent functionality. Significant GPU memory is essential: a minimum of 24GB is needed for Hunyuan3D, 32GB for SAM3D, with 45GB recommended for the full pipeline. Data setup involves downloading SAM3D checkpoints, ArtVIP assets, and AmbientCG materials, with optional support for HSSD and Objaverse datasets.
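A minimal setup sketch based on the requirements above. The repository URL is inferred from the author and project name, and the OPENAI_API_KEY variable name follows the standard OpenAI convention; both are assumptions, since the README excerpt does not spell them out. Data downloads (SAM3D checkpoints, ArtVIP, AmbientCG) are not shown because the excerpt does not name the download scripts.

```shell
# Clone the repository (URL assumed from author/project name).
git clone https://github.com/nepfaff/scenesmith.git
cd scenesmith

# Install dependencies with uv (Docker is the alternative route).
uv sync

# The agents need an OpenAI API key (variable name assumed).
export OPENAI_API_KEY="sk-..."  # placeholder; use your own key
```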

Highlighted Details

  • Supports multi-GPU rendering for accelerated parallel scene generation using bubblewrap.
  • Offers pipeline stage control, allowing users to start, stop, or resume generation at specific stages (e.g., floor_plan, furniture).
  • Enables branching from previous experiments using resume_from_path for iterative development and A/B testing.
  • Includes a robot evaluation module for task-based scene generation and validation, converting human tasks into scene prompts and assessing task completion in simulation.
  • Provides experimental export capabilities to MuJoCo and USD formats.
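The stage-control behavior in the list above can be sketched as a small helper: given a start and stop stage, it returns the contiguous slice of pipeline stages that will run. The stage names follow the README; the function itself is illustrative, not SceneSmith's actual interface.

```python
# Pipeline stages in the order the README describes.
STAGES = [
    "floor_plan",
    "furniture",
    "wall_mounted",
    "ceiling_fixtures",
    "manipulable",
]


def stages_to_run(start: str = "floor_plan", stop: str = "manipulable") -> list[str]:
    """Return the contiguous stage slice from `start` through `stop` inclusive."""
    i, j = STAGES.index(start), STAGES.index(stop)
    if i > j:
        raise ValueError(f"start stage {start!r} comes after stop stage {stop!r}")
    return STAGES[i : j + 1]
```

For example, resuming a run at furniture placement and stopping before small objects would execute only the middle stages.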

Maintenance & Community

No specific details regarding maintenance, community channels (e.g., Discord, Slack), or notable contributors were found in the provided README.

Licensing & Compatibility

The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The system has substantial GPU memory requirements, particularly for the SAM3D backend and full pipeline execution. Exporting scenes to MuJoCo and USD is noted as experimental, with potential variations in scene quality compared to the native Drake format. The PartNet-Mobility dataset, used for articulated objects, is described as having very low mesh and joint quality.

Health Check

  • Last Commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 60 stars in the last 30 days
