SceneSmith by nepfaff: Agentic text-to-scene generation for simulation
Top 79.0% on SourcePulse
Summary
SceneSmith addresses the challenge of fully automated text-to-scene generation for indoor environments, enabling the creation of simulation-ready scenes from natural language prompts. It targets researchers and engineers in robotics and simulation who require realistic, interactive virtual environments. The primary benefit is the ability to generate complex scenes with physically plausible objects and properties directly from text, significantly reducing manual scene creation effort.
How It Works
SceneSmith employs an agentic pipeline that iteratively constructs scenes, starting with floor plan generation, followed by furniture placement, wall-mounted objects, ceiling fixtures, and finally, small manipulable objects. It supports multiple asset generation backends, including the high-fidelity SAM3D (requiring 32GB GPU memory) and the faster but lower-quality Hunyuan3D (24GB GPU memory). The system integrates with curated datasets like ArtVIP for articulated objects and AmbientCG for PBR materials, inferring contextual placement for objects beyond those explicitly mentioned in prompts. Generated objects are separable and include estimated physical properties, making them directly usable in physics simulators.
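The staged ordering described above can be sketched as a minimal pipeline loop. This is an illustrative sketch only: the class names, stage names, and functions below are hypothetical and do not reflect SceneSmith's actual API, only the floor plan → furniture → wall-mounted → ceiling → small-objects sequence described in the README summary.

```python
# Hypothetical sketch of a staged scene-generation pipeline.
# All names here are illustrative, not SceneSmith's real API.
from dataclasses import dataclass, field

# Stage order mirrors the pipeline described above.
STAGES = ["floor_plan", "furniture", "wall_mounted", "ceiling_fixtures", "small_objects"]

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)  # per-stage generated content
    log: list = field(default_factory=list)      # order in which stages ran

def run_stage(scene: Scene, stage: str, prompt: str) -> Scene:
    # A real agent would call an LLM and an asset backend here
    # (e.g., SAM3D or Hunyuan3D); we only record that the stage ran.
    scene.objects[stage] = f"placeholder assets for '{prompt}' ({stage})"
    scene.log.append(stage)
    return scene

def generate_scene(prompt: str) -> Scene:
    scene = Scene()
    for stage in STAGES:  # each stage builds on the output of the previous one
        scene = run_stage(scene, stage, prompt)
    return scene
```

The key design point, reflected even in this toy version, is that each stage consumes the scene produced by all earlier stages, so later placements (e.g., small manipulable objects) can be conditioned on the furniture already present.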
Quick Start & Requirements
Installation is managed via uv (uv sync) or Docker. Key requirements include an OpenAI API key for agent functionality. Significant GPU memory is essential: a minimum of 24GB is needed for Hunyuan3D, 32GB for SAM3D, with 45GB recommended for the full pipeline. Data setup involves downloading SAM3D checkpoints, ArtVIP assets, and AmbientCG materials, with optional support for HSSD and Objaverse datasets.
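A minimal setup sketch follows. Only `uv sync` is confirmed by the summary above; the environment variable name is the standard OpenAI convention, and the placeholder key value is illustrative.

```shell
# Install dependencies with uv (documented install path)
uv sync

# The agents require an OpenAI API key; OPENAI_API_KEY is the
# standard environment variable read by OpenAI clients (assumed here).
export OPENAI_API_KEY="sk-..."  # replace with your key
```

Data assets (SAM3D checkpoints, ArtVIP, AmbientCG) must be downloaded separately per the project's own instructions; the exact commands are not given in this summary, so none are shown here.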
Highlighted Details
The README also highlights sandboxed execution via bubblewrap, per-stage outputs (e.g., floor_plan, furniture), and a resume_from_path option for iterative development and A/B testing.
Maintenance & Community
No specific details regarding maintenance, community channels (e.g., Discord, Slack), or notable contributors were found in the provided README.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
The system has substantial GPU memory requirements, particularly for the SAM3D backend and full pipeline execution. Exporting scenes to MuJoCo and USD is noted as experimental, with potential variations in scene quality compared to the native Drake format. The PartNet-Mobility dataset, used for articulated objects, is described as having very low mesh and joint quality.
Last updated 4 weeks ago · Inactive