Robotic manipulation method using language models
VoxPoser enables zero-shot robotic manipulation by composing 3D value maps generated by large language models (LLMs). It targets researchers and engineers seeking to leverage LLMs for complex robotic tasks without task-specific training, offering a composable and adaptable framework.
How It Works
VoxPoser utilizes Language Model Programs (LMPs) to decompose natural language instructions into a sequence of sub-tasks. For each sub-task, it generates a 3D value map representing the desirability of different spatial configurations. These value maps are then composed to create a unified plan, which a greedy planner translates into robot waypoints. This approach allows for flexible, zero-shot task synthesis by leveraging the LLM's reasoning and code generation capabilities.
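To make the composition and planning steps concrete, here is a minimal, self-contained NumPy sketch of the idea; the function names (affordance_map, avoidance_map, compose, greedy_waypoints) and the grid resolution are illustrative assumptions, not VoxPoser's actual implementation.

```python
# Illustrative sketch of the value-map idea (not VoxPoser's actual API):
# each sub-task contributes a 3D grid of desirability scores, the grids are
# composed, and a greedy planner walks toward higher-valued voxels.
import numpy as np

GRID = (50, 50, 50)  # voxelized workspace (hypothetical resolution)

def affordance_map(target_voxel, grid=GRID):
    """Higher values near the voxel the LLM identified as the goal."""
    idx = np.indices(grid).transpose(1, 2, 3, 0)
    dist = np.linalg.norm(idx - np.array(target_voxel), axis=-1)
    return np.exp(-dist / 10.0)

def avoidance_map(obstacle_voxel, grid=GRID):
    """Lower values near voxels the instruction says to stay away from."""
    return -0.5 * affordance_map(obstacle_voxel, grid)

def compose(maps):
    """Compose per-sub-task maps into one value map (a simple sum here)."""
    return np.sum(maps, axis=0)

def greedy_waypoints(value_map, start, steps=20):
    """Greedily step to the best-valued neighboring voxel at each iteration."""
    path, pos = [tuple(start)], np.array(start)
    neighbors = np.array([[1,0,0],[-1,0,0],[0,1,0],[0,-1,0],[0,0,1],[0,0,-1]])
    for _ in range(steps):
        best, best_val = pos, -np.inf
        for d in neighbors:
            nxt = np.clip(pos + d, 0, np.array(GRID) - 1)
            if value_map[tuple(nxt)] > best_val:
                best, best_val = nxt, value_map[tuple(nxt)]
        pos = best
        path.append(tuple(pos))
    return path

# e.g. "put the cup on the shelf while staying away from the vase"
value_map = compose([affordance_map((40, 40, 30)), avoidance_map((25, 25, 15))])
waypoints = greedy_waypoints(value_map, start=(5, 5, 5))
```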
Quick Start & Requirements
Create a conda environment (conda create -n voxposer-env python=3.9) and activate it, install PyRep and RLBench, then run pip install -r requirements.txt. The entry point is src/playground.ipynb.
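The sketch below approximates what a playground-style session might look like; the imports and names (get_config, setup_LMP, VoxPoserRLBench, 'plan_ui') are assumptions rather than a verified API, so treat src/playground.ipynb as the authoritative reference.

```python
# Hypothetical sketch of a playground-style session; the repo-module names
# below are assumptions, not the verified interface -- consult
# src/playground.ipynb for the real one.
import numpy as np
from rlbench import tasks

from arguments import get_config               # assumed repo module
from interfaces import setup_LMP               # assumed repo module
from envs.rlbench_env import VoxPoserRLBench   # assumed repo module

config = get_config('rlbench')
env = VoxPoserRLBench()                   # wraps an RLBench simulation
lmps, lmp_env = setup_LMP(env, config)    # builds the language model programs

env.load_task(tasks.PutRubbishInBin)      # any supported RLBench task
descriptions, obs = env.reset()
instruction = np.random.choice(descriptions)

lmps['plan_ui'](instruction)              # LLM plans, robot executes waypoints
```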
Highlighted Details
Maintenance & Community
The project is associated with Stanford University and the University of Illinois Urbana-Champaign. Core implementations are based on "Code as Policies" and "Where2Act". The repository's last recorded update was about 5 months ago, and it is marked inactive.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The provided codebase does not include the real-world perception pipeline (object detection, segmentation, tracking) and relies on RLBench's object masks. Adapting to real robots requires significant modifications to the environment interface and integration with custom perception and control modules.
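As a rough illustration of what that adaptation involves, the hypothetical interface below lists the perception and control hooks a real-robot port would need to supply in place of RLBench's simulated masks; none of these class or method names come from the repository.

```python
# Illustrative (hypothetical) environment interface a real-robot port would
# need to implement, replacing RLBench's simulated observations and masks.
from abc import ABC, abstractmethod
from typing import List
import numpy as np

class RealRobotEnv(ABC):
    """Sketch of the perception/control surface VoxPoser-style planning needs."""

    @abstractmethod
    def get_object_names(self) -> List[str]:
        """Names of detectable objects, from your own detector."""

    @abstractmethod
    def get_object_mask(self, name: str) -> np.ndarray:
        """Per-object segmentation mask from your perception pipeline."""

    @abstractmethod
    def get_point_cloud(self) -> np.ndarray:
        """Fused RGB-D point cloud of the workspace, in the robot frame."""

    @abstractmethod
    def execute_waypoints(self, waypoints: np.ndarray) -> bool:
        """Send planned end-effector waypoints to your low-level controller."""
```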