Robotic manipulation method using language models
VoxPoser enables zero-shot robotic manipulation by composing 3D value maps generated by large language models (LLMs). It targets researchers and engineers seeking to leverage LLMs for complex robotic tasks without task-specific training, offering a composable and adaptable framework.
How It Works
VoxPoser utilizes Language Model Programs (LMPs) to decompose natural language instructions into a sequence of sub-tasks. For each sub-task, it generates a 3D value map representing the desirability of different spatial configurations. These value maps are then composed to create a unified plan, which a greedy planner translates into robot waypoints. This approach allows for flexible, zero-shot task synthesis by leveraging the LLM's reasoning and code generation capabilities.
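To make the composition and planning steps concrete, here is a minimal, self-contained NumPy sketch of the idea; the function names (affordance_map, avoidance_map, compose, greedy_waypoints) and the grid resolution are illustrative assumptions, not VoxPoser's actual implementation.

```python
# Illustrative sketch of the value-map idea (not VoxPoser's actual API):
# each sub-task contributes a 3D grid of desirability scores, the grids are
# composed, and a greedy planner walks toward higher-valued voxels.
import numpy as np

GRID = (50, 50, 50)  # voxelized workspace (hypothetical resolution)

def affordance_map(target_voxel, grid=GRID):
    """Higher values near the voxel the LLM identified as the goal."""
    idx = np.indices(grid).transpose(1, 2, 3, 0)
    dist = np.linalg.norm(idx - np.array(target_voxel), axis=-1)
    return np.exp(-dist / 10.0)

def avoidance_map(obstacle_voxel, grid=GRID):
    """Lower values near voxels the instruction says to stay away from."""
    return -0.5 * affordance_map(obstacle_voxel, grid)

def compose(maps):
    """Compose per-sub-task maps into one value map (a simple sum here)."""
    return np.sum(maps, axis=0)

def greedy_waypoints(value_map, start, steps=20):
    """Greedily step to the best-valued neighboring voxel at each iteration."""
    path, pos = [tuple(start)], np.array(start)
    neighbors = np.array([[1,0,0],[-1,0,0],[0,1,0],[0,-1,0],[0,0,1],[0,0,-1]])
    for _ in range(steps):
        best, best_val = pos, -np.inf
        for d in neighbors:
            nxt = np.clip(pos + d, 0, np.array(GRID) - 1)
            if value_map[tuple(nxt)] > best_val:
                best, best_val = nxt, value_map[tuple(nxt)]
        pos = best
        path.append(tuple(pos))
    return path

# e.g. "put the cup on the shelf while staying away from the vase"
value_map = compose([affordance_map((40, 40, 30)), avoidance_map((25, 25, 15))])
waypoints = greedy_waypoints(value_map, start=(5, 5, 5))
```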
Quick Start & Requirements
Create a conda environment (conda create -n voxposer-env python=3.9) and activate it, install PyRep and RLBench, then run pip install -r requirements.txt. The entry point is src/playground.ipynb.
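The sketch below approximates what a playground-style session might look like; the imports and names (get_config, setup_LMP, VoxPoserRLBench, 'plan_ui') are assumptions rather than a verified API, so treat src/playground.ipynb as the authoritative reference.

```python
# Hypothetical sketch of a playground-style session; the repo-module names
# below are assumptions, not the verified interface -- consult
# src/playground.ipynb for the real one.
import numpy as np
from rlbench import tasks

from arguments import get_config               # assumed repo module
from interfaces import setup_LMP               # assumed repo module
from envs.rlbench_env import VoxPoserRLBench   # assumed repo module

config = get_config('rlbench')
env = VoxPoserRLBench()                   # wraps an RLBench simulation
lmps, lmp_env = setup_LMP(env, config)    # builds the language model programs

env.load_task(tasks.PutRubbishInBin)      # any supported RLBench task
descriptions, obs = env.reset()
instruction = np.random.choice(descriptions)

lmps['plan_ui'](instruction)              # LLM plans, robot executes waypoints
```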
Highlighted Details
Maintenance & Community
The project is associated with Stanford University and the University of Illinois Urbana-Champaign. Core implementations are based on "Code as Policies" and "Where2Act". The repository's last recorded update was about 5 months ago, and it is marked inactive.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The provided codebase does not include the real-world perception pipeline (object detection, segmentation, tracking) and relies on RLBench's object masks. Adapting to real robots requires significant modifications to the environment interface and integration with custom perception and control modules.
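As a rough illustration of what that adaptation involves, the hypothetical interface below lists the perception and control hooks a real-robot port would need to supply in place of RLBench's simulated masks; none of these class or method names come from the repository.

```python
# Illustrative (hypothetical) environment interface a real-robot port would
# need to implement, replacing RLBench's simulated observations and masks.
from abc import ABC, abstractmethod
from typing import List
import numpy as np

class RealRobotEnv(ABC):
    """Sketch of the perception/control surface VoxPoser-style planning needs."""

    @abstractmethod
    def get_object_names(self) -> List[str]:
        """Names of detectable objects, from your own detector."""

    @abstractmethod
    def get_object_mask(self, name: str) -> np.ndarray:
        """Per-object segmentation mask from your perception pipeline."""

    @abstractmethod
    def get_point_cloud(self) -> np.ndarray:
        """Fused RGB-D point cloud of the workspace, in the robot frame."""

    @abstractmethod
    def execute_waypoints(self, waypoints: np.ndarray) -> bool:
        """Send planned end-effector waypoints to your low-level controller."""
```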