VoxPoser by huangwl18

Robotic manipulation method using language models

Created 1 year ago · 719 stars · Top 48.8% on sourcepulse

Project Summary

VoxPoser enables zero-shot robotic manipulation by composing 3D value maps generated by large language models (LLMs). It targets researchers and engineers seeking to leverage LLMs for complex robotic tasks without task-specific training, offering a composable and adaptable framework.

How It Works

VoxPoser uses Language Model Programs (LMPs) to decompose natural-language instructions into a sequence of sub-tasks. For each sub-task, it generates a 3D value map representing the desirability of different spatial configurations. These value maps are composed into a unified plan, which a greedy planner translates into robot waypoints. The result is flexible, zero-shot task synthesis that leverages the LLM's reasoning and code-generation capabilities.
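As a mental model, the core loop reduces to a couple of array operations: sum the per-sub-task voxel grids, then walk greedily uphill. The sketch below is a toy illustration with invented names and a simplified composition rule (a plain weighted sum), not the repository's actual planner:

```python
import numpy as np

def compose_value_maps(maps, weights=None):
    """Combine per-sub-task 3D value maps into one grid.
    A weighted sum is a simplification of the paper's cost terms."""
    weights = weights or [1.0] * len(maps)
    return sum(w * m for w, m in zip(weights, maps))

def greedy_waypoints(value_map, start, n_steps=10):
    """Greedily step to the best 6-connected neighbour until a local
    maximum is reached; returns a list of voxel-index waypoints."""
    pos = np.array(start)
    path = [tuple(pos)]
    for _ in range(n_steps):
        best, best_val = None, value_map[tuple(pos)]
        for d in np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                           [0, -1, 0], [0, 0, 1], [0, 0, -1]]):
            nxt = pos + d
            if ((nxt >= 0).all() and (nxt < value_map.shape).all()
                    and value_map[tuple(nxt)] > best_val):
                best, best_val = nxt, value_map[tuple(nxt)]
        if best is None:  # local maximum: stop
            break
        pos = best
        path.append(tuple(pos))
    return path

# Toy usage on a 20^3 voxel grid with two hypothetical value maps.
rng = np.random.default_rng(0)
attract = rng.random((20, 20, 20))  # e.g. "move toward the drawer handle"
avoid = -rng.random((20, 20, 20))   # e.g. "stay away from the fragile vase"
plan = greedy_waypoints(compose_value_maps([attract, avoid]), start=(0, 0, 0))
print(plan)
```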

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n voxposer-env python=3.9), activate it, install PyRep and RLBench, then pip install -r requirements.txt.
  • Prerequisites: Requires an OpenAI API key (see the key-setup sketch after this list). Best run with a display; headless-mode instructions are available in RLBench.
  • Demo: Run src/playground.ipynb.
  • Links: [Project Page] [Paper] [Video]
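Before opening the notebook, the API key has to be visible to the process. A minimal sketch, assuming the code reads the standard OPENAI_API_KEY environment variable (how the notebook actually ingests the key is not confirmed here):

```python
# Hypothetical pre-flight for the demo; assumes the standard
# OPENAI_API_KEY environment variable is what the code reads.
import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # substitute your real key

# Then launch the demo notebook:
#   jupyter notebook src/playground.ipynb
```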

Highlighted Details

  • Zero-shot synthesis of manipulation trajectories using LLMs.
  • Composable 3D value maps for task decomposition and planning.
  • Implemented within the RLBench environment for task diversity.
  • Caching of LLM outputs to reduce cost and time.
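The caching bullet describes a common pattern: key the cache on the exact prompt so that re-running a task does not re-bill the API. A minimal sketch of that pattern, with hypothetical names and an on-disk layout that is not the repository's own format:

```python
import hashlib
import json
import os

CACHE_DIR = "llm_cache"  # hypothetical location, not the repo's path

def cached_llm_call(prompt, llm_fn):
    """Return a cached completion for `prompt`, calling `llm_fn`
    (any prompt -> str function) only on a cache miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["completion"]
    completion = llm_fn(prompt)
    with open(path, "w") as f:
        json.dump({"prompt": prompt, "completion": completion}, f)
    return completion
```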

Maintenance & Community

The project is associated with Stanford University and the University of Illinois Urbana-Champaign. Core implementations are based on "Code as Policies" and "Where2Act".

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The provided codebase does not include the real-world perception pipeline (object detection, segmentation, tracking) and relies on RLBench's object masks. Adapting to real robots requires significant modifications to the environment interface and integration with custom perception and control modules.
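To make the required adaptation concrete, one can picture a drop-in replacement for the simulated interface; every name below is hypothetical and only illustrates where custom perception and control modules would plug in:

```python
from abc import ABC, abstractmethod
import numpy as np

class RealRobotEnv(ABC):
    """Hypothetical stand-in for the simulated RLBench interface.
    Each method would wrap a real perception or control module."""

    @abstractmethod
    def get_object_mask(self, object_name: str) -> np.ndarray:
        """Segmentation mask RLBench provides for free in simulation;
        on hardware this needs a detector/segmenter/tracker."""

    @abstractmethod
    def get_point_cloud(self) -> np.ndarray:
        """(N, 3) points from a depth camera, replacing simulator state."""

    @abstractmethod
    def execute_waypoints(self, waypoints: np.ndarray) -> None:
        """Send planned waypoints to the real robot's controller."""
```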

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

42 stars in the last 90 days
