Instruct2Act by OpenGVLab

Robotics framework that maps multi-modal instructions to robot actions using LLMs

Created 2 years ago · 366 stars · Top 78.1% on sourcepulse

View on GitHub
Project Summary

Instruct2Act is a framework designed for robotic manipulation tasks, enabling robots to understand and execute complex, multi-modal instructions. It targets researchers and developers in robotics and AI, offering a zero-shot approach that leverages large language models (LLMs) to translate natural language and visual cues into executable Python programs for robot control.

How It Works

Instruct2Act utilizes LLMs to generate Python code that orchestrates a perception-planning-action loop. The perception stage integrates foundation models like Segment Anything Model (SAM) for object localization and CLIP for classification. This modular design allows for flexibility in instruction modalities and task requirements, translating high-level commands into precise robotic actions by combining LLM reasoning with specialized foundation models.
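
To make the loop concrete, below is a minimal, self-contained sketch of the kind of program the LLM is prompted to emit. Every function and value in it is an illustrative placeholder, not the repository's actual API: in the real framework, perception would call SAM and CLIP, and the action step would call a VIMABench manipulation primitive.

```python
# Hedged sketch of the perception-planning-action loop that Instruct2Act's
# generated programs follow. All names and values are placeholders.

from typing import Dict, Tuple

Point = Tuple[float, float]

def perceive(instruction: str) -> Dict[str, Point]:
    """Stand-in for SAM segmentation + CLIP labelling: object name -> tabletop position."""
    return {"polka dot block": (0.12, 0.30), "purple bowl": (0.45, 0.10)}

def plan(objects: Dict[str, Point], source: str, target: str) -> Tuple[Point, Point]:
    """Turn the localized objects into a pick pose and a place pose."""
    return objects[source], objects[target]

def act(pick_xy: Point, place_xy: Point) -> None:
    """Stand-in for the low-level pick-and-place primitive."""
    print(f"pick at {pick_xy}, place at {place_xy}")

# The LLM is prompted to emit a short program like this from the instruction.
instruction = "Put the polka dot block into the purple bowl."
objects = perceive(instruction)
pick, place = plan(objects, "polka dot block", "purple bowl")
act(pick, place)
```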

Quick Start & Requirements

  • Installation: Install required packages via environment.yaml. Install VIMABench separately.
  • Prerequisites: SAM and OpenCLIP model checkpoints; an OpenAI API key for code generation; and a CUDA device selected for SAM if you want fast inference (see the sketch after this list).
  • Running: Execute robotic_anything_gpt_online.py.
  • Resources: Official documentation and VIMABench instructions are available.
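
Below is a hedged sketch of the pre-run configuration implied by the prerequisites. OPENAI_API_KEY is the standard variable read by OpenAI's Python client; everything else is illustrative rather than taken from the repository's configuration files.

```python
# Illustrative pre-run setup: API key for code generation and a device for SAM.

import os
import torch

os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key

# SAM inference is much faster on GPU; select the CUDA device explicitly when available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(f"SAM will run on {device}")
```

With the key and device set, start the online mode by running robotic_anything_gpt_online.py.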

Highlighted Details

  • Zero-shot performance surpasses state-of-the-art learning-based policies on several tasks.
  • Supports task-specific and task-agnostic prompts, including pointing-language enhanced prompts.
  • Offers both offline and online code generation modes for robotic manipulation.
  • Evaluated on six representative tabletop manipulation tasks from VIMABench.

Maintenance & Community

The project has seen recent updates, including the release of ManipVQA and A3VLM, with checkpoints available. Real-world demo videos are shared on YouTube and Bilibili. The project acknowledges contributions from VIMABench, OpenCLIP, and SAM.

Licensing & Compatibility

The README does not explicitly state the license type or any restrictions for commercial use or closed-source linking.

Limitations & Caveats

Network instability can degrade the quality of the generated code. The original VIMABench movements are fast, so users may need to add delays for visualization (a minimal sketch follows). SAM inference speed may require manually selecting a CUDA device.
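
As a hypothetical illustration of the visualization tip above, the helper below pauses between steps of an action loop; env and actions stand in for whatever simulation loop you are running, not VIMABench's actual interface.

```python
# Hypothetical helper: slow down playback so each motion stays visible on screen.

import time

def run_with_delay(env, actions, delay_s: float = 0.5):
    """Step through the actions, pausing between steps for visualization."""
    for action in actions:
        env.step(action)      # advance the simulation by one action
        time.sleep(delay_s)   # pause; tune delay_s for your display
```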

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
