Instruct2Act by OpenGVLab

Robotics framework that maps instructions to actions using LLMs

Created 2 years ago
369 stars

Top 76.5% on SourcePulse

Project Summary

Instruct2Act is a framework designed for robotic manipulation tasks, enabling robots to understand and execute complex, multi-modal instructions. It targets researchers and developers in robotics and AI, offering a zero-shot approach that leverages large language models (LLMs) to translate natural language and visual cues into executable Python programs for robot control.

How It Works

Instruct2Act utilizes LLMs to generate Python code that orchestrates a perception-planning-action loop. The perception stage integrates foundation models like Segment Anything Model (SAM) for object localization and CLIP for classification. This modular design allows for flexibility in instruction modalities and task requirements, translating high-level commands into precise robotic actions by combining LLM reasoning with specialized foundation models.
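
A minimal sketch of this code-as-policy pattern is shown below. It is illustrative only: the primitives locate, classify, and pick_and_place, the prompt text, and the model name are assumptions made for the example, not Instruct2Act's actual API, which calls its own SAM/CLIP-backed functions inside VIMABench.

```python
# Hypothetical sketch: an LLM turns a language instruction into Python code
# that calls a small, documented set of perception/action primitives.
# All names below are illustrative, not Instruct2Act's real API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def locate(name):
    """Stub for SAM-based object localization; returns an (x, y) position."""
    return (0.0, 0.0)


def classify(position, labels):
    """Stub for CLIP-based labeling of the object at a position."""
    return labels[0]


def pick_and_place(src, dst):
    """Stub for the low-level robot motion primitive."""
    print(f"pick at {src} -> place at {dst}")


API_DOC = """
locate(name: str) -> (x, y)              # SAM-backed object localization
classify(position, labels: list) -> str  # CLIP-backed object labeling
pick_and_place(src, dst) -> None         # robot motion primitive
"""


def generate_policy(instruction):
    """Ask the LLM to translate an instruction into code over the primitives."""
    prompt = (
        "You control a tabletop robot through these Python functions:\n"
        f"{API_DOC}\n"
        f"Write Python code that accomplishes: {instruction!r}\n"
        "Return only code, no explanations."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


code = generate_policy("put the red block into the green bowl")
# A real system would strip any markdown wrapping and validate the generated
# code before executing it against the robot primitives.
exec(code, {"locate": locate, "classify": classify, "pick_and_place": pick_and_place})
```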

Quick Start & Requirements

  • Installation: Install the required packages from the provided environment.yaml, then install VIMABench separately.
  • Prerequisites: Download SAM and OpenCLIP model checkpoints; an OpenAI API key is needed for code generation. Configure the CUDA device option so SAM inference runs at a usable speed (see the sketch after this list).
  • Running: Execute robotic_anything_gpt_online.py.
  • Resources: Official documentation and VIMABench instructions are available.
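
The checkpoint and device setup might look roughly like the following; the checkpoint file name, model variants, and directory layout are assumptions for illustration, not values required by the repository.

```python
# Illustrative setup for the SAM and OpenCLIP prerequisites; file names and
# model variants below are assumptions, not the repository's required values.
import torch
import open_clip
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# SAM is slow on CPU, so point it at a CUDA device when one is available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Segment Anything checkpoint, used for object localization.
sam = sam_model_registry["vit_h"](checkpoint="checkpoints/sam_vit_h_4b8939.pth")
sam.to(device=device)
mask_generator = SamAutomaticMaskGenerator(sam)

# OpenCLIP model, used to label the segmented regions.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
clip_model = clip_model.to(device).eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")
```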

Highlighted Details

  • Zero-shot performance surpasses state-of-the-art learning-based policies on several tasks.
  • Supports task-specific and task-agnostic prompts, including pointing-language enhanced prompts (see the sketch after this list).
  • Offers both offline and online code generation modes for robotic manipulation.
  • Evaluated on six representative tabletop manipulation tasks from VIMABench.
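
As a rough illustration of the two prompt modalities, a pointing-language enhanced prompt pairs a templated instruction with pixel locations supplied by the user; the field names and coordinates below are invented for the example, not the framework's actual schema.

```python
# Invented example of the two instruction styles; not Instruct2Act's schema.

# Pure language instruction.
language_only = "Put the polka-dot block into the purple bowl."

# Pointing-language enhanced instruction: placeholders in the text are bound
# to pixel coordinates the user pointed at in the workspace image.
pointing_enhanced = {
    "instruction": "Put <obj1> into <obj2>.",
    "points": {
        "<obj1>": (212, 148),  # clicked location of the object to move
        "<obj2>": (405, 290),  # clicked location of the target container
    },
}
```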

Maintenance & Community

The project has seen recent updates, including the release of ManipVQA and A3VLM, with checkpoints available. Real-world demo videos are shared on YouTube and Bilibili. The project acknowledges contributions from VIMABench, OpenCLIP, and SAM.

Licensing & Compatibility

The README does not explicitly state the license type or any restrictions for commercial use or closed-source linking.

Limitations & Caveats

Code generation relies on the OpenAI API, so network instability can degrade the quality of the generated code. The original VIMABench movements are fast, so users may need to add delays to visualize them clearly. SAM inference can be slow unless the CUDA device option is configured manually.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Andrew Ng (Founder of DeepLearning.AI; Cofounder of Coursera; Professor at Stanford), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

vision-agent by landing-ai

Visual AI agent for generating runnable vision code from image/video prompts

Top 0.1% · 5k stars
Created 1 year ago · Updated 2 weeks ago