peract by peract

Robotics agent for language-conditioned manipulation tasks

Created 3 years ago

472 stars

Top 64.5% on SourcePulse

1 Expert Loves This Project

Eric Jang

VP AI at 1X

Project Summary

Perceiver-Actor (PerAct) is an end-to-end behavior cloning agent designed for multi-task robotic manipulation, conditioned on natural language instructions. It leverages a Transformer architecture that processes 3D voxel patches, enabling it to learn complex manipulation tasks from a limited number of demonstrations. This approach is particularly beneficial for researchers and engineers aiming to develop versatile robotic systems capable of understanding and executing diverse commands.

How It Works

PerAct utilizes a Transformer model that processes 3D voxel grids representing the robot's environment. It employs a Perceiver-like architecture to efficiently handle high-dimensional inputs by using a small set of latent queries that interact with the voxel features via cross-attention. This allows the model to scale to complex scenes while maintaining computational tractability. The language instruction is encoded and fused with the visual features, guiding the agent's manipulation strategy.
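
The latent cross-attention step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not PerAct's actual implementation: the token counts, feature width, random weights, and the choice to fuse language and voxel features by simple concatenation are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention: latents (L, D) query inputs (N, D)."""
    q = latents @ w_q                          # (L, D) queries from latents
    k = inputs @ w_k                           # (N, D) keys from input tokens
    v = inputs @ w_v                           # (N, D) values from input tokens
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (L, N) scaled dot products
    weights = softmax(scores, axis=-1)         # each latent attends over all inputs
    return weights @ v, weights                # (L, D) updated latents, (L, N) weights

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
num_voxel_tokens, num_lang_tokens, num_latents, dim = 1000, 16, 32, 64

voxel_feats = rng.normal(size=(num_voxel_tokens, dim))
lang_feats = rng.normal(size=(num_lang_tokens, dim))
# Assumption: fuse modalities by concatenating along the token axis.
tokens = np.concatenate([lang_feats, voxel_feats], axis=0)

latents = rng.normal(size=(num_latents, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

out, weights = cross_attend(latents, tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The point of the design is cost: the score matrix has one row per latent and one column per input token, so it grows linearly with the number of voxel tokens instead of quadratically as full self-attention over the inputs would.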

Quick Start & Requirements

Installation: Requires Python 3.8, PyRep (with CoppeliaSim 4.1), and specific forks of RLBench and YARR. Installation involves cloning repositories, setting environment variables, and installing dependencies via pip.
Prerequisites: CoppeliaSim simulator, CUDA-compatible GPU (P100 recommended for training), Ubuntu 16.04/18.04.
Resources: Training requires significant compute (8x P100 GPUs recommended for 600K iterations), and datasets can be up to 116GB.
Guides: A Colab tutorial is available, along with documentation covering installation, quickstart, checkpoints, data generation, and training and evaluation.

Highlighted Details

Learns a wide variety of language-conditioned manipulation tasks.
Exploits 3D voxel structure for policy learning.
Achieves state-of-the-art results on RLBench benchmarks.
Supports multi-task learning with 18 diverse manipulation tasks.
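
To make the "3D voxel structure" idea concrete, here is a rough sketch of how a point cloud might be turned into an occupancy grid and then split into cubic patches that a Transformer can treat as tokens. The function names `voxelize` and `to_patches`, the workspace bounds, and the grid and patch sizes are hypothetical choices for this example; PerAct's own voxelization also carries additional per-voxel features and is not reproduced here.

```python
import numpy as np

def voxelize(points, bounds_min, bounds_max, grid_size):
    """Mark each cell of a grid_size^3 grid that contains at least one point."""
    points = np.asarray(points, dtype=float)
    bounds_min = np.asarray(bounds_min, dtype=float)
    bounds_max = np.asarray(bounds_max, dtype=float)
    # Keep only points inside the half-open workspace box [min, max).
    inside = np.all((points >= bounds_min) & (points < bounds_max), axis=1)
    scaled = (points[inside] - bounds_min) / (bounds_max - bounds_min)
    idx = np.floor(scaled * grid_size).astype(int)
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

def to_patches(grid, patch):
    """Split a cubic grid into non-overlapping patch^3 blocks, one row per block."""
    g = grid.shape[0]
    assert g % patch == 0, "grid size must be divisible by patch size"
    n = g // patch
    blocks = grid.reshape(n, patch, n, patch, n, patch)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)   # group the three block indices first
    return blocks.reshape(n ** 3, patch ** 3)     # (num_patches, voxels_per_patch)

# Toy input: two points inside the unit cube and one outside it.
points = [[0.1, 0.1, 0.1], [0.9, 0.9, 0.9], [2.0, 2.0, 2.0]]
grid = voxelize(points, [0, 0, 0], [1, 1, 1], grid_size=4)
patches = to_patches(grid, patch=2)
print(grid.sum(), patches.shape)
```

Each row of `patches` would then be projected to a feature vector before attention; that projection is omitted to keep the sketch short.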

Maintenance & Community

The project is associated with Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Updates are posted on peract.github.io. The primary community interaction point is the issue tracker on GitHub.

Licensing & Compatibility

PerAct itself is licensed under Apache 2.0. However, it depends on other repositories with varying licenses: ARM (ARM License), PyRep (MIT), Perceiver PyTorch (MIT), LAMB Optimizer (MIT), and OpenAI CLIP (MIT). The MIT-licensed dependencies are permissive and allow commercial use and closed-source linking. The ARM License is a custom license, so its terms should be reviewed separately before any commercial use.

Limitations & Caveats

The authors describe the code quality as "Desperate grad student." Some tasks, like push_buttons, may be unsolvable because the agent has no memory of earlier steps. The provided test sets are small, and data generation can be slow if not parallelized. The modifications to the YARR repository are noted as "a total mess." The LAMB optimizer implementation may have issues.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days