peract by peract

Robotics agent for language-conditioned manipulation tasks

created 2 years ago
432 stars

Top 69.9% on sourcepulse

Project Summary

Perceiver-Actor (PerAct) is an end-to-end behavior cloning agent for multi-task robotic manipulation, conditioned on natural language instructions. It uses a Transformer that operates on 3D voxel patches, which lets it learn complex manipulation tasks from a limited number of demonstrations. It is aimed at researchers and engineers building robotic systems that must understand and execute diverse commands.

How It Works

PerAct utilizes a Transformer model that processes 3D voxel grids representing the robot's environment. It employs a Perceiver-like architecture to efficiently handle high-dimensional inputs by using a small set of latent queries that interact with the voxel features via cross-attention. This allows the model to scale to complex scenes while maintaining computational tractability. The language instruction is encoded and fused with the visual features, guiding the agent's manipulation strategy.
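
The latent cross-attention step described above can be sketched in a few lines. This is a minimal illustration in pure Python, not PerAct's implementation: it omits the learned query/key/value projections, multiple heads, and the language tokens, and the names `cross_attend`, `voxel_tokens`, and `latents`, along with all sizes, are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(latents, inputs):
    """Each latent vector queries every input vector once.

    latents: list of L vectors of dimension d (the small set of queries)
    inputs:  list of N vectors of dimension d (e.g. voxel patch features)
    Returns L vectors of dimension d. Learned projections are omitted,
    so the inputs serve as both keys and values (an assumption made to
    keep the sketch short).
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in latents:
        # One scaled dot-product score per input token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in inputs]
        weights = softmax(scores)
        # Weighted average of the input vectors.
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, inputs)) for j in range(dim)]
        )
    return outputs


random.seed(0)
dim = 8                  # hypothetical feature size
num_voxel_tokens = 1000  # hypothetical number of voxel patches
num_latents = 16         # hypothetical number of latent queries

voxel_tokens = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_voxel_tokens)]
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

summary = cross_attend(latents, voxel_tokens)
print(len(summary), len(summary[0]))
```

The point of the design is the cost: the sketch computes `num_latents * num_voxel_tokens` attention scores, whereas self-attention directly over the voxel tokens would need `num_voxel_tokens ** 2`.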

Quick Start & Requirements

  • Installation: Requires Python 3.8, PyRep (with CoppeliaSim 4.1), and specific forks of RLBench and YARR. Installation involves cloning repositories, setting environment variables, and installing dependencies via pip.
  • Prerequisites: CoppeliaSim simulator, CUDA-compatible GPU (P100 recommended for training), Ubuntu 16.04/18.04.
  • Resources: Training requires significant compute (8x P100 GPUs recommended for 600K iterations), and datasets can be up to 116GB.
  • Guides: The repository provides a Colab tutorial plus documentation covering installation, quickstart, checkpoints, data generation, and training and evaluation.

Highlighted Details

  • Learns a wide variety of language-conditioned manipulation tasks.
  • Exploits 3D voxel structure for policy learning.
  • Achieves state-of-the-art results on RLBench benchmarks.
  • Supports multi-task learning with 18 diverse manipulation tasks.
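
As a rough illustration of the 3D voxel structure mentioned in the list above, the following sketch bins a point cloud into a fixed grid. It is not PerAct's voxelizer: the function name `voxelize`, the workspace bounds, and the grid size are assumptions chosen for the example, and only point counts are stored per voxel (no color or other features).

```python
def voxelize(points, bounds_min, bounds_max, grid_size):
    """Bin 3D points into a grid_size^3 grid over an axis-aligned workspace.

    Returns a dict mapping (i, j, k) voxel indices to the number of points
    that fall inside that voxel. Points outside the bounds are dropped.
    """
    occupancy = {}
    for point in points:
        index = []
        for coord, low, high in zip(point, bounds_min, bounds_max):
            if not (low <= coord < high):
                break  # point lies outside the workspace on this axis
            cell = int((coord - low) / (high - low) * grid_size)
            # Guard against floating-point rounding at the upper edge.
            index.append(min(cell, grid_size - 1))
        else:
            key = tuple(index)
            occupancy[key] = occupancy.get(key, 0) + 1
    return occupancy


# Hypothetical unit-cube workspace split into a 4 x 4 x 4 grid.
cloud = [
    (0.10, 0.10, 0.10),
    (0.15, 0.12, 0.20),
    (0.90, 0.90, 0.90),
    (2.00, 0.00, 0.00),  # outside the workspace, ignored
]
grid = voxelize(cloud, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0), grid_size=4)
print(grid)
```

A sparse dict is used here only to keep the example short; a dense array is the more usual choice when the grid is fed to a neural network.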

Maintenance & Community

The project is associated with Mohit Shridhar, Lucas Manuelli, and Dieter Fox. Updates are posted on peract.github.io. The primary community interaction point is the issue tracker on GitHub.

Licensing & Compatibility

PerAct itself is licensed under Apache 2.0. However, it depends on other repositories with varying licenses: ARM (ARM License), PyRep (MIT), Perceiver PyTorch (MIT), LAMB Optimizer (MIT), and OpenAI CLIP (MIT). The MIT-licensed dependencies are permissive and allow commercial use and closed-source linking; the ARM License is a separate license whose terms should be reviewed before any commercial use.

Limitations & Caveats

The authors describe the code quality level as "Desperate grad student." Some tasks, like push_buttons, may be unsolvable because the agent has no memory of earlier steps. The provided test sets are small, and data generation can be slow if not parallelized. The modifications to the YARR repository are described as "a total mess," and the LAMB optimizer implementation may have issues.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 21 stars in the last 90 days
