Robotics vision-language-action models
Top 12.1% on sourcepulse
This repository provides open-source Vision-Language-Action (VLA) models, specifically the $\pi_0$ (diffusion-based) and $\pi_0$-FAST (autoregressive) models, for robotics applications. It offers pre-trained checkpoints and fine-tuning examples, targeting robotics researchers and practitioners looking to adapt VLA models to their own robot platforms and tasks.
How It Works
The project offers two VLA models: $\pi_0$, a flow-based diffusion model, and $\pi_0$-FAST, an autoregressive model utilizing the FAST action tokenizer. Both are trained on extensive robot data (10k+ hours) and are designed for tasks involving visual perception, language understanding, and robotic action generation. The models can be used for inference directly or fine-tuned on custom datasets, enabling adaptation to specific robot hardware and manipulation skills.
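To make the inference path concrete, the sketch below loads a pre-trained checkpoint and queries it for a chunk of actions. The module paths (openpi.training.config, openpi.policies.policy_config, openpi.shared.download), the checkpoint name pi0_fast_droid, and the observation keys are taken from the upstream openpi examples rather than from the text above, so treat them as assumptions and verify them against the version you install.

```python
"""Minimal inference sketch, assuming the upstream openpi package layout.
Module paths, checkpoint name, and observation keys are assumptions to verify."""
import numpy as np

from openpi.policies import policy_config
from openpi.shared import download
from openpi.training import config as openpi_config

# Select a pre-trained pi0-FAST checkpoint and fetch it locally.
cfg = openpi_config.get_config("pi0_fast_droid")
checkpoint_dir = download.maybe_download("s3://openpi-assets/checkpoints/pi0_fast_droid")

# Build the policy once; it can then be queried step by step.
policy = policy_config.create_trained_policy(cfg, checkpoint_dir)

# Dummy observation: on a real robot these come from cameras and proprioception,
# and the exact keys depend on the data the checkpoint was trained on.
observation = {
    "observation/exterior_image_1_left": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/wrist_image_left": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/joint_position": np.zeros(7, dtype=np.float32),
    "observation/gripper_position": np.zeros(1, dtype=np.float32),
    "prompt": "pick up the fork",
}

# The policy returns a short horizon of future actions to execute on the robot.
action_chunk = policy.infer(observation)["actions"]
print(action_chunk.shape)
```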
Quick Start & Requirements
Clone the repository with its submodules (git clone --recurse-submodules) and use uv for dependency management (uv sync, then uv pip install -e .), as shown below.
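A minimal command sequence under those assumptions might look like the following; the repository URL is not given above, so it is left as a placeholder.

```bash
# Clone the repository together with its submodules
# (replace <repository-url> with the actual repository URL).
git clone --recurse-submodules <repository-url>
cd <cloned-directory>

# Resolve and install dependencies with uv.
uv sync
uv pip install -e .
```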
Docker installation is also supported.

Highlighted Details
Maintenance & Community
The project is published by the Physical Intelligence team. Community channels (Discord/Slack) and a public roadmap are not mentioned in the README.
Licensing & Compatibility
The README does not explicitly state the license type. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is experimental, and models are developed for specific robot platforms, with no guarantee of success when adapting to different hardware. Fine-tuning requires significant GPU memory and data preparation.