openpi by Physical-Intelligence

Robotics vision-language-action models

created 9 months ago
4,147 stars

Top 12.1% on sourcepulse

Project Summary

This repository provides open-source Vision-Language-Action (VLA) models, specifically the $\pi_0$ (diffusion-based) and $\pi_0$-FAST (autoregressive) models, for robotics applications. It offers pre-trained checkpoints and fine-tuning examples, targeting robotics researchers and practitioners looking to adapt VLA models to their own robot platforms and tasks.

How It Works

The project offers two VLA models: $\pi_0$, a flow-based diffusion model, and $\pi_0$-FAST, an autoregressive model built on the FAST action tokenizer. Both are trained on over 10,000 hours of robot data and combine visual perception, language understanding, and robot action generation. They can be run for inference as-is or fine-tuned on custom datasets to adapt them to specific robot hardware and manipulation skills.
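To make "flow-based" concrete, the following is a minimal sketch of how a flow-style sampler can turn noise into a chunk of actions by integrating a velocity field with Euler steps. It illustrates the general technique only and is not openpi's implementation: `toy_velocity` is a hypothetical stand-in for a learned network, and the step count and `target` values are assumptions chosen so the example runs on its own.

```python
import random


def toy_velocity(x, tau, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise (tau=0) to data (tau=1), the
    velocity at a point x points at the target, scaled by the time left.
    """
    return [(t - xi) / (1.0 - tau) for xi, t in zip(x, target)]


def sample_action_chunk(target, num_steps=10, seed=0):
    """Integrate the velocity field from Gaussian noise using Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        tau = k / num_steps
        v = toy_velocity(x, tau, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


# Assumed "ground truth" action values, used only to drive the toy field.
target = [0.1, -0.4, 0.25, 0.9]
actions = sample_action_chunk(target)
```

In a real model the velocity would be predicted from images, the language prompt, and robot state rather than from a known target. An autoregressive model would instead emit discrete action tokens one at a time.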

Quick Start & Requirements

  • Installation: Clone the repository with submodules (git clone --recurse-submodules) and use uv for dependency management (uv sync, uv pip install -e .). Docker installation is also supported.
  • Prerequisites: An NVIDIA GPU with more than 8 GB of VRAM for inference, more than 22.5 GB for LoRA fine-tuning, or more than 70 GB for full fine-tuning. Tested on Ubuntu 22.04.
  • Resources: Checkpoints are automatically downloaded and cached.
  • Docs: Examples for inference and fine-tuning are provided within the repository.
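As a quick way to read the memory requirements above, the sketch below maps an amount of GPU memory to the workloads it would allow. The thresholds are the ones quoted in the prerequisites. The names `VRAM_THRESHOLDS_GB` and `supported_modes` are hypothetical helpers written for this example and are not part of openpi.

```python
# Thresholds in GB of VRAM, as quoted in the prerequisites.
# Each is treated as a strict lower bound ("more than").
VRAM_THRESHOLDS_GB = {
    "inference": 8.0,
    "lora_fine_tuning": 22.5,
    "full_fine_tuning": 70.0,
}


def supported_modes(vram_gb):
    """Return the workloads whose stated minimum is exceeded by vram_gb."""
    return [mode for mode, minimum in VRAM_THRESHOLDS_GB.items() if vram_gb > minimum]


print(supported_modes(24.0))  # ['inference', 'lora_fine_tuning']
```

Actual memory use depends on the model, batch size, and configuration, so treat these figures as rough guidance.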

Highlighted Details

  • Offers both diffusion ($\pi_0$) and autoregressive ($\pi_0$-FAST) VLA models.
  • Pre-trained on 10k+ hours of robot data.
  • Provides fine-tuning scripts and examples for custom datasets (e.g., Libero, ALOHA).
  • Supports remote inference for off-robot GPU utilization.
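The remote-inference idea can be sketched as a simple request/response loop: the robot sends an observation, and a separate process (typically on a GPU machine) replies with actions. The example below is a generic illustration using only the Python standard library and an in-process socket pair. It does not reproduce openpi's actual protocol or client API, and the message fields (`state`, `prompt`, `actions`) and the placeholder policy are assumptions.

```python
import json
import socket
import threading


def serve_policy(conn):
    """Toy 'GPU-side' loop: read one JSON observation per line, reply with actions."""
    with conn, conn.makefile("r") as reader, conn.makefile("w") as writer:
        for line in reader:
            obs = json.loads(line)
            # Hypothetical placeholder policy: hold the current state for 3 steps.
            reply = {"actions": [obs["state"]] * 3}
            writer.write(json.dumps(reply) + "\n")
            writer.flush()


def remote_infer(reader, writer, observation):
    """Robot-side call: send an observation, block until actions come back."""
    writer.write(json.dumps(observation) + "\n")
    writer.flush()
    return json.loads(reader.readline())["actions"]


# A connected socket pair stands in for a real network link.
robot_sock, server_sock = socket.socketpair()
threading.Thread(target=serve_policy, args=(server_sock,), daemon=True).start()

with robot_sock, robot_sock.makefile("r") as reader, robot_sock.makefile("w") as writer:
    actions = remote_infer(reader, writer, {"state": [0.0, 0.5], "prompt": "pick up the cup"})
```

Keeping the policy behind a network boundary like this lets the robot's onboard computer stay lightweight while the model runs on a separate GPU host.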

Maintenance & Community

The project is published by the Physical Intelligence team. The README does not list community channels (Discord/Slack) or a roadmap.

Licensing & Compatibility

The README does not explicitly state a license, and it does not specify whether commercial use or linking from closed-source software is permitted.

Limitations & Caveats

The project is experimental. The models were developed for specific robot platforms, and there is no guarantee they will work when adapted to different hardware. Fine-tuning requires significant GPU memory and data preparation.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 8
  • Issues (30d): 46
  • Star history: 1,008 stars in the last 90 days
