SimpleVLA-RL by PRIME-RL

Online RL for VLA models with minimal data

Created 3 months ago
658 stars

Top 50.8% on SourcePulse

Project Summary

This repository introduces SimpleVLA-RL, an approach for online Reinforcement Learning (RL) of Vision-Language-Action (VLA) models. It enables effective training with minimal data using only simple 0/1 outcome-level rewards, making VLA training more data-efficient while achieving performance comparable to full-trajectory Supervised Fine-Tuning (SFT). The target audience is researchers and practitioners working with VLA models for robotics and embodied AI.

How It Works

SimpleVLA-RL leverages outcome-level 0/1 reward signals directly from simulation environments. This approach simplifies reward engineering and significantly reduces the need for extensive, high-quality trajectory data. By using only one trajectory per task for initial SFT, it demonstrates that simple rewards can drive effective online RL, leading to substantial performance gains over baseline SFT models.
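The outcome-level reward described above can be sketched in a few lines. This is a minimal illustration, not SimpleVLA-RL's actual API; the function names and the success flag are assumptions for the sake of the example:

```python
def outcome_reward(episode_success: bool) -> float:
    """Binary 0/1 outcome reward: 1.0 if the simulated task
    succeeded, 0.0 otherwise. No per-step reward shaping."""
    return 1.0 if episode_success else 0.0


def assign_returns(trajectory_len: int, episode_success: bool) -> list[float]:
    """Broadcast the single trajectory-level outcome reward to every
    step of the rollout, as an outcome-only RL setup would."""
    r = outcome_reward(episode_success)
    return [r] * trajectory_len
```

The point of the sketch is that the reward signal requires no hand-engineered shaping terms: the simulator's task-success flag is the entire reward specification.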

Quick Start & Requirements

  • Installation: Requires setting up the veRL environment and the OpenVLA-OFT model, following their respective guides.
  • Prerequisites: Weights & Biases (WandB) API key, a Python environment, and specific NVIDIA drivers (tested with 470.161.03) and CUDA (tested with 12.4).
  • Hardware: Tested on single-node (8x A800 80GB GPUs) and multi-node (16x A800 80GB GPUs) setups.
  • Resources: Requires downloading SFT models (e.g., libero-10 traj1 SFT).
  • Training Command: bash examples/run_openvla_oft_rl.sh
  • Documentation: veRL, OpenVLA-OFT

Highlighted Details

  • Achieves 97.6 points on LIBERO-Long using OpenVLA-OFT.
  • Improves OpenVLA-OFT performance from 17.3 to 91.7 (430.1% increase) with only one trajectory per task for SFT.
  • Supports LIBERO benchmark; RoboTwin benchmark is planned.
  • Models available on Hugging Face: SimpleVLA-RL Collection.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. The underlying projects (veRL, OpenVLA-OFT) should be consulted for licensing details.

Limitations & Caveats

  • The README notes that the repository's openvla-oft model design differs from the official OpenVLA-OFT implementation.
  • Support for additional benchmarks (e.g., RoboTwin) and tokenizers (Pi0 fast tokenizer) is listed as a TODO.
Health Check

  • Last Commit: 5 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 7
  • Star History: 299 stars in the last 30 days
