Online RL for VLA models with minimal data
This repository introduces SimpleVLA-RL, an approach for online Reinforcement Learning (RL) in Vision-Language-Action (VLA) models. Using only simple 0/1 outcome-level rewards, it enables effective training with minimal data, making VLA training more data-efficient while achieving performance comparable to full-trajectory Supervised Fine-Tuning (SFT). The target audience is researchers and practitioners working on VLA models for robotics and embodied AI.
How It Works
SimpleVLA-RL leverages outcome-level 0/1 reward signals directly from simulation environments. This approach simplifies reward engineering and significantly reduces the need for extensive, high-quality trajectory data. By using only one trajectory per task for initial SFT, it demonstrates that simple rewards can drive effective online RL, leading to substantial performance gains over baseline SFT models.
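To make the mechanism concrete, here is a minimal, self-contained sketch of policy-gradient learning from a binary success signal. It is not the repository's implementation (which builds on veRL and real simulators); ToyEnv, Policy, and all hyperparameters are illustrative stand-ins.

```python
# Minimal sketch (not SimpleVLA-RL's actual code): a REINFORCE-style update
# driven only by a binary 0/1 task-success reward from a toy simulator.
import torch
import torch.nn as nn

class ToyEnv:
    """Toy stand-in for a manipulation simulator that reports success/failure."""
    def reset(self):
        self.t = 0
        return torch.randn(8)

    def step(self, action):
        self.t += 1
        done = self.t >= 10
        success = done and action.item() == 0  # arbitrary stand-in success test
        return torch.randn(8), done, success

class Policy(nn.Module):
    """Stand-in for a VLA policy: observation -> distribution over actions."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(),
                                 nn.Linear(32, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def rollout(policy, env, max_steps=50):
    """Run one episode; the only reward is 1.0 on success, else 0.0."""
    obs, log_probs, success = env.reset(), [], False
    for _ in range(max_steps):
        dist = policy(obs)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, done, success = env.step(action)
        if done:
            break
    return torch.stack(log_probs), float(success)

def train_step(policy, optimizer, env, batch_size=8):
    episodes = [rollout(policy, env) for _ in range(batch_size)]
    rewards = torch.tensor([r for _, r in episodes])
    baseline = rewards.mean()  # batch-mean baseline to reduce variance
    loss = sum(-(r - baseline) * lp.sum()
               for (lp, _), r in zip(episodes, rewards)) / batch_size
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()

policy = Policy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(200):
    rate = train_step(policy, optimizer, ToyEnv())
    if step % 50 == 0:
        print(f"step {step}: batch success rate {rate:.2f}")
```

Subtracting the batch-mean success rate is a common variance-reduction trick when the only reward is 0/1; with sparse successes, sampling many rollouts per task serves the same purpose.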
Quick Start & Requirements
- Set up the veRL environment and the OpenVLA-OFT model, following their respective guides.
- Obtain an SFT starting checkpoint (e.g., the libero-10 traj1 SFT model).
- Launch RL training with `bash examples/run_openvla_oft_rl.sh` (a checkpoint-loading sanity check is sketched below).
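Before launching RL, it can help to confirm the SFT checkpoint loads and produces actions. A minimal sketch, assuming the base OpenVLA Hugging Face interface (AutoModelForVision2Seq with predict_action); the checkpoint path and unnorm_key are placeholders, and the OpenVLA-OFT variant's interface may differ:

```python
# Hedged sanity check: load a (placeholder) SFT checkpoint and query one action.
# Assumes the base OpenVLA HF interface; OpenVLA-OFT may expose a different API.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

CKPT = "path/to/libero-10-traj1-sft"  # placeholder checkpoint path

processor = AutoProcessor.from_pretrained(CKPT, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    CKPT, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.new("RGB", (224, 224))  # dummy frame; use a simulator render in practice
prompt = "In: What action should the robot take to pick up the bowl?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)

# unnorm_key selects the action de-normalization statistics (placeholder value).
action = vla.predict_action(**inputs, unnorm_key="libero_10", do_sample=False)
print(action)  # 7-DoF end-effector action for OpenVLA-style policies
```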
Highlighted Details
Maintenance & Community
The project builds on and acknowledges veRL, OpenVLA-OFT, and PRIME.
Licensing & Compatibility
The upstream projects (veRL, OpenVLA-OFT) should be consulted for licensing details.
Limitations & Caveats
The openvla-oft model design in this repository differs from the official one.