PRIME-RL/SimpleVLA-RL: Online RL for VLA models with minimal data
This repository introduces SimpleVLA-RL, an approach to online Reinforcement Learning (RL) for Vision-Language-Action (VLA) models. Using only simple 0/1 outcome-level rewards, it trains effectively from minimal data, making VLA model training more data-efficient while achieving performance comparable to full-trajectory Supervised Fine-Tuning (SFT). The target audience is researchers and practitioners working with VLA models for robotics and embodied AI.
How It Works
SimpleVLA-RL leverages outcome-level 0/1 reward signals directly from simulation environments. This approach simplifies reward engineering and significantly reduces the need for extensive, high-quality trajectory data. By using only one trajectory per task for initial SFT, it demonstrates that simple rewards can drive effective online RL, leading to substantial performance gains over baseline SFT models.
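To make the reward design concrete, here is a minimal sketch of assigning an outcome-level 0/1 reward during a rollout. The `env` and `policy` interfaces and the `task_success` flag are hypothetical stand-ins for a gym-style simulator and a VLA policy, not the repository's actual API:

```python
def rollout_with_outcome_reward(env, policy, max_steps=300):
    """Roll out a VLA policy and assign a single binary outcome reward.

    `env` and `policy` are hypothetical stand-ins: any gym-style simulator
    that reports task success, and any model mapping observations to actions.
    """
    obs = env.reset()
    trajectory = []
    success = False
    for _ in range(max_steps):
        action = policy.act(obs)                 # VLA model picks the next action
        obs, _, done, info = env.step(action)    # advance the simulation
        trajectory.append((obs, action))
        if info.get("task_success", False):      # simulator-reported goal completion
            success = True
            break
        if done:
            break
    # Every step shares the same 0/1 outcome reward: no per-step shaping
    # and no learned reward model, just the simulator's success signal.
    reward = 1.0 if success else 0.0
    return trajectory, reward
```

Because the reward is attached to the whole trajectory rather than individual steps, no reward model or dense shaping has to be designed per task.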
Quick Start & Requirements
- Set up the veRL environment and the OpenVLA-OFT model, following their respective guides.
- Obtain an SFT checkpoint of the VLA model to initialize RL (e.g., the libero-10 traj1 SFT checkpoint).
- Launch RL training: `bash examples/run_openvla_oft_rl.sh`
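For repeatable runs, the launch can be scripted. Below is a minimal sketch; the `SFT_CHECKPOINT` environment variable and the checkpoint path are hypothetical examples, so check the parameters the script actually reads in `examples/run_openvla_oft_rl.sh`:

```python
import os
import subprocess

def launch_rl_training(
    script="examples/run_openvla_oft_rl.sh",
    checkpoint_dir="checkpoints/libero10_traj1_sft",  # hypothetical path
):
    """Launch the RL training script and fail loudly on a non-zero exit.

    SFT_CHECKPOINT is a hypothetical variable name used for illustration;
    it is not an option documented by the repository.
    """
    env = dict(os.environ, SFT_CHECKPOINT=checkpoint_dir)
    subprocess.run(["bash", script], env=env, check=True)

if __name__ == "__main__":
    launch_rl_training()
```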
Maintenance & Community
The project builds on veRL, OpenVLA-OFT, and PRIME.

Licensing & Compatibility
The upstream projects (veRL, OpenVLA-OFT) should be consulted for licensing details.

Limitations & Caveats
The openvla-oft model design differs from the official one.