RL for LLM/VLM agent training
verl-agent is a Python framework for training LLM/VLM agents using reinforcement learning, specifically designed for long-horizon, multi-turn interactions. It addresses the scalability limitations of previous methods by processing each interaction step independently with customizable input structures, enabling efficient training for complex tasks. The framework supports a wide range of LLMs, RL algorithms, and interactive environments, making it suitable for researchers and developers working on advanced agent capabilities.
How It Works
verl-agent uses a step-wise interaction design in which the input structure at each turn is fully customizable. Prior approaches concatenate the full interaction history into every input, which becomes computationally prohibitive for long-horizon tasks. Because the per-step input length stays roughly constant instead of growing with the number of turns, verl-agent scales to long interactions. It also introduces "group environments," in which multiple environments share an identical initial state. This benefits algorithms such as GRPO and DAPO, which require repeated rollouts from the same state.
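The two ideas above can be illustrated with a minimal Python sketch. It is not verl-agent's actual API: ToyEnv, make_group_envs, build_step_input, and the window parameter are hypothetical names chosen for illustration. The sketch assumes that a shared seed is enough to give environments in a group the same initial state, and that the per-step input is built from a fixed-size slice of recent history.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in environment; not part of verl-agent."""
    seed: int
    state: int = field(init=False)

    def __post_init__(self):
        # The initial state depends only on the seed, so environments
        # created with equal seeds start from identical states.
        self.state = random.Random(self.seed).randint(0, 99)

    def step(self, action: int) -> str:
        self.state += action
        return f"state={self.state}"


def make_group_envs(num_groups: int, group_size: int, base_seed: int = 0):
    """Build num_groups groups; every env inside a group shares one seed."""
    return [
        [ToyEnv(seed=base_seed + g) for _ in range(group_size)]
        for g in range(num_groups)
    ]


def build_step_input(task: str, history: list, observation: str, window: int = 2) -> str:
    """Build the input for one step from a bounded slice of the history.

    history is a list of (observation, action) pairs. Only the last
    `window` pairs are included, so the input does not grow with the
    number of turns already taken.
    """
    recent = history[-window:] if window > 0 else []
    lines = [f"Task: {task}"]
    lines += [f"Past: obs={obs} action={act}" for obs, act in recent]
    lines.append(f"Current: {observation}")
    return "\n".join(lines)
```

In this sketch, a group-based algorithm would run one rollout per environment in a group and compare the resulting returns within that group, since all of them started from the same state. The history window is only one possible input structure; a summary or memory module could fill the same slot.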
Quick Start & Requirements
After setting up PyTorch, install the package in editable mode:
pip install -e .
Highlighted Details
Maintenance & Community
The project is associated with the paper "Group-in-Group Policy Optimization for LLM Agent Training." Recent updates include support for Qwen3, LoRA, REINFORCE++, and RLOO. The project acknowledges the veRL team and RAGEN project for foundational infrastructure and inspiration.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The AppWorld environment is experimental. To reproduce paper results for GiGPO, users must use a version released prior to the "[2025.06.03] Major Update." There are noted dependency incompatibilities (e.g., typer) for WebShop installation, though these are flagged as ignorable.