verl-agent by langfengQ

RL for LLM/VLM agent training

created 4 months ago
668 stars

Top 51.4% on sourcepulse

Project Summary

verl-agent is a Python framework for training LLM/VLM agents using reinforcement learning, specifically designed for long-horizon, multi-turn interactions. It addresses the scalability limitations of previous methods by processing each interaction step independently with customizable input structures, enabling efficient training for complex tasks. The framework supports a wide range of LLMs, RL algorithms, and interactive environments, making it suitable for researchers and developers working on advanced agent capabilities.

How It Works

verl-agent employs a step-wise interaction design, allowing fully customizable input structures at each turn. This contrasts with prior approaches that concatenate the full interaction history into the prompt, which becomes computationally prohibitive for long-horizon tasks. Because the per-step input length stays bounded rather than growing with the history, verl-agent scales to long horizons. It also introduces "group environments," in which multiple environment instances share an identical initial state; this benefits algorithms like GRPO and DAPO that require repeated rollouts from the same state.
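The group-environment idea can be pictured with a short sketch. The names below (ToyEnv, make_group) are illustrative only and are not verl-agent's actual API; the point is that several environment copies are reset from the same seed so group-based algorithms can compare rollouts that start from identical states.

```python
import random
from typing import List

class ToyEnv:
    """Stand-in for an interactive environment (ALFWorld, WebShop, ...)."""
    def reset(self, seed: int) -> int:
        self.rng = random.Random(seed)
        self.state = self.rng.randint(0, 100)  # same seed -> same initial state
        return self.state

    def step(self, action: int):
        self.state += action
        reward = float(self.state % 7 == 0)
        done = self.state > 120
        return self.state, reward, done

def make_group(group_size: int, seed: int) -> List[ToyEnv]:
    """Create `group_size` env copies that all start from the same initial
    state, so GRPO/DAPO-style algorithms can roll out repeatedly from it."""
    envs = [ToyEnv() for _ in range(group_size)]
    for env in envs:
        env.reset(seed=seed)  # identical seed => identical initial state
    return envs

group = make_group(group_size=4, seed=123)
assert len({env.state for env in group}) == 1  # all rollouts share one start state
```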

Quick Start & Requirements

  • Installation: Requires Python 3.12, PyTorch 2.6.0 (with CUDA 12.4), flash-attn 2.7.4.post1, and vllm 0.8.5. Installation involves creating a conda environment and running pip install -e . after setting up PyTorch (a quick version sanity check follows this list).
  • Environments: Each environment (ALFWorld, WebShop, Sokoban, Gym Cards, AppWorld) has specific installation instructions and dependencies, often requiring separate conda environments to avoid conflicts. ALFWorld requires gymnasium 0.29.1 and stable-baselines3 2.6.0. WebShop requires Python <=3.10 and manual downloads.
  • Resources: Training 7B models with LoRA is supported on 2 H100 GPUs.
  • Links: Official Docs
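
The pinned versions above can be sanity-checked from inside the environment. This is a hedged convenience sketch, not part of verl-agent; it only reads installed package metadata and compares it against the versions listed above (the flash-attn distribution name may differ depending on how it was installed).

```python
# Convenience check for the pinned dependencies listed above (illustrative only).
import sys
from importlib.metadata import version, PackageNotFoundError

import torch

EXPECTED = {
    "torch": "2.6.0",
    "flash-attn": "2.7.4.post1",
    "vllm": "0.8.5",
}

print(f"Python : {sys.version.split()[0]} (expected 3.12.x)")
print(f"CUDA   : {torch.version.cuda} (expected 12.4)")
for pkg, want in EXPECTED.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        got = "not installed"
    flag = "OK" if got.startswith(want) else "MISMATCH"
    print(f"{pkg:10s}: {got} (expected {want}) -> {flag}")
```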

Highlighted Details

  • Supports multi-turn, step-wise interaction with customizable per-step input structures for scalability.
  • Implements the novel GiGPO algorithm for efficient credit assignment in long-horizon LLM agent training (a simplified sketch of the group-relative idea follows this list).
  • Offers support for text-only and vision-language agents, including models like Qwen3, Qwen2.5-VL, and LLaMA3.1.
  • Includes a diverse suite of RL algorithms (GiGPO, GRPO, PPO, DAPO, RLOO, REINFORCE++) and environments (ALFWorld, WebShop, Sokoban, Gym Cards, AppWorld).
  • Supports LoRA fine-tuning for reduced computational costs.
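
The sketch below illustrates only the group-relative normalization shared by GRPO-style methods; it is not verl-agent's GiGPO implementation, and the function name and numbers are made up for illustration. Rollouts that start from the same initial state are normalized against each other instead of against a learned value function.

```python
# Simplified, illustrative sketch of group-relative advantages (GRPO-style).
from statistics import mean, stdev
from typing import List

def group_relative_advantages(returns: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize episode returns within one group of rollouts that started
    from the same initial state: A_i = (R_i - mean(R)) / (std(R) + eps)."""
    mu = mean(returns)
    sigma = stdev(returns) if len(returns) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in returns]

# Four rollouts from one "group environment" (identical initial state).
episode_returns = [1.0, 0.0, 0.5, 1.0]
print(group_relative_advantages(episode_returns))

# GiGPO, as described in the paper, additionally forms step-level groups
# (steps that share the same environment state across rollouts) for
# finer-grained credit assignment; that part is omitted here.
```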

Maintenance & Community

The project is associated with the paper "Group-in-Group Policy Optimization for LLM Agent Training." Recent updates include support for Qwen3, LoRA, REINFORCE++, and RLOO. The project acknowledges the veRL team and RAGEN project for foundational infrastructure and inspiration.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The AppWorld environment is experimental. To reproduce paper results for GiGPO, users must use a version released prior to the "[2025.06.03] Major Update." There are noted dependency incompatibilities (e.g., typer) for WebShop installation, though these are flagged as ignorable.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 22
  • Star History: 676 stars in the last 90 days
