Open-AgentRL by Gen-Verse

Reinforcement learning for LLM agents

Created 4 months ago
282 stars

Top 92.7% on SourcePulse

Project Summary

This project introduces RLAnything and DemyAgent, two frameworks that address dynamic optimization and agentic reasoning for Large Language Models (LLMs). They give researchers and developers tools to reach state-of-the-art performance in complex agentic scenarios, even with smaller model architectures, by enhancing reinforcement-learning loops and agentic reasoning strategies.

How It Works

RLAnything employs a closed-loop optimization system that dynamically refines the policy, the reward model, and the environment together. This joint optimization amplifies learning signals: step-wise feedback from an optimized reward model, combined with critic feedback that adapts the environment, outperforms outcome-only reward signals. DemyAgent targets agentic RL by training on high-diversity, real end-to-end trajectories, applying exploration-friendly techniques such as reward clipping and entropy maintenance, and favoring deliberative reasoning with selective tool calls over verbose self-reasoning.
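The exploration-friendly techniques mentioned above can be illustrated with a minimal sketch of a per-step policy-gradient objective that clips rewards and adds an entropy bonus. The constants and function names below are illustrative assumptions, not code from the DemyAgent repository:

```python
import math

# Hypothetical hyperparameters -- not taken from the DemyAgent codebase.
REWARD_CLIP = 1.0    # clip rewards into [-1, 1] to damp outlier signals
ENTROPY_COEF = 0.01  # weight of the entropy bonus that maintains exploration

def clip_reward(r, limit=REWARD_CLIP):
    """Clip a scalar reward into [-limit, limit]."""
    return max(-limit, min(limit, r))

def entropy(probs):
    """Shannon entropy of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def step_objective(log_prob, reward, probs):
    """Per-step objective: clipped-reward policy gradient term
    plus an entropy bonus to keep the policy from collapsing."""
    return log_prob * clip_reward(reward) + ENTROPY_COEF * entropy(probs)
```

Reward clipping bounds the magnitude of any single step's learning signal, while the entropy term penalizes overly peaked action distributions, keeping the agent exploring.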

Quick Start & Requirements

  • Primary Install: git clone https://github.com/Gen-Verse/Open-AgentRL.git, followed by conda environment setup (Python 3.10 for RLAnything, 3.11 for DemyAgent) and pip install -e .[vllm] or pip install -r requirements_rlanything.txt.
  • Prerequisites: vllm, sglang, mcore, fsdp support. Requires specific datasets (SFT, RL, benchmarks like AIME2024/2025, GPQA-Diamond, LiveCodeBench-v6) and base LLM models (e.g., Qwen2.5-7B-Instruct, Qwen3-4B-Instruct-2507). Cloud service integration (Volcano Engine Cloud FaaS for code execution sandbox) is recommended for certain tasks.
  • Resource Footprint: Training conducted on 8 × Tesla-A100 nodes with a batch size of 64.
  • Links: Open-AgentRL GitHub, Datasets, DemyAgent-4B Model.
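The install steps above can be sketched as a shell session. The clone URL, package extra, and requirements filename come from this summary; the conda environment names and `-y`/`cd` details are assumptions:

```shell
# Clone the repository
git clone https://github.com/Gen-Verse/Open-AgentRL.git
cd Open-AgentRL

# RLAnything: Python 3.10 environment (env name "rlanything" is an assumption)
conda create -n rlanything python=3.10 -y
conda activate rlanything
pip install -r requirements_rlanything.txt

# DemyAgent: Python 3.11 environment (env name "demyagent" is an assumption)
conda create -n demyagent python=3.11 -y
conda activate demyagent
pip install -e .[vllm]
```

Keeping the two frameworks in separate conda environments avoids conflicts between their different Python version requirements.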

Highlighted Details

  • DemyAgent-4B (4B parameters) achieves state-of-the-art agentic reasoning performance, matching or exceeding larger models (14B/32B) on benchmarks like AIME2024/2025, GPQA-Diamond, and LiveCodeBench-v6.
  • RLAnything demonstrates consistent improvements with each added dynamic component and achieves SOTA results for GUI agents.
  • Step-wise signals from an optimized reward model are shown to outperform outcome signals relying solely on human labels.
  • The framework supports training and evaluation across GUI Agent, LLM Agent, and Coding LLM settings.

Maintenance & Community

No explicit information regarding community channels (Discord/Slack), active contributors, sponsorships, or roadmap is provided in the README.

Licensing & Compatibility

The README does not explicitly state the project's license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Training and evaluation require substantial computational resources and specific cloud infrastructure setups (e.g., Volcengine Cloud). The project relies on external codebases (VeRL, ReTool) and specific model checkpoints, necessitating careful dependency management.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 126 stars in the last 30 days
