AgentRL by THUDM

Agentic reinforcement learning scaled via a multi-turn, multi-task framework

Created 7 months ago
267 stars

Top 95.9% on SourcePulse

Project Summary

AgentRL provides a framework for scaling agentic reinforcement learning across multi-turn and multi-task scenarios. It targets researchers and engineers developing complex LLM agents, offering a robust system for efficient training and deployment. The primary benefit is enabling the development and scaling of sophisticated agentic RL pipelines that can handle intricate, sequential interactions.

How It Works

The project is divided into a training framework and an environment deployment framework. The training framework utilizes Ray for distributed computing, employing specialized worker pools (Rollout, Actor, Reference) managed via Ray placement groups for deterministic resource allocation. It supports an asynchronous GRPO training pipeline, where tasks are generated and trajectories collected by a DistributedTaskManager, with data stored in a shared buffer. Efficient parameter synchronization between training and inference workers is achieved using NCCL for near real-time model consistency. The environment deployment framework, built upon AgentBench, features a high-performance Go-based controller managing numerous task worker sessions and a gRPC transport layer for reliable communication between the controller and task workers.
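The asynchronous rollout-to-buffer-to-actor flow described above can be sketched in plain Python. This is an illustrative model only, assuming names like `TaskManager`, `rollout_worker`, and `actor_step` that are not AgentRL's actual API; the real system uses Ray worker pools and placement groups rather than threads.

```python
# Hypothetical sketch: tasks are generated, rollout workers collect
# trajectories into a shared buffer, and the actor drains batches from it.
import queue
import threading

class TaskManager:
    """Stand-in for the DistributedTaskManager: hands out task IDs."""
    def __init__(self, num_tasks):
        self.tasks = queue.Queue()
        for task_id in range(num_tasks):
            self.tasks.put(task_id)

def rollout_worker(manager, buffer):
    """Collect one placeholder trajectory per task and push it to the buffer."""
    while True:
        try:
            task_id = manager.tasks.get_nowait()
        except queue.Empty:
            return
        trajectory = [f"step-{i}" for i in range(3)]  # placeholder multi-turn rollout
        buffer.put((task_id, trajectory))

def actor_step(buffer, batch_size):
    """Drain a batch of trajectories, as the actor pool would for a GRPO update."""
    batch = []
    while len(batch) < batch_size and not buffer.empty():
        batch.append(buffer.get())
    return batch

manager = TaskManager(num_tasks=8)
shared_buffer = queue.Queue()
workers = [threading.Thread(target=rollout_worker, args=(manager, shared_buffer))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
batch = actor_step(shared_buffer, batch_size=8)
print(len(batch))  # all 8 trajectories collected
```

The key design point this models is decoupling: rollout workers never block on training, because the shared buffer mediates between trajectory collection and policy updates.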

Quick Start & Requirements

  • Primary install / run command: pip install -e "./trainer[sglang]"
  • Non-default prerequisites and dependencies: A Ray cluster is required for distributed training. GPU resources are necessary for worker pools.
  • Links:
    • Minimal example: examples/simple-calculator
    • Paper reproduction: examples/training/agentrl_trainer.py
    • Environment/Data: AgentBench FC
    • Task documentation: docs/tasks.md
    • Deployment documentation: docs/deployment.md

Highlighted Details

  • Asynchronous GRPO Training: Leverages specialized Ray worker pools (Rollout, Actor, Reference) and placement groups for efficient, scalable policy optimization.
  • Multi-Turn Task Management: Integrates with AgentBench and uses a DistributedTaskManager to handle complex, grouped multi-turn interactions.
  • Synchronized Model Updates: Employs NCCL for efficient parameter streaming between inference (rollout) and training (actor) workers, maintaining model lockstep.
  • High-Concurrency Environment: Features a Go-based controller designed to manage up to 10,000 concurrent task sessions, with gRPC for robust communication.
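The "Synchronized Model Updates" point above can be illustrated with a toy version-stamped sync loop. In AgentRL the transport is NCCL between GPU workers; here a plain Python list stands in for that channel, and all names (`ActorWorker`, `RolloutWorker`, `push_params`) are hypothetical, not the project's API.

```python
# Illustrative sketch: the actor streams each new weight snapshot to the
# rollout worker, which applies it only if the version is newer, keeping
# inference in lockstep with training.
class ActorWorker:
    def __init__(self):
        self.version = 0
        self.params = {"w": 0.0, "b": 0.0}

    def train_step(self):
        # Pretend gradient update, then bump the weight version.
        self.params = {k: v + 0.1 for k, v in self.params.items()}
        self.version += 1

    def push_params(self, channel):
        # Stand-in for an NCCL broadcast of the updated tensors.
        channel.append((self.version, dict(self.params)))

class RolloutWorker:
    def __init__(self):
        self.version = -1
        self.params = {}

    def pull_params(self, channel):
        # Apply the latest streamed snapshot so inference stays current.
        version, params = channel[-1]
        if version > self.version:
            self.version, self.params = version, params

channel = []
actor, rollout = ActorWorker(), RolloutWorker()
for _ in range(3):
    actor.train_step()
    actor.push_params(channel)
    rollout.pull_params(channel)
```

After three steps the rollout worker holds the same weight version as the actor, which is the "lockstep" property the summary refers to.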

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps were present in the provided README.

Licensing & Compatibility

  • License type: MIT License.
  • Compatibility notes: The MIT license generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

The agentrl-eval component is noted as experimental. The framework's distributed nature, particularly the reliance on a Ray cluster and a Go-based controller, implies a non-trivial setup and infrastructure requirement for full deployment.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 24 stars in the last 30 days
