AgentRL by THUDM

Agentic reinforcement learning scaled via a multi-turn, multi-task framework

Created 7 months ago
267 stars

Top 95.9% on SourcePulse

Project Summary

AgentRL provides a framework for scaling agentic reinforcement learning across multi-turn and multi-task scenarios. It targets researchers and engineers developing complex LLM agents, offering a robust system for efficient training and deployment. The primary benefit is enabling the development and scaling of sophisticated agentic RL pipelines that can handle intricate, sequential interactions.

How It Works

The project is divided into a training framework and an environment deployment framework. The training framework utilizes Ray for distributed computing, employing specialized worker pools (Rollout, Actor, Reference) managed via Ray placement groups for deterministic resource allocation. It supports an asynchronous GRPO training pipeline, where tasks are generated and trajectories collected by a DistributedTaskManager, with data stored in a shared buffer. Efficient parameter synchronization between training and inference workers is achieved using NCCL for near real-time model consistency. The environment deployment framework, built upon AgentBench, features a high-performance Go-based controller managing numerous task worker sessions and a gRPC transport layer for reliable communication between the controller and task workers.
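The asynchronous rollout-to-buffer-to-actor flow described above can be sketched in plain Python. This is an illustrative model only, assuming names like `TaskManager`, `rollout_worker`, and `actor_step` that are not AgentRL's actual API; the real system uses Ray worker pools and placement groups rather than threads.

```python
# Hypothetical sketch: tasks are generated, rollout workers collect
# trajectories into a shared buffer, and the actor drains batches from it.
import queue
import threading

class TaskManager:
    """Stand-in for the DistributedTaskManager: hands out task IDs."""
    def __init__(self, num_tasks):
        self.tasks = queue.Queue()
        for task_id in range(num_tasks):
            self.tasks.put(task_id)

def rollout_worker(manager, buffer):
    """Collect one placeholder trajectory per task and push it to the buffer."""
    while True:
        try:
            task_id = manager.tasks.get_nowait()
        except queue.Empty:
            return
        trajectory = [f"step-{i}" for i in range(3)]  # placeholder multi-turn rollout
        buffer.put((task_id, trajectory))

def actor_step(buffer, batch_size):
    """Drain a batch of trajectories, as the actor pool would for a GRPO update."""
    batch = []
    while len(batch) < batch_size and not buffer.empty():
        batch.append(buffer.get())
    return batch

manager = TaskManager(num_tasks=8)
shared_buffer = queue.Queue()
workers = [threading.Thread(target=rollout_worker, args=(manager, shared_buffer))
           for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
batch = actor_step(shared_buffer, batch_size=8)
print(len(batch))  # all 8 trajectories collected
```

The key design point this models is decoupling: rollout workers never block on training, because the shared buffer mediates between trajectory collection and policy updates.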

Quick Start & Requirements

  • Primary install / run command: pip install -e "./trainer[sglang]"
  • Non-default prerequisites and dependencies: A Ray cluster is required for distributed training. GPU resources are necessary for worker pools.
  • Links:
    • Minimal example: examples/simple-calculator
    • Paper reproduction: examples/training/agentrl_trainer.py
    • Environment/Data: AgentBench FC
    • Task documentation: docs/tasks.md
    • Deployment documentation: docs/deployment.md

Highlighted Details

  • Asynchronous GRPO Training: Leverages specialized Ray worker pools (Rollout, Actor, Reference) and placement groups for efficient, scalable policy optimization.
  • Multi-Turn Task Management: Integrates with AgentBench and uses a DistributedTaskManager to handle complex, grouped multi-turn interactions.
  • Synchronized Model Updates: Employs NCCL for efficient parameter streaming between inference (rollout) and training (actor) workers, maintaining model lockstep.
  • High-Concurrency Environment: Features a Go-based controller designed to manage up to 10,000 concurrent task sessions, with gRPC for robust communication.
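The "Synchronized Model Updates" point above can be illustrated with a toy version-stamped sync loop. In AgentRL the transport is NCCL between GPU workers; here a plain Python list stands in for that channel, and all names (`ActorWorker`, `RolloutWorker`, `push_params`) are hypothetical, not the project's API.

```python
# Illustrative sketch: the actor streams each new weight snapshot to the
# rollout worker, which applies it only if the version is newer, keeping
# inference in lockstep with training.
class ActorWorker:
    def __init__(self):
        self.version = 0
        self.params = {"w": 0.0, "b": 0.0}

    def train_step(self):
        # Pretend gradient update, then bump the weight version.
        self.params = {k: v + 0.1 for k, v in self.params.items()}
        self.version += 1

    def push_params(self, channel):
        # Stand-in for an NCCL broadcast of the updated tensors.
        channel.append((self.version, dict(self.params)))

class RolloutWorker:
    def __init__(self):
        self.version = -1
        self.params = {}

    def pull_params(self, channel):
        # Apply the latest streamed snapshot so inference stays current.
        version, params = channel[-1]
        if version > self.version:
            self.version, self.params = version, params

channel = []
actor, rollout = ActorWorker(), RolloutWorker()
for _ in range(3):
    actor.train_step()
    actor.push_params(channel)
    rollout.pull_params(channel)
```

After three steps the rollout worker holds the same weight version as the actor, which is the "lockstep" property the summary refers to.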

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps were present in the provided README.

Licensing & Compatibility

  • License type: MIT License.
  • Compatibility notes: The MIT license generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

The agentrl-eval component is noted as experimental. The framework's distributed nature, particularly the reliance on a Ray cluster and a Go-based controller, implies a non-trivial setup and infrastructure requirement for full deployment.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 24 stars in the last 30 days
