THUDM
Agentic reinforcement learning scaled via a multi-turn, multi-task framework
Top 95.9% on SourcePulse
AgentRL provides a framework for scaling agentic reinforcement learning across multi-turn and multi-task scenarios. It targets researchers and engineers developing complex LLM agents, offering a robust system for efficient training and deployment. The primary benefit is enabling the development and scaling of sophisticated agentic RL pipelines that can handle intricate, sequential interactions.
How It Works
The project is divided into a training framework and an environment deployment framework. The training framework utilizes Ray for distributed computing, employing specialized worker pools (Rollout, Actor, Reference) managed via Ray placement groups for deterministic resource allocation. It supports an asynchronous GRPO training pipeline, where tasks are generated and trajectories collected by a DistributedTaskManager, with data stored in a shared buffer. Efficient parameter synchronization between training and inference workers is achieved using NCCL for near real-time model consistency. The environment deployment framework, built upon AgentBench, features a high-performance Go-based controller managing numerous task worker sessions and a gRPC transport layer for reliable communication between the controller and task workers.
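The asynchronous pipeline described above can be pictured with a small, single-process sketch. It is not AgentRL's implementation: it replaces Ray workers with asyncio tasks, the shared buffer with an asyncio.Queue, and real environment episodes with a toy deterministic reward. The names rollout_worker, trainer, and group_relative_advantages are hypothetical and chosen only for illustration. The advantage step follows the usual GRPO idea of normalizing each reward against the mean and standard deviation of its own group.

```python
import asyncio
import statistics
from dataclasses import dataclass


@dataclass
class Trajectory:
    task_id: int
    reward: float


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each reward against its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


async def rollout_worker(task_id, group_size, buffer):
    """Hypothetical rollout worker: collects one group of trajectories for a task."""
    group = []
    for i in range(group_size):
        await asyncio.sleep(0)  # stand-in for a multi-turn environment episode
        reward = float((task_id + i) % 2)  # toy deterministic reward (assumption)
        group.append(Trajectory(task_id, reward))
    await buffer.put(group)  # blocks while the bounded buffer is full


async def trainer(buffer, num_groups):
    """Hypothetical trainer: consumes groups as they arrive in the buffer."""
    results = {}
    for _ in range(num_groups):
        group = await buffer.get()
        rewards = [t.reward for t in group]
        results[group[0].task_id] = group_relative_advantages(rewards)
    return results


async def main(num_tasks=3, group_size=4):
    buffer = asyncio.Queue(maxsize=2)  # bounded stand-in for the shared buffer
    workers = [
        asyncio.create_task(rollout_worker(t, group_size, buffer))
        for t in range(num_tasks)
    ]
    advantages = await trainer(buffer, num_tasks)
    await asyncio.gather(*workers)
    return advantages


if __name__ == "__main__":
    for task_id, adv in sorted(asyncio.run(main()).items()):
        print(task_id, [round(a, 3) for a in adv])
```

Because the trainer pulls from the buffer while rollouts are still running, neither side waits for the other to finish a full batch. In a real deployment the consumer would also need fresh weights pushed back to the rollout side, which is the role the text assigns to NCCL synchronization.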
Quick Start & Requirements
pip install -e "./trainer[sglang]"
examples/simple-calculator
examples/training/agentrl_trainer.py
AgentBench FC
docs/tasks.md
docs/deployment.md
Highlighted Details
DistributedTaskManager to handle complex, grouped multi-turn interactions.
Maintenance & Community
No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmaps were present in the provided README.
Licensing & Compatibility
Limitations & Caveats
The agentrl-eval component is noted as experimental. The framework's distributed nature, particularly the reliance on a Ray cluster and a Go-based controller, implies a non-trivial setup and infrastructure requirement for full deployment.