ProRL-Agent-Server by NVIDIA-NeMo

Agentic RL rollouts for any harness, scaled as a service

Created 10 months ago

697 stars

Top 48.1% on SourcePulse

1 Expert Loves This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Project Summary

NVIDIA-NeMo/ProRL-Agent-Server provides a scalable framework for Reinforcement Learning (RL) rollouts, enabling real-world agent harnesses to be used as RL environments with minimal code changes. It targets engineers and researchers who want to train LLM agents efficiently, and aims to save GPU hours through its "Rollout as a Service" architecture and a "Smart Rollout Pipeline" that stages rollouts in parallel and pools runtimes.

How It Works

The system uses a distributed architecture in which a central Rollout Server dispatches tasks to Gateway Nodes. Each node asynchronously prepares agent runtimes, executes agents, constructs trajectories, and performs evaluations. A proxy layer sits between the agent harnesses and the inference servers, decoupling the two. Because the core makes no assumptions about the trainer, it can be paired with various training frameworks, and parallel Rollout Staging and Runtime Pooling improve resource utilization.
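The staging and pooling idea can be illustrated with a small asyncio sketch. This is not the project's code or API: `RuntimePool`, `run_agent`, `evaluate`, `rollout`, and `serve` are hypothetical names, and the agent step is a stand-in for real LLM calls routed through the proxy. It only shows how a fixed pool of reusable runtimes can serve many concurrent rollouts, each passing through the prepare, execute, build-trajectory, and evaluate stages.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Result of one rollout (hypothetical structure, for illustration only)."""
    task_id: int
    steps: list = field(default_factory=list)
    reward: float = 0.0


class RuntimePool:
    """Hypothetical pool of pre-created runtimes that rollouts borrow and return."""

    def __init__(self, size: int):
        self._free: asyncio.Queue = asyncio.Queue()
        for i in range(size):
            self._free.put_nowait(f"runtime-{i}")

    async def acquire(self) -> str:
        # Waits until some other rollout releases a runtime.
        return await self._free.get()

    def release(self, runtime: str) -> None:
        self._free.put_nowait(runtime)


async def run_agent(runtime: str, task_id: int) -> list:
    """Stand-in for executing an agent harness inside a runtime."""
    await asyncio.sleep(0)  # yield control, as a real inference call would
    return [f"{runtime}:task-{task_id}:step-{i}" for i in range(2)]


def evaluate(steps: list) -> float:
    """Toy evaluator: the reward is simply the number of steps taken."""
    return float(len(steps))


async def rollout(pool: RuntimePool, task_id: int) -> Trajectory:
    runtime = await pool.acquire()                # prepare: borrow a runtime
    try:
        steps = await run_agent(runtime, task_id)  # execute the agent
    finally:
        pool.release(runtime)                      # return it for reuse
    return Trajectory(task_id, steps, evaluate(steps))  # build and score


async def serve(task_ids, pool_size: int = 2) -> list:
    # The pool is created inside the running event loop.
    pool = RuntimePool(pool_size)
    # All rollouts are scheduled at once; the pool bounds how many run together.
    return await asyncio.gather(*(rollout(pool, t) for t in task_ids))


if __name__ == "__main__":
    for traj in asyncio.run(serve(range(5), pool_size=2)):
        print(traj.task_id, traj.reward, traj.steps[0])
```

In this sketch the pool size, not the number of submitted tasks, limits concurrency, which is the general reason pooling avoids re-creating a runtime per task. How the actual Gateway Nodes implement staging and pooling is not described in the summary above.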

Quick Start & Requirements

Install the Rollout Server with uv pip install -e ., then install a patched Inference Server by running uv pip install --prerelease=allow sglang==0.5.10 followed by scripts/patch/patch_sglang.sh. Optional dependencies include .[swebench] for SWE-bench integration and Node.js/npm for building the frontend dashboard UI. The system is trainer-agnostic, though specific integrations such as Slime are documented. A typical local run uses the polar serve_rollout, polar serve_gateway, polar dashboard, polar submit, and polar status commands, configured via a topology.yaml file.

Highlighted Details

"Rollout as a Service" design for scaling asynchronous RL.
"Smart Rollout Pipeline" featuring parallel Rollout Staging & Runtime Pooling to conserve GPU hours.
Trainer-agnostic core, supporting diverse training and inference frameworks.
Includes examples for Calculator, VLM (Count Stars), SWE-bench Verified, and Slime GRPO.

Maintenance & Community

The project roadmap indicates active development, with planned features including CUA (VLM/VLA) support, vLLM dual inference, and additional trainer bridges. Contributions are welcomed. No specific community channels (e.g., Discord, Slack) or social media links are provided in the README.

Licensing & Compatibility

The specific open-source license for this project is not explicitly stated in the provided README. Compatibility notes for commercial use or linking with closed-source projects are also absent.

Limitations & Caveats

The current SGLang integration requires a patched install (sglang==0.5.10 plus scripts/patch/patch_sglang.sh), which suggests it is tied to specific SGLang versions until equivalent support is available upstream. Integration with vLLM is noted as "on the way." Several roadmap features, such as advanced evaluators and broader trainer support, are still under development. The absence of a clearly stated license may block adoption.

Health Check

Last Commit: 1 week ago

Responsiveness: Inactive

Star History

107 stars in the last 30 days