ProRL-Agent-Server by NVIDIA-NeMo

Agentic RL rollouts for any harness, scaled as a service

Created 8 months ago
397 stars

Top 72.3% on SourcePulse

Project Summary

NVIDIA-NeMo/ProRL-Agent-Server provides a scalable framework for Reinforcement Learning (RL) rollouts, enabling real-world agent harnesses to be used as RL environments with minimal code changes. It targets engineers and researchers seeking to efficiently train LLM agents, offering significant GPU savings through its "Rollout as a Service" architecture and smart parallel processing pipeline.

How It Works

The system uses a distributed architecture in which a central Rollout Server dispatches tasks to Gateway Nodes. Each node asynchronously prepares agent runtimes, executes agents, constructs trajectories, and performs evaluations. A proxy layer integrates agent harnesses and decouples them from inference servers. This design keeps the system trainer-agnostic, so it can work with different training frameworks, and it improves resource utilization through parallel Rollout Staging and Runtime Pooling.
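The dispatch pattern described above can be sketched in plain asyncio. This is a minimal illustration, not the project's implementation: the names serve_rollouts, gateway_worker, prepare_runtime, run_agent, and evaluate are hypothetical, and the sleeps stand in for real work such as sandbox startup, model calls, and verification.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rollout:
    task_id: int
    steps: list = field(default_factory=list)  # (observation, action) pairs
    reward: Optional[float] = None


async def prepare_runtime(task_id: int) -> dict:
    await asyncio.sleep(0.01)  # placeholder for sandbox/container startup
    return {"task_id": task_id}


async def run_agent(runtime: dict) -> list:
    await asyncio.sleep(0.01)  # placeholder for the harness calling the model
    return [("obs", f"action-{runtime['task_id']}")]


async def evaluate(steps: list) -> float:
    await asyncio.sleep(0.01)  # placeholder for tests or a verifier
    return 1.0 if steps else 0.0


async def gateway_worker(tasks: asyncio.Queue, results: list) -> None:
    # Hypothetical gateway loop: pull a task, run every stage, store the result.
    while True:
        task_id = await tasks.get()
        try:
            runtime = await prepare_runtime(task_id)
            steps = await run_agent(runtime)
            reward = await evaluate(steps)
            results.append(Rollout(task_id, steps, reward))
        finally:
            tasks.task_done()


async def serve_rollouts(task_ids, num_gateways: int = 4) -> list:
    # Hypothetical central server: enqueue tasks and let gateways drain them.
    tasks: asyncio.Queue = asyncio.Queue()
    results: list = []
    for task_id in task_ids:
        tasks.put_nowait(task_id)
    workers = [
        asyncio.create_task(gateway_worker(tasks, results))
        for _ in range(num_gateways)
    ]
    await tasks.join()  # wait until every task has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results, key=lambda r: r.task_id)


rollouts = asyncio.run(serve_rollouts(range(8)))
```

Because each worker awaits its own stages, one gateway can be preparing a runtime while another is evaluating, which is the kind of overlap the staged design aims for.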

Quick Start & Requirements

Install the Rollout Server with uv pip install -e ., then install the Inference Server with uv pip install --prerelease=allow sglang==0.5.10 and patch it by running scripts/patch/patch_sglang.sh. Optional dependencies include .[swebench] for SWE-bench integration and Node.js/npm for building the frontend dashboard UI. The system is trainer-agnostic, though specific integrations such as Slime are documented. A typical local run uses the polar serve_rollout, polar serve_gateway, polar dashboard, polar submit, and polar status commands, configured through a topology.yaml file.

Highlighted Details

  • "Rollout as a Service" design for scaling asynchronous RL.
  • "Smart Rollout Pipeline" featuring parallel Rollout Staging & Runtime Pooling to conserve GPU hours.
  • Trainer-agnostic core, supporting diverse training and inference frameworks.
  • Includes examples for Calculator, VLM (Count Stars), SWE-bench Verified, and Slime GRPO.
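The idea behind Runtime Pooling can be illustrated with a small asyncio sketch. It assumes a fixed pool of pre-built runtimes that tasks borrow and return; the RuntimePool class, run_task, and main are hypothetical names, and the project's actual pooling strategy may differ.

```python
import asyncio


class RuntimePool:
    """Hypothetical pool: build a fixed number of runtimes once, then reuse them."""

    def __init__(self, size: int):
        self.size = size
        self.created = 0
        self._idle: asyncio.Queue = asyncio.Queue()

    async def start(self) -> None:
        for _ in range(self.size):
            await asyncio.sleep(0.01)  # placeholder for slow runtime startup
            self.created += 1
            self._idle.put_nowait({"id": self.created, "uses": 0})

    async def acquire(self) -> dict:
        return await self._idle.get()  # waits if every runtime is busy

    def release(self, runtime: dict) -> None:
        self._idle.put_nowait(runtime)


async def run_task(pool: RuntimePool, task_id: int) -> tuple:
    runtime = await pool.acquire()
    try:
        await asyncio.sleep(0.01)  # placeholder for the agent rollout itself
        runtime["uses"] += 1
        return task_id, runtime["id"]
    finally:
        pool.release(runtime)  # return the runtime for the next task


async def main(num_tasks: int = 10, pool_size: int = 3) -> tuple:
    pool = RuntimePool(pool_size)  # created inside the running event loop
    await pool.start()
    assignments = await asyncio.gather(
        *(run_task(pool, t) for t in range(num_tasks))
    )
    return pool.created, assignments


created, assignments = asyncio.run(main())
```

In this sketch ten tasks share three runtimes, so startup cost is paid three times rather than ten.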

Maintenance & Community

The project roadmap indicates active development, with planned features including CUA (VLM/VLA) support, vLLM dual inference, and additional trainer bridges. Contributions are welcomed. No specific community channels (e.g., Discord, Slack) or social media links are provided in the README.

Licensing & Compatibility

The specific open-source license for this project is not explicitly stated in the provided README. Compatibility notes for commercial use or linking with closed-source projects are also absent.

Limitations & Caveats

The current SGLang integration requires patching, which suggests it depends on a specific SGLang version (0.5.10 in the install instructions) or on support that has not yet landed upstream. Integration with vLLM is noted as "on the way." Several roadmap features, such as advanced evaluators and broader trainer support, are still under development. The absence of a clearly stated license may block adoption.

Health Check

Last Commit: 3 days ago
Responsiveness: Inactive
Pull Requests (30d): 8
Issues (30d): 0

Star History

272 stars in the last 30 days
