qqr by Alibaba-NLP

RL training framework for open-ended agents

Created 7 months ago

261 stars

Top 97.2% on SourcePulse

Project Summary

qqr is an RL training framework for open-ended agents. It implements the ArenaRL algorithm, which addresses discriminative collapse and the resulting stagnation on complex tasks by replacing pointwise scalar scoring with tournament-based relative ranking, so that the policy can keep improving. Aimed at researchers and engineers, qqr is a lightweight, non-intrusive extension to slime with built-in Model Context Protocol (MCP) integration for high-performance agent evolution.

How It Works

qqr extends the slime framework with an implementation of the ArenaRL algorithm, which uses tournament-based relative ranking to overcome reward model stagnation in open-ended tasks. It also integrates the Model Context Protocol (MCP), an open protocol for connecting LLM applications to external data sources and tools, so that LLM inference and tool environments are decoupled in a standardized way. Building on slime's efficient training capabilities, this architecture supports high-throughput, distributed rollout generation for large-scale agent evolution.
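
To make the idea of relative ranking concrete, here is a minimal sketch of a round-robin tournament over a group of rollouts. It is not taken from the qqr code base: the function names, the toy length-based judge, and the step that standardizes win counts into advantages are all assumptions made for illustration, and ArenaRL's actual scoring may differ.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def round_robin_wins(
    candidates: Sequence[str],
    judge: Callable[[str, str], bool],
) -> List[int]:
    """Count wins per candidate when every pair is compared once.

    `judge(a, b)` is a hypothetical pairwise comparator that returns
    True when `a` is preferred over `b`.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        if judge(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return wins


def standardized_advantages(wins: Sequence[int]) -> List[float]:
    """Turn win counts into zero-mean, unit-variance advantages.

    Assumption: group-level standardization is used here only as a
    familiar way to convert a ranking into a training signal.
    """
    mu = mean(wins)
    sigma = pstdev(wins)
    if sigma == 0:
        # All candidates tied: no relative signal to learn from.
        return [0.0] * len(wins)
    return [(w - mu) / sigma for w in wins]


def toy_judge(a: str, b: str) -> bool:
    """Stand-in judge that simply prefers the longer string."""
    return len(a) > len(b)


rollouts = ["a", "abc", "ab", "abcd"]
wins = round_robin_wins(rollouts, toy_judge)
advantages = standardized_advantages(wins)
```

In a real setup the judge would be a model-based comparison of two rollouts rather than a string-length check. The point of the sketch is that the training signal depends only on how candidates compare with each other, not on an absolute score for each one.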

Quick Start & Requirements

To install, first make sure slime is installed, then clone the repository, change into its directory, and run pip install -e . (the trailing dot is part of the command). A quick-start example is run with bash scripts/travel/run-qwen3-8B.sh, and its configuration options are in qqr/examples/travel/config.py. Only specific qqr and slime version pairings are tested (e.g., qqr v0.1.3 with slime v0.2.4). Refer to the slime quick start guide for initial setup.

Highlighted Details

Features the ArenaRL algorithm with built-in tournament topologies (Anchor-Based, Round-Robin, Swiss-System, Double-Elimination, Seeded Single-Elimination).
Offers seamless MCP support for environment decoupling, enabling reuse of existing MCP Servers.
Leverages slime for high-performance, distributed training, supporting large-scale agent evolution.
The ArenaRL paper detailing the algorithm was accepted to ICML 2026.
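
The topologies listed above differ mainly in how many pairwise comparisons they schedule. As a rough sketch, the snippet below contrasts a generic round-robin schedule with a generic anchor-based one. The function names are hypothetical, and treating the anchor as one member of the group is an assumption; qqr's own implementations may schedule matches differently.

```python
from itertools import combinations
from typing import List, Tuple


def round_robin_pairs(n: int) -> List[Tuple[int, int]]:
    """Every candidate meets every other candidate once: n*(n-1)/2 matches."""
    return list(combinations(range(n), 2))


def anchor_pairs(n: int, anchor: int = 0) -> List[Tuple[int, int]]:
    """Every non-anchor candidate meets only the anchor: n-1 matches.

    Assumption: the anchor is one of the n candidates (index `anchor`).
    """
    return [(anchor, i) for i in range(n) if i != anchor]


group_size = 8
rr_matches = len(round_robin_pairs(group_size))
anchor_matches = len(anchor_pairs(group_size))
```

For a group of 8, the round-robin schedule needs 28 comparisons and the anchor-based schedule needs 7. The trade-off is cost in judge calls versus how much direct evidence there is about each pair of candidates.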

Maintenance & Community

The project acknowledges slime and the openai-agents-python library, the latter for its MCP interfaces. The README provides no community links (such as Discord or Slack) and no roadmap, though it does link to HuggingFace and ModelScope.

Licensing & Compatibility

The license type is not specified in the provided README. Compatibility is explicitly managed through documented version pairings between qqr and its dependency slime.

Limitations & Caveats

The README details no explicit limitations beyond the need to use a tested qqr and slime version pairing for stable operation.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days