Alibaba-NLP
RL training framework for open-ended agents
Top 99.4% on SourcePulse
qqr is an RL training framework designed for open-ended agents, addressing discriminative collapse by enabling continuous policy improvement through relative ranking. It serves researchers and engineers by providing a lightweight, non-intrusive extension to slime with seamless Model Context Protocol (MCP) integration for high-performance agent evolution. The ArenaRL algorithm, implemented within qqr, tackles stagnation in complex tasks by using tournament-based relative ranking instead of pointwise scalar scoring.
How It Works
qqr extends the slime framework, implementing the ArenaRL algorithm which leverages tournament-based relative ranking to overcome reward model stagnation in open-ended tasks. It integrates the Model Context Protocol (MCP), an open protocol for seamless integration between LLM applications and external data sources and tools, for standardized decoupling of LLM inference and tool environments. This architecture facilitates high-throughput, distributed rollout generation for large-scale agent evolution, building upon slime's capabilities for efficient training.
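The contrast between pointwise scalar scoring and tournament-based relative ranking can be illustrated with a small sketch. This is not the ArenaRL implementation: the round-robin format, the `prefers` judge callback, and the standardization of win counts into advantages are all assumptions chosen for brevity, and the toy judge simply prefers longer strings.

```python
from itertools import combinations
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def round_robin_wins(items: Sequence[str],
                     prefers: Callable[[str, str], bool]) -> List[int]:
    """Count pairwise wins for each item in a round-robin tournament.

    `prefers(a, b)` is a hypothetical pairwise judge that returns True
    when `a` is judged better than `b`.
    """
    wins = [0] * len(items)
    for i, j in combinations(range(len(items)), 2):
        if prefers(items[i], items[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    return wins


def ranking_advantages(wins: Sequence[int]) -> List[float]:
    """Standardize win counts into zero-mean, unit-variance advantages."""
    mu = mean(wins)
    sigma = pstdev(wins)
    if sigma == 0:
        # Every rollout tied: no relative signal to learn from.
        return [0.0] * len(wins)
    return [(w - mu) / sigma for w in wins]


def toy_prefers(a: str, b: str) -> bool:
    """Toy stand-in for a real judge: prefer the longer response."""
    return len(a) > len(b)


rollouts = ["a", "abc", "ab", "abcd"]
wins = round_robin_wins(rollouts, toy_prefers)
advantages = ranking_advantages(wins)
```

Because the signal comes from comparisons within the group, rollouts that a pointwise scorer would rate almost identically can still be separated by their relative order.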
Quick Start & Requirements
To install, first ensure slime is installed, then clone the repository, change into its directory, and run pip install -e . for an editable install. A quick start example is run with bash scripts/travel/run-qwen3-8B.sh, and its configuration options are in qqr/examples/travel/config.py. Only specific qqr and slime version pairings are tested together (e.g., qqr v0.1.3 with slime v0.2.4). Refer to the slime quick start guide for initial setup.
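Since stable operation depends on matching qqr and slime versions, a small pre-flight check can catch a mismatch before a training run. The sketch below is illustrative only: the distribution names "qqr" and "slime" are assumptions about how the packages are registered, and the table contains just the one pairing mentioned above; the remaining entries would come from the README.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Only the pairing named in the text; extend from the README's table.
TESTED_PAIRS = {"0.1.3": "0.2.4"}  # qqr version -> slime version


def normalize(v: str) -> str:
    """Strip whitespace and a leading 'v' so 'v0.1.3' matches '0.1.3'."""
    return v.strip().lstrip("vV")


def is_tested_pair(qqr_version: str, slime_version: str) -> bool:
    """Return True if the two versions form a known tested pairing."""
    return TESTED_PAIRS.get(normalize(qqr_version)) == normalize(slime_version)


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    qqr_v = installed_version("qqr")      # assumed distribution name
    slime_v = installed_version("slime")  # assumed distribution name
    if qqr_v is None or slime_v is None:
        print("qqr and/or slime is not installed")
    elif is_tested_pair(qqr_v, slime_v):
        print(f"qqr {qqr_v} with slime {slime_v} is a tested pairing")
    else:
        print(f"qqr {qqr_v} with slime {slime_v} is not a tested pairing")
```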
Highlighted Details
Builds on slime for high-performance, distributed training, supporting large-scale agent evolution.
Maintenance & Community
The project acknowledges contributions from slime and the openai-agents-python library, which provides excellent MCP interfaces. Community links such as Discord or Slack, and a roadmap, are not explicitly provided in the README, though HuggingFace and ModelScope are linked.
Licensing & Compatibility
The license type is not specified in the provided README. Compatibility is explicitly managed through documented version pairings between qqr and its dependency slime.
Limitations & Caveats
No explicit limitations are detailed, beyond the necessity of adhering to specific version compatibility between qqr and slime for stable operation.