LMIS-ORG
Agentic RL algorithms for flexible framework reproduction
This repository implements agentic reinforcement learning (RL) algorithms on top of the Slime framework and is aimed at researchers and engineers. It applies GRPO (Group Relative Policy Optimization) to improve LLM reasoning, tool use, and long-context handling without requiring manual annotation of intermediate steps. The project reports significant gains on benchmarks from this approach.
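GRPO can skip step-level annotation because of how it computes advantages. The model samples several rollouts for the same prompt, and each rollout receives a single outcome reward. Each rollout's advantage is then its reward normalized against the group's mean and standard deviation.

The sketch below shows that normalization in generic, plain Python. It is not code from this repository or from Slime. The function name and the `eps` stabilizer are our own choices, and implementations differ on details such as sample vs. population standard deviation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize outcome rewards within one group of rollouts for the same prompt.

    Rollouts that beat the group average get a positive advantage and the rest
    get a negative one. No per-step labels are needed, only one scalar reward
    per rollout.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts of one prompt: two solved the task (reward 1), two did not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

If every rollout in a group earns the same reward, all advantages are zero, so that prompt contributes no learning signal.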
How It Works
The agentic RL algorithms (AgentFlow, MemAgent, ToolOrchestra) integrate with Slime through custom generation and reward hooks that run multi-step rollouts, and GRPO optimizes the resulting decision trajectories.

- AgentFlow uses a Planner-Executor-Verifier loop for tool use.
- MemAgent learns to compress long documents into a memory by applying RL over its sequence of memory updates.
- ToolOrchestra uses an Orchestrator-Expert setup, training an orchestrator LLM to route tasks.

Because no intermediate steps are annotated, the agent learns its decision-making autonomously. A loss_mask restricts gradient computation to the tokens that represent the agent's own decisions.
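The sketch below illustrates how a loss mask focuses the gradient on agent decisions. It assumes that a multi-step rollout is a sequence of segments, some generated by the agent and some not (for example, tool outputs). It also assumes that only agent tokens should contribute to the loss. Whether this repository masks exactly these segments is an assumption.

The helper names are hypothetical. The loss is a simplified policy-gradient term: full GRPO additionally uses a clipped importance ratio and, optionally, a KL penalty.

```python
def build_loss_mask(segments: list[tuple[int, bool]]) -> list[int]:
    """Build a per-token mask from (num_tokens, generated_by_agent) segments.

    Hypothetical helper: 1 marks a token the agent generated, 0 marks text the
    agent did not generate (e.g. a tool result inserted into the context).
    """
    mask: list[int] = []
    for num_tokens, by_agent in segments:
        mask.extend([1 if by_agent else 0] * num_tokens)
    return mask


def masked_policy_loss(token_logprobs: list[float], loss_mask: list[int], advantage: float) -> float:
    """Simplified policy-gradient loss averaged over masked-in tokens only."""
    if len(token_logprobs) != len(loss_mask):
        raise ValueError("token_logprobs and loss_mask must have the same length")
    kept = [lp for lp, m in zip(token_logprobs, loss_mask) if m]
    if not kept:
        return 0.0
    # Minimizing -A * mean(log p) raises the likelihood of agent tokens when A > 0.
    return -advantage * sum(kept) / len(kept)


if __name__ == "__main__":
    # Agent turn (3 tokens), tool output (2 tokens), agent turn (2 tokens).
    mask = build_loss_mask([(3, True), (2, False), (2, True)])
    logprobs = [-0.5, -0.5, -0.5, -9.0, -9.0, -1.0, -1.0]
    print(mask, masked_policy_loss(logprobs, mask, advantage=1.0))
```

The low log-probabilities on the tool-output tokens have no effect on the loss, because the policy never chose those tokens.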
Quick Start & Requirements
The recommended setup uses the official Docker image (slimerl/slime:latest). Key dependencies are slime >= 0.2.2, sglang, ray, transformers, and torch >= 2.0. The container must run with GPU access (--gpus all) and a large shared-memory allocation (--shm-size=16g). Detailed instructions are in the agentic/ subdirectories, and trained weights are available on Hugging Face.
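The script below checks the two stated minimum versions (torch >= 2.0, slime >= 0.2.2) inside an environment before launching training. It is a rough, stdlib-only sketch and not part of the project. Treating "slime" as the installed distribution name is an assumption. The version comparison is deliberately coarse: it compares numeric components only and ignores pre-release tags.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimums from the requirements above. The distribution name "slime" is an assumption.
MINIMUM_VERSIONS = {"torch": "2.0", "slime": "0.2.2"}


def numeric_version(v: str) -> tuple[int, ...]:
    """Leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    parts = []
    for piece in v.split("+")[0].split("."):  # drop local tags like '+cu121'
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.group() != piece:  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: dict[str, str]) -> dict[str, str]:
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'needs >= ' + minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(MINIMUM_VERSIONS).items():
        print(f"{name}: {status}")
```

For exact PEP 440 semantics, including pre-releases, the third-party `packaging` library is the usual choice. This sketch avoids it to stay dependency-free.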
Highlighted Details
Maintenance & Community
Contributions that integrate new methods are welcome through pull requests and issues. The README lists no dedicated community channels or core maintainers.
Licensing & Compatibility
The README does not state a license, which is a significant barrier to adoption. Without one, the terms for use, modification, and distribution are undefined, and this may rule out commercial or closed-source integration.
Limitations & Caveats
This is primarily a research project for reproducing and extending agentic RL algorithms. SGLang-based inference may add deployment complexity, since models are served on separate ports. The missing license is the most critical caveat.
KhoomeiK
AgentR1
NVlabs