slime-agentic by LMIS-ORG

Agentic RL algorithms for flexible framework reproduction

Created 1 month ago
282 stars

Top 92.4% on SourcePulse

Project Summary

This repository implements agentic reinforcement learning (RL) algorithms on top of the Slime framework and is aimed at researchers and engineers. It trains LLMs for reasoning, tool use, and long-context handling with GRPO, without manual annotation of intermediate steps, and reports benchmark gains on AIME 2024, RULER-HQA, and τ²-Bench.

How It Works

The agentic RL algorithms (AgentFlow, MemAgent, ToolOrchestra) plug into Slime through custom generation and reward hooks that support multi-step rollouts, and GRPO optimizes the resulting decision trajectories.

  • AgentFlow runs a Planner-Executor-Verifier loop for tool use.
  • MemAgent learns to compress long documents into memory, with RL applied over its memory updates.
  • ToolOrchestra uses an Orchestrator-Expert setup and trains an orchestrator LLM to route tasks.

In all three, a loss_mask restricts gradient computation to the agent's own decisions, so training proceeds without annotated intermediate steps.
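The two ideas named above, group-relative advantages in GRPO and a loss_mask that limits which positions contribute to the loss, can be illustrated with a minimal pure-Python sketch. It is not code from the repository: the function names, the toy numbers, and the choice of population standard deviation are assumptions made for illustration, and a real training loop would operate on tensors of token log-probabilities.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts sampled for the same prompt.

    Assumption: population standard deviation; implementations differ here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_mean_loss(token_losses, loss_mask):
    """Average per-token losses over positions where loss_mask is 1.

    Positions with mask 0 (for example tool output or environment text)
    contribute nothing, so no gradient would flow through them.
    """
    if len(token_losses) != len(loss_mask):
        raise ValueError("token_losses and loss_mask must have the same length")
    kept = sum(loss_mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(token_losses, loss_mask)) / kept


# Hypothetical trajectory: positions 0, 1 and 4 were produced by the agent,
# positions 2 and 3 came from a tool and are masked out.
token_losses = [0.9, 0.4, 0.7, 0.2, 0.6]
loss_mask = [1, 1, 0, 0, 1]
loss = masked_mean_loss(token_losses, loss_mask)

# Four hypothetical rollouts for one prompt, rewarded 1.0 if correct else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Here the masked loss averages only the three agent-generated positions, and the successful rollouts receive positive advantages while the failed ones receive negative advantages of equal magnitude.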

Quick Start & Requirements

The recommended setup uses the official Docker image (slimerl/slime:latest). Key dependencies include slime >= 0.2.2, sglang, ray, transformers, and torch >= 2.0. The container must be started with GPU access (--gpus all) and substantial shared memory (--shm-size=16g). Detailed instructions are in the agentic/ subdirectories, and trained weights are available on Hugging Face.
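Outside the Docker image, the dependency list above can be checked before launching a run. The following is a rough sketch using only the standard library. The distribution names in REQUIREMENTS are assumptions (the names the packages are actually installed under may differ), and check_requirements and at_least are hypothetical helpers, not part of the repository.

```python
import re
from importlib import metadata

# Minimum versions taken from the text above; None means "any version".
# Distribution names are assumptions and may differ in a real environment.
REQUIREMENTS = {
    "slime": "0.2.2",
    "sglang": None,
    "ray": None,
    "transformers": None,
    "torch": "2.0",
}


def version_tuple(version):
    """Turn '2.1.0+cu121' into (2, 1, 0), ignoring any suffix after the numbers."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in numeric.group().split("."))


def at_least(installed, minimum):
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements, lookup=metadata.version):
    """Return {name: problem} for every missing or too-old package."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if minimum is not None and not at_least(installed, minimum):
            problems[name] = f"found {installed}, need >= {minimum}"
    return problems


if __name__ == "__main__":
    for name, problem in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {problem}")
```

The lookup parameter exists only so the check can be exercised without the packages installed. The comparison ignores pre-release and local version suffixes, which is a simplification.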

Highlighted Details

  • AgentFlow: +20.0% improvement on AIME 2024 (Qwen2.5-7B).
  • MemAgent: Outperforms baselines across 7K-448K contexts on RULER-HQA (7B model).
  • ToolOrchestra: +0.110 improvement on τ²-Bench (Qwen3-8B).
  • Models are replaceable.
  • SGLang inference engine may require separate model ports.

Maintenance & Community

Contributions that integrate new methods are welcome via PRs and Issues. The README does not mention community channels or name core maintainers.

Licensing & Compatibility

The README omits license information, posing a significant adoption blocker. Terms for use, modification, and distribution remain undefined, potentially restricting commercial or closed-source integration.

Limitations & Caveats

This is primarily a research project for reproducing and extending agentic RL methods. SGLang inference may add deployment complexity, since models may need separate ports. The lack of a defined license is the most critical caveat.

Health Check

  • Last Commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 230 stars in the last 30 days
