slime-agentic by LMIS-ORG

Agentic RL algorithms for flexible framework reproduction

Created 4 months ago

491 stars

Top 62.1% on SourcePulse

1 Expert Loves This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Project Summary

This repository implements agentic reinforcement learning (RL) algorithms on top of the Slime framework and is aimed at researchers and engineers. It applies GRPO to improve LLM reasoning, tool use, and long-context handling without manual annotation of intermediate steps, and reports gains on AIME 2024, RULER-HQA, and τ²-Bench.

How It Works

The agentic RL algorithms (AgentFlow, MemAgent, and ToolOrchestra) plug into Slime through custom generation and reward hooks that drive multi-step rollouts, and GRPO optimizes the resulting decision trajectories. AgentFlow runs a Planner-Executor-Verifier loop for tool use. MemAgent uses RL over its memory updates to learn how to compress long documents into memory. ToolOrchestra uses an Orchestrator-Expert setup in which an orchestrator LLM is trained to route tasks. In each case a loss_mask confines gradient computation to the agent's own decisions, so the remaining tokens in a trajectory do not contribute to the loss.
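To make the roles of GRPO and the loss_mask more concrete, here is a minimal pure-Python sketch. It is not the repository's or Slime's code: the function names, the "agent"/"env" segment labels, and the simplified surrogate loss (plain policy gradient, with no ratio clipping or KL term) are assumptions chosen for illustration. It shows group-relative advantages computed over a set of rollouts for the same prompt, and a mask that keeps only agent-generated tokens in the loss.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group (same prompt)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_loss_mask(segments):
    """Flatten (role, tokens) segments into tokens plus a 0/1 mask.

    Hypothetical convention: role "agent" marks tokens the policy produced,
    anything else (e.g. "env" for tool output) is excluded from the loss.
    """
    tokens, mask = [], []
    for role, segment_tokens in segments:
        tokens.extend(segment_tokens)
        mask.extend([1 if role == "agent" else 0] * len(segment_tokens))
    return tokens, mask


def masked_policy_loss(logprobs, mask, advantage):
    """Simplified surrogate: -advantage * mean log-prob over unmasked tokens."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    total = sum(lp * m for lp, m in zip(logprobs, mask))
    return -advantage * total / kept


if __name__ == "__main__":
    # Four rollouts of one prompt, scored only by their final outcome.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    # One trajectory: agent plan, tool output, agent answer (toy token ids).
    tokens, mask = build_loss_mask(
        [("agent", [11, 12]), ("env", [21, 22, 23]), ("agent", [31])]
    )
    logprobs = [-1.0, -1.0, -5.0, -5.0, -5.0, -2.0]
    print(mask, masked_policy_loss(logprobs, mask, advantages[0]))
```

Because the advantage comes from comparing outcomes within a group, no per-step labels are needed, and masking means the log-probabilities assigned to tool output never enter the loss, however unlikely the policy finds them.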

Quick Start & Requirements

The recommended setup uses the official Docker image (slimerl/slime:latest), run with GPU access (--gpus all) and a large shared-memory allocation (--shm-size=16g), both of which are required. Key dependencies are slime >= 0.2.2, sglang, ray, transformers, and torch >= 2.0. Detailed instructions live in the agentic/ subdirectories, and trained weights are available on HuggingFace.
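When working outside the Docker image, the version constraints above can be checked with a short script before launching anything. The sketch below is illustrative only: it assumes the installed distribution names match the names listed here (in particular that Slime is installed as "slime"), and its version comparison looks at leading numeric components only, ignoring pre-release and local suffixes.

```python
import re
from importlib import metadata

# Constraints as listed in the summary; distribution names are assumed.
MINIMUM_VERSIONS = {"slime": "0.2.2", "torch": "2.0"}
REQUIRED_ONLY = ("sglang", "ray", "transformers")


def version_tuple(version):
    """Turn '2.1.0+cu121' into (2, 1, 0), stopping at a non-numeric piece."""
    numbers = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, required):
    """Compare versions after zero-padding both to the same length."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(lookup=installed_version):
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, minimum in MINIMUM_VERSIONS.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name} is not installed (need >= {minimum})")
        elif not meets_minimum(found, minimum):
            problems.append(f"{name} {found} is older than {minimum}")
    for name in REQUIRED_ONLY:
        if lookup(name) is None:
            problems.append(f"{name} is not installed")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["all listed dependencies look satisfied"]:
        print(line)
```

The lookup parameter exists only so the check can be exercised against a fake set of versions without installing anything.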

Highlighted Details

AgentFlow: +20.0% improvement on AIME 2024 (Qwen2.5-7B).
MemAgent: Outperforms baselines across 7K-448K contexts on RULER-HQA (7B model).
ToolOrchestra: +0.110 improvement on τ²-Bench (Qwen3-8B).
Models are replaceable.
The SGLang inference engine may require a separate port for each model.

Maintenance & Community

Contributions that integrate new methods are welcome via PRs and Issues. The README lists no dedicated community channels and does not identify core maintainers.

Licensing & Compatibility

The README omits license information, posing a significant adoption blocker. Terms for use, modification, and distribution remain undefined, potentially restricting commercial or closed-source integration.

Limitations & Caveats

This is primarily a research project for reproducing and extending agentic RL methods. SGLang inference may add deployment complexity (separate model ports). The lack of a defined license is the most critical caveat.
