Tool-integrated RL for autonomous tool discovery and refinement
ToRL (Tool-Integrated Reinforcement Learning) is a framework that lets large language models autonomously discover and refine tool-usage strategies through reinforcement learning. It targets researchers and developers working on complex reasoning tasks. By letting models learn when and how to invoke tools, it aims for state-of-the-art performance and gives rise to emergent cognitive behaviors such as self-correction and adaptive strategy selection.
How It Works
ToRL challenges traditional supervised fine-tuning approaches by employing exploration-driven reinforcement learning for tool integration. Models learn to invoke tools, cross-validate outputs with reasoning, and self-correct errors without explicit human supervision or predefined tool patterns. This approach allows models to adaptively select between tool-based and pure-reasoning strategies, enhancing performance on challenging mathematical benchmarks.
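The tool-invocation loop described above can be illustrated with a minimal sketch. It is not ToRL's implementation: the `<python>` and `<output>` tags, the `rollout` and `run_tool` helpers, and the scripted stand-in for the model are all hypothetical, and `exec` stands in for a real sandbox service only to keep the example self-contained.

```python
import contextlib
import io
import re
from typing import Callable

# Hypothetical convention: the model wraps tool calls in <python>...</python> tags.
TOOL_CALL = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_tool(code: str) -> str:
    """Run code and return its stdout, or the error text so the model can react to it.

    NOTE: exec() is NOT a sandbox. It is used here only for illustration.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def rollout(generate: Callable[[str], str], prompt: str, max_tool_calls: int = 3) -> str:
    """Alternate between model generation and tool execution until the model stops calling tools."""
    context = prompt
    calls = 0
    while True:
        completion = generate(context)
        context += completion
        match = TOOL_CALL.search(completion)
        if match is None or calls >= max_tool_calls:
            return context  # pure-reasoning turn, or tool budget exhausted
        calls += 1
        context += f"\n<output>{run_tool(match.group(1))}</output>\n"


def scripted_model(context: str) -> str:
    """Stand-in for a policy model: calls the tool once, then answers."""
    if "<output>" not in context:
        return "Let me compute.<python>print(sum(range(1, 101)))</python>"
    return "The tool returned 5050, so the answer is 5050."


trace = rollout(scripted_model, "What is 1 + 2 + ... + 100? ")
print(trace)
```

In a reinforcement learning setup, a trace like this would be scored by a reward signal, so the model is free to decide for itself whether calling the tool helps. Returning error text instead of raising is what leaves room for the model to correct its own code on the next turn.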
Quick Start & Requirements
Setup uses conda for environment creation (sandbox-runtime), python==3.11, and installation of dependencies via requirements.txt and runtime/python/requirement.txt. The SandboxFusion tool must be installed and launched separately, with its URL configured in verl/workers/rollout/vllm_rollout/vllm_rollout_spmd.py.
Run the provided script (scripts/torl_1.5b) to initiate training.
Other dependencies include wandb, jsonlines, math-verify, hydra-core==1.4.0.dev1, sortedcontainers, qwen-agent[code_interpreter], and qwen-agent[python_executor].
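Before launching training, it can help to confirm that the interpreter and packages listed above are in place. The following is a rough preflight sketch, not part of the project: the helper names are hypothetical, the distribution names are taken from the dependency list above, and extras such as `[code_interpreter]` are not checked.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 11)
# Distribution names as listed above; version pins and extras are not verified here.
REQUIRED_DISTRIBUTIONS = [
    "wandb",
    "jsonlines",
    "math-verify",
    "hydra-core",
    "sortedcontainers",
    "qwen-agent",
]


def python_matches(version_info=sys.version_info) -> bool:
    """Return True if the major.minor version equals the required one."""
    return tuple(version_info[:2]) == REQUIRED_PYTHON


def missing_distributions(names):
    """Return the subset of names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not python_matches():
        print(f"Expected Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}, found {sys.version.split()[0]}")
    for name in missing_distributions(REQUIRED_DISTRIBUTIONS):
        print(f"Missing package: {name}")
```

This check does not cover SandboxFusion, which runs as a separate service and has to be started and configured on its own.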
Highlighted Details
Maintenance & Community
The project acknowledges contributions from DeepSeek R1, Kimi-k1.5, Qwen-Math, VeRL, vLLM, Qwen-Agent, and Sandbox Fusion teams. Further community or roadmap information is not detailed in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project relies on external tools like SandboxFusion and vLLM, which require separate setup and configuration. The README indicates components were released on March 28, 2025, suggesting it is a recent project.