LLMs for multi-turn tool-integrated reasoning with RL
Summary:
SimpleTIR addresses the challenge of stable, multi-turn Tool-Integrated Reasoning (TIR) for Large Language Models (LLMs) using Reinforcement Learning (RL). It targets researchers and developers seeking to enhance LLM capabilities in complex problem-solving, data analysis, and multi-step reasoning. The project offers a novel RL stabilization technique, enabling diverse reasoning patterns and improved performance over supervised methods.
How It Works:
SimpleTIR employs end-to-end RL to train LLMs for iterative code generation, execution, and result analysis in multi-turn scenarios. It tackles training instability, which stems from distributional drift in tool outputs and compounding errors across turns, by filtering out trajectories containing "void" turns (turns that produce neither a code block nor a final answer). This stabilizes training and fosters diverse reasoning patterns such as self-correction and inductive reasoning, surpassing the limitations of Supervised Fine-Tuning (SFT).
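The void-turn filter described above can be sketched as follows. This is an illustrative reconstruction based only on the description here, not the project's actual code; the function names, the fenced-code heuristic, and the assumption that final answers are boxed are all hypothetical.

```python
# Hypothetical sketch of "void turn" filtering: a turn is void if it
# contains neither a code block nor a final answer, and any trajectory
# with a void turn is dropped from the RL training batch.
# All names and heuristics here are illustrative, not SimpleTIR's API.
import re

def is_void_turn(turn_text: str) -> bool:
    """A turn is void if it has no fenced code block and no final answer."""
    has_code = bool(re.search(r"```.*?```", turn_text, re.DOTALL))
    has_answer = "\\boxed{" in turn_text  # assumes boxed final answers
    return not (has_code or has_answer)

def filter_trajectories(trajectories: list[list[str]]) -> list[list[str]]:
    """Keep only trajectories in which every turn yields code or an answer."""
    return [traj for traj in trajectories
            if not any(is_void_turn(turn) for turn in traj)]

good = ["Compute it:\n```python\nprint(2 + 2)\n```",
        "The answer is \\boxed{4}."]
bad = ["Hmm, I'm not sure what to do next."]  # void: no code, no answer
print(len(filter_trajectories([good, bad])))  # 1
```

Dropping the whole trajectory, rather than just the offending turn, keeps the reward signal consistent across the remaining multi-turn rollouts.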
Quick Start & Requirements:
Run bash train.sh with specified arguments for training or evaluation. Requires vllm==0.8.5. Recommends ray for multi-node task submission and a sandbox (internal or firejail) for code execution. Base model checkpoints (e.g., Qwen2.5-7B) and datasets are necessary.
Highlighted Details:
Maintenance & Community:
Licensing & Compatibility:
Limitations & Caveats:
The project explicitly addresses instability in multi-turn RL training, identifying it as a core challenge. A technical paper is still in preparation, suggesting an ongoing research and development phase. High hardware requirements (multiple H100 GPUs) present a significant adoption barrier.