Tool learning via reward optimization
Top 86.4% on SourcePulse
ToolRL provides a framework for training large language models to use tools effectively, addressing the challenge of aligning model behavior with desired outcomes through reward engineering. It targets researchers and developers working on agent-based LLM systems and aims to simplify tool-augmented LLM training.
How It Works
The project applies reinforcement learning (RL) techniques, specifically GRPO and PPO, built on the veRL and TinyZero frameworks. Raw datasets are preprocessed into formats suitable for RL training, and reward functions can be controlled at a fine-grained level, which allows experimentation with different reward shaping strategies to improve tool-use performance.
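To make the reward shaping idea concrete, below is a minimal sketch of a decomposed reward for tool calls that combines a format check with a correctness score. The tag names, JSON call schema, weights, and function names are illustrative assumptions for this sketch, not the repository's actual reward definition.

```python
import json
import re

# Illustrative reward shaping for tool calls: a format component plus a
# correctness component. Tag names, weights, and the JSON call schema are
# assumptions for this sketch, not ToolRL's actual reward definition.

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion contains at least one JSON-parsable <tool_call> block, else 0.0."""
    blocks = TOOL_CALL_RE.findall(completion)
    if not blocks:
        return 0.0
    try:
        for block in blocks:
            json.loads(block)
    except json.JSONDecodeError:
        return 0.0
    return 1.0

def correctness_reward(completion: str, expected: dict) -> float:
    """Partial credit: 0.5 for the right tool name, plus up to 0.5 for matching argument values."""
    for block in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(block)
        except json.JSONDecodeError:
            continue
        if call.get("name") != expected["name"]:
            continue
        args, gold = call.get("arguments", {}), expected.get("arguments", {})
        if not gold:
            return 1.0
        matched = sum(1 for k, v in gold.items() if args.get(k) == v)
        return 0.5 + 0.5 * matched / len(gold)
    return 0.0

def reward(completion: str, expected: dict,
           w_format: float = 0.25, w_correct: float = 0.75) -> float:
    """Weighted combination; the weights are one possible shaping choice."""
    return w_format * format_reward(completion) + w_correct * correctness_reward(completion, expected)

if __name__ == "__main__":
    gold = {"name": "get_weather", "arguments": {"city": "Paris"}}
    out = '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
    print(reward(out, gold))  # 1.0 under this sketch
```

Splitting the reward into named components like this makes it straightforward to reweight them or add further terms (for example, a length penalty) when experimenting with shaping strategies.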
Quick Start & Requirements
The training data is located at ./dataset/rlla_4k. Launch training with bash train_grpo.sh or bash train_ppo.sh. Configuration of BASE_MODEL and EXPERIMENT_NAME is necessary, and a length-related reward option can be toggled via an environment variable (export WITHLENGTH=1).
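A minimal sketch of scripting this launch from Python is shown below. It assumes the training scripts read BASE_MODEL, EXPERIMENT_NAME, and WITHLENGTH from the environment; in the repository these may instead need to be edited inside train_grpo.sh or train_ppo.sh, and the model path and run name here are placeholders.

```python
import os
import subprocess

# Minimal sketch of configuring and launching a training run. It assumes the
# training script picks up BASE_MODEL, EXPERIMENT_NAME, and WITHLENGTH from
# the environment; the values below are placeholders.

env = os.environ.copy()
env["BASE_MODEL"] = "/path/to/base/model"    # placeholder checkpoint path
env["EXPERIMENT_NAME"] = "toolrl-grpo-demo"  # placeholder run name
env["WITHLENGTH"] = "1"                      # optional length-related reward

# Launch GRPO training (use train_ppo.sh for the PPO variant).
subprocess.run(["bash", "train_grpo.sh"], env=env, check=True)
```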
Highlighted Details
Maintenance & Community
No specific community channels or maintenance details are provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The provided citation is for a 2025 arXiv preprint.
Limitations & Caveats
The project is presented as code for a research paper, implying it may be experimental. Specific environment requirements (CUDA 12.1, Flash Attention 2) and the need for dataset preprocessing could present adoption hurdles. The lack of explicit licensing information raises concerns for commercial use.
Last activity: 2 months ago; status: Inactive.