ToolRL  by qiancheng0

Tool learning via reward optimization

created 4 months ago
310 stars

Top 86.4% on SourcePulse

Project Summary

ToolRL is a framework for training large language models to use tools effectively. It addresses the challenge of aligning model behavior with desired outcomes through reward design. It targets researchers and developers working on agent-based LLM systems and aims to simplify tool-augmented LLM training.

How It Works

The project uses reinforcement learning (RL), specifically GRPO (Group Relative Policy Optimization) and PPO (Proximal Policy Optimization), and is built on the veRL and TinyZero frameworks. It converts raw datasets into a format suitable for RL training and exposes fine-grained control over the reward function, which makes it possible to experiment with different reward-shaping strategies for improving tool-use performance.
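To make the GRPO part concrete, here is a minimal sketch of the group-relative advantage that gives the algorithm its name: several responses are sampled for the same prompt, each is scored by the reward function, and each reward is normalized against the group's mean and standard deviation. This is a generic illustration, not code from the ToolRL repository; the function name, the eps value, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one sampled group (same prompt).

    Hypothetical helper for illustration only: each advantage is
    (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by some reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that score above the group average receive a positive advantage and those below receive a negative one, so no separately trained value model is needed to form the baseline. PPO, by contrast, typically relies on a learned critic for that purpose.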

Quick Start & Requirements

  • Installation: Requires PyTorch (2.4.0 with CUDA 12.1), vLLM (0.6.3), Ray, veRL (from the repo), and Flash Attention 2.
  • Dataset: Raw data is provided and must be processed before training; already-processed RL training data is available at ./dataset/rlla_4k.
  • Training: Run bash train_grpo.sh or bash train_ppo.sh. BASE_MODEL and EXPERIMENT_NAME must be configured first.
  • Reward Variants: Activated via environment variables (e.g., export WITHLENGTH=1).
  • Links: Paper
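The environment-variable switch for reward variants can be pictured with the sketch below. Only the variable name WITHLENGTH comes from the summary above; the function names, the shape of the length term, and the weight and max_len parameters are hypothetical and are not taken from the ToolRL code.

```python
import os


def length_variant_enabled(env=None):
    """Return True when the WITHLENGTH flag is set to "1"."""
    env = os.environ if env is None else env
    return env.get("WITHLENGTH", "0") == "1"


def total_reward(base_reward, response_len, env=None, max_len=512, weight=0.1):
    """Combine a base reward with an optional length component.

    Hypothetical illustration: the length term and its scaling are
    assumptions, not the project's actual reward definition.
    """
    reward = base_reward
    if length_variant_enabled(env):
        # Bonus grows with response length and is capped at max_len tokens.
        reward += weight * min(response_len, max_len) / max_len
    return reward


# Passing a dict instead of os.environ keeps the example side-effect free.
plain = total_reward(1.0, 256, env={})
with_length = total_reward(1.0, 256, env={"WITHLENGTH": "1"})
```

Reading the flag once per call keeps the variant selectable from the shell that launches training, without editing the training script itself.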

Highlighted Details

  • Implements multiple reward variants for fine-tuning tool-use behavior.
  • Built on established RL frameworks (veRL, TinyZero).
  • Supports GRPO and PPO training algorithms.
  • Includes dataset processing scripts.

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The provided citation is for a 2025 arXiv preprint.

Limitations & Caveats

The project is presented as code accompanying a research paper, so it may be experimental. The pinned environment (CUDA 12.1, Flash Attention 2) and the need for dataset preprocessing could be adoption hurdles. The absence of an explicit license raises concerns for commercial use.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 12

Star History

  • 40 stars in the last 30 days
