ToolRL by qiancheng0

Tool learning via reward optimization

Created 6 months ago
368 stars

Top 76.6% on SourcePulse

Project Summary

ToolRL provides a framework for training large language models to effectively utilize tools, addressing the challenge of aligning model behavior with desired outcomes through reward engineering. It is targeted at researchers and developers working on agent-based LLM systems and aims to simplify the process of tool-augmented LLM training.

How It Works

The project leverages reinforcement learning (RL) techniques, specifically GRPO and PPO, built upon the veRL and TinyZero frameworks. It processes raw datasets into formats suitable for RL training, enabling fine-grained control over reward functions. This approach allows for experimentation with various reward shaping strategies to improve tool-use performance.
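To make the idea of reward shaping for tool use concrete, here is a minimal illustrative sketch of a shaped reward that combines a format component (is the tool call parseable?) with correctness components (right tool, right arguments). This is not ToolRL's actual reward function; the component weights and structure are assumptions for illustration only.

```python
import json

def tool_use_reward(response: str, expected_tool: str, expected_args: dict) -> float:
    """Illustrative shaped reward: format + tool selection + argument match.

    NOT ToolRL's actual reward; weights and components are hypothetical.
    """
    try:
        call = json.loads(response)
    except json.JSONDecodeError:
        return -1.0                      # format penalty: malformed tool call
    reward = 0.5                         # format reward: parseable JSON call
    if call.get("name") == expected_tool:
        reward += 0.25                   # correct tool selected
    if call.get("arguments") == expected_args:
        reward += 0.25                   # exact argument match
    return reward

# A well-formed, fully correct call earns the maximum shaped reward:
r = tool_use_reward('{"name": "search", "arguments": {"q": "weather"}}',
                    "search", {"q": "weather"})
print(r)  # 1.0
```

Decomposing the reward this way is what enables the fine-grained experimentation described above: individual components can be toggled or reweighted without retraining infrastructure changes.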

Quick Start & Requirements

  • Installation: Requires PyTorch (2.4.0 with CUDA 12.1), vLLM (0.6.3), Ray, veRL (from the repo), and Flash Attention 2.
  • Dataset: Raw data is provided but must be processed before training; pre-processed RL training data is available at ./dataset/rlla_4k.
  • Training: Run bash train_grpo.sh or bash train_ppo.sh, setting BASE_MODEL and EXPERIMENT_NAME first.
  • Reward Variants: Activated via environment variables (e.g., export WITHLENGTH=1).
  • Links: Paper
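Putting the steps above together, a GRPO training launch might look like the following sketch. The script name and environment variables come from the summary above; the model path and experiment name are placeholders.

```shell
# Sketch of a GRPO launch — model path and experiment name are assumptions.
export BASE_MODEL="Qwen/Qwen2.5-7B"       # placeholder: any HF model path
export EXPERIMENT_NAME="toolrl_grpo_demo"
export WITHLENGTH=1                        # opt into the length reward variant

echo "launching $EXPERIMENT_NAME with $BASE_MODEL"
# bash train_grpo.sh                       # actual launch needs GPUs + the deps above
```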

Highlighted Details

  • Implements multiple reward variants for fine-tuning tool-use behavior.
  • Built on established RL frameworks (veRL, TinyZero).
  • Supports GRPO and PPO training algorithms.
  • Includes dataset processing scripts.

Maintenance & Community

No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The provided citation is for a 2025 arXiv preprint.

Limitations & Caveats

The project is research code accompanying a paper and should be treated as experimental. Pinned dependencies (PyTorch 2.4.0 with CUDA 12.1, Flash Attention 2) and the need for dataset preprocessing could present adoption hurdles. The lack of explicit licensing information raises concerns for commercial use.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
5
Star History
15 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

0.2%
4k
RL recipe for reasoning ability in models
Created 9 months ago
Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 19 more.

trlx by CarperAI

0%
5k
Distributed RLHF for LLMs
Created 3 years ago
Updated 1 year ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 11 more.

TinyZero by Jiayi-Pan

0.2%
12k
Minimal reproduction of DeepSeek R1 Zero for countdown/multiplication tasks
Created 9 months ago
Updated 6 months ago