verifiers  by willccbb

RL for LLMs in verifiable environments

created 6 months ago
1,652 stars

Top 26.1% on sourcepulse

GitHubView on GitHub
Project Summary

This repository provides tools for reinforcement learning (RL) with large language models (LLMs) in verifiable environments, specifically targeting multi-turn tool use. It is designed for researchers and practitioners working on advanced LLM-based agents that require complex interaction and validation.

How It Works

The core approach leverages Generative Reward Optimization (GRO) for RL training within custom multi-turn environments. It supports multi-agent interactions and features specialized environments like ToolEnv and CodeEnv with XML parsers for dataset formatting and rubrics for evaluating correctness. This design facilitates training LLMs to reliably use tools and engage in complex, verifiable tasks.

Quick Start & Requirements

  • Install via git clone and uv sync followed by uv pip install flash-attn --no-build-isolation. Activate the virtual environment with source .venv/bin/activate.
  • Requires Python, wandb and huggingface-cli logins (or report_to=None).
  • Recommended: 7B+ parameter models and at least 8 GPUs for optimal results.
  • See examples for multi-GPU setup with vLLM inference server and accelerate launch for training.
  • Official quick-start and examples are available within the repository.

Highlighted Details

  • Supports multi-turn tool use in CodeEnv and ToolEnv.
  • Includes environments like DoubleCheckEnv, CodeEnv, and ToolEnv.
  • Provides dataset formatting tools and basic rubrics for math/code correctness.
  • Defaults are provided for GRPO, models, and tokenizers.

Maintenance & Community

The project is presented as in-progress research code. No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The citation suggests it is intended for research purposes. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is in-progress research code and is not guaranteed to yield stable or optimal training results. It is primarily for multi-turn LLM RL and may not be suitable if multi-turn tool calling or multi-agent interactions are not required.

Health Check
Last commit

23 hours ago

Responsiveness

1 day

Pull Requests (30d)
41
Issues (30d)
23
Star History
821 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor Ross Taylor(Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han Daniel Han(Cofounder of Unsloth), and
4 more.

open-instruct by allenai

0.2%
3k
Training codebase for instruction-following language models
created 2 years ago
updated 15 hours ago
Feedback? Help us improve.