verifiers by PrimeIntellect-ai

RL for LLMs in verifiable environments

Created 11 months ago
3,725 stars

Top 12.9% on SourcePulse

Project Summary

This repository provides tools for reinforcement learning (RL) with large language models (LLMs) in verifiable environments, specifically targeting multi-turn tool use. It is designed for researchers and practitioners working on advanced LLM-based agents that require complex interaction and validation.

How It Works

The core approach applies Group Relative Policy Optimization (GRPO) to RL training within custom multi-turn environments. It supports multi-agent interactions and provides specialized environments such as ToolEnv and CodeEnv, together with XML parsers for dataset formatting and rubrics for evaluating correctness. This design targets training LLMs to use tools reliably and to complete complex, verifiable tasks.
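
To make the GRPO step concrete, the following is a minimal sketch of the group-relative advantage that gives the method its name: each completion sampled for a prompt is scored, and its reward is normalized against the other completions in the same group. The function name and the eps parameter are illustrative assumptions, not taken from the repository, and real trainers add further terms (clipping, KL penalties) that are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions for the same prompt.

    Hypothetical helper for illustration only; not the repository's API.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    # eps keeps the division finite when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored 1.0 (correct) or 0.0 (wrong).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, while a group in which every completion scores the same contributes no learning signal.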

Quick Start & Requirements

  • Install: clone the repository with git clone, run uv sync, then run uv pip install flash-attn --no-build-isolation. Activate the virtual environment with source .venv/bin/activate.
  • Requires Python, plus wandb and huggingface-cli logins (or set report_to=None).
  • Recommended: models with 7B+ parameters and at least 8 GPUs for optimal results.
  • See the examples for a multi-GPU setup that uses a vLLM inference server for generation and accelerate launch for training.
  • Official quick-start and examples are available within the repository.

Highlighted Details

  • Supports multi-turn tool use in CodeEnv and ToolEnv.
  • Includes environments like DoubleCheckEnv, CodeEnv, and ToolEnv.
  • Provides dataset formatting tools and basic rubrics for math/code correctness.
  • Defaults are provided for GRPO, models, and tokenizers.
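
As a rough illustration of how an XML parser and a basic correctness rubric can fit together, the sketch below extracts tagged fields from a completion and combines a correctness check with a format check. Every name here (parse_xml_fields, correctness_reward, format_reward, score), the reasoning and answer tag names, and the weights are hypothetical and chosen for illustration; none of them is the repository's actual API.

```python
import re


def parse_xml_fields(text, fields):
    """Extract the last <field>...</field> span for each requested field."""
    parsed = {}
    for field in fields:
        tag = re.escape(field)
        matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        parsed[field] = matches[-1].strip() if matches else None
    return parsed


def correctness_reward(completion, expected):
    """Return 1.0 when the parsed <answer> equals the expected string."""
    answer = parse_xml_fields(completion, ["answer"])["answer"]
    return 1.0 if answer is not None and answer == expected.strip() else 0.0


def format_reward(completion):
    """Return 1.0 when both <reasoning> and <answer> blocks are present."""
    parsed = parse_xml_fields(completion, ["reasoning", "answer"])
    return 1.0 if all(value is not None for value in parsed.values()) else 0.0


def score(completion, expected, weights=(1.0, 0.2)):
    """Weighted sum of the correctness and format checks (weights are assumed)."""
    return (weights[0] * correctness_reward(completion, expected)
            + weights[1] * format_reward(completion))


sample = "<reasoning>6 * 7 = 42</reasoning>\n<answer>42</answer>"
total = score(sample, "42")
```

Keeping the format check separate from the correctness check lets a well-formed but wrong completion still earn a small reward, which is one common way to encourage parseable output early in training.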

Maintenance & Community

The project is presented as in-progress research code. No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The citation suggests it is intended for research purposes. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is in-progress research code and is not guaranteed to yield stable or optimal training results. It is primarily for multi-turn LLM RL and may not be suitable if multi-turn tool calling or multi-agent interactions are not required.

Health Check

  • Last commit: 13 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 85
  • Issues (30d): 13

Star History

117 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Wing Lian (founder of Axolotl AI), and 3 more.

ROLL by alibaba

2.3%
3k
RL library for large language models
Created 7 months ago
Updated 16 hours ago