RL for LLMs in verifiable environments
This repository provides tools for reinforcement learning (RL) with large language models (LLMs) in verifiable environments, specifically targeting multi-turn tool use. It is designed for researchers and practitioners working on advanced LLM-based agents that require complex interaction and validation.
How It Works
The core approach uses Group Relative Policy Optimization (GRPO) for RL training within custom multi-turn environments. It supports multi-agent interactions and features specialized environments such as `ToolEnv` and `CodeEnv`, with XML parsers for dataset formatting and rubrics for evaluating correctness. This design facilitates training LLMs to use tools reliably and complete complex, verifiable tasks.
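For reference, GRPO replaces a learned value baseline with group-relative reward normalization: several rollouts are sampled per prompt, and each rollout's advantage is its reward standardized against its own group. A minimal standalone sketch of that step (illustrative only, with made-up reward values; not this repository's code):

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize each rollout's reward against its own group.

    rewards: (num_prompts, group_size) scalar rewards, e.g. rubric scores
    from a verifiable environment. Returns per-rollout advantages.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Four rollouts per prompt; a verified-correct rollout scores 1.0, a failure 0.0.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```

Because advantages are relative within each group, a prompt where every rollout fails (or every rollout succeeds) contributes no gradient signal, which is why verifiable, discriminating rewards matter here.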
Quick Start & Requirements
- Setup: `git clone` the repository, run `uv sync`, then `uv pip install flash-attn --no-build-isolation`; activate the virtual environment with `source .venv/bin/activate`.
- Requires `wandb` and `huggingface-cli` logins (or set `report_to=None`).
- Use `accelerate launch` for training.
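`accelerate launch` runs an ordinary Python training script. The sketch below shows how the pieces described above might fit together; the import path, constructor arguments, and the `GRPOTrainer` name are assumptions for illustration, not the repository's documented API:

```python
# train.py -- launched with: accelerate launch train.py
# Hypothetical wiring: the import path, `ToolEnv` arguments, and `GRPOTrainer`
# are assumed names for illustration, not this repository's confirmed API.
from verifiers_lib import GRPOTrainer, ToolEnv  # placeholder import path


def calculator(expression: str) -> str:
    """Toy tool exposed to the model during rollouts (illustration only)."""
    return str(eval(expression))  # never eval untrusted input in real code


env = ToolEnv(tools=[calculator])        # assumed constructor signature
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model id
    env=env,
    report_to=None,                      # skip wandb logging, as noted above
)
trainer.train()
```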
Highlighted Details
- Specialized environments for code execution and tool calling: `CodeEnv` and `ToolEnv`.
- Ships with `DoubleCheckEnv`, `CodeEnv`, and `ToolEnv` example environments.
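The rubric idea pairs format checking with correctness checking: a completion is parsed into structured XML fields, and each check contributes to a verifiable scalar reward. A standalone sketch under assumed tags and weights (the parser and scoring below are illustrative, not the repository's actual parser or rubric code):

```python
import re

# Hypothetical completion format: reasoning plus a final answer in XML tags.
completion = "<reasoning>7 * 6 = 42</reasoning><answer>42</answer>"

def parse_answer(text: str) -> str | None:
    """Extract the <answer> field; return None if the format is violated."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return match.group(1).strip() if match else None

def rubric_reward(completion: str, target: str) -> float:
    """Two-part rubric (assumed weights): format adherence plus exact match."""
    answer = parse_answer(completion)
    format_score = 0.2 if answer is not None else 0.0
    correct_score = 0.8 if answer == target else 0.0
    return format_score + correct_score

print(rubric_reward(completion, target="42"))  # 1.0
```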
Maintenance & Community
The project is presented as in-progress research code. No specific community channels or maintenance details are provided in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The citation suggests it is intended for research purposes. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
This repository is in-progress research code and is not guaranteed to yield stable or optimal training results. It targets multi-turn LLM RL specifically, so it may be a poor fit for projects that do not need multi-turn tool calling or multi-agent interaction.