verifiers by PrimeIntellect-ai

RL for LLMs in verifiable environments

Created 1 year ago
3,858 stars

Top 12.4% on SourcePulse

Project Summary

This repository provides tools for reinforcement learning (RL) with large language models (LLMs) in verifiable environments, specifically targeting multi-turn tool use. It is designed for researchers and practitioners working on advanced LLM-based agents that require complex interaction and validation.

How It Works

The core approach uses Group Relative Policy Optimization (GRPO) for RL training within custom multi-turn environments. It supports multi-agent interactions and provides specialized environments such as ToolEnv and CodeEnv, along with XML parsers for dataset formatting and rubrics for evaluating correctness. This design supports training LLMs to use tools reliably and to complete complex tasks whose outcomes can be verified.
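To make the training signal concrete, here is a minimal sketch of the group-relative advantage computation that gives GRPO its name: several completions are sampled for the same prompt, and each reward is normalized against the group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for illustration, not code taken from the repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: completions that score above the group mean get a
    positive advantage, those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored 1.0 (correct) or 0.0 (incorrect)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

When every completion in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one reason rubrics that separate good from bad completions matter.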

Quick Start & Requirements

  • Install by cloning the repository with git clone, running uv sync, and then running uv pip install flash-attn --no-build-isolation. Activate the virtual environment with source .venv/bin/activate.
  • Requires Python and logins for wandb and huggingface-cli (or set report_to=None).
  • Recommended: 7B+ parameter models and at least 8 GPUs for optimal results.
  • See examples for multi-GPU setup with vLLM inference server and accelerate launch for training.
  • Official quick-start and examples are available within the repository.

Highlighted Details

  • Supports multi-turn tool use in CodeEnv and ToolEnv.
  • Includes environments like DoubleCheckEnv, CodeEnv, and ToolEnv.
  • Provides dataset formatting tools and basic rubrics for math/code correctness.
  • Defaults are provided for GRPO, models, and tokenizers.
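The XML formatting and rubric ideas listed above can be sketched in a few lines. This is an illustrative stand-in, not the repository's API: the names `parse_xml_fields`, `correctness_reward`, `format_reward`, and `score`, the `reasoning`/`answer` tag names, and the weights are all hypothetical.

```python
import re


def parse_xml_fields(text, fields):
    """Extract the first <field>...</field> span for each requested field."""
    parsed = {}
    for field in fields:
        tag = re.escape(field)
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        parsed[field] = match.group(1).strip() if match else None
    return parsed


def correctness_reward(completion, expected):
    """1.0 if the parsed answer matches the expected string exactly, else 0.0."""
    answer = parse_xml_fields(completion, ["answer"])["answer"]
    return 1.0 if answer is not None and answer == expected else 0.0


def format_reward(completion, fields=("reasoning", "answer")):
    """Fraction of the expected XML fields present in the completion."""
    parsed = parse_xml_fields(completion, fields)
    return sum(value is not None for value in parsed.values()) / len(fields)


def score(completion, expected, weights=(1.0, 0.2)):
    """Weighted sum of the correctness and format rewards (weights assumed)."""
    return (weights[0] * correctness_reward(completion, expected)
            + weights[1] * format_reward(completion))


completion = "<reasoning>2 + 2 is 4</reasoning>\n<answer>4</answer>"
total = score(completion, "4")
print(total)
```

Keeping the parser separate from the reward functions means the same parsing step can serve both dataset formatting and evaluation, which is roughly the division of labor the summary describes.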

Maintenance & Community

The project is presented as in-progress research code. No specific community channels or maintenance details are provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The citation suggests it is intended for research purposes. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

This repository is in-progress research code and is not guaranteed to yield stable or optimal training results. It is primarily for multi-turn LLM RL and may not be suitable if multi-turn tool calling or multi-agent interactions are not required.

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 170
  • Issues (30d): 8

Star History

93 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 3 more.

ROLL by alibaba

0.9%
3k
RL library for large language models
Created 9 months ago
Updated 1 day ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 16 more.

SkyRL by NovaSky-AI

1.4%
2k
RL training pipeline for multi-turn tool use LLMs, optimized for real-world tasks
Created 10 months ago
Updated 19 hours ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA) and Alex Chen (Cofounder of Nexa AI).

EasyR1 by hiyouga

0.5%
5k
RL training framework for multi-modality models
Created 1 year ago
Updated 1 day ago