reward-bench by allenai

Reward model evaluation tool

Created 1 year ago
634 stars

Top 52.3% on SourcePulse

Project Summary

RewardBench is an evaluation tool for assessing the capabilities and safety of reward models (RMs) and models trained with Direct Preference Optimization (DPO). It provides a standardized framework for running inference, formatting datasets, and analyzing results, benefiting researchers and developers working on AI alignment and preference learning.

How It Works

RewardBench offers a unified interface for evaluating a variety of RMs, including Starling, PairRM, OpenAssistant, and DPO-trained models. It standardizes dataset formatting and inference procedures to ensure fair comparisons. The tool supports both direct RM evaluation and DPO model evaluation, and it automatically detects instruction datasets, for which it logs model outputs without computing accuracy metrics.
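The pairwise-preference evaluation at the heart of reward-model benchmarking can be sketched generically: score the "chosen" and "rejected" completion for each prompt and count how often the chosen one wins. This is an illustrative sketch, not RewardBench's actual code; the scoring function and data layout are assumptions.

```python
# Generic sketch of pairwise reward-model evaluation (hypothetical, not
# RewardBench's implementation): a reward model scores the "chosen" and
# "rejected" completion for each prompt, and accuracy is the fraction of
# pairs where the chosen completion scores strictly higher.

def pairwise_accuracy(score_fn, pairs):
    """score_fn(prompt, completion) -> float; pairs: list of dicts
    with "prompt", "chosen", and "rejected" keys."""
    correct = 0
    for ex in pairs:
        chosen_score = score_fn(ex["prompt"], ex["chosen"])
        rejected_score = score_fn(ex["prompt"], ex["rejected"])
        correct += chosen_score > rejected_score
    return correct / len(pairs)

# Toy stand-in reward model: prefers longer completions.
toy_rm = lambda prompt, completion: len(completion)

dataset = [
    {"prompt": "Hi", "chosen": "Hello there!", "rejected": "Hi"},
    {"prompt": "2+2?", "chosen": "4", "rejected": "22"},
]
print(pairwise_accuracy(toy_rm, dataset))  # 0.5 with this toy reward
```

Real RMs replace `toy_rm` with a forward pass through a sequence-classification head; the accuracy definition is the same.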

Quick Start & Requirements

  • Install: pip install rewardbench
  • Run: rewardbench --model={yourmodel} --dataset={yourdataset} --batch_size=8
  • Generative RMs: pip install rewardbench[generative] then rewardbench-gen --model={yourmodel}
  • Dependencies: vLLM is required for local generative models; API access (OpenAI, Anthropic, or Together) is required for API-based generative models.
  • Docs: RewardBench Dataset, Existing Test Sets, Results, Paper

Highlighted Details

  • Supports local and API-based generative RMs (LLM-as-a-judge).
  • Includes functionality for "Best of N" rankings and offline RM ensembling.
  • Features advanced logging and results uploading to Hugging Face Hub.
  • Provides scripts for running evaluations and submitting jobs via AI2's Beaker platform.
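The "Best of N" and ensembling features above can be sketched generically: score N candidate completions per prompt with one or more RMs, aggregate the scores, and return the top candidate. This is an illustrative sketch under assumed interfaces, not RewardBench's actual API.

```python
# Hypothetical sketch of Best-of-N selection with an offline RM ensemble:
# each reward model scores every candidate, scores are averaged across the
# ensemble, and the highest-scoring candidate wins. Function names and
# signatures are illustrative, not RewardBench's API.

def best_of_n(prompt, candidates, reward_models):
    def ensemble_score(completion):
        # Mean reward across the ensemble; other aggregations
        # (min, weighted mean) are also common in practice.
        return sum(rm(prompt, completion) for rm in reward_models) / len(reward_models)
    return max(candidates, key=ensemble_score)

# Toy reward models: one prefers short answers, one rewards the digit "4".
rm_short = lambda p, c: -len(c)
rm_exact = lambda p, c: 10.0 if "4" in c else 0.0

print(best_of_n("2+2?", ["four", "4", "twenty-two"], [rm_short, rm_exact]))  # "4"
```

Averaging makes the selection robust to any single RM's failure mode, which is the usual motivation for offline RM ensembling.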

Maintenance & Community

The project is primarily maintained by Allen Institute for AI (AI2). Docker images are available for reproducible research. Contributions are welcomed via pull requests for inference stack enhancements.

Licensing & Compatibility

The repository is licensed under the Apache-2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Loading local models via AutoModelForSequenceClassification.from_pretrained is still marked as a TODO. Some features, such as direct metadata uploads for non-DPO models on preference datasets, are not yet implemented and may require opening an issue to request.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Andre Zayarni (cofounder of Qdrant), and 3 more.

refinery by code-kern-ai

0%
1k
Open-source tool for NLP data scaling, assessment, and maintenance
Created 3 years ago
Updated 9 months ago
Starred by Clement Delangue (cofounder of Hugging Face), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 12 more.

evaluate by huggingface

0.1%
2k
ML model evaluation library for standardized performance reporting
Created 3 years ago
Updated 1 month ago
Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (cofounder of Lightning AI), and 7 more.

lighteval by huggingface

2.6%
2k
LLM evaluation toolkit for multiple backends
Created 1 year ago
Updated 1 day ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Gabriel Almeida (cofounder of Langflow), and 5 more.

lit by PAIR-code

0.1%
4k
Interactive ML model analysis tool for understanding model behavior
Created 5 years ago
Updated 3 weeks ago