prometheus-eval by prometheus-eval

LLM evaluation framework using open LLMs

Created 1 year ago

1,046 stars

Top 35.8% on SourcePulse

2 Experts Love This Project

Wing Lian

Founder of Axolotl AI

Maxime Labonne

Head of Post-Training at Liquid AI

Project Summary

Prometheus-Eval provides a suite of tools for evaluating Large Language Models (LLMs) in generation tasks, offering an open-source alternative to proprietary evaluation methods. It enables users to assess LLM responses using its specialized Prometheus models, which are trained to act as impartial judges, supporting both absolute grading (1-5 scores) and relative grading (A/B comparisons).

How It Works

Prometheus-Eval leverages its family of Prometheus LLMs, specifically designed for evaluation. These models are fine-tuned to understand and apply complex scoring rubrics to LLM-generated text. The system supports flexible inference through local execution via vLLM or integration with LLM APIs via LiteLLM, allowing users to utilize powerful models like GPT-4 or their own hosted Prometheus instances.
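The rubric-driven absolute grading described above can be sketched without any model in the loop. This is a minimal illustration, not the library's actual prompt or API: the template text, the rubric dictionary, the function names, and the `[RESULT]` verdict marker are all assumptions chosen for the example.

```python
import re
from typing import Optional

# Hypothetical rubric for illustration; a real rubric would be task-specific.
RUBRIC = {
    "criteria": "Is the response accurate and helpful?",
    1: "Mostly inaccurate or unhelpful.",
    3: "Partially accurate with notable gaps.",
    5: "Fully accurate and helpful.",
}

# Assumed template; the prompt shipped with the library may differ.
TEMPLATE = (
    "###Instruction:\n{instruction}\n\n"
    "###Response to evaluate:\n{response}\n\n"
    "###Score rubric:\n{rubric}\n\n"
    "Write feedback, then finish with '[RESULT] <score from 1 to 5>'."
)


def build_absolute_prompt(instruction: str, response: str, rubric: dict) -> str:
    """Render a 1-5 grading prompt from an instruction, a response, and a rubric."""
    lines = [rubric["criteria"]]
    lines += [f"Score {k}: {v}" for k, v in rubric.items() if isinstance(k, int)]
    return TEMPLATE.format(
        instruction=instruction, response=response, rubric="\n".join(lines)
    )


def parse_score(judge_output: str) -> Optional[int]:
    """Extract the 1-5 score after the assumed '[RESULT]' marker, or None."""
    match = re.search(r"\[RESULT\]\s*([1-5])\b", judge_output)
    return int(match.group(1)) if match else None
```

The rendered prompt would be sent to whichever judge backend is configured (a local model or an API), and `parse_score` applied to the text that comes back. Returning `None` for malformed output lets the caller decide whether to retry or discard the sample.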

Quick Start & Requirements

Installation: pip install prometheus-eval
Local Inference: Requires pip install vllm.
API Inference: Requires LiteLLM setup for the chosen provider (e.g., OpenAI, Hugging Face TGI).
Documentation: https://github.com/prometheus-eval/prometheus-eval
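Because local and API inference depend on different optional packages, it can help to check which ones are importable before choosing a backend. The sketch below uses only the standard library. The module names `prometheus_eval`, `vllm`, and `litellm` are assumed to match the import names of the installed packages, and the function name is hypothetical.

```python
from importlib.util import find_spec


def available_backends() -> dict[str, bool]:
    """Report which (assumed) module names can be imported in this environment."""
    modules = ("prometheus_eval", "vllm", "litellm")
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, present in available_backends().items():
        print(f"{name}: {'installed' if present else 'missing'}")
```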

Highlighted Details

Prometheus 2 (8x7B) achieves a Pearson correlation of 0.6-0.7 with GPT-4-1106 on Likert scale benchmarks and 72-85% agreement with human judgments on pairwise ranking benchmarks.
Supports both absolute grading (1-5 scores) and pairwise ranking (A/B comparison) through different prompt formats.
Offers batch processing for significantly faster evaluation of multiple response pairs.
Includes specialized datasets like BiGGen-Bench for comprehensive LLM evaluation.
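The two headline metrics above, Pearson correlation for absolute scores and percentage agreement for pairwise rankings, can be computed as in the sketch below. The function names are hypothetical and the inputs in the usage note are toy values, not results from the project.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def pairwise_agreement(judge: Sequence[str], human: Sequence[str]) -> float:
    """Fraction of comparisons where the judge picks the same side as the human."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(j == h for j, h in zip(judge, human)) / len(judge)
```

For example, `pearson(judge_scores, reference_scores)` would be applied to 1-5 scores for the same responses, and `pairwise_agreement(["A", "B", "A"], ["A", "B", "B"])` to lists of A/B preferences over the same response pairs.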

Maintenance & Community

Recent releases include the M-Prometheus models and BiGGen-Bench, although the Health Check below shows the last commit was 10 months ago. Key contributors and their affiliations are listed in the paper citations. The README does not explicitly mention community interaction channels.

Licensing & Compatibility

The project is released under the Apache 2.0 license, which permits commercial use and integration with closed-source applications.

Limitations & Caveats

The library is in beta, and users are encouraged to report any issues they encounter. Prometheus 2 (7B) is suitable for consumer GPUs with 16GB of VRAM, while the larger 8x7B model has higher hardware requirements.
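A back-of-the-envelope estimate shows why the two model sizes differ so much in hardware needs. The sketch assumes 16-bit weights (2 bytes per parameter), a nominal 7 billion parameters for the 7B model, and roughly 46.7 billion total parameters for a Mixtral-style 8x7B model. These figures are assumptions for illustration, and the estimate covers weights only, not the KV cache or activations.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumed parameter counts (not taken from the project documentation).
ASSUMED_PARAMS = {"7B": 7e9, "8x7B": 46.7e9}

if __name__ == "__main__":
    for name, params in ASSUMED_PARAMS.items():
        print(f"{name}: about {weight_memory_gb(params):.1f} GB of weights at 16-bit")
```

Under these assumptions the 7B weights come to about 14 GB, which is consistent with the 16GB figure above, while the 8x7B weights alone exceed the memory of a single consumer GPU unless quantized or sharded.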

Health Check

Last Commit

10 months ago

Responsiveness

Inactive

Star History

15 stars in the last 30 days