LLM evaluation framework using open LLMs
Prometheus-Eval provides a suite of tools for evaluating Large Language Models (LLMs) in generation tasks, offering an open-source alternative to proprietary evaluation methods. It enables users to assess LLM responses using its specialized Prometheus models, which are trained to act as impartial judges, supporting both absolute grading (1-5 scores) and relative grading (A/B comparisons).
How It Works
Prometheus-Eval is built around its family of Prometheus LLMs, which are purpose-built for evaluation: they are fine-tuned to understand and apply detailed scoring rubrics to LLM-generated text. The system supports flexible inference, either local execution via vLLM or integration with LLM APIs via LiteLLM, so users can judge with powerful models like GPT-4 or with their own hosted Prometheus instances, as sketched below.
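The snippet below is a minimal sketch of how the two backends plug into the same judge, assuming the PrometheusEval class and the VLLM/LiteLLM wrappers shown in the project's examples; exact class and parameter names should be verified against the current README.

from prometheus_eval import PrometheusEval
from prometheus_eval.vllm import VLLM        # local inference backend
from prometheus_eval.litellm import LiteLLM  # API-based backend

# Run a Prometheus judge locally (requires a GPU and the vllm package) ...
local_model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")

# ... or route evaluation requests to an API-hosted model such as GPT-4.
api_model = LiteLLM("gpt-4")

# Either backend can be passed to the same judge interface.
judge = PrometheusEval(model=local_model)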
Quick Start & Requirements
pip install prometheus-eval
pip install vllm
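After installation, an absolute-grading (1-5) call might look like the following. This is a minimal sketch assuming the ABSOLUTE_PROMPT and SCORE_RUBRIC_TEMPLATE helpers and the single_absolute_grade method from the project's examples; check the current README for exact names and arguments.

from prometheus_eval import PrometheusEval
from prometheus_eval.vllm import VLLM
from prometheus_eval.prompts import ABSOLUTE_PROMPT, SCORE_RUBRIC_TEMPLATE

model = VLLM(model="prometheus-eval/prometheus-7b-v2.0")
judge = PrometheusEval(model=model, absolute_grade_template=ABSOLUTE_PROMPT)

# Define the rubric the judge should apply on a 1-5 scale.
rubric = SCORE_RUBRIC_TEMPLATE.format(
    criteria="Is the response factually accurate and complete?",
    score1_description="Mostly incorrect or missing.",
    score2_description="Contains major factual errors.",
    score3_description="Partially correct with notable gaps.",
    score4_description="Accurate with minor omissions.",
    score5_description="Fully accurate and complete.",
)

feedback, score = judge.single_absolute_grade(
    instruction="Explain why the sky is blue.",
    response="The sky is blue because of Rayleigh scattering ...",
    reference_answer="Shorter wavelengths scatter more strongly ...",
    rubric=rubric,
)
print(score)     # integer score from 1 to 5
print(feedback)  # natural-language rationale from the judge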
Highlighted Details
Maintenance & Community
The project is actively developed, with recent releases of M-Prometheus models and the BiGGen-Bench. Key contributors and affiliations are listed in the paper citations. Community interaction channels are not explicitly mentioned in the README.
Licensing & Compatibility
The project is released under the Apache 2.0 license, which permits commercial use and integration with closed-source applications.
Limitations & Caveats
The library is in a beta stage, and users are encouraged to report any issues they encounter. Prometheus 2 (7B) runs on consumer GPUs with 16GB of VRAM, while the larger 8x7B model requires substantially more hardware.