Evaluator LM for fine-grained assessment using customized rubrics
Prometheus provides an open-source, reproducible, and cost-effective solution for fine-grained evaluation of language models using a customized score rubric. It serves as an alternative to human or GPT-4 evaluation, targeting researchers and developers needing detailed LLM performance assessments.
How It Works
Prometheus is an evaluator LM fine-tuned to provide detailed feedback and assign scores based on a provided rubric. It uses a specific prompt format that includes the instruction, response to evaluate, a reference answer for a perfect score, and the detailed scoring criteria. The model then generates feedback and a score between 1 and 5, formatted as "Feedback: (feedback) [RESULT] (score)". This approach allows for precise, rubric-driven evaluations, moving beyond general quality assessments.
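The fixed output format makes the score easy to extract programmatically. As an illustrative sketch (the `parse_evaluation` helper and sample completion below are not part of the project), a regular expression can split the feedback from the bracketed score:

```python
import re

def parse_evaluation(output: str):
    """Split a Prometheus-style completion into (feedback, score).

    Expects the documented format "Feedback: (feedback) [RESULT] (score)",
    where the score is an integer from 1 to 5.
    """
    match = re.search(r"Feedback:\s*(.*?)\s*\[RESULT\]\s*([1-5])", output, re.DOTALL)
    if match is None:
        raise ValueError("Output does not match the expected format")
    return match.group(1), int(match.group(2))

# Hypothetical model completion, for illustration only.
sample = "Feedback: The response covers the rubric's key points but omits edge cases. [RESULT] 4"
feedback, score = parse_evaluation(sample)
print(score)  # 4
```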
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Training and inference are launched with torchrun, and the project is built upon llama-recipes.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify the base model used for Prometheus or provide explicit licensing information, which may impact commercial use or integration into closed-source projects. Inference setup requires a separate TGI server.
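To make the TGI dependency concrete, here is a minimal sketch of how one might construct a request against a running Text Generation Inference server's standard /generate endpoint. The endpoint URL, generation parameters, and prompt placeholder are assumptions, not values taken from the project:

```python
import json
from urllib.request import Request

def build_tgi_request(prompt: str, url: str = "http://localhost:8080/generate") -> Request:
    """Build (but do not send) an HTTP request for TGI's /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 512, "temperature": 1.0},
    }
    body = json.dumps(payload).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"})

# A real evaluation prompt would embed the instruction, response to evaluate,
# reference answer, and score rubric; a one-line placeholder is used here.
req = build_tgi_request("Evaluate the following response against the rubric: ...")
print(req.full_url)  # http://localhost:8080/generate
```

Sending the request (e.g. via `urllib.request.urlopen`) would return a JSON body whose `generated_text` field contains the feedback and score.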