prometheus by prometheus-eval

Evaluator LM for fine-grained assessment using customized rubrics

created 1 year ago
302 stars

Top 89.3% on sourcepulse

Project Summary

Prometheus provides an open-source, reproducible, and cost-effective solution for fine-grained evaluation of language models using a customized score rubric. It serves as an alternative to human or GPT-4 evaluation, targeting researchers and developers needing detailed LLM performance assessments.

How It Works

Prometheus is an evaluator LM fine-tuned to provide detailed feedback and assign scores based on a provided rubric. It uses a specific prompt format that includes the instruction, response to evaluate, a reference answer for a perfect score, and the detailed scoring criteria. The model then generates feedback and a score between 1 and 5, formatted as "Feedback: (feedback) [RESULT] (score)". This approach allows for precise, rubric-driven evaluations, moving beyond general quality assessments.
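
The prompt assembly and output parsing described above can be sketched as follows. This is a minimal illustration, not the project's code: the template wording, section labels, and the helper names `build_prompt` and `parse_output` are assumptions. Only the four inputs, the 1 to 5 score range, and the "Feedback: (feedback) [RESULT] (score)" output shape come from the description.

```python
import re
from typing import Optional, Tuple

# Hypothetical template: the summary lists the four inputs but not the exact
# wording, so this layout is an assumption made for illustration only.
PROMPT_TEMPLATE = """###Instruction to evaluate:
{instruction}

###Response to evaluate:
{response}

###Reference answer (score 5):
{reference}

###Score rubric:
{rubric}

###Feedback: """


def build_prompt(instruction: str, response: str, reference: str, rubric: str) -> str:
    """Fill the (assumed) template with the four inputs the evaluator expects."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction,
        response=response,
        reference=reference,
        rubric=rubric,
    )


# Matches "Feedback: <text> [RESULT] <score>" where the score is a single digit 1-5.
# The leading "Feedback:" is optional because the prompt may already end with it.
RESULT_PATTERN = re.compile(
    r"^(?:Feedback:)?\s*(.*?)\s*\[RESULT\]\s*([1-5])\b", re.DOTALL
)


def parse_output(text: str) -> Optional[Tuple[str, int]]:
    """Return (feedback, score), or None if the output is malformed."""
    match = RESULT_PATTERN.search(text.strip())
    if match is None:
        return None
    return match.group(1).strip(), int(match.group(2))


if __name__ == "__main__":
    print(parse_output("Feedback: Accurate but terse. [RESULT] 4"))
```

Returning `None` for malformed output, rather than raising, lets a batch evaluation loop count and retry unparseable generations instead of aborting.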

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Inference requires the URL of a running Hugging Face Text Generation Inference (TGI) server.
  • Training requires torchrun and is built upon llama-recipes.
  • See inference directory for example inference scripts.
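
For orientation, a request to a TGI server might look like the sketch below. It assumes the server exposes TGI's standard `/generate` endpoint, which accepts a JSON body with `inputs` and `parameters` and returns a `generated_text` field. The helper names, the example URL, and the default token limit are hypothetical, and the repository's own inference scripts may work differently.

```python
import json
import urllib.request


def build_tgi_request(
    server_url: str, prompt: str, max_new_tokens: int = 512
) -> urllib.request.Request:
    """Build a POST request for a TGI server's /generate endpoint."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(server_url: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the server and return the generated text."""
    request = build_tgi_request(server_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["generated_text"]


if __name__ == "__main__":
    # Assumed local address; replace with the URL of your own TGI server.
    print(generate("http://localhost:8080", "Hello"))
```

Keeping request construction separate from the network call makes the payload easy to inspect or log without a running server.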

Highlighted Details

  • Fine-grained evaluation on custom score rubrics.
  • Reproducible evaluation framework.
  • Alternative to human and GPT-4 evaluation.
  • Trained on the Feedback Collection dataset.

Maintenance & Community

  • Project associated with ICLR 2024 and NeurIPS 2023 workshops.
  • Citation available for academic use.

Licensing & Compatibility

  • License details are not explicitly stated in the README.

Limitations & Caveats

The README does not specify the base model used for Prometheus or provide explicit licensing information, which may impact commercial use or integration into closed-source projects. Inference setup requires a separate TGI server.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
