Prompt-to-Leaderboard for LLM evaluation
Prompt-to-Leaderboard (P2L) addresses the limitations of aggregated LLM evaluation metrics by enabling prompt-specific leaderboards. This allows for nuanced, unsupervised, and personalized LLM evaluations, as well as optimized query routing and automated assessment of model strengths and weaknesses. The target audience includes researchers and developers working with LLMs who need more granular performance insights.
How It Works
P2L trains a model to take natural language prompts as input and output vectors of Bradley-Terry coefficients. These coefficients are then used to predict human preference votes, generating prompt-dependent leaderboards. This approach captures performance variations across different prompts and users, offering a more detailed view than averaged metrics. The method's ability to produce prompt-specific evaluations scales similarly to LLMs themselves.
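As a minimal sketch of how a vector of Bradley-Terry coefficients turns into a leaderboard and a predicted preference, the snippet below ranks models by coefficient and applies the standard Bradley-Terry formula, in which the probability that model A is preferred over model B is the logistic function of the coefficient difference. The model names, the coefficient values, and the helper functions `bt_win_probability` and `leaderboard` are hypothetical and for illustration only; they are not taken from the P2L codebase, and the trained P2L model that would produce the coefficients for a given prompt is not shown.

```python
import math


def bt_win_probability(coef_a: float, coef_b: float) -> float:
    """Bradley-Terry probability that model A is preferred over model B."""
    return 1.0 / (1.0 + math.exp(-(coef_a - coef_b)))


def leaderboard(coefs: dict[str, float]) -> list[tuple[str, float]]:
    """Rank models by Bradley-Terry coefficient, highest first."""
    return sorted(coefs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical coefficients standing in for the model's output on one prompt.
coefs_for_prompt = {"model-a": 1.2, "model-b": 0.4, "model-c": -0.3}

ranking = leaderboard(coefs_for_prompt)
p_a_over_b = bt_win_probability(
    coefs_for_prompt["model-a"], coefs_for_prompt["model-b"]
)

for name, coef in ranking:
    print(f"{name}: {coef:+.2f}")
print(f"P(model-a preferred over model-b) = {p_a_over_b:.3f}")
```

Because the coefficients depend on the prompt, a different prompt would yield a different vector and therefore a different ranking, which is what makes the leaderboard prompt-specific.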
Quick Start & Requirements
The project uses uv for environment management.
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
Create and activate a Python 3.10 environment:
uv venv .env --python 3.10
source .env/bin/activate
Install dependencies from the relevant requirements file:
uv pip install -r serve_requirements.txt
uv pip install -r route/requirements.txt
uv pip install -r train_requirements.txt
A --no-cuda option is also listed.
Highlighted Details
Maintenance & Community
The project is associated with LMArena and the paper "Prompt-to-Leaderboard." Further details on community or specific maintainers are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state the license type. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The README notes that Python versions other than 3.10 are untested. The optimal-lp cost optimizer is compatible only with BT models, and simple-lp is compatible only with grounded RK models.