LLM benchmarking via Bayesian optimization
BoCoEL offers a novel approach to efficiently and accurately evaluate Large Language Models (LLMs) by intelligently selecting a small, representative subset of a large corpus. This is particularly beneficial for researchers and developers dealing with the high computational cost and time associated with traditional LLM benchmarking on extensive datasets.
How It Works
BoCoEL leverages Bayesian optimization, specifically using Gaussian processes, to identify an optimal subset of queries from a corpus. The process involves encoding corpus entries into embeddings, which are significantly faster to compute than LLM inferences. Bayesian optimization then guides the selection of queries to evaluate, aiming to maximize coverage and accuracy within a defined budget. This method prioritizes efficient exploration of the embedding space, allowing for highly accurate evaluations with a minimal number of LLM interactions.
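The selection loop described above can be sketched in a few lines. This is a minimal illustration, not BoCoEL's actual API: the embeddings are random stand-ins for a real encoder, `expensive_llm_score` is a hypothetical placeholder for an LLM evaluation, and the acquisition rule is a simple upper confidence bound over a Gaussian process surrogate.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Stand-in for precomputed corpus embeddings (a real setup would use an
# embedder over the corpus entries; encoding is cheap relative to inference).
embeddings = rng.normal(size=(200, 8))

def expensive_llm_score(idx: int) -> float:
    # Hypothetical placeholder for evaluating the LLM on corpus entry `idx`;
    # here it is just a smooth function of the embedding for illustration.
    return float(np.sin(embeddings[idx]).sum())

budget = 10  # total number of LLM evaluations allowed
evaluated = [int(rng.integers(len(embeddings)))]
scores = [expensive_llm_score(evaluated[0])]

for _ in range(budget - 1):
    # Fit a Gaussian process surrogate on the entries evaluated so far.
    gp = GaussianProcessRegressor().fit(embeddings[evaluated], scores)
    mean, std = gp.predict(embeddings, return_std=True)

    # Upper-confidence-bound acquisition: favor uncertain, promising regions.
    ucb = mean + 1.96 * std
    ucb[evaluated] = -np.inf  # never re-evaluate the same entry

    nxt = int(np.argmax(ucb))
    evaluated.append(nxt)
    scores.append(expensive_llm_score(nxt))

# Corpus-level estimate from only `budget` LLM calls instead of 200.
estimate = float(np.mean(scores))
```

The key design point is that the surrogate model and acquisition function operate entirely in embedding space, so each iteration costs one LLM call plus a cheap GP fit.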
Quick Start & Requirements
Install with pip install bocoel, or pip install "bocoel[all]" for full features. The library depends on transformers and datasets. Working examples are available in the examples/getting_started directory.
Maintenance & Community
The project is actively seeking contributors. Its roadmap includes simplifying usage with a high-level wrapper, adding visualization tools, integrating alternative sampling methods, and supporting more LLM backends (vLLM, the OpenAI API).
Licensing & Compatibility
Licensed under BSD-3-Clause, which permits commercial use and integration with closed-source projects.
Limitations & Caveats
The project is marked as "work in progress" on its roadmap, indicating potential for ongoing development and API changes. While embedders are faster than LLMs, the initial encoding of the entire corpus can still be a significant upfront cost.