bocoel by rentruewang

LLM benchmarking via Bayesian optimization

created 1 year ago
286 stars

Top 92.5% on sourcepulse

Project Summary

BoCoEL offers a novel approach to efficiently and accurately evaluate Large Language Models (LLMs) by intelligently selecting a small, representative subset of a large corpus. This is particularly beneficial for researchers and developers dealing with the high computational cost and time associated with traditional LLM benchmarking on extensive datasets.

How It Works

BoCoEL leverages Bayesian optimization, specifically using Gaussian processes, to identify an optimal subset of queries from a corpus. The process involves encoding corpus entries into embeddings, which are significantly faster to compute than LLM inferences. Bayesian optimization then guides the selection of queries to evaluate, aiming to maximize coverage and accuracy within a defined budget. This method prioritizes efficient exploration of the embedding space, allowing for highly accurate evaluations with a minimal number of LLM interactions.
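The selection loop described above can be sketched in plain NumPy. This is a toy illustration of Gaussian-process-guided subset selection, not bocoel's actual implementation or API: the RBF kernel, the upper-confidence-bound acquisition rule, and all function names here are illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the row vectors of a and b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_cand, noise=1e-6):
    # Standard Gaussian-process regression posterior mean/variance.
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_cand, x_train)
    k_inv = np.linalg.inv(k)
    mu = k_s @ k_inv @ y_train
    # k(x, x) = 1 for the RBF kernel, minus the explained variance.
    var = 1.0 - np.einsum("ij,jk,ik->i", k_s, k_inv, k_s)
    return mu, np.maximum(var, 1e-12)

def select_queries(embeddings, score_fn, budget=10, seed=0):
    """Greedily pick `budget` corpus entries to evaluate, trading off
    predicted score against uncertainty in the embedding space."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(embeddings)))]
    scores = [score_fn(chosen[0])]  # score_fn = one (expensive) LLM evaluation
    while len(chosen) < budget:
        mu, var = gp_posterior(embeddings[chosen], np.array(scores), embeddings)
        ucb = mu + 2.0 * np.sqrt(var)  # upper confidence bound acquisition
        ucb[chosen] = -np.inf          # never re-evaluate a chosen entry
        nxt = int(np.argmax(ucb))
        chosen.append(nxt)
        scores.append(score_fn(nxt))
    return chosen, scores
```

Each call to score_fn stands in for one full LLM evaluation, so the loop spends the budget where the GP is most uncertain rather than scoring the whole corpus.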

Quick Start & Requirements

  • Install: pip install bocoel or pip install "bocoel[all]" for full features.
  • Prerequisites: Python 3.12+. Integrates with Hugging Face transformers and datasets.
  • Usage examples are available in the examples/getting_started directory.

Highlighted Details

  • Achieves accurate LLM evaluations with as few as tens of samples.
  • Employs Bayesian optimization with Gaussian processes for sample selection.
  • Supports LLMs like GPT2, Pythia, and LLAMA via Hugging Face integration.
  • Features a modular design and efficient corpus representation techniques.
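The corpus-representation idea (encode once, then query the embedding space cheaply) can be illustrated with a toy example. The character-trigram hashing encoder below is a stand-in assumption, not the encoder bocoel uses; a real setup would plug in a sentence-embedding model.

```python
import numpy as np

def toy_encode(texts, dim=64):
    # Stand-in for a real sentence encoder: hash character trigrams
    # into a fixed-size vector, then L2-normalize each row.
    vecs = np.zeros((len(texts), dim))
    for i, t in enumerate(texts):
        for j in range(len(t) - 2):
            vecs[i, hash(t[j:j + 3]) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)

def nearest(corpus_vecs, query_vec, k=3):
    # Cosine-similarity search over the pre-computed corpus embeddings;
    # this lookup costs a matrix-vector product, not an LLM call.
    sims = corpus_vecs @ query_vec
    return np.argsort(-sims)[:k].tolist()
```

The one-time encoding pass is the expensive step; every subsequent lookup reuses the cached vectors.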

Maintenance & Community

The project is actively seeking contributors. The roadmap includes simplifying usage with a high-level wrapper, adding visualization tools, integrating alternative sampling methods, and supporting more LLM backends (vLLM, the OpenAI API).

Licensing & Compatibility

Licensed under BSD-3-Clause, which permits commercial use and integration with closed-source projects.

Limitations & Caveats

The roadmap marks the project as a work in progress, so the API may change as development continues. And while embedders are far cheaper to run than LLMs, encoding the entire corpus is still a nontrivial one-time upfront cost.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
