openai/simple-evals: Lightweight library for evaluating language models
Top 11.9% on SourcePulse
This library provides a lightweight framework for evaluating language models, focusing on zero-shot, chain-of-thought prompting for realistic performance assessment. It targets researchers and developers needing transparent, reproducible benchmarks for LLM accuracy, offering a curated set of standard evaluations.
How It Works
The library emphasizes simple, direct instructions for evaluations, avoiding complex few-shot or role-playing prompts that can skew results for instruction-tuned models. This approach aims to better reflect real-world usage and model capabilities in a zero-shot setting.
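As a concrete illustration, the sketch below builds the kind of single, direct zero-shot chain-of-thought prompt this approach favors, using the official openai Python client. The model name, question, and instruction wording are illustrative assumptions, not templates taken from the library.

```python
# Minimal sketch of a zero-shot chain-of-thought eval prompt.
# The instruction wording, question, and model name are illustrative;
# simple-evals ships its own prompt templates, which this does not reproduce.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

QUESTION = "A train travels 60 km in 45 minutes. What is its speed in km/h?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-completions model works here
    messages=[
        {
            "role": "user",
            # One direct instruction, no few-shot examples or role-play:
            # the model is asked to reason step by step, then answer.
            "content": f"{QUESTION}\n\nThink step by step, then state the final answer.",
        }
    ],
    temperature=0,  # low-variance sampling for more reproducible scoring
)

print(response.choices[0].message.content)
```

The point of the contrast: few-shot exemplars or elaborate personas can inflate or deflate scores for instruction-tuned models, while a plain instruction like the one above is closer to how these models are actually used.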
Quick Start & Requirements
Install the client for your API provider: pip install openai (OpenAI API) or pip install anthropic (Anthropic API).
Run the evals: python -m simple_evals.simple_evals --model <model_name> --examples <num_examples>
For HumanEval, clone and install the harness: git clone https://github.com/openai/human-eval and pip install -e human-eval.
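For scripted or repeated runs, the quick-start command can be wrapped in a small Python driver. This is a minimal sketch; the model name and example count below are placeholder values, not repository defaults.

```python
# Sketch: drive the simple_evals CLI from Python for scripted benchmark runs.
# "gpt-4o-mini" and the example count are placeholder choices, not repo defaults.
import subprocess

cmd = [
    "python", "-m", "simple_evals.simple_evals",
    "--model", "gpt-4o-mini",  # assumption: any model name the repo registers
    "--examples", "20",        # small sample size for a quick smoke test
]

# The subprocess inherits the parent environment, so OPENAI_API_KEY or
# ANTHROPIC_API_KEY set in the shell applies to the run.
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    print(result.stderr)
```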
Highlighted Details
For a broader collection of community-contributed evals, see the openai/evals repository.
Maintenance & Community
The repository states that it will not be actively maintained and will accept only a limited set of PRs: bug fixes, new model adapters, and new eval results.
Licensing & Compatibility
Limitations & Caveats
This repository is not actively maintained and will not accept new evals. Some benchmarks (MGSM, DROP) may be saturated for newer models. MATH results for newer models use MATH-500, an IID version of the test set.