allenai: LLM evaluation system for reproducible research
Open Language Model Evaluation System (OLMES) provides a flexible, reproducible system for evaluating large language models (LLMs) across diverse tasks. Aimed at researchers and engineers, it enables faithful reproduction of LLM evaluation results from key papers and deepens analysis through detailed logging and customizable configurations.
How It Works
Building on EleutherAI's lm-evaluation-harness, OLMES adds fine-grained configuration of task variants and detailed instance-level logging (e.g., logprobs). It supports custom metrics, aggregation strategies, and flexible external data storage integrations, enabling more thorough analysis of LLM performance.
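For illustration only, a run might select a task and write per-instance logs to an output directory; the model and task names below are placeholders, and flag names other than --dry-run are assumptions rather than documented options, so consult oe-eval --help for the actual interface.

    # Hypothetical invocation: evaluate a model on a task and collect
    # per-instance outputs (e.g., logprobs) in a results directory.
    oe-eval --model allenai/OLMo-1B --task arc_challenge --output-dir results/

    # --dry-run (see Highlighted Details below) previews the commands
    # that would be executed without actually running the evaluation.
    oe-eval --model allenai/OLMo-1B --task arc_challenge --output-dir results/ --dry-run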
Quick Start & Requirements
Install from source with pip install -e .; GPU support is available via pip install -e .[gpu] (requires vLLM).
The project expects torch>=2.2, so a downgrade of an existing PyTorch install may be needed.
oe-eval --help offers in-tool documentation.
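A minimal install sequence, assuming a fresh virtual environment and a local clone of the repository (the environment name is a placeholder):

    # Create an isolated environment and install OLMES in editable mode.
    python -m venv .venv && source .venv/bin/activate
    pip install -e .

    # Optional: GPU support via vLLM, as noted above (quotes keep zsh happy).
    pip install -e ".[gpu]"

    # Confirm the installed PyTorch version satisfies torch>=2.2.
    python -c "import torch; print(torch.__version__)"

    # Built-in documentation for the CLI.
    oe-eval --help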
Highlighted Details
Includes options for data inspection (--inspect) and command preview (--dry-run).
Maintenance & Community
The project is backed by the Allen Institute for AI (AI2) and its Open Language Model efforts. Specific community channels or contributor details are not provided in the README snippet.
Licensing & Compatibility
The license is not specified in the provided README content, potentially impacting commercial use or closed-source integration.
Limitations & Caveats
No explicit limitations, known bugs, or project status (alpha/beta) are listed. Dependency management may require care (e.g., matching the required PyTorch version). The absence of license information is a key adoption caveat.