Collaborative benchmark for probing and extrapolating LLM capabilities
Top 15.3% on SourcePulse
Summary
BIG-bench is a collaborative benchmark designed to comprehensively evaluate and extrapolate the capabilities of large language models (LLMs). It offers more than 200 diverse tasks that let researchers and developers probe LLM behavior beyond simple imitation, providing a standardized measure for assessing current progress and potential future abilities.
How It Works
The benchmark comprises over 200 tasks, categorized into JSON-based and programmatic types. JSON tasks are defined by `task.json` files containing data and metadata, while programmatic tasks use Python scripts (`task.py`) for more complex interactions. BIG-bench Lite (BBL) offers a curated subset of 24 tasks for more cost-effective evaluation and comparison via a public leaderboard.
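To make the JSON flavor concrete, here is an illustrative sketch of a task defined as metadata plus input/target example pairs, with a simple exact-match scorer. The field names and the scoring function below are a simplified approximation, not the authoritative `task.json` schema (see the repository documentation for that):

```python
# Illustrative (simplified) JSON-style task: metadata plus a list of
# input/target example pairs, in the spirit of a BIG-bench task.json.
# Field names are a sketch, not the authoritative schema.
task = {
    "name": "simple_arithmetic",
    "description": "Solve basic addition problems.",
    "keywords": ["arithmetic", "numerical reasoning"],
    "examples": [
        {"input": "1 + 1 =", "target": "2"},
        {"input": "2 + 3 =", "target": "5"},
    ],
}

def exact_match_score(task, model_fn):
    """Fraction of examples whose model output exactly matches the target."""
    hits = sum(
        model_fn(ex["input"]).strip() == ex["target"]
        for ex in task["examples"]
    )
    return hits / len(task["examples"])

# A toy stand-in "model" that actually computes the sum, for demonstration.
def toy_model(prompt):
    a, _plus, b, _eq = prompt.split()
    return str(int(a) + int(b))

print(exact_match_score(task, toy_model))  # → 1.0
```

A real evaluation harness would load the task from its `task.json` file and query an actual language model instead of `toy_model`.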
Quick Start & Requirements
Clone the repository (`git clone https://github.com/google/BIG-bench.git`), navigate into the directory, and install via `python setup.py sdist && pip install -e .`. For using JSON tasks with SeqIO, `pip install git+https://github.com/google/BIG-bench.git` is an alternative. Tests run with `pytest`, and SeqIO is required for JSON tasks. Full documentation lives in `docs/doc.md`.
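To show how the programmatic flavor differs from the JSON one, here is a minimal, self-contained sketch of the pattern a `task.py` follows: the task object drives the interaction, querying the model and returning scores. The class and method names below are hypothetical, not the actual `bigbench.api` interfaces:

```python
# Hypothetical sketch of a programmatic task: the task queries the model
# directly and computes its own scores. Names are illustrative only and
# do not reflect the real bigbench.api interfaces.
class ReverseWordsTask:
    """Asks the model to reverse the word order of a short sentence."""

    def __init__(self):
        self.examples = [
            ("reverse: the cat sat", "sat cat the"),
            ("reverse: hello world", "world hello"),
        ]

    def evaluate_model(self, generate_text):
        """generate_text: a callable mapping a prompt string to a string."""
        correct = sum(
            generate_text(prompt).strip() == target
            for prompt, target in self.examples
        )
        return {"exact_str_match": correct / len(self.examples)}

# A toy stand-in model that solves the task by string manipulation.
def toy_model(prompt):
    words = prompt.removeprefix("reverse: ").split()
    return " ".join(reversed(words))

score = ReverseWordsTask().evaluate_model(toy_model)
print(score)  # → {'exact_str_match': 1.0}
```

The key design difference from JSON tasks is that the Python code controls the prompts, can issue multi-turn queries, and can score answers with arbitrary logic rather than fixed string matching.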
Highlighted Details
Maintenance & Community
Contact the maintainers at `bigbench@googlegroups.com`; copyright inquiries go to `big-bench-copyright@google.com`.
Licensing & Compatibility
The code is released under the Apache 2.0 license.
Limitations & Caveats
The repository appears inactive, with its last update roughly a year ago.