google/BIG-bench: Collaborative benchmark for probing and extrapolating LLM capabilities
Top 15.2% on SourcePulse
Summary
BIG-bench is a collaborative benchmark designed to comprehensively evaluate and extrapolate the capabilities of large language models (LLMs). Its more than 200 diverse tasks let researchers and developers probe LLMs beyond simple imitation, providing a standardized measure for assessing current progress and potential future abilities.
How It Works
The benchmark comprises over 200 tasks, categorized into JSON-based and programmatic types. JSON tasks are defined by task.json files containing data and metadata, while programmatic tasks utilize Python scripts (task.py) for more complex interactions. BIG-bench Lite (BBL) offers a curated subset of 24 tasks for more cost-effective evaluation and comparison via a public leaderboard.
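To make the JSON-task format concrete, here is a minimal sketch of a task.json definition written out from Python. The task content is invented for illustration, and the field names (name, description, keywords, metrics, examples) follow the commonly used schema but should be verified against docs/doc.md.

    # Illustrative sketch of a minimal JSON task definition.
    # Field names are assumed from the common schema; check docs/doc.md.
    import json

    task = {
        "name": "simple_arithmetic",  # hypothetical task name
        "description": "Answer two-digit addition problems.",
        "keywords": ["arithmetic", "zero-shot"],
        "metrics": ["exact_str_match"],  # assumed metric name; verify in the docs
        "examples": [
            {"input": "Q: What is 17 + 25? A:", "target": "42"},
            {"input": "Q: What is 8 + 5? A:", "target": "13"},
        ],
    }

    # Write the definition to the task.json file a JSON task is defined by.
    with open("task.json", "w") as f:
        json.dump(task, f, indent=2)

Programmatic tasks instead ship a task.py implementing the Python task API described in docs/doc.md, which suits tasks whose scoring needs custom logic rather than a fixed list of input/target pairs.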
Quick Start & Requirements
- Clone the repository (git clone https://github.com/google/BIG-bench.git), move into the directory, and install with python setup.py sdist && pip install -e . (a small smoke-test sketch follows this list).
- To use the JSON tasks with SeqIO, pip install git+https://github.com/google/BIG-bench.git is an alternative.
- pytest is used for testing; SeqIO is required for JSON tasks.
- Full documentation is in docs/doc.md.
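As a quick smoke test after installation, the sketch below loads one task.json from a cloned checkout and prints a few of its fields. The path is a hypothetical placeholder and the field names are assumed to match the common JSON-task schema; only the Python standard library is used.

    # Minimal sketch: inspect a JSON task with the standard library only.
    import json
    from pathlib import Path

    # Hypothetical location; point this at any task.json in the checkout.
    task_path = Path("bigbench/benchmark_tasks/my_task/task.json")
    task = json.loads(task_path.read_text())

    print(task.get("name"))
    print(task.get("description"))
    print(len(task.get("examples", [])), "examples loaded")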
Highlighted Details
Maintenance & Community
- Questions and discussion: bigbench@googlegroups.com.
- Copyright inquiries: big-bench-copyright@google.com.
Licensing & Compatibility
Limitations & Caveats