BIG-bench by google

Collaborative benchmark for probing and extrapolating LLM capabilities

Created 5 years ago

3,214 stars

Top 14.8% on SourcePulse

Project Summary

Summary

BIG-bench is a collaborative benchmark designed to comprehensively evaluate and extrapolate the capabilities of large language models (LLMs). It offers over 200 diverse tasks, enabling researchers and developers to probe LLMs beyond simple imitation, providing a standardized measure for assessing their progress and potential future abilities.

How It Works

The benchmark comprises over 200 tasks, categorized into JSON-based and programmatic types. JSON tasks are defined by task.json files containing data and metadata, while programmatic tasks utilize Python scripts (task.py) for more complex interactions. BIG-bench Lite (BBL) offers a curated subset of 24 tasks for more cost-effective evaluation and comparison via a public leaderboard.
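As a rough illustration of the JSON task type, the sketch below parses a minimal task definition and scores stand-in model functions against it. The field names (name, description, keywords, metrics, examples, input, target, target_scores) reflect the general shape of BIG-bench task.json files as we understand it. Treat them as assumptions and check docs/doc.md for the authoritative schema. The toy_generate and toy_score_choice functions are hypothetical stand-ins for a real model. toy_score_choice takes the place of a model's conditional log-probability for each option.

```python
import json
from typing import Callable, Dict

# A minimal task definition, ASSUMING a task.json-like schema.
# Field names are illustrative; see docs/doc.md for the real format.
TASK_JSON = """
{
  "name": "toy_arithmetic",
  "description": "Answer simple addition questions.",
  "keywords": ["arithmetic"],
  "metrics": ["exact_str_match", "multiple_choice_grade"],
  "examples": [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "What is 3 + 5?", "target": "8"},
    {"input": "Is 7 + 1 equal to 8?", "target_scores": {"yes": 1, "no": 0}}
  ]
}
"""


def exact_str_match(task: Dict, generate: Callable[[str], str]) -> float:
    """Fraction of free-form examples whose generated text equals the target."""
    examples = [ex for ex in task["examples"] if "target" in ex]
    if not examples:
        return 0.0
    hits = sum(generate(ex["input"]).strip() == ex["target"] for ex in examples)
    return hits / len(examples)


def multiple_choice_grade(task: Dict, score_choice: Callable[[str, str], float]) -> float:
    """Average target score of the option the model ranks highest."""
    examples = [ex for ex in task["examples"] if "target_scores" in ex]
    if not examples:
        return 0.0
    total = 0.0
    for ex in examples:
        options = ex["target_scores"]
        # Pick the option the (stand-in) model scores highest.
        best = max(options, key=lambda opt: score_choice(ex["input"], opt))
        total += options[best]
    return total / len(examples)


# Hypothetical stand-ins for a real language model.
def toy_generate(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "9"


def toy_score_choice(prompt: str, option: str) -> float:
    return 1.0 if option == "yes" else 0.0


task = json.loads(TASK_JSON)
print(exact_str_match(task, toy_generate))            # 0.5
print(multiple_choice_grade(task, toy_score_choice))  # 1.0
```

Programmatic tasks (task.py) replace this fixed data-plus-metric pattern with arbitrary Python logic, which is why they can express interactions a static JSON file cannot.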

Quick Start & Requirements

Primary install / run command: Clone the repository (git clone https://github.com/google/BIG-bench.git), change into the BIG-bench directory, then run python setup.py sdist followed by pip install -e . to install the package in editable mode. To use JSON tasks with SeqIO, pip install git+https://github.com/google/BIG-bench.git is an alternative.
Non-default prerequisites and dependencies: Python 3.5-3.8; pytest for running tests; SeqIO only when loading JSON tasks through the SeqIO integration.
Estimated setup time or resource footprint: Task evaluation can be time-consuming depending on hardware and task complexity.
Links: Colab notebooks are available for task inspection and creation. Detailed instructions are in docs/doc.md.
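Because the supported interpreter range is narrow, a quick pre-install check can save a failed setup. The following is a minimal sketch that hard-codes the 3.5-3.8 range stated above; the is_supported helper is hypothetical and not part of BIG-bench. It avoids f-strings and variable annotations so that it also parses on Python 3.5.

```python
import sys

# Supported range as stated in this summary (Python 3.5 through 3.8).
SUPPORTED_MIN = (3, 5)
SUPPORTED_MAX = (3, 8)


def is_supported(version: tuple) -> bool:
    """Return True if a (major, minor) version falls inside the supported range."""
    return SUPPORTED_MIN <= tuple(version) <= SUPPORTED_MAX


current = sys.version_info[:2]
if not is_supported(current):
    print("Warning: Python %d.%d is outside the 3.5-3.8 range BIG-bench lists." % current)
```

Tuple comparison is lexicographic, so (3, 9) and (2, 7) both fall outside the range while (3, 5) and (3, 8) fall inside it.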

Highlighted Details

Features over 200 diverse tasks designed to challenge LLMs.
Includes BIG-bench Lite (BBL) for efficient evaluation and a public leaderboard.
Supports creation of both JSON-based and programmatic tasks for flexible benchmarking.
Tasks include a "canary" string to prevent accidental inclusion in training data.
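The canary only protects the benchmark if training-data pipelines actually screen for it. Here is a minimal sketch of such a filter. The CANARY value is a hypothetical placeholder, not the actual string embedded in BIG-bench tasks, which should be copied verbatim from the repository.

```python
from typing import Iterable, Iterator

# Placeholder only: substitute the exact canary string published in the BIG-bench repository.
CANARY = "HYPOTHETICAL-CANARY-PLACEHOLDER"


def drop_canaried(documents: Iterable[str], canary: str = CANARY) -> Iterator[str]:
    """Yield only the documents that do not contain the canary string."""
    for doc in documents:
        if canary not in doc:
            yield doc


corpus = [
    "an ordinary web page",
    "benchmark task text " + CANARY + " with examples",
    "another ordinary page",
]
kept = list(drop_canaried(corpus))
print(kept)  # ['an ordinary web page', 'another ordinary page']
```

A plain substring test is deliberately strict: any document that quotes the canary is dropped, even if it contains only a fragment of a task.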

Maintenance & Community

Organizers can be contacted via bigbench@googlegroups.com.
Copyright inquiries can be directed to big-bench-copyright@google.com.
Related project: NL-augmenter.

Licensing & Compatibility

The provided README does not specify a license. Compatibility for commercial use or closed-source linking is undetermined without a license.

Limitations & Caveats

The SeqIO library currently only supports loading BIG-bench tasks defined via JSON, not programmatic tasks.
Evaluating tasks can be resource-intensive and time-consuming.
Users developing programmatic tasks should be aware of upcoming changes to function signatures.
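Because SeqIO can load only JSON-defined tasks, it helps to know which tasks are which before choosing a loading path. The sketch below assumes that each task lives in its own directory containing either task.json or task.py, following the split described under How It Works. The exact directory layout and the classify_tasks helper are assumptions for illustration. Point it at the folder that holds the task directories in your checkout.

```python
import os
import tempfile
from typing import Dict, List


def classify_tasks(tasks_root: str) -> Dict[str, List[str]]:
    """Group task directories by how they are defined.

    ASSUMES one subdirectory per task holding task.json (JSON task)
    or task.py (programmatic task). Only the "json" group is SeqIO-loadable.
    """
    groups = {"json": [], "programmatic": [], "unknown": []}
    for name in sorted(os.listdir(tasks_root)):
        path = os.path.join(tasks_root, name)
        if not os.path.isdir(path):
            continue  # skip stray files next to the task folders
        if os.path.isfile(os.path.join(path, "task.json")):
            groups["json"].append(name)
        elif os.path.isfile(os.path.join(path, "task.py")):
            groups["programmatic"].append(name)
        else:
            groups["unknown"].append(name)
    return groups


# Demo on a throwaway directory tree with one task of each kind.
with tempfile.TemporaryDirectory() as root:
    for task_name, filename in [("a_json_task", "task.json"), ("a_code_task", "task.py")]:
        os.makedirs(os.path.join(root, task_name))
        open(os.path.join(root, task_name, filename), "w").close()
    result = classify_tasks(root)

print(result)
# {'json': ['a_json_task'], 'programmatic': ['a_code_task'], 'unknown': []}
```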

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

16 stars in the last 30 days