BIG-bench by google

Collaborative benchmark for probing and extrapolating LLM capabilities

Created 4 years ago
3,124 stars

Top 15.3% on SourcePulse

Summary

BIG-bench is a collaborative benchmark for evaluating the capabilities of large language models (LLMs) and extrapolating how those capabilities may develop. Its more than 200 diverse tasks let researchers and developers probe LLMs beyond simple imitation and give a standardized measure of current progress and likely future abilities.

How It Works

The benchmark comprises over 200 tasks, categorized into JSON-based and programmatic types. JSON tasks are defined by task.json files containing data and metadata, while programmatic tasks utilize Python scripts (task.py) for more complex interactions. BIG-bench Lite (BBL) offers a curated subset of 24 tasks for more cost-effective evaluation and comparison via a public leaderboard.
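To make the JSON task format concrete, here is a minimal sketch of loading a task definition and scoring a model on it by exact string match. It is illustrative only: the field names (name, description, keywords, metrics, examples, input, target) are assumptions modeled on typical BIG-bench JSON tasks and should be checked against docs/doc.md, and toy_model and exact_match_score are hypothetical helpers, not part of the BIG-bench API.

```python
import json

# Illustrative task definition. Field names are assumptions modeled on
# BIG-bench JSON tasks; verify them against docs/doc.md before relying on them.
TASK_JSON = """
{
  "name": "toy_arithmetic",
  "description": "Answer simple addition questions.",
  "keywords": ["arithmetic"],
  "metrics": ["exact_str_match"],
  "examples": [
    {"input": "1 + 1 =", "target": "2"},
    {"input": "2 + 3 =", "target": "5"}
  ]
}
"""


def exact_match_score(task, model_fn):
    """Return the fraction of examples whose output equals the target exactly."""
    examples = task["examples"]
    hits = sum(model_fn(ex["input"]).strip() == ex["target"] for ex in examples)
    return hits / len(examples)


def toy_model(prompt):
    """Hypothetical stand-in for an LLM: solves prompts shaped like 'a + b ='."""
    left, right = prompt.rstrip("= ").split("+")
    return str(int(left) + int(right))


task = json.loads(TASK_JSON)
score = exact_match_score(task, toy_model)
print("{}: {:.2f}".format(task["name"], score))
```

A programmatic task would replace the static examples list with Python code that interacts with the model, which is why it needs a task.py file rather than a data file.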

Quick Start & Requirements

  • Primary install / run command: Clone the repository (git clone https://github.com/google/BIG-bench.git), change into the BIG-bench directory, then run python setup.py sdist followed by pip install -e . (the trailing dot is part of the command). To use the JSON tasks with SeqIO, pip install git+https://github.com/google/BIG-bench.git is an alternative.
  • Non-default prerequisites and dependencies: Python 3.5-3.8; pytest for running the tests. SeqIO is required to load JSON tasks through the SeqIO interface.
  • Estimated setup time or resource footprint: Task evaluation can be time-consuming depending on hardware and task complexity.
  • Links: Colab notebooks are available for task inspection and creation. Detailed instructions are in docs/doc.md.
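
Because the listed Python range (3.5-3.8) is older than most current defaults, it can help to check the interpreter before installing. The following is a small sketch, not part of BIG-bench; is_supported_python is a hypothetical helper and the bounds are copied from the requirement above.

```python
import sys

# Supported range as listed in the requirements above (inclusive, major.minor).
MIN_SUPPORTED = (3, 5)
MAX_SUPPORTED = (3, 8)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies in the listed range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported_python():
        print("Python " + current + " is within the listed range.")
    else:
        print("Python " + current + " is outside the listed range 3.5-3.8; "
              "a dedicated virtual environment may be needed.")
```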

Highlighted Details

  • Features over 200 diverse tasks designed to challenge LLMs.
  • Includes BIG-bench Lite (BBL) for efficient evaluation and a public leaderboard.
  • Supports creation of both JSON-based and programmatic tasks for flexible benchmarking.
  • Tasks include a "canary" string to prevent accidental inclusion in training data.
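
As an illustration of how a canary string can be used, the sketch below filters a training corpus by dropping any document that contains the marker. The marker shown is a placeholder, not the real BIG-bench canary; the actual string should be copied from the repository. drop_canary_documents is a hypothetical helper.

```python
# Placeholder marker for illustration only. The real canary string is defined
# in the BIG-bench repository and should be copied from there.
PLACEHOLDER_CANARY = "canary GUID 00000000-0000-0000-0000-000000000000"


def drop_canary_documents(documents, canary=PLACEHOLDER_CANARY):
    """Return the documents without the canary and the number that were dropped."""
    kept = [doc for doc in documents if canary not in doc]
    return kept, len(documents) - len(kept)


corpus = [
    "An ordinary web page about cooking.",
    "Benchmark file. " + PLACEHOLDER_CANARY,
    "Another ordinary document.",
]
kept, dropped = drop_canary_documents(corpus)
print(len(kept), "kept,", dropped, "dropped")
```

The same substring check can be run on model outputs or training logs to detect whether benchmark data has leaked into a corpus.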

Maintenance & Community

  • Organizers can be contacted via bigbench@googlegroups.com.
  • Copyright inquiries can be directed to big-bench-copyright@google.com.
  • Related project: NL-augmenter.

Licensing & Compatibility

  • The provided README does not specify a license. Compatibility for commercial use or closed-source linking is undetermined without a license.

Limitations & Caveats

  • The SeqIO library currently only supports loading BIG-bench tasks defined via JSON, not programmatic tasks.
  • Evaluating tasks can be resource-intensive and time-consuming.
  • Users developing programmatic tasks should expect upcoming changes to function signatures.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 19 stars in the last 30 days
