t-zero by bigscience-workshop

Codebase for training, evaluation, and inference of the T0 model

Created 3 years ago
462 stars

Top 65.6% on SourcePulse

View on GitHub
Project Summary

This repository provides the codebase and instructions for reproducing the training, evaluation, and inference of T0, a large language model designed for zero-shot task generalization through massive multitask prompted fine-tuning. It enables researchers and practitioners to replicate T0's performance, which matches or exceeds GPT-3 on many zero-shot benchmarks while being 16x smaller, and to explore further advances in multitask learning.

How It Works

The core approach involves massively multitask prompted fine-tuning, where the model is trained on a diverse mixture of datasets, each presented with specific prompts. This method, detailed in the paper "Multitask Prompted Training Enables Zero-Shot Task Generalization," allows T0 to generalize effectively to unseen tasks in a zero-shot manner. The repository facilitates replicating this training process, evaluating performance against benchmarks, and running inference with pre-trained checkpoints.
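For a concrete sense of what "prompted" training data looks like, the sketch below renders a single dataset example into a prompted (input, target) pair with promptsource, the prompt collection T0's training mixture is built from. The dataset (ag_news) and template choice here are illustrative assumptions, not the exact T0 training configuration.

```python
# Hedged sketch: turn one raw example into a prompted (input, target) pair with
# promptsource. The ag_news dataset and "first available template" choice are
# illustrative only, not T0's actual training mixture.
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

dataset = load_dataset("ag_news", split="train")
templates = DatasetTemplates("ag_news")                 # all prompts written for ag_news
template = templates[templates.all_template_names[0]]   # pick any one template

result = template.apply(dataset[0])  # typically [prompted_input, target]
print(result[0])
if len(result) > 1:
    print("->", result[1])
```

During training, many such templates across many datasets are mixed together, and the model is fine-tuned to generate the target text given the prompted input; zero-shot evaluation then applies prompts from held-out tasks the model never saw during training.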

Quick Start & Requirements

  • Install by navigating to the repository root and running pip install -e . (an editable install).
  • For seqio tasks, install with pip install -e .[seqio_tasks].
  • Requires Python and standard ML dependencies. Specific hardware requirements for training are not detailed but are substantial given the model size.
  • Pre-trained checkpoints are available on Hugging Face: T0, T0+, T0++, T0 3B.
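As a quick sanity check after installation, a checkpoint can be loaded for zero-shot inference with the Hugging Face transformers API. This is a minimal sketch assuming the bigscience/T0_3B checkpoint and enough memory to hold a 3B-parameter model; the prompt text is illustrative.

```python
# Minimal zero-shot inference sketch with the smallest released checkpoint.
# Assumes transformers (and torch) are installed; larger checkpoints (T0, T0pp)
# follow the same pattern but need considerably more memory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast-iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```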

Highlighted Details

  • T0 outperforms or matches GPT-3 while being 16x smaller.
  • Offers multiple checkpoints (T0, T0+, T0++, T0 3B) for varying performance and resource needs.
  • Includes checkpoints from ablation studies to analyze the impact of prompt variations.
  • Facilitates fine-tuning T0 with additional datasets or prompts.
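The repository ships its own training scripts for this, but for orientation, the sketch below shows the general shape of continued fine-tuning of a T0 checkpoint on a hand-prompted dataset using the transformers Seq2SeqTrainer. The dataset, prompt wording, and hyperparameters are placeholder assumptions, not the repository's actual configuration.

```python
# Generic continued-fine-tuning sketch (not the repo's own scripts): cast a
# dataset into prompted text-to-text form by hand and train with Seq2SeqTrainer.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "bigscience/T0_3B"  # smallest released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Any text-to-text dataset works; here IMDB is cast into a prompted form by hand.
raw = load_dataset("imdb", split="train[:1000]")

def to_prompted(example):
    inputs = tokenizer(
        f"Review: {example['text']}\nIs this review positive or negative?",
        truncation=True, max_length=512)
    labels = tokenizer(
        "positive" if example["label"] == 1 else "negative",
        truncation=True, max_length=8)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_ds = raw.map(to_prompted, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t0-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-4,
        num_train_epochs=1,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```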

Maintenance & Community

The project originates from the BigScience workshop, a large collaborative effort. Specific maintenance details or community links (e.g., Discord/Slack) are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license for the repository. The T0 checkpoints on Hugging Face are generally released under permissive terms that allow research and commercial use, but users should verify the license listed on each checkpoint's model card.

Limitations & Caveats

The README focuses on reproducing T0 and does not detail requirements for training from scratch, which would likely be resource-intensive. Specific instructions for inference or fine-tuning beyond the basic setup are not extensively covered.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Victor Taelin (author of Bend, Kind, HVM), Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

Top 0.2% · 1k stars
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 2 years ago · Updated 1 year ago