t-zero by bigscience-workshop

Codebase for training, evaluation, and inference of the T0 model

Created 3 years ago
462 stars

Top 65.6% on SourcePulse

View on GitHub
Project Summary

This repository provides the codebase and instructions for reproducing the training, evaluation, and inference of T0, a large language model designed for zero-shot task generalization through massive multitask prompted fine-tuning. It enables researchers and practitioners to replicate T0's performance, which matches or exceeds GPT-3 on many zero-shot benchmarks while being 16x smaller, and to explore further advances in multitask learning.

How It Works

The core approach involves massively multitask prompted fine-tuning, where the model is trained on a diverse mixture of datasets, each presented with specific prompts. This method, detailed in the paper "Multitask Prompted Training Enables Zero-Shot Task Generalization," allows T0 to generalize effectively to unseen tasks in a zero-shot manner. The repository facilitates replicating this training process, evaluating performance against benchmarks, and running inference with pre-trained checkpoints.
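For a concrete sense of what "prompted" training data looks like, the sketch below renders a single dataset example into a prompted (input, target) pair with promptsource, the prompt collection T0's training mixture is built from. The dataset (ag_news) and template choice here are illustrative assumptions, not the exact T0 training configuration.

```python
# Hedged sketch: turn one raw example into a prompted (input, target) pair with
# promptsource. The ag_news dataset and "first available template" choice are
# illustrative only, not T0's actual training mixture.
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

dataset = load_dataset("ag_news", split="train")
templates = DatasetTemplates("ag_news")                 # all prompts written for ag_news
template = templates[templates.all_template_names[0]]   # pick any one template

result = template.apply(dataset[0])  # typically [prompted_input, target]
print(result[0])
if len(result) > 1:
    print("->", result[1])
```

During training, many such templates across many datasets are mixed together, and the model is fine-tuned to generate the target text given the prompted input; zero-shot evaluation then applies prompts from held-out tasks the model never saw during training.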

Quick Start & Requirements

  • Install by navigating to the repository root and running pip install -e . (an editable install).
  • For seqio tasks, install with pip install -e .[seqio_tasks].
  • Requires Python and standard ML dependencies. Specific hardware requirements for training are not detailed but are substantial given the model size.
  • Pre-trained checkpoints are available on Hugging Face: T0, T0+, T0++, T0 3B.
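As a quick sanity check after installation, a checkpoint can be loaded for zero-shot inference with the Hugging Face transformers API. This is a minimal sketch assuming the bigscience/T0_3B checkpoint and enough memory to hold a 3B-parameter model; the prompt text is illustrative.

```python
# Minimal zero-shot inference sketch with the smallest released checkpoint.
# Assumes transformers (and torch) are installed; larger checkpoints (T0, T0pp)
# follow the same pattern but need considerably more memory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

prompt = ("Is this review positive or negative? "
          "Review: this is the best cast-iron skillet you will ever buy")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```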

Highlighted Details

  • T0 outperforms or matches GPT-3 while being 16x smaller.
  • Offers multiple checkpoints (T0, T0+, T0++, T0 3B) for varying performance and resource needs.
  • Includes checkpoints from ablation studies to analyze the impact of prompt variations.
  • Facilitates fine-tuning T0 with additional datasets or prompts.
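The repository ships its own training scripts for this, but for orientation, the sketch below shows the general shape of continued fine-tuning of a T0 checkpoint on a hand-prompted dataset using the transformers Seq2SeqTrainer. The dataset, prompt wording, and hyperparameters are placeholder assumptions, not the repository's actual configuration.

```python
# Generic continued-fine-tuning sketch (not the repo's own scripts): cast a
# dataset into prompted text-to-text form by hand and train with Seq2SeqTrainer.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "bigscience/T0_3B"  # smallest released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Any text-to-text dataset works; here IMDB is cast into a prompted form by hand.
raw = load_dataset("imdb", split="train[:1000]")

def to_prompted(example):
    inputs = tokenizer(
        f"Review: {example['text']}\nIs this review positive or negative?",
        truncation=True, max_length=512)
    labels = tokenizer(
        "positive" if example["label"] == 1 else "negative",
        truncation=True, max_length=8)
    inputs["labels"] = labels["input_ids"]
    return inputs

train_ds = raw.map(to_prompted, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="t0-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=1e-4,
        num_train_epochs=1,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```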

Maintenance & Community

The project originates from the BigScience workshop, a large collaborative effort. Specific maintenance details or community links (e.g., Discord/Slack) are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license for the repository. The T0 checkpoints on Hugging Face are generally released under permissive terms that allow research and commercial use, but users should verify the license listed on each checkpoint's model card.

Limitations & Caveats

The README focuses on reproducing T0 and does not detail requirements for training from scratch, which would likely be resource-intensive. Specific instructions for inference or fine-tuning beyond the basic setup are not extensively covered.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Victor Taelin (author of Bend, Kind, HVM), Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

Top 0.2% · 1k stars
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 2 years ago · Updated 1 year ago