t-zero by bigscience-workshop

Codebase for training, evaluation, and inference of the T0 model

created 3 years ago
463 stars

Top 66.4% on sourcepulse

Project Summary

This repository provides the codebase and instructions for reproducing the training, evaluation, and inference of T0, a language model designed for zero-shot task generalization through massively multitask prompted fine-tuning. It enables researchers and practitioners to replicate T0's results, which match or exceed GPT-3's zero-shot performance on many benchmarks while the model is 16x smaller, and to explore further advancements in multitask learning.

How It Works

The core approach involves massively multitask prompted fine-tuning, where the model is trained on a diverse mixture of datasets, each presented with specific prompts. This method, detailed in the paper "Multitask Prompted Training Enables Zero-Shot Task Generalization," allows T0 to generalize effectively to unseen tasks in a zero-shot manner. The repository facilitates replicating this training process, evaluating performance against benchmarks, and running inference with pre-trained checkpoints.
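To make the prompted-training setup concrete, here is a minimal sketch of how one dataset example becomes an input/target text pair using promptsource, the companion prompt collection behind T0's training mixture (P3). The dataset and template names follow the promptsource documentation and are illustrative; this is not this repository's own API.

    # Apply a promptsource template to one AG News example, producing the
    # text-to-text pair that multitask prompted fine-tuning trains on.
    from datasets import load_dataset
    from promptsource.templates import DatasetTemplates

    dataset = load_dataset("ag_news", split="train")
    example = dataset[0]

    ag_news_prompts = DatasetTemplates("ag_news")
    template = ag_news_prompts["classify_question_first"]

    # apply() returns the prompted input and the expected target as strings.
    input_text, target_text = template.apply(example)
    print("INPUT:", input_text)
    print("TARGET:", target_text)

Training on many such (input, target) pairs across datasets and templates is what lets the model handle an unseen task at inference time as just another prompt.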

Quick Start & Requirements

  • Install by running pip install -e . from the repository root.
  • For seqio tasks, install with pip install -e .[seqio_tasks].
  • Requires Python and standard ML dependencies. Hardware requirements for training are not detailed, but the full T0 models have 11 billion parameters, so substantial resources should be expected.
  • Pre-trained checkpoints are available on Hugging Face: T0, T0+, T0++, T0 3B (see the inference sketch after this list).
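As a quick smoke test, the sketch below loads a released checkpoint through the Hugging Face transformers API and runs a zero-shot prompt, mirroring the usage shown on the T0 model cards. Swap in bigscience/T0_3B if the 11B checkpoints exceed your hardware.

    # Zero-shot inference with a released T0 checkpoint via transformers.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("bigscience/T0pp")
    model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp")

    inputs = tokenizer.encode(
        "Is this review positive or negative? "
        "Review: this is the best cast iron skillet you will ever buy",
        return_tensors="pt")
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))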

Highlighted Details

  • T0 outperforms or matches GPT-3 while being 16x smaller.
  • Offers multiple checkpoints (T0, T0+, T0++, T0 3B) for varying performance and resource needs.
  • Includes checkpoints from ablation studies to analyze the impact of prompt variations.
  • Facilitates fine-tuning T0 with additional datasets or prompts (a minimal sketch follows this list).
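The repository ships its own training scripts; the sketch below is only a hypothetical illustration of the text-to-text fine-tuning setup using the generic Hugging Face Seq2SeqTrainer, with a made-up one-example dataset and arbitrary hyperparameters.

    # Hypothetical sketch: continue training T0_3B on extra prompted
    # examples with the Hugging Face Trainer (not the repo's own scripts).
    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                              DataCollatorForSeq2Seq, Seq2SeqTrainer,
                              Seq2SeqTrainingArguments)

    model_name = "bigscience/T0_3B"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Made-up prompted data: natural-language input, short target answer.
    raw = Dataset.from_dict({
        "input": ["Is this review positive or negative? "
                  "Review: great skillet."],
        "target": ["positive"],
    })

    def preprocess(batch):
        enc = tokenizer(batch["input"], truncation=True, max_length=512)
        enc["labels"] = tokenizer(
            batch["target"], truncation=True, max_length=16)["input_ids"]
        return enc

    tokenized = raw.map(preprocess, batched=True,
                        remove_columns=raw.column_names)

    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(
            output_dir="t0-finetuned",
            per_device_train_batch_size=1,
            num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()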

Maintenance & Community

The project originates from the BigScience workshop, a large collaborative effort. Specific maintenance details or community links (e.g., Discord/Slack) are not provided in the README.

Licensing & Compatibility

The repository itself does not explicitly state a license. The released T0 checkpoints are generally distributed under permissive terms that allow research and commercial use, but users should verify the license listed on each Hugging Face model card.

Limitations & Caveats

The README focuses on reproducing T0 and does not detail requirements for training from scratch, which would likely be resource-intensive. Specific instructions for inference or fine-tuning beyond the basic setup are not extensively covered.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 90 days

Explore Similar Projects

Starred by Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), Nathan Lambert (AI researcher at AI2), and 1 more.

unified-io-2 by allenai

Unified-IO 2 code for training, inference, and demo

created 1 year ago · updated 1 year ago
619 stars

Top 0.3% on sourcepulse