text-to-text-transfer-transformer by google-research

Unified text-to-text transformer for NLP research

Created 6 years ago
6,425 stars

Top 8.0% on SourcePulse

View on GitHub
Summary

This repository provides the TensorFlow/MeshTF code for the T5 (Text-To-Text Transfer Transformer) model, enabling unified text-to-text processing across various NLP tasks. It's primarily intended for reproducing experiments from the original paper. However, the project explicitly recommends using T5X (built with JAX/Flax) for new development, as the TensorFlow implementation is no longer actively maintained.

How It Works

T5 frames every NLP task as a text-to-text problem: the model takes text as input and produces text as output. This unified approach simplifies model architecture and training pipelines. The t5.data module handles task definition: loading datasets as tf.data.Dataset objects, preprocessing text (e.g., adding task prefixes such as "translate German to English:"), and specifying evaluation metrics. Model implementations live in t5.models, which provides shims for Mesh TensorFlow (MeshTF, for TPU-based, large-scale experiments) and an experimental integration with Hugging Face Transformers for PyTorch/GPU usage.
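
For concreteness, here is a rough sketch of registering a custom task with t5.data, modeled on the pattern in the project's README and Colab. The task name, dataset function, and prefix are hypothetical, and the exact registry signature has shifted across releases (newer versions route task registration through seqio), so treat this as illustrative rather than copy-paste code.

```python
import t5
import tensorflow as tf

def my_dataset_fn(split, shuffle_files=False):
  # Hypothetical in-memory examples; a real task would read TSV/TFRecord
  # files and return a tf.data.Dataset of feature dictionaries.
  del shuffle_files
  examples = {
      "question": ["what is the capital of france?"],
      "answer": ["paris"],
  }
  return tf.data.Dataset.from_tensor_slices(examples)

def qa_preprocessor(ds):
  # Map each example into the text-to-text format: prepend a task prefix
  # to the input and use the answer text as the target.
  def to_inputs_and_targets(ex):
    return {
        "inputs": tf.strings.join(["answer the question: ", ex["question"]]),
        "targets": ex["answer"],
    }
  return ds.map(to_inputs_and_targets,
                num_parallel_calls=tf.data.experimental.AUTOTUNE)

t5.data.TaskRegistry.add(
    "my_qa_task",                    # hypothetical task name
    dataset_fn=my_dataset_fn,
    splits=["train", "validation"],
    text_preprocessor=[qa_preprocessor],
    metric_fns=[t5.evaluation.metrics.accuracy],
)
```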

Quick Start & Requirements

  • Installation: pip install t5[gcp]
  • Prerequisites: Reproducing the paper's experiments or training large models relies heavily on Google Cloud TPUs. Setting up TPUs on GCP involves launching a VM and configuring environment variables for the project, zone, bucket, and TPU details. The C4 dataset used for pre-training requires substantial bandwidth (roughly 7 TB of raw downloads) and compute to prepare; distributed processing via Google Cloud Dataflow is recommended. The Hugging Face integration is experimental and targets single GPUs (see the inference sketch after this list).
  • Links: Colab Tutorial, TFDS Beam instructions.
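
For a quick local sanity check without any GCP setup, a released T5 checkpoint can also be run directly through the Hugging Face Transformers library (this goes through the public "t5-small" checkpoint rather than this repo's experimental t5.models shim). A minimal sketch, assuming the transformers, torch, and sentencepiece packages are installed:

```python
# pip install transformers torch sentencepiece
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 expects a task prefix in the input text.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```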

Highlighted Details

  • Unified text-to-text framework for diverse NLP tasks.
  • Released pre-trained checkpoints ranging from T5-Small (60M parameters) to T5-11B (11B parameters).
  • t5.data provides a flexible system for task definition, data loading, and preprocessing.
  • Supports TensorFlow MeshTF for TPU-based training (a fine-tuning sketch follows this list) and an experimental Hugging Face Transformers integration for GPU usage.
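
As a sketch of the TPU workflow, fine-tuning a released checkpoint through the t5.models.MtfModel shim looks roughly like the pattern in the project's Colab. The bucket paths, TPU address, task name, and hyperparameters below are placeholders, and constructor arguments should be checked against the README for your installed version.

```python
import t5

MODEL_DIR = "gs://my-bucket/t5-small-finetune"           # placeholder bucket you own
PRETRAINED_DIR = "gs://t5-data/pretrained_models/small"  # released T5-Small checkpoint (verify path)
TPU_ADDRESS = "grpc://10.0.0.2:8470"                     # placeholder Cloud TPU address

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology="v3-8",
    model_parallelism=1,
    batch_size=16,
    sequence_length={"inputs": 128, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    iterations_per_loop=100,
)

# Fine-tune a registered task (e.g., the hypothetical "my_qa_task" from the
# earlier sketch), starting from the released T5-Small checkpoint.
model.finetune(
    mixture_or_task_name="my_qa_task",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=1000,
)
```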

Maintenance & Community

The TensorFlow/MeshTF implementation is no longer actively developed; users are directed to T5X. No specific community channels (like Discord or Slack) are mentioned in the README.

Licensing & Compatibility

The license type is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The primary caveat is the lack of active development for this TensorFlow/MeshTF codebase, with T5X being the recommended successor. Reproducing the paper's results is heavily geared towards a TPU-centric workflow on Google Cloud. The Hugging Face integration is noted as experimental and subject to change.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google (Top 0.1% on SourcePulse, 3k stars)
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago, updated 1 year ago