text-to-text-transfer-transformer by google-research

Unified text-to-text transformer for NLP research

Created 6 years ago
6,425 stars

Top 8.0% on SourcePulse

View on GitHub
Summary

This repository provides the TensorFlow/MeshTF code for the T5 (Text-To-Text Transfer Transformer) model, enabling unified text-to-text processing across various NLP tasks. It's primarily intended for reproducing experiments from the original paper. However, the project explicitly recommends using T5X (built with JAX/Flax) for new development, as the TensorFlow implementation is no longer actively maintained.

How It Works

T5 frames every NLP task as a text-to-text problem: the model takes text as input and produces text as output. This unified approach simplifies model architecture and training pipelines. The t5.data module handles task definition: loading datasets as tf.data.Dataset objects, preprocessing text (e.g., adding task prefixes such as "translate German to English:"), and specifying evaluation metrics. Model implementations live in t5.models, which provides shims for Mesh TensorFlow (MeshTF, for TPU-based, large-scale experiments) and an experimental integration with Hugging Face Transformers for PyTorch/GPU usage.
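
For concreteness, here is a rough sketch of registering a custom task with t5.data, modeled on the pattern in the project's README and Colab. The task name, dataset function, and prefix are hypothetical, and the exact registry signature has shifted across releases (newer versions route task registration through seqio), so treat this as illustrative rather than copy-paste code.

```python
import t5
import tensorflow as tf

def my_dataset_fn(split, shuffle_files=False):
  # Hypothetical in-memory examples; a real task would read TSV/TFRecord
  # files and return a tf.data.Dataset of feature dictionaries.
  del shuffle_files
  examples = {
      "question": ["what is the capital of france?"],
      "answer": ["paris"],
  }
  return tf.data.Dataset.from_tensor_slices(examples)

def qa_preprocessor(ds):
  # Map each example into the text-to-text format: prepend a task prefix
  # to the input and use the answer text as the target.
  def to_inputs_and_targets(ex):
    return {
        "inputs": tf.strings.join(["answer the question: ", ex["question"]]),
        "targets": ex["answer"],
    }
  return ds.map(to_inputs_and_targets,
                num_parallel_calls=tf.data.experimental.AUTOTUNE)

t5.data.TaskRegistry.add(
    "my_qa_task",                    # hypothetical task name
    dataset_fn=my_dataset_fn,
    splits=["train", "validation"],
    text_preprocessor=[qa_preprocessor],
    metric_fns=[t5.evaluation.metrics.accuracy],
)
```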

Quick Start & Requirements

  • Installation: pip install t5[gcp]
  • Prerequisites: Reproducing the paper's experiments or training large models relies heavily on Google Cloud TPUs. Setting up TPUs on GCP involves launching a VM and configuring environment variables for the project, zone, bucket, and TPU details. The C4 dataset used for pre-training requires substantial bandwidth (roughly 7 TB of raw downloads) and compute to prepare; distributed processing via Google Cloud Dataflow is recommended. The Hugging Face integration is experimental and targets single GPUs (see the inference sketch after this list).
  • Links: Colab Tutorial, TFDS Beam instructions.
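
For a quick local sanity check without any GCP setup, a released T5 checkpoint can also be run directly through the Hugging Face Transformers library (this goes through the public "t5-small" checkpoint rather than this repo's experimental t5.models shim). A minimal sketch, assuming the transformers, torch, and sentencepiece packages are installed:

```python
# pip install transformers torch sentencepiece
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 expects a task prefix in the input text.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```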

Highlighted Details

  • Unified text-to-text framework for diverse NLP tasks.
  • Released pre-trained checkpoints ranging from T5-Small (60M parameters) to T5-11B (11B parameters).
  • t5.data provides a flexible system for task definition, data loading, and preprocessing.
  • Supports TensorFlow MeshTF for TPU-based training (a fine-tuning sketch follows this list) and an experimental Hugging Face Transformers integration for GPU usage.
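
As a sketch of the TPU workflow, fine-tuning a released checkpoint through the t5.models.MtfModel shim looks roughly like the pattern in the project's Colab. The bucket paths, TPU address, task name, and hyperparameters below are placeholders, and constructor arguments should be checked against the README for your installed version.

```python
import t5

MODEL_DIR = "gs://my-bucket/t5-small-finetune"           # placeholder bucket you own
PRETRAINED_DIR = "gs://t5-data/pretrained_models/small"  # released T5-Small checkpoint (verify path)
TPU_ADDRESS = "grpc://10.0.0.2:8470"                     # placeholder Cloud TPU address

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=TPU_ADDRESS,
    tpu_topology="v3-8",
    model_parallelism=1,
    batch_size=16,
    sequence_length={"inputs": 128, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    iterations_per_loop=100,
)

# Fine-tune a registered task (e.g., the hypothetical "my_qa_task" from the
# earlier sketch), starting from the released T5-Small checkpoint.
model.finetune(
    mixture_or_task_name="my_qa_task",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=1000,
)
```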

Maintenance & Community

The TensorFlow/MeshTF implementation is no longer actively developed; users are directed to T5X. No specific community channels (like Discord or Slack) are mentioned in the README.

Licensing & Compatibility

The license type is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The primary caveat is the lack of active development for this TensorFlow/MeshTF codebase, with T5X being the recommended successor. Reproducing the paper's results is heavily geared towards a TPU-centric workflow on Google Cloud. The Hugging Face integration is noted as experimental and subject to change.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days

Explore Similar Projects

Starred by Shengjia Zhao (Chief Scientist at Meta Superintelligence Lab), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 14 more.

BIG-bench by google (Top 0.1% on SourcePulse, 3k stars)
Collaborative benchmark for probing and extrapolating LLM capabilities
Created 4 years ago, updated 1 year ago