GPT2 by ConnorJL

GPT2 training implementation, supporting TPUs and GPUs

created 6 years ago
1,424 stars

Top 29.2% on sourcepulse

View on GitHub
Project Summary

This repository provides an implementation for training the GPT-2 language model, with a focus on supporting Tensor Processing Units (TPUs) alongside GPUs. It's designed for researchers and engineers looking to train or fine-tune GPT-2, offering flexibility in dataset handling and model configuration.

How It Works

The implementation uses TensorFlow and follows the GPT-2 architecture specifications. It supports training on both GPUs and TPUs, with specific instructions for data preparation using TFRecords. The configuration is managed via JSON files, allowing detailed control over model parameters, training hyperparameters, and data paths.
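A minimal sketch of what such a configuration might contain, shown as a Python dict for readability (the repository stores these as JSON files). The field names and values below are illustrative assumptions, not copied from the repository:

```python
import json

# Hypothetical GPT-2 training config; field names and values are
# illustrative assumptions, not taken verbatim from the repo's JSON files.
config = {
    "n_layer": 12,            # number of transformer blocks
    "n_head": 12,             # attention heads per block
    "n_embd": 768,            # embedding / hidden size
    "n_ctx": 1024,            # context window in tokens
    "train_batch_size": 8,
    "learning_rate": 2.5e-4,
    "optimizer": "adam",      # the README mentions Adam and Adafactor
    "dropout": 0.1,
    # TPU runs require datasets on Google Cloud Storage.
    "data_path": "gs://your-bucket/openwebtext-tfrecords",
    "model_path": "gs://your-bucket/checkpoints/117M",
}

# Serialize to the JSON format a training script would consume.
with open("my_config.json", "w") as f:
    json.dump(config, f, indent=2)
```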

Quick Start & Requirements

  • Install: pip3 install tensorflow-gpu regex (for GPUs) or pip3 install tensorflow regex google-api-python-client oauth2client (for TPUs). Additional packages (requests, tqdm, newspaper3k, ftfy) are needed for downloading models and generating datasets.
  • Prerequisites: TensorFlow, Python 3.x. TPU usage requires Google Cloud Storage for datasets.
  • Setup: Requires downloading pre-trained models and preparing a dataset (e.g., OpenWebText) in TFRecord format; see the sketch after this list. Dataset generation can be resource-intensive.
  • Links: Official GPT-2 paper
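As a rough illustration of the TFRecord preparation step, the sketch below serializes pre-tokenized documents into a TFRecord file using standard TensorFlow APIs. The feature key and overall schema are assumptions for illustration; the repository's own dataset scripts define the actual format:

```python
import tensorflow as tf

def write_tfrecord(token_ids_per_doc, path):
    """Serialize tokenized documents into a TFRecord file.

    `token_ids_per_doc` is a list of lists of ints; the feature key
    "text" is an illustrative assumption, not this repo's schema.
    """
    with tf.io.TFRecordWriter(path) as writer:
        for token_ids in token_ids_per_doc:
            feature = {
                "text": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=token_ids)
                )
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature)
            )
            writer.write(example.SerializeToString())

# Toy usage: two "documents" of already-tokenized IDs.
write_tfrecord([[15496, 995], [464, 2068, 7586, 21831]], "train_000.tfrecord")
```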

Highlighted Details

  • Supports training on TPUs (v2 and v3 pods) and GPUs.
  • Offers pre-trained models: "117M", "PrettyBig", and "1.5B".
  • Detailed instructions for preparing custom datasets in TFRecord format.
  • JSON-based configuration for model and training parameters.
  • Includes options for learning rate scheduling, optimizers (Adam, Adafactor), and dropout.
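For a sense of what the learning-rate and optimizer options control, here is a sketch using current TF2 Keras APIs, which differ from the TF1-era code this repository was written against; the specific schedule and values are assumptions:

```python
import tensorflow as tf

# Cosine learning-rate decay, one common schedule a config might select;
# the initial rate and step count here are illustrative assumptions.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=2.5e-4,
    decay_steps=100_000,
)

# The README lists Adam and Adafactor as optimizer options;
# Adam is shown since it is built into Keras.
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```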

Maintenance & Community

  • The project is maintained by ConnorJL. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. It mentions that the implementation is not the official OpenAI GPT-2.

Limitations & Caveats

The author notes that this implementation has not replicated the full performance of the original OpenAI GPT-2 model, for reasons that remain unknown. Prediction is not supported on TPUs, and evaluation must be disabled (commented out) when training on TPU pods. The dataset scripts are described as "hacky" and may require adaptation for custom corpora.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI (8k stars)
GPT-2/3-style model implementation using mesh-tensorflow
created 5 years ago, updated 3 years ago