GPT2 by ConnorJL

GPT2 training implementation, supporting TPUs and GPUs

created 6 years ago
1,424 stars

Top 29.2% on sourcepulse

View on GitHub
Project Summary

This repository provides an implementation for training the GPT-2 language model, with a focus on supporting Tensor Processing Units (TPUs) alongside GPUs. It's designed for researchers and engineers looking to train or fine-tune GPT-2, offering flexibility in dataset handling and model configuration.

How It Works

The implementation uses TensorFlow and follows the GPT-2 architecture specifications. It supports training on both GPUs and TPUs, with specific instructions for data preparation using TFRecords. The configuration is managed via JSON files, allowing detailed control over model parameters, training hyperparameters, and data paths.
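A minimal sketch of what such a configuration might contain, shown as a Python dict for readability (the repository stores these as JSON files). The field names and values below are illustrative assumptions, not copied from the repository:

```python
import json

# Hypothetical GPT-2 training config; field names and values are
# illustrative assumptions, not taken verbatim from the repo's JSON files.
config = {
    "n_layer": 12,            # number of transformer blocks
    "n_head": 12,             # attention heads per block
    "n_embd": 768,            # embedding / hidden size
    "n_ctx": 1024,            # context window in tokens
    "train_batch_size": 8,
    "learning_rate": 2.5e-4,
    "optimizer": "adam",      # the README mentions Adam and Adafactor
    "dropout": 0.1,
    # TPU runs require datasets on Google Cloud Storage.
    "data_path": "gs://your-bucket/openwebtext-tfrecords",
    "model_path": "gs://your-bucket/checkpoints/117M",
}

# Serialize to the JSON format a training script would consume.
with open("my_config.json", "w") as f:
    json.dump(config, f, indent=2)
```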

Quick Start & Requirements

  • Install: pip3 install tensorflow-gpu regex (for GPUs) or pip3 install tensorflow regex google-api-python-client oauth2client (for TPUs). Additional packages (requests, tqdm, newspaper3k, ftfy) are needed for downloading models and generating datasets.
  • Prerequisites: TensorFlow, Python 3.x. TPU usage requires Google Cloud Storage for datasets.
  • Setup: Requires downloading pre-trained models and preparing a dataset (e.g., OpenWebText) in TFRecord format; see the sketch after this list. Dataset generation can be resource-intensive.
  • Links: Official GPT-2 paper
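As a rough illustration of the TFRecord preparation step, the sketch below serializes pre-tokenized documents into a TFRecord file using standard TensorFlow APIs. The feature key and overall schema are assumptions for illustration; the repository's own dataset scripts define the actual format:

```python
import tensorflow as tf

def write_tfrecord(token_ids_per_doc, path):
    """Serialize tokenized documents into a TFRecord file.

    `token_ids_per_doc` is a list of lists of ints; the feature key
    "text" is an illustrative assumption, not this repo's schema.
    """
    with tf.io.TFRecordWriter(path) as writer:
        for token_ids in token_ids_per_doc:
            feature = {
                "text": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=token_ids)
                )
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature)
            )
            writer.write(example.SerializeToString())

# Toy usage: two "documents" of already-tokenized IDs.
write_tfrecord([[15496, 995], [464, 2068, 7586, 21831]], "train_000.tfrecord")
```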

Highlighted Details

  • Supports training on TPUs (v2 and v3 pods) and GPUs.
  • Offers pre-trained models: "117M", "PrettyBig", and "1.5B".
  • Detailed instructions for preparing custom datasets in TFRecord format.
  • JSON-based configuration for model and training parameters.
  • Includes options for learning rate scheduling, optimizers (Adam, Adafactor), and dropout.
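For a sense of what the learning-rate and optimizer options control, here is a sketch using current TF2 Keras APIs, which differ from the TF1-era code this repository was written against; the specific schedule and values are assumptions:

```python
import tensorflow as tf

# Cosine learning-rate decay, one common schedule a config might select;
# the initial rate and step count here are illustrative assumptions.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=2.5e-4,
    decay_steps=100_000,
)

# The README lists Adam and Adafactor as optimizer options;
# Adam is shown since it is built into Keras.
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```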

Maintenance & Community

  • The project is maintained by ConnorJL. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. It mentions that the implementation is not the official OpenAI GPT-2.

Limitations & Caveats

The author notes that this implementation has not replicated the full performance of the original OpenAI GPT-2 model, for reasons that remain unknown. Prediction is not supported on TPUs, and evaluation must be disabled (commented out) when training on TPU pods. The dataset scripts are described as "hacky" and may require adaptation for custom corpora.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 5 more.

gpt-neo by EleutherAI (8k stars)
GPT-2/3-style model implementation using mesh-tensorflow
created 5 years ago, updated 3 years ago