GPT2 training implementation, supporting TPUs and GPUs
This repository provides an implementation for training the GPT-2 language model, with a focus on supporting Tensor Processing Units (TPUs) alongside GPUs. It's designed for researchers and engineers looking to train or fine-tune GPT-2, offering flexibility in dataset handling and model configuration.
How It Works
The implementation uses TensorFlow and follows the GPT-2 architecture specifications. It supports training on both GPUs and TPUs, with specific instructions for data preparation using TFRecords. The configuration is managed via JSON files, allowing detailed control over model parameters, training hyperparameters, and data paths.
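To make this concrete, here is a minimal sketch of the two inputs such a setup consumes: a JSON config file and a TFRecord dataset of token ids. The key names used here ("n_ctx", "data_path", the "text" feature, and the file names) are illustrative assumptions, not the repository's actual schema.

```python
# Sketch (assumed schema, not the repo's exact one): write a hypothetical
# JSON training config and a TFRecord file of pre-tokenized sequences.
import json

import tensorflow as tf

# Hypothetical config: model parameters, hyperparameters, and data paths.
config = {
    "n_ctx": 1024,         # context window
    "n_embd": 768,         # embedding width (GPT-2 small)
    "n_head": 12,
    "n_layer": 12,
    "learning_rate": 1e-4,
    "batch_size": 8,
    "data_path": "gs://my-bucket/train-*.tfrecords",  # TPUs read from GCS
}
with open("small_model.json", "w") as f:
    json.dump(config, f, indent=2)

# Hypothetical dataset writer: each record stores one sequence of token
# ids as a single int64-list feature.
def write_tfrecord(path, sequences):
    with tf.io.TFRecordWriter(path) as writer:
        for tokens in sequences:
            example = tf.train.Example(features=tf.train.Features(feature={
                "text": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=tokens)),
            }))
            writer.write(example.SerializeToString())

write_tfrecord("train-000.tfrecords", [[15496, 995], [50256, 318]])
```

Storing each sequence as one int64-list feature keeps the records simple to parse back with tf.io.parse_single_example at training time.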
Quick Start & Requirements
For GPUs: pip3 install tensorflow-gpu regex
For TPUs: pip3 install tensorflow regex google-api-python-client oauth2client
Additional packages (requests, tqdm, newspaper3k, ftfy) are needed for downloading models and generating datasets.

Highlighted Details
Maintenance & Community
The repository was last updated 2 years ago and is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The author notes that this implementation has not replicated the full performance of the original OpenAI GPT-2 model, for reasons unknown. Prediction is not supported on TPUs, and evaluation must be commented out when running on TPU pods. The dataset scripts are described as "hacky" and may require adaptation.