pytorch-openai-transformer-lm by huggingface

PyTorch implementation of OpenAI's Transformer LM

created 7 years ago
1,512 stars

Top 27.9% on sourcepulse

View on GitHub
Project Summary

This repository provides a PyTorch implementation of OpenAI's fine-tuned transformer language model (Radford et al., "Improving Language Understanding by Generative Pre-Training"), letting users load the released pre-trained weights for language-understanding tasks. It targets researchers and practitioners familiar with transformer architectures and PyTorch, and is a direct translation of OpenAI's TensorFlow code, which eases adoption and experimentation.

How It Works

The implementation closely mirrors OpenAI's original TensorFlow code, down to the modified Adam optimizer with fixed weight decay and a scheduled learning rate. It exposes a TransformerModel class that produces the transformer's hidden states, plus an LMHead for language modeling and a ClfHead for classification, so users can stack a decoder or a classifier on top of those hidden states.
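
A minimal usage sketch, adapted from the snippet in the repository's README; the model_pytorch module, DEFAULT_CONFIG, and the head signatures are taken from the repo but should be checked against model_pytorch.py:

```python
from model_pytorch import (TransformerModel, LMHead,
                           load_openai_pretrained_model, DEFAULT_CONFIG)

args = DEFAULT_CONFIG
model = TransformerModel(args)        # yields hidden states only
load_openai_pretrained_model(model)   # copy OpenAI's pre-trained weights in

lm_head = LMHead(model, args)         # decoder tied to the encoder embeddings
# hidden = model(x); lm_logits = lm_head(hidden)
```

ClfHead is attached the same way when the goal is classification rather than generation.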

Quick Start & Requirements

  • Install: pip install torch (PyTorch >= 0.4)
  • Additional for training: tqdm, sklearn, spacy, ftfy, pandas
  • Weights: Clone OpenAI's finetune-transformer-lm repository and copy its model folder (the pre-trained weights) into this project.
  • Dataset Encoding: Use encode_dataset() from utils.py (see the sketch after this list).
  • Example Usage: See __main__ in train.py.
  • ROCStories Fine-tuning: python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path]
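
As a rough illustration of the encoding step, here is a sketch assembled from the repo's layout; the helper names (TextEncoder, rocstories) and the BPE file paths are assumptions to verify against text_utils.py, datasets.py, and train.py:

```python
from text_utils import TextEncoder   # BPE tokenizer bundled with the repo
from utils import encode_dataset
from datasets import rocstories      # ROCStories loader used by train.py

# Paths assume OpenAI's released "model" folder was copied locally.
text_encoder = TextEncoder("model/encoder_bpe_40000.json",
                           "model/vocab_40000.bpe")

# encode_dataset() BPE-encodes every text field of each split it is given.
train_split, val_split, test_split = encode_dataset(
    *rocstories("data/"), encoder=text_encoder)
```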

Highlighted Details

  • Reproduces OpenAI's ROCStories Cloze Test result (85.84% accuracy on a single GPU).
  • Fine-tuning on ROCStories takes ~10 minutes on a single NVIDIA K80 GPU.
  • Supports adding LM heads (with encoder and decoder weights tied; see the sketch after this list) and classifier heads.
  • Includes a script to import OpenAI's pre-trained TensorFlow weights.
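
"Tied" here means the LM head's output projection reuses the token-embedding matrix, the standard PyTorch weight-sharing pattern. A generic sketch of the idea (illustrative, not the repository's exact LMHead code):

```python
import torch.nn as nn

class TiedDecoder(nn.Module):
    """Output projection that shares its weight matrix with the token embedding."""
    def __init__(self, embed: nn.Embedding):
        super().__init__()
        n_vocab, n_embd = embed.weight.shape
        self.decoder = nn.Linear(n_embd, n_vocab, bias=False)
        self.decoder.weight = embed.weight  # same Parameter: one matrix, one gradient

    def forward(self, hidden_states):
        return self.decoder(hidden_states)  # logits over the vocabulary
```

Tying removes a vocabulary-sized weight matrix from the parameter count and tends to act as a regularizer on smaller language models.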

Maintenance & Community

No specific community links or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The implementation targets single-GPU training, which limits batch size and may cost some accuracy relative to multi-GPU setups. The README does not mention support for newer PyTorch versions or for hardware accelerators other than GPUs.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

Top 0.4% on sourcepulse · 258 stars · created 1 year ago · updated 1 week ago