pytorch-openai-transformer-lm by huggingface

PyTorch implementation of OpenAI's Transformer LM

created 7 years ago
1,512 stars

Top 27.9% on sourcepulse

View on GitHub
Project Summary

This repository provides a PyTorch implementation of OpenAI's fine-tuned transformer language model (Radford et al., "Improving Language Understanding by Generative Pre-Training"), letting users load the released pre-trained weights for language-understanding tasks. It targets researchers and practitioners familiar with transformer architectures and PyTorch, and is a direct translation of OpenAI's TensorFlow code, which eases adoption and experimentation.

How It Works

The implementation closely mirrors OpenAI's original TensorFlow code, down to the modified Adam optimizer with fixed weight decay and a scheduled learning rate. It exposes a TransformerModel class that produces the transformer's hidden states, plus an LMHead for language modeling and a ClfHead for classification, so users can stack a decoder or a classifier on top of those hidden states.
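
A minimal usage sketch, adapted from the snippet in the repository's README; the model_pytorch module, DEFAULT_CONFIG, and the head signatures are taken from the repo but should be checked against model_pytorch.py:

```python
from model_pytorch import (TransformerModel, LMHead,
                           load_openai_pretrained_model, DEFAULT_CONFIG)

args = DEFAULT_CONFIG
model = TransformerModel(args)        # yields hidden states only
load_openai_pretrained_model(model)   # copy OpenAI's pre-trained weights in

lm_head = LMHead(model, args)         # decoder tied to the encoder embeddings
# hidden = model(x); lm_logits = lm_head(hidden)
```

ClfHead is attached the same way when the goal is classification rather than generation.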

Quick Start & Requirements

  • Install: pip install torch (PyTorch >= 0.4)
  • Additional for training: tqdm, sklearn, spacy, ftfy, pandas
  • Weights: Clone OpenAI's finetune-transformer-lm repository and copy its model folder (the pre-trained weights) into this project.
  • Dataset Encoding: Use encode_dataset() from utils.py (see the sketch after this list).
  • Example Usage: See __main__ in train.py.
  • ROCStories Fine-tuning: python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path]
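
As a rough illustration of the encoding step, here is a sketch assembled from the repo's layout; the helper names (TextEncoder, rocstories) and the BPE file paths are assumptions to verify against text_utils.py, datasets.py, and train.py:

```python
from text_utils import TextEncoder   # BPE tokenizer bundled with the repo
from utils import encode_dataset
from datasets import rocstories      # ROCStories loader used by train.py

# Paths assume OpenAI's released "model" folder was copied locally.
text_encoder = TextEncoder("model/encoder_bpe_40000.json",
                           "model/vocab_40000.bpe")

# encode_dataset() BPE-encodes every text field of each split it is given.
train_split, val_split, test_split = encode_dataset(
    *rocstories("data/"), encoder=text_encoder)
```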

Highlighted Details

  • Reproduces OpenAI's ROCStories Cloze Test result (85.84% accuracy on a single GPU).
  • Fine-tuning on ROCStories takes ~10 minutes on a single NVIDIA K80 GPU.
  • Supports adding LM heads (with encoder and decoder weights tied; see the sketch after this list) and classifier heads.
  • Includes a script to import OpenAI's pre-trained TensorFlow weights.
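
"Tied" here means the LM head's output projection reuses the token-embedding matrix, the standard PyTorch weight-sharing pattern. A generic sketch of the idea (illustrative, not the repository's exact LMHead code):

```python
import torch.nn as nn

class TiedDecoder(nn.Module):
    """Output projection that shares its weight matrix with the token embedding."""
    def __init__(self, embed: nn.Embedding):
        super().__init__()
        n_vocab, n_embd = embed.weight.shape
        self.decoder = nn.Linear(n_embd, n_vocab, bias=False)
        self.decoder.weight = embed.weight  # same Parameter: one matrix, one gradient

    def forward(self, hidden_states):
        return self.decoder(hidden_states)  # logits over the vocabulary
```

Tying removes a vocabulary-sized weight matrix from the parameter count and tends to act as a regularizer on smaller language models.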

Maintenance & Community

No specific community links or active maintenance signals are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The implementation targets single-GPU training, which limits batch size and may cost some accuracy relative to multi-GPU setups. The README does not mention support for newer PyTorch versions or for hardware accelerators other than GPUs.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

Top 0.4% on sourcepulse · 258 stars · created 1 year ago · updated 1 week ago