gpt-2-simple by minimaxir

Python package for GPT-2 text generation model fine-tuning

created 6 years ago
3,407 stars

Top 14.6% on sourcepulse

Project Summary

This package provides a simplified Python interface for fine-tuning and generating text with OpenAI's GPT-2 models (124M and 355M parameters). It is designed for users who want to adapt GPT-2 to their own text for custom generation tasks with minimal setup.

How It Works

gpt-2-simple builds on existing fine-tuning and generation scripts from OpenAI's GPT-2 repository and Neil Shepperd's fork, along with textgenrnn for output management. It streamlines the workflow by handling model downloads and TensorFlow session management, and by exposing both a Python API and a command-line interface. The design prioritizes ease of use for fine-tuning and generation, including specific handling of document start/end tokens for better contextual generation.
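
A minimal end-to-end sketch of that workflow using the Python API. The function calls follow the package's documented interface; the corpus file name and step count are placeholders:

    import os
    import gpt_2_simple as gpt2

    model_name = "124M"
    if not os.path.isdir(os.path.join("models", model_name)):
        gpt2.download_gpt2(model_name=model_name)   # downloads weights to ./models/124M/

    sess = gpt2.start_tf_sess()                     # TensorFlow session managed by the package

    # Fine-tune on a plain-text corpus (placeholder file name);
    # checkpoints are written to ./checkpoint/run1/ by default
    gpt2.finetune(sess, "corpus.txt", model_name=model_name, steps=1000)

    # Generate samples from the fine-tuned model
    gpt2.generate(sess)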

Quick Start & Requirements

  • Install via pip: pip3 install gpt-2-simple
  • Requires TensorFlow 2.X (min 2.5.1).
  • A GPU is strongly recommended for fine-tuning; a CPU can be used for generation (as in the sketch after this list) but is slower.
  • See the Colaboratory notebook for a demo.
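
For generation on a machine without a GPU (or in a later session), a previously fine-tuned checkpoint can be reloaded rather than re-trained. A short sketch, assuming the default run name run1 produced by finetune:

    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name="run1")   # loads ./checkpoint/run1/ into the session
    gpt2.generate(sess, run_name="run1", length=200, temperature=0.7, nsamples=3)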

Highlighted Details

  • Supports fine-tuning on custom text datasets.
  • Allows generation with prefixes and truncation for controlled output.
  • Offers parallel generation via batch_size for faster results on GPUs.
  • Can pre-encode large text datasets so they load faster when fine-tuning (see the sketch after this list).
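
A sketch combining several of these options. The generate parameters (prefix, nsamples, batch_size, length) are part of the package's documented API; the file names, values, and the out_path argument to encode_dataset are assumptions to verify against the README:

    import gpt_2_simple as gpt2

    # Pre-encode a large corpus once; the compressed .npz can be passed to finetune()
    gpt2.encode_dataset("large_corpus.txt", out_path="large_corpus.npz")

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, "large_corpus.npz", model_name="124M", steps=1000)

    # Generate 5 samples in parallel on the GPU, each starting from a prefix
    gpt2.generate(sess,
                  prefix="Once upon a time",
                  nsamples=5,
                  batch_size=5,
                  length=300)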

Maintenance & Community

  • Developed by Max Woolf (@minimaxir).
  • The project has largely been superseded by aitextgen, which offers similar capabilities with improved efficiency; checkpoints from gpt-2-simple are compatible with aitextgen.
  • The creator can be supported via Patreon.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

GPT-2 can generate at most 1024 tokens per request and cannot stop generating early when it reaches a specific end token; the truncate parameter works around this by trimming each sample at a given token after generation finishes. Fine-tuning the larger GPT-2 models (774M, 1558M) may require more advanced GPU configurations or may not work out of the box.
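
A brief sketch of that truncate workaround for cutting output at a document-end token; the length value is arbitrary and should be kept small to avoid wasted generation:

    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess)   # default run_name is run1
    # The model still generates up to `length` tokens; the returned text is simply
    # cut at the first <|endoftext|> token rather than generation stopping early.
    gpt2.generate(sess, truncate="<|endoftext|>", length=256)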

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

Minimal GPT-2 implementation for educational purposes
393 stars (Top 0.5%)
created 1 year ago, updated 1 year ago