gpt-2-simple by minimaxir

Python package for GPT-2 text generation model fine-tuning

Created 6 years ago
3,407 stars

Top 14.2% on SourcePulse

Project Summary

This package provides a simplified Python interface for fine-tuning and generating text with OpenAI's GPT-2 models (primarily the 124M- and 355M-parameter variants). It is designed for users who want to adapt GPT-2 to custom text-generation tasks with minimal setup, offering straightforward fine-tuning and generation.

How It Works

gpt-2-simple combines model-management code from OpenAI's GPT-2 repository, fine-tuning code from Neil Shepperd's fork, and text-generation output management from textgenrnn. It streamlines the workflow by handling model downloads and TensorFlow session management, and it exposes both a Python API and a command-line interface. The design prioritizes ease of use, with explicit handling of document start/end tokens for more contextually coherent generation.
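
A rough sketch of that workflow, based on the package's documented Python API (shakespeare.txt is a hypothetical training corpus):

  import os
  import gpt_2_simple as gpt2

  model_name = "124M"
  if not os.path.isdir(os.path.join("models", model_name)):
      gpt2.download_gpt2(model_name=model_name)   # saves the model under ./models/124M/

  file_name = "shakespeare.txt"   # hypothetical plain-text training corpus

  sess = gpt2.start_tf_sess()
  gpt2.finetune(sess,
                file_name,
                model_name=model_name,
                steps=1000)       # checkpoints are written under ./checkpoint/run1/

  gpt2.generate(sess)             # prints a sample from the fine-tuned model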

Quick Start & Requirements

  • Install via pip: pip3 install gpt-2-simple
  • Requires TensorFlow 2.x (minimum 2.5.1).
  • A GPU is strongly recommended for fine-tuning; a CPU works for generation, but more slowly (see the sketch after this list).
  • See the Colaboratory notebook for a demo.
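
A minimal generation sketch, assuming a fine-tuned checkpoint already exists under checkpoint/run1 from an earlier run:

  import gpt_2_simple as gpt2

  sess = gpt2.start_tf_sess()
  gpt2.load_gpt2(sess, run_name="run1")   # load fine-tuned weights from ./checkpoint/run1

  # return_as_list=True returns the text instead of printing it
  text = gpt2.generate(sess, run_name="run1", return_as_list=True)[0]
  print(text)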

Highlighted Details

  • Supports fine-tuning on custom text datasets.
  • Allows generation with prefixes and truncation for controlled output.
  • Offers parallel generation via batch_size for faster results on GPUs.
  • Can pre-encode large text datasets into a compressed .npz file for faster loading during fine-tuning (see the sketch after this list).
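
A sketch of those generation and pre-encoding options, assuming a session with a loaded model as above (parameter values and file names are illustrative):

  # Generate five samples in parallel, each starting from a prefix and
  # trimmed at the document end token.
  gpt2.generate(sess,
                length=200,
                temperature=0.7,
                prefix="LORD",
                truncate="<|endoftext|>",
                nsamples=5,
                batch_size=5)   # nsamples must be divisible by batch_size

  # Pre-encode a large corpus once; the resulting .npz can be passed to finetune().
  gpt2.encode_dataset("large_corpus.txt")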

Maintenance & Community

  • Developed by Max Woolf (@minimaxir).
  • Project development has been largely superseded by aitextgen, which offers similar capabilities with improved efficiency. Checkpoints from gpt-2-simple are compatible with aitextgen.
  • The creator's Patreon is mentioned as a way to support development.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

GPT-2 can generate at most 1024 tokens per request and cannot stop generation early at a specific end token; output must instead be trimmed afterward with the truncate parameter (see the sketch below). Fine-tuning the larger GPT-2 models (774M and 1558M) may require more advanced GPU configurations or may not work out of the box.
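
A sketch of that trimming workaround, assuming the training data was wrapped in <|startoftext|> / <|endoftext|> markers:

  # Generation always runs to the requested length; trim at the end token
  # afterward and drop the start-token prefix from the returned text.
  gpt2.generate(sess,
                prefix="<|startoftext|>",
                truncate="<|endoftext|>",
                include_prefix=False,
                length=300)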

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
