gpt-2-simple by minimaxir

Python package for GPT-2 text generation model fine-tuning

created 6 years ago
3,407 stars

Top 14.6% on sourcepulse

Project Summary

This package provides a simplified Python interface for fine-tuning and generating text with OpenAI's GPT-2 models (124M and 355M parameters). It is designed for users who want to adapt GPT-2 to their own text for custom generation tasks with minimal setup.

How It Works

gpt-2-simple builds on existing fine-tuning and generation scripts from OpenAI's GPT-2 repository and Neil Shepperd's fork, along with textgenrnn for output management. It streamlines the workflow by handling model downloads and TensorFlow session management, and by exposing both a Python API and a command-line interface. The design prioritizes ease of use for fine-tuning and generation, including specific handling of document start/end tokens for better contextual generation.
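
A minimal end-to-end sketch of that workflow using the Python API. The function calls follow the package's documented interface; the corpus file name and step count are placeholders:

    import os
    import gpt_2_simple as gpt2

    model_name = "124M"
    if not os.path.isdir(os.path.join("models", model_name)):
        gpt2.download_gpt2(model_name=model_name)   # downloads weights to ./models/124M/

    sess = gpt2.start_tf_sess()                     # TensorFlow session managed by the package

    # Fine-tune on a plain-text corpus (placeholder file name);
    # checkpoints are written to ./checkpoint/run1/ by default
    gpt2.finetune(sess, "corpus.txt", model_name=model_name, steps=1000)

    # Generate samples from the fine-tuned model
    gpt2.generate(sess)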

Quick Start & Requirements

  • Install via pip: pip3 install gpt-2-simple
  • Requires TensorFlow 2.X (min 2.5.1).
  • A GPU is strongly recommended for fine-tuning; a CPU can be used for generation (as in the sketch after this list) but is slower.
  • See the Colaboratory notebook for a demo.
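
For generation on a machine without a GPU (or in a later session), a previously fine-tuned checkpoint can be reloaded rather than re-trained. A short sketch, assuming the default run name run1 produced by finetune:

    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess, run_name="run1")   # loads ./checkpoint/run1/ into the session
    gpt2.generate(sess, run_name="run1", length=200, temperature=0.7, nsamples=3)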

Highlighted Details

  • Supports fine-tuning on custom text datasets.
  • Allows generation with prefixes and truncation for controlled output.
  • Offers parallel generation via batch_size for faster results on GPUs.
  • Can pre-encode large text datasets so they load faster when fine-tuning (see the sketch after this list).
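
A sketch combining several of these options. The generate parameters (prefix, nsamples, batch_size, length) are part of the package's documented API; the file names, values, and the out_path argument to encode_dataset are assumptions to verify against the README:

    import gpt_2_simple as gpt2

    # Pre-encode a large corpus once; the compressed .npz can be passed to finetune()
    gpt2.encode_dataset("large_corpus.txt", out_path="large_corpus.npz")

    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, "large_corpus.npz", model_name="124M", steps=1000)

    # Generate 5 samples in parallel on the GPU, each starting from a prefix
    gpt2.generate(sess,
                  prefix="Once upon a time",
                  nsamples=5,
                  batch_size=5,
                  length=300)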

Maintenance & Community

  • Developed by Max Woolf (@minimaxir).
  • The project has largely been superseded by aitextgen, which offers similar capabilities with improved efficiency; checkpoints from gpt-2-simple are compatible with aitextgen.
  • The creator can be supported via Patreon.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

GPT-2 can generate at most 1024 tokens per request and cannot stop generating early when it reaches a specific end token; the truncate parameter works around this by trimming each sample at a given token after generation finishes. Fine-tuning the larger GPT-2 models (774M, 1558M) may require more advanced GPU configurations or may not work out of the box.
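
A brief sketch of that truncate workaround for cutting output at a document-end token; the length value is arbitrary and should be kept small to avoid wasted generation:

    import gpt_2_simple as gpt2

    sess = gpt2.start_tf_sess()
    gpt2.load_gpt2(sess)   # default run_name is run1
    # The model still generates up to `length` tokens; the returned text is simply
    # cut at the first <|endoftext|> token rather than generation stopping early.
    gpt2.generate(sess, truncate="<|endoftext|>", length=256)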

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

Minimal GPT-2 implementation for educational purposes
393 stars (Top 0.5%)
created 1 year ago, updated 1 year ago