GPT2 by affjljoo3581

PyTorch implementation for GPT-2 model training and inference

created 5 years ago
337 stars

Top 82.8% on sourcepulse

View on GitHub
Project Summary

This project provides a PyTorch implementation of OpenAI's GPT-2 language model, targeting researchers and developers interested in training, fine-tuning, and deploying GPT-2 for text generation tasks. It offers a comprehensible and optimized codebase for unsupervised multitask learning.

How It Works

The implementation focuses on the core GPT-2 architecture components, enabling users to train models from scratch on custom corpora or fine-tune existing checkpoints. It supports standard training loops, evaluation metrics, and text generation with nucleus sampling. Performance optimizations include gradient checkpointing, plus optional automatic mixed-precision (AMP) training and fused CUDA layers via NVIDIA Apex.
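
As an illustration of the decoding step, here is a minimal nucleus (top-p) sampling sketch in PyTorch over a single logits vector; the function name and default threshold are illustrative and not this repository's API.

    import torch

    def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> int:
        """Sample one token id from a 1-D logits vector with nucleus (top-p) sampling.

        Illustrative sketch only; the repository's own sampler may differ in details.
        """
        probs = torch.softmax(logits, dim=-1)
        sorted_probs, sorted_ids = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)

        # Keep the smallest prefix of tokens whose cumulative probability reaches top_p.
        cutoff = int(torch.searchsorted(cumulative, top_p).item()) + 1
        kept = sorted_probs[:cutoff]
        kept = kept / kept.sum()  # renormalize over the nucleus

        choice = torch.multinomial(kept, num_samples=1)
        return int(sorted_ids[choice].item())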

Quick Start & Requirements

  • Install: pip install -r requirements.txt if a requirements file is provided; otherwise install the dependencies listed below manually (see the sketch after this list).
  • Prerequisites: PyTorch, NumPy, Matplotlib, Regex, Tqdm. NVIDIA Apex is recommended for AMP and fused CUDA layers.
  • Training: Requires tokenized training/evaluation datasets and a vocabulary file.
  • Demo: Google Colab notebooks are available for text generation and evaluation, including a specific notebook for a Korean GPT-2 model.
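
If no requirements file ships with the repository, the prerequisites listed above can be installed directly from PyPI; NVIDIA Apex is optional and is built separately from https://github.com/NVIDIA/apex.

    pip install torch numpy matplotlib regex tqdm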

Highlighted Details

  • Supports training from scratch or resuming from checkpoints.
  • Offers text generation with nucleus sampling.
  • Includes model evaluation and metric visualization.
  • Optional integration with NVIDIA Apex for performance gains (AMP, fused CUDA layers).
  • Configurable model parameters (layers, heads, dimensions, sequence length).
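
As a sketch of the kind of hyperparameters such a configuration exposes, the values below correspond to the publicly documented GPT-2 "small" model; the field names are illustrative and not necessarily this repository's argument names.

    from dataclasses import dataclass

    @dataclass
    class GPT2Config:
        # Example values follow the public GPT-2 "small" configuration;
        # field names are illustrative, not this repository's CLI flags.
        layers: int = 12          # number of transformer blocks
        heads: int = 12           # attention heads per block
        dims: int = 768           # hidden (embedding) dimension
        seq_len: int = 1024       # maximum sequence length
        vocab_size: int = 50257   # byte-level BPE vocabulary size used by OpenAI's GPT-2

    config = GPT2Config()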

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (Discord/Slack) is provided in the README.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or deprecation status. Training setup requires manual corpus preparation and tokenization.
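
As an illustration of that preparation step, one could train a byte-level BPE vocabulary and tokenize a corpus with the Hugging Face tokenizers library; the vocabulary and dataset formats this repository actually expects may differ, so treat this as a generic sketch with example file names rather than the project's prescribed pipeline.

    from tokenizers import ByteLevelBPETokenizer

    # Train a byte-level BPE vocabulary on a raw text corpus (one document per line).
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(files=["corpus.train.txt"], vocab_size=32000, min_frequency=2)
    tokenizer.save_model(".")  # writes vocab.json and merges.txt

    # Convert each line of text into whitespace-separated token ids.
    with open("corpus.train.txt") as src, open("corpus.train.tokens", "w") as dst:
        for line in src:
            ids = tokenizer.encode(line.strip()).ids
            dst.write(" ".join(map(str, ids)) + "\n")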

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

0.1%
4k
Open-source framework for training large multimodal models
created 2 years ago
updated 11 months ago