GPT2 by affjljoo3581

PyTorch implementation for GPT-2 model training and inference

Created 5 years ago
342 stars

Top 80.8% on SourcePulse

View on GitHub
Project Summary

This project provides a PyTorch implementation of OpenAI's GPT-2 language model, aimed at researchers and developers who want to train, fine-tune, and deploy GPT-2 for text generation tasks. It offers a readable, optimized codebase for unsupervised multitask learning.

How It Works

The implementation focuses on the core GPT-2 architecture, letting users train models from scratch on custom corpora or fine-tune existing checkpoints. It supports standard training loops, evaluation metrics, and text generation with nucleus (top-p) sampling. Performance optimizations include optional gradient checkpointing, plus automatic mixed-precision (AMP) training and fused CUDA layers via NVIDIA Apex.
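
In nucleus (top-p) sampling, the next token is drawn only from the smallest set of highest-probability tokens whose cumulative probability exceeds a threshold p. The repository ships its own generation code; the snippet below is only a minimal PyTorch sketch of the idea, with a function name and signature of our own choosing:

```python
import torch
import torch.nn.functional as F

def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> int:
    """Draw the next token id from the smallest set of tokens whose
    cumulative probability exceeds ``top_p`` (nucleus / top-p sampling).

    ``logits`` is a 1-D tensor of unnormalized scores over the vocabulary.
    """
    probs = F.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)

    # Zero out tokens that fall outside the nucleus, always keeping at
    # least the single most probable token.
    outside = cumulative - sorted_probs > top_p
    sorted_probs[outside] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum()

    choice = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[choice].item()
```

Lower values of top_p make generation more conservative; top_p = 1.0 reduces to plain sampling from the full distribution.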

Quick Start & Requirements

  • Install: pip install -r requirements.txt if the repository ships a requirements file; otherwise install the prerequisites below manually.
  • Prerequisites: PyTorch, NumPy, Matplotlib, Regex, Tqdm. NVIDIA Apex is recommended for AMP and fused CUDA layers.
  • Training: Requires tokenized training/evaluation datasets and a vocabulary file; a generic preparation sketch follows this list.
  • Demo: Google Colab notebooks are available for text generation and evaluation, including a specific notebook for a Korean GPT-2 model.
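
The repository defines its own dataset and vocabulary formats, so the exact preparation steps depend on its tooling. Purely as a hypothetical illustration of building a subword vocabulary and tokenizing a raw corpus, here is a sketch using the Hugging Face tokenizers library (which is not one of this project's dependencies):

```python
# Generic corpus-preparation sketch -- NOT this repository's own tooling.
# It trains a BPE vocabulary on a raw text corpus and writes token ids
# line by line; adapt the output to whatever format the training script expects.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(vocab_size=32000, special_tokens=["<unk>", "<pad>"])
tokenizer.train(["corpus.train.txt"], trainer)   # raw training corpus, one document per line
tokenizer.save("vocab.json")                     # reusable subword vocabulary

# Encode the corpus: one whitespace-separated list of token ids per line.
with open("corpus.train.txt") as src, open("corpus.train.ids", "w") as dst:
    for line in src:
        ids = tokenizer.encode(line.strip()).ids
        dst.write(" ".join(map(str, ids)) + "\n")
```

Whatever tool is used, the end result must match the vocabulary and dataset layout the training script expects.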

Highlighted Details

  • Supports training from scratch or resuming from checkpoints.
  • Offers text generation with nucleus sampling.
  • Includes model evaluation and metric visualization.
  • Optional integration with NVIDIA Apex for performance gains (AMP, fused CUDA layers).
  • Configurable model parameters (layers, heads, dimensions, sequence length); see the sketch after this list.
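
Those size knobs correspond to the usual GPT-2 family settings. The dataclass below is purely illustrative (the repository exposes these choices as training options, not as this class), sketching how the defaults line up with the published model sizes:

```python
from dataclasses import dataclass

@dataclass
class GPT2Config:
    """Illustrative container for the tunable GPT-2 hyperparameters."""
    layers: int = 12         # transformer decoder blocks
    heads: int = 12          # attention heads per block
    dims: int = 768          # hidden / embedding dimension
    seq_len: int = 1024      # maximum context length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary size

small = GPT2Config()                                 # roughly GPT-2 small (~117M params)
medium = GPT2Config(layers=24, heads=16, dims=1024)  # roughly GPT-2 medium (~345M params)
```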

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (Discord/Slack) is provided in the README.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or deprecation status. The setup for training requires manual corpus preparation and tokenization.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), Casper Hansen (Author of AutoAWQ), and 1 more.

GPT2 by ConnorJL

GPT2 training implementation, supporting TPUs and GPUs
1k stars · 0% recent star growth
Created 6 years ago · Updated 2 years ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 15 more.

gpt-neo by EleutherAI

GPT-2/3-style model implementation using mesh-tensorflow
8k stars · 0.0% recent star growth
Created 5 years ago · Updated 3 years ago