GPT2 by affjljoo3581

PyTorch implementation for GPT-2 model training and inference

created 5 years ago
337 stars

Top 82.8% on sourcepulse

View on GitHub
Project Summary

This project provides a PyTorch implementation of OpenAI's GPT-2 language model, targeting researchers and developers interested in training, fine-tuning, and deploying GPT-2 for text generation tasks. It offers a comprehensible and optimized codebase for unsupervised multitask learning.

How It Works

The implementation focuses on the core GPT-2 architecture components, enabling users to train models from scratch on custom corpora or fine-tune existing checkpoints. It supports standard training loops, evaluation metrics, and text generation with nucleus sampling. Performance optimizations include gradient checkpointing, plus optional automatic mixed-precision (AMP) training and fused CUDA layers via NVIDIA Apex.
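
As an illustration of the decoding step, here is a minimal nucleus (top-p) sampling sketch in PyTorch over a single logits vector; the function name and default threshold are illustrative and not this repository's API.

    import torch

    def nucleus_sample(logits: torch.Tensor, top_p: float = 0.9) -> int:
        """Sample one token id from a 1-D logits vector with nucleus (top-p) sampling.

        Illustrative sketch only; the repository's own sampler may differ in details.
        """
        probs = torch.softmax(logits, dim=-1)
        sorted_probs, sorted_ids = torch.sort(probs, descending=True)
        cumulative = torch.cumsum(sorted_probs, dim=-1)

        # Keep the smallest prefix of tokens whose cumulative probability reaches top_p.
        cutoff = int(torch.searchsorted(cumulative, top_p).item()) + 1
        kept = sorted_probs[:cutoff]
        kept = kept / kept.sum()  # renormalize over the nucleus

        choice = torch.multinomial(kept, num_samples=1)
        return int(sorted_ids[choice].item())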

Quick Start & Requirements

  • Install: pip install -r requirements.txt if a requirements file is provided; otherwise install the dependencies listed below manually (see the sketch after this list).
  • Prerequisites: PyTorch, NumPy, Matplotlib, Regex, Tqdm. NVIDIA Apex is recommended for AMP and fused CUDA layers.
  • Training: Requires tokenized training/evaluation datasets and a vocabulary file.
  • Demo: Google Colab notebooks are available for text generation and evaluation, including a specific notebook for a Korean GPT-2 model.
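
If no requirements file ships with the repository, the prerequisites listed above can be installed directly from PyPI; NVIDIA Apex is optional and is built separately from https://github.com/NVIDIA/apex.

    pip install torch numpy matplotlib regex tqdm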

Highlighted Details

  • Supports training from scratch or resuming from checkpoints.
  • Offers text generation with nucleus sampling.
  • Includes model evaluation and metric visualization.
  • Optional integration with NVIDIA Apex for performance gains (AMP, fused CUDA layers).
  • Configurable model parameters (layers, heads, dimensions, sequence length).
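
As a sketch of the kind of hyperparameters such a configuration exposes, the values below correspond to the publicly documented GPT-2 "small" model; the field names are illustrative and not necessarily this repository's argument names.

    from dataclasses import dataclass

    @dataclass
    class GPT2Config:
        # Example values follow the public GPT-2 "small" configuration;
        # field names are illustrative, not this repository's CLI flags.
        layers: int = 12          # number of transformer blocks
        heads: int = 12           # attention heads per block
        dims: int = 768           # hidden (embedding) dimension
        seq_len: int = 1024       # maximum sequence length
        vocab_size: int = 50257   # byte-level BPE vocabulary size used by OpenAI's GPT-2

    config = GPT2Config()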

Maintenance & Community

No specific information on contributors, sponsorships, or community channels (Discord/Slack) is provided in the README.

Licensing & Compatibility

  • License: Apache-2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The README does not detail specific limitations, known bugs, or deprecation status. Training setup requires manual corpus preparation and tokenization.
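
As an illustration of that preparation step, one could train a byte-level BPE vocabulary and tokenize a corpus with the Hugging Face tokenizers library; the vocabulary and dataset formats this repository actually expects may differ, so treat this as a generic sketch with example file names rather than the project's prescribed pipeline.

    from tokenizers import ByteLevelBPETokenizer

    # Train a byte-level BPE vocabulary on a raw text corpus (one document per line).
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train(files=["corpus.train.txt"], vocab_size=32000, min_frequency=2)
    tokenizer.save_model(".")  # writes vocab.json and merges.txt

    # Convert each line of text into whitespace-separated token ids.
    with open("corpus.train.txt") as src, open("corpus.train.tokens", "w") as dst:
        for line in src:
            ids = tokenizer.encode(line.strip()).ids
            dst.write(" ".join(map(str, ids)) + "\n")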

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n) and Georgios Konstantopoulos (CTO, General Partner at Paradigm).

mlx-gpt2 by pranavjad

0.5%
393
Minimal GPT-2 implementation for educational purposes
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations

0.1%
4k
Open-source framework for training large multimodal models
created 2 years ago
updated 11 months ago