gpt-2-tensorflow2.0 by akanyaani

GPT-2 implementation for sequence generation

created 6 years ago · 262 stars · Top 97.8% on sourcepulse

Project Summary

This repository provides an implementation of OpenAI's GPT-2 model for pre-training and sequence generation using TensorFlow 2.0. It is designed for researchers and developers interested in replicating or extending GPT-2's capabilities within the TensorFlow ecosystem.

How It Works

The project implements the GPT-2 architecture, including the transformer decoder blocks, attention mechanisms, and positional encodings. It supports pre-training on custom datasets and generating text sequences based on provided context. The implementation leverages TensorFlow 2.0's eager execution and Keras API for a more Pythonic development experience.
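The core of each decoder block is causal (masked) self-attention: every position may attend only to itself and earlier positions. A minimal single-head NumPy sketch of that mechanism is below; the function name, shapes, and weight handling are illustrative, not the repository's actual API, which builds multi-head attention on TensorFlow 2.0's Keras layers.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a sequence x of shape (T, d)."""
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project into queries/keys/values
    scores = q @ k.T / np.sqrt(d)              # (T, T) scaled similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                     # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                         # (T, d) context vectors

rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Because of the mask, the first position can attend only to itself, so its output is exactly its own value vector; this is the property that lets the model be trained on next-token prediction without leaking future context.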

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.6, TensorFlow-GPU == 2.3.0, NumPy, setuptools, ftfy, tqdm, Click, sentencepiece.
  • Setup: Requires cloning the repository and installing dependencies. Pre-training involves data preprocessing.
  • Links: OpenAI GPT-2 Paper, OpenWebText

Highlighted Details

  • Supports distributed training across multiple GPUs.
  • Includes a sequence_generator.ipynb notebook for text generation.
  • Offers command-line arguments for configuring pre-training and training parameters.
  • Provides TensorBoard logging for monitoring training progress.
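Generation in the sequence_generator.ipynb notebook follows the standard autoregressive loop: feed the context, pick the next token from the model's logits, append it, and repeat. A minimal greedy-decoding sketch with a stand-in logits function (the real notebook uses the trained GPT-2 model and a sentencepiece tokenizer; `dummy_logits` is a hypothetical placeholder):

```python
import numpy as np

def dummy_logits(tokens, vocab_size=10):
    """Stand-in for the trained model: deterministic logits seeded by the last token."""
    rng = np.random.default_rng(tokens[-1])
    return rng.normal(size=vocab_size)

def generate(context, steps, vocab_size=10):
    """Greedy autoregressive decoding: repeatedly append the argmax token."""
    tokens = list(context)
    for _ in range(steps):
        logits = dummy_logits(tokens, vocab_size)
        tokens.append(int(np.argmax(logits)))
    return tokens

seq = generate([3, 1], steps=4)
print(len(seq))  # 6
```

Swapping `np.argmax` for temperature-scaled sampling or top-k filtering gives the more varied outputs typically used for open-ended text generation.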

Maintenance & Community

  • Author: Abhay Kumar (akanyaani@gmail.com)
  • Contributions via issues and pull requests are welcome.

Licensing & Compatibility

  • License: MIT
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The project lists "Parallel Preprocessing" and a "Fine-Tuning wrapper" as future tasks, indicating these features are not yet implemented. The TensorFlow version is pinned to 2.3.0, which may limit compatibility with newer TensorFlow releases.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

  • Top 0.1% · 806 stars
  • Pretraining code for depth-recurrent language model research
  • created 5 months ago · updated 2 weeks ago
  • Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake), Abhishek Thakur (world's first 4x Kaggle GrandMaster), and 5 more.

xlnet by zihangdai

  • Top 0.0% · 6k stars
  • Language model research paper using generalized autoregressive pretraining
  • created 6 years ago · updated 2 years ago