gpt-2-tensorflow2.0 by akanyaani

GPT-2 implementation for sequence generation

created 6 years ago · 262 stars · Top 97.8% on sourcepulse

Project Summary

This repository provides an implementation of OpenAI's GPT-2 model for pre-training and sequence generation using TensorFlow 2.0. It is designed for researchers and developers interested in replicating or extending GPT-2's capabilities within the TensorFlow ecosystem.

How It Works

The project implements the GPT-2 architecture, including the transformer decoder blocks, attention mechanisms, and positional encodings. It supports pre-training on custom datasets and generating text sequences based on provided context. The implementation leverages TensorFlow 2.0's eager execution and Keras API for a more Pythonic development experience.
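The core of each decoder block is causal (masked) self-attention: every position may attend only to itself and earlier positions. A minimal single-head NumPy sketch of that mechanism is below; the function name, shapes, and weight handling are illustrative, not the repository's actual API, which builds multi-head attention on TensorFlow 2.0's Keras layers.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head masked self-attention over a sequence x of shape (T, d)."""
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v        # project into queries/keys/values
    scores = q @ k.T / np.sqrt(d)              # (T, T) scaled similarity scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                     # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                         # (T, d) context vectors

rng = np.random.default_rng(0)
T, d = 5, 8
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

Because of the mask, the first position can attend only to itself, so its output is exactly its own value vector; this is the property that lets the model be trained on next-token prediction without leaking future context.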

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >= 3.6, TensorFlow-GPU == 2.3.0, NumPy, setuptools, ftfy, tqdm, Click, sentencepiece.
  • Setup: Requires cloning the repository and installing dependencies. Pre-training involves data preprocessing.
  • Links: OpenAI GPT-2 Paper, OpenWebText

Highlighted Details

  • Supports distributed training across multiple GPUs.
  • Includes a sequence_generator.ipynb notebook for text generation.
  • Offers command-line arguments for configuring pre-training and training parameters.
  • Provides TensorBoard logging for monitoring training progress.
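Generation in the sequence_generator.ipynb notebook follows the standard autoregressive loop: feed the context, pick the next token from the model's logits, append it, and repeat. A minimal greedy-decoding sketch with a stand-in logits function (the real notebook uses the trained GPT-2 model and a sentencepiece tokenizer; `dummy_logits` is a hypothetical placeholder):

```python
import numpy as np

def dummy_logits(tokens, vocab_size=10):
    """Stand-in for the trained model: deterministic logits seeded by the last token."""
    rng = np.random.default_rng(tokens[-1])
    return rng.normal(size=vocab_size)

def generate(context, steps, vocab_size=10):
    """Greedy autoregressive decoding: repeatedly append the argmax token."""
    tokens = list(context)
    for _ in range(steps):
        logits = dummy_logits(tokens, vocab_size)
        tokens.append(int(np.argmax(logits)))
    return tokens

seq = generate([3, 1], steps=4)
print(len(seq))  # 6
```

Swapping `np.argmax` for temperature-scaled sampling or top-k filtering gives the more varied outputs typically used for open-ended text generation.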

Maintenance & Community

  • Author: Abhay Kumar (akanyaani@gmail.com)
  • Contributions via issues and pull requests are welcome.

Licensing & Compatibility

  • License: MIT
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The project lists "Parallel Preprocessing" and a "Fine-Tuning wrapper" as future tasks, indicating these features are not yet implemented. The TensorFlow version is pinned to 2.3.0, which may limit compatibility with newer TensorFlow releases.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Alex Cheema (cofounder of EXO Labs), and 1 more.

recurrent-pretraining by seal-rg

  • Top 0.1% · 806 stars
  • Pretraining code for depth-recurrent language model research
  • created 5 months ago · updated 2 weeks ago
  • Starred by Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake), Abhishek Thakur (world's first 4x Kaggle GrandMaster), and 5 more.

xlnet by zihangdai

  • Top 0.0% · 6k stars
  • Language model research paper using generalized autoregressive pretraining
  • created 6 years ago · updated 2 years ago