gpt-2 by openai

Code for research paper "Language Models are Unsupervised Multitask Learners"

Created 6 years ago
24,190 stars

Top 1.6% on SourcePulse

View on GitHub
Project Summary

This repository provides the code and models for OpenAI's GPT-2 language model, as described in their "Language Models are Unsupervised Multitask Learners" paper. It serves as a starting point for researchers and engineers to experiment with GPT-2's capabilities, particularly for exploring its unsupervised multitask learning potential.

How It Works

GPT-2 is a transformer-based language model that generates text by predicting the next word in a sequence. Its architecture allows it to perform a wide range of tasks without explicit task-specific training, demonstrating the power of large-scale unsupervised learning.
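
To make the mechanism concrete, here is a minimal sketch of the predict-and-append loop using the Hugging Face transformers port of GPT-2 (a community PyTorch reimplementation, not this repository's TensorFlow code): at each step the model scores every candidate next token given the context so far, one token is chosen, appended, and the loop repeats.

    # Minimal autoregressive decoding loop (assumes: pip install torch transformers)
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")      # "gpt2" is the 124M model
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer.encode("Language models are", return_tensors="pt")
    for _ in range(20):
        with torch.no_grad():
            logits = model(ids).logits                     # [1, seq_len, vocab_size]
        next_id = logits[0, -1].argmax()                   # greedy pick of the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat
    print(tokenizer.decode(ids[0]))

In practice, sampling with a temperature or top-k truncation (options the repo's own sampling code exposes) gives more varied text than the greedy choice shown here.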

Quick Start & Requirements

  • Install: clone the repository (git clone https://github.com/openai/gpt-2.git), run pip install -r requirements.txt, then fetch weights with python download_model.py 124M (or 355M, 774M, 1558M); see the sketch after this list for a quick way to sample from the pretrained weights.
  • Prerequisites: Python 3 and TensorFlow 1.12 (the reference implementation is TensorFlow 1.x only); PyTorch is available only through third-party ports such as Hugging Face transformers.
  • Models: Pre-trained models in four sizes (124M, 355M, 774M, and 1558M parameters) are available via download_model.py.
  • Documentation: Model Card
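
For a quick end-to-end sample without setting up the TensorFlow 1.x environment, the same pretrained weights can be loaded through the Hugging Face transformers port; the following is a rough sketch (model names such as "gpt2-medium" refer to the Hub copies of the weights, not to files in this repository):

    # Quick sampling via the community transformers port (pip install torch transformers)
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")   # 355M model; "gpt2" = 124M, "gpt2-xl" = 1558M
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

    input_ids = tokenizer.encode("GPT-2 is a language model that", return_tensors="pt")
    samples = model.generate(
        input_ids,
        max_length=60,                        # total length, prompt included
        do_sample=True,                       # sample rather than decode greedily
        top_k=40,                             # top-k truncation; the repo's scripts expose a similar top_k flag
        num_return_sequences=3,
        pad_token_id=tokenizer.eos_token_id,  # avoid the missing-pad-token warning
    )
    for s in samples:
        print(tokenizer.decode(s, skip_special_tokens=True))

The repository's own entry points (src/generate_unconditional_samples.py and src/interactive_conditional_samples.py) produce equivalent samples once the TensorFlow environment and model download described above are in place.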

Highlighted Details

  • Code and models from the seminal "Language Models are Unsupervised Multitask Learners" paper.
  • Staged release approach detailed in accompanying blog posts.
  • Parameter counts corrected from earlier reports (the small and medium models are 124M and 355M parameters, previously cited as 117M and 345M).

Maintenance & Community

  • Status: Archived; no further updates are expected.
  • Community: Open to collaboration on research and applications, especially concerning malicious use, defenses, and bias mitigation.

Licensing & Compatibility

  • License: Modified MIT.
  • Compatibility: Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The models are provided as-is, with no further updates planned. GPT-2's robustness and worst-case behaviors are not well understood, and it can reproduce biases and factual inaccuracies present in its internet-sourced training data. Because samples can be subtly incoherent or inaccurate and may be mistaken for human-written text, generated text should be clearly labeled as synthetic.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 129 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI) and Logan Kilpatrick (Product Lead on Google AI Studio).

model-zoo by FluxML

0%
932
Julia/FluxML model demos
Created 8 years ago
Updated 9 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Gabriel Almeida (Cofounder of Langflow), and 5 more.

lit by PAIR-code

0.1%
4k
Interactive ML model analysis tool for understanding model behavior
Created 5 years ago
Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more.

TransformerLens by TransformerLensOrg

1.0%
3k
Library for mechanistic interpretability research on GPT-style language models
Created 3 years ago
Updated 1 day ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 15 more.

gpt-neo by EleutherAI

0.0%
8k
GPT-2/3-style model implementation using mesh-tensorflow
Created 5 years ago
Updated 3 years ago