gpt-2 by openai

Code for research paper "Language Models are Unsupervised Multitask Learners"

Created 6 years ago
24,190 stars

Top 1.6% on SourcePulse

View on GitHub
Project Summary

This repository provides the code and models for OpenAI's GPT-2 language model, as described in their "Language Models are Unsupervised Multitask Learners" paper. It serves as a starting point for researchers and engineers to experiment with GPT-2's capabilities, particularly for exploring its unsupervised multitask learning potential.

How It Works

GPT-2 is a transformer-based language model that generates text by predicting the next word in a sequence. Its architecture allows it to perform a wide range of tasks without explicit task-specific training, demonstrating the power of large-scale unsupervised learning.
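
To make the mechanism concrete, here is a minimal sketch of the predict-and-append loop using the Hugging Face transformers port of GPT-2 (a community PyTorch reimplementation, not this repository's TensorFlow code): at each step the model scores every candidate next token given the context so far, one token is chosen, appended, and the loop repeats.

    # Minimal autoregressive decoding loop (assumes: pip install torch transformers)
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")      # "gpt2" is the 124M model
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer.encode("Language models are", return_tensors="pt")
    for _ in range(20):
        with torch.no_grad():
            logits = model(ids).logits                     # [1, seq_len, vocab_size]
        next_id = logits[0, -1].argmax()                   # greedy pick of the most likely next token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append and repeat
    print(tokenizer.decode(ids[0]))

In practice, sampling with a temperature or top-k truncation (options the repo's own sampling code exposes) gives more varied text than the greedy choice shown here.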

Quick Start & Requirements

  • Install: clone the repository (git clone https://github.com/openai/gpt-2.git), run pip install -r requirements.txt, then fetch weights with python download_model.py 124M (or 355M, 774M, 1558M); see the sketch after this list for a quick way to sample from the pretrained weights.
  • Prerequisites: Python 3 and TensorFlow 1.12 (the reference implementation is TensorFlow 1.x only); PyTorch is available only through third-party ports such as Hugging Face transformers.
  • Models: Pre-trained models in four sizes (124M, 355M, 774M, and 1558M parameters) are available via download_model.py.
  • Documentation: Model Card
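
For a quick end-to-end sample without setting up the TensorFlow 1.x environment, the same pretrained weights can be loaded through the Hugging Face transformers port; the following is a rough sketch (model names such as "gpt2-medium" refer to the Hub copies of the weights, not to files in this repository):

    # Quick sampling via the community transformers port (pip install torch transformers)
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")   # 355M model; "gpt2" = 124M, "gpt2-xl" = 1558M
    model = GPT2LMHeadModel.from_pretrained("gpt2-medium")

    input_ids = tokenizer.encode("GPT-2 is a language model that", return_tensors="pt")
    samples = model.generate(
        input_ids,
        max_length=60,                        # total length, prompt included
        do_sample=True,                       # sample rather than decode greedily
        top_k=40,                             # top-k truncation; the repo's scripts expose a similar top_k flag
        num_return_sequences=3,
        pad_token_id=tokenizer.eos_token_id,  # avoid the missing-pad-token warning
    )
    for s in samples:
        print(tokenizer.decode(s, skip_special_tokens=True))

The repository's own entry points (src/generate_unconditional_samples.py and src/interactive_conditional_samples.py) produce equivalent samples once the TensorFlow environment and model download described above are in place.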

Highlighted Details

  • Code and models from the seminal "Language Models are Unsupervised Multitask Learners" paper.
  • Staged release approach detailed in accompanying blog posts.
  • Parameter counts corrected from earlier reports (the small and medium models are 124M and 355M parameters, previously cited as 117M and 345M).

Maintenance & Community

  • Status: Archived; no further updates are expected.
  • Community: Open to collaboration on research and applications, especially concerning malicious use, defenses, and bias mitigation.

Licensing & Compatibility

  • License: Modified MIT.
  • Compatibility: Permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The models are provided as-is, with no further updates planned. GPT-2's robustness and worst-case behaviors are not well understood, and it can reproduce biases and factual inaccuracies present in its internet-sourced training data. Because samples can be subtly incoherent or inaccurate and may be mistaken for human-written text, generated text should be clearly labeled as synthetic.

Health Check
Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 129 stars in the last 30 days

Explore Similar Projects

Starred by Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI) and Logan Kilpatrick (Product Lead on Google AI Studio).

model-zoo by FluxML

0%
932
Julia/FluxML model demos
Created 8 years ago
Updated 9 months ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Gabriel Almeida (Cofounder of Langflow), and 5 more.

lit by PAIR-code

0.1%
4k
Interactive ML model analysis tool for understanding model behavior
Created 5 years ago
Updated 3 weeks ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Neel Nanda (Research Scientist at Google DeepMind), and 1 more.

TransformerLens by TransformerLensOrg

1.0%
3k
Library for mechanistic interpretability research on GPT-style language models
Created 3 years ago
Updated 1 day ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT), and 15 more.

gpt-neo by EleutherAI

0.0%
8k
GPT-2/3-style model implementation using mesh-tensorflow
Created 5 years ago
Updated 3 years ago