GPT-2/3-style model implementation using mesh-tensorflow
This repository provides an implementation of GPT-Neo, a large-scale autoregressive language model inspired by GPT-3, utilizing mesh-tensorflow for model and data parallelism. It targets researchers and practitioners interested in training or fine-tuning large language models, offering features like local and linear attention, Mixture of Experts, and axial positional embeddings.
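Axial positional embeddings, one of the features listed above, factorize a single large position-embedding table into two smaller per-axis tables. The following is a minimal NumPy sketch of the idea, not code from the repository; the grid shape and dimension names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, d = 8, 16, 32  # assumption: factorize seq_len = 128 into an 8x16 grid

# two small tables replace one (H*W, d) table, saving parameters:
# (H + W) * d entries instead of H * W * d
row_emb = rng.normal(size=(H, d))
col_emb = rng.normal(size=(W, d))

def axial_position_embedding(position):
    # map the flat sequence position to (row, col) on the grid,
    # then combine one embedding from each axis table
    r, c = divmod(position, W)
    return row_emb[r] + col_emb[c]

emb = axial_position_embedding(17)  # row 1, col 1 of the grid
```

The parameter saving grows with sequence length: here (8 + 16) * 32 = 768 values instead of 128 * 32 = 4096.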
How It Works
GPT-Neo leverages mesh-tensorflow to distribute model and data across multiple processing units, enabling the training of large transformer models. It supports various attention mechanisms (global, local, linear) and architectural modifications like Mixture of Experts and axial positional embeddings, offering flexibility beyond standard GPT-3 implementations.
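To illustrate how local attention differs from global attention, here is a minimal NumPy sketch of a causal, windowed attention mask; this is an explanatory toy, not the repository's mesh-tensorflow implementation, and the window size is an arbitrary example value.

```python
import numpy as np

def local_attention_mask(seq_len, window):
    # mask[i, j] is True when token i may attend to token j:
    # j <= i enforces causality, i - j < window restricts attention
    # to the most recent `window` tokens instead of the full prefix
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = local_attention_mask(seq_len=6, window=3)
```

With global attention every row of the mask covers the full causal prefix; with a window of 3, each token sees at most its three most recent predecessors, which bounds per-token attention cost by the window size rather than the sequence length.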
Quick Start & Requirements
Install dependencies with pip3 install -r requirements.txt. Pretrained GPT-Neo checkpoints (1.3B and 2.7B parameters) are available for download from the-eye.eu.
Highlighted Details
Maintenance & Community
The code has not been actively maintained since August 2021, when development focus shifted to the GPT-NeoX repository. This repository is preserved for archival purposes.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial use or closed-source integration.
Limitations & Caveats
The project is archived and no longer maintained. While functional, users should expect outdated dependencies and no community support for new features or bug fixes. Training efficiency at very large scales (200B+ parameters) is noted as poor.