gpt-neo by EleutherAI

GPT-2/3-style model implementation using mesh-tensorflow

created 5 years ago
8,297 stars

Top 6.3% on sourcepulse

Project Summary

This repository provides an implementation of GPT-Neo, a large-scale autoregressive language model inspired by GPT-3, utilizing mesh-tensorflow for model and data parallelism. It targets researchers and practitioners interested in training or fine-tuning large language models, offering features like local and linear attention, Mixture of Experts, and axial positional embeddings.

How It Works

GPT-Neo leverages mesh-tensorflow to distribute model and data across multiple processing units, enabling the training of large transformer models. It supports various attention mechanisms (global, local, linear) and architectural modifications like Mixture of Experts and axial positional embeddings, offering flexibility beyond standard GPT-3 implementations.
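In practice, both the architecture and the mesh layout are specified through a JSON config file passed to the training script. A minimal illustrative sketch of such a config (field names are modeled on the example configs shipped in the repo; the values here are placeholders, not a tested setup):

```json
{
  "n_head": 16,
  "n_embd": 2048,
  "n_layer": 24,
  "n_vocab": 50257,
  "attention_types": [[["global", "local"], 12]],
  "mesh_shape": "x:8,y:4",
  "layout": "batch:x,embd:y",
  "model_path": "gs://your-bucket/gpt-neo",
  "train_batch_size": 256
}
```

Here `attention_types` alternates global and local attention blocks (the pair repeated 12 times for 24 layers), while `mesh_shape` and `layout` tell mesh-tensorflow how to split the batch and embedding dimensions across a 32-way processor mesh.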

Quick Start & Requirements

  • Install: pip3 install -r requirements.txt
  • Prerequisites: a Google Cloud Platform account with TPU access. Training is officially supported on TPUs; GPU setups work but may require additional configuration and troubleshooting.
  • Pre-trained Models: 1.3B- and 2.7B-parameter checkpoints, available for download from the-eye.eu.
  • Documentation: Colab notebook, Training Guide.

Highlighted Details

  • Implements model and data parallelism for large-scale training.
  • Supports advanced features: local attention, linear attention, Mixture of Experts, axial positional embeddings.
  • Offers pre-trained models (1.3B and 2.7B parameters) trained on "The Pile" dataset.
  • Includes evaluation benchmarks comparing GPT-Neo models against GPT-2 and GPT-3.
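Of the features above, axial positional embeddings are the easiest to illustrate in isolation: rather than learning one embedding per position, the sequence axis is factored into a grid and two much smaller embedding tables are summed. A minimal NumPy sketch of the idea (not the repo's mesh-tensorflow implementation):

```python
import numpy as np

# Axial positional embeddings: factor seq_len = rows * cols and sum a
# row table and a column table via broadcasting, shrinking the parameter
# count from rows*cols*d to (rows + cols)*d.
rng = np.random.default_rng(0)
rows, cols, d = 32, 64, 16              # covers 32 * 64 = 2048 positions
row_emb = rng.normal(size=(rows, 1, d))  # one vector per grid row
col_emb = rng.normal(size=(1, cols, d))  # one vector per grid column

# Broadcast-sum gives a distinct embedding for each of the 2048 positions.
pos_emb = (row_emb + col_emb).reshape(rows * cols, d)

assert pos_emb.shape == (2048, 16)
# Learned parameters: (32 + 64) * 16 = 1,536 vs. 2,048 * 16 = 32,768
# for a full per-position table.
```

The trade-off is that positions sharing a row or column share part of their embedding, which in practice costs little model quality relative to the memory saved at long sequence lengths.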

Maintenance & Community

The code has not been actively maintained since August 2021, as development focus shifted to the GPT-NeoX repository. This repository is preserved for archival purposes.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial use or closed-source integration.

Limitations & Caveats

The project is archived and no longer maintained. While functional, users should expect potential issues with outdated dependencies or lack of community support for new features or bug fixes. Training efficiency at very large scales (200B+ parameters) is noted as poor.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 32 stars in the last 90 days
