GPT-2/3-style model implementation using mesh-tensorflow
This repository provides an implementation of GPT-Neo, a large-scale autoregressive language model inspired by GPT-3, utilizing mesh-tensorflow for model and data parallelism. It targets researchers and practitioners interested in training or fine-tuning large language models, offering features like local and linear attention, Mixture of Experts, and axial positional embeddings.
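Axial positional embeddings, one of the features listed above, factorize a single large position-embedding table into two smaller per-axis tables. The following is a minimal NumPy sketch of the idea, not code from the repository; the grid shape and dimension names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, d = 8, 16, 32  # assumption: factorize seq_len = 128 into an 8x16 grid

# two small tables replace one (H*W, d) table, saving parameters:
# (H + W) * d entries instead of H * W * d
row_emb = rng.normal(size=(H, d))
col_emb = rng.normal(size=(W, d))

def axial_position_embedding(position):
    # map the flat sequence position to (row, col) on the grid,
    # then combine one embedding from each axis table
    r, c = divmod(position, W)
    return row_emb[r] + col_emb[c]

emb = axial_position_embedding(17)  # row 1, col 1 of the grid
```

The parameter saving grows with sequence length: here (8 + 16) * 32 = 768 values instead of 128 * 32 = 4096.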
How It Works
GPT-Neo leverages mesh-tensorflow to distribute model and data across multiple processing units, enabling the training of large transformer models. It supports various attention mechanisms (global, local, linear) and architectural modifications like Mixture of Experts and axial positional embeddings, offering flexibility beyond standard GPT-3 implementations.
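To illustrate how local attention differs from global attention, here is a minimal NumPy sketch of a causal, windowed attention mask; this is an explanatory toy, not the repository's mesh-tensorflow implementation, and the window size is an arbitrary example value.

```python
import numpy as np

def local_attention_mask(seq_len, window):
    # mask[i, j] is True when token i may attend to token j:
    # j <= i enforces causality, i - j < window restricts attention
    # to the most recent `window` tokens instead of the full prefix
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (i - j < window)

mask = local_attention_mask(seq_len=6, window=3)
```

With global attention every row of the mask covers the full causal prefix; with a window of 3, each token sees at most its three most recent predecessors, which bounds per-token attention cost by the window size rather than the sequence length.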
Quick Start & Requirements
Install dependencies with pip3 install -r requirements.txt. Pretrained GPT-Neo checkpoints (1.3B and 2.7B parameters) are available for download from the-eye.eu.
Highlighted Details
Maintenance & Community
The code has not been actively maintained since August 2021, when development focus shifted to the GPT-NeoX repository. This repository is preserved for archival purposes.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial use or closed-source integration.
Limitations & Caveats
The project is archived and no longer maintained. While functional, users should expect outdated dependencies and no community support for new features or bug fixes. Training efficiency at very large scales (200B+ parameters) is noted as poor.