gpt-neox by EleutherAI

Framework for training large-scale autoregressive language models

Created 4 years ago
7,304 stars

Top 7.1% on SourcePulse

Project Summary

EleutherAI/gpt-neox is a PyTorch framework for training large-scale autoregressive language models, built upon NVIDIA's Megatron-LM and DeepSpeed libraries. It is designed for researchers and engineers focused on training models with billions of parameters from scratch, offering advanced distributed training capabilities and support for cutting-edge architectural innovations.

How It Works

GPT-NeoX employs a combination of 3D parallelism (tensor, pipeline, and data parallelism) and DeepSpeed's ZeRO optimizer for efficient memory usage and distributed training. It integrates novel architectural features like rotary and ALiBi positional embeddings, parallel feedforward layers, and FlashAttention for performance gains. The framework supports various launching mechanisms (Slurm, MPI, IBM Job Step Manager) and hardware platforms, including AMD GPUs.
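
To make this concrete, the sketch below mirrors the kind of settings a GPT-NeoX YAML configuration combines: parallelism degrees, a ZeRO stage, the positional-embedding type, and the attention backend. It is written as a Python dict for readability, and the key names are illustrative approximations of the shipped example configs; the configs/ directory in the repository defines the exact schema.

    # Illustrative sketch of a GPT-NeoX-style configuration (normally written in YAML).
    # Key names are approximate; see the repository's configs/ directory for the real schema.
    neox_config = {
        # 3D parallelism: data parallelism uses whatever GPUs remain after
        # tensor (model) and pipeline parallelism are assigned.
        "model-parallel-size": 2,               # tensor-parallel degree
        "pipe-parallel-size": 4,                # pipeline-parallel degree
        # DeepSpeed ZeRO stage 1 partitions optimizer states across data-parallel ranks.
        "zero_optimization": {"stage": 1},
        # Architectural options mentioned above.
        "pos-emb": "rotary",                    # rotary or ALiBi positional embeddings
        "attention-config": [[["flash"], 24]],  # use FlashAttention kernels for all 24 layers
    }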

Quick Start & Requirements

  • Installation: pip install -r requirements/requirements.txt (optional extras such as logging and fused kernels have their own requirements files).
  • Prerequisites: Python 3.8-3.10 and PyTorch 1.8-2.0; environment isolation (e.g., Anaconda) is recommended.
  • Fused Kernels: On AMD GPUs, or to pre-build the kernels manually, run from megatron.fused_kernels import load; load() (see the snippet after this list).
  • Flash Attention: Install from ./requirements/requirements-flashattention.txt or use NGC containers.
  • Transformer Engine: Install from ./requirements/requirements-transformer-engine.txt or use NGC containers.
  • Multi-Node Launching: Requires a hostfile or configuration for Slurm, MPI, or pdsh.
  • Containerized Setup: Docker and Apptainer (Singularity) containers are provided.
  • Documentation: GPT-NeoX Documentation
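
For reference, the fused-kernel pre-build mentioned in the list above is just the two statements below, run once from the root of the gpt-neox repository (so that the megatron package is importable) before launching training:

    # Pre-build the fused kernel extensions (e.g., on AMD GPUs) ahead of training.
    from megatron.fused_kernels import load

    load()  # compiles and caches the fused kernel extensions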

Highlighted Details

  • Supports ZeRO and 3D parallelism for distributed training.
  • Integrates Flash Attention 2.x, Transformer Engine, and fused kernels, including support for AMD GPUs.
  • Offers predefined configurations for popular architectures (Pythia, Falcon, LLaMA 1 & 2).
  • Includes support for Mixture-of-Experts (MoE) with various configurations.
  • Facilitates integration with Hugging Face, Weights & Biases, Comet ML, and TensorBoard (see the example after this list).
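
As a small illustration of the Hugging Face side of that integration, a NeoX-architecture checkpoint published on the Hub (the Pythia models are one example) can be loaded with the standard transformers classes for inference; the model name below is only an example, not something this repository ships:

    # Minimal sketch: load a NeoX-architecture checkpoint from the Hugging Face Hub.
    from transformers import AutoTokenizer, GPTNeoXForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

    inputs = tokenizer("GPT-NeoX is a framework for", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))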

Maintenance & Community

The project is actively maintained by EleutherAI and has seen contributions from numerous academic and industry labs. It is used by organizations like Oak Ridge National Lab, Stability AI, and Together.ai.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. Files modified from NVIDIA's code retain NVIDIA copyright headers, which derivative works must preserve.

Limitations & Caveats

This library is designed specifically for training models with billions of parameters from scratch; it is not intended for general-purpose inference, for which Hugging Face Transformers is suggested instead. Some sections of the documentation still mark pipeline parallelism as "coming soon."

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

Explore Similar Projects

lingua by facebookresearch

  • Top 0.1%; 5k stars
  • LLM research codebase for training and inference
  • Created 11 months ago; updated 2 months ago
  • Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

  • Top 0.1%; 41k stars
  • AI system for large-scale parallel training
  • Created 3 years ago; updated 13 hours ago