gpt-neox by EleutherAI

Framework for training large-scale autoregressive language models

Created 4 years ago
7,304 stars

Top 7.1% on SourcePulse

Project Summary

EleutherAI/gpt-neox is a PyTorch framework for training large-scale autoregressive language models, built upon NVIDIA's Megatron-LM and DeepSpeed libraries. It is designed for researchers and engineers focused on training models with billions of parameters from scratch, offering advanced distributed training capabilities and support for cutting-edge architectural innovations.

How It Works

GPT-NeoX employs a combination of 3D parallelism (tensor, pipeline, and data parallelism) and DeepSpeed's ZeRO optimizer for efficient memory usage and distributed training. It integrates novel architectural features like rotary and ALiBi positional embeddings, parallel feedforward layers, and FlashAttention for performance gains. The framework supports various launching mechanisms (Slurm, MPI, IBM Job Step Manager) and hardware platforms, including AMD GPUs.
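
To make this concrete, the sketch below mirrors the kind of settings a GPT-NeoX YAML configuration combines: parallelism degrees, a ZeRO stage, the positional-embedding type, and the attention backend. It is written as a Python dict for readability, and the key names are illustrative approximations of the shipped example configs; the configs/ directory in the repository defines the exact schema.

    # Illustrative sketch of a GPT-NeoX-style configuration (normally written in YAML).
    # Key names are approximate; see the repository's configs/ directory for the real schema.
    neox_config = {
        # 3D parallelism: data parallelism uses whatever GPUs remain after
        # tensor (model) and pipeline parallelism are assigned.
        "model-parallel-size": 2,               # tensor-parallel degree
        "pipe-parallel-size": 4,                # pipeline-parallel degree
        # DeepSpeed ZeRO stage 1 partitions optimizer states across data-parallel ranks.
        "zero_optimization": {"stage": 1},
        # Architectural options mentioned above.
        "pos-emb": "rotary",                    # rotary or ALiBi positional embeddings
        "attention-config": [[["flash"], 24]],  # use FlashAttention kernels for all 24 layers
    }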

Quick Start & Requirements

  • Installation: pip install -r requirements/requirements.txt (optional extras such as logging and fused kernels have their own requirements files).
  • Prerequisites: Python 3.8-3.10 and PyTorch 1.8-2.0; environment isolation (e.g., Anaconda) is recommended.
  • Fused Kernels: On AMD GPUs, or to pre-build the kernels manually, run from megatron.fused_kernels import load; load() (see the snippet after this list).
  • Flash Attention: Install from ./requirements/requirements-flashattention.txt or use NGC containers.
  • Transformer Engine: Install from ./requirements/requirements-transformer-engine.txt or use NGC containers.
  • Multi-Node Launching: Requires a hostfile or configuration for Slurm, MPI, or pdsh.
  • Containerized Setup: Docker and Apptainer (Singularity) containers are provided.
  • Documentation: GPT-NeoX Documentation
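
For reference, the fused-kernel pre-build mentioned in the list above is just the two statements below, run once from the root of the gpt-neox repository (so that the megatron package is importable) before launching training:

    # Pre-build the fused kernel extensions (e.g., on AMD GPUs) ahead of training.
    from megatron.fused_kernels import load

    load()  # compiles and caches the fused kernel extensions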

Highlighted Details

  • Supports ZeRO and 3D parallelism for distributed training.
  • Integrates Flash Attention 2.x, Transformer Engine, and fused kernels, including support for AMD GPUs.
  • Offers predefined configurations for popular architectures (Pythia, Falcon, LLaMA 1 & 2).
  • Includes support for Mixture-of-Experts (MoE) with various configurations.
  • Facilitates integration with Hugging Face, Weights & Biases, Comet ML, and TensorBoard (see the example after this list).
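
As a small illustration of the Hugging Face side of that integration, a NeoX-architecture checkpoint published on the Hub (the Pythia models are one example) can be loaded with the standard transformers classes for inference; the model name below is only an example, not something this repository ships:

    # Minimal sketch: load a NeoX-architecture checkpoint from the Hugging Face Hub.
    from transformers import AutoTokenizer, GPTNeoXForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-410m")
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-410m")

    inputs = tokenizer("GPT-NeoX is a framework for", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))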

Maintenance & Community

The project is actively maintained by EleutherAI and has seen contributions from numerous academic and industry labs. It is used by organizations like Oak Ridge National Lab, Stability AI, and Together.ai.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. Files modified from NVIDIA's code retain NVIDIA copyright headers, which derivative works must preserve.

Limitations & Caveats

This library is designed specifically for training models with billions of parameters from scratch; it is not intended for general-purpose inference, for which Hugging Face Transformers is suggested instead. Some sections of the documentation still mark pipeline parallelism as "coming soon."

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

Explore Similar Projects

lingua by facebookresearch

  • Top 0.1%; 5k stars
  • LLM research codebase for training and inference
  • Created 11 months ago; updated 2 months ago
  • Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

  • Top 0.1%; 41k stars
  • AI system for large-scale parallel training
  • Created 3 years ago; updated 13 hours ago