gpt-neox by EleutherAI

Framework for training large-scale autoregressive language models

created 4 years ago
7,269 stars

Top 7.2% on sourcepulse

View on GitHub
Project Summary

EleutherAI/gpt-neox is a PyTorch framework for training large-scale autoregressive language models, built on NVIDIA's Megatron-LM and Microsoft's DeepSpeed libraries. It is designed for researchers and engineers training models with billions of parameters from scratch, offering advanced distributed training capabilities and support for cutting-edge architectural innovations.

How It Works

GPT-NeoX employs a combination of 3D parallelism (tensor, pipeline, and data parallelism) and DeepSpeed's ZeRO optimizer for efficient memory usage and distributed training. It integrates novel architectural features like rotary and ALiBi positional embeddings, parallel feedforward layers, and FlashAttention for performance gains. The framework supports various launching mechanisms (Slurm, MPI, IBM Job Step Manager) and hardware platforms, including AMD GPUs.
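
Training runs are driven by YAML configuration files whose keys map onto these features. As a rough, non-authoritative sketch (key names follow the style of the repository's example configs but should be verified against the current documentation and the files in configs/), the parallel layout and positional-embedding choice might look like the following, shown here as a Python dict for readability:

    # Illustrative GPT-NeoX-style settings; in practice these live in a YAML file
    # passed to the launcher. Key names and accepted values are assumptions to be
    # checked against the examples in configs/.
    neox_config = {
        "pipe_parallel_size": 2,    # pipeline-parallel stages
        "model_parallel_size": 4,   # tensor-parallel degree
        # data parallelism fills the remaining ranks: world_size / (pipe * tensor)
        "pos_emb": "rotary",        # or "alibi" for ALiBi positional embeddings
        "zero_optimization": {"stage": 1},  # DeepSpeed ZeRO settings
    }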

Quick Start & Requirements

  • Installation: pip install -r requirements/requirements.txt (additional requirements for logging and fused kernels).
  • Prerequisites: Python 3.8-3.10 and PyTorch 1.8-2.0. Using an isolated environment (e.g., Anaconda) is recommended.
  • Fused Kernels: On AMD GPUs, or to pre-build the kernels manually, run from megatron.fused_kernels import load; load() from the repository root.
  • Flash Attention: Install from ./requirements/requirements-flashattention.txt or use NGC containers.
  • Transformer Engine: Install from ./requirements/requirements-transformer-engine.txt or use NGC containers.
  • Multi-Node Launching: Requires a hostfile or launcher configuration for Slurm, MPI, or pdsh; see the sketch after this list.
  • Containerized Setup: Docker and Apptainer (Singularity) containers are provided.
  • Documentation: GPT-NeoX Documentation
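
For multi-node runs, GPT-NeoX follows the DeepSpeed convention of a hostfile that lists one worker node per line with its GPU slot count, and training is typically launched through the repository's deepy.py wrapper with one or more YAML configs. A minimal sketch, with hypothetical hostnames and an illustrative model config path:

    # /path/to/hostfile (DeepSpeed format); hostnames are placeholders
    node1 slots=8
    node2 slots=8

    # launch training with one or more YAML configs (the model config path is illustrative)
    python ./deepy.py train.py configs/your_model.yml configs/local_setup.yml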

Highlighted Details

  • Supports ZeRO and 3D parallelism for distributed training.
  • Integrates Flash Attention 2.x, Transformer Engine, and fused kernels, with support for AMD GPUs.
  • Offers predefined configurations for popular architectures (Pythia, Falcon, LLaMA 1 & 2).
  • Includes support for Mixture-of-Experts (MoE) with various configurations.
  • Integrates with Hugging Face (checkpoint export), Weights & Biases, Comet ML, and TensorBoard; see the sketch after this list.
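
Downstream of training, GPT-NeoX-family checkpoints converted to (or published in) the Hugging Face format load with the standard transformers classes. Below is a minimal sketch using a public Pythia checkpoint, which was trained with this library; your own checkpoints must first be converted with the conversion scripts in the repository's tools/ directory.

    # Minimal sketch: loading a GPT-NeoX-family checkpoint with Hugging Face Transformers.
    from transformers import AutoTokenizer, GPTNeoXForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
    model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-160m")

    inputs = tokenizer("GPT-NeoX is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))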

Maintenance & Community

The project is actively maintained by EleutherAI and has seen contributions from numerous academic and industry labs. It is used by organizations like Oak Ridge National Lab, Stability AI, and Together.ai.

Licensing & Compatibility

Licensed under the Apache License, Version 2.0. Modifications of NVIDIA code retain NVIDIA copyright headers. Derivative works must preserve these headers.

Limitations & Caveats

The library is designed specifically for training models with billions of parameters from scratch; for generic inference, the project points users to Hugging Face Transformers instead. Pipeline parallelism is still noted as "coming soon" in some sections of the documentation.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

127 stars in the last 90 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0% · 309 stars
Framework for large-scale transformer optimization
created 3 years ago
updated 2 years ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM

1.0% · 402 stars
Lightweight training framework for model pre-training
created 1 year ago
updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin

0.6% · 5k stars
Triton kernels for efficient LLM training
created 1 year ago
updated 1 day ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1% · 30k stars
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 19 hours ago