Framework for training large-scale autoregressive language models
EleutherAI/gpt-neox is a PyTorch framework for training large-scale autoregressive language models, built upon NVIDIA's Megatron-LM and DeepSpeed libraries. It is designed for researchers and engineers focused on training models with billions of parameters from scratch, offering advanced distributed training capabilities and support for cutting-edge architectural innovations.
How It Works
GPT-NeoX employs a combination of 3D parallelism (tensor, pipeline, and data parallelism) and DeepSpeed's ZeRO optimizer for efficient memory usage and distributed training. It integrates novel architectural features like rotary and ALiBi positional embeddings, parallel feedforward layers, and FlashAttention for performance gains. The framework supports various launching mechanisms (Slurm, MPI, IBM Job Step Manager) and hardware platforms, including AMD GPUs.
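As a rough sketch of how these pieces come together, the excerpt below shows the relevant knobs in the style of GPT-NeoX's YAML configs. The key names mirror the repo's example configs (e.g. configs/125M.yml) but are illustrative here; check the files under configs/ for the exact schema and supported values.

```yaml
# Minimal sketch, not a complete config: parallelism and architecture knobs
# in the style of the repo's example YAML configs (key names illustrative).

# 3D parallelism: tensor (model) and pipeline degrees; the data-parallel
# degree is derived from whatever GPUs remain.
model-parallel-size: 2
pipe-parallel-size: 2

# Architectural options
num-layers: 12
pos-emb: rotary                    # or alibi
attention-config: [[[flash], 12]]  # FlashAttention for all 12 layers

# DeepSpeed ZeRO: shard optimizer state across data-parallel ranks
zero_optimization:
  stage: 1
  overlap_comm: true
  contiguous_gradients: true
```

Data parallelism needs no explicit setting: the GPUs left over after the tensor and pipeline splits form the data-parallel replicas.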
Quick Start & Requirements
Install the base dependencies with pip install -r requirements/requirements.txt (additional requirements files cover logging and fused kernels). Fused kernels are built with from megatron.fused_kernels import load; load(). To use FlashAttention, install ./requirements/requirements-flashattention.txt or use NGC containers; to use Transformer Engine, install ./requirements/requirements-transformer-engine.txt or use NGC containers.
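For a first run, training is typically launched through the deepy.py wrapper, which hands the merged YAML configs to the DeepSpeed launcher. The sketch below uses the example configs shipped in the repo (125M.yml and local_setup.yml); paths and dataset choices are placeholders to adapt to your setup.

```bash
# Download and tokenize a small sample dataset into ./data (see the repo's
# data-preparation docs for preparing real corpora).
python prepare_data.py -d ./data

# Launch training: deepy.py wraps the DeepSpeed launcher and merges the YAML
# configs named on the command line (-d points at the config directory).
python ./deepy.py train.py -d configs 125M.yml local_setup.yml
```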
Highlighted Details
Maintenance & Community
The project is actively maintained by EleutherAI and has seen contributions from numerous academic and industry labs. It is used by organizations like Oak Ridge National Lab, Stability AI, and Together.ai.
Licensing & Compatibility
Licensed under the Apache License, Version 2.0. Modifications of NVIDIA code retain NVIDIA copyright headers. Derivative works must preserve these headers.
Limitations & Caveats
This library is purpose-built for training models with billions of parameters from scratch; it is not recommended for generic inference needs, for which the project points to Hugging Face Transformers. Pipeline parallelism is still marked as "coming soon" in some sections of the documentation.