gigaGPT by Cerebras

Simple codebase for training large language models

created 1 year ago
307 stars

Top 88.4% on sourcepulse

View on GitHub
Project Summary

gigaGPT provides a simplified PyTorch codebase for training large language models (LLMs) up to GPT-3 scale, inspired by nanoGPT. It targets researchers and engineers aiming to train massive models with minimal code complexity, leveraging Cerebras hardware for efficient scaling.

How It Works

gigaGPT implements the basic GPT-2 architecture with learned positional embeddings and standard attention, mirroring nanoGPT's structure. Its core advantage lies in its extreme conciseness (565 lines of Python) and its design for seamless scaling on Cerebras hardware, utilizing weight streaming and data parallelism for exaflop-scale clusters. This approach contrasts with complex frameworks like Megatron-LM, offering a more accessible path to large-scale LLM training.
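
As a rough illustration of the architecture described above, here is a minimal PyTorch sketch of a nanoGPT-style decoder with learned positional embeddings and standard causal multi-head attention. It is not gigaGPT's actual code; the module names, layer layout, and omissions (no dropout, no weight tying) are simplifications for illustration only.

    # Illustrative sketch only -- not gigaGPT's source. It shows the nanoGPT-style
    # ingredients the text describes: learned positional embeddings and standard
    # (causal) multi-head attention.
    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )

        def forward(self, x):
            # Causal mask: each position may only attend to earlier positions.
            T = x.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
            h = self.ln1(x)
            a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
            x = x + a
            x = x + self.mlp(self.ln2(x))
            return x

    class TinyGPT(nn.Module):
        def __init__(self, vocab_size: int, max_len: int, d_model: int, n_heads: int, n_layers: int):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
            self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embeddings
            self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
            self.ln_f = nn.LayerNorm(d_model)
            self.head = nn.Linear(d_model, vocab_size, bias=False)

        def forward(self, idx):
            pos = torch.arange(idx.size(1), device=idx.device)
            x = self.tok_emb(idx) + self.pos_emb(pos)
            for block in self.blocks:
                x = block(x)
            return self.head(self.ln_f(x))                     # next-token logits

Scaling this structure from 111M to 70B+ parameters mostly changes the embedding dimension, depth, and head count; per the project's description, the heavy lifting for cluster-scale training (weight streaming, data parallelism) is handled by the Cerebras stack rather than by extra model code.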

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt (for GPU) or pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu (for CPU/CSX).
  • Data preprocessing: python data/openwebtext/prepare.py (an illustrative sketch of this step follows the list).
  • Training: python train.py configs/<config_file>.yaml (e.g., configs/111m.yaml, configs/70b.yaml).
  • Evaluation: python eval.py configs/<config_file>.yaml --checkpoint_path <path_to_checkpoint>
  • Generation: python sample.py --checkpoint_path <model_dir/checkpoint.mdl>
  • Prerequisites: PyTorch, cerebras.pytorch (specific to Cerebras hardware). OpenWebText dataset recommended for larger models.
  • Notes: 70B models require Cerebras hardware due to GPU memory limitations.
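
The preprocessing step above turns raw text into flat token files that the training script can read. Below is a hypothetical sketch of that kind of nanoGPT-style pipeline, assuming GPT-2 BPE via tiktoken, the Hugging Face openwebtext dataset, and a single train.bin output; the real data/openwebtext/prepare.py may differ in libraries, splits, and file layout.

    # Hypothetical sketch of nanoGPT-style OpenWebText preprocessing -- the actual
    # prepare.py may use different libraries, sharding, and file names.
    import numpy as np
    import tiktoken                      # GPT-2 BPE tokenizer
    from datasets import load_dataset    # Hugging Face datasets

    enc = tiktoken.get_encoding("gpt2")
    ds = load_dataset("openwebtext", split="train")

    tokens = []
    for example in ds.select(range(1000)):          # small slice, for illustration only
        ids = enc.encode_ordinary(example["text"])  # tokenize without special tokens
        ids.append(enc.eot_token)                   # delimit documents with <|endoftext|>
        tokens.extend(ids)

    # Write a flat binary file of uint16 token ids that a training loop can memory-map.
    np.array(tokens, dtype=np.uint16).tofile("train.bin")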

Highlighted Details

  • Successfully trained models up to 175B parameters.
  • Codebase is 565 lines of Python, significantly smaller than Megatron-LM (20,507 lines).
  • Configurations provided for 111M, 13B, 70B, and 175B parameter models.
  • Validation focused on functional correctness and high throughput rather than convergence metrics.

Maintenance & Community

  • Developed by Cerebras.
  • A technical overview is available via a linked Cerebras blog post.

Licensing & Compatibility

  • License not explicitly stated in the README.

Limitations & Caveats

  • Primarily optimized for Cerebras hardware; performance on standard GPUs or CPUs for large models is limited.
  • Validation focused on functional correctness rather than optimal convergence or downstream performance, so users should expect to tune hyperparameters carefully for large-scale runs.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
