Simple codebase for training large language models
gigaGPT provides a simplified PyTorch codebase for training large language models (LLMs) up to GPT-3 scale, inspired by nanoGPT. It targets researchers and engineers aiming to train massive models with minimal code complexity, leveraging Cerebras hardware for efficient scaling.
How It Works
gigaGPT implements the basic GPT-2 architecture with learned positional embeddings and standard attention, mirroring nanoGPT's structure. Its core advantage lies in its extreme conciseness (565 lines of Python) and its design for seamless scaling on Cerebras hardware, utilizing weight streaming and data parallelism for exaflop-scale clusters. This approach contrasts with complex frameworks like Megatron-LM, offering a more accessible path to large-scale LLM training.
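For orientation, the sketch below shows the GPT-2-style ingredients described above: token embeddings plus learned positional embeddings feeding pre-norm blocks of standard causal self-attention and an MLP. It is a minimal illustration in plain PyTorch, not gigaGPT's actual model.py; the class names (MiniGPT2, Block), the use of nn.MultiheadAttention, and the default hyperparameters are all assumptions chosen for brevity.

```python
# Illustrative sketch only; names and hyperparameters are hypothetical,
# not copied from gigaGPT's model.py.
import torch
import torch.nn as nn


class Block(nn.Module):
    """One pre-norm transformer block: causal self-attention followed by an MLP."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        T = x.size(1)
        # Standard causal mask: each position may attend only to earlier positions.
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.ln2(x))
        return x


class MiniGPT2(nn.Module):
    """GPT-2-style decoder: token + learned positional embeddings, then N blocks."""

    def __init__(self, vocab_size: int, max_seq_len: int, d_model: int = 256,
                 n_heads: int = 4, n_layers: int = 4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_seq_len, d_model)  # learned, not sinusoidal
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        positions = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(positions)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))


if __name__ == "__main__":
    model = MiniGPT2(vocab_size=50257, max_seq_len=128)
    logits = model(torch.randint(0, 50257, (2, 128)))
    print(logits.shape)  # torch.Size([2, 128, 50257])
```

At GPT-3 scale the same architecture is simply widened and deepened; as noted above, the scaling work is delegated to the Cerebras stack rather than to model-parallel code inside the repository.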
Quick Start & Requirements
- Install dependencies: pip install -r requirements.txt (for GPU) or pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu (for CPU/CSX).
- Prepare the dataset: python data/openwebtext/prepare.py (see the loader sketch after this list).
- Train: python train.py configs/<config_file>.yaml (e.g., configs/111m.yaml or configs/70b.yaml).
- Evaluate: python eval.py configs/<config_file>.yaml --checkpoint_path <path_to_checkpoint>
- Sample: python sample.py --checkpoint_path <model_dir/checkpoint.mdl>
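Given the nanoGPT lineage noted above, the prepare script can be assumed to follow the nanoGPT convention of tokenizing OpenWebText into flat binary files of token ids. As a rough illustration of how such output is typically consumed, the sketch below memory-maps the file and samples random next-token-prediction windows. The path data/openwebtext/train.bin, the uint16 dtype, and the get_batch helper are assumptions based on that convention, not guaranteed details of gigaGPT's own input pipeline.

```python
# Hypothetical nanoGPT-style loader; file name, dtype, and helper are
# assumptions, not gigaGPT's actual data code.
import numpy as np
import torch


def get_batch(bin_path: str, batch_size: int, block_size: int):
    """Sample random (input, target) windows from a flat binary of uint16 token ids."""
    data = np.memmap(bin_path, dtype=np.uint16, mode="r")
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i:i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1:i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y  # y is x shifted by one token (next-token prediction targets)


if __name__ == "__main__":
    x, y = get_batch("data/openwebtext/train.bin", batch_size=8, block_size=1024)
    print(x.shape, y.shape)  # torch.Size([8, 1024]) torch.Size([8, 1024])
```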
Requires cerebras.pytorch (specific to Cerebras hardware). The OpenWebText dataset is recommended for larger models.
Highlighted Details
Maintenance & Community
Last commit 3 months ago; the project is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats