gigaGPT by Cerebras

Simple codebase for training large language models

created 1 year ago
307 stars

Top 88.4% on sourcepulse

View on GitHub
Project Summary

gigaGPT provides a simplified PyTorch codebase for training large language models (LLMs) up to GPT-3 scale, inspired by nanoGPT. It targets researchers and engineers aiming to train massive models with minimal code complexity, leveraging Cerebras hardware for efficient scaling.

How It Works

gigaGPT implements the basic GPT-2 architecture with learned positional embeddings and standard attention, mirroring nanoGPT's structure. Its core advantage lies in its extreme conciseness (565 lines of Python) and its design for seamless scaling on Cerebras hardware, utilizing weight streaming and data parallelism for exaflop-scale clusters. This approach contrasts with complex frameworks like Megatron-LM, offering a more accessible path to large-scale LLM training.
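
As a rough illustration of the architecture described above, here is a minimal PyTorch sketch of a nanoGPT-style decoder with learned positional embeddings and standard causal multi-head attention. It is not gigaGPT's actual code; the module names, layer layout, and omissions (no dropout, no weight tying) are simplifications for illustration only.

    # Illustrative sketch only -- not gigaGPT's source. It shows the nanoGPT-style
    # ingredients the text describes: learned positional embeddings and standard
    # (causal) multi-head attention.
    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, d_model: int, n_heads: int):
            super().__init__()
            self.ln1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.ln2 = nn.LayerNorm(d_model)
            self.mlp = nn.Sequential(
                nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
            )

        def forward(self, x):
            # Causal mask: each position may only attend to earlier positions.
            T = x.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
            h = self.ln1(x)
            a, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
            x = x + a
            x = x + self.mlp(self.ln2(x))
            return x

    class TinyGPT(nn.Module):
        def __init__(self, vocab_size: int, max_len: int, d_model: int, n_heads: int, n_layers: int):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
            self.pos_emb = nn.Embedding(max_len, d_model)      # learned positional embeddings
            self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
            self.ln_f = nn.LayerNorm(d_model)
            self.head = nn.Linear(d_model, vocab_size, bias=False)

        def forward(self, idx):
            pos = torch.arange(idx.size(1), device=idx.device)
            x = self.tok_emb(idx) + self.pos_emb(pos)
            for block in self.blocks:
                x = block(x)
            return self.head(self.ln_f(x))                     # next-token logits

Scaling this structure from 111M to 70B+ parameters mostly changes the embedding dimension, depth, and head count; per the project's description, the heavy lifting for cluster-scale training (weight streaming, data parallelism) is handled by the Cerebras stack rather than by extra model code.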

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt (for GPU) or pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu (for CPU/CSX).
  • Data preprocessing: python data/openwebtext/prepare.py (an illustrative sketch of this step follows the list).
  • Training: python train.py configs/<config_file>.yaml (e.g., configs/111m.yaml, configs/70b.yaml).
  • Evaluation: python eval.py configs/<config_file>.yaml --checkpoint_path <path_to_checkpoint>
  • Generation: python sample.py --checkpoint_path <model_dir/checkpoint.mdl>
  • Prerequisites: PyTorch, cerebras.pytorch (specific to Cerebras hardware). OpenWebText dataset recommended for larger models.
  • Notes: 70B models require Cerebras hardware due to GPU memory limitations.
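
The preprocessing step above turns raw text into flat token files that the training script can read. Below is a hypothetical sketch of that kind of nanoGPT-style pipeline, assuming GPT-2 BPE via tiktoken, the Hugging Face openwebtext dataset, and a single train.bin output; the real data/openwebtext/prepare.py may differ in libraries, splits, and file layout.

    # Hypothetical sketch of nanoGPT-style OpenWebText preprocessing -- the actual
    # prepare.py may use different libraries, sharding, and file names.
    import numpy as np
    import tiktoken                      # GPT-2 BPE tokenizer
    from datasets import load_dataset    # Hugging Face datasets

    enc = tiktoken.get_encoding("gpt2")
    ds = load_dataset("openwebtext", split="train")

    tokens = []
    for example in ds.select(range(1000)):          # small slice, for illustration only
        ids = enc.encode_ordinary(example["text"])  # tokenize without special tokens
        ids.append(enc.eot_token)                   # delimit documents with <|endoftext|>
        tokens.extend(ids)

    # Write a flat binary file of uint16 token ids that a training loop can memory-map.
    np.array(tokens, dtype=np.uint16).tofile("train.bin")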

Highlighted Details

  • Successfully trained models up to 175B parameters.
  • Codebase is 565 lines of Python, significantly smaller than Megatron-LM (20,507 lines).
  • Configurations provided for 111M, 13B, 70B, and 175B parameter models.
  • Validation focused on functional correctness and high throughput rather than convergence metrics.

Maintenance & Community

  • Developed by Cerebras.
  • A technical overview is available via a linked Cerebras blog post.

Licensing & Compatibility

  • License not explicitly stated in the README.

Limitations & Caveats

  • Primarily optimized for Cerebras hardware; performance on standard GPUs or CPUs for large models is limited.
  • Validation focused on functional correctness rather than optimal convergence or downstream performance, so users should expect to tune hyperparameters carefully for large-scale runs.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
