JAX example code for loading and running Grok-1 open-weights model
This repository provides JAX example code for loading and running the Grok-1 open-weights model, a 314B parameter Mixture of Experts (MoE) language model. It is intended for researchers and developers interested in experimenting with large-scale MoE architectures.
How It Works
Grok-1 utilizes a Mixture of Experts (MoE) architecture, specifically employing 8 experts with 2 experts activated per token. The model features 64 layers, 48 attention heads for queries, and 8 for keys/values, with an embedding size of 6,144. It incorporates Rotary Positional Embeddings (RoPE) and supports activation sharding and 8-bit quantization, with a maximum context length of 8,192 tokens. The provided MoE layer implementation prioritizes correctness over efficiency, avoiding custom kernels.
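The top-2 routing described above can be illustrated with a small JAX sketch. This is not the repository's actual MoE layer; the parameter names (router, w_in, w_out), the GELU expert MLP, and the tiny shapes in the smoke test are assumptions made for the example.

import jax
import jax.numpy as jnp

NUM_EXPERTS = 8   # experts per MoE layer, as described above
TOP_K = 2         # experts activated per token

def moe_layer(params, x):
    """x: [tokens, embed] -> [tokens, embed] via top-2 expert routing."""
    # Router: one logit per expert for each token.
    logits = x @ params["router"]                        # [tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    top_p, top_idx = jax.lax.top_k(probs, TOP_K)         # both [tokens, k]
    top_p = top_p / top_p.sum(axis=-1, keepdims=True)    # renormalise over the chosen experts

    # Correctness-first evaluation: run every expert on every token and mix
    # with a dense [tokens, experts] weight matrix, avoiding custom
    # scatter/gather kernels, in the same spirit as the trade-off noted above.
    def expert(w, h):
        return jax.nn.gelu(h @ w["w_in"]) @ w["w_out"]   # [tokens, embed]

    all_out = jax.vmap(expert, in_axes=(0, None))(params["experts"], x)             # [experts, tokens, embed]
    combine = (jax.nn.one_hot(top_idx, NUM_EXPERTS) * top_p[..., None]).sum(axis=1)  # [tokens, experts]
    return jnp.einsum("etd,te->td", all_out, combine)

# Tiny smoke test with made-up shapes (Grok-1 itself uses an embedding size of 6,144).
key = jax.random.PRNGKey(0)
embed, hidden, tokens = 64, 128, 4
params = {
    "router": 0.02 * jax.random.normal(key, (embed, NUM_EXPERTS)),
    "experts": {
        "w_in":  0.02 * jax.random.normal(key, (NUM_EXPERTS, embed, hidden)),
        "w_out": 0.02 * jax.random.normal(key, (NUM_EXPERTS, hidden, embed)),
    },
}
y = moe_layer(params, jax.random.normal(key, (tokens, embed)))  # -> shape (4, 64)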
Quick Start & Requirements
# Requires the Grok-1 checkpoint (ckpt-0 directory) to be present in ./checkpoints; see the download sketch below.
pip install -r requirements.txt
python run.py
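If the weights are not yet present, one way to fetch them is through the huggingface_hub Python client. The repository id xai-org/grok-1 and the ckpt-0 layout below follow the upstream project's instructions and are assumptions not stated in this summary; the checkpoint is very large, so ensure sufficient disk space.

# Sketch: download the ckpt-0 checkpoint into ./checkpoints.
# Assumes the weights are published at the Hugging Face repo "xai-org/grok-1";
# adjust repo_id or paths if they differ.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",
    allow_patterns="ckpt-0/*",
    local_dir="checkpoints",
)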
Highlighted Details
Notable points include the 314B-parameter MoE design (8 experts, 2 active per token), the 8,192-token context window, 8-bit quantization with activation sharding, and Apache 2.0 licensing for both code and weights.
Maintenance & Community
No specific community channels, roadmap, or contributor information is detailed in the README. The repository was last updated roughly 11 months ago and appears inactive.
Licensing & Compatibility
The code and Grok-1 model weights are released under the Apache 2.0 license. This license permits commercial use and integration with closed-source projects.
Limitations & Caveats
The MoE layer implementation is deliberately inefficient: it was written to validate model correctness while avoiding custom kernels, not for performance. Running the model requires a machine with enough GPU memory to hold the 314B-parameter checkpoint.