grok-1 by xai-org

JAX example code for loading and running Grok-1 open-weights model

Created 2 years ago
51,520 stars

Top 0.6% on SourcePulse

Project Summary

This repository provides JAX example code for loading and running the Grok-1 open-weights model, a 314B parameter Mixture of Experts (MoE) language model. It is intended for researchers and developers interested in experimenting with large-scale MoE architectures.

How It Works

Grok-1 uses a Mixture of Experts (MoE) architecture with 8 experts, 2 of which are activated per token. The model has 64 layers, 48 attention heads for queries and 8 for keys/values, and an embedding size of 6,144. It uses Rotary Positional Embeddings (RoPE) and supports activation sharding and 8-bit quantization, with a maximum context length of 8,192 tokens. The provided MoE layer implementation prioritizes correctness over efficiency and avoids custom kernels.
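The top-2-of-8 routing described above can be sketched as follows. This is an illustrative toy in NumPy, not Grok-1's actual implementation; all names, shapes, and the dense per-expert matmul are assumptions for demonstration only.

```python
import numpy as np

def top2_moe(x, gate_w, expert_ws, k=2):
    """Illustrative top-k MoE layer: route each token to its k highest-scoring
    experts and combine their outputs, weighted by softmax gate scores."""
    # x: (tokens, d_model); gate_w: (d_model, n_experts)
    # expert_ws: list of (d_model, d_model) weight matrices, one per expert
    logits = x @ gate_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over the k selected
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # dense per-expert matmul
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))                 # 4 tokens, toy d_model of 16
gate_w = rng.standard_normal((16, 8))            # 8 experts, as in Grok-1
experts = [rng.standard_normal((16, 16)) for _ in range(8)]
y = top2_moe(x, gate_w, experts)
print(y.shape)  # (4, 16)
```

Only 2 of the 8 expert matmuls run per token, which is why a 314B-parameter MoE model activates far fewer parameters per forward pass than a dense model of the same size.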

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run example: python run.py
  • Requires significant GPU memory due to the 314B parameter size.
  • Weights can be downloaded via torrent or from the Hugging Face Hub.

Highlighted Details

  • 314B parameter Mixture of Experts (MoE) model.
  • 8 experts, 2 utilized per token.
  • Supports activation sharding and 8-bit quantization.
  • Maximum sequence length of 8,192 tokens.
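The 8-bit quantization mentioned above typically stores weights as int8 plus a floating-point scale. A minimal per-tensor sketch, assuming a symmetric scheme (Grok-1's actual quantization format may differ):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: int8 weights + one float scale."""
    scale = np.abs(w).max() / 127.0              # map the largest |w| to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 weights and the scale."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()                    # bounded by half a quantization step
print(q.dtype)  # int8
```

Storing int8 weights instead of 16- or 32-bit floats cuts checkpoint and GPU memory for the weights by 2-4x, which matters at 314B parameters.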

Maintenance & Community

No specific community channels, roadmap, or contributor information is detailed in the README.

Licensing & Compatibility

The code and Grok-1 model weights are released under the Apache 2.0 license. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The MoE layer implementation is deliberately inefficient: it was written to validate correctness, not for performance. Running the 314B-parameter model requires substantial GPU memory.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 126 stars in the last 30 days

Explore Similar Projects

xgen by salesforce (726 stars, top 0% on SourcePulse)
LLM research release with 8k sequence length
Created 2 years ago, updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

gemma.cpp by google (7k stars, top 0.8% on SourcePulse)
C++ inference engine for Google's Gemma models
Created 2 years ago, updated 3 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

DeepSeek-Coder-V2 by deepseek-ai (7k stars, top 0.3% on SourcePulse)
Open-source code language model comparable to GPT-4 Turbo
Created 1 year ago, updated 5 months ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.