grok-1 by xai-org

JAX example code for loading and running the Grok-1 open-weights model

created 1 year ago
50,394 stars

Top 0.5% on sourcepulse

Project Summary

This repository provides JAX example code for loading and running the Grok-1 open-weights model, a 314B parameter Mixture of Experts (MoE) language model. It is intended for researchers and developers interested in experimenting with large-scale MoE architectures.

How It Works

Grok-1 uses a Mixture of Experts (MoE) architecture with 8 experts, of which 2 are activated per token. The model has 64 layers, 48 attention heads for queries and 8 for keys/values, and an embedding size of 6,144. It incorporates Rotary Positional Embeddings (RoPE) and supports activation sharding and 8-bit quantization, with a maximum context length of 8,192 tokens. The provided MoE layer implementation prioritizes correctness over efficiency and avoids custom kernels.
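
For orientation, the hyperparameters quoted above can be collected into a single configuration sketch. This is illustrative only: the class and field names are assumptions and do not reflect the repository's actual config objects.

```python
from dataclasses import dataclass

# Illustrative config collecting the figures quoted in this summary.
# Names are hypothetical; consult the repository for the real configuration.
@dataclass(frozen=True)
class Grok1Config:
    emb_size: int = 6144          # embedding size
    num_layers: int = 64          # transformer layers
    num_q_heads: int = 48         # attention heads for queries
    num_kv_heads: int = 8         # attention heads for keys/values
    num_experts: int = 8          # MoE experts per layer
    experts_per_token: int = 2    # experts activated per token
    max_seq_len: int = 8192       # maximum context length
    quant_bits: int = 8           # 8-bit quantization support
```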

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run example: python run.py
  • Requires significant GPU memory due to the 314B parameter size.
  • Weights can be downloaded via torrent or the HuggingFace Hub (see the sketch after this list).
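
Below is a hedged sketch of fetching the weights from the HuggingFace Hub with the huggingface_hub library. The repo id, destination directory, and file patterns are assumptions; check the repository README for the official torrent magnet link and Hub instructions.

```python
# Hypothetical download sketch; repo id and file patterns are assumptions.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",                        # assumed Hub repo id
    local_dir="checkpoints",                         # illustrative destination
    allow_patterns=["ckpt-0/*", "tokenizer.model"],  # assumed checkpoint layout
)
```

With a checkpoint downloaded locally, python run.py can then be pointed at it; adjust paths to wherever the example script expects the weights.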

Highlighted Details

  • 314B parameter Mixture of Experts (MoE) model.
  • 8 experts, 2 activated per token (see the routing sketch after this list).
  • Supports activation sharding and 8-bit quantization.
  • Maximum sequence length of 8,192 tokens.
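
To make the 8-expert, top-2 routing concrete, here is a minimal JAX sketch of a correctness-first MoE layer that loops over all experts instead of using custom kernels. It is illustrative only: the gating scheme, GELU activation, and weight shapes are assumptions, not the repository's implementation.

```python
import jax
import jax.numpy as jnp

def top2_moe(x, gate_w, w_in, w_out):
    """Illustrative top-2 MoE feed-forward block.

    x:      [tokens, d_model] token activations
    gate_w: [d_model, num_experts] router weights
    w_in:   [num_experts, d_model, d_ff] expert up-projections
    w_out:  [num_experts, d_ff, d_model] expert down-projections
    """
    logits = x @ gate_w                          # router scores per expert
    weights, idx = jax.lax.top_k(logits, 2)      # pick 2 of the 8 experts per token
    weights = jax.nn.softmax(weights, axis=-1)   # renormalize over the chosen pair

    out = jnp.zeros_like(x)
    # Correctness-first loop over every expert (no custom kernels).
    for e in range(w_in.shape[0]):
        h = jax.nn.gelu(x @ w_in[e]) @ w_out[e]  # run every token through expert e
        sel = jnp.sum(jnp.where(idx == e, weights, 0.0), axis=-1, keepdims=True)
        out = out + sel * h                      # keep it only where expert e was chosen
    return out
```

Running every token through every expert wastes compute, which is why the repository flags its MoE layer as inefficient; production implementations instead gather tokens per expert.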

Maintenance & Community

No specific community channels, roadmap, or contributor information is detailed in the README.

Licensing & Compatibility

The code and Grok-1 model weights are released under the Apache 2.0 license. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The MoE layer implementation is deliberately inefficient: it avoids custom kernels so that model correctness can be validated, rather than being tuned for performance. Running the model also requires substantial GPU resources.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
Star History
454 stars in the last 90 days

Explore Similar Projects

Starred by Georgios Konstantopoulos (CTO, General Partner at Paradigm) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-V2 by deepseek-ai

Top 0.1% on sourcepulse
5k stars
MoE language model for research/API use
created 1 year ago
updated 10 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

Top 0.4% on sourcepulse
6k stars
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago