grok-1 by xai-org

JAX example code for loading and running Grok-1 open-weights model

Created 2 years ago
51,520 stars

Top 0.6% on SourcePulse

Project Summary

This repository provides JAX example code for loading and running the Grok-1 open-weights model, a 314B parameter Mixture of Experts (MoE) language model. It is intended for researchers and developers interested in experimenting with large-scale MoE architectures.

How It Works

Grok-1 uses a Mixture of Experts (MoE) architecture with 8 experts, 2 of which are activated per token. The model has 64 layers, 48 attention heads for queries and 8 for keys/values, and an embedding size of 6,144. It uses Rotary Positional Embeddings (RoPE) and supports activation sharding and 8-bit quantization, with a maximum context length of 8,192 tokens. The provided MoE layer implementation prioritizes correctness over efficiency and avoids custom kernels.
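The top-2-of-8 routing described above can be sketched as follows. This is an illustrative toy in NumPy, not Grok-1's actual implementation; all names, shapes, and the dense per-expert matmul are assumptions for demonstration only.

```python
import numpy as np

def top2_moe(x, gate_w, expert_ws, k=2):
    """Illustrative top-k MoE layer: route each token to its k highest-scoring
    experts and combine their outputs, weighted by softmax gate scores."""
    # x: (tokens, d_model); gate_w: (d_model, n_experts)
    # expert_ws: list of (d_model, d_model) weight matrices, one per expert
    logits = x @ gate_w                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]    # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        scores = logits[t, top[t]]
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                 # softmax over the k selected
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ expert_ws[e])  # dense per-expert matmul
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))                 # 4 tokens, toy d_model of 16
gate_w = rng.standard_normal((16, 8))            # 8 experts, as in Grok-1
experts = [rng.standard_normal((16, 16)) for _ in range(8)]
y = top2_moe(x, gate_w, experts)
print(y.shape)  # (4, 16)
```

Only 2 of the 8 expert matmuls run per token, which is why a 314B-parameter MoE model activates far fewer parameters per forward pass than a dense model of the same size.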

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run example: python run.py
  • Requires significant GPU memory due to the 314B parameter size.
  • Weights can be downloaded via torrent or from the Hugging Face Hub.

Highlighted Details

  • 314B parameter Mixture of Experts (MoE) model.
  • 8 experts, 2 utilized per token.
  • Supports activation sharding and 8-bit quantization.
  • Maximum sequence length of 8,192 tokens.
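The 8-bit quantization mentioned above typically stores weights as int8 plus a floating-point scale. A minimal per-tensor sketch, assuming a symmetric scheme (Grok-1's actual quantization format may differ):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: int8 weights + one float scale."""
    scale = np.abs(w).max() / 127.0              # map the largest |w| to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 weights and the scale."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()                    # bounded by half a quantization step
print(q.dtype)  # int8
```

Storing int8 weights instead of 16- or 32-bit floats cuts checkpoint and GPU memory for the weights by 2-4x, which matters at 314B parameters.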

Maintenance & Community

No specific community channels, roadmap, or contributor information is detailed in the README.

Licensing & Compatibility

The code and Grok-1 model weights are released under the Apache 2.0 license. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The MoE layer implementation is deliberately inefficient: it was written to validate correctness, not for performance. Running the 314B-parameter model requires substantial GPU memory.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 126 stars in the last 30 days

Explore Similar Projects

xgen by salesforce (726 stars, top 0% on SourcePulse)
LLM research release with 8k sequence length
Created 2 years ago, updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

gemma.cpp by google (7k stars, top 0.8% on SourcePulse)
C++ inference engine for Google's Gemma models
Created 2 years ago, updated 3 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

DeepSeek-Coder-V2 by deepseek-ai (7k stars, top 0.3% on SourcePulse)
Open-source code language model comparable to GPT-4 Turbo
Created 1 year ago, updated 5 months ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.