grok-1 by xai-org

JAX example code for loading and running the Grok-1 open-weights model

Created 1 year ago
50,507 stars

Top 0.5% on SourcePulse

Project Summary

This repository provides JAX example code for loading and running the Grok-1 open-weights model, a 314B parameter Mixture of Experts (MoE) language model. It is intended for researchers and developers interested in experimenting with large-scale MoE architectures.

How It Works

Grok-1 uses a Mixture of Experts (MoE) architecture with 8 experts, 2 of which are activated per token. The model has 64 layers, 48 attention heads for queries and 8 for keys/values, and an embedding size of 6,144. It uses Rotary Positional Embeddings (RoPE), supports activation sharding and 8-bit quantization, and has a maximum context length of 8,192 tokens. The provided MoE layer implementation prioritizes correctness over efficiency, avoiding custom kernels.
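For intuition, here is a minimal sketch of top-2 expert routing in JAX. It is illustrative only: the toy dimensions, parameter names, and the naive "evaluate every expert, then mask" strategy are assumptions made for readability rather than Grok-1's actual code, although that naive strategy mirrors the correctness-over-efficiency choice described above.

    # Minimal top-2 MoE routing sketch in JAX (illustrative; not Grok-1's code).
    import jax
    import jax.numpy as jnp

    NUM_EXPERTS = 8   # Grok-1 uses 8 experts
    TOP_K = 2         # 2 experts activated per token
    D_MODEL = 64      # toy embedding size (Grok-1 uses 6,144)
    D_FF = 256        # toy expert hidden size

    def moe_layer(x, router_w, expert_w1, expert_w2):
        # x: [tokens, D_MODEL]; router_w: [D_MODEL, NUM_EXPERTS]
        # expert_w1: [NUM_EXPERTS, D_MODEL, D_FF]; expert_w2: [NUM_EXPERTS, D_FF, D_MODEL]
        logits = x @ router_w                            # router scores per expert
        weights, idx = jax.lax.top_k(logits, TOP_K)      # pick the 2 best experts per token
        weights = jax.nn.softmax(weights, axis=-1)       # normalize over the chosen 2
        # Naive dense evaluation: run every expert on every token, then mask.
        h = jax.nn.gelu(jnp.einsum('td,edf->tef', x, expert_w1))
        out = jnp.einsum('tef,efd->ted', h, expert_w2)   # [tokens, experts, D_MODEL]
        gates = jnp.zeros((x.shape[0], NUM_EXPERTS)).at[
            jnp.arange(x.shape[0])[:, None], idx].set(weights)
        return jnp.einsum('te,ted->td', gates, out)      # weighted sum of selected experts

    key = jax.random.PRNGKey(0)
    k1, k2, k3, k4 = jax.random.split(key, 4)
    x = jax.random.normal(k1, (16, D_MODEL))
    out = moe_layer(
        x,
        jax.random.normal(k2, (D_MODEL, NUM_EXPERTS)) * 0.02,
        jax.random.normal(k3, (NUM_EXPERTS, D_MODEL, D_FF)) * 0.02,
        jax.random.normal(k4, (NUM_EXPERTS, D_FF, D_MODEL)) * 0.02,
    )
    print(out.shape)  # (16, 64)

A production MoE layer would instead dispatch tokens only to their selected experts and shard them across devices; that efficiency work is exactly what this repository deliberately skips.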

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run example: python run.py
  • Requires significant GPU memory due to the 314B parameter size.
  • Weights can be downloaded via torrent or from the HuggingFace Hub (see the download sketch after this list).
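
For the HuggingFace Hub route, a minimal sketch using the huggingface_hub Python client is shown below; the repository id xai-org/grok-1 and the ckpt-0/* file pattern are assumptions based on the public Hub listing and may not match the README exactly.

    # Hedged sketch: fetch the Grok-1 weights from the HuggingFace Hub.
    # The repo id and file patterns are assumptions, not guaranteed paths.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="xai-org/grok-1",
        repo_type="model",
        allow_patterns=["ckpt-0/*", "tokenizer.model"],  # checkpoint shards plus tokenizer
        local_dir="checkpoints",
    )

Once the files are in place under a local checkpoints directory, python run.py can load them; the exact directory run.py expects may need adjusting.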

Highlighted Details

  • 314B parameter Mixture of Experts (MoE) model.
  • 8 experts, 2 utilized per token.
  • Supports activation sharding and 8-bit quantization.
  • Maximum sequence length of 8,192 tokens.

Maintenance & Community

No specific community channels, roadmap, or contributor information is detailed in the README.

Licensing & Compatibility

The code and Grok-1 model weights are released under the Apache 2.0 license. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The MoE layer implementation is deliberately inefficient: it was chosen to validate model correctness without custom kernels, not for performance. Running the model also requires substantial GPU resources.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 218 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab
Top 0.2% · 462 stars
MoE model for research
Created 4 months ago · Updated 4 weeks ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Binyuan Hui (Research Scientist at Alibaba Qwen), and 3 more.

xgen by salesforce
Top 0.1% · 723 stars
LLM research release with 8k sequence length
Created 2 years ago · Updated 7 months ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab
Top 0.3% · 3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago · Updated 2 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tim J. Baek (Founder of Open WebUI), and 7 more.

gemma.cpp by google
Top 0.1% · 7k stars
C++ inference engine for Google's Gemma models
Created 1 year ago · Updated 1 day ago
Starred by Didier Lopes (Founder of OpenBB), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

DeepSeek-Coder-V2 by deepseek-ai
Top 0.3% · 6k stars
Open-source code language model comparable to GPT-4 Turbo
Created 1 year ago · Updated 11 months ago