Sparse mixture of experts language model from scratch
This repository provides a from-scratch implementation of a sparse mixture of experts (MoE) language model, inspired by Andrej Karpathy's makemore project. It targets researchers and developers interested in understanding and experimenting with MoE architectures for autoregressive character-level language modeling, offering a highly hackable and educational resource.
How It Works
The core change is replacing the standard feed-forward network with a sparsely gated MoE layer. A router uses top-k gating (and, optionally, noisy top-k gating) to send each input token to a small subset of "expert" feed-forward networks. This aims for greater parameter efficiency and potentially better performance by letting different experts specialize on different parts of the input. The implementation is written in PyTorch and reuses components from the makemore project.
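The routing step can be illustrated with a short PyTorch sketch. The class names, hyperparameters, and expert structure below (NoisyTopkRouter, SparseMoE, num_experts, top_k) are illustrative assumptions, not the repository's exact API; for readability the sketch runs every expert on every token and relies on zeroed gate weights rather than sparse dispatch.

```python
# Minimal sketch of noisy top-k gating and a sparse MoE layer in PyTorch.
# Names and hyperparameters are illustrative, not the repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopkRouter(nn.Module):
    """Scores each token against all experts, adds learned noise, keeps only the top-k."""
    def __init__(self, n_embd: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(n_embd, num_experts)   # clean routing logits
        self.noise = nn.Linear(n_embd, num_experts)  # per-token noise scale

    def forward(self, x):
        logits = self.gate(x)
        noisy_logits = logits + torch.randn_like(logits) * F.softplus(self.noise(x))
        topk_vals, topk_idx = noisy_logits.topk(self.top_k, dim=-1)
        # Non-selected experts get -inf, so softmax assigns them zero weight.
        mask = torch.full_like(noisy_logits, float('-inf'))
        sparse_logits = mask.scatter(-1, topk_idx, topk_vals)
        return F.softmax(sparse_logits, dim=-1), topk_idx

class SparseMoE(nn.Module):
    """Routes each token to its top-k expert feed-forward networks."""
    def __init__(self, n_embd: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = NoisyTopkRouter(n_embd, num_experts, top_k)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                          nn.Linear(4 * n_embd, n_embd))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        weights, _ = self.router(x)  # (batch, time, num_experts), zeros outside top-k
        out = torch.zeros_like(x)
        # For clarity every expert runs on every token; a real sparse kernel
        # would dispatch only the tokens routed to each expert.
        for i, expert in enumerate(self.experts):
            out += weights[..., i:i + 1] * expert(x)
        return out
```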
Quick Start & Requirements
```bash
pip install torch
```
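As a quick smoke test after installing, a layer like the SparseMoE sketch above can be run on random input; the shapes and hyperparameters below are arbitrary and assume nothing about the repository's training script.

```python
# Smoke test: run the SparseMoE sketch above on random input to confirm
# PyTorch is installed and the routing produces the expected output shape.
import torch

torch.manual_seed(0)
x = torch.randn(4, 16, 128)   # (batch, sequence length, embedding dimension)
moe = SparseMoE(n_embd=128, num_experts=8, top_k=2)
print(moe(x).shape)           # torch.Size([4, 16, 128])
```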
Highlighted Details
The complete implementation is contained in a single file (makeMoE.py).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The implementation emphasizes readability and hackability; performance optimization is not a primary goal, so additional work may be needed before production use. The license is not specified, which could hinder commercial adoption.