Sparse mixture of experts language model from scratch
This repository provides a from-scratch implementation of a sparse mixture of experts (MoE) language model, inspired by Andrej Karpathy's makemore project. It targets researchers and developers interested in understanding and experimenting with MoE architectures for autoregressive character-level language modeling, offering a highly hackable and educational resource.
How It Works
The core change is replacing the standard feed-forward network with a sparsely gated MoE layer. A router uses top-k gating (and, optionally, noisy top-k gating) to send each input token to a small subset of "expert" feed-forward networks. This aims for greater parameter efficiency and potentially better performance by letting different experts specialize on different parts of the input. The implementation is written in PyTorch and reuses components from the makemore project.
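The routing step can be illustrated with a short PyTorch sketch. The class names, hyperparameters, and expert structure below (NoisyTopkRouter, SparseMoE, num_experts, top_k) are illustrative assumptions, not the repository's exact API; for readability the sketch runs every expert on every token and relies on zeroed gate weights rather than sparse dispatch.

```python
# Minimal sketch of noisy top-k gating and a sparse MoE layer in PyTorch.
# Names and hyperparameters are illustrative, not the repository's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopkRouter(nn.Module):
    """Scores each token against all experts, adds learned noise, keeps only the top-k."""
    def __init__(self, n_embd: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(n_embd, num_experts)   # clean routing logits
        self.noise = nn.Linear(n_embd, num_experts)  # per-token noise scale

    def forward(self, x):
        logits = self.gate(x)
        noisy_logits = logits + torch.randn_like(logits) * F.softplus(self.noise(x))
        topk_vals, topk_idx = noisy_logits.topk(self.top_k, dim=-1)
        # Non-selected experts get -inf, so softmax assigns them zero weight.
        mask = torch.full_like(noisy_logits, float('-inf'))
        sparse_logits = mask.scatter(-1, topk_idx, topk_vals)
        return F.softmax(sparse_logits, dim=-1), topk_idx

class SparseMoE(nn.Module):
    """Routes each token to its top-k expert feed-forward networks."""
    def __init__(self, n_embd: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = NoisyTopkRouter(n_embd, num_experts, top_k)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(),
                          nn.Linear(4 * n_embd, n_embd))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        weights, _ = self.router(x)  # (batch, time, num_experts), zeros outside top-k
        out = torch.zeros_like(x)
        # For clarity every expert runs on every token; a real sparse kernel
        # would dispatch only the tokens routed to each expert.
        for i, expert in enumerate(self.experts):
            out += weights[..., i:i + 1] * expert(x)
        return out
```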
Quick Start & Requirements
```bash
pip install torch
```
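As a quick smoke test after installing, a layer like the SparseMoE sketch above can be run on random input; the shapes and hyperparameters below are arbitrary and assume nothing about the repository's training script.

```python
# Smoke test: run the SparseMoE sketch above on random input to confirm
# PyTorch is installed and the routing produces the expected output shape.
import torch

torch.manual_seed(0)
x = torch.randn(4, 16, 128)   # (batch, sequence length, embedding dimension)
moe = SparseMoE(n_embd=128, num_experts=8, top_k=2)
print(moe(x).shape)           # torch.Size([4, 16, 128])
```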
Highlighted Details
The complete implementation is contained in a single file (makeMoE.py).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The implementation emphasizes readability and hackability; performance optimization is not a primary goal, so additional work may be needed before production use. The license is not specified, which could hinder commercial adoption.