makeMoE by AviSoori1x

Sparse mixture of experts language model from scratch

created 1 year ago
732 stars

Top 48.2% on sourcepulse

View on GitHub
Project Summary

This repository provides a from-scratch implementation of a sparse mixture of experts (MoE) language model, inspired by Andrej Karpathy's makemore project. It targets researchers and developers interested in understanding and experimenting with MoE architectures for autoregressive character-level language modeling, offering a highly hackable and educational resource.

How It Works

The core change is replacing the transformer's standard feed-forward block with a sparsely-gated MoE layer. Top-k gating (and a noisy top-k variant) routes each input token to a small subset of "expert" feed-forward networks, so the total parameter count can grow while per-token compute stays roughly constant and individual experts can specialize. The implementation is written in PyTorch and reuses components from the makemore project.
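
As a rough illustration of this routing (not the repository's exact code; the class names, layer sizes, and defaults below are hypothetical), a noisy top-k MoE layer can be sketched in PyTorch along these lines:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """One feed-forward 'expert' (sizes are illustrative)."""
    def __init__(self, n_embd):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        return self.net(x)

class NoisyTopkMoE(nn.Module):
    """Sparse MoE layer: noisy top-k gating routes each token to k experts."""
    def __init__(self, n_embd, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.experts = nn.ModuleList([Expert(n_embd) for _ in range(num_experts)])
        self.gate = nn.Linear(n_embd, num_experts)   # clean routing logits
        self.noise = nn.Linear(n_embd, num_experts)  # learned noise scale

    def forward(self, x):  # x: (batch, seq, n_embd)
        logits = self.gate(x)
        if self.training:  # Gaussian noise on the logits helps balance expert load
            logits = logits + torch.randn_like(logits) * F.softplus(self.noise(x))

        # Keep only the top-k logits per token; everything else is masked to -inf
        topk_vals, topk_idx = logits.topk(self.top_k, dim=-1)
        sparse_logits = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_vals)
        gates = F.softmax(sparse_logits, dim=-1)  # zero weight outside the top-k

        # Weighted sum of expert outputs (a real implementation dispatches
        # only the tokens routed to each expert instead of running all of them)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            out = out + gates[..., i].unsqueeze(-1) * expert(x)
        return out

if __name__ == "__main__":
    moe = NoisyTopkMoE(n_embd=128)
    y = moe(torch.randn(4, 32, 128))
    print(y.shape)  # torch.Size([4, 32, 128])
```

Masking the non-selected logits to -inf before the softmax is what makes the gating sparse: only the k chosen experts receive non-zero weights for a given token.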

Quick Start & Requirements

  • Install: pip install torch
  • Prerequisites: PyTorch. A Databricks environment with MLflow is optional but convenient for experiment tracking (a minimal logging sketch follows this list).
  • Resources: Developed on a single A100 GPU; can scale to larger clusters.
  • Docs: HuggingFace Blog Part 1, HuggingFace Blog Part 2
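
If MLflow is used for tracking (on Databricks or locally), logging training metrics takes only a few lines. A minimal sketch, with hypothetical run name, hyperparameters, and placeholder loss values:

```python
import mlflow

# Illustrative only: the parameters and loss values below stand in for
# whatever your actual training loop produces.
with mlflow.start_run(run_name="makeMoE-train"):
    mlflow.log_params({"num_experts": 8, "top_k": 2, "n_embd": 128})
    for step in range(0, 1000, 100):
        train_loss = 4.0 * 0.98 ** (step / 100)  # placeholder value
        mlflow.log_metric("train_loss", train_loss, step=step)
```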

Highlighted Details

  • Single-file PyTorch implementation (makeMoE.py).
  • Includes detailed walkthrough notebooks covering the architecture and expert capacity (a capacity sketch follows this list).
  • References key MoE publications: Sparsely-Gated MoE and Mixtral of Experts.
  • Focuses on readability and hackability over raw performance.
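
On expert capacity: a standard way to size it (used by many MoE implementations, though not necessarily by the notebook verbatim) is capacity = ceil(capacity_factor × tokens_per_batch ÷ num_experts). A minimal sketch:

```python
import math

def expert_capacity(num_tokens: int, num_experts: int, capacity_factor: float = 1.25) -> int:
    """Maximum number of tokens each expert may receive in one batch."""
    return math.ceil(capacity_factor * num_tokens / num_experts)

# e.g. a batch of 4 sequences of 32 tokens routed across 8 experts
print(expert_capacity(num_tokens=4 * 32, num_experts=8))  # -> 20
```

Tokens routed to an expert beyond this limit are typically dropped or passed through the residual connection, which is why the capacity factor is usually set slightly above 1.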

Maintenance & Community

  • Developed by AviSoori1x.
  • MLflow integration is optional but encouraged for metric tracking.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The implementation emphasizes readability and hackability; performance optimization is not a primary goal, so additional tuning would likely be needed for production use. The license is not specified, which could impact commercial adoption.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 37 stars in the last 90 days

Explore Similar Projects

Starred by Logan Kilpatrick (Product Lead on Google AI Studio), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

catalyst by catalyst-team

PyTorch framework for accelerated deep learning R&D (3k stars; created 7 years ago, updated 1 month ago)