GRIN-MoE by microsoft

MoE for code and math, using gradient-informed routing

Created 1 year ago
264 stars

Top 96.8% on SourcePulse

Project Summary

GRIN-MoE is a 6.6B active parameter Mixture-of-Experts (MoE) language model designed for memory/compute-constrained and latency-bound environments, excelling in coding and mathematics tasks. It targets researchers and developers building generative AI applications requiring strong reasoning capabilities.

How It Works

GRIN-MoE employs SparseMixer-v2 for gradient-informed expert routing, a departure from conventional MoE training, which uses the gating probability as a differentiable proxy for the discrete expert choice. This approach enables efficient scaling without expert parallelism or token dropping, yielding improved performance with fewer active parameters.
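
The exact SparseMixer-v2 estimator is described in the GRIN paper; the sketch below is only a minimal illustration of the contrast between the two training signals, using a generic straight-through-style stand-in rather than SparseMixer-v2 itself. The Top1Router class, its dimensions, and the gradient_informed flag are illustrative assumptions, not the GRIN-MoE implementation.

    # Minimal sketch (not the GRIN-MoE code): contrasts conventional
    # "gating as a proxy" top-1 routing with a straight-through-style
    # estimator that lets gradients reach the discrete routing decision.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Top1Router(nn.Module):
        def __init__(self, d_model: int, n_experts: int):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts, bias=False)

        def forward(self, x: torch.Tensor, gradient_informed: bool = True):
            probs = F.softmax(self.gate(x), dim=-1)                # (tokens, n_experts)
            top1 = probs.argmax(dim=-1)                            # discrete expert choice
            hard = F.one_hot(top1, probs.size(-1)).type_as(probs)  # one-hot routing mask
            if gradient_informed:
                # Forward pass uses the hard one-hot mask; backward pass routes
                # gradients through the soft probabilities, so the router is
                # trained on the routing decision rather than a proxy.
                return hard + probs - probs.detach(), top1
            # Conventional proxy: scale the chosen expert's output by its gate
            # probability; the argmax itself receives no gradient.
            return hard * probs, top1

A caller would multiply each expert's output by the corresponding column of the returned mask. Per the GRIN paper, SparseMixer-v2 replaces the naive straight-through step shown here with a more accurate gradient estimator for the routing decision.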

Quick Start & Requirements

  • Inference Demo: curl https://raw.githubusercontent.com/microsoft/GRIN-MoE/main/demo/demo.sh | bash -s (requires Docker)
  • Interactive Demo: Launch a Jupyter notebook via Docker: docker run --gpus all -p 8887:8887 --rm nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash -c 'git clone https://github.com/microsoft/GRIN-MoE.git && jupyter notebook --port 8887 --notebook-dir GRIN-MoE/demo'
  • Prerequisites: Docker, NVIDIA GPUs (for demo scripts).
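
If the checkpoint is published on Hugging Face (the identifier microsoft/GRIN-MoE is assumed here, not confirmed by this page), a typical transformers loading pattern looks like the sketch below; check the repository's model card for the exact instructions, dtype, and hardware requirements.

    # Hedged sketch: load the model with Hugging Face transformers.
    # The checkpoint id "microsoft/GRIN-MoE" and trust_remote_code=True
    # are assumptions; verify against the official model card before use.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "microsoft/GRIN-MoE"  # assumed identifier
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # MoE weights are large; use a GPU-friendly dtype
        device_map="auto",            # spread layers across available GPUs
        trust_remote_code=True,
    )

    prompt = "Write a Python function that checks whether a number is prime."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))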

Highlighted Details

  • Achieves an average score of 79.6 across popular benchmarks, outperforming Mixtral 8x7B and Llama 3 8B.
  • Demonstrates strong performance in coding (HumanEval: 74.4, MBPP: 80.3) and mathematics (GSM-8K: 90.4).
  • Trained on 4.0T tokens, including high-quality educational and synthetic data.
  • Context length is 4K tokens.

Maintenance & Community

The repository shows little recent activity: the last commit was 11 months ago, with no pull requests or issues opened in the last 30 days (see the Health Check below).

Licensing & Compatibility

  • Licensed under the MIT license, permitting commercial use and modification.

Limitations & Caveats

The model is primarily trained on English and may exhibit reduced performance on other languages or English dialects with less representation. It can perpetuate societal biases and generate inaccurate or offensive content, requiring careful evaluation and mitigation for sensitive applications. Code generation is primarily focused on Python.

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days
