GRIN-MoE by microsoft

MoE for code and math, using gradient-informed routing

created 11 months ago
264 stars

Top 97.5% on sourcepulse

View on GitHub
Project Summary

GRIN-MoE is a 6.6B active parameter Mixture-of-Experts (MoE) language model designed for memory/compute-constrained and latency-bound environments, excelling in coding and mathematics tasks. It targets researchers and developers building generative AI applications requiring strong reasoning capabilities.

How It Works

GRIN-MoE employs SparseMixer-v2 for gradient-informed expert routing: instead of treating the gating value as a proxy for the routing gradient, as conventional MoE training does, it estimates the gradient of the discrete expert-selection step directly. This lets training scale without expert parallelism or token dropping, and yields strong performance with relatively few active parameters.
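
To make the contrast concrete, here is a minimal sketch of the conventional gating-as-proxy baseline that GRIN-MoE departs from; it is not the SparseMixer-v2 estimator, and the shapes, the top-2 routing, and the helper names (conventional_moe_layer, router, experts) are illustrative assumptions rather than code from this repository.

    import torch
    import torch.nn.functional as F

    def conventional_moe_layer(x, router, experts, k=2):
        # Conventional top-k MoE forward pass (the gating-as-proxy baseline).
        # x:       [tokens, hidden] activations
        # router:  torch.nn.Linear(hidden, num_experts)
        # experts: list of per-expert feed-forward modules
        logits = router(x)                                   # [tokens, num_experts]
        gates = F.softmax(logits, dim=-1)
        topk_gates, topk_idx = gates.topk(k, dim=-1)         # discrete, non-differentiable choice
        topk_gates = topk_gates / topk_gates.sum(-1, keepdim=True)  # renormalize selected gates

        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    # The router only receives gradient through the gate value below;
                    # the top-k selection itself contributes no gradient signal.
                    # SparseMixer-v2 instead estimates a gradient for this discrete
                    # routing decision (see the GRIN paper for the actual estimator).
                    out[mask] += topk_gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

The comment marks where conventional training stops propagating gradient: only the selected gate values are differentiated, while the routing decision itself is not.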

Quick Start & Requirements

  • Inference Demo: curl https://raw.githubusercontent.com/microsoft/GRIN-MoE/main/demo/demo.sh | bash -s (requires Docker)
  • Interactive Demo: Launch a Jupyter notebook via Docker: docker run --gpus all -p 8887:8887 --rm nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash -c 'git clone https://github.com/microsoft/GRIN-MoE.git && jupyter notebook --port 8887 --notebook-dir GRIN-MoE/demo'
  • Prerequisites: Docker and NVIDIA GPUs (for the demo scripts). A minimal Python loading sketch follows this list.
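
Beyond the Docker demos, the checkpoint can also be loaded directly with Hugging Face transformers. The sketch below is an unofficial, minimal example: the model id microsoft/GRIN-MoE, the trust_remote_code requirement, and the dtype/device settings are assumptions to verify against the repository's model card.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed Hugging Face model id; check the repo/model card for the official one.
    model_id = "microsoft/GRIN-MoE"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,   # MoE weights are large; bf16 keeps memory manageable
        device_map="auto",            # spread weights across available GPUs
        trust_remote_code=True,       # custom MoE architecture ships its own modeling code
    )

    prompt = "Write a Python function that returns the nth Fibonacci number."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))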

Highlighted Details

  • Achieves an average score of 79.6 across popular benchmarks, outperforming Mixtral 8x7B and Llama3 8B.
  • Demonstrates strong performance in coding (HumanEval: 74.4, MBPP: 80.3) and mathematics (GSM-8K: 90.4).
  • Trained on 4.0T tokens, including high-quality educational and synthetic data.
  • Context length is 4K tokens.

Licensing & Compatibility

  • Licensed under the MIT license, permitting commercial use and modification.

Limitations & Caveats

The model is primarily trained on English and may exhibit reduced performance on other languages or English dialects with less representation. It can perpetuate societal biases and generate inaccurate or offensive content, requiring careful evaluation and mitigation for sensitive applications. Code generation is primarily focused on Python.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 90 days

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

Explore Similar Projects

yarn by jquesnelle

1.0%
2k
Context window extension method for LLMs (research paper, models)
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley).

DeepSeek-Coder-V2 by deepseek-ai

0.4%
6k
Open-source code language model comparable to GPT4-Turbo
created 1 year ago
updated 10 months ago