MoE for code and math, using gradient-informed routing
GRIN-MoE is a Mixture-of-Experts (MoE) language model with roughly 6.6B parameters active per token, designed for memory/compute-constrained and latency-bound environments and particularly strong at coding and mathematics tasks. It targets researchers and developers building generative AI applications that require strong reasoning.
How It Works
GRIN-MoE uses SparseMixer-v2 to estimate the gradient of the discrete expert-routing decision, a departure from conventional MoE training, which treats expert gating as a proxy for that gradient. This approach lets the model scale without expert parallelism or token dropping, delivering strong performance with fewer active parameters.
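To make the contrast concrete, the toy PyTorch sketch below shows a top-1 router whose discrete expert choice receives a straight-through-style gradient estimate, instead of the conventional "scale the expert output by the gate probability" proxy. All names are invented for illustration, and the estimator is a deliberate simplification in the spirit of gradient-informed routing, not the actual SparseMixer-v2 algorithm from the repository.

```python
# Illustrative sketch only -- not the GRIN-MoE / SparseMixer-v2 implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGradInformedRouter(nn.Module):
    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Tiny MLP "experts" just to make the example runnable.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.gate(x)                   # (tokens, n_experts)
        probs = F.softmax(logits, dim=-1)
        # Discrete top-1 choice (the real method samples experts during
        # training; argmax keeps this toy example deterministic).
        idx = probs.argmax(dim=-1)

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                p = probs[mask, e].unsqueeze(-1)
                # Straight-through-style multiplier: its forward value is 1.0,
                # but the backward pass sends a gradient through p to the
                # router logits, standing in for the gradient of the discrete
                # routing decision. SparseMixer-v2's actual estimator is more
                # involved; this only illustrates the "gradient-informed" idea.
                st = 1.0 + p - p.detach()
                out[mask] = st * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
router = ToyGradInformedRouter(d_model=64, n_experts=16)
print(router(tokens).shape)   # torch.Size([8, 64])
```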
Quick Start & Requirements
Run the demo script (requires Docker):

curl https://raw.githubusercontent.com/microsoft/GRIN-MoE/main/demo/demo.sh | bash -s

Or launch the demo notebook manually:

docker run --gpus all -p 8887:8887 --rm nvcr.io/nvidia/pytorch:24.08-py3 /bin/bash -c 'git clone https://github.com/microsoft/GRIN-MoE.git && jupyter notebook --port 8887 --notebook-dir GRIN-MoE/demo'
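For use outside the demo container, the model can also be loaded through Hugging Face transformers. The sketch below is hedged: the model id microsoft/GRIN-MoE and the need for trust_remote_code are assumptions based on the public model card, so verify both before running.

```python
# Hedged sketch: loading GRIN-MoE via Hugging Face transformers.
# The model id and trust_remote_code requirement are assumptions; check the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/GRIN-MoE"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~6.6B params are active per token,
    device_map="auto",            # but all expert weights must fit in memory
    trust_remote_code=True,
)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```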
Highlighted Details
Maintenance & Community
Last commit roughly 10 months ago; the repository is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats
The model is primarily trained on English and may exhibit reduced performance on other languages or English dialects with less representation. It can perpetuate societal biases and generate inaccurate or offensive content, requiring careful evaluation and mitigation for sensitive applications. Code generation is primarily focused on Python.