scattermoe by shawntan

Triton-based Sparse Mixture-of-Experts for efficient deep learning

Created 1 year ago
253 stars

Top 99.3% on SourcePulse

Project Summary

ScatterMoE offers a Triton-based implementation of Sparse Mixture-of-Experts (SMoE) optimized for GPU performance. It targets researchers and engineers who want to speed up inference and training and reduce the memory footprint of deep learning models. The project provides a lightweight, efficient SMoE solution by addressing limitations in existing implementations.

How It Works

This implementation leverages Triton for high-performance GPU kernels, focusing on an efficient SMoE approach. It avoids common performance bottlenecks such as input padding and excessive data copying. Key innovations include fusing expert linear transforms and reordering operations into a ParallelLinear module, streamlining computation. This design results in a remarkably lightweight codebase, estimated at approximately 700 lines.
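To make the idea concrete, the plain-PyTorch sketch below shows the grouping pattern that ScatterMoE's kernels fuse: tokens are reordered so each expert sees a contiguous slice, each expert's linear transform is applied to its slice, and the results are scattered back without per-expert padding. This is an illustration of the concept only, not the project's Triton code; all function and variable names here are hypothetical.

```python
# Conceptual sketch (plain PyTorch, not the actual Triton kernels): group tokens by
# their assigned expert and apply each expert's weight to a contiguous slice, instead
# of padding every expert's batch to the same size. ScatterMoE's ParallelLinear fuses
# this reordering and the linear transform into one kernel; this only illustrates the idea.
import torch

def grouped_expert_linear(x, expert_idx, expert_weights):
    # x: (num_tokens, d_in) token representations
    # expert_idx: (num_tokens,) expert assignment per token (top-1 for simplicity)
    # expert_weights: (num_experts, d_in, d_out) one weight matrix per expert
    order = torch.argsort(expert_idx)          # reorder tokens so experts are contiguous
    counts = torch.bincount(expert_idx, minlength=expert_weights.shape[0])
    y_sorted = torch.empty(x.shape[0], expert_weights.shape[-1],
                           device=x.device, dtype=x.dtype)
    start = 0
    for e, n in enumerate(counts.tolist()):
        if n == 0:
            continue
        token_ids = order[start:start + n]      # tokens routed to expert e
        y_sorted[start:start + n] = x[token_ids] @ expert_weights[e]
        start += n
    # scatter results back to the original token order
    y = torch.empty_like(y_sorted)
    y[order] = y_sorted
    return y

# Example: 8 tokens, 4 experts, top-1 routing
x = torch.randn(8, 16)
idx = torch.randint(0, 4, (8,))
W = torch.randn(4, 16, 32)
print(grouped_expert_linear(x, idx, W).shape)   # torch.Size([8, 32])
```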

Quick Start & Requirements

Installation involves cloning the repository, changing into its directory, and running pip install -e . (an editable install). Basic functionality can be verified with PYTHONPATH=. pytest tests. A GPU environment is required because the kernels are written in Triton. The project integrates with Hugging Face Transformers models, with specific instructions provided for injecting MoE functionality into transformers.models.gpt_oss and transformers.models.granitemoehybrid.
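For orientation, a minimal post-install usage might look like the sketch below, where a router's top-k outputs drive the MoE layer. The import path scattermoe.mlp.MLP and its constructor/call signature are assumptions not confirmed by the summary above; consult the repository README for the exact interface. A CUDA-capable GPU is required because of the Triton backend.

```python
# Hypothetical usage sketch. The import path scattermoe.mlp.MLP and its argument
# names are assumptions; check the repository README for the actual interface.
import torch
from torch import nn
from scattermoe.mlp import MLP  # assumed module path

d_model, d_hidden, num_experts, top_k = 512, 1024, 8, 2

router = nn.Linear(d_model, num_experts).cuda()
moe = MLP(
    input_size=d_model, hidden_size=d_hidden,
    activation=nn.GELU(),
    num_experts=num_experts, top_k=top_k,
).cuda()

x = torch.randn(4, 128, d_model, device="cuda")
logits = router(x)
weights = torch.softmax(logits.float(), dim=-1).type_as(x)
k_weights, k_idxs = torch.topk(weights, top_k)   # per-token expert weights and indices
y = moe(x, k_weights, k_idxs)                    # same shape as x
```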

Highlighted Details

  • Performance gains: Improved inference/training speed and reduced memory footprint via elimination of padding and excessive copies.
  • Core mechanism: Fused expert linear transforms and reordering operations using the ParallelLinear module.
  • Integration: Seamless integration with Hugging Face Transformers models.
  • Codebase: Lightweight implementation (~700 lines of Triton/Python).

Maintenance & Community

The provided README does not detail specific community channels (e.g., Discord, Slack), active contributors beyond the authors of the cited paper, or a public roadmap.

Licensing & Compatibility

The license type is not specified in the provided README. Suitability for commercial use or linking with closed-source projects therefore cannot be determined from this information alone.

Limitations & Caveats

ScatterMoE is designed to slot into existing distributed training setups such as FSDP or pipeline parallelism, but it does not ship its own multi-node training infrastructure code.
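As an illustration of that point, the sketch below wraps a stand-in model with PyTorch's FSDP. Nothing in it is ScatterMoE-specific; the assumption is only that a model containing ScatterMoE layers behaves like an ordinary nn.Module and can be sharded by standard wrappers. It assumes a torchrun launch on a single multi-GPU node.

```python
# Generic sketch: wrapping a model that would contain ScatterMoE layers with PyTorch FSDP.
# The nn.Sequential below is a placeholder for a transformer with MoE FFN blocks.
# Launch with torchrun so the process group environment variables are set.
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

model = nn.Sequential(              # stand-in for an MoE transformer
    nn.Linear(512, 512),
    nn.GELU(),
    nn.Linear(512, 512),
).cuda()

model = FSDP(model)                 # shard parameters and gradients across ranks
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 512, device="cuda")
loss = model(x).pow(2).mean()       # dummy objective for the sketch
loss.backward()
opt.step()
```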

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309 stars
Framework for large-scale transformer optimization
Created 4 years ago
Updated 3 years ago
Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
791 stars
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

2.4%
1k stars
Framework for scaling multimodal model training across accelerators
Created 8 months ago
Updated 1 day ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1%
6k stars
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago