scattermoe by shawntan

Triton-based Sparse Mixture-of-Experts for efficient deep learning

Created 1 year ago
253 stars

Top 99.3% on SourcePulse

Project Summary

ScatterMoE offers a Triton-based implementation of Sparse Mixture-of-Experts (SMoE) optimized for GPU performance. It targets researchers and engineers who want to speed up inference and training and reduce the memory footprint of deep learning models. The project provides a lightweight, efficient SMoE solution by addressing limitations in existing implementations.

How It Works

This implementation leverages Triton for high-performance GPU kernels, focusing on an efficient SMoE approach. It avoids common performance bottlenecks such as input padding and excessive data copying. Key innovations include fusing expert linear transforms and reordering operations into a ParallelLinear module, streamlining computation. This design results in a remarkably lightweight codebase, estimated at approximately 700 lines.
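To make the idea concrete, the plain-PyTorch sketch below shows the grouping pattern that ScatterMoE's kernels fuse: tokens are reordered so each expert sees a contiguous slice, each expert's linear transform is applied to its slice, and the results are scattered back without per-expert padding. This is an illustration of the concept only, not the project's Triton code; all function and variable names here are hypothetical.

```python
# Conceptual sketch (plain PyTorch, not the actual Triton kernels): group tokens by
# their assigned expert and apply each expert's weight to a contiguous slice, instead
# of padding every expert's batch to the same size. ScatterMoE's ParallelLinear fuses
# this reordering and the linear transform into one kernel; this only illustrates the idea.
import torch

def grouped_expert_linear(x, expert_idx, expert_weights):
    # x: (num_tokens, d_in) token representations
    # expert_idx: (num_tokens,) expert assignment per token (top-1 for simplicity)
    # expert_weights: (num_experts, d_in, d_out) one weight matrix per expert
    order = torch.argsort(expert_idx)          # reorder tokens so experts are contiguous
    counts = torch.bincount(expert_idx, minlength=expert_weights.shape[0])
    y_sorted = torch.empty(x.shape[0], expert_weights.shape[-1],
                           device=x.device, dtype=x.dtype)
    start = 0
    for e, n in enumerate(counts.tolist()):
        if n == 0:
            continue
        token_ids = order[start:start + n]      # tokens routed to expert e
        y_sorted[start:start + n] = x[token_ids] @ expert_weights[e]
        start += n
    # scatter results back to the original token order
    y = torch.empty_like(y_sorted)
    y[order] = y_sorted
    return y

# Example: 8 tokens, 4 experts, top-1 routing
x = torch.randn(8, 16)
idx = torch.randint(0, 4, (8,))
W = torch.randn(4, 16, 32)
print(grouped_expert_linear(x, idx, W).shape)   # torch.Size([8, 32])
```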

Quick Start & Requirements

Installation involves cloning the repository, changing into its directory, and running pip install -e . (an editable install). Basic functionality can be verified with PYTHONPATH=. pytest tests. A GPU environment is required because the kernels are written in Triton. The project integrates with Hugging Face Transformers models, with specific instructions provided for injecting MoE functionality into transformers.models.gpt_oss and transformers.models.granitemoehybrid.
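For orientation, a minimal post-install usage might look like the sketch below, where a router's top-k outputs drive the MoE layer. The import path scattermoe.mlp.MLP and its constructor/call signature are assumptions not confirmed by the summary above; consult the repository README for the exact interface. A CUDA-capable GPU is required because of the Triton backend.

```python
# Hypothetical usage sketch. The import path scattermoe.mlp.MLP and its argument
# names are assumptions; check the repository README for the actual interface.
import torch
from torch import nn
from scattermoe.mlp import MLP  # assumed module path

d_model, d_hidden, num_experts, top_k = 512, 1024, 8, 2

router = nn.Linear(d_model, num_experts).cuda()
moe = MLP(
    input_size=d_model, hidden_size=d_hidden,
    activation=nn.GELU(),
    num_experts=num_experts, top_k=top_k,
).cuda()

x = torch.randn(4, 128, d_model, device="cuda")
logits = router(x)
weights = torch.softmax(logits.float(), dim=-1).type_as(x)
k_weights, k_idxs = torch.topk(weights, top_k)   # per-token expert weights and indices
y = moe(x, k_weights, k_idxs)                    # same shape as x
```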

Highlighted Details

  • Performance gains: Improved inference/training speed and reduced memory footprint via elimination of padding and excessive copies.
  • Core mechanism: Fused expert linear transforms and reordering operations using the ParallelLinear module.
  • Integration: Seamless integration with Hugging Face Transformers models.
  • Codebase: Lightweight implementation (~700 lines of Triton/Python).

Maintenance & Community

The provided README does not detail specific community channels (e.g., Discord, Slack), active contributors beyond the authors of the cited paper, or a public roadmap.

Licensing & Compatibility

The license type is not specified in the provided README. Suitability for commercial use or linking with closed-source projects therefore cannot be determined from this information alone.

Limitations & Caveats

ScatterMoE is designed to slot into existing distributed training setups such as FSDP or pipeline parallelism, but it does not ship its own multi-node training infrastructure code.
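As an illustration of that point, the sketch below wraps a stand-in model with PyTorch's FSDP. Nothing in it is ScatterMoE-specific; the assumption is only that a model containing ScatterMoE layers behaves like an ordinary nn.Module and can be sharded by standard wrappers. It assumes a torchrun launch on a single multi-GPU node.

```python
# Generic sketch: wrapping a model that would contain ScatterMoE layers with PyTorch FSDP.
# The nn.Sequential below is a placeholder for a transformer with MoE FFN blocks.
# Launch with torchrun so the process group environment variables are set.
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank())

model = nn.Sequential(              # stand-in for an MoE transformer
    nn.Linear(512, 512),
    nn.GELU(),
    nn.Linear(512, 512),
).cuda()

model = FSDP(model)                 # shard parameters and gradients across ranks
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 512, device="cuda")
loss = model(x).pow(2).mean()       # dummy objective for the sketch
loss.backward()
opt.step()
```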

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

0%
309 stars
Framework for large-scale transformer optimization
Created 4 years ago
Updated 3 years ago
Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

0%
791 stars
Toolkit for easy model parallelization
Created 4 years ago
Updated 2 years ago
Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

2.4%
1k stars
Framework for scaling multimodal model training across accelerators
Created 8 months ago
Updated 1 day ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1%
6k stars
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago