awesome-moe-inference by MoE-Inf

Curated MoE LLM inference optimizations

Created 9 months ago
265 stars

Top 96.6% on SourcePulse

Project Summary

This repository is a curated collection of research papers focused on optimizing the inference of Mixture-of-Experts (MoE) Large Language Models (LLMs). It serves as a valuable resource for researchers and engineers working on efficient deployment and scaling of MoE models, providing a structured overview of advancements in algorithms, system-level optimizations, and hardware acceleration.

How It Works

The collection categorizes papers across various optimization techniques, including MoE module design, model compression (pruning, quantization, distillation), expert skip/adaptive gating, expert merging, sparse-to-dense transformations, and system-level optimizations like expert parallelism, offloading, and scheduling. This categorization allows users to quickly identify relevant research for specific inference challenges.
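
All of these techniques target the same basic structure: a sparsely activated MoE layer whose learned gate routes each token to a small number of experts. As a point of reference, here is a minimal, illustrative sketch of a top-k gated MoE feed-forward layer in PyTorch; the class name, dimensions, and routing scheme are generic assumptions for illustration, not taken from any particular paper in the list.

```python
# Minimal sketch of a top-k gated MoE feed-forward layer (PyTorch).
# Illustrates the "MoE module" and adaptive gating that the catalogued
# papers optimize; all names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # experts with no routed tokens do no work
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```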

Quick Start & Requirements

This repository is a collection of papers and has no installation or execution requirements of its own. Users can browse the categorized list; entries often link to arXiv preprints, code repositories, and conference publications.

Highlighted Details

  • Comprehensive coverage of MoE inference optimization techniques, from algorithmic improvements to system and hardware co-design.
  • Includes a detailed table comparing open-source MoE LLMs by their architectural parameters; the sketch after this list shows how such parameters translate into inference cost.
  • Provides links to numerous research papers, many with associated code implementations, facilitating practical application and further research.
  • Covers recent advancements up to early 2025, reflecting the rapid pace of development in MoE research.
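
To illustrate why those architectural parameters matter for inference, the following back-of-the-envelope estimate compares total parameters with the parameters activated per token for a hypothetical MoE transformer. The moe_param_counts helper and the configuration numbers are made up for illustration; the estimate ignores norms, biases, gating weights, and gated-FFN variants.

```python
# Rough, illustrative estimate of total vs. activated parameters for a
# hypothetical MoE transformer. All numbers are made up for illustration.
def moe_param_counts(n_layers, d_model, d_ff, num_experts, top_k, vocab=32000):
    attn = 4 * d_model * d_model        # Q, K, V, O projections per layer
    expert = 2 * d_model * d_ff         # up + down projection per expert
    embed = vocab * d_model
    total = n_layers * (attn + num_experts * expert) + embed
    active = n_layers * (attn + top_k * expert) + embed  # params touched per token
    return total, active

total, active = moe_param_counts(n_layers=32, d_model=4096, d_ff=14336,
                                 num_experts=8, top_k=2)
print(f"total ~ {total/1e9:.1f}B, activated per token ~ {active/1e9:.1f}B")
```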

Maintenance & Community

The repository is curated by Jiacheng Liu and colleagues, with a citation provided for their survey paper on MoE inference optimization. Further community engagement details (e.g., Discord, Slack) are not specified in the README.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. The linked papers and code repositories will have their own respective licenses, which users should consult.

Limitations & Caveats

As a curated list, the repository's content is limited to the papers the curators have included. It does not provide direct tools or implementations but rather pointers to existing research. Users need to evaluate the quality and applicability of individual papers and their associated code.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

