awesome-moe-inference by MoE-Inf

Curated MoE LLM inference optimizations

Created 9 months ago
265 stars

Top 96.6% on SourcePulse

Project Summary

This repository is a curated collection of research papers focused on optimizing the inference of Mixture-of-Experts (MoE) Large Language Models (LLMs). It serves as a valuable resource for researchers and engineers working on efficient deployment and scaling of MoE models, providing a structured overview of advancements in algorithms, system-level optimizations, and hardware acceleration.

How It Works

The collection categorizes papers across various optimization techniques, including MoE module design, model compression (pruning, quantization, distillation), expert skip/adaptive gating, expert merging, sparse-to-dense transformations, and system-level optimizations like expert parallelism, offloading, and scheduling. This categorization allows users to quickly identify relevant research for specific inference challenges.
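
All of these techniques target the same basic structure: a sparsely activated MoE layer whose learned gate routes each token to a small number of experts. As a point of reference, here is a minimal, illustrative sketch of a top-k gated MoE feed-forward layer in PyTorch; the class name, dimensions, and routing scheme are generic assumptions for illustration, not taken from any particular paper in the list.

```python
# Minimal sketch of a top-k gated MoE feed-forward layer (PyTorch).
# Illustrates the "MoE module" and adaptive gating that the catalogued
# papers optimize; all names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue  # experts with no routed tokens do no work
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```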

Quick Start & Requirements

This repository is a collection of papers and has no installation or execution requirements of its own. Users can browse the categorized list; entries often link to arXiv preprints, code repositories, and conference publications.

Highlighted Details

  • Comprehensive coverage of MoE inference optimization techniques, from algorithmic improvements to system and hardware co-design.
  • Includes a detailed table comparing open-source MoE LLMs by their architectural parameters; the sketch after this list shows how such parameters translate into inference cost.
  • Provides links to numerous research papers, many with associated code implementations, facilitating practical application and further research.
  • Covers recent advancements up to early 2025, reflecting the rapid pace of development in MoE research.
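
To illustrate why those architectural parameters matter for inference, the following back-of-the-envelope estimate compares total parameters with the parameters activated per token for a hypothetical MoE transformer. The moe_param_counts helper and the configuration numbers are made up for illustration; the estimate ignores norms, biases, gating weights, and gated-FFN variants.

```python
# Rough, illustrative estimate of total vs. activated parameters for a
# hypothetical MoE transformer. All numbers are made up for illustration.
def moe_param_counts(n_layers, d_model, d_ff, num_experts, top_k, vocab=32000):
    attn = 4 * d_model * d_model        # Q, K, V, O projections per layer
    expert = 2 * d_model * d_ff         # up + down projection per expert
    embed = vocab * d_model
    total = n_layers * (attn + num_experts * expert) + embed
    active = n_layers * (attn + top_k * expert) + embed  # params touched per token
    return total, active

total, active = moe_param_counts(n_layers=32, d_model=4096, d_ff=14336,
                                 num_experts=8, top_k=2)
print(f"total ~ {total/1e9:.1f}B, activated per token ~ {active/1e9:.1f}B")
```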

Maintenance & Community

The repository is curated by Jiacheng Liu and colleagues, with a citation provided for their survey paper on MoE inference optimization. Further community engagement details (e.g., Discord, Slack) are not specified in the README.

Licensing & Compatibility

The repository itself is a collection of links and does not have a specific license. The linked papers and code repositories will have their own respective licenses, which users should consult.

Limitations & Caveats

As a curated list, the repository's content is limited to the papers the curators have included. It does not provide direct tools or implementations but rather pointers to existing research. Users need to evaluate the quality and applicability of individual papers and their associated code.

Health Check

  • Last Commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 31 stars in the last 30 days

