Curated MoE LLM inference optimizations
Top 96.6% on SourcePulse
This repository is a curated collection of research papers focused on optimizing the inference of Mixture-of-Experts (MoE) Large Language Models (LLMs). It serves as a valuable resource for researchers and engineers working on efficient deployment and scaling of MoE models, providing a structured overview of advancements in algorithms, system-level optimizations, and hardware acceleration.
How It Works
The collection categorizes papers by optimization technique, including MoE module design, model compression (pruning, quantization, distillation), expert skipping and adaptive gating, expert merging, sparse-to-dense transformations, and system-level optimizations such as expert parallelism, offloading, and scheduling. This categorization lets users quickly locate research relevant to a specific inference challenge.
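To make the routing techniques above concrete, the following is a minimal sketch of top-k expert gating, the mechanism that adaptive-gating and expert-skipping papers optimize. It is an illustrative toy, not code from any listed paper; the function and variable names (`moe_forward`, `router_w`, `experts`) are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_forward(x, router_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:        (tokens, d) input activations
    router_w: (d, n_experts) router weights
    experts:  list of callables, each mapping a (d,) vector to a (d,) vector
    """
    logits = x @ router_w                      # (tokens, n_experts)
    probs = softmax(logits)
    topk = np.argsort(probs, axis=-1)[:, -k:]  # top-k expert indices per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        gates = probs[t, topk[t]]
        gates = gates / gates.sum()            # renormalize selected gates to sum to 1
        for g, e_idx in zip(gates, topk[t]):
            out[t] += g * experts[e_idx](x[t])
    return out

# Toy demo: 4 tokens, hidden size 8, 4 scaling "experts"
rng = np.random.default_rng(0)
d, n_experts = 8, 4
x = rng.normal(size=(4, d))
router_w = rng.normal(size=(d, n_experts))
experts = [lambda v, s=s: v * s for s in (0.5, 1.0, 1.5, 2.0)]
y = moe_forward(x, router_w, experts)
```

Because only `k` of the `n_experts` experts run per token, compute scales with `k` rather than the full expert count; the surveyed system-level work (expert parallelism, offloading, scheduling) deals with the memory and communication costs this sparsity introduces.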
Quick Start & Requirements
This repository is a collection of papers and has no installation or execution requirements. Users can browse the categorized list; entries typically link to preprints (arXiv), code repositories, and conference publications.
Maintenance & Community
The repository is curated by Jiacheng Liu and colleagues, with a citation provided for their survey paper on MoE inference optimization. Further community engagement details (e.g., Discord, Slack) are not specified in the README.
Licensing & Compatibility
The repository itself is a collection of links and does not declare a license. The linked papers and code repositories carry their own licenses, which users should consult before reuse.
Limitations & Caveats
As a curated list, the repository's content is limited to the papers the curators have included. It does not provide direct tools or implementations but rather pointers to existing research. Users need to evaluate the quality and applicability of individual papers and their associated code.