LLM-inference-optimization-paper by chenhongyu2048

Navigating the landscape of LLM inference optimization

Created 2 years ago

262 stars

Top 96.9% on SourcePulse

Project Summary

This repository is a curated knowledge base for optimizing Large Language Model (LLM) inference. It consolidates key papers, repositories, researchers, and labs so that readers can navigate a rapidly evolving research landscape. It gives engineers, researchers, and power users a centralized, up-to-date resource for understanding and adopting LLM inference optimization techniques.

How It Works

The project functions as a comprehensive, manually curated list organized into distinct sections: Repositories, Key Individuals/Labs, and specific Works. Works are further categorized by research interest, including surveys, evaluations, benchmarks, technical reports, and specific optimization areas like parallel decoding, quantization, batch processing, Mixture-of-Experts (MoE), and multimodal models. This structured approach aims to provide a navigable overview of the LLM inference optimization domain, highlighting seminal contributions and emerging trends.

Quick Start & Requirements

This repository is a curated list of research papers and resources, not a runnable software project. Therefore, there are no installation or execution requirements.

Highlighted Details

Extensive categorization covering numerous LLM inference optimization sub-fields, such as Survey/Evaluations/Benchmarks, Parallel Decoding, Quantization, Batch Processing, MoE, Multimodal, Diffusion Models, and more.
Identification of influential researchers and leading academic labs actively contributing to LLM inference optimization.
Emphasis on keeping the list updated, reflecting the dynamic nature of the field.
Inclusion of links to specific papers, GitHub repositories, and benchmark results for direct access and further investigation.

Maintenance & Community

The repository appears to be actively maintained by the author, chenhongyu2048, with an explicit goal of keeping the paper list updated. The author invites community contributions and feedback through GitHub issues, fostering a collaborative environment for knowledge sharing.

Licensing & Compatibility

No open-source license is specified in the provided README content. The lack of a license leaves usage rights and compatibility unclear for potential users and contributors.

Limitations & Caveats

The author acknowledges the inherent subjectivity and potential incompleteness of the curation, noting that "shortness of my knowledge" may lead to omissions of important people or works. Some sections are marked with "💡", indicating areas that may need further refinement or are not yet comprehensive. The value and accuracy of the information depend on the curator's ongoing efforts and judgment.
