Curated list of LLM/VLM inference research papers with code
This repository is a curated list of research papers and code related to Large Language Model (LLM) and Vision-Language Model (VLM) inference. It serves as a comprehensive resource for researchers and engineers looking to optimize LLM/VLM inference performance, covering topics from attention mechanisms and quantization to parallelism and KV cache management.
How It Works
The project organizes papers by key LLM inference topics, providing links to their respective research papers and associated code repositories. It categorizes advancements in areas like FlashAttention, PagedAttention, quantization techniques (WINT8/4, FP8), parallelism strategies (Tensor Parallelism, Sequence Parallelism), KV cache optimization, and efficient decoding methods. This structured approach allows users to quickly find and explore state-of-the-art solutions for specific inference challenges.
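To give a concrete sense of one of the listed topics, below is a minimal sketch of per-channel symmetric INT8 weight quantization, the basic idea behind weight-only "WINT8" schemes. This example is illustrative only and is not taken from the repository or any linked paper; the function names and the numpy-based implementation are assumptions for demonstration.

```python
# Illustrative sketch (not from the repository): per-channel symmetric INT8
# weight quantization, the core idea behind weight-only "WINT8" schemes.
import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Quantize a (out_features, in_features) weight matrix per output channel."""
    # One scale per output channel, chosen so the largest magnitude maps to 127.
    # A small floor avoids division by zero for all-zero rows.
    scales = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_weights_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix for use at inference time."""
    return q.astype(np.float32) * scales

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_weights_int8(w)
print("max abs error:", np.abs(dequantize_weights_int8(q, s) - w).max())
```

Production weight-only quantization kernels (e.g., those discussed in the linked papers) fuse the dequantization into the matrix-multiply kernel rather than materializing float weights as done here.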
Quick Start & Requirements
This repository is a collection of links and requires no installation or execution. Users can directly access the linked papers and code repositories for their specific needs.
Highlighted Details
Maintenance & Community
The repository is maintained by xlite-dev and contributors. It welcomes contributions via pull requests.
Licensing & Compatibility
The repository itself is licensed under the GNU General Public License v3.0. Individual linked papers and code repositories will have their own licenses, which users must adhere to.
Limitations & Caveats
As a curated list, the repository does not provide direct implementations or benchmarks. Users are responsible for evaluating the applicability and performance of the linked resources in their specific use cases. The "Recom" (recommendation) column uses a star rating, which is subjective.