LLM-inference-optimization-paper  by chenhongyu2048

Navigating the landscape of LLM inference optimization

Created 2 years ago
255 stars

Top 98.8% on SourcePulse

Project Summary

This repository serves as a curated knowledge base for optimizing Large Language Model (LLM) inference. It addresses the challenge of navigating the rapidly evolving landscape of LLM inference research by consolidating key papers, repositories, researchers, and labs. The primary benefit is providing engineers, researchers, and power users with a centralized, up-to-date resource to accelerate their understanding and adoption of LLM inference optimization techniques.

How It Works

The project functions as a comprehensive, manually curated list organized into distinct sections: Repositories, Key Individuals/Labs, and specific Works. Works are further categorized by research interest, including surveys, evaluations, benchmarks, technical reports, and specific optimization areas like parallel decoding, quantization, batch processing, Mixture-of-Experts (MoE), and multimodal models. This structured approach aims to provide a navigable overview of the LLM inference optimization domain, highlighting seminal contributions and emerging trends.
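The organization described above can be pictured as a small data model. The sketch below is purely illustrative: the repository is a hand-edited list, not code, and every name here (`Work`, `CuratedList`, `by_category`, the placeholder titles) is a hypothetical chosen for this example rather than anything taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical model: the real repository is a curated document, not a program.
@dataclass
class Work:
    title: str
    category: str  # e.g. "Quantization" or "Parallel Decoding"
    url: str = ""  # intentionally blank; no real links are reproduced here


@dataclass
class CuratedList:
    repositories: list[str] = field(default_factory=list)
    people_and_labs: list[str] = field(default_factory=list)
    works: list[Work] = field(default_factory=list)

    def by_category(self) -> dict[str, list[Work]]:
        """Group works by research interest, preserving insertion order."""
        grouped: dict[str, list[Work]] = {}
        for work in self.works:
            grouped.setdefault(work.category, []).append(work)
        return grouped


# Placeholder entries only; these are not actual papers from the list.
catalog = CuratedList(
    works=[
        Work("placeholder paper A", "Quantization"),
        Work("placeholder paper B", "Parallel Decoding"),
        Work("placeholder paper C", "Quantization"),
    ]
)
grouped = catalog.by_category()
```

Grouping on a single `category` field mirrors the one-level categorization the summary describes; a work that belongs under several research interests would need a list of categories instead.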

Quick Start & Requirements

This repository is a curated list of research papers and resources, not a runnable software project. Therefore, there are no installation or execution requirements.

Highlighted Details

  • Extensive categorization covering numerous LLM inference optimization sub-fields, such as Survey/Evaluations/Benchmarks, Parallel Decoding, Quantization, Batch Processing, MoE, Multimodal, Diffusion Models, and more.
  • Identification of influential researchers and leading academic labs actively contributing to LLM inference optimization.
  • Emphasis on keeping the list updated, reflecting the dynamic nature of the field.
  • Inclusion of links to specific papers, GitHub repositories, and benchmark results for direct access and further investigation.

Maintenance & Community

The repository appears to be actively maintained by the author, chenhongyu2048, with an explicit goal of keeping the paper list updated. The author invites community contributions and feedback through GitHub issues, fostering a collaborative environment for knowledge sharing.

Licensing & Compatibility

The README does not specify an open-source license. Without one, usage rights and compatibility with other projects are unclear, which is a significant caveat for potential users and contributors.

Limitations & Caveats

The author acknowledges the inherent subjectivity and potential incompleteness of the curation, noting that "shortness of my knowledge" may lead to omissions of important people or works. Some sections are marked with "💡" indicating areas that may require further refinement or are not yet fully comprehensive. The value and accuracy of the information are dependent on the curator's ongoing efforts and judgment.

Health Check
Last Commit: 3 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 11 stars in the last 30 days

Explore Similar Projects

Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

mistral.rs by EricLBuehler

0.4%
7k
LLM inference engine for blazing fast performance
Created 2 years ago
Updated 1 day ago