SpeculativeDecodingPapers  by hemingkx

Paper list for speculative decoding literature

created 1 year ago
851 stars

Top 42.9% on sourcepulse

Project Summary

This repository serves as a curated, regularly updated collection of research papers, blogs, and code related to Speculative Decoding, a technique for accelerating Large Language Model (LLM) inference. It targets researchers, engineers, and practitioners seeking to understand and implement efficient LLM generation.

How It Works

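Speculative Decoding accelerates LLM inference by pairing the large target LLM with a cheaper drafting mechanism, typically a smaller, faster "draft" model, that proposes several future tokens ahead of time. The target LLM then verifies all proposed tokens in a single parallel forward pass. The longest run of draft tokens that passes verification is accepted, so one target pass can yield several tokens. At the first rejected token, the remaining draft tokens are discarded and the target model's own prediction is used instead. Because every emitted token is checked by the target model, the standard formulation gains speed while keeping output quality equal to that of the target model decoding alone.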
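The draft-then-verify loop described above can be illustrated with a minimal Python sketch. It is not taken from the repository or from any specific paper in the list. It assumes greedy decoding, and `target_model` and `draft_model` are hypothetical toy functions standing in for real networks. A real system would run the verification step as one batched forward pass of the target model, where this sketch calls the toy function once per position.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical "large" model: a toy deterministic rule over a 10-token vocabulary.
    return (sum(prefix) * 3 + len(prefix)) % 10


def draft_model(prefix: List[int]) -> int:
    # Hypothetical "small" model: agrees with the target except when the
    # prefix length is a multiple of 4, where it deliberately guesses wrong.
    guess = target_model(prefix)
    return (guess + 1) % 10 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], num_tokens: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(num_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       num_tokens: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    goal = len(prompt) + num_tokens
    while len(tokens) < goal:
        # 1. Draft: the cheap model proposes k tokens one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. Verify: the target's own choice at each of the k + 1 positions.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]

        # 3. Accept the longest matching run, then append the target's token
        #    at the first mismatch (or a bonus token if all k were accepted).
        n_accept = 0
        while n_accept < k and proposal[n_accept] == verified[n_accept]:
            n_accept += 1
        tokens.extend(proposal[:n_accept])
        tokens.append(verified[n_accept])
    return tokens[:goal]


if __name__ == "__main__":
    prompt = [1, 2, 3]
    print(speculative_decode(target_model, draft_model, prompt, num_tokens=12))
    print(greedy_decode(target_model, prompt, num_tokens=12))
```

Under these assumptions both calls print the same sequence: the draft only affects how many tokens each verification round yields, not which tokens are produced. Sampling-based variants replace the exact-match check in step 3 with a probabilistic acceptance test.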

Quick Start & Requirements

This repository is a curated list of papers and does not have a direct installation or execution command. It provides links to research papers (PDFs), associated code repositories, and blog posts for further exploration.

Highlighted Details

  • Comprehensive coverage of speculative decoding techniques, categorized by application (Seq2Seq, LLMs, Multimodal, Long-Context) and methodology.
  • Includes links to over 150 research papers, many with accompanying code and supplementary materials like slides or videos.
  • Features a dedicated survey paper by the repository maintainers, offering a structured overview of the field.
  • Categorizes papers by keywords, conference, and application areas for easy navigation.

Maintenance & Community

The repository is maintained by Heming Xia and actively encourages community contributions to include new and relevant works. The primary citation is for the survey paper "Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding."

Licensing & Compatibility

The repository itself does not impose a license on the curated content. Individual papers and code repositories linked within will have their own respective licenses.

Limitations & Caveats

This repository is a collection of links, not an executable framework. Users must access and evaluate the linked papers and code individually for their specific use cases. Given the rapid pace of research, the list may lag behind the very latest publications.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 154 stars in the last 90 days
