Paper list for speculative decoding literature
This repository serves as a curated, regularly updated collection of research papers, blogs, and code related to Speculative Decoding, a technique for accelerating Large Language Model (LLM) inference. It targets researchers, engineers, and practitioners seeking to understand and implement efficient LLM generation.
How It Works
Speculative Decoding accelerates LLM inference by using a smaller, faster "draft" model to propose several future tokens, which the larger target LLM then verifies in a single parallel forward pass. Each accepted draft token is a token the target model did not have to generate one step at a time, which is where the speedup comes from; at the first rejection, the target model's own token is used instead, so the final output remains faithful to the target model. This approach trades a small amount of draft-model computation for fewer expensive target-model decoding steps.
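The draft-then-verify loop described above can be sketched in a few lines. This is a simplified greedy illustration, not code from any linked repository: the "models" are toy deterministic functions standing in for a small draft LM and a large target LM, and real systems score all drafted positions with one batched forward pass of the target model rather than a loop.

```python
def draft_next(tokens):
    # Toy draft model: usually agrees with the target, but drifts
    # whenever the next token would be a multiple of 4 (a made-up
    # behavior, just to exercise the rejection path).
    t = tokens[-1] + 1
    return t if t % 4 != 0 else t + 1

def target_next(tokens):
    # Toy target model: always emits the next integer.
    return tokens[-1] + 1

def speculative_step(tokens, k=4):
    """Draft k tokens, then verify them against the target model.

    Returns the tokens accepted this step. On the first mismatch the
    target's own token is substituted, so the overall output is
    identical to decoding with the target model alone.
    """
    # 1) Draft phase: the small model proposes k tokens autoregressively.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify phase: in a real system all k positions are checked in
    #    ONE target-model forward pass; here we loop for clarity.
    accepted, ctx = [], list(tokens)
    for t in draft:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)        # match: the token comes "for free"
            ctx.append(t)
        else:
            accepted.append(expected)  # mismatch: fall back to the target
            break                      # discard the rest of the draft
    return accepted

tokens = [0]
while len(tokens) < 10:
    tokens.extend(speculative_step(tokens))
print(tokens[:10])
```

Note that even though the draft model is wrong at every multiple of 4, the verified output is exactly what the target model would produce on its own; only the number of target-model steps changes.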
Quick Start & Requirements
This repository is a curated list of papers and does not have a direct installation or execution command. It provides links to research papers (PDFs), associated code repositories, and blog posts for further exploration.
Maintenance & Community
The repository is maintained by Heming Xia and actively encourages community contributions to include new and relevant works. The primary citation is for the survey paper "Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding."
Licensing & Compatibility
The repository itself does not impose a license on the curated content. Individual papers and code repositories linked within will have their own respective licenses.
Limitations & Caveats
This repository is a collection of links and does not provide an executable framework. Users must individually access and evaluate the linked papers and code for their specific use cases. Given the rapid pace of research in this area, the list may lag slightly behind the very latest publications.