SpeculativeDecodingPapers by hemingkx

Paper list for speculative decoding literature

Created 2 years ago
935 stars

Top 39.2% on SourcePulse

Project Summary

This repository serves as a curated, regularly updated collection of research papers, blogs, and code related to Speculative Decoding, a technique for accelerating Large Language Model (LLM) inference. It targets researchers, engineers, and practitioners seeking to understand and implement efficient LLM generation.

How It Works

Speculative Decoding accelerates LLM inference by using a smaller, faster "draft" model to propose multiple future tokens, which the larger target LLM then verifies in a single parallel forward pass. Each accepted draft token saves a sequential target-model step; at the first rejection, the target model's own prediction is used instead, so the output matches what the target model would have generated on its own. The technique trades a modest amount of extra computation for significant wall-clock speedups without degrading generation quality.
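The verify-and-accept loop described above can be sketched in a few lines. This is a minimal greedy illustration, not code from the repository: real implementations verify all draft positions in one batched target forward pass and use rejection sampling over full token distributions, whereas here both "models" are stand-in callables that return a single next-token id.

```python
# Minimal sketch of greedy speculative decoding (toy illustration; the
# model callables below are hypothetical stand-ins for LLM forward passes).

def speculative_step(target_next, draft_next, prefix, k=4):
    """One round: the draft model proposes k tokens, the target verifies them.

    target_next / draft_next: callables mapping a token-id prefix to the
    next token id. Returns the list of tokens accepted this round.
    """
    # 1. Draft model autoregressively proposes k tokens (cheap).
    draft_tokens = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft_tokens.append(t)
        ctx.append(t)

    # 2. Target model checks the proposals (in practice, one batched pass)
    #    and accepts the longest agreeing prefix.
    accepted = []
    ctx = list(prefix)
    for t in draft_tokens:
        if target_next(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            # First disagreement: use the target's own token and stop,
            # so the output matches pure target-model decoding.
            accepted.append(target_next(ctx))
            break
    else:
        # All k draft tokens accepted: the target contributes a bonus token.
        accepted.append(target_next(ctx))
    return accepted

# Toy demo: the draft agrees with the target for the first two steps only,
# so one round yields two accepted draft tokens plus the target's correction.
target = lambda ctx: len(ctx) * 2
draft = lambda ctx: len(ctx) * 2 if len(ctx) < 3 else 0
print(speculative_step(target, draft, [1], k=4))  # → [2, 4, 6]
```

When the draft agrees with the target on all k positions, a single round emits k + 1 tokens for one target-model verification pass, which is where the speedup comes from.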

Quick Start & Requirements

This repository is a curated list of papers and does not have a direct installation or execution command. It provides links to research papers (PDFs), associated code repositories, and blog posts for further exploration.

Highlighted Details

  • Comprehensive coverage of speculative decoding techniques, categorized by application (Seq2Seq, LLMs, Multimodal, Long-Context) and methodology.
  • Includes links to over 150 research papers, many with accompanying code and supplementary materials like slides or videos.
  • Features a dedicated survey paper by the repository maintainers, offering a structured overview of the field.
  • Categorizes papers by keywords, conference, and application areas for easy navigation.

Maintenance & Community

The repository is maintained by Heming Xia and actively encourages community contributions to include new and relevant works. The primary citation is for the survey paper "Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding."

Licensing & Compatibility

The repository itself does not impose a license on the curated content. Individual papers and code repositories linked within will have their own respective licenses.

Limitations & Caveats

This repository is a curated collection of links, not an executable framework. Users must individually access and evaluate the linked papers and code for their specific use cases, and given the rapid pace of research in this area, the list may lag the very latest publications.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 48 stars in the last 30 days

Explore Similar Projects

Starred by Cody Yu (coauthor of vLLM; MTS at OpenAI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 2 more.

Consistency_LLM by hao-ai-lab

Parallel decoder for efficient LLM inference. 404 stars (0.3%). Created 1 year ago; updated 10 months ago. Starred by Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

Speculative decoding research paper for faster LLM inference. 2k stars (10.6%). Created 1 year ago; updated 1 week ago.