Paper list for speculative decoding literature
This repository serves as a curated, regularly updated collection of research papers, blogs, and code related to Speculative Decoding, a technique for accelerating Large Language Model (LLM) inference. It targets researchers, engineers, and practitioners seeking to understand and implement efficient LLM generation.
How It Works
Speculative Decoding accelerates LLM inference by using a smaller, faster "draft" model to propose several future tokens, which the larger target LLM then verifies in a single parallel forward pass. Each accepted draft token is a token the target model did not have to generate one step at a time, which is where the speedup comes from; at the first rejection, the target model's own token is used instead, so the final output remains faithful to the target model. This approach trades a small amount of draft-model computation for fewer expensive target-model decoding steps.
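The draft-then-verify loop described above can be sketched in a few lines. This is a simplified greedy illustration, not code from any linked repository: the "models" are toy deterministic functions standing in for a small draft LM and a large target LM, and real systems score all drafted positions with one batched forward pass of the target model rather than a loop.

```python
def draft_next(tokens):
    # Toy draft model: usually agrees with the target, but drifts
    # whenever the next token would be a multiple of 4 (a made-up
    # behavior, just to exercise the rejection path).
    t = tokens[-1] + 1
    return t if t % 4 != 0 else t + 1

def target_next(tokens):
    # Toy target model: always emits the next integer.
    return tokens[-1] + 1

def speculative_step(tokens, k=4):
    """Draft k tokens, then verify them against the target model.

    Returns the tokens accepted this step. On the first mismatch the
    target's own token is substituted, so the overall output is
    identical to decoding with the target model alone.
    """
    # 1) Draft phase: the small model proposes k tokens autoregressively.
    draft, ctx = [], list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2) Verify phase: in a real system all k positions are checked in
    #    ONE target-model forward pass; here we loop for clarity.
    accepted, ctx = [], list(tokens)
    for t in draft:
        expected = target_next(ctx)
        if t == expected:
            accepted.append(t)        # match: the token comes "for free"
            ctx.append(t)
        else:
            accepted.append(expected)  # mismatch: fall back to the target
            break                      # discard the rest of the draft
    return accepted

tokens = [0]
while len(tokens) < 10:
    tokens.extend(speculative_step(tokens))
print(tokens[:10])
```

Note that even though the draft model is wrong at every multiple of 4, the verified output is exactly what the target model would produce on its own; only the number of target-model steps changes.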
Quick Start & Requirements
This repository is a curated list of papers and does not have a direct installation or execution command. It provides links to research papers (PDFs), associated code repositories, and blog posts for further exploration.
Maintenance & Community
The repository is maintained by Heming Xia and actively encourages community contributions to include new and relevant works. The primary citation is for the survey paper "Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding."
Licensing & Compatibility
The repository itself does not impose a license on the curated content. Individual papers and code repositories linked within will have their own respective licenses.
Limitations & Caveats
This repository is a collection of links and does not provide an executable framework. Users must individually access and evaluate the linked papers and code for their specific use cases. Given the rapid pace of research in this area, the list may lag slightly behind the very latest publications.