Spec-Bench by hemingkx

Benchmark for speculative decoding methods (ACL 2024 paper)

Created 1 year ago
318 stars

Top 84.9% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

Spec-Bench provides a unified evaluation platform and benchmark for speculative decoding methods in large language models. It aims to facilitate fair and systematic comparisons of various open-source speculative decoding approaches across diverse scenarios, benefiting researchers and developers working on LLM inference optimization.

How It Works

Spec-Bench integrates multiple open-source speculative decoding algorithms, including EAGLE, Hydra, Medusa, and others, into a single framework. This allows for standardized testing and performance measurement on the same hardware and within the same environment, ensuring reproducible results and direct comparison of speedups and output quality against vanilla autoregressive decoding.
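
The integrated methods differ in how they draft candidate tokens (a small draft model, extra decoding heads, retrieval, n-gram lookup), but they share the same outer loop, and that loop is what Spec-Bench times against vanilla decoding. A minimal sketch of the loop follows; draft_fn and verify_fn are hypothetical placeholders, not Spec-Bench's actual interfaces.

```python
# Illustrative sketch (not Spec-Bench code) of the draft-and-verify loop
# shared by speculative decoding methods. draft_fn and verify_fn are
# hypothetical placeholders for a method's drafting and verification steps.
from typing import Callable, List


def speculative_generate(
    prompt_ids: List[int],
    draft_fn: Callable[[List[int], int], List[int]],       # proposes k candidate tokens
    verify_fn: Callable[[List[int], List[int]], List[int]],  # returns the accepted prefix
    k: int = 4,
    max_new_tokens: int = 128,
) -> List[int]:
    out = list(prompt_ids)
    generated = 0
    while generated < max_new_tokens:
        candidates = draft_fn(out, k)          # cheap draft: small model, extra heads, n-gram lookup, retrieval, ...
        accepted = verify_fn(out, candidates)  # one target-model pass checks all candidates at once
        if not accepted:                       # guard; real verifiers emit at least one (corrected) token
            break
        out.extend(accepted)
        generated += len(accepted)             # accepting >1 token per target pass is the source of the speedup
    return out
```

For lossless methods, verification guarantees the output matches what the target model alone would produce, so the benchmark can compare speed while holding output quality fixed.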

Quick Start & Requirements

  • Installation: create and activate a Conda environment (conda create -n specbench python=3.12, conda activate specbench), then cd Spec-Bench and pip install -r requirements.txt.
  • Prerequisites: Python 3.12 and Conda. Model weights for specific models (e.g., Vicuna-v1.3, EAGLE-1/2/3, Hydra, Medusa-1, SPACE) need to be downloaded separately; see the loading sketch after this list.
  • Additional Setup: The REST method requires building DraftRetriever from source using Rust and maturin. A datastore setup is also needed for REST.
  • Resources: setup involves environment creation, dependency installation, and downloading large model weights for the methods you intend to evaluate.
  • Links: Paper, Blog, Leaderboard, Roadmap.
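
For the base model weights, one common route (an assumption here, not a step prescribed by the README, which may expect local paths) is to pull the Vicuna-v1.3 checkpoint from the Hugging Face Hub:

```python
# Minimal sketch, assuming the Hugging Face Hub route for the Vicuna-v1.3
# base weights; the Spec-Bench README may point to different checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory so a 7B model fits on a single GPU
    device_map="auto",          # requires the accelerate package
)
```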

Highlighted Details

  • Supports evaluation of EAGLE-1/2/3, Hydra, Medusa, Speculative Sampling, Prompt Lookup Decoding, TokenRecycling, REST, Lookahead Decoding, SPACE, and SAM-Decoding.
  • Includes scripts for calculating speedup and comparing generated results against autoregressive decoding; an illustrative speedup calculation follows this list.
  • Extended over time with new methods and features, as reflected in the commit history and roadmap.
  • Built upon existing codebases from Medusa and EAGLE.
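
The speedup comparison boils down to running both decoders on the same prompts and hardware and taking the throughput ratio. A short illustration of that calculation (not the repository's actual evaluation script):

```python
# Illustrative speedup calculation (not Spec-Bench's evaluation script):
# time both decoders on the same prompts and report the tokens/second ratio.
import time
from typing import Callable, List


def tokens_per_second(generate: Callable[[str], List[int]], prompts: List[str]) -> float:
    start = time.perf_counter()
    n_tokens = sum(len(generate(p)) for p in prompts)  # tokens returned by the decoder
    return n_tokens / (time.perf_counter() - start)


def speedup(speculative: Callable[[str], List[int]],
            vanilla: Callable[[str], List[int]],
            prompts: List[str]) -> float:
    # >1.0 means the speculative method decodes faster than vanilla autoregression
    return tokens_per_second(speculative, prompts) / tokens_per_second(vanilla, prompts)
```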

Maintenance & Community

  • The project has been maintained over time, with updates integrating new methods such as EAGLE-3 and SAM-Decoding; see the Health Check below for recent activity.
  • Contributions are welcomed via pull requests and issues.
  • Further details on community and roadmap can be found via provided links.

Licensing & Compatibility

  • The README does not state a license for the repository itself. The codebase builds on Medusa and EAGLE, which have their own licenses, so commercial use or closed-source linking requires verifying the licenses of all integrated components.

Limitations & Caveats

The README does not explicitly state the license for the Spec-Bench repository itself, which could impact commercial use. The REST method requires a Rust toolchain and manual build process, adding complexity to setup.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Maxime Labonne (Head of Post-Training at Liquid AI), and 1 more.

GPTFast by MDK8888

0%
686 stars
HF Transformers accelerator for faster inference
Created 1 year ago
Updated 1 year ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab

1.3%
2k stars
Speculative decoding research paper for faster LLM inference
Created 1 year ago
Updated 2 days ago