Spec-Bench by hemingkx

Benchmark for speculative decoding methods (ACL 2024 paper)

created 1 year ago
300 stars

Top 89.8% on sourcepulse

Project Summary

Spec-Bench provides a unified evaluation platform and benchmark for speculative decoding methods in large language models. It aims to facilitate fair and systematic comparisons of various open-source speculative decoding approaches across diverse scenarios, benefiting researchers and developers working on LLM inference optimization.

How It Works

Spec-Bench integrates multiple open-source speculative decoding algorithms, including EAGLE, Hydra, Medusa, and others, into a single framework. This allows for standardized testing and performance measurement on the same hardware and within the same environment, ensuring reproducible results and direct comparison of speedups and output quality against vanilla autoregressive decoding.
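Most of the integrated methods share the same draft-then-verify loop: a cheap drafter proposes several tokens, and the target model accepts the longest matching prefix plus one corrected token. A minimal greedy-verification sketch (an illustration of the general technique, not Spec-Bench's own code):

```python
def verify(draft_tokens, target_tokens):
    """Greedy verification step for speculative decoding.

    draft_tokens:  tokens proposed by the cheap draft model.
    target_tokens: the target model's greedy choice at each draft position,
                   plus one extra "bonus" position (len(draft_tokens) + 1 entries).
    Returns the tokens actually emitted this step.
    """
    n = 0
    while n < len(draft_tokens) and draft_tokens[n] == target_tokens[n]:
        n += 1
    # Accept the matching prefix, then append the target's token at the first
    # mismatch (or the bonus token if the whole draft matched).
    return draft_tokens[:n] + [target_tokens[n]]

# Two of three draft tokens accepted, plus one correction from the target:
print(verify([1, 2, 9], [1, 2, 3, 4]))  # [1, 2, 3]
```

Because every emitted token is exactly what vanilla greedy decoding would have produced, output quality is unchanged and only latency differs, which is what makes the speedup comparison fair.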

Quick Start & Requirements

  • Installation: `conda create -n specbench python=3.12`, `conda activate specbench`, `cd Spec-Bench`, `pip install -r requirements.txt`.
  • Prerequisites: Python 3.12 and Conda. Model weights for specific models (e.g., Vicuna-v1.3, EAGLE-1/2/3, Hydra, Medusa-1, SPACE) must be downloaded separately.
  • Additional Setup: The REST method requires building DraftRetriever from source using Rust and maturin. A datastore setup is also needed for REST.
  • Resources: Setup involves environment creation, dependency installation, and potentially downloading large model weights.
  • Links: Paper, Blog, Leaderboard, Roadmap.
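The installation bullets above can be collected into one setup script. The REST-specific lines are assumptions based on the "Rust and maturin" note: the exact directory name and build flags may differ in the repository.

```shell
# Create and activate the environment, then install Python dependencies.
conda create -n specbench python=3.12 -y
conda activate specbench
cd Spec-Bench
pip install -r requirements.txt

# Optional, only for the REST method: build DraftRetriever from source.
# Requires a Rust toolchain; directory name below is an assumption.
# pip install maturin
# cd DraftRetriever && maturin build --release
```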

Highlighted Details

  • Supports evaluation of EAGLE-1/2/3, Hydra, Medusa, Speculative Sampling, Prompt Lookup Decoding, TokenRecycling, REST, Lookahead Decoding, SPACE, and SAM-Decoding.
  • Includes scripts for calculating speedup and comparing generated results against autoregressive decoding.
  • Actively updated with new methods and features, as indicated by recent commits and roadmap.
  • Built upon existing codebases from Medusa and EAGLE.
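The speedup metric mentioned above reduces to a throughput ratio against vanilla autoregressive decoding. A minimal sketch with hypothetical function names (the repository's own scripts are more elaborate):

```python
def tokens_per_second(new_tokens: int, wall_time_s: float) -> float:
    """Decoding throughput: generated tokens divided by wall-clock time."""
    return new_tokens / wall_time_s

def speedup(spec_tokens: int, spec_time_s: float,
            base_tokens: int, base_time_s: float) -> float:
    """Throughput of a speculative method relative to the autoregressive baseline."""
    return tokens_per_second(spec_tokens, spec_time_s) / \
           tokens_per_second(base_tokens, base_time_s)

# Example: a speculative run emits 1024 tokens in 4 s; vanilla needs 10 s.
print(round(speedup(1024, 4.0, 1024, 10.0), 2))  # 2.5
```

Measuring both runs on the same hardware and prompts, as Spec-Bench does, is what makes these ratios comparable across methods.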

Maintenance & Community

  • The project is actively maintained, with recent updates integrating new models like EAGLE-3 and SAM-Decoding.
  • Contributions are welcomed via pull requests and issues.
  • Further details on community and roadmap can be found via provided links.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README. However, the project is built on Medusa and EAGLE, each of which carries its own license. Compatibility for commercial use or closed-source linking requires verifying the licenses of all integrated components.

Limitations & Caveats

The README does not explicitly state the license for the Spec-Bench repository itself, which could impact commercial use. The REST method requires a Rust toolchain and manual build process, adding complexity to setup.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 40 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems) and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888

  • 685 stars (Top 0% on sourcepulse)
  • HF Transformers accelerator for faster inference
  • Created 1 year ago, updated 11 months ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

  • 84k stars (Top 0.4% on sourcepulse)
  • C/C++ library for local LLM inference
  • Created 2 years ago, updated 12 hours ago