Sequoia: a tree-based speculative decoding algorithm (research paper)
Sequoia implements a scalable and robust tree-based speculative decoding algorithm designed to accelerate large language model (LLM) inference. It targets researchers and engineers seeking to reduce LLM inference latency, particularly for demanding inference workloads.
How It Works
Sequoia employs a tree-based speculative decoding approach, in which a smaller "draft" model generates multiple candidate token sequences in parallel and a larger "target" model then verifies them. The tree structure, defined by "growmaps," allows efficient exploration of potential token sequences, optimizing the trade-off between draft-model speed and the target model's acceptance of drafted tokens. The speedup comes from accepting, on average, several drafted tokens for each expensive target-model forward pass.
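To make the control flow concrete, below is a minimal, self-contained sketch of the chain-shaped special case of speculative decoding (equivalent to a growmap in which every node has exactly one child), assuming greedy drafting and exact-match acceptance. Sequoia's actual algorithm drafts and verifies whole trees of candidates per target step; none of the names below come from its codebase.

```python
# Minimal sketch of chain-style speculative decoding (an assumption-laden
# simplification, not Sequoia's tree implementation).
from typing import Callable, List

Model = Callable[[List[int]], int]  # maps a token prefix to its greedy next token

def speculative_decode(draft: Model, target: Model,
                       prompt: List[int], k: int, max_new: int) -> List[int]:
    """Generate up to `max_new` tokens, drafting `k` candidates per target step."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. Draft phase: the small model speculates k tokens autoregressively.
        candidates: List[int] = []
        for _ in range(k):
            candidates.append(draft(tokens + candidates))
        # 2. Verify phase: the target checks each candidate position; in a real
        #    system all k positions are scored in one batched forward pass.
        accepted = 0
        for i in range(k):
            if target(tokens + candidates[:i]) == candidates[i]:
                accepted += 1
            else:
                break
        tokens += candidates[:accepted]
        # 3. The target's own prediction at the first mismatch is always kept,
        #    so every loop iteration emits at least one token.
        tokens.append(target(tokens))
    return tokens[:len(prompt) + max_new]

if __name__ == "__main__":
    # Toy model over a 10-token vocabulary: predicts (last token + 1) % 10.
    # Draft and target agree, so each step accepts all k drafted tokens.
    toy = lambda seq: (seq[-1] + 1) % 10
    print(speculative_decode(toy, toy, prompt=[0], k=4, max_new=8))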
Quick Start & Requirements
```bash
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.36.2
pip install accelerate==0.26.1
pip install datasets==2.16.1
pip install einops
pip install protobuf
pip install sentencepiece
pip install typing-extensions
```
The evaluation scripts (testbed.py, testbed_greedy.py, etc.) require specific growmaps and model paths.
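A run might look like the sketch below. Only --model and --target are confirmed by this summary (see Limitations & Caveats); the --growmap flag, the model identifiers, and the paths are illustrative assumptions to be checked against each script's argument parser.

```bash
# Hypothetical invocation: --growmap and the checkpoints/paths shown are
# assumptions; --model (draft) and --target are confirmed arguments.
python testbed.py --model JackFram/llama-68m --target meta-llama/Llama-2-7b-hf --growmap ./growmaps/example.pt
```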
Maintenance & Community
The project is associated with Infini-AI-Lab. Further community engagement details (Discord, Slack, roadmap) are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Currently, only Llama-family models are supported for the --model and --target arguments in the evaluation scripts. Support for other open-source models, multi-round dialogue, INT4/INT8 quantization, and multi-GPU inference is listed as a future TODO. The maximum sequence length in the provided experiments is 256; longer sequences require raising --M, as sketched below.
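A hedged example of that adjustment, reusing the same hypothetical flags and paths as in Quick Start:

```bash
# Assumed usage: raise the sequence-length budget above the 256-token default.
python testbed.py --model JackFram/llama-68m --target meta-llama/Llama-2-7b-hf --growmap ./growmaps/example.pt --M 512
```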