tanishqkumar/ssd: Lightweight LLM inference engine for accelerated decoding
Top 45.0% on SourcePulse
This project implements Speculative Speculative Decoding (SSD), a novel LLM inference algorithm that runs drafting and verification in parallel across distinct hardware, significantly outperforming traditional sequential speculative decoding in high-throughput LLM deployments.
How It Works
SSD enhances speculative decoding (SD) by executing the small model's token guessing (drafting) and the large model's verification steps concurrently on separate hardware. Unlike sequential SD, SSD's small model anticipates verification outcomes, allowing simultaneous speculation. This parallel approach eliminates drafting overhead and can immediately return correct speculations, boosting inference speed.
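The draft-then-verify loop that SSD builds on can be illustrated with a toy sketch. This is not the project's code: the "models" below are made-up deterministic rules standing in for a cheap draft model and an expensive target model, and verification is done token-by-token rather than in a single batched forward pass.

```python
def target_next(prefix):
    # Hypothetical "large" model: next token is last+1, but resets to 0 after 5.
    last = prefix[-1]
    return 0 if last >= 5 else last + 1

def draft_next(prefix):
    # Hypothetical "small" model: always predicts last+1 (wrong once last >= 5).
    return prefix[-1] + 1

def speculative_step(prefix, k=4):
    # Drafting: the small model guesses k tokens autoregressively.
    cur = list(prefix)
    draft = []
    for _ in range(k):
        tok = draft_next(cur)
        draft.append(tok)
        cur.append(tok)
    # Verification: the large model keeps the longest agreeing prefix.
    # (A real implementation scores all k positions in one forward pass.)
    cur = list(prefix)
    accepted = []
    for tok in draft:
        if target_next(cur) != tok:
            break
        accepted.append(tok)
        cur.append(tok)
    # The large model always contributes one correct token of its own.
    accepted.append(target_next(cur))
    return accepted

print(speculative_step([0]))              # → [1, 2, 3, 4, 5]
print(speculative_step([0, 1, 2, 3, 4, 5]))  # → [0]
```

In sequential SD, drafting for the next round waits for this verification to finish; SSD's claim is that, on separate hardware, the small model can begin speculating on the likely verification outcome while verification is still in flight, so a correct speculation returns immediately.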
Quick Start & Requirements
- Requires uv (install via `curl -LsSf https://astral.sh/uv/install.sh | sh`).
- Clone the repo, sync dependencies (`uv sync`, `uv sync --extra scripts`), and activate the environment (`source .venv/bin/activate`).
- The environment variables `SSD_HF_CACHE`, `SSD_DATASET_DIR`, `SSD_CUDA_ARCH`, and `HF_DATASETS_CACHE` must be set.
- Helper scripts `scripts/download_from_hf.py` and `scripts/get_data_from_hf.py` are provided.
- Repo: https://github.com/tanishqkumar/ssd. Paper: https://arxiv.org/abs/2603.03251.
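Since the setup silently depends on several environment variables, a small preflight check can save a confusing failure later. This is a sketch, not part of the project; only the variable names come from the instructions above, and `fake_env` is a placeholder for illustration.

```python
import os

# The four variables the quick start requires (names from the repo's instructions).
REQUIRED = ["SSD_HF_CACHE", "SSD_DATASET_DIR", "SSD_CUDA_ARCH", "HF_DATASETS_CACHE"]

def missing_vars(env=os.environ):
    """Return the required variables not present in the given environment."""
    return [name for name in REQUIRED if name not in env]

# Example against a fake, partially populated environment:
fake_env = {"SSD_HF_CACHE": "/tmp/hf", "HF_DATASETS_CACHE": "/tmp/datasets"}
print(missing_vars(fake_env))  # → ['SSD_DATASET_DIR', 'SSD_CUDA_ARCH']
```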