Train speculative decoding models for faster inference
SpecForge is a framework for training speculative decoding models, designed for seamless integration with the SGLang serving framework to accelerate inference. It targets researchers and developers looking to implement and deploy efficient LLM inference, offering two distinct training modes to accommodate varying hardware and storage capabilities.
How It Works
SpecForge supports two training methodologies: online and offline. Online training generates the target model's auxiliary hidden states on the fly while the draft model trains, which requires more GPUs but minimal disk space. Offline training pre-generates and stores these hidden states, which demands significant disk space but allows training on as few as one GPU. Both methods produce checkpoints that are directly compatible with SGLang, so no post-processing step is needed.
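The difference between the two modes comes down to where the target model's hidden states come from when the draft model consumes them. The sketch below illustrates that data flow only; it is not SpecForge's API. `target_hidden_states` is a hypothetical toy stand-in for the target model's forward pass, and `online_batches`, `prepare_offline`, and `offline_batches` are hypothetical names. Pickle files stand in for whatever on-disk format a real pipeline would use.

```python
import os
import pickle
import tempfile
from typing import Iterator, List

# Hypothetical toy stand-in for the target model: one small "hidden state"
# vector per token. A real target model is a large transformer on GPUs.
def target_hidden_states(tokens: List[int], hidden_size: int = 4) -> List[List[float]]:
    return [[(t * (i + 1)) % 7 / 7.0 for i in range(hidden_size)] for t in tokens]


def online_batches(dataset: List[List[int]]) -> Iterator[List[List[float]]]:
    """Online mode: run the target model for each sample as training consumes it.
    The target model must stay loaded (more GPUs), but nothing is written to disk."""
    for tokens in dataset:
        yield target_hidden_states(tokens)


def prepare_offline(dataset: List[List[int]], out_dir: str) -> List[str]:
    """Offline mode, step 1: run the target model once and persist its hidden states."""
    paths = []
    for idx, tokens in enumerate(dataset):
        path = os.path.join(out_dir, f"sample_{idx}.pkl")
        with open(path, "wb") as f:
            pickle.dump(target_hidden_states(tokens), f)
        paths.append(path)
    return paths


def offline_batches(paths: List[str]) -> Iterator[List[List[float]]]:
    """Offline mode, step 2: training only reads the stored states,
    so the target model does not need to be loaded alongside the draft model."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5]]
    with tempfile.TemporaryDirectory() as tmp:
        stored = prepare_offline(data, tmp)
        # Both modes feed the draft model identical inputs; only the cost differs
        # (GPU time during training vs. disk space and a preparation pass).
        assert list(offline_batches(stored)) == list(online_batches(data))
```

The trade-off in the prose follows directly: the online path repeats the target forward pass every time a sample is consumed, while the offline path pays for it once and keeps the result on disk.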
Quick Start & Requirements
Install from the root of a cloned SpecForge repository:

pip install -v .
Maintenance & Community
The project is actively maintained by the SGLang team. Community support is available via Slack.
Limitations & Caveats
Preparing hidden states for offline training is resource-intensive: it requires substantial disk space (e.g., 5TB for 1000 samples) and significant processing time. Custom tensor-parallel versions of target models must be implemented manually.
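Before committing to offline training, it can help to estimate how much storage the hidden states will need. The sketch below assumes storage grows linearly with samples, tokens per sample, hidden size, and the number of target-model layers whose states are saved; every parameter value is a hypothetical input you would replace with your own. It counts only the tensors you describe, so real files (extra tensors, metadata, serialization overhead) may be larger; compare the estimate against the measured size of a small trial run. `shutil.disk_usage` is the standard-library call for free space.

```python
import shutil


def estimate_hidden_state_bytes(num_samples: int, avg_tokens_per_sample: int,
                                hidden_size: int, num_layers_stored: int,
                                bytes_per_value: int = 2) -> int:
    """Rough lower bound: one hidden_size vector per token, per stored layer,
    per sample. bytes_per_value=2 assumes bf16/fp16 storage."""
    return (num_samples * avg_tokens_per_sample * hidden_size
            * num_layers_stored * bytes_per_value)


def offline_fits_on_disk(required_bytes: int, path: str = ".",
                         headroom: float = 1.2) -> bool:
    """Check the estimate, padded by a safety margin, against free space
    on the filesystem that holds `path`."""
    free = shutil.disk_usage(path).free
    return required_bytes * headroom <= free


if __name__ == "__main__":
    # Hypothetical numbers: substitute your dataset's lengths and your target
    # model's hidden size and number of captured layers.
    need = estimate_hidden_state_bytes(num_samples=1000, avg_tokens_per_sample=2048,
                                       hidden_size=4096, num_layers_stored=3)
    print(f"estimated {need / 1e9:.1f} GB, fits: {offline_fits_on_disk(need)}")
```

If the padded estimate does not fit, online training avoids the storage cost at the price of keeping the target model loaded on additional GPUs.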