SpecForge by sgl-project

Train speculative decoding models for faster inference

Created 2 months ago
338 stars

Top 81.4% on SourcePulse

Project Summary

SpecForge is a framework for training speculative decoding models, designed for seamless integration with the SGLang serving framework to accelerate inference. It targets researchers and developers looking to implement and deploy efficient LLM inference, offering two distinct training modes to accommodate varying hardware and storage capabilities.

How It Works

SpecForge supports two training methodologies: online and offline. Online training generates auxiliary hidden states on-the-fly during the draft model's training, requiring more GPUs but minimal disk space. Offline training pre-generates and stores these hidden states, demanding significant disk space but allowing for training with as few as one GPU. Both methods ensure checkpoints are directly compatible with SGLang, eliminating post-processing steps.
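The trade-off between the two modes can be sketched in plain Python. Every function and name below is illustrative, not part of SpecForge's actual API; the stand-ins only mimic the target model's forward pass and the draft model's training step:

```python
# Illustrative sketch of the online vs. offline trade-off.
# These are NOT SpecForge APIs -- just stand-ins for the target
# model's forward pass and one draft-model optimization step.

def target_hidden_states(batch):
    # Stand-in for a forward pass through the large target model.
    return [x * 2 for x in batch]

def train_draft_step(batch, hidden):
    # Stand-in for one training step of the small draft model.
    return sum(h - x for x, h in zip(batch, hidden))

dataset = [[1, 2], [3, 4]]

# Online: hidden states are produced on the fly at each step.
# Both models must fit on the GPUs at once, but nothing is stored.
online_losses = [train_draft_step(b, target_hidden_states(b)) for b in dataset]

# Offline: hidden states are precomputed once and persisted, so later
# training only needs enough hardware for the draft model itself.
store = {i: target_hidden_states(b) for i, b in enumerate(dataset)}
offline_losses = [train_draft_step(b, store[i]) for i, b in enumerate(dataset)]

# Both modes train the draft model against identical targets.
assert online_losses == offline_losses
```

The point of the sketch: the two modes differ only in *when* the target model's hidden states are computed, which is why the resulting checkpoints are interchangeable.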

Quick Start & Requirements

  • Installation: pip install -v .
  • Data Preparation: Requires datasets formatted as JSONL. Online training uses raw data; offline training requires pre-generated hidden states, which can take ~2 hours and ~5TB of disk space for 1000 samples.
  • Training: Supports online and offline training for models like Llama 3 and Qwen 3, with example scripts provided.
  • Customization: Allows customization of training arguments, chat templates, target models (including tensor parallelism), and draft models.
  • Links: LMSYS Blog, Slack, Hugging Face
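A JSONL dataset is simply one JSON object per line. The record layout below is a hypothetical ShareGPT-style example; the exact field names SpecForge expects may differ, so check the repo's data-preparation scripts:

```python
import json

# Hypothetical chat record in a ShareGPT-like layout -- the field
# names ("conversations", "role", "content") are assumptions, not
# SpecForge's confirmed schema.
samples = [
    {"conversations": [
        {"role": "user", "content": "What is speculative decoding?"},
        {"role": "assistant",
         "content": "A technique that drafts tokens cheaply and "
                    "verifies them with the target model."},
    ]},
]

# JSONL: serialize each record onto its own line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Round-trip check: parse the file line by line.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
assert loaded == samples
```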

Highlighted Details

  • Byte-for-byte checkpoint compatibility with SGLang.
  • Supports online/offline training, tensor-parallelism, and FSDP.
  • Provides scripts for preparing datasets (Ultrachat, ShareGPT) and generating hidden states.
  • Offers extensive customization for training arguments, chat templates, and model architectures.

Maintenance & Community

The project is actively maintained by the SGLang team. Community support is available via Slack.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Offline data preparation for hidden states generation is resource-intensive, requiring substantial disk space (e.g., 5TB for 1000 samples) and significant processing time. Customizing tensor-parallel versions of target models requires manual implementation.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 78
  • Issues (30d): 33
  • Star History: 96 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model

Created 2 years ago; updated 1 year ago
9k stars

Top 0.2% on SourcePulse