SpecForge by sgl-project

Train speculative decoding models for faster inference

Created 4 months ago
426 stars

Top 69.5% on SourcePulse

View on GitHub
Project Summary

SpecForge is a framework for training speculative decoding models, designed for seamless integration with the SGLang serving framework to accelerate inference. It targets researchers and developers looking to implement and deploy efficient LLM inference, offering two distinct training modes to accommodate varying hardware and storage capabilities.

How It Works

SpecForge supports two training methodologies: online and offline. Online training runs the target model alongside the draft model, generating the auxiliary hidden states on the fly; it requires more GPUs but minimal disk space. Offline training pre-generates and stores those target-model hidden states, demanding significant disk space but allowing the draft model to be trained on as few as one GPU. Both methods produce checkpoints that are directly compatible with SGLang, with no post-processing required.
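
The difference between the two modes is easiest to see at the command line. The sketch below is illustrative only: the script names, config paths, and flags are placeholders modeled on the example scripts in the repository and may not match the exact SpecForge CLI.

```bash
# Illustrative sketch only -- script names and flags are placeholders, not the exact SpecForge CLI.

# Online mode: the target model runs alongside the draft model and produces the
# auxiliary hidden states on the fly. Needs more GPUs, but almost no extra disk.
torchrun --nproc_per_node=8 scripts/train_eagle3_online.py \
    --target-model-path meta-llama/Llama-3.1-8B-Instruct \
    --draft-model-config configs/llama3-8b-eagle3.json \
    --train-data-path cache/dataset/sharegpt.jsonl \
    --output-dir outputs/llama3-8b-eagle3

# Offline mode: first dump the target model's hidden states to disk (large!),
# then train the draft model from the cached states on as few as one GPU.
python scripts/prepare_hidden_states.py \
    --target-model-path meta-llama/Llama-3.1-8B-Instruct \
    --train-data-path cache/dataset/sharegpt.jsonl \
    --output-dir cache/hidden_states

torchrun --nproc_per_node=1 scripts/train_eagle3_offline.py \
    --draft-model-config configs/llama3-8b-eagle3.json \
    --hidden-states-path cache/hidden_states \
    --output-dir outputs/llama3-8b-eagle3
```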

Quick Start & Requirements

  • Installation: pip install -v .
  • Data Preparation: Requires datasets formatted as JSONL. Online training uses raw conversation data; offline training additionally requires pre-generated hidden states, which can take ~2 hours and ~5TB of disk space for 1000 samples (see the data-preparation sketch after this list).
  • Training: Supports online and offline training for models like Llama 3 and Qwen 3, with example scripts provided.
  • Customization: Allows customization of training arguments, chat templates, target models (including tensor parallelism), and draft models.
  • Links: LMSYS Blog, Slack, Hugging Face
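
As referenced in the data-preparation bullet above, training data is plain JSONL with one conversation per line. The snippet below is a sketch under assumptions: the exact field names SpecForge expects and the preparation script name/flags are placeholders based on the ShareGPT-style layout the repo works with.

```bash
# Sketch only -- the JSONL schema and the preparation script name/flags are placeholders.

# One conversation per line, ShareGPT-style alternating user/assistant turns.
mkdir -p cache/dataset
cat > cache/dataset/demo.jsonl << 'EOF'
{"id": "demo-0", "conversations": [{"role": "user", "content": "What is speculative decoding?"}, {"role": "assistant", "content": "A small draft model proposes tokens that the target model verifies in parallel."}]}
EOF

# The repo provides helpers for pulling and converting public datasets such as Ultrachat and ShareGPT.
python scripts/prepare_data.py --dataset sharegpt --output-dir cache/dataset
```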

Highlighted Details

  • Byte-for-byte checkpoint compatibility with SGLang (see the serving sketch after this list).
  • Supports online/offline training, tensor-parallelism, and FSDP.
  • Provides scripts for preparing datasets (Ultrachat, ShareGPT) and generating hidden states.
  • Offers extensive customization for training arguments, chat templates, and model architectures.
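
Because the checkpoints need no conversion (see the compatibility bullet above), a trained draft model can be handed straight to an SGLang server. The launch below is a sketch: the model paths are placeholders, and the speculative-decoding flag names and sensible values vary across SGLang releases, so check the server's `--help` for your version.

```bash
# Placeholder paths; flag names follow SGLang's speculative-decoding options but may differ by version.
python -m sglang.launch_server \
    --model-path meta-llama/Llama-3.1-8B-Instruct \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path outputs/llama3-8b-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 4 \
    --speculative-num-draft-tokens 8
```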

Maintenance & Community

The project is actively maintained by the SGLang team. Community support is available via Slack.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Offline hidden-state generation is resource-intensive, requiring substantial disk space (e.g., ~5TB for 1000 samples) and significant processing time. Adding a tensor-parallel version of a new target model requires manual implementation.
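
Given the quoted footprint (~5TB per 1000 samples, i.e. roughly 5GB per sample), it is worth checking disk headroom before launching offline generation. A minimal sketch, with the sample count and cache path as placeholders:

```bash
# Rough capacity check before offline hidden-state generation.
# Assumes ~5 GB of hidden states per sample, per the figure quoted above.
NUM_SAMPLES=1000
NEEDED_GB=$(( NUM_SAMPLES * 5 ))
echo "Estimated hidden-state footprint: ~${NEEDED_GB} GB for ${NUM_SAMPLES} samples"
df -h /path/to/hidden_states_cache   # placeholder path: confirm the volume has enough free space
```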

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 23
  • Issues (30d): 5
  • Star History: 39 stars in the last 30 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm_training_handbook by huggingface

4.8%
536 stars
Handbook for large language model training methodologies
Created 2 years ago
Updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.0%
7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 weeks ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 27 more.

ColossalAI by hpcaitech

0.0%
41k stars
AI system for large-scale parallel training
Created 4 years ago
Updated 1 day ago