SpecForge by sgl-project

Train speculative decoding models for faster inference

Created 2 months ago
338 stars

Top 81.4% on SourcePulse

Project Summary

SpecForge is a framework for training speculative decoding models, designed for seamless integration with the SGLang serving framework to accelerate inference. It targets researchers and developers looking to implement and deploy efficient LLM inference, offering two distinct training modes to accommodate varying hardware and storage capabilities.

How It Works

SpecForge supports two training methodologies: online and offline. Online training generates auxiliary hidden states on-the-fly during the draft model's training, requiring more GPUs but minimal disk space. Offline training pre-generates and stores these hidden states, demanding significant disk space but allowing for training with as few as one GPU. Both methods ensure checkpoints are directly compatible with SGLang, eliminating post-processing steps.
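The trade-off between the two modes can be sketched in plain Python. Every function and name below is illustrative, not part of SpecForge's actual API; the stand-ins only mimic the target model's forward pass and the draft model's training step:

```python
# Illustrative sketch of the online vs. offline trade-off.
# These are NOT SpecForge APIs -- just stand-ins for the target
# model's forward pass and one draft-model optimization step.

def target_hidden_states(batch):
    # Stand-in for a forward pass through the large target model.
    return [x * 2 for x in batch]

def train_draft_step(batch, hidden):
    # Stand-in for one training step of the small draft model.
    return sum(h - x for x, h in zip(batch, hidden))

dataset = [[1, 2], [3, 4]]

# Online: hidden states are produced on the fly at each step.
# Both models must fit on the GPUs at once, but nothing is stored.
online_losses = [train_draft_step(b, target_hidden_states(b)) for b in dataset]

# Offline: hidden states are precomputed once and persisted, so later
# training only needs enough hardware for the draft model itself.
store = {i: target_hidden_states(b) for i, b in enumerate(dataset)}
offline_losses = [train_draft_step(b, store[i]) for i, b in enumerate(dataset)]

# Both modes train the draft model against identical targets.
assert online_losses == offline_losses
```

The point of the sketch: the two modes differ only in *when* the target model's hidden states are computed, which is why the resulting checkpoints are interchangeable.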

Quick Start & Requirements

  • Installation: pip install -v .
  • Data Preparation: Requires datasets formatted as JSONL. Online training uses raw data; offline training requires pre-generated hidden states, which can take ~2 hours and ~5TB of disk space for 1000 samples.
  • Training: Supports online and offline training for models like Llama 3 and Qwen 3, with example scripts provided.
  • Customization: Allows customization of training arguments, chat templates, target models (including tensor parallelism), and draft models.
  • Links: LMSYS Blog, Slack, Hugging Face
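A JSONL dataset is simply one JSON object per line. The record layout below is a hypothetical ShareGPT-style example; the exact field names SpecForge expects may differ, so check the repo's data-preparation scripts:

```python
import json

# Hypothetical chat record in a ShareGPT-like layout -- the field
# names ("conversations", "role", "content") are assumptions, not
# SpecForge's confirmed schema.
samples = [
    {"conversations": [
        {"role": "user", "content": "What is speculative decoding?"},
        {"role": "assistant",
         "content": "A technique that drafts tokens cheaply and "
                    "verifies them with the target model."},
    ]},
]

# JSONL: serialize each record onto its own line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for s in samples:
        f.write(json.dumps(s) + "\n")

# Round-trip check: parse the file line by line.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
assert loaded == samples
```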

Highlighted Details

  • Byte-for-byte checkpoint compatibility with SGLang.
  • Supports online/offline training, tensor-parallelism, and FSDP.
  • Provides scripts for preparing datasets (Ultrachat, ShareGPT) and generating hidden states.
  • Offers extensive customization for training arguments, chat templates, and model architectures.

Maintenance & Community

The project is actively maintained by the SGLang team. Community support is available via Slack.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Offline data preparation for hidden states generation is resource-intensive, requiring substantial disk space (e.g., 5TB for 1000 samples) and significant processing time. Customizing tensor-parallel versions of target models requires manual implementation.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 78
  • Issues (30d): 33
  • Star History: 96 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model

Created 2 years ago; updated 1 year ago
9k stars

Top 0.2% on SourcePulse