checkpoint-engine by MoonshotAI

Middleware for efficient LLM weight updates during inference

Created 1 week ago

701 stars

Top 48.7% on SourcePulse

View on GitHub
Project Summary

Checkpoint-engine is middleware for efficiently updating LLM weights inside running inference engines, a key requirement in reinforcement learning, where the trainer must repeatedly push fresh weights to inference workers. It targets engineers who need fast, in-place weight updates across distributed GPU setups, offering significant performance gains.

How It Works

The core ParameterServer exposes two update paths: Broadcast (synchronous and high-throughput, for updating all instances at once) and P2P (for dynamically joining instances, built on mooncake-transfer-engine). The Broadcast path moves weights through a three-stage pipeline (H2D copy, inter-worker broadcast, engine reload), overlapping communication with copying, and falls back to serial execution when GPU memory is constrained.
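The overlap described above can be sketched as a toy double-buffered pipeline. This is an illustrative stand-in, not the actual checkpoint-engine API: in the real system the stages would be a CUDA host-to-device copy, an inter-worker (e.g., NCCL) broadcast, and the engine's reload hook, and the function and buffer names below are invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def h2d_copy(bucket, buf):
    # Stand-in for stage 1: host-to-device copy into a staging buffer.
    buf.clear()
    buf.extend(bucket)
    return buf

def broadcast_and_reload(buf, out):
    # Stand-in for stages 2+3: inter-worker broadcast, then engine reload.
    out.extend(buf)

def pipelined_update(buckets):
    out, bufs = [], ([], [])              # two staging buffers (double buffering)
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = None
        for i, bucket in enumerate(buckets):
            # Copy bucket i into one buffer while the previous bucket is
            # still being broadcast out of the other buffer.
            staged = pool.submit(h2d_copy, bucket, bufs[i % 2]).result()
            if pending is not None:
                pending.result()          # serialize broadcasts to keep order
            pending = pool.submit(broadcast_and_reload, staged, out)
        if pending is not None:
            pending.result()              # drain the last in-flight broadcast
    return out

assert pipelined_update([[1, 2], [3, 4], [5, 6]]) == [1, 2, 3, 4, 5, 6]
```

The double buffer is what makes the overlap safe: while bucket i is broadcast out of one buffer, bucket i+1 is staged into the other, so no stage ever reads a buffer that another stage is writing.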

Quick Start & Requirements

  • Install: pip install checkpoint-engine or pip install 'checkpoint-engine[p2p]'.
  • Prerequisites: vLLM v0.10.2rc1 (pinned to a specific API commit), Python 3.12, and H800/H20-class GPUs for the tested configurations; FP8 requires vLLM patches.
  • Setup: clone vLLM, prepare the environment, install dependencies, download model weights, and launch vLLM with VllmColocateWorkerExtension.
  • Docs/Demo: the README provides a setup guide and demo commands.

Highlighted Details

  • Updates a 1T-parameter model (Kimi-K2) in ~20 s across thousands of GPUs.
  • Benchmarks show efficient updates, e.g., 1.42 GiB in 3.94 s (Broadcast) on 8×H800 for GLM-4.5-Air.
  • Supports dynamic joining of new inference instances, reusing weights.
  • Implements pipelined data transfer, overlapping communication and computation.
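A quick sanity check of the GLM-4.5-Air figure above: 1.42 GiB moved in 3.94 s works out to roughly 0.36 GiB/s end to end. Note that this covers the whole copy-broadcast-reload path, not raw link bandwidth:

```python
# Figures from the Broadcast benchmark bullet above.
size_gib, seconds = 1.42, 3.94
rate = size_gib / seconds       # end-to-end update rate over all three stages
print(f"{rate:.2f} GiB/s")      # prints 0.36 GiB/s
```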

Maintenance & Community

The README provides no community links (Discord, Slack) or roadmap details, but it credits youkaichao for contributions to the vLLM integration.

Licensing & Compatibility

The license type and any compatibility restrictions are not specified in the provided README content.

Limitations & Caveats

  • Currently vLLM-specific; integration with other frameworks (e.g., SGLang) is planned.
  • The full three-stage pipeline is not yet implemented.
  • The P2P update path still has room for optimization.
  • FP8 support requires specific patches and may have compatibility issues beyond the tested models.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 8
  • Issues (30d): 7
  • Star history: 709 stars in the last 10 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

LightLLM by ModelTC (0.5%, 4k stars)
Python framework for LLM inference and serving
Created 2 years ago · Updated 12 hours ago

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

dynamo by ai-dynamo (1.0%, 5k stars)
Inference framework for distributed generative AI model serving
Created 6 months ago · Updated 13 hours ago