InfiniStore by bytedance

High-performance KV store for distributed LLM inference

created 11 months ago
303 stars

Top 89.1% on sourcepulse

View on GitHub
Project Summary

InfiniStore is a high-performance KV cache store for distributed LLM inference clusters. It enables efficient KV cache transfer and reuse between prefill and decode nodes, or acts as an extended cache pool in non-disaggregated setups. It targets LLM inference operators and researchers who want to improve throughput and latency by reusing KV cache across nodes rather than recomputing it.

How It Works

InfiniStore manages KV cache through a distributed key-value store architecture and supports both TCP/IP and RDMA (RoCE, InfiniBand) networks for low-latency data transfer. It integrates with inference engines such as vLLM (via LMCache), with further integrations in progress, so both disaggregated and non-disaggregated cluster configurations can draw on a shared or extended KV cache pool.
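
To make the data path concrete, the sketch below models the put/get flow a shared KV pool implies: a prefill node publishes KV blocks under a cache key, and a decode node fetches them instead of recomputing them. KVStoreClient is a local, in-process stand-in written for illustration; it is not InfiniStore's actual client API, and the key format is invented for the example.

    # Minimal sketch of the put/get pattern a shared KV pool implies.
    # KVStoreClient is a local stand-in, NOT InfiniStore's actual API.
    from typing import Dict, Optional
    import torch

    class KVStoreClient:
        """In-process stand-in for a remote KV cache store client."""
        _pool: Dict[str, torch.Tensor] = {}  # class-level dict models the shared pool

        def put(self, key: str, value: torch.Tensor) -> None:
            self._pool[key] = value

        def get(self, key: str) -> Optional[torch.Tensor]:
            return self._pool.get(key)

    # Prefill node: publish the KV block computed for a prompt prefix.
    prefill = KVStoreClient()
    prefill.put("model0/layer0/prefix-abc123", torch.randn(32, 128, 64))

    # Decode node: fetch the block instead of redoing the prefill pass.
    decode = KVStoreClient()
    kv_block = decode.get("model0/layer0/prefix-abc123")
    assert kv_block is not None

In a real deployment the pool lives in the InfiniStore server and transfers run over TCP/IP or RDMA; the stand-in only shows the access pattern.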

Quick Start & Requirements

  • Installation: pip install infinistore for users, or from source for development.
  • Development Prerequisites: apt install libuv1-dev libflatbuffers-dev libspdlog-dev libfmt-dev ibverbs-utils libibverbs-dev libboost-dev libboost-stacktrace-dev.
  • Verification: start a server with infinistore --manage-port 8088, then check it with curl http://127.0.0.1:8088/selftest (a Python equivalent appears after this list).
  • Running a Server: infinistore --service-port <port> for TCP/IP, or add --dev-name and --link-type for RDMA.
  • vLLM Integration: Requires installing vLLM, LMCache, and InfiniStore on all nodes. See splitwise-demos for setup details.
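
The verification step can also be scripted. Below is a minimal Python equivalent of the curl check above, assuming a local server started with infinistore --manage-port 8088:

    # Equivalent of: curl http://127.0.0.1:8088/selftest
    # Assumes a local server started with: infinistore --manage-port 8088
    import urllib.request

    with urllib.request.urlopen("http://127.0.0.1:8088/selftest", timeout=10) as resp:
        print(resp.status, resp.read().decode())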

Highlighted Details

  • Supports prefill-decoding disaggregation and non-disaggregated cluster modes.
  • Enables KV cache transfer and reuse across inference nodes (see the cache-key sketch after this list).
  • Integrated with vLLM via LMCache; integrations with SGLang and others are in progress.
  • Offers both TCP/IP and RDMA network support for performance.
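
How shared KV cache is keyed is the crux of cross-node reuse. The sketch below shows one common scheme, hashing token-ID prefixes so that requests sharing a prompt prefix resolve to the same entries; it is illustrative only and is not InfiniStore's or LMCache's actual key format.

    # Illustrative only: NOT InfiniStore's actual key scheme.
    # Hashing token-ID prefixes makes requests with a shared prompt
    # prefix resolve to the same cache keys, enabling reuse.
    import hashlib
    from typing import List

    def prefix_cache_key(model: str, layer: int, token_ids: List[int]) -> str:
        digest = hashlib.sha256(repr(token_ids).encode()).hexdigest()[:16]
        return f"{model}/layer{layer}/{digest}"

    # Two requests sharing a prefix map to the same key, so the second
    # can fetch the stored KV block instead of recomputing prefill.
    assert prefix_cache_key("llama-3-8b", 0, [1, 15, 27, 42]) == \
           prefix_cache_key("llama-3-8b", 0, [1, 15, 27, 42])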

Maintenance & Community

InfiniStore is an open-source project welcoming community contributions. Development installation includes pre-commit hooks for code quality.

Licensing & Compatibility

The README does not explicitly state the license.

Limitations & Caveats

The README does not detail specific limitations or known caveats. The setup for vLLM integration is described as complex, with users directed to the separate splitwise-demos repository for working configurations.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 138 stars in the last 90 days

Explore Similar Projects

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% on sourcepulse · 5k stars · created 5 months ago · updated 19 hours ago
Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

vllm by vllm-project

LLM serving engine for high-throughput, memory-efficient inference
Top 1.0% on sourcepulse · 54k stars · created 2 years ago · updated 16 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tobi Lutke (Cofounder of Shopify), and 27 more.