InfiniStore by bytedance

High-performance KV store for distributed LLM inference

created 11 months ago
303 stars

Top 89.1% on sourcepulse

View on GitHub
Project Summary

InfiniStore is a high-performance KV cache store for distributed LLM inference clusters. It enables efficient KV cache transfer and reuse between prefill and decode nodes, or acts as an extended cache pool in non-disaggregated setups. It targets LLM inference operators and researchers who want to improve throughput and latency by reusing KV cache across nodes rather than recomputing it.

How It Works

InfiniStore manages KV cache through a distributed key-value store architecture and supports both TCP/IP and RDMA (RoCE, InfiniBand) networks for low-latency data transfer. It integrates with inference engines such as vLLM (via LMCache), with further integrations in progress, so both disaggregated and non-disaggregated cluster configurations can draw on a shared or extended KV cache pool.
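
To make the data path concrete, the sketch below models the put/get flow a shared KV pool implies: a prefill node publishes KV blocks under a cache key, and a decode node fetches them instead of recomputing them. KVStoreClient is a local, in-process stand-in written for illustration; it is not InfiniStore's actual client API, and the key format is invented for the example.

    # Minimal sketch of the put/get pattern a shared KV pool implies.
    # KVStoreClient is a local stand-in, NOT InfiniStore's actual API.
    from typing import Dict, Optional
    import torch

    class KVStoreClient:
        """In-process stand-in for a remote KV cache store client."""
        _pool: Dict[str, torch.Tensor] = {}  # class-level dict models the shared pool

        def put(self, key: str, value: torch.Tensor) -> None:
            self._pool[key] = value

        def get(self, key: str) -> Optional[torch.Tensor]:
            return self._pool.get(key)

    # Prefill node: publish the KV block computed for a prompt prefix.
    prefill = KVStoreClient()
    prefill.put("model0/layer0/prefix-abc123", torch.randn(32, 128, 64))

    # Decode node: fetch the block instead of redoing the prefill pass.
    decode = KVStoreClient()
    kv_block = decode.get("model0/layer0/prefix-abc123")
    assert kv_block is not None

In a real deployment the pool lives in the InfiniStore server and transfers run over TCP/IP or RDMA; the stand-in only shows the access pattern.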

Quick Start & Requirements

  • Installation: pip install infinistore for users, or from source for development.
  • Development Prerequisites: apt install libuv1-dev libflatbuffers-dev libspdlog-dev libfmt-dev ibverbs-utils libibverbs-dev libboost-dev libboost-stacktrace-dev.
  • Verification: start a server with infinistore --manage-port 8088, then check it with curl http://127.0.0.1:8088/selftest (a Python equivalent appears after this list).
  • Running a Server: infinistore --service-port <port> for TCP/IP, or add --dev-name and --link-type for RDMA.
  • vLLM Integration: Requires installing vLLM, LMCache, and InfiniStore on all nodes. See splitwise-demos for setup details.
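
The verification step can also be scripted. Below is a minimal Python equivalent of the curl check above, assuming a local server started with infinistore --manage-port 8088:

    # Equivalent of: curl http://127.0.0.1:8088/selftest
    # Assumes a local server started with: infinistore --manage-port 8088
    import urllib.request

    with urllib.request.urlopen("http://127.0.0.1:8088/selftest", timeout=10) as resp:
        print(resp.status, resp.read().decode())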

Highlighted Details

  • Supports prefill-decoding disaggregation and non-disaggregated cluster modes.
  • Enables KV cache transfer and reuse across inference nodes (see the cache-key sketch after this list).
  • Integrated with vLLM via LMCache; integrations with SGLang and others are in progress.
  • Offers both TCP/IP and RDMA network support for performance.
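
How shared KV cache is keyed is the crux of cross-node reuse. The sketch below shows one common scheme, hashing token-ID prefixes so that requests sharing a prompt prefix resolve to the same entries; it is illustrative only and is not InfiniStore's or LMCache's actual key format.

    # Illustrative only: NOT InfiniStore's actual key scheme.
    # Hashing token-ID prefixes makes requests with a shared prompt
    # prefix resolve to the same cache keys, enabling reuse.
    import hashlib
    from typing import List

    def prefix_cache_key(model: str, layer: int, token_ids: List[int]) -> str:
        digest = hashlib.sha256(repr(token_ids).encode()).hexdigest()[:16]
        return f"{model}/layer{layer}/{digest}"

    # Two requests sharing a prefix map to the same key, so the second
    # can fetch the stored KV block instead of recomputing prefill.
    assert prefix_cache_key("llama-3-8b", 0, [1, 15, 27, 42]) == \
           prefix_cache_key("llama-3-8b", 0, [1, 15, 27, 42])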

Maintenance & Community

InfiniStore is an open-source project welcoming community contributions. Development installation includes pre-commit hooks for code quality.

Licensing & Compatibility

The README does not explicitly state the license.

Limitations & Caveats

The README does not detail specific limitations or known caveats. The setup for vLLM integration is described as complex, with users directed to the separate splitwise-demos repository for working configurations.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 138 stars in the last 90 days

Explore Similar Projects

dynamo by ai-dynamo

Inference framework for distributed generative AI model serving
Top 1.1% on sourcepulse · 5k stars · created 5 months ago · updated 19 hours ago
Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 4 more.

vllm by vllm-project

LLM serving engine for high-throughput, memory-efficient inference
Top 1.0% on sourcepulse · 54k stars · created 2 years ago · updated 16 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Tobi Lutke (Cofounder of Shopify), and 27 more.