High-performance KV store for distributed LLM inference
InfiniStore is a high-performance KV cache store designed for distributed LLM inference clusters, enabling efficient KV cache transfer and reuse between prefill and decoding nodes, or acting as an extended cache pool in non-disaggregated setups. It targets LLM inference operators and researchers seeking to optimize throughput and latency by managing KV cache effectively across nodes.
How It Works
InfiniStore manages KV cache through a distributed key-value store architecture. It supports both TCP/IP and RDMA (RoCE, InfiniBand) networks for low-latency data transfer. It integrates with inference engines such as vLLM (via LMCache), with support for other engines in progress, so that both disaggregated and non-disaggregated clusters can use shared or extended KV cache pools.
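To make the prefill/decode reuse idea concrete, here is a minimal in-memory sketch. It is not the InfiniStore API: `ToyKVStore`, `prefix_key`, `prefill`, and `decode` are hypothetical names, and keying cache blocks by a hash of the token prefix is an assumption made for illustration only.

```python
import hashlib
from typing import Dict, List, Optional


class ToyKVStore:
    """In-memory stand-in for a shared KV cache store (hypothetical, not InfiniStore)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


def prefix_key(model: str, token_ids: List[int]) -> str:
    # Assumption: a cache block is identified by the model name plus a hash
    # of the token prefix it was computed for.
    digest = hashlib.sha256(repr(token_ids).encode("utf-8")).hexdigest()
    return f"{model}:{digest}"


def prefill(store: ToyKVStore, model: str, token_ids: List[int]) -> str:
    """Simulate a prefill node publishing the KV cache for a prompt prefix."""
    key = prefix_key(model, token_ids)
    kv_block = f"kv-for-{len(token_ids)}-tokens".encode("utf-8")  # placeholder payload
    store.put(key, kv_block)
    return key


def decode(store: ToyKVStore, model: str, token_ids: List[int]) -> str:
    """Simulate a decode node: reuse the cached block if present, else recompute."""
    cached = store.get(prefix_key(model, token_ids))
    return "reused" if cached is not None else "recomputed"
```

In a real deployment the store would live on separate nodes and the payload would be tensor data moved over TCP/IP or RDMA; the sketch only shows the put-then-get pattern that lets a decode node skip recomputing a prefix.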
Quick Start & Requirements
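Install with pip for regular use, or build from source for development:
pip install infinistore
Prerequisite system packages:
apt install libuv1-dev libflatbuffers-dev libspdlog-dev libfmt-dev ibverbs-utils libibverbs-dev libboost-dev libboost-stacktrace-dev
To verify the installation, start the server with a management port and then call the self-test endpoint:
infinistore --manage-port 8088
curl http://127.0.0.1:8088/selftest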
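The self-test call can also be scripted, for example as a readiness check before starting an inference engine. The sketch below uses only the Python standard library and the `/selftest` URL from the command above. The helper names `selftest_url` and `run_selftest` are hypothetical, and treating an HTTP 200 response as "passed" is an assumption, since the response format is not described here.

```python
import urllib.error
import urllib.request
from typing import Tuple


def selftest_url(host: str = "127.0.0.1", manage_port: int = 8088) -> str:
    """Build the self-test URL for a server started with --manage-port."""
    return f"http://{host}:{manage_port}/selftest"


def run_selftest(host: str = "127.0.0.1", manage_port: int = 8088,
                 timeout: float = 5.0) -> Tuple[bool, str]:
    """Return (ok, detail). Assumption: HTTP 200 means the self-test passed."""
    url = selftest_url(host, manage_port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, timeout, DNS failure, and similar.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    ok, detail = run_selftest()
    print("selftest passed" if ok else "selftest failed", detail)
```

Returning a tuple instead of raising keeps the helper easy to call from a startup script that retries until the server is up.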
To run the server over TCP/IP:
infinistore --service-port <port>
For RDMA, also pass --dev-name and --link-type.
Highlighted Details
Maintenance & Community
InfiniStore is an open-source project welcoming community contributions. Development installation includes pre-commit hooks for code quality.
Licensing & Compatibility
The README does not explicitly state the license.
Limitations & Caveats
The README does not detail specific limitations or known caveats. The setup for vLLM integration is described as complex, with users directed to a separate demo repository.