kaiwu by val1813

Auto-tuned local LLM serving for optimal performance

Created 2 months ago

254 stars

Top 99.1% on SourcePulse

Project Summary

Summary Kaiwu optimizes local LLM deployment by automatically tuning hardware, model configurations, and inference parameters for maximum speed and context window. It targets engineers and power users seeking effortless, high-performance local LLM serving via an OpenAI-compatible API.

How It Works Kaiwu probes hardware (GPU, VRAM, RAM) and model specifics to benchmark KV cache types and context window sizes. It dynamically determines the optimal configuration for sustained inference speed, automatically handling MoE architectures by routing expert layers to the CPU when VRAM is limited. Built on llama.cpp, it caches results for rapid 2-second subsequent launches.

Quick Start & Requirements Install via PowerShell (Windows) or curl/bash (Linux/macOS) scripts. Run models with kaiwu run <model> or kaiwu run /path/to/model.gguf. The OpenAI-compatible API is at http://localhost:11435/v1. Requires NVIDIA GPU (4GB+ VRAM), driver >= 550.54, Windows 10/11 or Linux (Ubuntu 20.04+), 8GB+ RAM (16GB+ for 30B MoE), and GGUF models.

Highlighted Details Demonstrates significant performance gains, e.g., 8.7 tok/s with 32K context on an 8GB GPU for a 30B MoE model (vs. 3 tok/s, 4K context in LM Studio). Automatically optimizes MoE models, multi-GPU tensor splitting (weighted by VRAM/bandwidth), KV cache types, ubatch size, and thread count through measured benchmarks.

Maintenance & Community Active development is evident from frequent changelog updates. Built upon llama.cpp. No specific community channels or prominent contributors are listed.

Licensing & Compatibility The specific open-source license for Kaiwu is not explicitly stated in the README, requiring clarification for adoption decisions.

Limitations & Caveats Primarily focused on NVIDIA GPUs; CPU-only inference is secondary. Requires specific NVIDIA driver versions. The absence of a stated license is a notable caveat.

kaiwu by val1813

Explore Similar Projects

Amis by cPilot-GUI

LLM-inference-optimization-paper by chenhongyu2048

eLLM by lucienhuangfu

ollama-benchmark by aidatatools

llama.cpp-deepseek-v4-flash by antirez

sarathi-serve by microsoft

atlas by Avarok-Cybersecurity

LLMServingSim by casys-kaist

candle-vllm by EricLBuehler

picolm by RightNow-AI

RedKnot by rednote-machine-learning

LiteRT-LM by google-ai-edge