kaiwu  by val1813

Auto-tuned local LLM serving for optimal performance

Created 1 month ago
251 stars

Top 99.8% on SourcePulse

GitHubView on GitHub
Project Summary

Summary Kaiwu optimizes local LLM deployment by automatically tuning hardware, model configurations, and inference parameters for maximum speed and context window. It targets engineers and power users seeking effortless, high-performance local LLM serving via an OpenAI-compatible API.

How It Works Kaiwu probes hardware (GPU, VRAM, RAM) and model specifics to benchmark KV cache types and context window sizes. It dynamically determines the optimal configuration for sustained inference speed, automatically handling MoE architectures by routing expert layers to the CPU when VRAM is limited. Built on llama.cpp, it caches results for rapid 2-second subsequent launches.

Quick Start & Requirements Install via PowerShell (Windows) or curl/bash (Linux/macOS) scripts. Run models with kaiwu run <model> or kaiwu run /path/to/model.gguf. The OpenAI-compatible API is at http://localhost:11435/v1. Requires NVIDIA GPU (4GB+ VRAM), driver >= 550.54, Windows 10/11 or Linux (Ubuntu 20.04+), 8GB+ RAM (16GB+ for 30B MoE), and GGUF models.

Highlighted Details Demonstrates significant performance gains, e.g., 8.7 tok/s with 32K context on an 8GB GPU for a 30B MoE model (vs. 3 tok/s, 4K context in LM Studio). Automatically optimizes MoE models, multi-GPU tensor splitting (weighted by VRAM/bandwidth), KV cache types, ubatch size, and thread count through measured benchmarks.

Maintenance & Community Active development is evident from frequent changelog updates. Built upon llama.cpp. No specific community channels or prominent contributors are listed.

Licensing & Compatibility The specific open-source license for Kaiwu is not explicitly stated in the README, requiring clarification for adoption decisions.

Limitations & Caveats Primarily focused on NVIDIA GPUs; CPU-only inference is secondary. Requires specific NVIDIA driver versions. The absence of a stated license is a notable caveat.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
20
Star History
212 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.