Discover and explore top open-source AI tools and projects—updated daily.
val1813Auto-tuned local LLM serving for optimal performance
Top 99.8% on SourcePulse
Summary Kaiwu optimizes local LLM deployment by automatically tuning hardware, model configurations, and inference parameters for maximum speed and context window. It targets engineers and power users seeking effortless, high-performance local LLM serving via an OpenAI-compatible API.
How It Works
Kaiwu probes hardware (GPU, VRAM, RAM) and model specifics to benchmark KV cache types and context window sizes. It dynamically determines the optimal configuration for sustained inference speed, automatically handling MoE architectures by routing expert layers to the CPU when VRAM is limited. Built on llama.cpp, it caches results for rapid 2-second subsequent launches.
Quick Start & Requirements
Install via PowerShell (Windows) or curl/bash (Linux/macOS) scripts. Run models with kaiwu run <model> or kaiwu run /path/to/model.gguf. The OpenAI-compatible API is at http://localhost:11435/v1. Requires NVIDIA GPU (4GB+ VRAM), driver >= 550.54, Windows 10/11 or Linux (Ubuntu 20.04+), 8GB+ RAM (16GB+ for 30B MoE), and GGUF models.
Highlighted Details Demonstrates significant performance gains, e.g., 8.7 tok/s with 32K context on an 8GB GPU for a 30B MoE model (vs. 3 tok/s, 4K context in LM Studio). Automatically optimizes MoE models, multi-GPU tensor splitting (weighted by VRAM/bandwidth), KV cache types, ubatch size, and thread count through measured benchmarks.
Maintenance & Community
Active development is evident from frequent changelog updates. Built upon llama.cpp. No specific community channels or prominent contributors are listed.
Licensing & Compatibility The specific open-source license for Kaiwu is not explicitly stated in the README, requiring clarification for adoption decisions.
Limitations & Caveats Primarily focused on NVIDIA GPUs; CPU-only inference is secondary. Requires specific NVIDIA driver versions. The absence of a stated license is a notable caveat.
2 weeks ago
Inactive
AI-Hypercomputer