A lightweight, local-first AI inference server
Top 19.4% on SourcePulse
Summary: Shimmy offers a highly efficient, local-first AI inference server with OpenAI API compatibility, targeting developers seeking private, cost-effective, and fast AI model execution. It addresses the large binary sizes and slow startup times of alternatives, providing a 5MB single-binary Rust application for sub-second responses and minimal resource overhead.
How It Works:
Built with Rust and Tokio, Shimmy uses the llama.cpp backend for GGUF model inference. Its design prioritizes extreme resource efficiency: a minimal 5.1MB binary, sub-100ms startup, and under 50MB memory usage. It automatically discovers models from common locations and manages ports dynamically, simplifying integration via OpenAI API compatibility and first-class LoRA adapter support.
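Because Shimmy exposes the standard OpenAI API surface, existing OpenAI clients and plain HTTP calls can target it directly. The lines below are an assumed usage sketch of a chat completion request against a locally running instance; the port variable and model name are placeholders, since Shimmy assigns ports dynamically and serves whatever GGUF models it discovers on your machine.

# Assumed sketch: replace $SHIMMY_PORT with the port Shimmy reports at startup,
# and "my-local-model" with a model name it has actually discovered.
curl http://localhost:$SHIMMY_PORT/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-local-model", "messages": [{"role": "user", "content": "Hello from Shimmy"}]}'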
Quick Start & Requirements:
Install with cargo install shimmy (Linux, macOS, Windows). Windows users should prefer cargo install over the pre-built binary due to potential Defender false positives; macOS requires brew install cmake rust prior to cargo install.
Highlighted Details:
Maintenance & Community: Primarily maintained by Michael A. Kuykendall, who seeks sponsorship via GitHub Sponsors. Community interaction occurs through GitHub Issues and Discussions.
Licensing & Compatibility: Licensed under the permissive MIT License, ensuring free, perpetual use and compatibility with commercial/closed-source applications. Verified compatibility includes Intel/Apple Silicon Macs with Metal GPU acceleration.
Limitations & Caveats:
The pre-built Windows binary may trigger Defender false positives. npm (shimmy-js) and Python (pip install shimmy) integrations are "coming soon." High bus factor due to single maintainer.