shimmy by Michael-A-Kuykendall

A lightweight, local-first AI inference server

Created 3 weeks ago


2,364 stars

Top 19.4% on SourcePulse

Project Summary

Summary: Shimmy offers a highly efficient, local-first AI inference server with OpenAI API compatibility, targeting developers who want private, cost-effective, and fast AI model execution. It addresses the large binary sizes and slow startup times of alternatives with a 5.1MB single-binary Rust application that delivers sub-second responses with minimal resource overhead.

How It Works: Built with Rust and Tokio, Shimmy uses the llama.cpp backend for GGUF model inference. Its design prioritizes extreme resource efficiency: a minimal 5.1MB binary, sub-100ms startup, and under 50MB memory usage. It automatically discovers models from common locations and manages ports dynamically, simplifying integration via OpenAI API compatibility and first-class LoRA adapter support.
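The auto-discovery behavior described above can be pictured as a recursive scan of conventional model directories for GGUF files. A minimal Python sketch; the candidate directories here are assumptions for illustration, not the actual paths Shimmy searches:

```python
# Hedged sketch of GGUF model auto-discovery: scan a few conventional
# directories for *.gguf files. The directory list is an assumption --
# Shimmy's real search paths may differ.
from pathlib import Path

CANDIDATE_DIRS = [
    Path.home() / ".cache" / "huggingface",
    Path.home() / "models",
    Path("./models"),
]

def discover_gguf_models(dirs=CANDIDATE_DIRS):
    """Return every *.gguf file found under the candidate directories."""
    found = []
    for d in dirs:
        if d.is_dir():  # silently skip directories that do not exist
            found.extend(sorted(d.rglob("*.gguf")))
    return found
```

Discovery like this is what lets the server start with zero configuration: any model dropped into a known location becomes available without edits to a config file.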

Quick Start & Requirements:

  • Installation: Install via cargo install shimmy (Linux, macOS, Windows). Windows users should prefer cargo install over the pre-built binary, which can trigger Defender false positives. On macOS, run brew install cmake rust before cargo install.
  • Prerequisites: Rust toolchain (plus CMake on macOS). Metal GPU acceleration is supported on macOS.
  • Setup Time: Claimed 30 seconds.
  • Links: GitHub Releases: https://github.com/Michael-A-Kuykendall/shimmy/releases. Sponsorship: https://github.com/sponsors/Michael-A-Kuykendall.

Highlighted Details:

  • Resource Efficiency: A 5.1MB binary, <100ms startup, and <50MB memory usage, a far smaller footprint than Ollama or a full llama.cpp deployment.
  • OpenAI API Compatibility: 100% compatible for seamless integration with tools like VSCode, Cursor, and Continue.dev.
  • LoRA Support: First-class support for LoRA adapters, enabling rapid integration from training to production.
  • Auto-Discovery & Port Management: Automatically finds models and allocates ports, simplifying setup.
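Because Shimmy exposes an OpenAI-compatible HTTP API, standard chat-completions client code can talk to it unchanged. A minimal sketch using only the Python standard library; the port and model name are assumptions (Shimmy allocates ports dynamically), so check the server's startup log for the real values:

```python
# Minimal sketch: calling a local Shimmy server through its
# OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example usage, assuming a Shimmy instance on port 11435 (hypothetical):
# reply = chat("http://localhost:11435", "phi-3-mini", "Hello!")
```

The same base-URL swap is how editors like VSCode, Cursor, and Continue.dev can point at Shimmy instead of a hosted OpenAI endpoint.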

Maintenance & Community: Primarily maintained by Michael A. Kuykendall, who seeks sponsorship via GitHub Sponsors. Community interaction occurs through GitHub Issues and Discussions.

Licensing & Compatibility: Licensed under the permissive MIT License, ensuring free, perpetual use and compatibility with commercial/closed-source applications. Verified compatibility includes Intel/Apple Silicon Macs with Metal GPU acceleration.

Limitations & Caveats: The pre-built Windows binary may trigger Defender false positives. npm (shimmy-js) and Python (pip install shimmy) integrations are "coming soon." The project has a bus factor of one, as it depends on a single maintainer.

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 4
Issues (30d): 33
Star History: 2,429 stars in the last 21 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

Top 0.3% on SourcePulse, 4k stars
AI inference pipeline framework
Created 1 year ago, updated 2 days ago