shimmy by Michael-A-Kuykendall

A lightweight, local-first AI inference server

Created 8 months ago
4,651 stars

Top 10.5% on SourcePulse

Project Summary

Summary: Shimmy is a lightweight, local-first AI inference server with OpenAI API compatibility, aimed at developers who want private, cost-effective, and fast AI model execution. It addresses the large binary sizes and slow startup times of alternatives with a 5.1MB single-binary Rust application that delivers sub-second responses and minimal resource overhead.

How It Works: Built with Rust and Tokio, Shimmy uses the llama.cpp backend for GGUF model inference. Its design prioritizes extreme resource efficiency: a minimal 5.1MB binary, sub-100ms startup, and under 50MB memory usage. It automatically discovers models from common locations and manages ports dynamically, simplifying integration via OpenAI API compatibility and first-class LoRA adapter support.

Quick Start & Requirements:

  • Installation: Install via cargo install shimmy (Linux, macOS, Windows). Windows users should prefer cargo install over the pre-built binary because of potential Defender false positives; on macOS, run brew install cmake rust before cargo install.
  • Prerequisites: Rust toolchain on all platforms; CMake additionally on macOS. Metal GPU acceleration is supported on macOS.
  • Setup Time: Claimed 30 seconds.
  • Links: GitHub Releases: https://github.com/Michael-A-Kuykendall/shimmy/releases. Sponsorship: https://github.com/sponsors/Michael-A-Kuykendall.
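The installation steps above can be sketched as a quick-start session. The cargo install command is documented by the project; treat the shimmy serve subcommand and the auto-discovery behavior shown in the comments as assumptions to verify against the README:

```shell
# Install the single binary (Linux, macOS, Windows)
cargo install shimmy

# macOS only: install build prerequisites first
#   brew install cmake rust

# Start the server (subcommand name is an assumption; check `shimmy --help`).
# Shimmy auto-discovers GGUF models from common locations and allocates a port.
shimmy serve
```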

Highlighted Details:

  • Resource Efficiency: A 5.1MB binary, <100ms startup, and <50MB memory use, a far smaller footprint than Ollama or a full llama.cpp distribution.
  • OpenAI API Compatibility: 100% compatible for seamless integration with tools like VSCode, Cursor, and Continue.dev.
  • LoRA Support: First-class support for LoRA adapters, enabling rapid integration from training to production.
  • Auto-Discovery & Port Management: Automatically finds models and allocates ports, simplifying setup.
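Because the API mirrors OpenAI's, existing clients only need to point at shimmy's base URL. A minimal sketch of the request shape, assuming a local shimmy instance; the port and model name below are illustrative, not taken from the project docs:

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style payload for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = chat_request("phi-3-mini", "Hello!")  # model name is hypothetical
body = json.dumps(payload)

# To send it against a running shimmy server (port is an assumption):
#   curl -X POST http://localhost:11435/v1/chat/completions \
#        -H "Content-Type: application/json" -d @- <<< "$body"
print(body)
```

Tools like VSCode, Cursor, and Continue.dev speak this same request shape, which is why they can target shimmy by changing only the endpoint URL.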

Maintenance & Community: Primarily maintained by Michael A. Kuykendall, who seeks sponsorship via GitHub Sponsors. Community interaction occurs through GitHub Issues and Discussions.

Licensing & Compatibility: Licensed under the permissive MIT License, ensuring free, perpetual use and compatibility with commercial/closed-source applications. Verified compatibility includes Intel/Apple Silicon Macs with Metal GPU acceleration.

Limitations & Caveats: The pre-built Windows binary may trigger Defender false positives. npm (shimmy-js) and Python (pip install shimmy) integrations are "coming soon." Bus factor of one, as the project depends on a single maintainer.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 3
  • Star History: 797 stars in the last 30 days

