shimmy by Michael-A-Kuykendall

A lightweight, local-first AI inference server

Created 3 weeks ago


2,364 stars

Top 19.4% on SourcePulse

Project Summary

Summary: Shimmy offers a highly efficient, local-first AI inference server with OpenAI API compatibility, targeting developers who want private, cost-effective, and fast AI model execution. It addresses the large binary sizes and slow startup times of alternatives with a 5.1MB single-binary Rust application that delivers sub-second responses with minimal resource overhead.

How It Works: Built with Rust and Tokio, Shimmy uses the llama.cpp backend for GGUF model inference. Its design prioritizes extreme resource efficiency: a minimal 5.1MB binary, sub-100ms startup, and under 50MB memory usage. It automatically discovers models from common locations and manages ports dynamically, simplifying integration via OpenAI API compatibility and first-class LoRA adapter support.
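The auto-discovery behavior described above can be pictured as a recursive scan of conventional model directories for GGUF files. A minimal Python sketch; the candidate directories here are assumptions for illustration, not the actual paths Shimmy searches:

```python
# Hedged sketch of GGUF model auto-discovery: scan a few conventional
# directories for *.gguf files. The directory list is an assumption --
# Shimmy's real search paths may differ.
from pathlib import Path

CANDIDATE_DIRS = [
    Path.home() / ".cache" / "huggingface",
    Path.home() / "models",
    Path("./models"),
]

def discover_gguf_models(dirs=CANDIDATE_DIRS):
    """Return every *.gguf file found under the candidate directories."""
    found = []
    for d in dirs:
        if d.is_dir():  # silently skip directories that do not exist
            found.extend(sorted(d.rglob("*.gguf")))
    return found
```

Discovery like this is what lets the server start with zero configuration: any model dropped into a known location becomes available without edits to a config file.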

Quick Start & Requirements:

  • Installation: Install via cargo install shimmy (Linux, macOS, Windows). Windows users should prefer cargo install over the pre-built binary, which can trigger Defender false positives. On macOS, run brew install cmake rust before cargo install.
  • Prerequisites: Rust toolchain (plus CMake on macOS). Metal GPU acceleration is supported on macOS.
  • Setup Time: Claimed 30 seconds.
  • Links: GitHub Releases: https://github.com/Michael-A-Kuykendall/shimmy/releases. Sponsorship: https://github.com/sponsors/Michael-A-Kuykendall.

Highlighted Details:

  • Resource Efficiency: A 5.1MB binary, <100ms startup, and <50MB memory usage, a far smaller footprint than Ollama or a full llama.cpp deployment.
  • OpenAI API Compatibility: 100% compatible for seamless integration with tools like VSCode, Cursor, and Continue.dev.
  • LoRA Support: First-class support for LoRA adapters, enabling rapid integration from training to production.
  • Auto-Discovery & Port Management: Automatically finds models and allocates ports, simplifying setup.
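Because Shimmy exposes an OpenAI-compatible HTTP API, standard chat-completions client code can talk to it unchanged. A minimal sketch using only the Python standard library; the port and model name are assumptions (Shimmy allocates ports dynamically), so check the server's startup log for the real values:

```python
# Minimal sketch: calling a local Shimmy server through its
# OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example usage, assuming a Shimmy instance on port 11435 (hypothetical):
# reply = chat("http://localhost:11435", "phi-3-mini", "Hello!")
```

The same base-URL swap is how editors like VSCode, Cursor, and Continue.dev can point at Shimmy instead of a hosted OpenAI endpoint.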

Maintenance & Community: Primarily maintained by Michael A. Kuykendall, who seeks sponsorship via GitHub Sponsors. Community interaction occurs through GitHub Issues and Discussions.

Licensing & Compatibility: Licensed under the permissive MIT License, ensuring free, perpetual use and compatibility with commercial/closed-source applications. Verified compatibility includes Intel/Apple Silicon Macs with Metal GPU acceleration.

Limitations & Caveats: The pre-built Windows binary may trigger Defender false positives. npm (shimmy-js) and Python (pip install shimmy) integrations are "coming soon." The project has a bus factor of one, as it depends on a single maintainer.

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 4
Issues (30d): 33
Star History: 2,429 stars in the last 21 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

Top 0.3% on SourcePulse, 4k stars
AI inference pipeline framework
Created 1 year ago, updated 2 days ago