Local LLM serving with hardware acceleration
Lemonade is a local LLM serving solution designed for ease of use and high performance on consumer hardware, targeting developers and power users who want to run LLMs efficiently on their PCs. It leverages Vulkan GPU and NPU acceleration to maximize inference speed and responsiveness, and offers a built-in chat interface and an OpenAI-compatible API for seamless integration with existing applications.
How It Works
Lemonade utilizes a multi-engine approach, supporting llama.cpp, Hugging Face, and ONNX runtimes. Its core advantage lies in its ability to harness specialized hardware like AMD Ryzen AI NPUs and Vulkan-compatible GPUs, providing significant performance gains over CPU-only inference. The server architecture allows for easy switching between different model formats (GGUF, ONNX) and hardware acceleration backends at runtime.
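Because the server follows OpenAI API conventions, you can inspect which models (and therefore which formats and backends) are available at runtime. A minimal sketch, assuming the standard OpenAI-compatible /models listing endpoint and the default local port shown later in this page:

```python
import requests

BASE_URL = "http://localhost:8000/api/v1"  # default local endpoint (see Highlighted Details)

# Query the standard OpenAI-compatible model listing endpoint to see
# which models (GGUF, ONNX, etc.) the server can currently route requests to.
resp = requests.get(f"{BASE_URL}/models")
resp.raise_for_status()

for model in resp.json().get("data", []):
    # The "id" field is the name to pass as `model` in completion requests.
    print(model["id"])
```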
Quick Start & Requirements
Install with pip install lemonade-server, or install from source. Run a model with lemonade-server run <model_name> (e.g., Gemma-3-4b-it-GGUF).
Highlighted Details
OpenAI-compatible API served at http://localhost:8000/api/v1.
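Since the API is OpenAI-compatible, existing client libraries should work by pointing them at the local base URL. A minimal sketch using the official openai Python client; the api_key value is a placeholder, and whether the local server checks it is an assumption:

```python
from openai import OpenAI

# Point the client at the local Lemonade server instead of api.openai.com.
# The api_key is a placeholder; a local server typically ignores it (assumption).
client = OpenAI(base_url="http://localhost:8000/api/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="Gemma-3-4b-it-GGUF",  # model name from the quick start above
    messages=[{"role": "user", "content": "Summarize what an NPU is in one sentence."}],
)
print(response.choices[0].message.content)
```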
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The primary focus on AMD hardware, particularly Ryzen AI NPUs and specific Radeon GPUs, may limit performance or compatibility on non-AMD or older hardware. While Vulkan support is broad, optimal NPU acceleration is tied to newer AMD chipsets.