lemonade by lemonade-sdk

Local LLM serving with hardware acceleration

Created 4 months ago
1,271 stars

Top 31.2% on SourcePulse

Project Summary

Lemonade is a local LLM serving solution built for ease of use and high performance on consumer hardware, targeting developers and power users who want to run LLMs efficiently on their own PCs. It leverages Vulkan GPU and NPU acceleration to maximize inference speed and responsiveness, and offers a built-in chat interface and an OpenAI-compatible API for integration with existing applications.

How It Works

Lemonade takes a multi-engine approach, supporting llama.cpp, Hugging Face transformers, and ONNX Runtime. Its core advantage is its ability to harness specialized hardware such as AMD Ryzen AI NPUs and Vulkan-compatible GPUs, providing significant performance gains over CPU-only inference. The server architecture allows switching between model formats (GGUF, ONNX) and hardware acceleration backends at runtime.
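
To make the request flow concrete, here is a minimal sketch of a chat-completion call against a running server. The /chat/completions route, the default port, and the way the model name implicitly selects the format/backend are assumptions inferred from the OpenAI-compatible base URL listed under Highlighted Details and the GGUF model name shown in the Quick Start section.

```python
# Minimal sketch (assumptions: server running locally on the default port,
# and an OpenAI-style /chat/completions route under the documented base URL).
import requests

BASE_URL = "http://localhost:8000/api/v1"

def chat(model: str, prompt: str) -> str:
    # The model name (here a GGUF build) determines which format/backend
    # the server uses; an ONNX model name would route to the ONNX engine.
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Gemma-3-4b-it-GGUF", "In one sentence, what is Vulkan?"))
```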

Quick Start & Requirements

  • Install: GUI installer (Windows only), pip install lemonade-server, or from source.
  • Prerequisites: a Vulkan-compatible GPU (focus on AMD Ryzen AI 7000/8000/300 and Radeon 7000/9000 series), or an AMD Ryzen AI 300 series NPU.
  • Usage: lemonade-server run <model_name> (e.g., Gemma-3-4b-it-GGUF); once running, any OpenAI-compatible client can connect, as shown in the sketch after this list.
  • Documentation: https://lemonade-server.ai/
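
Once the server is up (e.g., via lemonade-server run Gemma-3-4b-it-GGUF), any OpenAI client library should be able to point at it. Below is a minimal sketch using the official openai Python package; the placeholder API key is an assumption, on the expectation that a local server does not validate it.

```python
# Minimal sketch using the `openai` Python client against the local server.
# Assumptions: default port 8000 and a placeholder API key that the local
# server is expected to ignore.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="not-needed",  # placeholder for a local, keyless server
)

completion = client.chat.completions.create(
    model="Gemma-3-4b-it-GGUF",
    messages=[{"role": "user", "content": "Explain what an NPU is in two sentences."}],
)
print(completion.choices[0].message.content)
```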

Highlighted Details

  • Supports GGUF and ONNX model formats.
  • OpenAI-compatible API endpoint (http://localhost:8000/api/v1); see the model-listing sketch after this list.
  • Includes a Python API and a CLI for model management, benchmarking, and profiling.
  • Actively seeking community contributions and app integrations.
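
As a small illustration of model management through the OpenAI-compatible surface only (the native Python API and CLI are documented at the link above and not reproduced here), the standard models route can be queried. The route and the OpenAI-style list schema are assumptions that follow from the server's stated OpenAI compatibility.

```python
# Minimal sketch: list models exposed by the server via the OpenAI-style
# /models route (assumed to exist as part of OpenAI compatibility).
import requests

resp = requests.get("http://localhost:8000/api/v1/models", timeout=30)
resp.raise_for_status()
for entry in resp.json().get("data", []):
    print(entry.get("id"))
```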

Maintenance & Community

  • Sponsored by AMD.
  • Maintained by a team of four core contributors.
  • Community support via Discord and GitHub issues.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Works with OpenAI-compatible client libraries across languages; the Apache 2.0 license permits use in commercial and closed-source applications.

Limitations & Caveats

The primary focus on AMD hardware, particularly Ryzen AI NPUs and specific Radeon GPUs, may limit performance or compatibility on non-AMD or older hardware. While Vulkan support is broad, optimal NPU acceleration is tied to newer AMD chipsets.

Health Check

  • Last commit: 22 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 72
  • Issues (30d): 82
  • Star history: 589 stars in the last 30 days
