lemonade by lemonade-sdk

Local LLM serving with hardware acceleration

created 2 months ago
332 stars

Top 83.7% on sourcepulse

Project Summary

Lemonade is a local LLM serving solution designed for ease of use and high performance on consumer hardware. It targets developers and power users who want to run LLMs efficiently on their PCs, leveraging Vulkan GPU and NPU acceleration to maximize inference speed and responsiveness, and it offers a built-in chat interface plus an OpenAI-compatible API for seamless integration with existing applications.

How It Works

Lemonade utilizes a multi-engine approach, supporting llama.cpp, Hugging Face, and ONNX runtimes. Its core advantage lies in its ability to harness specialized hardware like AMD Ryzen AI NPUs and Vulkan-compatible GPUs, providing significant performance gains over CPU-only inference. The server architecture allows for easy switching between different model formats (GGUF, ONNX) and hardware acceleration backends at runtime.

Quick Start & Requirements

  • Install: GUI installer (Windows only), pip install lemonade-server, or from source.
  • Prerequisites: Vulkan-compatible GPU (focus on AMD Ryzen AI 7000/8000/300, Radeon 7000/9000 series), or AMD Ryzen AI 300 series NPU.
  • Usage: lemonade-server run <model_name> (e.g., Gemma-3-4b-it-GGUF).
  • Documentation: https://lemonade-server.ai/
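
The quick-start steps above can be sketched as a short shell session (a sketch based on the commands listed here; the model name is the example from the Usage bullet, and installer options beyond pip are not shown):

```shell
# Install the server from PyPI (alternative to the Windows GUI installer)
pip install lemonade-server

# Download and serve a model; Gemma-3-4b-it-GGUF is the example model
# name from the Usage bullet above
lemonade-server run Gemma-3-4b-it-GGUF

# The OpenAI-compatible API is then served at http://localhost:8000/api/v1
```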

Highlighted Details

  • Supports GGUF and ONNX model formats.
  • OpenAI-compatible API endpoint (http://localhost:8000/api/v1).
  • Includes a Python API and a CLI for model management, benchmarking, and profiling.
  • Actively seeking community contributions and app integrations.
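
Because the server exposes an OpenAI-compatible endpoint, a standard chat-completions request shape should work against it. A minimal standard-library sketch (the base URL is the endpoint listed above; the model name is the Quick Start example; the request is only constructed here, since sending it assumes a running local server):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # endpoint from the docs above

# Standard OpenAI-style chat-completions payload
payload = {
    "model": "Gemma-3-4b-it-GGUF",  # example model from Quick Start
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."}
    ],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With lemonade-server running locally, send it like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client library can be pointed at the same base URL instead of hand-building requests.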

Maintenance & Community

  • Sponsored by AMD.
  • Maintained by a team of four core contributors.
  • Community support via Discord and GitHub issues.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • The OpenAI-compatible API works with client libraries across various languages; the Apache 2.0 license permits commercial and closed-source applications.

Limitations & Caveats

The primary focus on AMD hardware, particularly Ryzen AI NPUs and specific Radeon GPUs, may limit performance or compatibility on non-AMD or older hardware. While Vulkan support is broad, optimal NPU acceleration is tied to newer AMD chipsets.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 19
  • Issues (30d): 47
  • Star History: 351 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

1.6% · 3k stars
GPU cluster manager for AI model deployment
created 1 year ago · updated 2 days ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

serve by pytorch

0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
created 5 years ago · updated 3 weeks ago