FastFlowLM by FastFlowLM

LLM inference optimized for AMD Ryzen™ AI NPUs

Created 4 months ago
329 stars

Top 83.0% on SourcePulse

Project Summary

FastFlowLM (FLM) provides a purpose-built, NPU-first runtime for executing large language models (LLMs) and vision models on AMD Ryzen™ AI NPUs. Targeting users seeking efficient, local AI inference without relying on discrete GPUs, FLM offers a significantly more power-efficient and faster alternative to traditional CPU/GPU-based solutions, enabling LLM deployment on consumer hardware.

How It Works

FLM leverages AMD's XDNA2 NPU architecture (found in Strix, Strix Halo, and Kraken series chips) for accelerated inference. It is an NPU-first runtime that mirrors Ollama's user-friendly command-line experience while being tuned specifically for NPU performance. This removes the need for model rewrites or low-level tuning, letting users run models directly with better speed and power efficiency.
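
As a rough illustration of that Ollama-style workflow, the minimal sketch below drives FLM from Python through its CLI. The executable name flm comes from the installer name; the run subcommand and the model tag llama3.2:1b mirror Ollama's conventions and are assumptions to verify against the FLM docs.

    # Minimal sketch: invoke the FLM CLI from Python.
    # "run <model>" mirrors Ollama's CLI and is an assumption, not confirmed syntax.
    import subprocess

    result = subprocess.run(
        ["flm", "run", "llama3.2:1b"],           # assumed Ollama-style subcommand and tag
        input="Summarize what an NPU does.",     # prompt passed on stdin
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)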

Quick Start & Requirements

  • Installation: A packaged FLM Windows installer (flm-setup.exe) is available; a post-install sanity check is sketched after this list.
  • Prerequisites: An AMD Ryzen™ AI series chip with an XDNA2 NPU and NPU driver version 32.0.203.258 or later. Internet access is needed to download optimized model kernels from Hugging Face.
  • Resource Footprint: Ultra-lightweight runtime (14 MB) that installs in under 20 seconds.
  • Links: Download, Docs, Discord.
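
After installation, a quick check like the following (a sketch, not official tooling) confirms the flm binary is reachable; the --version flag follows common CLI convention and is an assumption.

    # Post-install sanity check (sketch). Assumes flm-setup.exe put "flm" on PATH.
    import shutil
    import subprocess

    flm_path = shutil.which("flm")
    if flm_path is None:
        raise SystemExit("flm not found on PATH; re-run flm-setup.exe")

    # "--version" is an assumed flag modeled on common CLI conventions.
    subprocess.run([flm_path, "--version"], check=True)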

Highlighted Details

  • Runs exclusively on AMD Ryzen™ AI NPUs, placing no load on the GPU or CPU.
  • Supports context lengths up to 256k tokens.
  • Offers both CLI and server modes (REST and OpenAI-compatible APIs); see the client sketch after this list.
  • Achieves over 10x better power efficiency than CPU/GPU-based alternatives.
  • Supports Vision and Mixture-of-Experts (MoE) models.
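
Because the server mode is OpenAI API compatible, a standard OpenAI client should be able to talk to it. The sketch below uses the official openai Python package; the base URL, port, and model tag are assumptions to check against the FLM docs (11434 is Ollama's conventional port).

    # Sketch: chat against FLM's OpenAI-compatible server with the openai client.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # assumed local endpoint (Ollama-style port)
        api_key="unused",                      # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="llama3.2:1b",                   # assumed model tag
        messages=[{"role": "user", "content": "Hello from the NPU!"}],
    )
    print(resp.choices[0].message.content)

Any other OpenAI-compatible client (curl, LangChain, and the like) should work the same way against the REST endpoint.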

Maintenance & Community

FLM has been integrated into AMD's Lemonade Server. Community support and feedback run through the project's Discord channel and GitHub issues.

Licensing & Compatibility

The orchestration code and CLI tools are open-source under the MIT License. However, the NPU-accelerated kernels are proprietary binaries, provided free for non-commercial use only. Commercial use requires contacting info@fastflowlm.com for licensing. Non-commercial users must acknowledge FastFlowLM in their projects.

Limitations & Caveats

The core NPU-accelerated kernels are proprietary and restricted to non-commercial use. The runtime requires an AMD Ryzen™ AI chip with an XDNA2 NPU and a minimum driver version (32.0.203.258), so it will not run on older hardware or on systems without a supported NPU.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 32
  • Issues (30d): 8
  • Star History: 136 stars in the last 30 days

Explore Similar Projects

fastllm by ztxz16

  • High-performance C++ LLM inference library
  • Top 0.8% on SourcePulse; 4k stars
  • Created 2 years ago; updated 2 weeks ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang)