FastFlowLM by FastFlowLM

LLM inference optimized for AMD Ryzen™ AI NPUs

Created 4 months ago
329 stars

Top 83.0% on SourcePulse

Project Summary

FastFlowLM (FLM) provides a purpose-built, NPU-first runtime for executing large language models (LLMs) and vision models on AMD Ryzen™ AI NPUs. Targeting users seeking efficient, local AI inference without relying on discrete GPUs, FLM offers a significantly more power-efficient and faster alternative to traditional CPU/GPU-based solutions, enabling LLM deployment on consumer hardware.

How It Works

FLM leverages AMD's XDNA2 NPU architecture (found in Strix, Strix Halo, and Kraken series chips) for accelerated inference. It is an NPU-first runtime that mirrors Ollama's user-friendly command-line experience while being tuned specifically for NPU performance. This removes the need for model rewrites or low-level tuning, letting users run models directly with better speed and power efficiency.
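
As a rough illustration of that Ollama-style workflow, the minimal sketch below drives FLM from Python through its CLI. The executable name flm comes from the installer name; the run subcommand and the model tag llama3.2:1b mirror Ollama's conventions and are assumptions to verify against the FLM docs.

    # Minimal sketch: invoke the FLM CLI from Python.
    # "run <model>" mirrors Ollama's CLI and is an assumption, not confirmed syntax.
    import subprocess

    result = subprocess.run(
        ["flm", "run", "llama3.2:1b"],           # assumed Ollama-style subcommand and tag
        input="Summarize what an NPU does.",     # prompt passed on stdin
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)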

Quick Start & Requirements

  • Installation: A packaged FLM Windows installer (flm-setup.exe) is available; a post-install sanity check is sketched after this list.
  • Prerequisites: An AMD Ryzen™ AI series chip with an XDNA2 NPU and NPU driver version 32.0.203.258 or later. Internet access is needed to download optimized model kernels from Hugging Face.
  • Resource Footprint: Ultra-lightweight runtime (14 MB) that installs in under 20 seconds.
  • Links: Download, Docs, Discord.
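
After installation, a quick check like the following (a sketch, not official tooling) confirms the flm binary is reachable; the --version flag follows common CLI convention and is an assumption.

    # Post-install sanity check (sketch). Assumes flm-setup.exe put "flm" on PATH.
    import shutil
    import subprocess

    flm_path = shutil.which("flm")
    if flm_path is None:
        raise SystemExit("flm not found on PATH; re-run flm-setup.exe")

    # "--version" is an assumed flag modeled on common CLI conventions.
    subprocess.run([flm_path, "--version"], check=True)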

Highlighted Details

  • Runs exclusively on AMD Ryzen™ AI NPUs, placing no load on the GPU or CPU.
  • Supports context lengths up to 256k tokens.
  • Offers both CLI and server modes (REST and OpenAI-compatible APIs); see the client sketch after this list.
  • Achieves over 10x better power efficiency than CPU/GPU-based alternatives.
  • Supports Vision and Mixture-of-Experts (MoE) models.
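
Because the server mode is OpenAI API compatible, a standard OpenAI client should be able to talk to it. The sketch below uses the official openai Python package; the base URL, port, and model tag are assumptions to check against the FLM docs (11434 is Ollama's conventional port).

    # Sketch: chat against FLM's OpenAI-compatible server with the openai client.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:11434/v1",  # assumed local endpoint (Ollama-style port)
        api_key="unused",                      # local servers typically ignore the key
    )

    resp = client.chat.completions.create(
        model="llama3.2:1b",                   # assumed model tag
        messages=[{"role": "user", "content": "Hello from the NPU!"}],
    )
    print(resp.choices[0].message.content)

Any other OpenAI-compatible client (curl, LangChain, and the like) should work the same way against the REST endpoint.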

Maintenance & Community

FLM has been integrated into AMD's Lemonade Server. Community support and feedback run through the project's Discord channel and GitHub issues.

Licensing & Compatibility

The orchestration code and CLI tools are open-source under the MIT License. However, the NPU-accelerated kernels are proprietary binaries, provided free for non-commercial use only. Commercial use requires contacting info@fastflowlm.com for licensing. Non-commercial users must acknowledge FastFlowLM in their projects.

Limitations & Caveats

The core NPU-accelerated kernels are proprietary and restricted to non-commercial use. The runtime requires an AMD Ryzen™ AI chip with an XDNA2 NPU and a minimum driver version (32.0.203.258), so it will not run on older hardware or on systems without a supported NPU.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 32
  • Issues (30d): 8
  • Star History: 136 stars in the last 30 days

Explore Similar Projects

fastllm by ztxz16

  • High-performance C++ LLM inference library
  • Top 0.8% on SourcePulse; 4k stars
  • Created 2 years ago; updated 2 weeks ago
  • Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang)