llamacpp-rocm by lemonade-sdk

GPU-accelerated LLM inference for AMD hardware

Created 8 months ago
333 stars

Top 82.5% on SourcePulse

Project Summary

This project provides fresh builds of llama.cpp with AMD ROCm™ 7 acceleration, targeting developers and users of AI applications who need high-performance inference on compatible AMD hardware. It delivers cutting-edge, pre-compiled binaries with integrated ROCm runtimes, simplifying GPU-acceleration setup for platforms such as Lemonade and similar AI applications.

How It Works

This project builds on the efficient llama.cpp inference engine and integrates AMD's ROCm™ 7 platform for GPU acceleration. An automated GitHub Actions workflow compiles llama.cpp for Windows and Ubuntu, specifically targeting recent AMD GPU architectures (RDNA3, RDNA4, Ryzen AI). Each build bundles the complete ROCm™ 7 runtime libraries, enabling a "just download and go" experience with no separate ROCm installation required.
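As a rough illustration of the kind of CI described above, a nightly matrix build might look like the sketch below. This is a hypothetical fragment, not the project's actual workflow file: the job names, cron schedule, and packaging step are assumptions; the cmake flags (GGML_HIP, AMDGPU_TARGETS) are the standard ones for llama.cpp's ROCm/HIP backend.

```yaml
# Hypothetical sketch of a nightly ROCm build matrix; not the project's
# real workflow. The bundle step name is an assumption.
name: nightly-rocm-build
on:
  schedule:
    - cron: "0 6 * * *"   # nightly
jobs:
  build:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest]
        gpu_target: [gfx1151, gfx1150, gfx120X, gfx110X]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build llama.cpp with the ROCm/HIP backend
        run: |
          cmake -B build -DGGML_HIP=ON \
                -DAMDGPU_TARGETS=${{ matrix.gpu_target }} \
                -DCMAKE_BUILD_TYPE=Release
          cmake --build build --config Release
      - name: Bundle ROCm runtime libraries into the release archive
        run: ./scripts/bundle_rocm.sh   # hypothetical packaging step
```

Pinning one AMDGPU_TARGETS value per matrix entry is what yields the per-architecture release archives (gfx1151, gfx1150, gfx120X, gfx110X) mentioned in the quick-start section.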

Quick Start & Requirements

Download the pre-built release matching your GPU target from the project's releases page, extract the archive, and test inference with a GGUF model using: llama-server -m YOUR_GGUF_MODEL_PATH -ngl 99 (the -ngl 99 flag offloads all model layers to the GPU). The project targets specific AMD GPU architectures (gfx1151, gfx1150, gfx120X, gfx110X) and supports Windows and Ubuntu. Core dependencies include llama.cpp and the ROCm SDK, both bundled. Manual build instructions are available at docs/manual_instructions.md.
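The steps above can be sketched as a short session. The paths and port are placeholders, and this assumes the release archive has already been extracted into the current directory; llama-server's OpenAI-compatible /v1/chat/completions endpoint is part of upstream llama.cpp.

```shell
# Run the bundled server with all layers offloaded to the GPU.
# ./model.gguf is a placeholder for your actual GGUF model path.
./llama-server -m ./model.gguf -ngl 99 --port 8080 &

# llama-server exposes an OpenAI-compatible HTTP API:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

If the server starts but inference falls back to CPU, the archive likely does not match your GPU architecture; pick the release for your gfx target.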

Highlighted Details

  • Automated nightly builds of llama.cpp with ROCm™ 7 acceleration.
  • Broad support for recent AMD GPUs including RDNA3, RDNA4, and Ryzen AI APUs.
  • ROCm™ 7 runtime libraries are bundled directly into releases.
  • Targets seamless integration with AI applications like Lemonade.

Maintenance & Community

The project is under active development, with a stated goal to contribute to the llama.cpp+ROCm ecosystem. While not set up for comprehensive technical support, collaborations, idea exchanges, and contributions are welcomed. Specific community channels or roadmaps are not detailed in the README.

Licensing & Compatibility

This project is licensed under the permissive MIT License, which generally allows commercial use and integration into closed-source applications, requiring only that the copyright and license notice be preserved.

Limitations & Caveats

The project is under active development, meaning code and artifact structures are subject to change. Comprehensive technical support is not provided. Specific AMD GPU architectures are targeted, and users may need to configure kernel parameters (e.g., ttm.pages_limit) for optimal performance on certain APU models under Linux.
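For the kernel-parameter caveat above, ttm.pages_limit is a real parameter of the Linux TTM memory manager that caps how much system RAM the GPU driver may pin, which matters for APUs that serve large models from shared memory. The sketch below is a hypothetical configuration; the numeric value is illustrative (pages are 4 KiB each), not a project recommendation.

```
# Option 1: kernel boot parameter (e.g. via the bootloader config):
#   ttm.pages_limit=27648000
#
# Option 2: persist via modprobe, e.g. in /etc/modprobe.d/ttm.conf:
options ttm pages_limit=27648000
```

After changing either setting, a reboot (or module reload) is needed for it to take effect.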

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 6
Issues (30d): 7
Star History: 99 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

Top 0.1% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago · Updated 2 days ago