llamacpp-rocm by lemonade-sdk

GPU-accelerated LLM inference for AMD hardware

Created 8 months ago
333 stars

Top 82.5% on SourcePulse

Project Summary

This project provides fresh builds of llama.cpp with AMD ROCm™ 7 acceleration, targeting developers and users of AI applications who need high-performance inference on compatible AMD hardware. It delivers cutting-edge, pre-compiled binaries with integrated ROCm runtimes, simplifying GPU-acceleration setup for platforms such as Lemonade and similar AI applications.

How It Works

This project builds on the efficient llama.cpp inference engine and integrates AMD's ROCm™ 7 platform for GPU acceleration. An automated GitHub Actions workflow compiles llama.cpp for Windows and Ubuntu, specifically targeting recent AMD GPU architectures (RDNA3, RDNA4, Ryzen AI). Each build bundles the complete ROCm™ 7 runtime libraries, enabling a "just download and go" experience with no separate ROCm installation required.
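As a rough illustration of the kind of CI described above, a nightly matrix build might look like the sketch below. This is a hypothetical fragment, not the project's actual workflow file: the job names, cron schedule, and packaging step are assumptions; the cmake flags (GGML_HIP, AMDGPU_TARGETS) are the standard ones for llama.cpp's ROCm/HIP backend.

```yaml
# Hypothetical sketch of a nightly ROCm build matrix; not the project's
# real workflow. The bundle step name is an assumption.
name: nightly-rocm-build
on:
  schedule:
    - cron: "0 6 * * *"   # nightly
jobs:
  build:
    strategy:
      matrix:
        os: [windows-latest, ubuntu-latest]
        gpu_target: [gfx1151, gfx1150, gfx120X, gfx110X]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build llama.cpp with the ROCm/HIP backend
        run: |
          cmake -B build -DGGML_HIP=ON \
                -DAMDGPU_TARGETS=${{ matrix.gpu_target }} \
                -DCMAKE_BUILD_TYPE=Release
          cmake --build build --config Release
      - name: Bundle ROCm runtime libraries into the release archive
        run: ./scripts/bundle_rocm.sh   # hypothetical packaging step
```

Pinning one AMDGPU_TARGETS value per matrix entry is what yields the per-architecture release archives (gfx1151, gfx1150, gfx120X, gfx110X) mentioned in the quick-start section.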

Quick Start & Requirements

Download the pre-built release matching your GPU target from the project's releases page, extract the archive, and test inference with a GGUF model using: llama-server -m YOUR_GGUF_MODEL_PATH -ngl 99 (the -ngl 99 flag offloads all model layers to the GPU). The project targets specific AMD GPU architectures (gfx1151, gfx1150, gfx120X, gfx110X) and supports Windows and Ubuntu. Core dependencies include llama.cpp and the ROCm SDK, both bundled. Manual build instructions are available at docs/manual_instructions.md.
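The steps above can be sketched as a short session. The paths and port are placeholders, and this assumes the release archive has already been extracted into the current directory; llama-server's OpenAI-compatible /v1/chat/completions endpoint is part of upstream llama.cpp.

```shell
# Run the bundled server with all layers offloaded to the GPU.
# ./model.gguf is a placeholder for your actual GGUF model path.
./llama-server -m ./model.gguf -ngl 99 --port 8080 &

# llama-server exposes an OpenAI-compatible HTTP API:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello"}]}'
```

If the server starts but inference falls back to CPU, the archive likely does not match your GPU architecture; pick the release for your gfx target.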

Highlighted Details

  • Automated nightly builds of llama.cpp with ROCm™ 7 acceleration.
  • Broad support for recent AMD GPUs including RDNA3, RDNA4, and Ryzen AI APUs.
  • ROCm™ 7 runtime libraries are bundled directly into releases.
  • Targets seamless integration with AI applications like Lemonade.

Maintenance & Community

The project is under active development, with a stated goal to contribute to the llama.cpp+ROCm ecosystem. While not set up for comprehensive technical support, collaborations, idea exchanges, and contributions are welcomed. Specific community channels or roadmaps are not detailed in the README.

Licensing & Compatibility

This project is licensed under the permissive MIT License, which generally allows commercial use and integration into closed-source applications, requiring only that the copyright and license notice be preserved.

Limitations & Caveats

The project is under active development, meaning code and artifact structures are subject to change. Comprehensive technical support is not provided. Specific AMD GPU architectures are targeted, and users may need to configure kernel parameters (e.g., ttm.pages_limit) for optimal performance on certain APU models under Linux.
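For the kernel-parameter caveat above, ttm.pages_limit is a real parameter of the Linux TTM memory manager that caps how much system RAM the GPU driver may pin, which matters for APUs that serve large models from shared memory. The sketch below is a hypothetical configuration; the numeric value is illustrative (pages are 4 KiB each), not a project recommendation.

```
# Option 1: kernel boot parameter (e.g. via the bootloader config):
#   ttm.pages_limit=27648000
#
# Option 2: persist via modprobe, e.g. in /etc/modprobe.d/ttm.conf:
options ttm pages_limit=27648000
```

After changing either setting, a reboot (or module reload) is needed for it to take effect.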

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 6
Issues (30d): 7
Star History: 99 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

Top 0.1% · 4k stars
High-performance C++ LLM inference library
Created 2 years ago · Updated 2 days ago