amd-strix-halo-toolboxes by kyuz0

LLM inference toolboxes for AMD Ryzen AI Max

Created 5 months ago
747 stars

Top 46.6% on SourcePulse

Project Summary

Summary

This project provides pre-built containerized environments ("toolboxes") for running Large Language Models (LLMs) on AMD Ryzen AI Max “Strix Halo” integrated GPUs. It targets engineers and power users seeking a reproducible, flexible way to leverage AMD hardware for LLM inference using Llama.cpp across various compute backends.

How It Works

The project packages Llama.cpp inside Toolbx containers for isolated LLM inference. It supports multiple AMD backends: Vulkan (via the RADV or AMDVLK drivers) and ROCm. This lets users choose among stability, performance, and newer ROCm features while keeping the host system clean. The containers are updated automatically to track upstream Llama.cpp changes.
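As a rough sketch of what inference inside one of these toolboxes looks like, the snippet below invokes Llama.cpp's CLI with GPU offload. The model path is a hypothetical placeholder, and `-ngl 99` offloads all layers to the integrated GPU through whichever backend (Vulkan or ROCm) the container was built with:

```shell
# Illustrative only: run Llama.cpp's CLI from inside an entered toolbox.
MODEL="$HOME/models/model.gguf"   # hypothetical model file path

if command -v llama-cli >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    # -m: model file, -ngl: layers offloaded to GPU, -n: tokens to generate
    llama-cli -m "$MODEL" -ngl 99 -p "Hello" -n 32
else
    echo "llama-cli not found or model missing; run this inside a toolbox"
fi
```

The same command works in any of the backend variants, since each container ships a Llama.cpp build compiled against its respective backend.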

Quick Start & Requirements

  • Installation: Create toolboxes with `toolbox create`, pointing at one of the published container images (e.g., `docker.io/kyuz0/…`)
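A minimal sketch of that installation step, assuming the Toolbx CLI is installed; the image tag below is a placeholder (the actual backend-specific tags are listed in the repository's README), and the toolbox name is hypothetical:

```shell
# Sketch: create and enter a Strix Halo LLM toolbox.
IMAGE="docker.io/kyuz0/<backend-image>"   # placeholder: pick a real tag from the README
NAME="llm-vulkan"                         # hypothetical toolbox name

# Only attempt creation when the Toolbx CLI is available.
if command -v toolbox >/dev/null 2>&1; then
    toolbox create --image "$IMAGE" "$NAME" || echo "create failed (placeholder image)"
    echo "enter it with: toolbox enter $NAME"
else
    echo "Toolbx CLI not installed"
fi
```

Once entered, the container provides the Llama.cpp binaries built for the chosen backend without touching the host system.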
Health Check
Last Commit

14 hours ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
8
Star History
116 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Johannes Hagemann (Cofounder of Prime Intellect), and 4 more.

S-LoRA by S-LoRA

0.1%
2k
System for scalable LoRA adapter serving
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

0.1%
4k
High-performance C++ LLM inference library
Created 2 years ago
Updated 1 month ago