zinc by zolotukhin

Efficient LLM inference for AMD GPUs and Apple Silicon

Created 2 weeks ago

297 stars

Top 89.4% on SourcePulse

Project Summary

ZINC addresses the challenge of running local Large Language Models (LLMs) efficiently on consumer AMD GPUs and Apple Silicon, platforms often underserved by existing inference engines. It provides a single, self-contained binary solution for users seeking high performance without complex dependencies like ROCm or Python. The project targets engineers and power users who want to leverage their hardware for LLM inference.

How It Works

This project is built entirely in Zig, compiling to a single, dependency-free binary. It leverages Vulkan for AMD GPUs on Linux and Metal for Apple Silicon on macOS, employing hand-tuned shaders specifically optimized for each architecture's strengths. ZINC automatically selects the correct backend at build time, offering an OpenAI-compatible API and an integrated browser-based chat UI for ease of use.

Quick Start & Requirements

  • Build: requires Zig 0.15.2+ and the Vulkan SDK; compile with zig build -Doptimize=ReleaseFast.
  • Preflight: verify the environment with ./zig-out/bin/zinc --check.
  • Model management: list, pull, use, rm.
  • Inference: run ./zig-out/bin/zinc --model-id <model_id> --prompt "...", or launch the chat UI with ./zig-out/bin/zinc chat.
  • Prerequisites: Zig 0.15.2+, Vulkan loader/tools (Linux), glslc (Linux), Bun (for tests/docs).
  • Links: Zig downloads: ziglang.org/download.
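For scripted use, the one-shot inference command above can be wrapped from Python. This is an illustrative sketch only: the helper names are ours, and the binary path assumes the default zig-out/ build layout from the README.

```python
import os
import subprocess

ZINC = "./zig-out/bin/zinc"  # default build output path per the README

def build_args(model_id, prompt, binary=ZINC):
    """Compose the one-shot inference command line from the quick start."""
    return [binary, "--model-id", model_id, "--prompt", prompt]

def run_prompt(model_id, prompt):
    """Run one-shot inference; requires a built zinc binary at ZINC."""
    if not os.path.exists(ZINC):
        raise FileNotFoundError("build zinc first: zig build -Doptimize=ReleaseFast")
    result = subprocess.run(
        build_args(model_id, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

build_args only assembles the argv list, so it can be exercised without a built binary; run_prompt does the actual subprocess call.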

Highlighted Details

  • Achieves ~38 tokens/sec with a Qwen3.5-35B-A3B-UD model on AMD RDNA4 (32 GB) using a ReleaseFast build.
  • Supports AMD RDNA3/RDNA4 GPUs on Linux and Apple Silicon (M1-M5) on macOS.
  • Eliminates dependencies on ROCm, CUDA, and Python, offering a single binary solution.
  • Provides an OpenAI-compatible API endpoint (/v1) and a built-in browser chat interface.
  • Focuses on a curated list of GGUF models, including Qwen3.5 variants.
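Because the /v1 endpoint is OpenAI-compatible, any OpenAI-style client should be able to talk to it. Below is a minimal sketch using only the Python standard library; the host, port, and the /chat/completions path are assumptions based on the OpenAI API shape, not values confirmed by the README:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completion request for a local ZINC server."""
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(base_url: str, model_id: str, prompt: str) -> str:
    """Send the request and return the assistant's reply; needs a running server."""
    with urllib.request.urlopen(build_chat_request(base_url, model_id, prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With a server running, chat("http://localhost:8080", "<model_id>", "Hello") would return the model's reply; localhost:8080 is a placeholder for whatever address the server actually binds.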

Maintenance & Community

Recent validation snapshots (e.g., 2026-03-31) indicate active development. The README mentions CONTRIBUTING.md and a Code of Conduct, suggesting established development practices, though explicit community links (Discord/Slack) are not provided.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license is highly permissive, allowing for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The project is explicitly marked as "Still Rough." Key features like continuous batching and multi-tenant serving are still under development. Performance tuning for Apple Silicon is ongoing, with the RDNA4 path being more mature. The list of supported GGUF models is intentionally narrow.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 298 stars in the last 18 days
