zinc by zolotukhin

Efficient LLM inference for AMD GPUs and Apple Silicon

Created 2 weeks ago

297 stars

Top 89.4% on SourcePulse

Project Summary

ZINC addresses the challenge of running local Large Language Models (LLMs) efficiently on consumer AMD GPUs and Apple Silicon, platforms often underserved by existing inference engines. It provides a single, self-contained binary solution for users seeking high performance without complex dependencies like ROCm or Python. The project targets engineers and power users who want to leverage their hardware for LLM inference.

How It Works

This project is built entirely in Zig, compiling to a single, dependency-free binary. It leverages Vulkan for AMD GPUs on Linux and Metal for Apple Silicon on macOS, employing hand-tuned shaders specifically optimized for each architecture's strengths. ZINC automatically selects the correct backend at build time, offering an OpenAI-compatible API and an integrated browser-based chat UI for ease of use.

Quick Start & Requirements

  • Build: requires Zig 0.15.2+ and the Vulkan SDK; compile with zig build -Doptimize=ReleaseFast.
  • Preflight: verify the environment with ./zig-out/bin/zinc --check.
  • Model management: list, pull, use, rm.
  • Inference: run ./zig-out/bin/zinc --model-id <model_id> --prompt "...", or launch the chat UI with ./zig-out/bin/zinc chat.
  • Prerequisites: Zig 0.15.2+, Vulkan loader/tools (Linux), glslc (Linux), Bun (for tests/docs).
  • Links: Zig downloads: ziglang.org/download.
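For scripted use, the one-shot inference command above can be wrapped from Python. This is an illustrative sketch only: the helper names are ours, and the binary path assumes the default zig-out/ build layout from the README.

```python
import os
import subprocess

ZINC = "./zig-out/bin/zinc"  # default build output path per the README

def build_args(model_id, prompt, binary=ZINC):
    """Compose the one-shot inference command line from the quick start."""
    return [binary, "--model-id", model_id, "--prompt", prompt]

def run_prompt(model_id, prompt):
    """Run one-shot inference; requires a built zinc binary at ZINC."""
    if not os.path.exists(ZINC):
        raise FileNotFoundError("build zinc first: zig build -Doptimize=ReleaseFast")
    result = subprocess.run(
        build_args(model_id, prompt),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

build_args only assembles the argv list, so it can be exercised without a built binary; run_prompt does the actual subprocess call.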

Highlighted Details

  • Achieves ~38 tokens/sec with a Qwen3.5-35B-A3B-UD model on AMD RDNA4 (32 GB) using a ReleaseFast build.
  • Supports AMD RDNA3/RDNA4 GPUs on Linux and Apple Silicon (M1-M5) on macOS.
  • Eliminates dependencies on ROCm, CUDA, and Python, offering a single binary solution.
  • Provides an OpenAI-compatible API endpoint (/v1) and a built-in browser chat interface.
  • Focuses on a curated list of GGUF models, including Qwen3.5 variants.
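Because the /v1 endpoint is OpenAI-compatible, any OpenAI-style client should be able to talk to it. Below is a minimal sketch using only the Python standard library; the host, port, and the /chat/completions path are assumptions based on the OpenAI API shape, not values confirmed by the README:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model_id: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completion request for a local ZINC server."""
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def chat(base_url: str, model_id: str, prompt: str) -> str:
    """Send the request and return the assistant's reply; needs a running server."""
    with urllib.request.urlopen(build_chat_request(base_url, model_id, prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

With a server running, chat("http://localhost:8080", "<model_id>", "Hello") would return the model's reply; localhost:8080 is a placeholder for whatever address the server actually binds.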

Maintenance & Community

Recent validation snapshots (e.g., 2026-03-31) indicate active development. The README mentions CONTRIBUTING.md and a Code of Conduct, suggesting established development practices, though explicit community links (Discord/Slack) are not provided.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license is highly permissive, allowing for commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The project is explicitly marked as "Still Rough." Key features like continuous batching and multi-tenant serving are still under development. Performance tuning for Apple Silicon is ongoing, with the RDNA4 path being more mature. The list of supported GGUF models is intentionally narrow.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 298 stars in the last 18 days
