uzu by trymirai

High-performance inference engine for Apple Silicon

created 1 month ago
1,110 stars

Top 35.1% on sourcepulse

View on GitHub
Project Summary

Uzu is a high-performance inference engine for running AI models on Apple Silicon, offering a simple API and a hybrid architecture for efficient computation. It targets developers and researchers working with AI models on macOS and iOS, enabling faster inference through optimized use of Metal and MPSGraph.

How It Works

Uzu employs a hybrid architecture in which each layer can be computed either as a custom GPU kernel or via MPSGraph, a low-level Apple API that can dispatch work to the Neural Engine (ANE). Combined with the unified memory on Apple devices, this approach aims to maximize performance. Uzu also features traceable computations for verifying correctness against source implementations, and a unified model configuration system that simplifies adding new models.
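
As a rough illustration of this hybrid split, the sketch below tags each layer with the backend that should execute it. This is a conceptual example only: the names LayerBackend, Layer, and run_layer are invented for clarity and are not uzu's actual internals.

    // Conceptual sketch of a hybrid executor (illustrative names, not uzu's API):
    // each layer carries the backend that should run it.

    /// Backend chosen for a single layer.
    enum LayerBackend {
        /// Hand-written Metal compute kernel dispatched directly on the GPU.
        MetalKernel { pipeline: &'static str },
        /// MPSGraph subgraph, which the system may schedule on the GPU or ANE.
        MpsGraph { graph_id: usize },
    }

    struct Layer {
        name: &'static str,
        backend: LayerBackend,
    }

    /// Run one layer on its assigned backend. A real engine would encode Metal
    /// commands or execute an MPSGraph here; this stub just passes data through.
    fn run_layer(layer: &Layer, input: Vec<f32>) -> Vec<f32> {
        match &layer.backend {
            LayerBackend::MetalKernel { pipeline } => {
                println!("{}: dispatching Metal kernel `{}`", layer.name, pipeline);
                input
            }
            LayerBackend::MpsGraph { graph_id } => {
                println!("{}: executing MPSGraph subgraph #{}", layer.name, graph_id);
                input
            }
        }
    }

    fn main() {
        let layers = [
            Layer { name: "attention", backend: LayerBackend::MetalKernel { pipeline: "sdpa" } },
            Layer { name: "mlp", backend: LayerBackend::MpsGraph { graph_id: 0 } },
        ];
        let mut activations = vec![0.0_f32; 8];
        for layer in &layers {
            activations = run_layer(layer, activations);
        }
        println!("done, {} activations", activations.len());
    }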

Quick Start & Requirements

  • Install: Add uzu as a dependency in your Cargo.toml or use the prebuilt Swift framework (see the usage sketch after this list).
  • Prerequisites: Apple Silicon Mac, Rust toolchain.
  • Model Conversion: Use the lalamo tool (included) to convert models to Uzu's format (e.g., uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16).
  • Documentation: https://github.com/trymirai/uzu
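
For orientation, here is a minimal end-to-end sketch in Rust. The Engine type and its load/generate methods are placeholders invented for this example, not uzu's documented API; consult the repository before relying on any of these names.

    // Hypothetical quick-start flow (illustrative names only, not uzu's real API).
    // Step 1: convert a model with lalamo, e.g.
    //   uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16
    // Step 2: point the engine at the converted model directory and generate text.

    use std::path::PathBuf;

    /// Placeholder for an inference-engine handle over a converted model.
    struct Engine {
        model_dir: PathBuf,
    }

    impl Engine {
        /// Load a converted model from disk (stub: only records the path).
        fn load(model_dir: impl Into<PathBuf>) -> Self {
            Engine { model_dir: model_dir.into() }
        }

        /// Generate up to `max_tokens` tokens for `prompt` (stub: echoes metadata).
        fn generate(&self, prompt: &str, max_tokens: usize) -> String {
            format!(
                "[up to {max_tokens} tokens from {:?} for prompt: {prompt}]",
                self.model_dir
            )
        }
    }

    fn main() {
        // Path is illustrative; use wherever your converted model was written.
        let engine = Engine::load("models/Llama-3.2-1B-Instruct");
        let reply = engine.generate("Explain unified memory in one sentence.", 64);
        println!("{reply}");
    }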

Highlighted Details

  • Achieves competitive inference speeds, often outperforming llama.cpp on Apple Silicon for various models (e.g., 35.17 tokens/s for Llama-3.2-1B-Instruct on M2).
  • Supports model conversion and provides a CLI for running and serving models.
  • Offers a Swift framework for easier integration into macOS/iOS applications.
  • Utilizes bf16/f16 precision for performance benchmarks.

Maintenance & Community

No specific community links or contributor details are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Uzu uses its own model format, requiring a conversion step for existing models. Performance comparisons are based on specific configurations and may vary. The README notes that comparing quantized models directly can be complex due to differing quantization methods.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 16
  • Issues (30d): 3

Star History

1,155 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

  • C/C++ library for local LLM inference
  • Top 0.4% on sourcepulse, 84k stars
  • created 2 years ago, updated 14 hours ago