High-performance inference engine for Apple Silicon
Uzu is a high-performance inference engine for AI models on Apple Silicon, offering a simple API and a hybrid architecture for efficient computation. It targets developers and researchers running AI models on macOS and iOS, enabling faster inference through optimized use of Metal and MPSGraph.
How It Works
Uzu employs a hybrid architecture in which individual layers can be computed either as custom GPU kernels or via MPSGraph, a low-level Apple framework with access to the Neural Engine (ANE). Combined with the unified memory on Apple devices, this approach aims to maximize performance. Uzu also features traceable computations for verifying correctness against source implementations, and a unified model configuration system that simplifies integrating new models.
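To make the hybrid design concrete, here is a minimal Rust sketch of per-layer backend dispatch. All type and function names are illustrative assumptions, not uzu's actual API.

```rust
// Illustrative sketch only: these types are hypothetical, not uzu's API.
// The idea: each layer is tagged with the backend that should execute it,
// and the engine dispatches accordingly at inference time.

/// Backend a layer can be lowered to.
enum Backend {
    /// Hand-written Metal compute kernel.
    MetalKernel,
    /// MPSGraph, a low-level Apple framework with ANE access.
    MpsGraph,
}

/// A single model layer tagged with its execution backend.
struct Layer {
    name: String,
    backend: Backend,
}

/// Dispatch one layer to its backend (kernel bodies omitted).
fn run_layer(layer: &Layer, activations: &mut [f32]) {
    match layer.backend {
        Backend::MetalKernel => {
            // Launch a hand-tuned Metal compute kernel here.
        }
        Backend::MpsGraph => {
            // Execute an MPSGraph subgraph, which may be placed on the ANE.
        }
    }
    // With unified memory, `activations` is visible to CPU, GPU, and ANE
    // without explicit host<->device copies.
    let _ = (&layer.name, activations);
}

fn main() {
    let layers = vec![
        Layer { name: "embedding".into(), backend: Backend::MpsGraph },
        Layer { name: "attention_0".into(), backend: Backend::MetalKernel },
    ];
    let mut activations = vec![0.0f32; 16];
    for layer in &layers {
        run_layer(layer, &mut activations);
    }
}
```

The point is that the backend choice is made per layer, so compute-bound layers can use hand-tuned kernels while others are offloaded via MPSGraph.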
Quick Start & Requirements
- Add uzu as a dependency in your Cargo.toml, or use the prebuilt Swift framework (a usage sketch follows this list).
- Use the lalamo tool (included) to convert models to Uzu's format (e.g., uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16).
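After those two steps, running inference from Rust might look roughly like the sketch below. The Session type, its methods, and the model path are assumptions made for illustration, not uzu's confirmed API; consult the repository for the real interface.

```rust
// Hypothetical usage sketch: the types and methods below are assumptions,
// not uzu's documented API. Assumes `uzu` was added to Cargo.toml and a
// model was converted with lalamo as described above.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Directory produced by the lalamo conversion step (path illustrative).
    let model_dir = "models/Llama-3.2-1B-Instruct";

    // Hypothetical: open a session bound to the converted model.
    let mut session = uzu::Session::new(model_dir)?;

    // Hypothetical: generate up to 128 tokens from a text prompt.
    let output = session.generate("Explain unified memory briefly.", 128)?;
    println!("{output}");
    Ok(())
}
```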
Highlighted Details
- Benchmarks compare favorably with llama.cpp on Apple Silicon for various models (e.g., 35.17 tokens/s for Llama-3.2-1B-Instruct on M2).
Maintenance & Community
No specific community links or contributor details are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Uzu uses its own model format, requiring a conversion step for existing models. Performance comparisons are based on specific configurations and may vary. The README notes that comparing quantized models directly can be complex due to differing quantization methods.