uzu by trymirai

High-performance inference engine for Apple Silicon

Created 2 months ago
1,312 stars

Top 30.5% on SourcePulse

Project Summary

Uzu is a high-performance inference engine designed for AI models on Apple Silicon, offering a simple API and a hybrid architecture for efficient computation. It targets developers and researchers working with AI models on macOS and iOS, enabling faster inference through optimized Metal and MPSGraph utilization.

How It Works

Uzu employs a hybrid architecture in which each layer can be computed either by custom GPU (Metal) kernels or via MPSGraph, a low-level graph API beneath CoreML with access to Apple's Neural Engine (ANE). Combined with the unified memory of Apple devices, this approach aims to maximize performance. Uzu also features traceable computations for verifying correctness against source implementations, and a unified model configuration system that simplifies integrating new models.
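The per-layer backend choice described above can be pictured as a simple dispatch. The sketch below is illustrative only: the enum and struct names are hypothetical and do not reflect Uzu's actual internal types.

```rust
// Hypothetical illustration of hybrid per-layer dispatch (not Uzu's real API).
#[derive(Clone, Copy)]
enum Backend {
    MetalKernel, // hand-written GPU kernel
    MpsGraph,    // MPSGraph, which may route work to the ANE
}

struct Layer {
    name: &'static str,
    backend: Backend,
}

// Each layer runs on whichever backend its configuration selects.
fn run_layer(layer: &Layer) -> String {
    match layer.backend {
        Backend::MetalKernel => format!("{}: custom Metal kernel", layer.name),
        Backend::MpsGraph => format!("{}: MPSGraph graph", layer.name),
    }
}

fn main() {
    let layers = [
        Layer { name: "attention", backend: Backend::MetalKernel },
        Layer { name: "mlp", backend: Backend::MpsGraph },
    ];
    for layer in &layers {
        println!("{}", run_layer(layer));
    }
}
```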

Quick Start & Requirements

  • Install: Add uzu as a dependency in your Cargo.toml or use the prebuilt Swift framework.
  • Prerequisites: Apple Silicon Mac, Rust toolchain.
  • Model Conversion: Use the lalamo tool (included) to convert models to Uzu's format (e.g., uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16).
  • Documentation: https://github.com/trymirai/uzu
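For the Rust route, adding the dependency might look like the fragment below. This is a sketch under assumptions: the exact version or git revision to pin is not stated in the source, so check the repository for the recommended form.

```toml
# Cargo.toml -- hypothetical dependency entry; pin a release tag or
# version as recommended by the uzu repository.
[dependencies]
uzu = { git = "https://github.com/trymirai/uzu" }
```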

Highlighted Details

  • Achieves competitive inference speeds, often outperforming llama.cpp on Apple Silicon for various models (e.g., 35.17 tokens/s for Llama-3.2-1B-Instruct on M2).
  • Supports model conversion and provides a CLI for running and serving models.
  • Offers a Swift framework for easier integration into macOS/iOS applications.
  • Utilizes bf16/f16 precision for performance benchmarks.

Maintenance & Community

No specific community links or contributor details are provided in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Uzu uses its own model format, requiring a conversion step for existing models. Performance comparisons are based on specific configurations and may vary. The README notes that comparing quantized models directly can be complex due to differing quantization methods.

Health Check
Last Commit

23 hours ago

Responsiveness

Inactive

Pull Requests (30d)
7
Issues (30d)
2
Star History
55 stars in the last 30 days
