ZML is a high-performance AI inference stack designed for production environments, enabling the deployment of diverse AI models across heterogeneous hardware. It targets AI engineers and researchers seeking a unified, efficient, and flexible platform for deploying models, particularly in distributed or multi-accelerator setups.
How It Works
ZML leverages the Zig programming language for its performance and safety guarantees, combined with MLIR for flexible model compilation and optimization. The stack is built with Bazel for robust dependency management and cross-compilation. Its core advantage is hardware abstraction: models can be compiled and run across different accelerators (NVIDIA, AMD, TPU, etc.) with minimal code changes, which also enables distributed inference across geographically dispersed hardware.
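In practice, this hardware abstraction surfaces as Bazel runtime flags: the same model target is built for a different accelerator by toggling one flag. A minimal sketch, assuming a hypothetical helper function; only the flag names (from the Quick Start below) are taken from the project:

```shell
# Hypothetical helper mapping a backend name to its ZML runtime flag.
# The --@zml//runtimes:... flag names are real; the function itself is illustrative.
zml_runtime_flag() {
  case "$1" in
    cuda) echo "--@zml//runtimes:cuda=true" ;;
    rocm) echo "--@zml//runtimes:rocm=true" ;;
    tpu)  echo "--@zml//runtimes:tpu=true" ;;
    *)    echo "unknown backend: $1" >&2; return 1 ;;
  esac
}

# The same model target runs on different hardware by swapping one flag, e.g.:
#   bazel run -c opt "$(zml_runtime_flag cuda)" //llama:Llama-3.1-8B-Instruct
#   bazel run -c opt "$(zml_runtime_flag rocm)" //llama:Llama-3.1-8B-Instruct
```

The model code itself does not change between the two invocations; only the build-time runtime selection does.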
Quick Start & Requirements
1. Install bazelisk (macOS: brew install bazelisk; Linux: download the binary).
2. Run the MNIST example: cd examples && bazel run -c opt //mnist
3. Run Llama: cd examples && bazel run -c opt //llama:Llama-3.1-8B-Instruct -- --prompt="What is the capital of France?"
4. To target a specific accelerator, add --@zml//runtimes:cuda=true, --@zml//runtimes:rocm=true, or --@zml//runtimes:tpu=true to the Bazel command.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats