Olive by microsoft

AI model optimization toolkit for ONNX Runtime

created 6 years ago
2,034 stars

Top 22.3% on sourcepulse

Project Summary

Olive is an AI model optimization toolkit designed to simplify fine-tuning, conversion, quantization, and optimization of models for efficient inference across hardware targets (CPUs, GPUs, NPUs). It is aimed at ML engineers and researchers who want to cut the manual effort of model optimization: it automates the selection and application of over 40 built-in optimization techniques and integrates with Hugging Face and Azure AI.

How It Works

Olive operates by composing a pipeline of optimization techniques based on user-defined targets and constraints like accuracy and latency. It leverages ONNX Runtime as the core inference engine and supports a wide array of optimization components, including model compression, quantization (e.g., AWQ, RTN), and compilation. This approach automates the complex, trial-and-error process of finding optimal model configurations for specific hardware.
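
As an illustration of how such a pipeline is expressed, here is a minimal sketch of running a two-pass workflow (ONNX conversion followed by quantization) from Python. It assumes the olive-ai package, the olive.workflows.run entry point, and the pass type names shown; the exact config schema varies between Olive versions, so treat this as a sketch rather than a verbatim recipe.

    # Minimal sketch: compose conversion + quantization passes and run them.
    # Assumes the olive-ai package and that olive.workflows.run accepts a
    # config dict; pass type names and the output_dir key may differ by
    # Olive version.
    from olive.workflows import run as olive_run

    config = {
        "input_model": {
            "type": "HfModel",                       # Hugging Face model input
            "model_path": "microsoft/Phi-3-mini-4k-instruct",
        },
        "passes": {                                  # passes are applied in order
            "conversion": {"type": "OnnxConversion"},
            "quantization": {"type": "OnnxQuantization"},
        },
        "output_dir": "phi3-optimized",              # where optimized artifacts land
    }

    olive_run(config)

The CLI described under Quick Start & Requirements wraps the same kind of workflow behind a single command.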

Quick Start & Requirements
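
Olive is distributed on PyPI as olive-ai and is typically installed with pip install olive-ai, alongside an ONNX Runtime package (onnxruntime or onnxruntime-gpu) matching the target hardware. The olive CLI then provides commands such as olive auto-opt for one-command optimization of a Hugging Face model; see the Olive documentation for the full command set and flags.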

Highlighted Details

  • Supports automatic optimization of popular SLMs like Llama, Phi, Qwen, and Gemma.
  • Offers a CLI for common optimization tasks and workflows for orchestrating sequences of transformations.
  • Enables compiling LoRA adapters for MultiLoRA serving.
  • Includes a caching mechanism that reuses intermediate outputs across runs, improving productivity.

Maintenance & Community

  • Developed and maintained by Microsoft.
  • Contributions welcome via GitHub Issues and Discussions.
  • Roadmap and community channels are available via GitHub.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The README does not explicitly detail performance benchmarks or specific hardware compatibility beyond general CPU/GPU/NPU mentions. While it supports many popular models, optimizing custom architectures may require providing explicit input/output configurations.
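
For example, a custom PyTorch model whose graph signature Olive cannot infer might need an input/output description along these lines (the io_config field and its keys are assumptions based on Olive's PyTorch-to-ONNX conversion options, not details from this summary; check the docs for the exact names):

    # Hypothetical input_model entry for a custom model; io_config describes
    # the inputs and outputs needed to export it to ONNX. All names below are
    # illustrative assumptions.
    input_model = {
        "type": "PyTorchModel",
        "model_path": "my_model.pt",                 # hypothetical local checkpoint
        "io_config": {
            "input_names": ["input_ids", "attention_mask"],
            "input_shapes": [[1, 128], [1, 128]],
            "output_names": ["logits"],
        },
    }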

Health Check
  • Last commit: 19 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 112
  • Issues (30d): 4
  • Star History: 157 stars in the last 90 days

Explore Similar Projects

Starred by Logan Kilpatrick (Product Lead on Google AI Studio), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

catalyst by catalyst-team
0% · 3k stars
PyTorch framework for accelerated deep learning R&D
created 7 years ago · updated 1 month ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai
0.2% · 40k stars
Deep learning optimization library for distributed training and inference
created 5 years ago · updated 20 hours ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 9 hours ago