Olive by microsoft

AI model optimization toolkit for ONNX Runtime

created 6 years ago
2,034 stars

Top 22.3% on sourcepulse

Project Summary

Olive is an AI model optimization toolkit designed to simplify fine-tuning, conversion, quantization, and optimization of models for efficient inference across hardware targets (CPUs, GPUs, NPUs). It is aimed at ML engineers and researchers who want to cut the manual effort of model optimization: it automates the selection and application of over 40 built-in optimization techniques and integrates with Hugging Face and Azure AI.

How It Works

Olive operates by composing a pipeline of optimization techniques based on user-defined targets and constraints like accuracy and latency. It leverages ONNX Runtime as the core inference engine and supports a wide array of optimization components, including model compression, quantization (e.g., AWQ, RTN), and compilation. This approach automates the complex, trial-and-error process of finding optimal model configurations for specific hardware.
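
As an illustration of how such a pipeline is expressed, here is a minimal sketch of running a two-pass workflow (ONNX conversion followed by quantization) from Python. It assumes the olive-ai package, the olive.workflows.run entry point, and the pass type names shown; the exact config schema varies between Olive versions, so treat this as a sketch rather than a verbatim recipe.

    # Minimal sketch: compose conversion + quantization passes and run them.
    # Assumes the olive-ai package and that olive.workflows.run accepts a
    # config dict; pass type names and the output_dir key may differ by
    # Olive version.
    from olive.workflows import run as olive_run

    config = {
        "input_model": {
            "type": "HfModel",                       # Hugging Face model input
            "model_path": "microsoft/Phi-3-mini-4k-instruct",
        },
        "passes": {                                  # passes are applied in order
            "conversion": {"type": "OnnxConversion"},
            "quantization": {"type": "OnnxQuantization"},
        },
        "output_dir": "phi3-optimized",              # where optimized artifacts land
    }

    olive_run(config)

The CLI described under Quick Start & Requirements wraps the same kind of workflow behind a single command.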

Quick Start & Requirements
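
Olive is distributed on PyPI as olive-ai and is typically installed with pip install olive-ai, alongside an ONNX Runtime package (onnxruntime or onnxruntime-gpu) matching the target hardware. The olive CLI then provides commands such as olive auto-opt for one-command optimization of a Hugging Face model; see the Olive documentation for the full command set and flags.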

Highlighted Details

  • Supports automatic optimization of popular SLMs like Llama, Phi, Qwen, and Gemma.
  • Offers a CLI for common optimization tasks and workflows for orchestrating sequences of transformations.
  • Enables compiling LoRA adapters for MultiLoRA serving.
  • Includes a caching mechanism that reuses intermediate outputs across runs, improving productivity.

Maintenance & Community

  • Developed and maintained by Microsoft.
  • Contributions welcome via GitHub Issues and Discussions.
  • Roadmap and community channels are available via GitHub.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The README does not explicitly detail performance benchmarks or specific hardware compatibility beyond general CPU/GPU/NPU mentions. While it supports many popular models, optimizing custom architectures may require providing explicit input/output configurations.
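
For example, a custom PyTorch model whose graph signature Olive cannot infer might need an input/output description along these lines (the io_config field and its keys are assumptions based on Olive's PyTorch-to-ONNX conversion options, not details from this summary; check the docs for the exact names):

    # Hypothetical input_model entry for a custom model; io_config describes
    # the inputs and outputs needed to export it to ONNX. All names below are
    # illustrative assumptions.
    input_model = {
        "type": "PyTorchModel",
        "model_path": "my_model.pt",                 # hypothetical local checkpoint
        "io_config": {
            "input_names": ["input_ids", "attention_mask"],
            "input_shapes": [[1, 128], [1, 128]],
            "output_names": ["logits"],
        },
    }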

Health Check
  • Last commit: 19 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 112
  • Issues (30d): 4
  • Star History: 157 stars in the last 90 days

Explore Similar Projects

Starred by Logan Kilpatrick (Product Lead on Google AI Studio), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 3 more.

catalyst by catalyst-team
0% · 3k stars
PyTorch framework for accelerated deep learning R&D
created 7 years ago · updated 1 month ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai
0.2% · 40k stars
Deep learning optimization library for distributed training and inference
created 5 years ago · updated 20 hours ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 9 hours ago