GenAI extension for running LLMs with ONNX Runtime
This project provides Generative AI extensions for ONNX Runtime, enabling efficient on-device execution of Large Language Models (LLMs). It targets developers and researchers seeking a flexible and performant solution for LLM inference, offering a complete generative AI loop including pre/post-processing, inference, and sampling.
How It Works
The library implements the full generative AI loop for ONNX models. It handles tokenization, inference via ONNX Runtime, logits processing, search and sampling strategies, and KV cache management. This integrated approach simplifies the deployment of LLMs by abstracting complex pipeline components, allowing users to focus on model integration and application logic.
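To make the loop concrete, here is a minimal sketch using the project's Python bindings. The model path is a placeholder, and exact method names (e.g. append_tokens, set_search_options) have shifted across releases, so treat this as illustrative rather than canonical:

```python
import onnxruntime_genai as og

# Load an ONNX model directory (placeholder path) and its tokenizer.
model = og.Model("path/to/model")
tokenizer = og.Tokenizer(model)

# Configure the search/sampling strategy: max length, temperature, top-p.
params = og.GeneratorParams(model)
params.set_search_options(max_length=256, temperature=0.7, top_p=0.9, do_sample=True)

# The generator owns the KV cache and drives the token-by-token loop.
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

stream = tokenizer.create_stream()  # incremental detokenization
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```

Everything between encode and decode (logits processing, sampling, cache updates) happens inside the generator, which is the abstraction the paragraph above describes.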
Quick Start & Requirements
Install NumPy first, then the onnxruntime-genai package (the --pre flag opts in to the latest pre-release build):
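```bash
pip install numpy
pip install --pre onnxruntime-genai
```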
Maintenance & Community
The project is actively maintained by Microsoft. Discussions for feature requests and community engagement are available via GitHub Discussions. Contributions are welcome, subject to a Contributor License Agreement (CLA).
Licensing & Compatibility
The project is released under the MIT license, a permissive license that allows commercial use and integration with closed-source applications.
Limitations & Caveats
A breaking API change landed between release candidate 0.7.0-rc2 and release 0.7.0: the tokenizer.encode method now returns NumPy arrays instead of Python lists. Support for some platforms (iOS) and hardware acceleration backends (ROCm) is still under development or requires building from source. Examples on the main branch may not always align with the latest stable release.
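Code written against the release-candidate behavior can normalize the encode result so it works on either side of the change; a minimal sketch, assuming a tokenizer object created via the library's Tokenizer class:

```python
import numpy as np

tokens = tokenizer.encode("hello")  # list pre-0.7.0, numpy.ndarray from 0.7.0 on
token_ids = np.asarray(tokens)      # no-op for an ndarray, converts a list
as_list = token_ids.tolist()        # if downstream code expects a Python list
```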