OpenVINO GenAI is a library for running generative AI models
This library provides a unified C++/Python API for running popular Generative AI models, including LLMs, diffusion models, and speech recognition models, optimized for local execution on CPUs and GPUs. It targets developers and researchers seeking efficient, low-resource inference for tasks like text generation, image creation, and speech-to-text.
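As a rough sketch of how that unified API looks on the Python side (the model directories below are placeholders for models already exported to OpenVINO format, the pipeline names come from the openvino_genai package, and exact signatures may vary by release):

```python
import openvino_genai as ov_genai

# Text generation with an LLM; "TinyLlama-1.1B-ov" is a placeholder
# directory containing a model exported to OpenVINO format.
llm = ov_genai.LLMPipeline("TinyLlama-1.1B-ov", "CPU")
print(llm.generate("What is OpenVINO?", max_new_tokens=64))

# Image generation with a diffusion model follows the same
# construct-then-generate pattern ("GPU" also works as a device string).
t2i = ov_genai.Text2ImagePipeline("stable-diffusion-ov", "CPU")
image_tensor = t2i.generate("a watercolor fox", width=512, height=512)

# Speech-to-text with Whisper; expects 16 kHz float PCM samples
# (one second of silence is used here as a stand-in).
asr = ov_genai.WhisperPipeline("whisper-base-ov", "CPU")
print(asr.generate([0.0] * 16000))
```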
How It Works
The library leverages OpenVINO Runtime for high-performance inference across various hardware. It integrates state-of-the-art optimizations such as speculative decoding and KV-cache token eviction for LLMs, and supports LoRA adapter loading and continuous batching for serving. Models are converted and optimized using optimum-cli, with support for FP16, INT4, and INT8 quantization.
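As a sketch of that flow, the example below first exports models with optimum-cli (including INT4 weight quantization) and then enables speculative decoding by pairing the main model with a smaller draft model. Model IDs and directory names are placeholders, and the draft_model helper and num_assistant_tokens field follow the library's speculative decoding sample, so verify the exact API against the current documentation:

```python
# Export and quantize models first (shell commands shown as comments;
# model IDs and output directories are examples):
#   optimum-cli export openvino --model meta-llama/Llama-2-7b-chat-hf --weight-format int4 llama-2-7b-ov
#   optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 tinyllama-ov
import openvino_genai as ov_genai

# Speculative decoding: a small draft model proposes tokens that the
# larger main model verifies, reducing end-to-end latency.
pipe = ov_genai.LLMPipeline(
    "llama-2-7b-ov",
    "CPU",
    draft_model=ov_genai.draft_model("tinyllama-ov", "CPU"),
)

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.num_assistant_tokens = 5  # tokens the draft model proposes per step

print(pipe.generate("Explain speculative decoding in one paragraph.", config))
```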
Quick Start & Requirements
Install the runtime with pip install openvino-genai. Model conversion additionally requires pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git, which provides the optimum-cli tool used to export models to OpenVINO format.
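Assuming those packages are installed, a minimal first run looks roughly like this (the model ID and output directory are placeholders):

```python
# Setup (shell commands shown as comments):
#   pip install openvino-genai
#   pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git
#   optimum-cli export openvino --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-ov
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-ov", "CPU")  # or "GPU"
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))
```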
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README marks ModelScope support as "TBD", and the C++ installation it links to may require additional setup beyond the basic pip install. Some models may work but have not been officially tested.