openvino.genai by openvinotoolkit

OpenVINO GenAI is a library for running generative AI models

Created 2 years ago
440 stars

Top 67.9% on SourcePulse

View on GitHub
Project Summary

This library provides a unified C++/Python API for running popular Generative AI models, including LLMs, diffusion models, and speech recognition models, optimized for local execution on CPUs and GPUs. It targets developers and researchers seeking efficient, low-resource inference for tasks like text generation, image creation, and speech-to-text.

How It Works

The library leverages OpenVINO Runtime for high-performance inference across various hardware. It integrates state-of-the-art optimizations such as speculative decoding and KVCache token eviction for LLMs, and supports features like LoRA adapter loading and continuous batching for serving. Models are converted and optimized using optimum-cli, with support for FP16, INT4, and INT8 weight formats.
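The conversion step can be sketched with optimum-cli; the model ID and output directory below are placeholders, not part of the project's documentation:

```shell
# Export a Hugging Face model to OpenVINO IR with INT4 weight compression.
# Model ID and output directory are illustrative placeholders.
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int4 \
  TinyLlama-1.1B-int4-ov
```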

Quick Start & Requirements

  • Install via pip: pip install openvino-genai and pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git.
  • Model conversion requires optimum-cli.
  • C++ usage requires a compatible C++ package installation.
  • See Generative AI workflow and OpenVINO Notebooks for samples.
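After installation and conversion, text generation takes only a few lines; this is a minimal sketch assuming a model directory already produced by optimum-cli (the path is a placeholder):

```python
import openvino_genai

# Load a model previously exported with optimum-cli (placeholder path),
# targeting the CPU device; "GPU" can be used where available.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-int4-ov", "CPU")

# Generate up to 100 new tokens for the prompt.
print(pipe.generate("What is OpenVINO?", max_new_tokens=100))
```

The C++ API mirrors this pattern, so the same pipeline can be embedded in native applications.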

Highlighted Details

  • Supports text generation (LLMs), image generation (Stable Diffusion), visual language models (LLaVA), and speech recognition (Whisper).
  • Integrates advanced LLM optimizations: speculative decoding, KVCache eviction, prefix caching.
  • Enables LoRA adapter loading and mixing for text and image generation.
  • Offers continuous batching for LLM serving via OpenVINO Model Server.
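Speculative decoding, mentioned above, can be illustrated with a toy sketch: a cheap draft model proposes a block of tokens ahead, and the expensive target model verifies them, accepting the longest matching prefix. This is not the library's actual implementation; both models here are hypothetical stand-in functions over integer token IDs.

```python
def draft_model(prefix, k):
    # Hypothetical fast model: guesses the next k tokens.
    return [(prefix[-1] + i + 1) % 10 for i in range(k)]

def target_model(prefix):
    # Hypothetical accurate model: emits one next token.
    return (prefix[-1] + 1) % 10

def speculative_decode(prompt, num_tokens, k=4):
    tokens = list(prompt)
    while len(tokens) - len(prompt) < num_tokens:
        proposal = draft_model(tokens, k)
        # Verify the proposal token by token against the target model.
        for tok in proposal:
            if target_model(tokens) == tok:
                tokens.append(tok)  # accepted: draft token matches the target
            else:
                tokens.append(target_model(tokens))  # rejected: keep target's token
                break
    return tokens[len(prompt):][:num_tokens]

print(speculative_decode([3], 5))  # → [4, 5, 6, 7, 8]
```

When the draft model is accurate, several tokens are accepted per verification round, which is where the speedup over one-token-at-a-time decoding comes from.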

Maintenance & Community

  • Developed by the OpenVINO Toolkit team.
  • Samples and workflows are available via OpenVINO Notebooks.

Licensing & Compatibility

  • Licensed under Apache License Version 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The README marks ModelScope support as "TBD", and C++ usage requires installation steps beyond the basic pip install (see the linked C++ installation details). Some models may run but have not been officially validated.

Health Check
Last Commit

1 day ago

Responsiveness

1 day

Pull Requests (30d)
170
Issues (30d)
21
Star History
18 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Yaowei Zheng (Author of LLaMA-Factory).

ZhiLight by zhihu

0%
906
LLM inference engine for Llama and variants, optimized for PCIe GPUs
Created 1 year ago
Updated 1 day ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elvis Saravia (Founder of DAIR.AI), and 2 more.

vllm-omni by vllm-project

1.6%
3k
Omni-modality model inference and serving framework
Created 5 months ago
Updated 22 hours ago