openvino.genai  by openvinotoolkit

OpenVINO GenAI is a library for running generative AI models

created 1 year ago
312 stars

Top 87.5% on sourcepulse

Project Summary

This library provides a unified C++/Python API for running popular Generative AI models, including LLMs, diffusion models, and speech recognition models, optimized for local execution on CPUs and GPUs. It targets developers and researchers seeking efficient, low-resource inference for tasks like text generation, image creation, and speech-to-text.

How It Works

The library builds on OpenVINO Runtime for high-performance inference across supported hardware. For LLMs it integrates optimizations such as speculative decoding and KVCache token eviction, and it supports LoRA adapter loading and continuous batching for serving. Models are converted and optimized with optimum-cli, which can export weights at reduced precision (FP16, INT8, INT4).
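To make the speculative decoding idea concrete, here is a minimal, library-independent sketch of its greedy variant. It is an illustration only and does not reflect how OpenVINO GenAI implements the feature. The names speculative_decode, toy_target and toy_draft are hypothetical, and the "models" are plain Python functions that map a token sequence to the next token.

```python
from typing import Callable, List

# A "model" is any function that maps a token sequence to the next token (greedy decoding).
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the cheap draft model proposes up to k tokens,
    and the target model keeps the longest prefix it agrees with."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The draft model proposes a short continuation.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed token. A real runtime would do
        #    this in one batched forward pass; here it is a simple loop.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First disagreement: keep the target's token and drop the rest.
                accepted.append(expected)
                break
        tokens.extend(accepted)
    return tokens


# Hypothetical toy models: the target counts upward modulo 10,
# and the draft agrees except right after token 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


print(speculative_decode(toy_target, toy_draft, [0], max_new_tokens=8))
```

Because every accepted token is one the target model would have chosen itself, the output matches plain greedy decoding with the target alone. The potential speedup comes from verifying several draft tokens per target pass.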

Quick Start & Requirements

  • Install via pip: pip install openvino-genai, plus pip install optimum-intel@git+https://github.com/huggingface/optimum-intel.git for model conversion.
  • Model conversion requires optimum-cli, which optimum-intel extends with the OpenVINO export command.
  • C++ usage requires a compatible C++ package installation.
  • See Generative AI workflow and OpenVINO Notebooks for samples.
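The steps above can be sketched in Python as follows. This is a rough outline under stated assumptions, not the project's official sample: the helper names build_export_command and generate_text are hypothetical, and the model ID is only an example. The sketch assumes the optimum-cli export openvino command with its --model and --weight-format options, and the openvino_genai.LLMPipeline class with its generate method.

```python
import shlex
from typing import List


def build_export_command(model_id: str, output_dir: str,
                         weight_format: str = "int4") -> List[str]:
    """Build the optimum-cli command that exports a Hugging Face model to OpenVINO format."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", weight_format,
        output_dir,
    ]


def generate_text(model_dir: str, prompt: str, device: str = "CPU",
                  max_new_tokens: int = 100) -> str:
    """Run text generation on an exported model directory."""
    # Imported lazily so the command builder works without openvino-genai installed.
    import openvino_genai

    pipe = openvino_genai.LLMPipeline(model_dir, device)
    return str(pipe.generate(prompt, max_new_tokens=max_new_tokens))


# Example model ID (an assumption for illustration; substitute your own model).
command = build_export_command("TinyLlama/TinyLlama-1.1B-Chat-v1.0",
                               "TinyLlama-1.1B-Chat-v1.0")
print(shlex.join(command))
```

Run the printed command in a shell to produce the model directory, then call generate_text("TinyLlama-1.1B-Chat-v1.0", "The Sun is yellow because") to generate text on the CPU. Passing a different device string, such as "GPU", selects other hardware if it is available.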

Highlighted Details

  • Supports text generation (LLMs), image generation (Stable Diffusion), visual language models (LLaVA), and speech recognition (Whisper).
  • Integrates advanced LLM optimizations: speculative decoding, KVCache eviction, prefix caching.
  • Enables LoRA adapter loading and mixing for text and image generation.
  • Offers continuous batching for LLM serving via OpenVINO Model Server.
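As a conceptual illustration of the continuous batching mentioned above, the following toy simulation shows iteration-level scheduling: a finished request frees its batch slot immediately, and a waiting request takes it at the next decoding step. It is a simplified sketch and not the scheduler used by OpenVINO Model Server. The function name continuous_batching and the request format are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def continuous_batching(requests: List[Tuple[str, int]], max_batch: int) -> Dict[str, int]:
    """Simulate iteration-level scheduling.

    requests: (request_id, tokens_to_generate) pairs, all queued before step 1.
    Returns the decoding step at which each request finishes.
    """
    waiting: Deque[Tuple[str, int]] = deque(requests)
    running: Dict[str, int] = {}   # request_id -> tokens still to generate
    finished: Dict[str, int] = {}
    step = 0
    while waiting or running:
        # Fill free batch slots before every decoding step, not once per batch.
        while waiting and len(running) < max_batch:
            request_id, remaining = waiting.popleft()
            running[request_id] = remaining
        step += 1
        # One decoding step produces one token for every running request.
        for request_id in list(running):
            running[request_id] -= 1
            if running[request_id] <= 0:
                finished[request_id] = step
                del running[request_id]
    return finished


print(continuous_batching([("a", 2), ("b", 5), ("c", 1)], max_batch=2))
```

In this example request "c" starts as soon as "a" completes instead of waiting for the longer request "b". With static batching, "c" would start only after the whole first batch had finished.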

Maintenance & Community

  • Developed by the OpenVINO Toolkit team.
  • Samples and workflows are available via OpenVINO Notebooks.

Licensing & Compatibility

  • Licensed under Apache License Version 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The README lists Model Scope support as "TBD". C++ usage is covered by separate installation instructions and may require setup beyond the basic pip install. Some models may work even though they have not been officially tested.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 142
  • Issues (30d): 11
  • Star history: 52 stars in the last 90 days
