OpenArc by SearchSavior

Local AI inference engine for Intel devices serving diverse models

Created 11 months ago

284 stars

Top 92.3% on SourcePulse

Project Summary

OpenArc is an inference engine designed for Intel devices, enabling local and private deployment of AI models. It serves Large Language Models (LLMs), Vision-Language Models (VLMs), Whisper, Kokoro-TTS, Embedding, and Reranker models through OpenAI-compatible API endpoints, powered by OpenVINO. This project offers a performant and accessible solution for leveraging Intel hardware for diverse AI workloads.

How It Works

The engine utilizes OpenVINO for optimized inference across Intel CPUs, GPUs, and NPUs. It exposes a suite of OpenAI-compatible endpoints, including /v1/chat/completions and /v1/audio/transcriptions, facilitating integration with existing AI application frameworks. Key architectural choices include multi-engine support, pipeline parallelism for multi-GPU setups, CPU offloading, and automatic model management, aiming for efficient resource utilization and high throughput on target hardware.

Quick Start & Requirements

Installation involves cloning the repository and using uv for dependency management (uv sync). Key dependencies include optimum-intel[openvino] and openvino-genai (nightly wheels recommended). OpenVINO requires device-specific drivers; consult the OpenVINO System Requirements page. An OPENARC_API_KEY environment variable is necessary. The CLI tool openarc is used for managing models (add, list, load, serve) and benchmarking (bench). Links to extensive documentation and OpenVINO resources are provided.

Highlighted Details

Supports a broad range of models: LLMs, VLMs, Whisper, Kokoro-TTS, Embeddings, and Rerankers.
Provides OpenAI compatible endpoints for common AI tasks.
Enables advanced hardware acceleration: Multi-GPU, NPU device support, pipeline parallelism, and CPU offload/hybrid modes.
Features llama-bench style benchmarking with SQLite metrics for performance analysis.
Includes support for OpenAI compatible tool calls with streaming and parallel parsing.

Maintenance & Community

OpenArc is noted as being under active development. A Discord community has formed around the project, offering support and collaboration. Contributions are welcomed via GitHub issues before pull requests.

Licensing & Compatibility

The provided README does not explicitly state the project's license. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The project is under active development, implying potential for breaking changes or evolving features. Specific hardware drivers are mandatory for OpenVINO. Advanced configurations like VLM pipeline usage may require consulting source code due to limited documentation. Tensor parallelism necessitates multi-socket CPUs.

OpenArc by SearchSavior

Explore Similar Projects

FlagPerf by flagos-ai

ai-performance-engineering by cfregly

cortex.cpp by janhq

model_server by openvinotoolkit

LitServe by Lightning-AI

shimmy by Michael-A-Kuykendall

dynamo by ai-dynamo

ktransformers by kvcache-ai

mlc-llm by mlc-ai

exo by exo-explore

openvino by openvinotoolkit

LocalAI by mudler