OpenArc by SearchSavior

Local AI inference engine for Intel devices serving diverse models

Created 9 months ago
251 stars

Top 99.9% on SourcePulse

View on GitHub
Project Summary

OpenArc is an inference engine designed for Intel devices, enabling local and private deployment of AI models. It serves Large Language Models (LLMs), Vision-Language Models (VLMs), Whisper, Kokoro-TTS, Embedding, and Reranker models through OpenAI-compatible API endpoints, powered by OpenVINO. This project offers a performant and accessible solution for leveraging Intel hardware for diverse AI workloads.

How It Works

The engine utilizes OpenVINO for optimized inference across Intel CPUs, GPUs, and NPUs. It exposes a suite of OpenAI-compatible endpoints, including /v1/chat/completions and /v1/audio/transcriptions, facilitating integration with existing AI application frameworks. Key architectural choices include multi-engine support, pipeline parallelism for multi-GPU setups, CPU offloading, and automatic model management, aiming for efficient resource utilization and high throughput on target hardware.
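Because the endpoints follow the OpenAI request shape, a client can target /v1/chat/completions with a standard payload. The sketch below only builds and prepares such a request; the base URL, port, and model name are assumptions, and actually sending it requires a running OpenArc server:

```python
import json
import urllib.request

# Assumed local server address and model name; adjust to your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "example-ov-model"  # placeholder model identifier

def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style /v1/chat/completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = build_chat_request("What hardware am I running on?")
req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment with an OpenArc server running
```

The same request shape works against /v1/audio/transcriptions and the other endpoints, which is what lets existing OpenAI-client tooling point at OpenArc unchanged.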

Quick Start & Requirements

Installation involves cloning the repository and using uv for dependency management (uv sync). Key dependencies include optimum-intel[openvino] and openvino-genai (nightly wheels recommended). OpenVINO requires device-specific drivers; consult the OpenVINO System Requirements page. An OPENARC_API_KEY environment variable is necessary. The CLI tool openarc is used for managing models (add, list, load, serve) and benchmarking (bench). Links to extensive documentation and OpenVINO resources are provided.
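Since the server expects an OPENARC_API_KEY, clients typically pass it as a bearer token. A minimal sketch of reading the variable and forming request headers, assuming the usual OpenAI-style Authorization convention:

```python
import os

# For illustration only: fall back to a placeholder if no key is configured.
os.environ.setdefault("OPENARC_API_KEY", "demo-key")

def auth_headers() -> dict:
    """Read the required API key and build OpenAI-style request headers."""
    key = os.environ["OPENARC_API_KEY"]
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }

headers = auth_headers()
```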

Highlighted Details

  • Supports a broad range of models: LLMs, VLMs, Whisper, Kokoro-TTS, Embeddings, and Rerankers.
  • Provides OpenAI compatible endpoints for common AI tasks.
  • Enables advanced hardware acceleration: Multi-GPU, NPU device support, pipeline parallelism, and CPU offload/hybrid modes.
  • Features llama-bench style benchmarking with SQLite metrics for performance analysis.
  • Includes support for OpenAI compatible tool calls with streaming and parallel parsing.
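As an illustration of the tool-call support, a chat request can attach OpenAI-style tool definitions; the function name and schema below are invented for this example:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_device_temp",  # made-up example function
            "description": "Read the temperature of an Intel device.",
            "parameters": {
                "type": "object",
                "properties": {
                    "device": {"type": "string", "enum": ["CPU", "GPU", "NPU"]},
                },
                "required": ["device"],
            },
        },
    }
]

payload = {
    "model": "example-model",  # placeholder
    "messages": [{"role": "user", "content": "How hot is the GPU?"}],
    "tools": tools,
    "stream": True,  # tool-call parsing works with streaming per the summary
}
body = json.dumps(payload)
```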

Maintenance & Community

OpenArc is under active development. A Discord community has formed around the project, offering support and collaboration. Contributors are asked to open a GitHub issue for discussion before submitting a pull request.

Licensing & Compatibility

The provided README does not explicitly state the project's license. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The project is under active development, so breaking changes and evolving features are likely. Device-specific drivers are required for OpenVINO. Advanced configurations such as VLM pipeline usage may require consulting the source code due to limited documentation. Tensor parallelism requires multi-socket CPUs.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 4
  • Star History: 19 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI
0.1% · 4k stars
AI inference pipeline framework
Created 1 year ago · Updated 6 days ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai
0.8% · 16k stars
Framework for LLM inference optimization experimentation
Created 1 year ago · Updated 3 hours ago
Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 14 more.

exo by exo-explore
0.3% · 33k stars
AI cluster for running models on diverse devices
Created 1 year ago · Updated 4 weeks ago