OmniInfer by omnimind-ai

Easy, fast, and private LLM & VLM inference for every device

Created 3 weeks ago


363 stars

Top 77.4% on SourcePulse

View on GitHub
Project Summary

OmniInfer is a cross-platform inference engine designed to simplify the deployment and execution of Large Language Models (LLMs) and Vision-Language Models (VLMs) locally, abstracting away complexities such as model compilation and hardware adaptation to enable efficient, minimal-configuration inference. It targets developers and users who need to run models on diverse hardware, from desktops to mobile and edge devices.

How It Works

OmniInfer employs a multi-backend approach, supporting engines such as llama.cpp, mnn, et, mlx, and its own Native engine, so users can switch backends to get the best performance on their hardware. It features hardware-aware adaptation and optimizes for token-generation speed and memory footprint. The engine supports LLMs, VLMs, and World Models, and offers fine-grained control over inference parameters such as context length and GPU offloading.
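
The summary does not document OmniInfer's actual configuration schema, but a hypothetical sketch (key names are illustrative, not taken from the project) shows the kinds of knobs being described:

```json
{
  "backend": "llama.cpp",
  "model": "path/to/model.gguf",
  "context_length": 4096,
  "gpu_offload_layers": 32,
  "threads": 8
}
```

Per the summary, swapping `backend` to another supported engine (e.g. mlx on Apple silicon) is meant to be seamless, with the engine handling hardware adaptation itself.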

Quick Start & Requirements

The README indicates sections for "Getting Started," "Documentation," and "Architecture," but specific installation commands, prerequisites (e.g., Python versions, GPU requirements), or resource footprints are not detailed in the provided text.

Highlighted Details

  • Supports LLM, VLM, and World Models.
  • Offers an OpenAI-compatible API server for easy integration.
  • Achieves fast inference through optimized token generation and hardware-aware adaptations.
  • Provides fine-grained parameter control for inference tuning.
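
Because the server speaks the OpenAI API, any standard HTTP client can talk to a local OmniInfer instance. A minimal sketch using only the Python standard library, assuming the server listens at localhost:8080 (the host, port, and model name are assumptions; they depend on how OmniInfer is launched):

```python
import json
import urllib.request

# Assumed local endpoint; check OmniInfer's docs for the actual host and port.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(prompt, model="local-model", max_tokens=128):
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt):
    """POST the request to the local server and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.load(resp)
```

Any off-the-shelf OpenAI client library can be pointed at the same base URL instead of hand-rolling requests.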

Maintenance & Community

The project welcomes contributions and directs users to a "Contributing to OmniInfer" guide for involvement. Specific community channels (like Discord/Slack) or roadmap links are not present in the provided description.

Licensing & Compatibility

This project is licensed under the Apache License 2.0. This license is generally permissive and compatible with commercial use and closed-source applications.

Limitations & Caveats

The provided description focuses on inference capabilities and does not detail support for model training or fine-tuning. Specific performance benchmarks or comparisons against other inference engines are not included in the summary.

Health Check

  • Last Commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 2
  • Star History: 542 stars in the last 22 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

torchchat by pytorch

PyTorch-native SDK for local LLM inference across diverse platforms
0.1% · 4k stars · Created 2 years ago · Updated 7 months ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.

executorch by pytorch

On-device AI framework for PyTorch inference and training
0.6% · 4k stars · Created 4 years ago · Updated 1 day ago