OmniInfer by omnimind-ai

Easy, fast, and private LLM & VLM inference for every device

Created 3 weeks ago


363 stars

Top 77.4% on SourcePulse

View on GitHub
Project Summary

OmniInfer is a cross-platform inference engine designed to simplify the deployment and execution of Large Language Models (LLMs) and Vision-Language Models (VLMs) locally, abstracting away complexities such as model compilation and hardware adaptation to enable efficient, minimal-configuration inference. It targets developers and users who need to run models on diverse hardware, from desktops to mobile and edge devices.

How It Works

OmniInfer employs a multi-backend approach, supporting engines such as llama.cpp, mnn, et, mlx, and its own Native engine, so users can switch backends to get the best performance on their hardware. It features hardware-aware adaptation and optimizes for token-generation speed and memory footprint. The engine supports LLMs, VLMs, and World Models, and offers fine-grained control over inference parameters such as context length and GPU offloading.
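
The summary does not document OmniInfer's actual configuration schema, but a hypothetical sketch (key names are illustrative, not taken from the project) shows the kinds of knobs being described:

```json
{
  "backend": "llama.cpp",
  "model": "path/to/model.gguf",
  "context_length": 4096,
  "gpu_offload_layers": 32,
  "threads": 8
}
```

Per the summary, swapping `backend` to another supported engine (e.g. mlx on Apple silicon) is meant to be seamless, with the engine handling hardware adaptation itself.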

Quick Start & Requirements

The README indicates sections for "Getting Started," "Documentation," and "Architecture," but specific installation commands, prerequisites (e.g., Python versions, GPU requirements), or resource footprints are not detailed in the provided text.

Highlighted Details

  • Supports LLM, VLM, and World Models.
  • Offers an OpenAI-compatible API server for easy integration.
  • Achieves fast inference through optimized token generation and hardware-aware adaptations.
  • Provides fine-grained parameter control for inference tuning.
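
Because the server speaks the OpenAI API, any standard HTTP client can talk to a local OmniInfer instance. A minimal sketch using only the Python standard library, assuming the server listens at localhost:8080 (the host, port, and model name are assumptions; they depend on how OmniInfer is launched):

```python
import json
import urllib.request

# Assumed local endpoint; check OmniInfer's docs for the actual host and port.
BASE_URL = "http://localhost:8080/v1"

def build_chat_request(prompt, model="local-model", max_tokens=128):
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt):
    """POST the request to the local server and return the parsed response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # requires a running server
        return json.load(resp)
```

Any off-the-shelf OpenAI client library can be pointed at the same base URL instead of hand-rolling requests.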

Maintenance & Community

The project welcomes contributions and directs users to a "Contributing to OmniInfer" guide for involvement. Specific community channels (like Discord/Slack) or roadmap links are not present in the provided description.

Licensing & Compatibility

This project is licensed under the Apache License 2.0. This license is generally permissive and compatible with commercial use and closed-source applications.

Limitations & Caveats

The provided description focuses on inference capabilities and does not detail support for model training or fine-tuning. Specific performance benchmarks or comparisons against other inference engines are not included in the summary.

Health Check

  • Last Commit: 23 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 14
  • Issues (30d): 2
  • Star History: 542 stars in the last 22 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

torchchat by pytorch

PyTorch-native SDK for local LLM inference across diverse platforms
0.1% · 4k stars · Created 2 years ago · Updated 7 months ago
Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 7 more.

executorch by pytorch

On-device AI framework for PyTorch inference and training
0.6% · 4k stars · Created 4 years ago · Updated 1 day ago