mlx-omni-server by madroidmaq

Local inference server for Apple Silicon, using the MLX framework

created 9 months ago
489 stars

Top 64.0% on sourcepulse

Project Summary

MLX Omni Server provides a local inference solution for Apple Silicon Macs, offering OpenAI-compatible API endpoints for various AI tasks. It targets developers and researchers seeking to run models locally, benefiting from enhanced privacy and performance without relying on cloud services.

How It Works

The server is built on Apple's MLX framework, which is optimized for M-series chips, to deliver fast local inference. It exposes OpenAI-compatible REST API endpoints, so existing OpenAI SDK clients can be pointed at it with only a base-URL change. This keeps adoption simple for users already familiar with the OpenAI ecosystem while keeping all AI processing on local hardware.
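
For instance, here is a minimal sketch of calling the server with the official OpenAI Python SDK, assuming the default port of 10240 and a placeholder MLX model identifier (substitute whatever model you actually serve):

    from openai import OpenAI

    # Point the standard OpenAI client at the local server instead of api.openai.com.
    # The API key is unused locally, but the SDK requires a non-empty value.
    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # "mlx-community/Llama-3.2-3B-Instruct-4bit" is an assumed example model name.
    response = client.chat.completions.create(
        model="mlx-community/Llama-3.2-3B-Instruct-4bit",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)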

Quick Start & Requirements

  • Install: pip install mlx-omni-server
  • Prerequisites: Apple Silicon (M1/M2/M3/M4) Mac.
  • Run: mlx-omni-server (default port 10240); a quick connectivity check is sketched after this list.
  • Docs: examples
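
Once the server is running, one quick way to confirm it is reachable is to list the available models; this sketch assumes the server exposes the standard OpenAI /v1/models endpoint:

    from openai import OpenAI

    # No real key is needed for a local server; the SDK just requires some value.
    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # Print the IDs of whatever models the local server reports.
    for model in client.models.list().data:
        print(model.id)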

Highlighted Details

  • OpenAI-compatible API endpoints for Chat Completions, Text-to-Speech, Speech-to-Text, and Image Generation.
  • Supports tools, function calling, structured output, and log probabilities for chat completions (a tool-calling sketch follows this list).
  • Optimized for Apple Silicon (M1/M2/M3/M4) via the MLX framework.
  • Privacy-first design with all processing occurring locally.
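
As a rough illustration of the tool-calling support, the sketch below passes an OpenAI-style tool definition to the chat completions endpoint; the tool name, schema, and model identifier are assumptions made up for this example, and actual behavior depends on the model being served:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # Hypothetical tool definition in the standard OpenAI "tools" format.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # assumed model name
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tools=tools,
    )

    # If the model chose to call the tool, its arguments arrive as a JSON string.
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)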

Maintenance & Community

  • The project is open-source and welcomes contributions.
  • Development guide available for contributors.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

The project is an independent implementation and is not affiliated with OpenAI or Apple. Model compatibility and performance vary depending on the specific model and hardware.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 4
  • Star History: 148 stars in the last 90 days
