mlx-omni-server by madroidmaq

Local inference server for Apple Silicon, using the MLX framework

created 9 months ago
489 stars

Top 64.0% on sourcepulse

Project Summary

MLX Omni Server provides a local inference solution for Apple Silicon Macs, offering OpenAI-compatible API endpoints for various AI tasks. It targets developers and researchers seeking to run models locally, benefiting from enhanced privacy and performance without relying on cloud services.

How It Works

The server is built on Apple's MLX framework, which is optimized for M-series chips, to deliver fast local inference. It exposes OpenAI-compatible REST API endpoints, so existing OpenAI SDK clients can be pointed at it with only a base-URL change. This keeps adoption simple for users already familiar with the OpenAI ecosystem while keeping all AI processing on local hardware.
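
For instance, here is a minimal sketch of calling the server with the official OpenAI Python SDK, assuming the default port of 10240 and a placeholder MLX model identifier (substitute whatever model you actually serve):

    from openai import OpenAI

    # Point the standard OpenAI client at the local server instead of api.openai.com.
    # The API key is unused locally, but the SDK requires a non-empty value.
    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # "mlx-community/Llama-3.2-3B-Instruct-4bit" is an assumed example model name.
    response = client.chat.completions.create(
        model="mlx-community/Llama-3.2-3B-Instruct-4bit",
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)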

Quick Start & Requirements

  • Install: pip install mlx-omni-server
  • Prerequisites: Apple Silicon (M1/M2/M3/M4) Mac.
  • Run: mlx-omni-server (default port 10240); a quick connectivity check is sketched after this list.
  • Docs: examples
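
Once the server is running, one quick way to confirm it is reachable is to list the available models; this sketch assumes the server exposes the standard OpenAI /v1/models endpoint:

    from openai import OpenAI

    # No real key is needed for a local server; the SDK just requires some value.
    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # Print the IDs of whatever models the local server reports.
    for model in client.models.list().data:
        print(model.id)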

Highlighted Details

  • OpenAI-compatible API endpoints for Chat Completions, Text-to-Speech, Speech-to-Text, and Image Generation.
  • Supports tools, function calling, structured output, and log probabilities for chat completions (a tool-calling sketch follows this list).
  • Optimized for Apple Silicon (M1/M2/M3/M4) via the MLX framework.
  • Privacy-first design with all processing occurring locally.
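
As a rough illustration of the tool-calling support, the sketch below passes an OpenAI-style tool definition to the chat completions endpoint; the tool name, schema, and model identifier are assumptions made up for this example, and actual behavior depends on the model being served:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:10240/v1", api_key="not-needed")

    # Hypothetical tool definition in the standard OpenAI "tools" format.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # assumed model name
        messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
        tools=tools,
    )

    # If the model chose to call the tool, its arguments arrive as a JSON string.
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)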

Maintenance & Community

  • The project is open-source and welcomes contributions.
  • Development guide available for contributors.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source applications.

Limitations & Caveats

The project is an independent implementation and is not affiliated with OpenAI or Apple. Model compatibility and performance vary depending on the specific model and hardware.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 4
  • Star History: 148 stars in the last 90 days
