cubist38: OpenAI-compatible API server for local MLX model inference
Top 98.2% on SourcePulse
This project provides a high-performance, OpenAI-compatible API server for MLX models, enabling developers to run text, vision, audio, and image generation models locally on Apple Silicon hardware. It serves as a drop-in replacement for OpenAI services, offering local control, enhanced privacy, and optimized performance for MLX-based AI workloads. The target audience includes engineers and researchers who need to integrate local ML models into existing applications or experiment with advanced AI capabilities without relying on external cloud APIs.
How It Works
The server is built in Python on the FastAPI framework and exposes OpenAI-compatible endpoints. A core architectural decision for multi-model deployment is to run each model handler in a separate subprocess created with multiprocessing.get_context("spawn"). This isolates each handler's MLX Metal/GPU context and avoids the semaphore leaks that the default fork start method can cause on macOS, ensuring stability and clean resource management. The main FastAPI process proxies requests to and from these dedicated child handler processes.
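The per-model process isolation described above can be sketched as follows. This is a minimal illustration of the spawn-context pattern, not the project's actual handler code: the handler stub simply echoes requests where the real server would load an MLX model, and the Pipe-based proxying is a simplified stand-in for its request routing.

```python
import multiprocessing as mp


def model_handler(conn):
    # In the real server, each handler process would load its MLX model here.
    # A fresh "spawn" interpreter means no Metal/GPU state is inherited from
    # the parent, which is what avoids fork-related semaphore leaks on macOS.
    while True:
        req = conn.recv()
        if req is None:  # shutdown sentinel
            break
        conn.send({"model": req["model"], "echo": req["prompt"]})


def start_handler():
    # Explicitly request the "spawn" start method instead of the default fork.
    ctx = mp.get_context("spawn")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=model_handler, args=(child_conn,), daemon=True)
    proc.start()
    return proc, parent_conn


if __name__ == "__main__":
    # The main process proxies a request to the dedicated child handler.
    proc, conn = start_handler()
    conn.send({"model": "demo", "prompt": "hi"})
    print(conn.recv())
    conn.send(None)  # ask the handler to exit
    proc.join()
```

In a multi-model deployment, the server would keep one such handler process (and pipe) per loaded model and dispatch each incoming request to the matching child.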
Quick Start & Requirements
ffmpeg is required for audio transcription (brew install ffmpeg).
Install the server with pip install mlx-openai-server.
Launch it with mlx-openai-server launch --model-path <path> --model-type <type>. For example: mlx-openai-server launch --model-path mlx-community/Qwen3-Coder-Next-4bit --model-type lm.
Usage examples live in the examples/ directory.
Highlighted Details
Each loaded model is addressed by its model_id.
Maintenance & Community
Contributions are welcomed via pull requests following Conventional Commits. Support and discussions are primarily handled through GitHub Issues and Discussions. The project is built upon the MLX framework and related MLX libraries.
Licensing & Compatibility
The project is released under the MIT License, permitting broad use, including commercial applications. It is designed for compatibility with standard OpenAI client SDKs.
Limitations & Caveats
This project is strictly limited to macOS with M-series Apple Silicon chips. Users may encounter memory issues with large models, which can be mitigated through quantization or reduced context lengths. Metal/semaphore warnings, a known issue with MLX on macOS, are addressed by the multi-handler process isolation.