FastMLX provides a production-ready, high-performance API for hosting MLX models, including Vision Language Models (VLMs) and Language Models (LMs). It targets developers needing to integrate MLX models into existing applications, offering an OpenAI-compatible interface for seamless adoption and enhanced throughput via dynamic model loading and multi-worker support.
How It Works
FastMLX leverages FastAPI and Uvicorn to serve MLX models. It supports dynamic model loading, allowing models to be loaded on the fly or kept pre-loaded for performance. The API is designed to be OpenAI-compatible, mimicking endpoints such as /v1/chat/completions. It includes robust error handling and efficient resource management, with options for multi-worker configurations to scale processing power.
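Because the endpoint mirrors OpenAI's chat-completions schema, a plain HTTP request is usually enough to talk to the server. The sketch below is illustrative only: it assumes the server is running on localhost:8000, that the model identifier shown (a placeholder MLX-community checkpoint) is available to the server, and that responses follow the usual OpenAI choices/message layout.

```python
# Minimal sketch of a chat request against a locally running FastMLX server.
# Assumptions: default host/port (localhost:8000), the model id below is a
# placeholder, and the response uses the standard OpenAI layout.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mlx-community/gemma-2-9b-it-4bit",  # hypothetical model id
        "messages": [{"role": "user", "content": "What is MLX?"}],
        "max_tokens": 200,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```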
Quick Start & Requirements
Install: pip install fastmlx
Run (development): fastmlx, or uvicorn fastmlx:app --reload --workers 0; the --reload flag is for development only.
Run (production): fastmlx --workers <N>, or uvicorn fastmlx:app --workers <N>, where N is an integer worker count or a fraction of the available CPU cores.
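As a quick check that the server came up, the following sketch queries a model-listing route. It assumes the default localhost:8000 address and an OpenAI-style GET /v1/models endpoint; both may differ depending on the FastMLX version and how the server was launched.

```python
# Minimal smoke test for a locally running FastMLX server.
# Assumptions: default host/port (localhost:8000) and an OpenAI-style
# model-listing endpoint at /v1/models; adjust if your setup differs.
import requests

BASE_URL = "http://localhost:8000"

resp = requests.get(f"{BASE_URL}/v1/models", timeout=10)
resp.raise_for_status()
print("Server is up; models reported:", resp.json())
```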
Highlighted Details
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
The --reload flag is not compatible with multi-worker setups. Streaming for function calling is under development and not yet fully OpenAI-compliant.
The repository was last updated roughly 4 months ago and development activity appears inactive.