FastMLX provides a production-ready, high-performance API for hosting MLX models, including Vision Language Models (VLMs) and Language Models (LMs). It targets developers needing to integrate MLX models into existing applications, offering an OpenAI-compatible interface for seamless adoption and enhanced throughput via dynamic model loading and multi-worker support.
How It Works
FastMLX leverages FastAPI and Uvicorn to serve MLX models. It supports dynamic model loading, allowing models to be loaded on-the-fly or kept pre-loaded for performance. The API is designed to be OpenAI-compatible, mimicking endpoints like /v1/chat/completions. It includes robust error handling and efficient resource management, with options for multi-worker configurations to scale processing power.
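To illustrate the OpenAI-compatible surface, the sketch below builds and sends a chat request to a locally running FastMLX server using only the standard library. The port, model identifier, and request fields are assumptions for illustration, not defaults confirmed by the README.

```python
import json
import urllib.request


def build_chat_payload(prompt: str, model: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def chat_completion(prompt: str,
                    model: str = "mlx-community/Llama-3-8B-Instruct-4bit",  # hypothetical model id
                    base_url: str = "http://localhost:8000") -> dict:       # assumed local port
    """POST the payload to the OpenAI-compatible endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Because the interface mirrors OpenAI's, existing OpenAI client libraries can typically be pointed at the local base URL instead of hand-rolling requests like this.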
Quick Start & Requirements
Install: pip install fastmlx
Run (development only): fastmlx or uvicorn fastmlx:app --reload --workers 0
Run (production): fastmlx --workers <N> or uvicorn fastmlx:app --workers <N>, where N is an integer or a fraction of CPU cores.
Highlighted Details
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
The --reload flag is not compatible with multi-worker setups. Streaming for function calling is under development and not yet fully OpenAI-compliant.
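Since the function-calling interface follows the OpenAI tools schema (with streaming support for it still in progress), a request advertising a tool might look like the sketch below. The get_weather tool, its parameter schema, and the model name are hypothetical examples, not part of FastMLX itself.

```python
import json


def build_tool_call_payload(prompt: str, model: str) -> dict:
    """Assemble an OpenAI-style chat request that advertises one tool.
    The get_weather tool and its JSON-schema parameters are hypothetical."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        # Given the streaming caveat above, keep stream disabled for
        # tool calls until the behavior is fully OpenAI-compliant.
        "stream": False,
    }


print(json.dumps(build_tool_call_payload("Weather in Paris?", "my-model"), indent=2))
```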