FastMLX provides a production-ready, high-performance API for hosting MLX models, including Vision Language Models (VLMs) and Language Models (LMs). It targets developers needing to integrate MLX models into existing applications, offering an OpenAI-compatible interface for seamless adoption and enhanced throughput via dynamic model loading and multi-worker support.
How It Works
FastMLX leverages FastAPI and Uvicorn to serve MLX models. It supports dynamic model loading, allowing models to be loaded on the fly or kept pre-loaded for performance. The API is designed to be OpenAI-compatible, mimicking endpoints such as /v1/chat/completions. It includes robust error handling and efficient resource management, with options for multi-worker configurations to scale processing power.
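Because the endpoint mirrors OpenAI's chat-completions schema, a plain HTTP request is usually enough to talk to the server. The sketch below is illustrative only: it assumes the server is running on localhost:8000, that the model identifier shown (a placeholder MLX-community checkpoint) is available to the server, and that responses follow the usual OpenAI choices/message layout.

```python
# Minimal sketch of a chat request against a locally running FastMLX server.
# Assumptions: default host/port (localhost:8000), the model id below is a
# placeholder, and the response uses the standard OpenAI layout.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mlx-community/gemma-2-9b-it-4bit",  # hypothetical model id
        "messages": [{"role": "user", "content": "What is MLX?"}],
        "max_tokens": 200,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```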
Quick Start & Requirements
Install: pip install fastmlx
Run (development): fastmlx, or uvicorn fastmlx:app --reload --workers 0; the --reload flag is for development only.
Run (production): fastmlx --workers <N>, or uvicorn fastmlx:app --workers <N>, where N is an integer worker count or a fraction of the available CPU cores.
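As a quick check that the server came up, the following sketch queries a model-listing route. It assumes the default localhost:8000 address and an OpenAI-style GET /v1/models endpoint; both may differ depending on the FastMLX version and how the server was launched.

```python
# Minimal smoke test for a locally running FastMLX server.
# Assumptions: default host/port (localhost:8000) and an OpenAI-style
# model-listing endpoint at /v1/models; adjust if your setup differs.
import requests

BASE_URL = "http://localhost:8000"

resp = requests.get(f"{BASE_URL}/v1/models", timeout=10)
resp.raise_for_status()
print("Server is up; models reported:", resp.json())
```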
Highlighted Details
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
Limitations & Caveats
The --reload flag is not compatible with multi-worker setups. Streaming for function calling is under development and not yet fully OpenAI-compliant.
The repository was last updated roughly 4 months ago and development activity appears inactive.