fastmlx  by arcee-ai

API for hosting MLX models

created 1 year ago
319 stars

Top 86.1% on sourcepulse

GitHubView on GitHub
1 Expert Loves This Project
Project Summary

FastMLX provides a production-ready, high-performance API for hosting MLX models, including Vision Language Models (VLMs) and Language Models (LMs). It targets developers needing to integrate MLX models into existing applications, offering an OpenAI-compatible interface for seamless adoption and enhanced throughput via dynamic model loading and multi-worker support.

How It Works

FastMLX leverages FastAPI and Uvicorn to serve MLX models. It supports dynamic model loading, allowing models to be loaded on-the-fly or kept pre-loaded for performance. The API is designed to be OpenAI-compatible, mimicking endpoints like /v1/chat/completions. It includes robust error handling and efficient resource management, with options for multi-worker configurations to scale processing power.

Quick Start & Requirements

  • Install: pip install fastmlx
  • Run: fastmlx or uvicorn fastmlx:app --reload --workers 0 (development only)
  • Multi-worker: fastmlx --workers <N> or uvicorn fastmlx:app --workers <N> (where N is an integer or fraction of CPU cores).
  • Documentation: https://Blaizzy.github.io/fastmlx

Highlighted Details

  • OpenAI-compatible API for chat completions, supporting text and image inputs for VLMs.
  • Dynamic model loading and unloading via API endpoints.
  • Supports multiple MLX model architectures, including Llama 3.1 for function calling.
  • Offers streaming responses for both text and VLM outputs.

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • License: Apache Software License 2.0
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The --reload flag is not compatible with multi-worker setups. Streaming for function calling is under development and not yet fully OpenAI-compliant.

Health Check
Last commit

4 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
24 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.