mlx-vlm by Blaizzy

Vision-language model package for inference/fine-tuning on Macs

Created 1 year ago

2,170 stars

Top 20.3% on SourcePulse

7 Experts Love This Project

Lysandre Debut

Chief Open-Source Officer at Hugging Face

Omar Sanseviero

DevRel at Google DeepMind

and 3 more!

Project Summary

MLX-VLM is a Python package for running inference on and fine-tuning Vision Language Models (VLMs) on Apple Silicon Macs with the MLX framework. It is aimed at ML engineers and researchers who want to run and adapt VLMs locally on their own machines.

How It Works

The package is built on MLX, Apple's array framework for machine learning on Apple Silicon. It loads models from Hugging Face, processes image and text inputs, and generates text outputs. It can be used from a Python script, through a command-line interface (CLI), or through a Gradio-based chat UI.
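The load, process, and generate flow described above can be sketched in Python roughly as follows. This is a minimal sketch, not the project's canonical example: the function name describe_images is hypothetical, the imports follow the package's script-usage pattern, and exact signatures and return types may differ between releases, so check the installed version before relying on it.

```python
from typing import List


def describe_images(model_path: str, images: List[str], prompt: str) -> str:
    """Load a VLM, format the prompt for it, and return the generated text."""
    # Imported lazily so this file can still be imported where mlx-vlm is absent.
    from mlx_vlm import generate, load
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    # Fetches the weights from Hugging Face on first use, then reuses the cache.
    model, processor = load(model_path)
    config = load_config(model_path)

    # Wrap the prompt in the model's chat markup, with one slot per image.
    formatted = apply_chat_template(
        processor, config, prompt, num_images=len(images)
    )

    # Keyword arguments avoid relying on positional order (assumed to vary
    # between releases).
    result = generate(
        model, processor, prompt=formatted, image=images, verbose=False
    )

    # Assumption: some releases return a plain string, others a result object
    # with a .text field; handle both.
    return getattr(result, "text", result)
```

On an Apple Silicon Mac with the package installed, a call such as describe_images("mlx-community/Qwen2-VL-2B-Instruct-4bit", ["cat.jpg"], "Describe this image.") would download the model and return its answer; the model ID and image path here are only illustrative.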

Quick Start & Requirements

Install command: pip install mlx-vlm
Prerequisites: Apple Silicon Mac.
Links: CLI Usage, Chat UI, Python Script Usage, Server Usage
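Because the only stated prerequisite is an Apple Silicon Mac, it can help to check the host before installing. The sketch below uses only the Python standard library; the helper names is_apple_silicon and check_host are hypothetical and not part of mlx-vlm.

```python
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """Return True for the macOS + arm64 combination that MLX targets."""
    return system == "Darwin" and machine == "arm64"


def check_host() -> bool:
    """Apply the check to the machine running this interpreter."""
    return is_apple_silicon(platform.system(), platform.machine())


print("Apple Silicon Mac detected:", check_host())
```

A Python interpreter running under Rosetta 2 reports x86_64 even on Apple Silicon hardware, so this check fails there; in that case a native arm64 Python build is needed.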

Highlighted Details

Supports multi-image analysis and video understanding with select models.
Enables fine-tuning using LoRA and QLoRA.
Includes a FastAPI server for dynamic model loading and inference.
Offers a Gradio-based chat UI for interactive VLM use.
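As an illustration of multi-image analysis, the sketch below assembles a command line for the package's generate module with several images. It is a sketch under assumptions: build_generate_command is a hypothetical helper, the flags shown (--model, --max-tokens, --prompt, --image) are taken from the project's CLI usage as commonly documented and may change between releases, and the model ID and file names are placeholders.

```python
import shlex
import sys
from typing import List, Sequence


def build_generate_command(
    model: str,
    prompt: str,
    images: Sequence[str],
    max_tokens: int = 100,
) -> List[str]:
    """Assemble an argv list for `python -m mlx_vlm.generate`."""
    if not images:
        raise ValueError("pass at least one image path or URL")
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--prompt", prompt,
        # Several images follow a single --image flag (assumed CLI behavior).
        "--image", *images,
    ]


cmd = build_generate_command(
    model="mlx-community/Qwen2-VL-2B-Instruct-4bit",  # placeholder model ID
    prompt="Compare these two images.",
    images=["first.jpg", "second.jpg"],  # placeholder file names
)
print(shlex.join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) on an Apple Silicon Mac with mlx-vlm installed would run the generation. Whether more than one image is accepted depends on the chosen model.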

Maintenance & Community

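The repository is maintained under the Blaizzy GitHub account; the MLX framework it builds on comes from Apple's ml-explore organization. The Health Check figures (last commit 2 days ago, 140 new stars in the last 30 days) point to active development.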

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

The project is specifically designed for Apple Silicon Macs, limiting its use on other hardware architectures. Support for specific VLM features (like multi-image or video) is model-dependent.

Health Check

Last Commit

2 days ago

Responsiveness

1 day

Star History

140 stars in the last 30 days