mlx-vlm  by Blaizzy

Vision-language model package for inference/fine-tuning on Macs

Created 1 year ago
1,639 stars

Top 25.7% on SourcePulse

Project Summary

MLX-VLM is a Python package for running and fine-tuning Vision Language Models (VLMs) on Apple Silicon Macs using the MLX framework. It targets ML engineers and researchers who want to run inference on, and fine-tune, VLMs locally on their own Macs.

How It Works

The package builds on MLX, Apple's array framework for machine learning, to run VLM operations efficiently on Apple Silicon hardware. It loads models from Hugging Face, processes image and text inputs, and generates text outputs. For interactive use it offers both a command-line interface (CLI) and a Gradio-based chat UI.
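As a rough illustration of the CLI path described above, the sketch below assembles an argument list for the package's generation module. The module name `mlx_vlm.generate` and the flags `--model`, `--image`, `--prompt`, and `--max-tokens` are taken from the project's README as best recalled and may differ between releases, so check `python -m mlx_vlm.generate --help` before relying on them. The checkpoint name and the image path are placeholders chosen for the example, not values from this page.

```python
import shlex


def build_generate_command(model, image, prompt, max_tokens=100):
    """Return an argv list for the mlx_vlm.generate CLI module.

    Flag names are assumptions based on the project's README;
    verify them with --help for the installed version.
    """
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", model,
        "--image", image,
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
    ]


cmd = build_generate_command(
    model="mlx-community/Qwen2-VL-2B-Instruct-4bit",  # example checkpoint (assumption)
    image="photo.jpg",                                # hypothetical local file
    prompt="Describe this image.",
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

On an Apple Silicon Mac with the package installed, the same list can be passed to `subprocess.run(cmd, check=True)`. Building the list first keeps the prompt safely quoted and makes the command easy to inspect before anything is downloaded or executed.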

Quick Start & Requirements

Highlighted Details

  • Supports multi-image analysis and video understanding with select models.
  • Enables fine-tuning using LoRA and QLoRA.
  • Includes a FastAPI server for dynamic model loading and inference.
  • Offers a Gradio-based chat UI for interactive VLM use.
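
To make the LoRA item above more concrete: LoRA freezes a weight matrix of shape d_out x d_in and trains only two low-rank factors, B (d_out x r) and A (r x d_in), and QLoRA does the same while keeping the frozen base weights quantized. The sketch below is plain arithmetic on the parameter counts, not mlx-vlm's implementation, and the layer size and rank are hypothetical values picked for illustration.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Trainable parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical projection layer and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
fraction = lora / full

print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {fraction:.4%}")
```

For this example layer the adapter trains well under 1% of the parameters that full fine-tuning would update, which is the main reason the technique suits memory-constrained local hardware.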

Maintenance & Community

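The project is maintained by Blaizzy; the MLX framework it builds on comes from the ml-explore organization. The health metrics below (a last commit 2 weeks ago, plus 17 pull requests and 29 issues in the past 30 days) point to active development and community involvement.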

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

The project is specifically designed for Apple Silicon Macs, limiting its use on other hardware architectures. Support for specific VLM features (like multi-image or video) is model-dependent.
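
Given the hardware restriction above, a script can check the platform before attempting to import the package. The following is a minimal sketch using only the standard library; the helper name `is_apple_silicon` is hypothetical, and it assumes a native arm64 Python build (an x86_64 Python running under Rosetta reports `x86_64` and would fail this check even on Apple Silicon hardware).

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True when running natively on macOS with an arm64 CPU.

    The arguments default to the current platform and exist so the
    check can be exercised with explicit values.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if is_apple_silicon():
    print("Apple Silicon detected; MLX-based packages should be usable.")
else:
    print("Not a native Apple Silicon environment; skipping MLX-specific code.")
```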

Health Check
Last Commit: 2 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 17
Issues (30d): 29
Star History: 65 stars in the last 30 days

Explore Similar Projects

CogVLM by zai-org
VLM for image understanding and multi-turn dialogue
0.0%, 7k stars
Created 2 years ago, updated 1 year ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), Elvis Saravia (Founder of DAIR.AI), and 7 more.

MiniGPT-4 by Vision-CAIR
Vision-language model for multi-task learning
0.0%, 26k stars
Created 2 years ago, updated 1 year ago
Starred by Pawel Garbacki (Cofounder of Fireworks AI), Forrest Iandola (Author of SqueezeNet; Research Scientist at Meta), and 17 more.