Phi-3-Vision-MLX  by JosefAlbers

Apple Silicon framework for language and vision models

Created 1 year ago
272 stars

Top 94.8% on SourcePulse

Project Summary

This project provides a Python framework for running Phi-3 language and vision models locally on Apple Silicon Macs, optimized with the MLX framework. It targets developers and researchers who need efficient on-device AI for tasks such as visual question answering, text generation, and agent-based workflows. The project's benchmarks report 61.01 tps for text generation with a quantized model versus 25.02 tps without quantization.

How It Works

Phi-3-MLX leverages the MLX framework, Apple's array computation library, to achieve high performance on Apple Silicon. It integrates the Phi-3-Vision multimodal model and Phi-3-Mini-128K language model, supporting features like batched generation, model quantization for reduced memory footprint, and LoRA fine-tuning. The framework also includes a flexible agent system that can utilize custom toolchains and external APIs for advanced tasks.
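
To make the memory benefit of quantization concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of the framework: the function name weight_memory_gb, the parameter counts (about 3.8 billion for Phi-3-Mini and about 4.2 billion for Phi-3-Vision), and the choice of 4-bit weights are assumptions for illustration. The estimate covers weights only and ignores quantization scales, activations, and the KV cache.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GB (1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed parameter counts; they are not stated in the summary above.
MODELS = {"Phi-3-Mini-128K": 3.8e9, "Phi-3-Vision": 4.2e9}

for name, params in MODELS.items():
    fp16 = weight_memory_gb(params, 16)  # 16-bit weights
    q4 = weight_memory_gb(params, 4)     # 4-bit quantized weights (assumed bit width)
    print(f"{name}: ~{fp16:.1f} GB at 16-bit, ~{q4:.1f} GB at 4-bit")
```

Under these assumptions, 16-bit weights alone come close to the 8GB minimum RAM listed below, which suggests why quantization matters on smaller machines.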

Quick Start & Requirements

  • Install via pip: pip install phi-3-vision-mlx
  • For the latest version, clone the repo and install: git clone https://github.com/JosefAlbers/Phi-3-Vision-MLX.git && cd Phi-3-Vision-MLX && pip install -e .
  • Requires Apple Silicon Mac (M1, M2, or later).
  • Minimum 8GB RAM (16GB+ recommended for optimal performance).
  • Documentation: https://josefalbers.github.io/Phi-3-Vision-MLX/
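
Before installing, it can help to confirm that a machine meets the requirements above. The following is a minimal sketch using only the Python standard library; the helper names is_apple_silicon and ram_tier are hypothetical and are not provided by the package. Note that a Python interpreter running under Rosetta may report x86_64 even on Apple Silicon hardware.

```python
import platform


def is_apple_silicon(system: str, machine: str) -> bool:
    """True when the OS is macOS (Darwin) and the CPU architecture is arm64."""
    return system == "Darwin" and machine == "arm64"


def ram_tier(total_gb: float) -> str:
    """Classify RAM against the 8GB minimum and 16GB+ recommendation."""
    if total_gb < 8:
        return "below minimum"
    if total_gb < 16:
        return "minimum"
    return "recommended"


if __name__ == "__main__":
    ok = is_apple_silicon(platform.system(), platform.machine())
    print("Apple Silicon Mac detected" if ok else "Not an Apple Silicon Mac")
    # The RAM figure is passed in by hand here; 16 is only an example value.
    print("RAM tier for 16GB:", ram_tier(16))
```

Keeping the checks as pure functions of their inputs makes them easy to test on any machine, not only on a Mac.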

Highlighted Details

  • Optimized performance on Apple Silicon via MLX.
  • Supports Phi-3-Vision (multimodal) and Phi-3-Mini-128K (language).
  • Features include quantization, batched generation, LoRA fine-tuning, and an agent system.
  • Benchmarks show roughly a 2.4x speedup for text generation with quantization (61.01 tps on the quantized model vs. 25.02 tps on the unquantized model).
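
The benchmark figures above can be turned into a speedup ratio and a rough wall-clock estimate. This is a small worked-arithmetic sketch: the two throughput numbers come from the text, while the function names and the 500-token response length are assumptions chosen for illustration.

```python
VANILLA_TPS = 25.02    # tokens per second, unquantized (from the benchmark above)
QUANTIZED_TPS = 61.01  # tokens per second, quantized (from the benchmark above)


def speedup(baseline_tps: float, improved_tps: float) -> float:
    """Ratio of improved throughput to baseline throughput."""
    return improved_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time to generate a number of tokens at a steady tokens-per-second rate."""
    return tokens / tps


TOKENS = 500  # hypothetical response length

print(f"Speedup: {speedup(VANILLA_TPS, QUANTIZED_TPS):.2f}x")
print(f"Unquantized: {seconds_for(TOKENS, VANILLA_TPS):.1f} s for {TOKENS} tokens")
print(f"Quantized:   {seconds_for(TOKENS, QUANTIZED_TPS):.1f} s for {TOKENS} tokens")
```

For the assumed 500-token response, this works out to about 20 seconds unquantized versus about 8 seconds quantized, assuming throughput stays constant over the whole generation.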

Maintenance & Community

  • Maintained by JosefAlbers; the last commit was about a year ago and the health check lists the project as inactive.
  • Community resources are not explicitly mentioned in the README.

Licensing & Compatibility

  • Licensed under the MIT License.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The PyPI release may lag behind the repository, so installing directly from the repository is recommended. The published performance figures were measured on an M1 Max with 64GB of RAM; expect different results on other Apple Silicon configurations.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Elvis Saravia (Founder of DAIR.AI).

DeepSeek-VL2 by deepseek-ai

0.1%
5k
MoE vision-language model for multimodal understanding
Created 9 months ago
Updated 6 months ago