mlx-lm by ml-explore

Python package for LLM text generation and fine-tuning on Apple silicon

created 4 months ago
1,469 stars

Top 28.4% on sourcepulse

Project Summary

This package enables running and fine-tuning large language models (LLMs) on Apple silicon using the MLX framework. It targets developers and researchers who want to use Apple hardware for efficient LLM experimentation, offering integration with the Hugging Face Hub for model access along with built-in quantization.

How It Works

MLX LM is built on MLX, a GPU-accelerated array framework designed for Apple silicon. It provides a Python API and command-line tools for loading models, generating text, and fine-tuning. Key features include model quantization (e.g., 4-bit) to reduce memory footprint and speed up inference, efficient long-context handling via rotating KV caches and prompt caching, and distributed inference and fine-tuning via mx.distributed.
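
A minimal generation sketch using the load and generate entry points of the Python API; the model repo below is one of many mlx-community conversions on the Hugging Face Hub, chosen here for illustration:

    from mlx_lm import load, generate

    # Downloads the model from the Hugging Face Hub on first use.
    model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

    # Apply the model's chat template to format the prompt.
    messages = [{"role": "user", "content": "Explain KV caching in one paragraph."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # verbose=True also prints generation statistics such as tokens per second.
    text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
    print(text)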

Quick Start & Requirements

  • Install via pip (pip install mlx-lm) or conda (conda install -c conda-forge mlx-lm).
  • Requires macOS 13.0 or later; macOS 15.0 or later adds optimized handling of large models (wired memory).
  • Official documentation and examples are available in the GitHub repository.

Highlighted Details

  • Seamless integration with Hugging Face Hub for thousands of LLMs.
  • Supports low-rank and full model fine-tuning, including quantized models.
  • Enables model quantization and uploading to the Hugging Face Hub (see the sketch after this list).
  • Handles long prompts and generations efficiently via rotating KV caches and prompt caching.
  • Supports distributed inference and fine-tuning.
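
A sketch of the quantize-and-upload flow via the convert entry point; both repo names are illustrative, and uploading requires a Hugging Face account with write access to the destination repo:

    from mlx_lm import convert

    # Source model on the Hugging Face Hub (illustrative).
    repo = "mistralai/Mistral-7B-Instruct-v0.3"

    # Destination repo for the quantized weights (illustrative).
    upload_repo = "mlx-community/My-Mistral-7B-Instruct-v0.3-4bit"

    # quantize=True applies 4-bit quantization by default; the result
    # is written locally and pushed to upload_repo.
    convert(repo, quantize=True, upload_repo=upload_repo)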

Maintenance & Community

The project is maintained by the ml-explore organization. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Performance with very large models may degrade if they exceed available RAM, though macOS 15+ adds optimizations (wired memory) for this case. Some models (e.g., Qwen, plamo) require trust_remote_code=True and may need an explicit eos_token; since trust_remote_code executes code shipped with the model repository, it should only be enabled for repositories you trust.
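
For such models, tokenizer options can be passed through load; a minimal sketch, assuming the Qwen/Qwen-7B repo and its <|endoftext|> end-of-sequence token as an example:

    from mlx_lm import load

    # trust_remote_code runs code shipped with the model repository,
    # so enable it only for sources you trust.
    model, tokenizer = load(
        "Qwen/Qwen-7B",
        tokenizer_config={"trust_remote_code": True, "eos_token": "<|endoftext|>"},
    )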

Health Check

  • Last commit: 18 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 48
  • Issues (30d): 30
  • Star History: 922 stars in the last 90 days
