mlx-llm by riccardomusmeci

LLM tools/apps for Apple Silicon using MLX

Created 1 year ago · 451 stars · Top 67.8% on sourcepulse

Project Summary

This repository provides a Python library for running Large Language Models (LLMs) on Apple Silicon via Apple's MLX framework, enabling real-time, on-device inference and LLM-powered applications. It targets developers and researchers working on Apple hardware who need efficient local LLM deployment.

How It Works

The library leverages Apple's MLX framework, which is designed for efficient tensor computation on Apple Silicon. It offers a streamlined API for loading pre-trained models from HuggingFace, quantizing them for a reduced memory footprint and faster inference, and extracting embeddings. The architecture supports direct integration with MLX's array operations for custom model manipulation and fine-tuning.
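
As a concrete sketch of that load-and-embed flow: the snippet below loads a pretrained model and extracts token embeddings. The import path mlx_llm.model and the names create_model, create_tokenizer, and embed are assumptions based on the description above, not confirmed API; consult the repository for the exact entry points.

    # Hedged sketch of the load -> embed flow described above.
    # NOTE: mlx_llm.model, create_model, create_tokenizer, and
    # model.embed are assumed names; check the repo for the real API.
    import mlx.core as mx
    from mlx_llm.model import create_model, create_tokenizer

    model = create_model("TinyLlama-1.1B-Chat-v1.0")        # weights pulled from HuggingFace
    tokenizer = create_tokenizer("TinyLlama-1.1B-Chat-v1.0")

    tokens = mx.array([tokenizer.encode("Hello from Apple Silicon!")])
    embeddings = model.embed(tokens)                        # assumed embedding hook
    print(embeddings.shape)                                 # (1, seq_len, hidden_dim)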

Quick Start & Requirements
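
The source page leaves this section empty. As a hedged sketch: MLX runs only on Apple Silicon, so an M-series Mac with a recent Python is required. Assuming the package is published on PyPI as mlx-llm, installation would be pip install mlx-llm; the entry point below is likewise an assumption rather than confirmed API.

    # Hedged quick-start sketch; the import path, create_model, and the
    # model identifier are assumed names, not confirmed from the repo.
    from mlx_llm.model import create_model

    model = create_model("TinyLlama-1.1B-Chat-v1.0")  # downloads weights on first use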

Highlighted Details

  • Supports a wide range of LLM families including LLaMA, Mistral, Phi3, Gemma, and OpenELM.
  • Enables 4-bit quantization for a significantly reduced memory footprint and faster inference (see the sketch after this list).
  • Provides utilities for extracting model embeddings.
  • Includes a chat interface for interactive LLM conversations.
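
As a sketch of the 4-bit path referenced above: MLX itself ships mlx.nn.quantize, which swaps a module's linear layers for quantized equivalents in place. Whether mlx-llm wraps this helper or provides its own is not confirmed here, and the mlx_llm import below is an assumption.

    # 4-bit quantization sketch using MLX's stock helper.
    # mlx.nn.quantize exists in MLX; the mlx_llm import is assumed.
    import mlx.nn as nn
    from mlx_llm.model import create_model

    model = create_model("Mistral-7B-Instruct-v0.2")  # assumed model identifier
    nn.quantize(model, group_size=64, bits=4)         # quantize weights in place

group_size=64 and bits=4 are MLX's defaults; 4-bit weights take roughly a quarter of the memory of fp16 weights, at the cost of a small accuracy hit.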

Maintenance & Community

  • Maintained by riccardomusmeci.
  • A contact email is provided for questions.

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The OpenELM chat mode is noted as broken, with a fix under active development. The README does not state a license, which may impact commercial adoption.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
