mlx-llm by riccardomusmeci

LLM tools/apps for Apple Silicon using MLX

Created 1 year ago · 451 stars · Top 67.8% on sourcepulse

Project Summary

This repository provides a Python library for running Large Language Models (LLMs) on Apple Silicon via Apple's MLX framework, enabling real-time, on-device inference and LLM-powered applications. It targets developers and researchers working on Apple hardware who need efficient local LLM deployment.

How It Works

The library leverages Apple's MLX framework, which is designed for efficient tensor computation on Apple Silicon. It offers a streamlined API for loading pre-trained models from HuggingFace, quantizing them for a reduced memory footprint and faster inference, and extracting embeddings. The architecture supports direct integration with MLX's array operations for custom model manipulation and fine-tuning.
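
As a concrete sketch of that load-and-embed flow: the snippet below loads a pretrained model and extracts token embeddings. The import path mlx_llm.model and the names create_model, create_tokenizer, and embed are assumptions based on the description above, not confirmed API; consult the repository for the exact entry points.

    # Hedged sketch of the load -> embed flow described above.
    # NOTE: mlx_llm.model, create_model, create_tokenizer, and
    # model.embed are assumed names; check the repo for the real API.
    import mlx.core as mx
    from mlx_llm.model import create_model, create_tokenizer

    model = create_model("TinyLlama-1.1B-Chat-v1.0")        # weights pulled from HuggingFace
    tokenizer = create_tokenizer("TinyLlama-1.1B-Chat-v1.0")

    tokens = mx.array([tokenizer.encode("Hello from Apple Silicon!")])
    embeddings = model.embed(tokens)                        # assumed embedding hook
    print(embeddings.shape)                                 # (1, seq_len, hidden_dim)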

Quick Start & Requirements
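
The source page leaves this section empty. As a hedged sketch: MLX runs only on Apple Silicon, so an M-series Mac with a recent Python is required. Assuming the package is published on PyPI as mlx-llm, installation would be pip install mlx-llm; the entry point below is likewise an assumption rather than confirmed API.

    # Hedged quick-start sketch; the import path, create_model, and the
    # model identifier are assumed names, not confirmed from the repo.
    from mlx_llm.model import create_model

    model = create_model("TinyLlama-1.1B-Chat-v1.0")  # downloads weights on first use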

Highlighted Details

  • Supports a wide range of LLM families including LLaMA, Mistral, Phi3, Gemma, and OpenELM.
  • Enables 4-bit quantization for a significantly reduced memory footprint and faster inference (see the sketch after this list).
  • Provides utilities for extracting model embeddings.
  • Includes a chat interface for interactive LLM conversations.
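
As a sketch of the 4-bit path referenced above: MLX itself ships mlx.nn.quantize, which swaps a module's linear layers for quantized equivalents in place. Whether mlx-llm wraps this helper or provides its own is not confirmed here, and the mlx_llm import below is an assumption.

    # 4-bit quantization sketch using MLX's stock helper.
    # mlx.nn.quantize exists in MLX; the mlx_llm import is assumed.
    import mlx.nn as nn
    from mlx_llm.model import create_model

    model = create_model("Mistral-7B-Instruct-v0.2")  # assumed model identifier
    nn.quantize(model, group_size=64, bits=4)         # quantize weights in place

group_size=64 and bits=4 are MLX's defaults; 4-bit weights take roughly a quarter of the memory of fp16 weights, at the cost of a small accuracy hit.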

Maintenance & Community

  • Maintained by riccardomusmeci.
  • A contact email is provided for questions.

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The OpenELM chat mode is noted as broken, with a fix under active development. The README does not state a license, which may impact commercial adoption.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 11 stars in the last 90 days
