tiny-llm by skyzh

LLM inference tutorial for systems engineers on Apple Silicon

created 3 months ago
2,818 stars

Top 17.3% on sourcepulse

View on GitHub
Project Summary

This project provides a tutorial and codebase for serving Large Language Models (LLMs) on Apple Silicon using the MLX framework. It targets systems engineers and researchers interested in understanding and optimizing LLM inference from the ground up, by building serving infrastructure using low-level MLX array APIs rather than high-level libraries.

How It Works

The project focuses on implementing LLM components like attention mechanisms, RoPE, and normalization layers directly with MLX array operations. This approach allows for deep dives into optimization techniques, such as quantized matrix multiplications and efficient KV caching, tailored for Apple Silicon's Metal Performance Shaders. The goal is to demystify LLM serving by building it from fundamental building blocks.
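To give a flavor of what "implementing components directly with array operations" means, here is a sketch of rotary position embeddings (RoPE) in NumPy. MLX's array API closely mirrors NumPy's, so the same structure carries over; the function name, shapes, and `base` parameter below are illustrative assumptions, not the tutorial's actual code.

```python
import numpy as np

def rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, n_heads, head_dim)."""
    seq_len, n_heads, head_dim = x.shape
    half = head_dim // 2
    # One frequency per rotated pair of dimensions: base**(-2i/head_dim).
    inv_freq = 1.0 / (base ** (np.arange(half) / half))
    # Rotation angle for each (position, frequency) pair.
    theta = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos = np.cos(theta)[:, None, :]                  # broadcast over heads
    sin = np.sin(theta)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because RoPE is a pure rotation, position 0 is left unchanged and vector norms are preserved, which makes implementations like this easy to sanity-check.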

Quick Start & Requirements

  • Install: MLX is the primary dependency. Installation instructions are available in the MLX documentation.
  • Prerequisites: macOS, Apple Silicon (M1/M2/M3 series).
  • Resources: The project is designed for local development on macOS, making it accessible without specialized hardware beyond an Apple Silicon Mac.
  • Docs: The accompanying book is available at https://skyzh.github.io/tiny-llm/.

Highlighted Details

  • Focuses on building LLM serving infrastructure from scratch using MLX array APIs.
  • Targets optimization techniques specific to Apple Silicon hardware.
  • Covers fundamental LLM components and advanced serving strategies like continuous batching and speculative decoding.
  • Aims to provide a comprehensive understanding of LLM inference serving.
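As background for the speculative-decoding material mentioned above, the idea can be sketched with a toy greedy variant: a cheap draft model proposes several tokens, and the target model verifies them, keeping the longest agreeing prefix. The `target_next`/`draft_next` callables below are hypothetical stand-ins for real models, not part of the tutorial's API.

```python
def greedy_speculative_step(target_next, draft_next, prefix, n_draft):
    """One speculative-decoding step with greedy (deterministic) models.

    target_next / draft_next: fn(token_list) -> next token id.
    Returns the tokens emitted this step (at least 1, at most n_draft + 1).
    """
    # 1. Draft model proposes n_draft tokens autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(n_draft):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Target model checks the same positions (one batched pass in practice).
    accepted, ctx = [], list(prefix)
    for t in draft:
        tt = target_next(ctx)
        if tt == t:
            accepted.append(t)       # draft token verified
            ctx.append(t)
        else:
            accepted.append(tt)      # first mismatch: emit target's token, stop
            break
    else:
        accepted.append(target_next(ctx))  # all accepted: one bonus token free
    return accepted
```

When draft and target agree, each step emits `n_draft + 1` tokens for one target pass over the drafted positions; a mismatch still emits at least one correct token, so output is identical to plain greedy decoding from the target model.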

Maintenance & Community

The project is marked as "WIP" (Work In Progress) with a detailed roadmap indicating ongoing development. A Discord server is available for community engagement and study.

Licensing & Compatibility

The project's license is not explicitly stated in the provided README snippet. Compatibility is limited to macOS on Apple Silicon hardware.

Limitations & Caveats

The project is in a very early stage ("WIP") with many components and features still under development, as indicated by the roadmap's "🚧" markers. The codebase relies exclusively on MLX, limiting its applicability to users within the Apple Silicon ecosystem.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star history: 1,086 stars in the last 90 days

Explore Similar Projects

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), and 21 more.

mlx by ml-explore — Array framework for machine learning on Apple silicon
Top 0.5% on sourcepulse · 22k stars · created 1 year ago · updated 1 day ago