LLM inference tutorial for systems engineers on Apple Silicon
This project provides a tutorial and codebase for serving large language models (LLMs) on Apple Silicon with the MLX framework. It targets systems engineers and researchers who want to understand and optimize LLM inference from the ground up by building serving infrastructure on low-level MLX array APIs rather than high-level libraries.
How It Works
The project implements LLM components such as attention, RoPE, and normalization layers directly with MLX array operations. Working at this level allows deep dives into optimization techniques such as quantized matrix multiplication and efficient KV caching, tailored to MLX's Metal backend and Apple Silicon's unified memory. The goal is to demystify LLM serving by building it from fundamental building blocks.
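As a rough illustration of that style (a minimal sketch, not code from the project), the snippet below writes two such building blocks directly against mlx.core array operations: an RMSNorm layer and single-head scaled dot-product attention with a naive KV cache. All function names, shapes, and parameters here are illustrative assumptions; the project's actual implementations also cover multi-head layouts, RoPE, and quantized weights.

```python
import math
import mlx.core as mx

def rms_norm(x: mx.array, weight: mx.array, eps: float = 1e-5) -> mx.array:
    # Scale by the reciprocal root-mean-square over the feature dimension.
    variance = mx.mean(x * x, axis=-1, keepdims=True)
    return x * mx.rsqrt(variance + eps) * weight

def attention(q, k, v, k_cache=None, v_cache=None):
    # Extend the cache with the new keys/values so earlier tokens are reused,
    # then compute softmax(q k^T / sqrt(d)) v for the new query positions.
    if k_cache is not None:
        k = mx.concatenate([k_cache, k], axis=1)
        v = mx.concatenate([v_cache, v], axis=1)
    scores = q @ mx.transpose(k, (0, 2, 1)) / math.sqrt(q.shape[-1])
    out = mx.softmax(scores, axis=-1) @ v
    return out, k, v

# Toy decode step: one new token attending over an 8-token cached prefix.
d = 64
x = mx.random.normal((1, 1, d))
q = rms_norm(x, mx.ones((d,)))          # normalized query for the new token
k_new = mx.random.normal((1, 1, d))
v_new = mx.random.normal((1, 1, d))
k_cache = mx.random.normal((1, 8, d))
v_cache = mx.random.normal((1, 8, d))
out, k_cache, v_cache = attention(q, k_new, v_new, k_cache, v_cache)
print(out.shape)  # (1, 1, 64)
```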
Quick Start & Requirements
Requires a Mac with Apple Silicon running macOS; the codebase is built on the MLX framework.
Highlighted Details
Maintenance & Community
The project is marked as "WIP" (Work In Progress) with a detailed roadmap indicating ongoing development. A Discord server is available for community engagement and study.
Licensing & Compatibility
No license is explicitly stated in the README. Compatibility is limited to macOS on Apple Silicon hardware.
Limitations & Caveats
The project is in a very early stage ("WIP") with many components and features still under development, as indicated by the roadmap's "🚧" markers. The codebase relies exclusively on MLX, limiting its applicability to users within the Apple Silicon ecosystem.