mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

created 2 years ago · 21,044 stars · Top 2.1% on sourcepulse

Project Summary

MLC LLM is a universal deployment engine and compiler for large language models, targeting developers and researchers seeking to optimize and deploy AI models natively across diverse hardware platforms. It provides a unified, high-performance inference engine (MLCEngine) with an OpenAI-compatible API, enabling efficient LLM execution on everything from servers to mobile devices and web browsers.
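
A minimal sketch of that engine API in Python, following the project's quick-start pattern; the model id is an example, so treat the details as assumptions to verify against the docs:

```python
# Minimal sketch of chatting through MLCEngine (quick-start style).
# The model id is an example; MLC-format weights are fetched and
# compiled for the local device when the engine loads.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style chat completion, streamed chunk by chunk.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()  # release engine resources when done
```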

How It Works

MLC LLM leverages a machine learning compiler stack, including TensorIR and MetaSchedule, to automatically optimize and compile LLMs for specific hardware backends. By generating code tailored to each target, it delivers high-performance inference while abstracting away hardware complexity and keeping API behavior consistent across supported platforms.
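
For the explicit ahead-of-time path (as opposed to letting MLCEngine compile on first load), the docs describe a convert/configure/compile pipeline. The sketch below drives it from Python via the mlc_llm CLI; the paths, quantization code, conversation template, and flags are all assumptions to check against the documentation:

```python
# Hedged sketch of the documented convert -> gen_config -> compile flow,
# driven through subprocess; every path and flag here is an assumption.
import subprocess

src = "./dist/models/Llama-3-8B-Instruct"       # assumed HF weights directory
out = "./dist/Llama-3-8B-Instruct-q4f16_1-MLC"  # assumed output directory

# 1. Quantize and convert the weights into MLC format.
subprocess.run(["mlc_llm", "convert_weight", src,
                "--quantization", "q4f16_1", "-o", out], check=True)

# 2. Generate the runtime chat config (conversation template is assumed).
subprocess.run(["mlc_llm", "gen_config", src,
                "--quantization", "q4f16_1",
                "--conv-template", "llama-3", "-o", out], check=True)

# 3. Compile a hardware-specific model library (CUDA chosen as an example).
subprocess.run(["mlc_llm", "compile", f"{out}/mlc-chat-config.json",
                "--device", "cuda", "-o", f"{out}/model-cuda.so"], check=True)
```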

Quick Start & Requirements

  • Installation: follow the Quick Start Guide in the documentation linked below.
  • Prerequisites: Python and a C++ compiler. Specific hardware backends may require additional drivers or SDKs (e.g., CUDA for NVIDIA GPUs, ROCm for AMD GPUs).
  • Supported Platforms: Linux, macOS, Windows, Web Browsers (WebGPU/WASM), iOS, Android.
  • Documentation: https://llm.mlc.ai/docs/

Highlighted Details

  • Supports a wide range of hardware including NVIDIA, AMD, Apple GPUs, and Intel GPUs.
  • Enables deployment on web browsers via WebGPU and WASM.
  • Provides an OpenAI-compatible REST server API (see the sketch after this list).
  • Built upon foundational technologies like TVM, TensorIR, and MetaSchedule.
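
To illustrate the REST server bullet above: once a server is running locally (e.g. via `mlc_llm serve <model>`), any OpenAI-style HTTP client can talk to it. A minimal sketch, assuming the default local address and an example model id, neither of which is confirmed by this page:

```python
# Minimal sketch of querying the OpenAI-compatible REST endpoint;
# host, port, and model id are assumptions for a default local server.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```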

Maintenance & Community

  • Active development by the MLC team.
  • Community support available via Discord.
  • Related project: WebLLM.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The project is under active development, and while it supports numerous platforms, specific model compilation or inference performance may vary. Users should consult the documentation for the latest compatibility and performance benchmarks.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 19
  • Star History: 594 stars in the last 90 days

Explore Similar Projects

LightLLM by ModelTC
Python framework for LLM inference and serving
Top 0.7% on sourcepulse · 3k stars · created 2 years ago · updated 11 hours ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.
gpustack by gpustack
GPU cluster manager for AI model deployment
Top 1.6% on sourcepulse · 3k stars · created 1 year ago · updated 2 days ago
Starred by Jeff Hammerbacher (cofounder of Cloudera), Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake), and 2 more.
TensorRT-LLM by NVIDIA
LLM inference optimization SDK for NVIDIA GPUs
Top 0.6% on sourcepulse · 11k stars · created 1 year ago · updated 14 hours ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.
llama.cpp by ggml-org
C/C++ library for local LLM inference
Top 0.4% on sourcepulse · 84k stars · created 2 years ago · updated 10 hours ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.