mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

created 2 years ago · 21,044 stars · Top 2.1% on sourcepulse

Project Summary

MLC LLM is a universal deployment engine and compiler for large language models, targeting developers and researchers seeking to optimize and deploy AI models natively across diverse hardware platforms. It provides a unified, high-performance inference engine (MLCEngine) with an OpenAI-compatible API, enabling efficient LLM execution on everything from servers to mobile devices and web browsers.
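
A minimal sketch of that engine API in Python, following the project's quick-start pattern; the model id is an example, so treat the details as assumptions to verify against the docs:

```python
# Minimal sketch of chatting through MLCEngine (quick-start style).
# The model id is an example; MLC-format weights are fetched and
# compiled for the local device when the engine loads.
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style chat completion, streamed chunk by chunk.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()  # release engine resources when done
```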

How It Works

MLC LLM leverages a machine learning compiler stack, including TensorIR and MetaSchedule, to automatically optimize and compile LLMs for specific hardware backends. By generating code tailored to each target, it delivers high-performance inference while abstracting away hardware complexity and keeping API behavior consistent across supported platforms.
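
For the explicit ahead-of-time path (as opposed to letting MLCEngine compile on first load), the docs describe a convert/configure/compile pipeline. The sketch below drives it from Python via the mlc_llm CLI; the paths, quantization code, conversation template, and flags are all assumptions to check against the documentation:

```python
# Hedged sketch of the documented convert -> gen_config -> compile flow,
# driven through subprocess; every path and flag here is an assumption.
import subprocess

src = "./dist/models/Llama-3-8B-Instruct"       # assumed HF weights directory
out = "./dist/Llama-3-8B-Instruct-q4f16_1-MLC"  # assumed output directory

# 1. Quantize and convert the weights into MLC format.
subprocess.run(["mlc_llm", "convert_weight", src,
                "--quantization", "q4f16_1", "-o", out], check=True)

# 2. Generate the runtime chat config (conversation template is assumed).
subprocess.run(["mlc_llm", "gen_config", src,
                "--quantization", "q4f16_1",
                "--conv-template", "llama-3", "-o", out], check=True)

# 3. Compile a hardware-specific model library (CUDA chosen as an example).
subprocess.run(["mlc_llm", "compile", f"{out}/mlc-chat-config.json",
                "--device", "cuda", "-o", f"{out}/model-cuda.so"], check=True)
```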

Quick Start & Requirements

  • Installation: follow the Quick Start Guide in the documentation linked below.
  • Prerequisites: Python and a C++ compiler. Specific hardware backends may require additional drivers or SDKs (e.g., CUDA for NVIDIA GPUs, ROCm for AMD GPUs).
  • Supported Platforms: Linux, macOS, Windows, Web Browsers (WebGPU/WASM), iOS, Android.
  • Documentation: https://llm.mlc.ai/docs/

Highlighted Details

  • Supports a wide range of hardware including NVIDIA, AMD, Apple GPUs, and Intel GPUs.
  • Enables deployment on web browsers via WebGPU and WASM.
  • Provides an OpenAI-compatible REST server API (see the sketch after this list).
  • Built upon foundational technologies like TVM, TensorIR, and MetaSchedule.
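
To illustrate the REST server bullet above: once a server is running locally (e.g. via `mlc_llm serve <model>`), any OpenAI-style HTTP client can talk to it. A minimal sketch, assuming the default local address and an example model id, neither of which is confirmed by this page:

```python
# Minimal sketch of querying the OpenAI-compatible REST endpoint;
# host, port, and model id are assumptions for a default local server.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```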

Maintenance & Community

  • Active development by the MLC team.
  • Community support available via Discord.
  • Related project: WebLLM.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The project is under active development, and while it supports numerous platforms, specific model compilation or inference performance may vary. Users should consult the documentation for the latest compatibility and performance benchmarks.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 19
  • Star History: 594 stars in the last 90 days

Explore Similar Projects

LightLLM by ModelTC
Python framework for LLM inference and serving
Top 0.7% on sourcepulse · 3k stars · created 2 years ago · updated 11 hours ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Philipp Schmid (DevRel at Google DeepMind), and 2 more.
gpustack by gpustack
GPU cluster manager for AI model deployment
Top 1.6% on sourcepulse · 3k stars · created 1 year ago · updated 2 days ago
Starred by Jeff Hammerbacher (cofounder of Cloudera), Stas Bekman (author of the Machine Learning Engineering Open Book; research engineer at Snowflake), and 2 more.
TensorRT-LLM by NVIDIA
LLM inference optimization SDK for NVIDIA GPUs
Top 0.6% on sourcepulse · 11k stars · created 1 year ago · updated 14 hours ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.
llama.cpp by ggml-org
C/C++ library for local LLM inference
Top 0.4% on sourcepulse · 84k stars · created 2 years ago · updated 10 hours ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.