mlc-llm by mlc-ai

Universal LLM deployment engine with ML compilation

Created 2 years ago
21,557 stars

Top 2.0% on SourcePulse

Project Summary

MLC LLM is a universal deployment engine and compiler for large language models, targeting developers and researchers seeking to optimize and deploy AI models natively across diverse hardware platforms. It provides a unified, high-performance inference engine (MLCEngine) with an OpenAI-compatible API, enabling efficient LLM execution on everything from servers to mobile devices and web browsers.
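
For example, a chat completion runs through the documented MLCEngine Python API as sketched below; the model ID is illustrative, and any model converted to the MLC format works the same way.

    from mlc_llm import MLCEngine

    # Load a pre-converted model (illustrative Hugging Face model ID).
    model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
    engine = MLCEngine(model)

    # Stream a chat completion through the OpenAI-style interface.
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "What is MLC LLM?"}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            print(choice.delta.content, end="", flush=True)
    print()

    engine.terminate()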

How It Works

MLC LLM builds on the Apache TVM machine learning compiler stack, including TensorIR and MetaSchedule, to automatically optimize and compile LLMs for specific hardware backends. By generating code tailored to each backend, it delivers high-performance inference while abstracting away hardware complexities and keeping API behavior consistent across supported platforms.
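
As a concrete sketch of that pipeline, the steps below drive the documented mlc_llm CLI from Python to quantize weights, generate a runtime config, and compile a device-specific model library; all paths, the q4f16_1 quantization scheme, and the CUDA target are illustrative assumptions.

    import subprocess

    model_dir = "./dist/models/Llama-3-8B-Instruct"     # hypothetical local checkout
    out_dir = "./dist/Llama-3-8B-Instruct-q4f16_1-MLC"  # hypothetical output dir

    # 1. Convert and quantize the weights into the MLC format.
    subprocess.run(["mlc_llm", "convert_weight", model_dir,
                    "--quantization", "q4f16_1", "-o", out_dir], check=True)

    # 2. Generate the chat/runtime config consumed by the compiler and engine.
    subprocess.run(["mlc_llm", "gen_config", model_dir,
                    "--quantization", "q4f16_1",
                    "--conv-template", "llama-3", "-o", out_dir], check=True)

    # 3. Compile a hardware-specific model library (CUDA shown here).
    subprocess.run(["mlc_llm", "compile", f"{out_dir}/mlc-chat-config.json",
                    "--device", "cuda", "-o", f"{out_dir}/model-cuda.so"],
                   check=True)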

Quick Start & Requirements

  • Installation: Follow the Quick Start Guide; a minimal post-install check is sketched after this list.
  • Prerequisites: Python and a C++ compiler. Specific hardware backends may require additional drivers (e.g., CUDA for NVIDIA GPUs, ROCm for AMD GPUs).
  • Supported Platforms: Linux, macOS, Windows, web browsers (WebGPU/WASM), iOS, Android.
  • Documentation: https://llm.mlc.ai/docs/
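
Once the wheels from the Quick Start Guide are installed, a quick import check (a minimal sketch; it only confirms the package and engine class are available) can verify the setup:

    # Post-install smoke test: the unified engine class should import cleanly.
    from mlc_llm import MLCEngine

    print(MLCEngine)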

Highlighted Details

  • Supports a wide range of hardware, including NVIDIA, AMD, Apple, and Intel GPUs.
  • Enables deployment in web browsers via WebGPU and WASM.
  • Provides an OpenAI-compatible REST server API (a client sketch follows this list).
  • Built on foundational technologies such as TVM, TensorIR, and MetaSchedule.
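
For instance, once a server is running, any standard OpenAI client can talk to it. The sketch below assumes a locally launched server and an illustrative model ID; the address is a common local default, not a guaranteed one.

    # Assumes a server was started separately, e.g.:
    #   mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
    # Adjust host/port to match your setup.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",  # illustrative
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(response.choices[0].message.content)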

Maintenance & Community

  • Active development by the MLC team.
  • Community support available via Discord.
  • Related project: WebLLM.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The project is under active development; model support, compilation success, and inference performance can vary across platforms and backends. Consult the documentation for the latest compatibility notes and performance benchmarks.

Health Check

  • Last Commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 6
  • Star History: 143 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Hanlin Tang (CTO Neural Networks at Databricks; Cofounder of MosaicML), and 5 more.

dbrx by databricks
Large language model for research/commercial use
Top 0.0% · 3k stars · Created 1 year ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI
AI inference pipeline framework
Top 1.2% · 4k stars · Created 1 year ago · Updated 8 hours ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

mistral.rs by EricLBuehler
LLM inference engine for blazing fast performance
Top 0.3% · 6k stars · Created 1 year ago · Updated 11 hours ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai
Framework for LLM inference optimization experimentation
Top 0.3% · 15k stars · Created 1 year ago · Updated 7 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 20 more.

TensorRT-LLM by NVIDIA
LLM inference optimization SDK for NVIDIA GPUs
Top 0.4% · 12k stars · Created 2 years ago · Updated 4 hours ago