mlc-llm  by mlc-ai

Universal LLM deployment engine with ML compilation

Created 2 years ago
22,430 stars

Top 2.1% on SourcePulse

GitHubView on GitHub
Project Summary

MLC LLM is a universal deployment engine and compiler for large language models, targeting developers and researchers seeking to optimize and deploy AI models natively across diverse hardware platforms. It provides a unified, high-performance inference engine (MLCEngine) with an OpenAI-compatible API, enabling efficient LLM execution on everything from servers to mobile devices and web browsers.

How It Works

MLC LLM leverages a machine learning compiler stack, including TensorIR and MetaSchedule, to automatically optimize and compile LLMs for specific hardware backends. This approach allows for high-performance inference by generating tailored code, abstracting away hardware complexities, and ensuring consistent API behavior across supported platforms.

Quick Start & Requirements

  • Installation: Follow the Quick Start Guide.
  • Prerequisites: Python, C++ compiler. Specific hardware backends may require additional drivers (e.g., CUDA for NVIDIA GPUs, ROCm for AMD GPUs).
  • Supported Platforms: Linux, macOS, Windows, Web Browsers (WebGPU/WASM), iOS, Android.
  • Documentation: https://llm.mlc.ai/docs/

Highlighted Details

  • Supports a wide range of hardware including NVIDIA, AMD, Apple GPUs, and Intel GPUs.
  • Enables deployment on web browsers via WebGPU and WASM.
  • Provides an OpenAI-compatible REST server API.
  • Built upon foundational technologies like TVM, TensorIR, and MetaSchedule.

Maintenance & Community

  • Active development by the MLC team.
  • Community support available via Discord.
  • Related project: WebLLM.

Licensing & Compatibility

  • Licensed under Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The project is under active development, and while it supports numerous platforms, specific model compilation or inference performance may vary. Users should consult the documentation for the latest compatibility and performance benchmarks.

Health Check
Last Commit

1 day ago

Responsiveness

1 day

Pull Requests (30d)
20
Issues (30d)
6
Star History
296 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin Junyang Lin(Core Maintainer at Alibaba Qwen), Hanlin Tang Hanlin Tang(CTO Neural Networks at Databricks; Cofounder of MosaicML), and
5 more.

dbrx by databricks

0%
3k
Large language model for research/commercial use
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo Luis Capelo(Cofounder of Lightning AI), and
3 more.

LitServe by Lightning-AI

0.1%
4k
AI inference pipeline framework
Created 2 years ago
Updated 2 days ago
Feedback? Help us improve.