binary-mlc-llm-libs by mlc-ai

Pre-compiled LLM libraries for efficient deployment

Created 2 years ago

281 stars

Top 92.8% on SourcePulse

Project Summary

This repository provides pre-compiled, quantized model libraries for a range of Large Language Models (LLMs), built for efficient deployment across diverse hardware platforms. It targets developers and researchers who want to run LLMs locally with optimized performance and reduced resource requirements.

How It Works

The project names each model library according to a structured convention: {model_name}/{model_name}-{quantization}-{metadata}-{platform}.{suffix}. The metadata field encodes the context window size, sliding window size, and prefill chunk size; values left at their defaults are omitted for brevity. This convention makes it easy to find and select a pre-optimized model variant for a specific hardware target and performance need.
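
The naming convention can be illustrated with a small Python sketch that assembles a library path from its parts. This is not code from the repository: the function name `library_path`, the sample values, and the choice to join metadata tokens with underscores are all assumptions made for illustration, as is the way the metadata segment is dropped entirely when every value is at its default.

```python
from typing import Sequence


def library_path(
    model_name: str,
    quantization: str,
    platform: str,
    suffix: str,
    metadata: Sequence[str] = (),
) -> str:
    """Build {model_name}/{model_name}-{quantization}-{metadata}-{platform}.{suffix}.

    `metadata` holds already-formatted tokens for the non-default settings
    (context window, sliding window, prefill chunk size). The token format
    and the underscore separator are assumptions, not taken from the README.
    """
    parts = [model_name, quantization]
    if metadata:
        # Assumption: the metadata segment is omitted when no token is given.
        parts.append("_".join(metadata))
    parts.append(platform)
    filename = "-".join(parts)
    return f"{model_name}/{filename}.{suffix}"


# Hypothetical values, chosen only to show the shape of the result.
print(library_path("demo-model", "q4", "webgpu", "wasm", ["ctx4k", "cs1k"]))
print(library_path("demo-model", "q4", "webgpu", "wasm"))
```

Passing the metadata as pre-formatted tokens keeps the sketch independent of how the repository actually abbreviates each setting, which the README summary does not spell out.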

Highlighted Details

Supports popular models like Llama-3, Llama-2, Mistral, RedPajama, Phi, and GPT-2.
Includes quantization options for reduced memory footprint and faster inference.
Provides metadata for context window, sliding window, and prefill chunk sizes.
Organized by model, quantization, metadata, and target platform for streamlined selection.

Maintenance & Community

This repository appears to be a component of the broader MLC LLM project. Further community and maintenance details would likely be found on the main MLC LLM repository.

Licensing & Compatibility

The specific license for these binary libraries is not detailed in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

The README does not specify the exact quantization methods used or the supported platforms beyond a general mention. The absence of explicit licensing information is a significant caveat for adoption.
