LiteRT-LM by google-ai-edge

C++ library for efficient on-device LLM execution

Created 5 months ago
336 stars

Top 81.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

LiteRT-LM is a C++ library designed for efficient on-device execution of language model pipelines across edge platforms. It targets developers building applications that require local LLM inference, offering cross-platform compatibility and hardware acceleration. The library aims to simplify the deployment of complex LLM workflows, enabling greater flexibility and performance on diverse hardware.

How It Works

LiteRT-LM builds on the LiteRT runtime and exposes a C++ API for composing and managing LLM pipelines. Pipelines can be customized for feature-specific needs, and inference can be accelerated on CPU, GPU, or NPU. Models are converted to a proprietary .litertlm format, which is optimized for efficient loading and execution on target devices.
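
To make this concrete, here is a minimal C++ sketch of the Engine/Session flow described above, using the API names cited elsewhere in this summary (GenerateContent). The include path, namespaces, and exact signatures are assumptions rather than the library's verified surface, and error handling is elided.

  // Sketch only: header path, namespaces, and signatures are assumed.
  #include <iostream>
  #include <utility>

  #include "runtime/engine/engine.h"  // hypothetical include path

  int main() {
    using namespace litert::lm;  // namespace assumed

    // Point the engine at a model packaged in the .litertlm format and
    // choose a backend (CPU here; GPU/NPU where available).
    auto model_assets = ModelAssets::Create("/path/to/model.litertlm");
    auto settings =
        EngineSettings::CreateDefault(std::move(*model_assets), Backend::CPU);

    // The Engine owns the loaded model; a Session holds per-conversation
    // state such as the prompt history and KV cache.
    auto engine = Engine::CreateEngine(std::move(*settings));
    auto session = (*engine)->CreateSession(SessionConfig::CreateDefault());

    // High-level, one-shot inference.
    auto responses =
        (*session)->GenerateContent({InputText("Why is the sky blue?")});
    std::cout << *responses << std::endl;  // response type assumed printable
    return 0;
  }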

Quick Start & Requirements

  • Installation: Clone the repository and use Bazel (v7.6.1) for building. Pre-built binaries are available for Android, macOS, Linux, and Windows.
  • Prerequisites: Git, Bazel (v7.6.1), and platform-specific build tools (e.g., Visual Studio 2022 for Windows, Android NDK for Android development).
  • Models: Requires models in .litertlm format.
  • Documentation: Supported Models and Performance, Build and Run the Command Line Demo, LiteRT-LM API.

Highlighted Details

  • Supports Gemma models with NPU acceleration on Qualcomm and MediaTek chipsets (Early Access Program).
  • Provides benchmarks showing significant performance improvements with GPU and NPU acceleration on mobile devices.
  • Offers both a high-level GenerateContent API and granular RunPrefill/RunDecode C++ APIs for flexible inference control (see the sketch after this list).
  • The .litertlm format is an evolution of .task files, designed for better compression and metadata inclusion.
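
As a sketch of the granular path, the snippet below separates prompt ingestion from token generation using the RunPrefill/RunDecode names cited above; the Session type, InputText wrapper, and signatures are assumptions for illustration.

  #include <iostream>
  #include <string>

  #include "runtime/engine/engine.h"  // hypothetical include path

  // Prefill early (e.g. while the user is still typing), decode on submit.
  // RunPrefill/RunDecode are the names cited above; real signatures may differ.
  void PrefillThenDecode(litert::lm::Session& session,
                         const std::string& prompt) {
    // Prefill: run the prompt through the model to populate the KV cache;
    // no output tokens are produced yet.
    session.RunPrefill({litert::lm::InputText(prompt)});

    // Decode: autoregressively generate the response from the cached state.
    auto responses = session.RunDecode();
    std::cout << *responses << std::endl;  // response type assumed printable
  }

Splitting the two calls lets an application hide prefill latency behind user input, which is the kind of control the granular API is meant to provide.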

Maintenance & Community

  • The project is from Google AI Edge.
  • The project is in early preview; a first full release is expected in late summer or early fall.
  • Issues and feature requests should be reported via GitHub Issues.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • NPU acceleration is currently only available through an Early Access Program.
  • GPU support is listed as "Coming Soon" for macOS, Windows, Linux, and Embedded platforms.
  • The .litertlm model format is proprietary and specific to this library.
Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 126
  • Issues (30d): 4

Star History

  • 32 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

High-performance C++ LLM inference library

Created 2 years ago
Updated 1 week ago
4k stars

Top 0.4% on SourcePulse