LiteRT-LM by google-ai-edge

C++ library for efficient on-device LLM execution

Created 5 months ago
336 stars

Top 81.8% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

LiteRT-LM is a C++ library designed for efficient on-device execution of language model pipelines across edge platforms. It targets developers building applications that require local LLM inference, offering cross-platform compatibility and hardware acceleration. The library aims to simplify the deployment of complex LLM workflows, enabling greater flexibility and performance on diverse hardware.

How It Works

LiteRT-LM builds on the LiteRT runtime and exposes a C++ API for composing and managing LLM pipelines. Pipelines can be customized for feature-specific needs, and inference can be accelerated on CPU, GPU, or NPU. Models are converted to a proprietary .litertlm format, which is optimized for efficient loading and execution on target devices.
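
To make this concrete, here is a minimal C++ sketch of the Engine/Session flow described above, using the API names cited elsewhere in this summary (GenerateContent). The include path, namespaces, and exact signatures are assumptions rather than the library's verified surface, and error handling is elided.

  // Sketch only: header path, namespaces, and signatures are assumed.
  #include <iostream>
  #include <utility>

  #include "runtime/engine/engine.h"  // hypothetical include path

  int main() {
    using namespace litert::lm;  // namespace assumed

    // Point the engine at a model packaged in the .litertlm format and
    // choose a backend (CPU here; GPU/NPU where available).
    auto model_assets = ModelAssets::Create("/path/to/model.litertlm");
    auto settings =
        EngineSettings::CreateDefault(std::move(*model_assets), Backend::CPU);

    // The Engine owns the loaded model; a Session holds per-conversation
    // state such as the prompt history and KV cache.
    auto engine = Engine::CreateEngine(std::move(*settings));
    auto session = (*engine)->CreateSession(SessionConfig::CreateDefault());

    // High-level, one-shot inference.
    auto responses =
        (*session)->GenerateContent({InputText("Why is the sky blue?")});
    std::cout << *responses << std::endl;  // response type assumed printable
    return 0;
  }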

Quick Start & Requirements

  • Installation: Clone the repository and use Bazel (v7.6.1) for building. Pre-built binaries are available for Android, macOS, Linux, and Windows.
  • Prerequisites: Git, Bazel (v7.6.1), and platform-specific build tools (e.g., Visual Studio 2022 for Windows, Android NDK for Android development).
  • Models: Requires models in .litertlm format.
  • Documentation: Supported Models and Performance, Build and Run the Command Line Demo, LiteRT-LM API.

Highlighted Details

  • Supports Gemma models with NPU acceleration on Qualcomm and MediaTek chipsets (Early Access Program).
  • Provides benchmarks showing significant performance improvements with GPU and NPU acceleration on mobile devices.
  • Offers both a high-level GenerateContent API and granular RunPrefill/RunDecode C++ APIs for flexible inference control (see the sketch after this list).
  • The .litertlm format is an evolution of .task files, designed for better compression and metadata inclusion.
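
As a sketch of the granular path, the snippet below separates prompt ingestion from token generation using the RunPrefill/RunDecode names cited above; the Session type, InputText wrapper, and signatures are assumptions for illustration.

  #include <iostream>
  #include <string>

  #include "runtime/engine/engine.h"  // hypothetical include path

  // Prefill early (e.g. while the user is still typing), decode on submit.
  // RunPrefill/RunDecode are the names cited above; real signatures may differ.
  void PrefillThenDecode(litert::lm::Session& session,
                         const std::string& prompt) {
    // Prefill: run the prompt through the model to populate the KV cache;
    // no output tokens are produced yet.
    session.RunPrefill({litert::lm::InputText(prompt)});

    // Decode: autoregressively generate the response from the cached state.
    auto responses = session.RunDecode();
    std::cout << *responses << std::endl;  // response type assumed printable
  }

Splitting the two calls lets an application hide prefill latency behind user input, which is the kind of control the granular API is meant to provide.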

Maintenance & Community

  • The project is from Google AI Edge.
  • The project is in early preview; a first full release is expected in late summer or early fall.
  • Issues and feature requests should be reported via GitHub Issues.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • NPU acceleration is currently only available through an Early Access Program.
  • GPU support is listed as "Coming Soon" for macOS, Windows, Linux, and Embedded platforms.
  • The .litertlm model format is proprietary and specific to this library.
Health Check

  • Last Commit: 16 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 126
  • Issues (30d): 4

Star History

  • 32 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Ying Sheng (coauthor of SGLang).

fastllm by ztxz16

High-performance C++ LLM inference library

Created 2 years ago
Updated 1 week ago
4k stars

Top 0.4% on SourcePulse