picoLLM by Picovoice

On-device LLM inference SDK for quantized models

Created 1 year ago
268 stars

Top 95.7% on SourcePulse

View on GitHub
Project Summary

picoLLM is an on-device inference engine for large language models (LLMs), designed for high accuracy and privacy. It targets developers and researchers who need to run LLMs locally across desktops, mobile devices, and web browsers; its key benefit is efficient X-Bit quantization, which cuts resource consumption.

How It Works

picoLLM utilizes a novel quantization algorithm called picoLLM Compression, which automatically optimizes bit allocation across LLM weights based on a task-specific cost function. This adaptive approach surpasses fixed-bit schemes like GPTQ, recovering significantly more accuracy (e.g., 91-100% MMLU score recovery at 2-4 bits for Llama-3-8b). Inference runs entirely locally, ensuring data privacy.
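
Picovoice has not published the internals of picoLLM Compression, but the core idea of cost-driven bit allocation can be illustrated with a toy sketch. Everything below (the per-group sensitivity scores, the greedy loop, the 2^-bits error model) is an assumption for illustration, not the actual algorithm:

```python
# Toy sketch of cost-driven bit allocation (illustrative only; NOT
# Picovoice's proprietary picoLLM Compression algorithm). Groups of
# weights whose quantization error hurts the task cost most get more bits.

def allocate_bits(sensitivity, budget_bits, min_bits=2, max_bits=8):
    """sensitivity[i]: assumed per-group impact of quantization error on the
    task-specific cost (e.g., estimated on a calibration set).
    budget_bits: total number of bits to distribute across all groups."""
    n = len(sensitivity)
    bits = [min_bits] * n
    remaining = budget_bits - min_bits * n

    def cost(i, b):
        # Assume quantization error roughly halves with each extra bit: ~2^-b.
        return sensitivity[i] * 2.0 ** (-b)

    while remaining > 0:
        # Spend the next bit where it reduces the total cost the most.
        gains = [cost(i, bits[i]) - cost(i, bits[i] + 1) if bits[i] < max_bits else -1.0
                 for i in range(n)]
        best = max(range(n), key=gains.__getitem__)
        if gains[best] <= 0:
            break
        bits[best] += 1
        remaining -= 1
    return bits

# Example: three weight groups, 12 bits total; the most sensitive group
# ends up with the widest bit width.
print(allocate_bits([5.0, 1.0, 0.2], budget_bits=12))  # -> [6, 4, 2]
```

This is the sense in which adaptive allocation can beat a fixed-bit scheme: a uniform 4-bit assignment would waste precision on insensitive weight groups while starving sensitive ones.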

Quick Start & Requirements

  • Install: pip3 install picollm (Python), dotnet add package PicoLLM (.NET), yarn add @picovoice/picollm-node (Node.js), etc. (see the Python example after this list).
  • Prerequisites: An AccessKey from Picovoice Console is required for authentication, though inference is free for open-weight models. Model files must be downloaded separately from the Picovoice Console.
  • Demos: Available for Python, .NET, Node.js, Android, iOS, Web, and C.
  • Docs: https://picovoice.ai/docs/picollm/
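
A minimal Python sketch following the SDK's documented quick-start flow; ${ACCESS_KEY} and ${MODEL_PATH} are placeholders for your Picovoice Console AccessKey and separately downloaded .pllm model file:

```python
import picollm

# AccessKey from Picovoice Console; model file downloaded separately.
pllm = picollm.create(
    access_key='${ACCESS_KEY}',
    model_path='${MODEL_PATH}')

try:
    res = pllm.generate(prompt='Explain on-device LLM inference in one sentence.')
    print(res.completion)
finally:
    # Release native resources when done.
    pllm.release()
```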

Highlighted Details

  • Supports a wide range of open-weight models including Gemma, Llama (2 & 3), Mistral, Mixtral, Phi-2, and Phi-3.
  • Cross-platform compatibility: Linux, macOS, Windows, Raspberry Pi (4 & 5), Android, iOS, and major web browsers.
  • Runs on both CPU and GPU.
  • Offers an interrupt() function for halting generation in progress (see the sketch after this list).
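
A hedged sketch of interrupting a long generation from another thread; pllm.interrupt() is named in the README, but the stream_callback parameter and the exact behavior on interruption are assumptions based on the SDK's streaming docs:

```python
import threading

import picollm

pllm = picollm.create(access_key='${ACCESS_KEY}', model_path='${MODEL_PATH}')

# Halt generation 2 seconds after it starts; generate() should then return
# with whatever tokens were produced up to that point.
timer = threading.Timer(2.0, pllm.interrupt)
timer.start()

res = pllm.generate(
    prompt='Write a long essay about edge computing.',
    stream_callback=lambda token: print(token, end='', flush=True))
timer.cancel()
pllm.release()
```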

Maintenance & Community

  • Actively maintained by Picovoice. Recent releases include performance improvements and support for new models like Phi-3.5.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The SDKs and inference engine are free for open-weight models.
  • An AccessKey is required, so licensing and authentication remain tied to Picovoice services even though inference itself runs offline.

Limitations & Caveats

  • An internet connection is required to validate the AccessKey, even though inference is offline.
  • Model files must be downloaded separately from the Picovoice Console.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 5 stars in the last 30 days

Explore Similar Projects

torchchat by pytorch

  • PyTorch-native SDK for local LLM inference across diverse platforms
  • Top 0.1% on SourcePulse · 4k stars
  • Created 1 year ago · Updated 1 week ago
  • Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

llm-awq by mit-han-lab

  • Weight quantization research paper for LLM compression/acceleration
  • Top 0.3% on SourcePulse · 3k stars
  • Created 2 years ago · Updated 2 months ago
  • Starred by Yaowei Zheng (Author of LLaMA-Factory), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

airllm by lyogavin

  • Inference optimization for LLMs on low-resource hardware
  • Top 0.1% on SourcePulse · 6k stars
  • Created 2 years ago · Updated 2 weeks ago