On-device LLM inference SDK for quantized models
Top 95.7% on SourcePulse
picoLLM is an on-device inference engine for large language models (LLMs), designed for high accuracy and privacy. It targets developers and researchers who need to run LLMs locally across desktops, mobile devices, and web browsers; its efficient X-Bit quantization keeps resource consumption low.
How It Works
picoLLM utilizes a novel quantization algorithm called picoLLM Compression, which automatically optimizes bit allocation across LLM weights based on a task-specific cost function. This adaptive approach surpasses fixed-bit schemes like GPTQ, recovering significantly more accuracy (e.g., 91-100% MMLU score recovery at 2-4 bits for Llama-3-8b). Inference runs entirely locally, ensuring data privacy.
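The exact cost function and allocation procedure behind picoLLM Compression are not described here, but the general idea of budget-constrained, sensitivity-driven bit allocation can be illustrated with a toy sketch. This is not Picovoice's actual algorithm; the per-group sensitivity model and the greedy rule below are assumptions made purely for illustration.

```python
# Toy illustration of cost-driven bit allocation (NOT the actual picoLLM
# Compression algorithm): greedily grant extra bits to the weight groups
# whose assumed task cost improves the most, under a total bit budget.

def allocate_bits(sensitivity, total_bits, min_bits=2, max_bits=8):
    """sensitivity: per-group cost weights; higher means the group loses more
    accuracy when quantized coarsely. Returns a per-group bit width."""
    n = len(sensitivity)
    bits = [min_bits] * n
    budget = total_bits - min_bits * n
    while budget > 0:
        # Estimated cost reduction from one extra bit for group i; here we
        # simply assume the quantization error roughly halves per added bit.
        gains = [
            sensitivity[i] * 2.0 ** (-bits[i]) if bits[i] < max_bits else 0.0
            for i in range(n)
        ]
        best = max(range(n), key=lambda i: gains[i])
        if gains[best] == 0.0:
            break
        bits[best] += 1
        budget -= 1
    return bits

if __name__ == "__main__":
    # Four weight groups with different (made-up) measured sensitivities.
    print(allocate_bits([5.0, 1.0, 0.5, 3.0], total_bits=14))
```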
Quick Start & Requirements
pip3 install picollm (Python)
dotnet add package PicoLLM (.NET)
yarn add @picovoice/picollm-node (Node.js)
Other platform SDKs install similarly.
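After installation, a minimal usage sketch with the Python SDK might look as follows. It assumes picollm.create(access_key, model_path), a generate() call returning a result with a completion field, and an AccessKey obtained from the Picovoice Console; check the official docs for the exact signatures.

```python
# Minimal sketch: load a picoLLM model and generate text (Python SDK).
import picollm

pllm = picollm.create(
    access_key='${ACCESS_KEY}',   # Picovoice AccessKey from the console
    model_path='${MODEL_PATH}')   # path to a downloaded picoLLM model file

res = pllm.generate('What is on-device inference?')
print(res.completion)

pllm.release()
```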
Highlighted Details
interrupt() function for halting generation (see the sketch below).
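A hedged sketch of interrupting a long generation from another thread is shown below; it assumes the Python SDK accepts a stream_callback parameter on generate(), which may differ in the actual API.

```python
# Sketch: halt generation after a timeout using interrupt() from a timer thread.
import threading
import picollm

pllm = picollm.create(access_key='${ACCESS_KEY}', model_path='${MODEL_PATH}')

# Stop generation after 5 seconds, regardless of how many tokens remain.
timer = threading.Timer(5.0, pllm.interrupt)
timer.start()

res = pllm.generate(
    'Summarize the benefits of on-device inference.',
    stream_callback=lambda token: print(token, end='', flush=True))

timer.cancel()
pllm.release()
```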
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats