Lightweight C++ inference library for ONNX models, targeting low-resource devices
OnnxStream is a C++ inference library designed for minimal memory footprint, enabling large AI models like Stable Diffusion XL and Mistral 7B to run on resource-constrained devices such as the Raspberry Pi Zero 2. It supports ARM, x86, WASM, and RISC-V architectures, accelerated by XNNPACK.
How It Works
OnnxStream decouples the inference engine from model weights via a `WeightsProvider` interface, allowing flexible data loading (e.g., streaming from HTTP). It employs techniques such as attention slicing and dynamic quantization to drastically reduce RAM usage, achieving up to 55x lower memory consumption than OnnxRuntime for certain models, albeit at the cost of higher latency.
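To illustrate the decoupling idea (this is a hypothetical sketch, not OnnxStream's actual `WeightsProvider` API; the interface and class names below are invented for illustration), a custom provider might look like this:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical sketch of the decoupling: the engine requests each tensor's
// bytes on demand instead of holding every weight in RAM at once.
struct WeightsProvider {
    virtual ~WeightsProvider() = default;
    // Fill `dst` with `size` bytes of the named weight; an implementation
    // could read from disk, mmap a file, or issue an HTTP range request.
    virtual void get(const std::string& name, void* dst, std::size_t size) = 0;
};

// Streams each weight from disk when asked, so peak RAM stays near the
// size of a single tensor rather than the whole model.
struct DiskStreamingProvider : WeightsProvider {
    explicit DiskStreamingProvider(std::string dir) : m_dir(std::move(dir)) {}
    void get(const std::string& name, void* dst, std::size_t size) override {
        std::ifstream f(m_dir + "/" + name, std::ios::binary);
        f.read(static_cast<char*>(dst), static_cast<std::streamsize>(size));
    }
    std::string m_dir;
};
```

Because the engine only ever asks for one tensor's bytes at a time, the same design supports caching, prefetching, or remote storage without touching the inference code.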
Quick Start & Requirements
ONNX models must first be converted to OnnxStream's custom format using the provided `onnx2txt.ipynb` notebook. Pre-converted models for SD 1.5, SDXL 1.0, and SDXL Turbo are available on Hugging Face. Building requires CMake and a local clone of XNNPACK.
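A minimal inference sketch, adapted from the usage pattern in the upstream README (identifiers such as `Model`, `tensor_vector`, and `m_data` should be verified against the current headers; the input name and shape are placeholders):

```cpp
#include "onnxstream.h"
#include <utility>

using namespace onnxstream;

int main()
{
    Model model;
    model.read_file("/path/to/converted_model.txt"); // output of onnx2txt.ipynb

    // Build an input tensor; name and shape must match the converted model.
    tensor_vector<float> data(1 * 4 * 64 * 64, 0.0f);
    Tensor t;
    t.m_name = "input";
    t.m_shape = { 1, 4, 64, 64 };
    t.set_vector(std::move(data));
    model.push_tensor(std::move(t));

    model.run(); // weights are fetched through the weights provider as needed

    auto& result = model.m_data[0].get_vector<float>(); // first output tensor
    (void)result;
    return 0;
}
```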
Highlighted Details
Maintenance & Community
Community projects built around OnnxStream include OnnxStreamGui, Auto epaper art, and PaperPiAI.
Licensing & Compatibility
The project is developed at vitoplantamura/OnnxStream; see the repository for the exact license terms. XNNPACK is typically Apache 2.0.
Limitations & Caveats
The `MAX_SPEED` build option can increase memory usage during compilation and may cause issues on some platforms, such as Termux.
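For reference, the flag is passed at configure time; a sketch of the build commands (the XNNPACK path is a placeholder):

```sh
cmake -DMAX_SPEED=ON -DXNNPACK_DIR=<path/to/XNNPACK> ..
cmake --build . --config Release
```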