OnnxStream by vitoplantamura

Lightweight C++ inference library for ONNX models, targeting low-resource devices

created 2 years ago
1,971 stars

Top 22.8% on sourcepulse

View on GitHub
Project Summary

OnnxStream is a C++ inference library designed for minimal memory footprint, enabling large AI models like Stable Diffusion XL and Mistral 7B to run on resource-constrained devices such as the Raspberry Pi Zero 2. It supports ARM, x86, WASM, and RISC-V architectures, accelerated by XNNPACK.

How It Works

OnnxStream decouples the inference engine from model weights via a WeightsProvider interface, allowing for flexible data loading (e.g., streaming from HTTP). It employs techniques like attention slicing and dynamic quantization to drastically reduce RAM usage, achieving up to 55x less memory consumption than OnnxRuntime for certain models, albeit with a potential latency increase.
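
As an illustration, the memory/latency trade-offs are exposed as per-model options. The sketch below is modeled on the usage example in the project README; the helper function is hypothetical, and the member and method names (m_use_fp16_arithmetic, m_use_uint8_qdq, set_weights_provider) are assumptions drawn from that example and may differ across versions.

    #include "onnxstream.h"

    using namespace onnxstream;

    // Hypothetical helper: configure a Model for low-RAM inference.
    // Member/method names are assumptions modeled on the README example.
    void configure_low_memory(Model& model)
    {
        // fp16 arithmetic and 8-bit dynamic quantization shrink the
        // working set at the cost of some latency.
        model.m_use_fp16_arithmetic = true;
        model.m_use_uint8_qdq = true;

        // Weights are decoupled from the engine via WeightsProvider, so a
        // custom provider (e.g. one streaming weights over HTTP) could
        // replace the default disk-based one:
        // model.set_weights_provider( /* custom provider */ );
    }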

Quick Start & Requirements

  • Build: Requires CMake and a compatible XNNPACK version (the README pins a specific commit). Build commands for Linux, Mac, Windows, Termux, and FreeBSD are detailed in the README.
  • Dependencies: XNNPACK is a core dependency. GPU support (cuBLAS) is available for LLM applications. WebAssembly builds are also supported.
  • Models: ONNX models must be exported and converted to OnnxStream's text format with the provided onnx2txt.ipynb notebook. Pre-converted models for SD 1.5, SDXL 1.0, and SDXL Turbo are available on Hugging Face; a minimal load-and-run sketch follows this list.
  • Resources: Running SD 1.5 on RPi Zero 2 requires ~260MB RAM (VAE quantized). SDXL 1.0 requires ~300MB RAM with tiled decoding.
  • Docs: onnx2txt.ipynb, Stable Diffusion example, custom model conversion
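
To make the flow concrete, here is a minimal load-and-run sketch in the spirit of the README's usage example. The model path, input name, and shape are placeholders, and every API name used (Model, Tensor, tensor_vector, push_tensor, get_vector) is an assumption taken from that example and may vary between versions.

    #include "onnxstream.h"

    #include <cstdio>
    #include <utility>

    using namespace onnxstream;

    int main()
    {
        Model model;
        model.read_file("/path/to/model.txt"); // produced by onnx2txt.ipynb

        // Single input with batch size 1 (the only batch size supported).
        // Placeholder name and shape; use your model's actual values.
        tensor_vector<float> input(1 * 4 * 64 * 64, 0.0f);
        Tensor t;
        t.m_name = "input";
        t.m_shape = {1, 4, 64, 64};
        t.set_vector(std::move(input));
        model.push_tensor(std::move(t));

        model.run();

        // Outputs are collected in model.m_data after run().
        auto& out = model.m_data[0];
        auto& values = out.get_vector<float>();
        std::printf("output '%s' has %zu elements\n",
                    out.m_name.c_str(), values.size());
        return 0;
    }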

Highlighted Details

  • Achieves ~300MB RAM usage for SDXL 1.0 on RPi Zero 2 via tiled VAE decoding.
  • Supports dynamic quantization (8-bit) and attention slicing for memory reduction.
  • Offers WebAssembly builds for browser-based inference (e.g., Whisper, YOLOv8).
  • Initial GPU support via cuBLAS for LLM applications.

Maintenance & Community

  • Active development with recent updates for WebAssembly support and LLM applications.
  • Related projects include OnnxStreamGui, Auto epaper art, and PaperPiAI.

Licensing & Compatibility

  • The README does not explicitly state a license for vitoplantamura/OnnxStream; check the repository's LICENSE file before use. XNNPACK, the core dependency, is distributed under its own permissive license.

Limitations & Caveats

  • OnnxStream does not support inputs with a batch size other than 1.
  • Some ONNX operators might not be implemented yet (e.g., Einsum is noted as unsupported for custom SD 1.5 models).
  • The MAX_SPEED build option can increase memory usage during compilation and may cause issues on some platforms like Termux.
  • FreeBSD build requires specific CMake modifications due to XNNPACK limitations.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 38 stars in the last 90 days
