CPU inference runtime for sparse deep learning models
DeepSparse is a CPU inference runtime designed to accelerate neural network performance by leveraging model sparsity. It targets researchers and developers seeking to optimize inference speed and memory usage on CPUs, offering significant gains through techniques like pruning and quantization, with recent expansion into Large Language Models (LLMs).
How It Works
DeepSparse uses sparsity-aware kernels and optimizations to accelerate inference on CPUs. It exploits zero-valued weights introduced through pruning, along with low-precision arithmetic from quantization, both typically applied with the companion SparseML library. Skipping computation on zeroed weights reduces compute and memory-bandwidth requirements, yielding faster inference than equivalent dense models.
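As a concrete sketch, the high-level `Pipeline` API compiles a sparse ONNX model and runs it. The `zoo:` stub below is a placeholder, not a real SparseZoo path; substitute a real stub or a local ONNX file:

```python
from deepsparse import Pipeline

# "zoo:..." is a hypothetical placeholder stub; pick a real one from
# SparseZoo (sparsezoo.neuralmagic.com) or pass a local ONNX file path.
pipeline = Pipeline.create(
    task="text-classification",
    model_path="zoo:example/pruned90_quant-none",  # placeholder stub
)
print(pipeline(["DeepSparse runs sparse models fast on CPUs."]))
```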
Quick Start & Requirements
Install the stable package with `pip install deepsparse`, or `pip install -U deepsparse-nightly[llm]` for LLM support.
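For instance, a minimal text-generation sketch using the nightly LLM support. The model stub is a placeholder for a real SparseZoo LLM stub or a local model path, and the exact call signature may differ across nightly builds:

```python
from deepsparse import TextGeneration

# Placeholder stub; substitute a real sparse LLM stub from SparseZoo
# or a path to a local deployment directory.
pipeline = TextGeneration(model="zoo:example/sparse-llm")
result = pipeline("Explain weight sparsity in one sentence.", max_new_tokens=50)
print(result.generations[0].text)
```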
Requirements: a supported Python release for `deepsparse` (`deepsparse-nightly` may support newer) and ONNX 1.5.0-1.15.0 with opset 11+. Hardware support includes x86 AVX2, AVX-512, AVX-512 VNNI, and ARM v8.2+.

Highlighted Details
Maintenance & Community
Last updated 2 months ago; the repository is currently marked inactive.

Licensing & Compatibility

Limitations & Caveats