CPU inference runtime for sparse deep learning models
DeepSparse is a CPU inference runtime designed to accelerate neural network performance by leveraging model sparsity. It targets researchers and developers seeking to optimize inference speed and memory usage on CPUs, offering significant gains through techniques like pruning and quantization, with recent expansion into Large Language Models (LLMs).
How It Works
DeepSparse uses sparsity-aware kernels and optimizations to accelerate inference on CPUs. It exploits zero-valued weights introduced through pruning, along with low-precision arithmetic from quantization, both typically applied with the companion SparseML library. Skipping computation on zeroed weights reduces compute and memory-bandwidth requirements, yielding faster inference than equivalent dense models.
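As a concrete sketch, the high-level `Pipeline` API compiles a sparse ONNX model and runs it. The `zoo:` stub below is a placeholder, not a real SparseZoo path; substitute a real stub or a local ONNX file:

```python
from deepsparse import Pipeline

# "zoo:..." is a hypothetical placeholder stub; pick a real one from
# SparseZoo (sparsezoo.neuralmagic.com) or pass a local ONNX file path.
pipeline = Pipeline.create(
    task="text-classification",
    model_path="zoo:example/pruned90_quant-none",  # placeholder stub
)
print(pipeline(["DeepSparse runs sparse models fast on CPUs."]))
```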
Quick Start & Requirements
Install the stable package with `pip install deepsparse`, or `pip install -U deepsparse-nightly[llm]` for LLM support.
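For instance, a minimal text-generation sketch using the nightly LLM support. The model stub is a placeholder for a real SparseZoo LLM stub or a local model path, and the exact call signature may differ across nightly builds:

```python
from deepsparse import TextGeneration

# Placeholder stub; substitute a real sparse LLM stub from SparseZoo
# or a path to a local deployment directory.
pipeline = TextGeneration(model="zoo:example/sparse-llm")
result = pipeline("Explain weight sparsity in one sentence.", max_new_tokens=50)
print(result.generations[0].text)
```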
Requirements: a supported Python release for `deepsparse` (`deepsparse-nightly` may support newer) and ONNX 1.5.0-1.15.0 with opset 11+. Hardware support includes x86 AVX2, AVX-512, AVX-512 VNNI, and ARM v8.2+.

Highlighted Details
Maintenance & Community
Last updated 2 months ago; the repository is currently marked inactive.

Licensing & Compatibility

Limitations & Caveats