deepsparse by neuralmagic

CPU inference runtime for sparse deep learning models

created 4 years ago
3,160 stars

Top 15.6% on sourcepulse

Project Summary

DeepSparse is a CPU inference runtime designed to accelerate neural network performance by leveraging model sparsity. It targets researchers and developers seeking to optimize inference speed and memory usage on CPUs, offering significant gains through techniques like pruning and quantization, with recent expansion into Large Language Models (LLMs).

How It Works

DeepSparse utilizes sparsity-aware kernels and optimizations to accelerate inference on CPUs. It works by exploiting the presence of zero-valued weights and activations, often introduced through pruning and quantization techniques managed by the companion SparseML library. This approach allows for reduced computation and memory bandwidth requirements, leading to faster inference times compared to dense models.
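The core idea can be illustrated with a toy sketch in plain Python (this is not DeepSparse code, just the concept): a pruned weight row is stored as (index, value) pairs for its nonzero entries, so the dot product skips zero-valued weights entirely, cutting both compute and memory traffic.

```python
# Toy illustration of sparsity-aware inference (not DeepSparse internals):
# store only nonzero weights of a pruned layer and skip zeros at runtime.

def to_sparse(weights):
    """Keep only the nonzero weights as (index, value) pairs."""
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]

def sparse_dot(sparse_weights, activations):
    """Dot product that touches only the surviving (unpruned) weights."""
    return sum(w * activations[i] for i, w in sparse_weights)

dense = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.5]  # 75% of weights pruned
x = [1.0] * 8

sparse = to_sparse(dense)
# Result matches the dense dot product, but only 3 of 8 weights are
# stored and multiplied.
assert sparse_dot(sparse, x) == sum(w * a for w, a in zip(dense, x))
print(len(sparse), "of", len(dense), "weights kept")  # 3 of 8
```

Real sparsity-aware kernels go much further (vectorized sparse formats, cache-aware scheduling, quantized arithmetic), but the payoff comes from the same source: work proportional to the nonzeros rather than the full tensor.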

Quick Start & Requirements

  • Install: pip install deepsparse (stable) or pip install -U deepsparse-nightly[llm] for LLM support
  • Requirements: Linux OS; Python 3.8-3.11 for deepsparse (deepsparse-nightly may support newer versions); ONNX 1.5.0-1.15.0 with opset 11 or higher. Hardware support includes x86 AVX2, AVX-512, AVX-512 VNNI, and ARM v8.2+.
  • Resources: LLM inference requires specific model downloads from Hugging Face.
  • Docs: TextGeneration documentation, Pipelines User Guide, Server User Guide.
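The two install paths above, as shell commands (extras are quoted so the brackets survive shells like zsh):

```shell
# Stable runtime for Computer Vision / NLP pipelines:
pip install deepsparse

# Nightly build with LLM support:
pip install -U "deepsparse-nightly[llm]"
```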

Highlighted Details

  • Achieves 7x acceleration for sparse-quantized LLMs (MPT-7B) over dense baselines.
  • Supports unstructured sparsity and 8-bit weight/activation quantization for LLMs.
  • Offers three deployment APIs: Engine (low-level), Pipeline (pre/post-processing), and Server (REST API).
  • Provides access to a wide range of pre-optimized models via SparseZoo for Computer Vision and NLP tasks.

Maintenance & Community

  • Active development with a community Slack channel available.
  • Regular updates, plus a nightly build for the latest features.
  • Extensive documentation and examples provided.

Licensing & Compatibility

  • DeepSparse Community is licensed under the Neural Magic DeepSparse Community License.
  • Some components are under Apache License Version 2.0.
  • DeepSparse Enterprise requires a license for production/commercial use.

Limitations & Caveats

  • Primarily supports Linux; Mac/Windows users are directed to use Docker.
  • Product Usage Analytics are enabled by default and must be disabled explicitly via an environment variable.
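For reference, opting out looks like the following (the variable name is taken from Neural Magic's documentation at the time of writing; verify it against the project README before relying on it):

```shell
# Disable Product Usage Analytics before running DeepSparse
# (variable name per Neural Magic's docs; confirm in the README):
export NM_DISABLE_ANALYTICS=True
```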
Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 30 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

  • DeepSpeed by deepspeedai: Deep learning optimization library for distributed training and inference. 40k stars (top 0.2%); created 5 years ago; updated 23 hours ago.