deepsparse by neuralmagic

CPU inference runtime for sparse deep learning models

Created 4 years ago
3,157 stars

Top 15.3% on SourcePulse

View on GitHub
Project Summary

DeepSparse is a CPU inference runtime designed to accelerate neural network performance by leveraging model sparsity. It targets researchers and developers seeking to optimize inference speed and memory usage on CPUs, offering significant gains through techniques like pruning and quantization, with recent expansion into Large Language Models (LLMs).

How It Works

DeepSparse utilizes sparsity-aware kernels and optimizations to accelerate inference on CPUs. It works by exploiting the presence of zero-valued weights and activations, often introduced through pruning and quantization techniques managed by the companion SparseML library. This approach allows for reduced computation and memory bandwidth requirements, leading to faster inference times compared to dense models.
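The computational saving from sparsity can be illustrated with a toy sketch in plain Python (purely illustrative; DeepSparse's actual kernels are far more sophisticated, vectorized, and cache-aware): storing only the nonzero weights means each dot product performs one multiply-add per nonzero weight rather than one per element.

```python
# Toy illustration of sparsity-aware computation (NOT DeepSparse's real kernels):
# a sparse weight row stores only (index, value) pairs for nonzero weights,
# so the dot product touches a fraction of the elements.

def dense_dot(weights, x):
    """Dense dot product: always len(x) multiply-adds."""
    return sum(w * xi for w, xi in zip(weights, x))

def sparse_dot(nonzeros, x):
    """Sparse dot product: one multiply-add per nonzero weight."""
    return sum(w * x[i] for i, w in nonzeros)

weights = [0.0, 0.0, 2.0, 0.0, 0.0, 0.0, -1.5, 0.0]  # 75% zeros
nonzeros = [(i, w) for i, w in enumerate(weights) if w != 0.0]
x = [1.0] * len(weights)

assert dense_dot(weights, x) == sparse_dot(nonzeros, x)  # identical result
print(len(weights), "multiply-adds dense vs", len(nonzeros), "sparse")
```

Pruning (via SparseML) pushes many weights to exactly zero so this kind of skipping becomes profitable; quantization then shrinks the remaining nonzeros.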

Quick Start & Requirements

  • Install: pip install -U deepsparse-nightly[llm] (for LLM support) or pip install deepsparse
  • Requirements: Linux; Python 3.8-3.11 for deepsparse (deepsparse-nightly may support newer versions); ONNX 1.5.0-1.15.0 with opset 11+. Supported hardware: x86 AVX2, AVX-512, AVX-512 VNNI, and ARM v8.2+.
  • Resources: LLM inference requires specific model downloads from Hugging Face.
  • Docs: TextGeneration documentation, Pipelines User Guide, Server User Guide.

Highlighted Details

  • Achieves 7x acceleration for sparse-quantized LLMs (MPT-7B) over dense baselines.
  • Supports unstructured sparsity and 8-bit weight/activation quantization for LLMs.
  • Offers three deployment APIs: Engine (low-level), Pipeline (pre/post-processing), and Server (REST API).
  • Provides access to a wide range of pre-optimized models via SparseZoo for Computer Vision and NLP tasks.
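The 8-bit quantization noted above can be sketched in plain Python (a generic symmetric int8 scheme for illustration, not DeepSparse's implementation, which is configured through SparseML): each weight group is mapped to int8 values plus a single scale factor, cutting memory roughly 4x versus float32.

```python
# Generic symmetric int8 quantization sketch (illustrative only; DeepSparse's
# actual quantization recipes are produced with the SparseML library).

def quantize_int8(values):
    """Map floats to int8 range [-127, 127] with one shared scale factor."""
    scale = max(abs(v) for v in values) / 127.0
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# Each recovered weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(weights, recovered))
```

Combining such quantization with unstructured sparsity is what yields the sparse-quantized LLMs cited above.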

Maintenance & Community

  • Active development with a community Slack channel available.
  • Regular updates, with a nightly build for the latest features.
  • Extensive documentation and examples provided.

Licensing & Compatibility

  • DeepSparse Community is licensed under the Neural Magic DeepSparse Community License.
  • Some components are under Apache License Version 2.0.
  • DeepSparse Enterprise requires a license for production/commercial use.

Limitations & Caveats

  • Primarily supports Linux; Mac/Windows users are directed to use Docker.
  • Product Usage Analytics are enabled by default and must be explicitly disabled via an environment variable.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wei-Lin Chiang (Cofounder of LMArena), and 3 more.

sparseml by neuralmagic

Top 0.1% • 2k stars
Sparsification toolkit for optimized neural networks
Created 4 years ago
Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

Top 0.1% • 6k stars
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago