Python library for model compression (quantization, pruning, distillation, NAS)
Intel® Neural Compressor is an open-source Python library offering state-of-the-art model compression techniques like low-bit quantization (INT8, FP8, INT4, FP4, NF4) and sparsity. It targets researchers and engineers seeking to optimize deep learning models for inference on various hardware, particularly Intel platforms, by reducing model size and accelerating execution.
How It Works
The library supports quantization, pruning, distillation, and neural architecture search across TensorFlow, PyTorch, and ONNX Runtime. It employs accuracy-driven, automatic quantization strategies, including dynamic, static, SmoothQuant, and weight-only quantization, to minimize accuracy loss while maximizing performance gains. The 3.x API adds a Transformers-like interface for INT4 inference, as sketched below.
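As a rough sketch of that Transformers-like interface, 4-bit weight-only (RTN) quantization of a causal language model might look like the following. The class and parameter names here (AutoModelForCausalLM, RtnConfig, bits, group_size) and the use of facebook/opt-125m are illustrative and should be verified against the documentation for the installed 3.x version:

```python
from transformers import AutoTokenizer
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig

model_name = "facebook/opt-125m"  # small model chosen purely for illustration

# Round-to-nearest (RTN) 4-bit weight-only quantization config.
woq_config = RtnConfig(bits=4, group_size=32)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=woq_config
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("Model compression enables", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```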
Quick Start & Requirements
Install with pip install neural-compressor[pt] (for PyTorch) or pip install neural-compressor[tf] (for TensorFlow). Additional Intel-optimized packages (e.g., intel_extension_for_pytorch) may be required for best performance on Intel hardware. Docker images are recommended for Intel Gaudi AI Accelerators.
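After installation, a first quantization run can look like the following minimal sketch, based on the 2.x-style accuracy-driven API (quantization.fit with PostTrainingQuantConfig). The toy model, random calibration data, and constant eval_fn are placeholders; in practice eval_fn should return real validation accuracy:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy FP32 model and calibration data, stand-ins for a real model and dataset.
float_model = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2)
)
calib_loader = DataLoader(
    TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,))), batch_size=8
)

def eval_fn(model):
    # Should return real validation accuracy; the tuner retries quantization
    # configurations until the accuracy loss stays within tolerance.
    return 1.0  # constant placeholder for this sketch

conf = PostTrainingQuantConfig(approach="static")  # "dynamic" and "weight_only" also exist
q_model = quantization.fit(
    model=float_model,
    conf=conf,
    calib_dataloader=calib_loader,
    eval_func=eval_fn,
)
q_model.save("./quantized_model")
```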
Highlighted Details
Maintenance & Community
Maintained by Intel, with support available through GitHub issues, the maintainers' email (inc.maintainers@intel.com), and a Discord channel.
Licensing & Compatibility
Released under the Apache 2.0 license.
Limitations & Caveats