Neural network compression for optimized inference
NNCF (Neural Network Compression Framework) provides algorithms for optimizing neural network inference, primarily targeting the OpenVINO™ toolkit. It supports post-training and training-time compression techniques like quantization and sparsity for PyTorch, TensorFlow, and ONNX models, aiming to reduce model size and improve inference speed with minimal accuracy loss.
How It Works
NNCF employs a unified framework architecture allowing for the addition of various compression algorithms across different deep learning backends. It facilitates automatic, configurable model graph transformations. For post-training quantization, it uses a calibration dataset to gather statistics. Training-time compression integrates directly into the training loop, enabling fine-tuning of model weights and compression parameters simultaneously for potentially higher accuracy.
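The role of the calibration dataset can be illustrated with a minimal, framework-free sketch (plain Python, not NNCF's actual API): statistics gathered over calibration samples determine the scale used to map floating-point values onto an 8-bit integer grid.

```python
def calibrate(samples):
    """Gather min/max statistics over a calibration set (simplified)."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    return lo, hi

def quantize(values, lo, hi, bits=8):
    """Affine-quantize floats onto the integer grid [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # guard against a degenerate range
    return [round((v - lo) / scale) for v in values], scale

def dequantize(qvalues, lo, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale + lo for q in qvalues]

# Stand-in "calibration dataset": activation batches observed at one layer.
calib = [[0.1, 0.5, 0.9], [0.0, 0.7, 1.0]]
lo, hi = calibrate(calib)                      # lo=0.0, hi=1.0
q, scale = quantize([0.25, 0.75], lo, hi)      # q=[64, 191]
restored = dequantize(q, lo, scale)            # ≈[0.251, 0.749]
print(restored)
```

The small gap between the original and restored values is the quantization error that post-training calibration tries to keep bounded; training-time compression goes further by letting the network adapt its weights to that error during fine-tuning.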
Quick Start & Requirements
Install with pip:

pip install nncf

or with conda:

conda install -c conda-forge nncf
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Import nncf before other torch imports to avoid incomplete compression.