nncf by openvinotoolkit

Neural network compression for optimized inference

Created 5 years ago
1,079 stars

Top 35.2% on SourcePulse

Project Summary

NNCF (Neural Network Compression Framework) provides algorithms for optimizing neural network inference, primarily targeting the OpenVINO™ toolkit. It supports post-training and training-time compression techniques like quantization and sparsity for PyTorch, TensorFlow, and ONNX models, aiming to reduce model size and improve inference speed with minimal accuracy loss.

How It Works

NNCF uses a unified framework architecture that lets new compression algorithms be added across deep learning backends, with automatic, configurable model graph transformations. For post-training quantization, it runs inference on a small calibration dataset to gather activation statistics. Training-time compression integrates directly into the training loop, fine-tuning model weights and compression parameters jointly for potentially higher accuracy.

Quick Start & Requirements

  • Install: pip install nncf or conda install -c conda-forge nncf
  • Requirements: Python 3.9+, backend-specific dependencies (PyTorch, TensorFlow, ONNX).
  • Links: Documentation, Model Zoo

Highlighted Details

  • Supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
  • Offers Weight Compression, Sparsity, and Filter Pruning algorithms.
  • Integrates with HuggingFace Optimum Intel and OpenVINO Training Extensions.
  • Provides extensive Jupyter notebooks and sample scripts for various models and domains.

Maintenance & Community

  • Actively maintained by the OpenVINO™ toolkit team.
  • Community support channels (e.g., Discord/Slack) are implied by common Intel open-source practice but are not confirmed in the repository.
  • Contributing Guide available.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • TensorFlow support is limited, covering primarily models built with the Sequential or Keras Functional API.
  • TorchFX integration is experimental.
  • Activation Sparsity and Movement Pruning are experimental for some backends.
  • PyTorch users must import nncf before torch-related imports; otherwise compression may be incomplete.
Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 38
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 15 hours ago