nncf by openvinotoolkit

Neural network compression for optimized inference

Created 5 years ago
1,079 stars

Top 35.2% on SourcePulse

Project Summary

NNCF (Neural Network Compression Framework) provides algorithms for optimizing neural network inference, primarily targeting the OpenVINO™ toolkit. It supports post-training and training-time compression techniques like quantization and sparsity for PyTorch, TensorFlow, and ONNX models, aiming to reduce model size and improve inference speed with minimal accuracy loss.

How It Works

NNCF uses a unified framework architecture that lets new compression algorithms be added across deep learning backends, with automatic, configurable model graph transformations. For post-training quantization, it runs inference on a small calibration dataset to gather activation statistics. Training-time compression integrates directly into the training loop, fine-tuning model weights and compression parameters jointly for potentially higher accuracy.

Quick Start & Requirements

  • Install: pip install nncf or conda install -c conda-forge nncf
  • Requirements: Python 3.9+, backend-specific dependencies (PyTorch, TensorFlow, ONNX).
  • Links: Documentation, Model Zoo

Highlighted Details

  • Supports Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT).
  • Offers Weight Compression, Sparsity, and Filter Pruning algorithms.
  • Integrates with HuggingFace Optimum Intel and OpenVINO Training Extensions.
  • Provides extensive Jupyter notebooks and sample scripts for various models and domains.

Maintenance & Community

  • Actively maintained by the OpenVINO™ toolkit team.
  • Community support channels (e.g., Discord/Slack) are implied by common Intel open-source practice but are not confirmed in the repository.
  • Contributing Guide available.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • TensorFlow support is limited, covering primarily models built with the Sequential or Keras Functional API.
  • TorchFX integration is experimental.
  • Activation Sparsity and Movement Pruning are experimental for some backends.
  • PyTorch users must import nncf before torch-related imports; otherwise compression may be incomplete.
Health Check

  • Last Commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 38
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 15 hours ago