ppq by OpenPPL

Offline quantization tool for neural network optimization

Created 3 years ago
1,751 stars

Top 24.5% on SourcePulse

Project Summary

PPQ (PPL Quantization Tool) is an industrial-grade, extensible neural network quantization framework designed for accelerating AI models on edge devices. It addresses the challenges of rapidly evolving model architectures and the need for efficient, low-power inference by enabling users to precisely control quantization parameters and optimization passes.

How It Works

PPQ employs a custom execution engine and a graph-based approach to parse, analyze, and modify complex neural network structures. It abstracts quantization logic into 27 independent "Quantization Optimization Passes" that can be customized or combined by users. This modular design allows for fine-grained control over quantization bit-width, granularity, and calibration algorithms for individual operators and tensors, facilitating high flexibility and exploration of new quantization techniques.
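To illustrate what a single calibration-style pass computes, here is a minimal, framework-independent sketch of symmetric per-tensor min-max quantization in plain NumPy. It does not use PPQ's actual API; the function names and the 8-bit symmetric scheme are illustrative assumptions:

```python
import numpy as np

def minmax_calibrate(tensor, num_bits=8):
    """Derive a symmetric per-tensor scale from the observed min/max values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    return max(abs(tensor.min()), abs(tensor.max())) / qmax

def quantize_dequantize(tensor, scale, num_bits=8):
    """Simulate quantization: snap to the integer grid, then map back to float."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(tensor / scale), qmin, qmax)
    return q * scale

weights = np.array([0.5, -1.27, 0.031, 1.0])
scale = minmax_calibrate(weights)          # one "calibration pass" output
fake_quant = quantize_dequantize(weights, scale)
```

A real optimization pass in a framework like PPQ would additionally decide bit-width and granularity per operator and per tensor; this sketch only shows the core scale computation and the round-trip error it introduces.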

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies via pip install -r requirements.txt, and run python setup.py install. Docker and pip installation options are also available.
  • Prerequisites: CUDA Toolkit, a C++ compiler (Ninja build recommended), and PyTorch (version 1.10+).
  • Resources: Setup involves cloning and installing Python packages. Docker image is available for pre-configured environments.
  • Links: Learning Path, Optimization Passes, Examples.

Highlighted Details

  • Supports FP8 quantization (E4M3, E5M2).
  • Features advanced graph pattern matching and fusion capabilities.
  • Offers native support for ONNX model Quantization-Aware Training (QAT).
  • Provides extensive integration with multiple inference frameworks (TensorRT, OpenVINO, ONNX Runtime, etc.).
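For context on the FP8 support above: E4M3 packs a sign bit, 4 exponent bits (bias 7), and 3 mantissa bits into one byte. The decoder below is a minimal sketch following the common OCP FP8 convention (no infinities; NaN at the all-ones pattern), not PPQ's internal code:

```python
def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 byte (sign:1, exponent:4, mantissa:3, bias 7)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")          # E4M3 reserves only this pattern for NaN
    if exp == 0:
        return sign * (man / 8) * 2 ** -6   # subnormal range
    return sign * (1 + man / 8) * 2 ** (exp - 7)
```

Under this encoding the largest finite magnitude is 448.0 (`0b0_1111_110`), which is why E4M3 trades dynamic range for precision relative to E5M2.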

Maintenance & Community

  • Actively developed by OpenPPL.
  • Community support via QQ Group (627853444) and email (openppl.ai@hotmail.com).
  • Contributions are welcomed, with a process for feature discussion.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that PPQ is tested against specific model suites (mmlab) and benchmarked against FP32 and PPLCUDA baselines, but it does not quantify the performance gains from quantization itself or name specific hardware targets beyond the general framework integrations.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.3% · 2k stars
Post-training quantization research paper for large language models
Created 2 years ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 13 hours ago