ppq by OpenPPL

Offline quantization tool for neural network optimization

created 3 years ago
1,718 stars

Top 25.4% on sourcepulse

Project Summary

PPQ (PPL Quantization Tool) is an industrial-grade, extensible neural network quantization framework designed for accelerating AI models on edge devices. It addresses the challenges of rapidly evolving model architectures and the need for efficient, low-power inference by enabling users to precisely control quantization parameters and optimization passes.

How It Works

PPQ employs a custom execution engine and a graph-based approach to parse, analyze, and modify complex neural network structures. It abstracts quantization logic into 27 independent "Quantization Optimization Passes" that users can customize or combine. This modular design gives fine-grained control over quantization bit-width, granularity, and calibration algorithms for individual operators and tensors, making it straightforward to explore new quantization techniques.
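The calibration and simulated-quantization logic that such passes implement can be illustrated with a minimal NumPy sketch. This is a generic per-tensor symmetric INT8 scheme for illustration only, not PPQ's actual API:

```python
import numpy as np

def calibrate_scale(tensor, num_bits=8):
    """Per-tensor symmetric scale from calibration data (max calibration)."""
    qmax = 2 ** (num_bits - 1) - 1        # 127 for INT8
    return np.abs(tensor).max() / qmax

def quant_dequant(tensor, scale, num_bits=8):
    """Simulated (fake) quantization: round onto the INT8 grid and back."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(tensor / scale), qmin, qmax)
    return q * scale

x = np.array([-1.0, -0.5, 0.0, 0.25, 1.27])
s = calibrate_scale(x)      # 1.27 / 127 = 0.01
y = quant_dequant(x, s)     # these values land exactly on the grid
```

A framework like PPQ additionally lets you choose the calibration algorithm (e.g. percentile or KL-based instead of max), the granularity (per-channel versus per-tensor), and the bit-width per operator, which is exactly what the optimization passes parameterize.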

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies with pip install -r requirements.txt, then run python setup.py install. Docker and pip installs are also available.
  • Prerequisites: CUDA Toolkit, a C++ compiler (Ninja build recommended), and PyTorch 1.10+.
  • Resources: A pre-configured Docker image is available.
  • Links: Learning Path, Optimization Passes, Examples.
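The source install above can be sketched as a short shell session. The repository URL is an assumption based on the OpenPPL GitHub organization; check the project page for the canonical one:

```shell
# Source install as described above (repository URL assumed).
git clone https://github.com/openppl-public/ppq.git
cd ppq
pip install -r requirements.txt
python setup.py install
```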

Highlighted Details

  • Supports FP8 quantization (E4M3, E5M2).
  • Features advanced graph pattern matching and fusion capabilities.
  • Offers native support for ONNX model Quantization-Aware Training (QAT).
  • Provides extensive integration with multiple inference frameworks (TensorRT, OpenVINO, ONNX Runtime, etc.).
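For context on the FP8 bullet: E4M3 and E5M2 trade precision against dynamic range, and their maximum finite values follow directly from the OCP FP8 bit layouts. A small standalone sketch, independent of PPQ:

```python
# E4M3: 4 exponent bits (bias 7), 3 mantissa bits. The OCP spec reserves
# only exponent=1111 with mantissa=111 for NaN (no Inf), so the largest
# finite value uses exponent 1111 (=15) with mantissa 110 (=1.75).
e4m3_max = 2 ** (15 - 7) * (1 + 6 / 8)    # 448.0

# E5M2: 5 exponent bits (bias 15), 2 mantissa bits, IEEE-style Inf/NaN,
# so the largest finite value uses exponent 11110 (=30) with mantissa 11.
e5m2_max = 2 ** (30 - 15) * (1 + 3 / 4)   # 57344.0
```

E4M3 offers more mantissa precision for weights and activations, while E5M2's wider range suits gradients; a quantization framework typically lets you pick per tensor.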

Maintenance & Community

  • Actively developed by OpenPPL.
  • Community support via QQ Group (627853444) and email (openppl.ai@hotmail.com).
  • Contributions are welcomed, with a process for feature discussion.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README indicates that PPQ is tested against specific model suites (mmlab) and benchmarked against FP32 and PPLCUDA, but it does not detail the performance gains from quantization itself, nor specific hardware targets beyond the general framework integrations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 41 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 6 more.

AutoGPTQ by AutoGPTQ

0.1% · 5k stars
LLM quantization package using GPTQ algorithm
created 2 years ago, updated 3 months ago
Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Nathan Lambert (AI researcher at AI2), and 1 more.

tianshou by thu-ml

0.1% · 9k stars
PyTorch RL library for algorithm development and application
created 7 years ago, updated 2 days ago