ppq by OpenPPL

Offline quantization tool for neural network optimization

Created 3 years ago
1,751 stars

Top 24.5% on SourcePulse

Project Summary

PPQ (PPL Quantization Tool) is an industrial-grade, extensible neural network quantization framework designed for accelerating AI models on edge devices. It addresses the challenges of rapidly evolving model architectures and the need for efficient, low-power inference by enabling users to precisely control quantization parameters and optimization passes.

How It Works

PPQ employs a custom execution engine and a graph-based approach to parse, analyze, and modify complex neural network structures. It abstracts quantization logic into 27 independent "Quantization Optimization Passes" that can be customized or combined by users. This modular design allows for fine-grained control over quantization bit-width, granularity, and calibration algorithms for individual operators and tensors, facilitating high flexibility and exploration of new quantization techniques.
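To illustrate what a single calibration-style pass computes, here is a minimal, framework-independent sketch of symmetric per-tensor min-max quantization in plain NumPy. It does not use PPQ's actual API; the function names and the 8-bit symmetric scheme are illustrative assumptions:

```python
import numpy as np

def minmax_calibrate(tensor, num_bits=8):
    """Derive a symmetric per-tensor scale from the observed min/max values."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    return max(abs(tensor.min()), abs(tensor.max())) / qmax

def quantize_dequantize(tensor, scale, num_bits=8):
    """Simulate quantization: snap to the integer grid, then map back to float."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(tensor / scale), qmin, qmax)
    return q * scale

weights = np.array([0.5, -1.27, 0.031, 1.0])
scale = minmax_calibrate(weights)          # one "calibration pass" output
fake_quant = quantize_dequantize(weights, scale)
```

A real optimization pass in a framework like PPQ would additionally decide bit-width and granularity per operator and per tensor; this sketch only shows the core scale computation and the round-trip error it introduces.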

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies via pip install -r requirements.txt, and run python setup.py install. Docker and pip installation options are also available.
  • Prerequisites: CUDA Toolkit, a C++ compiler (Ninja build recommended), and PyTorch (version 1.10+).
  • Resources: Setup involves cloning and installing Python packages. Docker image is available for pre-configured environments.
  • Links: Learning Path, Optimization Passes, Examples.

Highlighted Details

  • Supports FP8 quantization (E4M3, E5M2).
  • Features advanced graph pattern matching and fusion capabilities.
  • Offers native support for ONNX model Quantization-Aware Training (QAT).
  • Provides extensive integration with multiple inference frameworks (TensorRT, OpenVINO, ONNX Runtime, etc.).
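For context on the FP8 support above: E4M3 packs a sign bit, 4 exponent bits (bias 7), and 3 mantissa bits into one byte. The decoder below is a minimal sketch following the common OCP FP8 convention (no infinities; NaN at the all-ones pattern), not PPQ's internal code:

```python
def decode_e4m3(byte: int) -> float:
    """Decode one E4M3 byte (sign:1, exponent:4, mantissa:3, bias 7)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")          # E4M3 reserves only this pattern for NaN
    if exp == 0:
        return sign * (man / 8) * 2 ** -6   # subnormal range
    return sign * (1 + man / 8) * 2 ** (exp - 7)
```

Under this encoding the largest finite magnitude is 448.0 (`0b0_1111_110`), which is why E4M3 trades dynamic range for precision relative to E5M2.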

Maintenance & Community

  • Actively developed by OpenPPL.
  • Community support via QQ Group (627853444) and email (openppl.ai@hotmail.com).
  • Contributions are welcomed, with a process for feature discussion.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that PPQ is tested against specific model suites (mmlab) and benchmarked against FP32 and PPLCUDA baselines, but it does not quantify the performance gains from quantization itself or name specific hardware targets beyond the general framework integrations.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

0.3% · 2k stars
Post-training quantization research paper for large language models
Created 2 years ago · Updated 1 year ago
Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

0.2% · 2k stars
Python library for model compression (quantization, pruning, distillation, NAS)
Created 5 years ago · Updated 13 hours ago