aimet-model-zoo by quic

Model zoo for quantized neural network performance analysis

Created 5 years ago

342 stars

Top 81.1% on SourcePulse

Project Summary

This repository provides a comprehensive model zoo for the AI Model Efficiency Toolkit (AIMET), showcasing popular neural network models quantized using AIMET's techniques. It targets researchers and engineers working on optimizing models for edge devices, offering a benchmark of floating-point versus quantized performance and providing scripts for quantization.

How It Works

The zoo demonstrates model quantization using AIMET, a toolkit supporting state-of-the-art techniques like Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). For each model, it provides links to original FP32 checkpoints and quantized versions, along with evaluation scripts that perform quantization and report accuracy metrics. This allows users to directly compare performance and leverage AIMET's capabilities.
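To make the idea of post-training quantization concrete, here is a minimal pure-Python sketch of min/max calibration followed by a quantize-dequantize round trip. It does not use AIMET's API; the function names `calibrate_min_max` and `fake_quantize` and the sample weights are hypothetical, chosen only for illustration.

```python
def calibrate_min_max(values, num_bits=8):
    """Derive an affine scale and zero-point from the observed min/max (hypothetical helper)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # widen the range so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax


def fake_quantize(values, scale, zero_point, qmin, qmax):
    """Snap each value to the integer grid, clamp it, then map it back to float."""
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))
        out.append((q - zero_point) * scale)
    return out


# Illustrative weights only; not taken from any model in the zoo.
weights = [-1.0, -0.25, 0.0, 0.4, 1.5]
scale, zp, qmin, qmax = calibrate_min_max(weights, num_bits=8)
dequantized = fake_quantize(weights, scale, zp, qmin, qmax)

# The gap between the original and dequantized values is the quantization error.
max_error = max(abs(a - b) for a, b in zip(weights, dequantized))
print(f"scale={scale:.6f} zero_point={zp} max_error={max_error:.6f}")
```

For values inside the calibrated range, the round-trip error stays within half a quantization step. The zoo's evaluation scripts measure the downstream effect of this kind of error on task accuracy.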

Quick Start & Requirements

Install AIMET and its dependencies following the official Installation instructions.
Install the AIMET model zoo Python package(s).
Run the evaluation script for the model of interest; the .md files in the TensorFlow and PyTorch subfolders describe the detailed procedure for each model.

Highlighted Details

Extensive coverage across PyTorch and TensorFlow frameworks.
Models span various tasks: Image Classification, Object Detection, Pose Estimation, Super Resolution, Semantic Segmentation, Video Understanding, Speech Recognition, and NLP/NLU.
Quantization levels include W8A8 (8-bit weights, 8-bit activations) and W4A8 (4-bit weights, 8-bit activations), with some models using mixed precision.
Performance comparisons (accuracy, mAP, mIOU, WER, GLUE score, etc.) are provided for FP32 and quantized models.
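The W8 versus W4 distinction above can be illustrated with a short sketch that compares the resolution of a uniform quantization grid at the two bit-widths. This assumes a simple uniform grid over a fixed value range; `grid_summary` and the range of 2.0 are hypothetical and do not reflect AIMET's API or any specific model in the zoo.

```python
def grid_summary(num_bits, value_range):
    """Describe a uniform quantization grid covering `value_range` (hypothetical helper)."""
    levels = 2 ** num_bits               # number of representable values
    step = value_range / (levels - 1)    # spacing between adjacent grid points
    return {"levels": levels, "step": step, "max_rounding_error": step / 2}


# Assumed weight range of 2.0 (for example, weights in [-1, 1]); purely illustrative.
w8 = grid_summary(8, value_range=2.0)
w4 = grid_summary(4, value_range=2.0)

# How much coarser the 4-bit grid is than the 8-bit grid over the same range.
ratio = w4["step"] / w8["step"]
print(f"W8: {w8['levels']} levels, W4: {w4['levels']} levels, step ratio ~ {ratio:.1f}")
```

Over the same range, a 4-bit grid has 16 levels against 256 for 8 bits, so each weight is rounded far more coarsely. That is one reason accuracy is reported separately for each precision.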

Maintenance & Community

Maintained by Qualcomm Innovation Center, Inc.
Links to specific documentation and installation guides are provided within the README.

Licensing & Compatibility

The license terms are given in the repository's LICENSE file; the README does not name a specific license type.

Limitations & Caveats

"TBD" (To Be Determined) entries in the results tables indicate that support or specific results are not yet available for those configurations.

Health Check

Last Commit: 1 week ago

Responsiveness: Inactive

Pull Requests (30d): 3

Issues (30d): 0

Star History

2 stars in the last 30 days

Explore Similar Projects

LLM-QAT by facebookresearch

Research paper code for data-free quantization-aware training (QAT) of LLMs

Created 2 years ago

Updated 11 months ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera).

EfficientQAT by OpenGVLab

PyTorch implementation for efficient quantization-aware training of LLMs

Created 1 year ago

Updated 3 months ago

Starred by

Jeremy Howard (Cofounder of fast.ai), Benjamin Bolte (Cofounder of K-Scale Labs), and 4 more.

optimum-quanto by huggingface

PyTorch quantization backend for Hugging Face models

Created 2 years ago

Updated 3 months ago

Starred by

Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Daniel Han (Cofounder of Unsloth), and 3 more.

hqq by dropbox

Model quantizer for fast, accurate post-training quantization, skipping calibration

Created 2 years ago

Updated 2 months ago

Starred by

Luca Antiga (CTO of Lightning AI), William Falcon (Founder of Lightning AI), and 4 more.

lightning-thunder by Lightning-AI

PyTorch compiler for model optimization via source-to-source transformation

Created 1 year ago

Updated 2 days ago

Starred by

Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Zack Li (Cofounder of Nexa AI), and 4 more.

smoothquant by mit-han-lab

Post-training quantization research paper for large language models

Created 3 years ago

Updated 1 year ago

Awesome-Model-Quantization by Efficient-ML

Curated list for model quantization research

Created 7 years ago

Updated 4 weeks ago

ppq by OpenPPL

Offline quantization tool for neural network optimization

Created 4 years ago

Updated 1 year ago

Starred by

Junyang Lin (Core Maintainer at Alibaba Qwen), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

neural-compressor by intel

Python library for model compression (quantization, pruning, distillation, NAS)

Created 5 years ago

Updated 21 hours ago

Starred by

Logan Kilpatrick (Product Lead on Google AI Studio), Paras Jain (Cofounder of Genmo), and 7 more.

catalyst by catalyst-team

PyTorch framework for accelerated deep learning R&D

Created 7 years ago

Updated 8 months ago

Starred by

Daniel Han (Cofounder of Unsloth), Michael Han (Cofounder of Unsloth), and 14 more.

ao by pytorch

PyTorch library for quantization and sparsity in training/inference

Created 2 years ago

Updated 23 hours ago

PINTO_model_zoo by PINTO0309

Model zoo for inter-converted AI frameworks (TF, PyTorch, ONNX, etc.)

Created 6 years ago

Updated 2 months ago
