AIMET is a library that provides advanced model quantization and compression techniques for trained neural network models. Its features have been proven to improve the run-time performance of deep learning models, with lower compute and memory requirements and minimal impact on task accuracy.
AIMET is designed to work with PyTorch, TensorFlow and ONNX models.
We also host the AIMET Model Zoo - a collection of popular neural network models optimized for 8-bit inference. We also provide recipes for users to quantize floating point models using AIMET.
Please visit AIMET on GitHub Pages for more details.
Some recently added features include:
AIMET can quantize an existing 32-bit floating-point model to an 8-bit fixed-point model without sacrificing much accuracy and without model fine-tuning.
The Data-Free Quantization (DFQ) method, applied to several popular networks such as MobileNet-v2 and ResNet-50, results in less than 0.9% loss in accuracy all the way down to 8-bit quantization, in an automated way and without any training data.
| Models | FP32 | INT8 Simulation |
|---|---|---|
| MobileNet v2 (top1) | 71.72% | 71.08% |
| ResNet 50 (top1) | 76.05% | 75.45% |
| DeepLab v3 (mIOU) | 72.65% | 71.91% |
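To illustrate what "INT8 simulation" means conceptually, the sketch below fake-quantizes a float tensor: values are mapped onto an 8-bit integer grid and back to float, so the rounding error shows up in the float output. This is a generic NumPy sketch of the idea, not AIMET's actual API; the asymmetric min/max scheme is one common choice of quantization parameters.

```python
import numpy as np

def fake_quantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulate fixed-point quantization: map floats onto an integer grid
    and back, so quantization error appears in the float-domain output."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Asymmetric range derived from the tensor's min/max (one common scheme).
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

x = np.linspace(-1.0, 1.0, 5)
xq = fake_quantize(x, num_bits=8)
# Each quantized value stays within one quantization step of the original.
assert np.max(np.abs(x - xq)) <= (x.max() - x.min()) / 255 + 1e-12
```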
For an example ADAS object detection model, which was challenging to quantize to 8-bit precision, AdaRound can recover the accuracy to within 1% of the FP32 accuracy.
| Configuration | mAP (Mean Average Precision) |
|---|---|
| FP32 | 82.20% |
| Nearest Rounding (INT8 weights, INT8 acts) | 49.85% |
| AdaRound (INT8 weights, INT8 acts) | 81.21% |
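The core idea behind AdaRound is that rounding each weight to its nearest grid point is not optimal for the layer's *output*; choosing floor or ceiling per weight so the quantized output best matches the float output can do much better. The toy sketch below demonstrates that principle by brute force on a tiny weight vector (the real algorithm optimizes a per-weight rounding choice with a learned relaxation; this is not AIMET's API):

```python
import itertools
import numpy as np

def nearest_round_quant(w, scale):
    """Round each weight independently to the nearest grid point."""
    return np.round(w / scale) * scale

def adaptive_round_quant(w, x, scale):
    """Toy adaptive rounding: pick floor or ceiling per weight so the
    quantized layer output (w @ x) is closest to the float output.
    Brute force over all floor/ceiling choices (fine for tiny w)."""
    lo = np.floor(w / scale)
    best_q, best_err = None, np.inf
    for bits in itertools.product([0.0, 1.0], repeat=len(w)):
        q = (lo + np.array(bits)) * scale
        err = abs(q @ x - w @ x)
        if err < best_err:
            best_q, best_err = q, err
    return best_q

rng = np.random.default_rng(0)
w = rng.normal(size=6)   # toy "layer" weights
x = rng.normal(size=6)   # toy input activations
scale = 0.25
err_nearest = abs(nearest_round_quant(w, scale) @ x - w @ x)
err_adaptive = abs(adaptive_round_quant(w, x, scale) @ x - w @ x)
# Nearest rounding is one of the searched combinations, so the
# output-aware choice can never be worse on this objective.
assert err_adaptive <= err_nearest + 1e-12
```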
For some models like the DeepLabv3 semantic segmentation model, AdaRound can even quantize the model weights to 4-bit precision without a significant drop in accuracy.
| Configuration | mIOU (Mean Intersection over Union) |
|---|---|
| FP32 | 72.94% |
| Nearest Rounding (INT4 weights, INT8 acts) | 6.09% |
| AdaRound (INT4 weights, INT8 acts) | 70.86% |
AIMET supports quantization simulation and quantization-aware training (QAT) for recurrent models (RNN, LSTM, GRU). Using the QAT feature in AIMET, a DeepSpeech2 model with bi-directional LSTMs can be quantized to 8-bit precision with a minimal drop in accuracy.
| DeepSpeech2 (using bi-directional LSTMs) | Word Error Rate |
|---|---|
| FP32 | 9.92% |
| INT8 | 10.22% |
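QAT works by simulating quantization in the forward pass while letting gradients update the underlying float weights, typically via a straight-through estimator (STE) that treats the non-differentiable rounding step as the identity during backpropagation. A minimal hand-rolled sketch of one parameter trained this way (illustrative only, not AIMET's implementation):

```python
import numpy as np

def fake_quant(w, scale=0.1):
    # Round-to-nearest on a fixed grid; simulates quantized weights at train time.
    return np.round(w / scale) * scale

# One-parameter "model" y = q(w) * x, target t, loss = 0.5 * (y - t)^2.
w, x, t, lr = 0.42, 1.0, 1.0, 0.5
for _ in range(20):
    y = fake_quant(w) * x          # forward pass uses the quantized weight
    grad_y = (y - t) * x           # dL/dy * dy/dq(w)
    # Straight-through estimator: treat d q(w)/dw as 1, so the gradient
    # reaches the underlying float weight even though round() has zero gradient.
    w -= lr * grad_y
# The float weight settles where its quantized value matches the target.
assert abs(fake_quant(w) * x - t) < 1e-6
```

Because the forward pass sees quantization noise during training, the model learns weights that remain accurate after quantization, which is why QAT typically closes most of the gap between FP32 and INT8 accuracy.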
AIMET can also significantly compress models. For popular models such as ResNet-50 and ResNet-18, compression with spatial SVD plus channel pruning achieves a 50% MAC (multiply-accumulate) reduction while retaining accuracy within approximately 1% of the original uncompressed model.
| Models | Uncompressed model | 50% Compressed model |
|---|---|---|
| ResNet 18 (top1) | 69.76% | 68.56% |
| ResNet 50 (top1) | 76.05% | 75.75% |
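Spatial SVD reduces MACs by factoring one KxK convolution into a Kx1 convolution followed by a 1xK convolution of some smaller rank r. The NumPy sketch below (a conceptual illustration, not AIMET's compression API; the layer sizes and rank are made up) shows the low-rank factorization of a kernel and the resulting MAC count per output pixel:

```python
import numpy as np

# Toy layer: c_out x c_in kernel of spatial size k x k, factorized at rank r.
c_out, c_in, k, r = 64, 64, 3, 32
kernel = np.random.default_rng(0).normal(size=(c_out, c_in, k, k))

# Arrange the kernel as a (c_out * k) x (c_in * k) matrix and factor it.
m = kernel.transpose(0, 3, 1, 2).reshape(c_out * k, c_in * k)
u, s, vt = np.linalg.svd(m, full_matrices=False)
# Keep the top-r singular components; the factors become the two smaller kernels.
approx = (u[:, :r] * s[:r]) @ vt[:r, :]
rel_err = np.linalg.norm(m - approx) / np.linalg.norm(m)

# MACs per output pixel: original KxK conv vs. the two-stage factorized form.
macs_orig = c_out * c_in * k * k             # 64 * 64 * 9
macs_svd = r * c_in * k + c_out * r * k      # (Kx1 at rank r) + (1xK back to c_out)
assert macs_svd < macs_orig
```

With these toy numbers the factorized form needs one third of the MACs; in practice the rank per layer is chosen (and the model fine-tuned) to hit a target compression ratio with minimal accuracy loss, which is the trade-off the table above reports.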
To install and use the pre-built version of the AIMET package, please follow one of the links below:
To build, (optionally) modify, and use the latest AIMET source code, please follow one of the links below:
Thanks for your interest in contributing to AIMET! Please read our Contributions Page for more information on contributing features or bug fixes. We look forward to your participation!
AIMET aims to be a community-driven project maintained by Qualcomm Innovation Center, Inc.
AIMET is licensed under the BSD 3-clause "New" or "Revised" License. Check out the LICENSE for more details.