AIMET Versions

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

1.31.0

1 month ago

1.30.0

3 months ago

What's New

ONNX

  • Upgraded AIMET to support ONNX version 1.14 and ONNX Runtime version 1.15.
  • Added support for AutoQuant.

Documentation

1.29.0

5 months ago

What's New

Keras

  • Fixed issues with TFOpLambda layers in the QcQuantizeWrapper call.

PyTorch

  • [experimental] Support for embedding AIMET encodings within the graph using ONNX quantize/dequantize operators. Currently, this option is supported only when using 8-bit per-tensor quantization.
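
A minimal sketch of how this export path might be exercised, assuming the standard QuantizationSimModel workflow; the use_embedded_encodings flag name, the toy model, and the calibration callback below are assumptions rather than the documented API:

    # Hedged sketch: export a quantized model with encodings embedded as ONNX
    # quantize/dequantize nodes. The use_embedded_encodings flag name is an
    # assumption; consult the AIMET API docs for the exact export signature.
    import torch
    from aimet_torch.quantsim import QuantizationSimModel

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 32, 32)

    def calibrate(sim_model, _):
        # Run representative data through the sim model to collect ranges
        with torch.no_grad():
            sim_model(dummy_input)

    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               default_param_bw=8, default_output_bw=8)
    sim.compute_encodings(forward_pass_callback=calibrate,
                          forward_pass_callback_args=None)

    # Export with encodings embedded as QDQ operators (8-bit per-tensor only)
    sim.export(path='./export', filename_prefix='model_qdq',
               dummy_input=dummy_input, use_embedded_encodings=True)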

ONNX

  • Added support for AdaRound (adaptive rounding).
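
A hedged sketch of applying AdaRound through the new aimet_onnx path, assuming its interface mirrors the aimet_torch AdaRound API; the module path, AdaroundParameters fields, and the calibration-loader helper are assumptions:

    # Hedged sketch: AdaRound for an ONNX model. Module path and signatures are
    # assumed to mirror the aimet_torch API; check the aimet_onnx docs.
    import onnx
    from aimet_onnx.adaround.adaround_weight import Adaround, AdaroundParameters

    model = onnx.load('model.onnx')
    data_loader = make_calibration_loader()  # hypothetical iterable of input batches

    params = AdaroundParameters(data_loader=data_loader, num_batches=4,
                                default_num_iterations=32)

    # Writes the rounded weights and parameter encodings under ./adaround_out
    adarounded_model = Adaround.apply_adaround(model, params,
                                               path='./adaround_out',
                                               filename_prefix='model',
                                               default_param_bw=8)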

TensorFlow

  • No significant updates

Documentation

1.28.1

6 months ago

1.28.0

8 months ago

What's New

Keras

  • Added support for the Spatial SVD compression feature.
  • [experimental] Debugging APIs have been added for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

PyTorch

  • Upgraded the AIMET PyTorch default version to 1.13. AIMET remains compatible with PyTorch version 1.9.

ONNX

  • [experimental] Debugging APIs have been added for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

TensorFlow

  • No significant updates

Documentation

1.27.0

9 months ago

What's New

Keras

  • Updated support for TFOpLambda layers with extra call args/kwargs in batch norm folding.

PyTorch

  • Upgraded AIMET to support PyTorch version 1.13.0. Only ONNX opset 14 is supported for export.
  • [experimental] Debugging APIs have been added for dumping intermediate tensor data, which can be used with current QNN/SNPE tools for debugging accuracy problems (a hedged usage sketch follows this list). Known issue: the Layer Output Generation API gives incorrect tensor data for the layer just before a ReLU when used on the original FP32 model.
  • [experimental] Support for embedding AIMET encodings within the graph using ONNX quantize/dequantize operators. Currently, this option is supported only when using 8-bit per-tensor quantization.
  • Fixed a bug in AIMET QuantSim for PyTorch models to handle non-contiguous tensors.
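
A hedged sketch of the intermediate-output dumping flow mentioned above; the aimet_torch.layer_output_utils module path, the LayerOutputUtil constructor, and the toy model are assumptions based on the AIMET documentation:

    # Hedged sketch: dump per-layer intermediate outputs for accuracy debugging.
    # Module path and method names are assumptions; see the AIMET PyTorch docs.
    import torch
    from aimet_torch.layer_output_utils import LayerOutputUtil

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    input_batch = torch.randn(1, 3, 32, 32)

    # Writes one file per layer output under the given directory; the same call
    # can be made on a QuantizationSimModel to compare quantized vs. FP32 data.
    layer_output_util = LayerOutputUtil(model=model, dir_path='./layer_outputs')
    layer_output_util.generate_layer_outputs(input_batch)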

ONNX

  • Added support for ONNX 1.11.0. However, op support in QNN/SNPE is currently limited; if the model fails to load, please continue to use opset 11 for export.

TensorFlow

  • [experimental] Debugging APIs have been added for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

Documentation

1.26.1

10 months ago

What's New

TensorFlow

  • Upgraded AIMET to support TensorFlow version 2.10.1 (AIMET remains compatible with TensorFlow 2.4).
  • Several bug fixes

Common

  • Upgraded to Ubuntu 20 base image for all variants.

Documentation

1.26.0

1 year ago

What's New

Keras

  • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization.
  • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and to create an HTML report describing which optimizations were applied.
  • Updated the Model Preparer to replace separable convolutions with depthwise and pointwise conv layers.
  • Fixed the BN fold implementation to account for a subsequent multi-input layer.
  • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT.

PyTorch

  • Several bug fixes

TensorFlow

  • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization.
  • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and to create an HTML report describing which optimizations were applied.
  • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT.

Common

  • Documentation updates for taking AIMET models to target.
  • Converted standalone BatchNorm layer parameters so that the layer behaves as a linear/dense layer.

Experimental

  • Added new Architecture Checker feature to identify and report model architecture constructs that are not ideal for quantized runtimes. Users can utilize this information to change their model architectures accordingly.
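
A hedged sketch of how the checker might be invoked on a PyTorch model; the aimet_torch.arch_checker module path, the check_model_arch entry point, and the toy model are assumptions:

    # Hedged sketch: run the experimental Architecture Checker on a PyTorch model.
    # Module path and entry-point name are assumptions; see the AIMET docs.
    import torch
    from aimet_torch.arch_checker.arch_checker import ArchChecker

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 30, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 32, 32)

    # Emits a report flagging constructs that tend to quantize or run poorly
    # on target (for example, unusual channel counts or activation choices).
    ArchChecker.check_model_arch(model, dummy_input)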

Documentation

1.25.0

1 year ago

What's New

Keras

  • Added the QuantAnalyzer feature (a hedged usage sketch follows this list)
  • Added batch normalization folding for functional Keras models. This allows the default config files to work for supergroups.
  • Resolved an issue with quantizer placement in Sequential blocks in subclassed models
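
A hedged sketch of QuantAnalyzer on a Keras model, assuming the Keras entry point mirrors the PyTorch QuantAnalyzer API; the module path, the CallbackFunc import, the analyze() arguments, and the toy model and callbacks are all assumptions:

    # Hedged sketch: QuantAnalyzer on a Keras model. Module path, CallbackFunc
    # and analyze() arguments are assumptions; consult the AIMET Keras API docs.
    import numpy as np
    import tensorflow as tf
    from aimet_common.utils import CallbackFunc
    from aimet_tensorflow.keras.quant_analyzer import QuantAnalyzer

    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, input_shape=(32, 32, 3)),
        tf.keras.layers.ReLU()])
    calib_data = np.random.randn(8, 32, 32, 3).astype(np.float32)

    def forward_pass(keras_model, _):
        keras_model.predict(calib_data)   # calibration forward passes

    def evaluate(keras_model, _):
        return 0.0                        # hypothetical accuracy metric

    analyzer = QuantAnalyzer(model,
                             forward_pass_callback=CallbackFunc(forward_pass, None),
                             eval_callback=CallbackFunc(evaluate, None))

    # Produces per-layer sensitivity data and encoding range statistics
    analyzer.analyze(default_param_bw=8, default_output_bw=8,
                     results_dir='./quant_analyzer_results')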

PyTorch

  • Added AutoQuant V2, which includes advanced features such as out-of-the-box inference, model preparer, quant scheme search, an improved summary report, etc. (a hedged usage sketch follows this list)
  • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization
  • Fixes to improve EfficientNetB4 accuracy with respect to target
  • Fixed rare case where quantizer may calculate incorrect offset when generating QAT 2.0 learned encodings
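
A hedged sketch of the AutoQuant V2 flow; the aimet_torch.auto_quant_v2 module path, constructor arguments, optimize() signature, and the toy model, data loader and evaluation callback are assumptions and have changed across releases:

    # Hedged sketch: AutoQuant V2 on a PyTorch model. Module path and signatures
    # are assumptions; they have changed across AIMET releases, so consult the
    # version-specific API docs.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.auto_quant_v2 import AutoQuant

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 32, 32)
    data_loader = DataLoader(TensorDataset(torch.randn(8, 3, 32, 32)), batch_size=4)

    def eval_callback(eval_model, num_samples=None):
        return 0.0   # hypothetical accuracy metric

    auto_quant = AutoQuant(model, dummy_input=dummy_input,
                           data_loader=data_loader, eval_callback=eval_callback)

    # Applies BN folding, CLE and AdaRound as needed while searching for a
    # quant scheme that stays within the allowed accuracy drop, and writes a
    # summary report of the optimizations applied.
    quantized_model, accuracy, encoding_path = auto_quant.optimize(
        allowed_accuracy_drop=0.01)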

TensorFlow

  • Added QuantAnalyzer feature
  • Fixed an accuracy issue due to rare cases where the incorrect BN epsilon was being used
  • Fixed an accuracy issue due to Quantsim export incorrectly recomputing QAT2.0 encodings

Common

  • Updated AIMET python package version format to support latest pip
  • Fixed an issue where not all inputs might be quantized properly

Documentation

1.24.0

1 year ago

What's New

  • Export quantsim configuration for configuring downstream target quantization

PyTorch

  • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization
  • Added support for AMP 2.0 which enables faster automatic mixed precision
  • Added support for QAT for INT4 quantized models, including a feature for performing BN Re-estimation after QAT (a hedged usage sketch follows)
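
A hedged sketch of BN Re-estimation after QAT; the bn_reestimation and batch_norm_fold module paths, the function signatures, and the toy model, data loader and callbacks are assumptions:

    # Hedged sketch: BN re-estimation after INT4 QAT. Module paths and function
    # signatures are assumptions; check the AIMET PyTorch API docs.
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.bn_reestimation import reestimate_bn_stats
    from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3),
                                torch.nn.BatchNorm2d(8),
                                torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 32, 32)
    data_loader = DataLoader(TensorDataset(torch.randn(8, 3, 32, 32)), batch_size=4)

    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               default_param_bw=4, default_output_bw=8)
    sim.compute_encodings(lambda m, _: m(dummy_input), None)

    # ... fine-tune sim.model with the usual QAT training loop here ...

    # Re-estimate BN statistics on a few batches, then fold BN into the
    # quantized layers' scale/offset before export.
    reestimate_bn_stats(sim.model, data_loader, num_batches=2,
                        forward_fn=lambda m, batch: m(batch[0]))
    fold_all_batch_norms_to_scale(sim)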

Keras

  • Added support for AMP 2.0 which enables faster automatic mixed precision
  • Support for basic transformer networks
  • Added support for subclassed models. The current subclassing feature includes support for only a single level of subclassing and does not support lambdas.
  • Added QAT per-channel gradient support
  • Minor updates to the quantization configuration
  • Fixed QuantSim bug where layers using dtypes other than float were incorrectly quantized

TensorFlow

  • Added an additional PReLU mapping pattern to ensure proper folding and quantsim node placement
  • Fixed per-channel encoding representation to align with PyTorch and Keras

Documentation