TensorRT Versions

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

22.12

1 year ago

Commit used by the 22.12 TensorRT NGC container.

Added

  • Stable Diffusion demo using TensorRT Plugins
  • KV-cache and beam search to GPT2 and T5 demos
  • Perplexity calculation to all HF demos
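
For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean negative log-likelihood a model assigns to the reference tokens. The sketch below illustrates that definition only; it is not the code used in the HF demos, and the function name `perplexity` and its input format (a list of natural-log token probabilities) are assumptions made for this example.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities.

    Hypothetical helper for illustration; the demos may compute the metric differently.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


if __name__ == "__main__":
    # A model that gives every reference token probability 0.25
    # should score a perplexity of approximately 4.
    uniform = [math.log(0.25)] * 10
    print(perplexity(uniform))
```

Lower values indicate that the model assigns higher probability to the reference text, which makes the metric a convenient accuracy check when comparing an optimized engine against the original framework model.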

Changed

  • Updated trex to v0.1.5
  • Increased default workspace size in demoBERT to build BS=128 fp32 engines
  • Use avg_iter=8 and timing cache to make demoBERT perf more stable

Removed

  • None

8.5.1

1 year ago

TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.

Key Features and Updates:

  • Samples enhancements

    • Added sampleNamedDimensions, which demonstrates working with named dimensions.
    • Updated sampleINT8API and introductory_parser_samples to use ONNX models over Caffe/UFF.
    • Removed UFF/Caffe samples including sampleMNIST, end_to_end_tensorflow_mnist, sampleINT8, sampleMNISTAPI, sampleUffMNIST, sampleUffPluginV2Ext, engine_refit_mnist, int8_caffe_mnist, uff_custom_plugin, sampleFasterRCNN, sampleUffFasterRCNN, sampleGoogleNet, sampleSSD, sampleUffSSD, sampleUffMaskRCNN and uff_ssd.
  • Plugin enhancements

    • Added GridAnchorRectPlugin to support rectangular feature maps in gridAnchorPlugin.
    • Added ROIAlignPlugin to support the ONNX operator RoiAlign. The ONNX parser will automatically route RoiAlign ops through the plugin.
    • Added Hopper support for the BERTQKVToContextPlugin plugin.
    • Exposed the use_int8_scale_max attribute in versions 2 and 3 of the BERTQKVToContextPlugin plugin. By default the plugin uses INT8 scale factors to optimize the softmax MAX reduction; the attribute lets users disable that behavior.
  • ONNX-TensorRT changes

  • Build containers

    • Updated default CUDA version to 11.8.0.
  • Tooling enhancements

8.4.3

1 year ago

TensorRT OSS release corresponding to TensorRT 8.4.3.1 release.

Key Updates:

  • Python packages for Python 3.10.
  • Bug fix for potential overlaps in H2D and inference execution in trtexec.

22.08

1 year ago

Commit used by the 22.08 TensorRT NGC container.

Changelog

Updated TensorRT version to 8.4.2. See the TensorRT 8.4.2 release notes for more information.

Changed

  • Updated default protobuf version to 3.20.x
  • Updated ONNX-TensorRT submodule version to 22.08 tag
  • Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe

Fixes

  • Fixed missing serialization member in CustomClipPlugin plugin
  • Fixed various Python import issues

Added

  • Added new DeBERTa demo
  • Added version 2 for disentangledAttentionPlugin to support DeBERTa v2

Removed

  • None

22.07

1 year ago

Commit used by the 22.07 TensorRT NGC container.

Changelog

Added

  • polygraphy-trtexec-plugin tool for Polygraphy
  • Multi-profile support for demoBERT
  • KV cache support for HF BART demo

Changed

  • Updated ONNX-GS to v0.3.20

Removed

  • None

8.4.1

1 year ago

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.

Key Features and Updates:

  • Samples enhancements

  • EfficientDet sample

    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.
  • HuggingFace transformer demo

    • Added BART model.
    • Sped up GPT-2 greedy search by using a GPU implementation.
    • Fixed GPT-2 ONNX export failure caused by the 2G file size limitation.
    • Extended Megatron LayerNorm plugins to support larger hidden sizes.
    • Added performance benchmarking mode.
    • Enabled TF32 format by default.
  • demoBERT enhancements

    • Added --duration flag to perf benchmarking script.
    • Fixed import of nvinfer_plugins library in demoBERT on Windows.
  • Torch-QAT toolkit

    • Removed the quant_bert.py module, which is now upstreamed to HuggingFace QDQBERT.
    • Use axis 0 as the default for deconv.
    • #1939 - Fixed path in classification_flow example.
  • Plugin enhancements

  • Build containers

    • Updated default CUDA version to 11.6.2.
    • CentOS Linux 8 reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
    • Installed devtoolset-8 for updated g++ versions in the CentOS7 container.
  • Tooling enhancements

  • trtexec enhancements

    • Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
    • Added --memPoolSize flag to specify the size of the workspace as well as the DLA memory pools via a unified interface. Correspondingly, the --workspace flag has been deprecated.
    • The "End-To-End Host Latency" metric has been removed. Use the "Host Latency" metric instead. For more information, refer to the Benchmarking Network section in the TensorRT Developer Guide.
    • Switched to enqueueV2() instead of enqueue() when the engine has explicit batch dimensions.
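
As a rough illustration of how the flags named above fit together, the sketch below assembles a trtexec command line in Python without running it. The helper name `build_trtexec_args`, the file name model.onnx, the layer name conv1, and the pool size are placeholders invented for this example. The `workspace:<size>` pool spec (size assumed to be in MiB) and the `layerName:type` pair syntax reflect our understanding of trtexec; confirm both against `trtexec --help` for your version, which may also require further options before per-layer constraints take effect.

```python
import shlex


def build_trtexec_args(onnx_path, workspace_mib, layer_precisions=None, layer_output_types=None):
    """Assemble a trtexec argument list (hypothetical helper, nothing is executed)."""
    args = [
        "trtexec",
        f"--onnx={onnx_path}",
        # Replaces the deprecated --workspace flag; size assumed to be in MiB.
        f"--memPoolSize=workspace:{workspace_mib}",
    ]
    if layer_precisions:
        # Comma-separated layerName:precision pairs.
        spec = ",".join(f"{name}:{prec}" for name, prec in layer_precisions.items())
        args.append(f"--layerPrecisions={spec}")
    if layer_output_types:
        # Comma-separated layerName:type pairs.
        spec = ",".join(f"{name}:{dtype}" for name, dtype in layer_output_types.items())
        args.append(f"--layerOutputTypes={spec}")
    return args


if __name__ == "__main__":
    argv = build_trtexec_args(
        "model.onnx",                          # placeholder model path
        4096,                                  # placeholder workspace size
        layer_precisions={"conv1": "fp32"},    # placeholder layer name
        layer_output_types={"conv1": "fp32"},
    )
    print(shlex.join(argv))
```

Building the arguments as a list keeps each flag a single token, so the result can be passed directly to `subprocess.run` without shell quoting concerns.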

22.06

1 year ago

Commit used by the 22.06 TensorRT NGC container.

Changelog

Added

  • None

Changed

  • Disentangled attention (DMHA) plugin refactored
  • ONNX parser updated to 8.2GA

Removed

  • None

22.05

2 years ago

Commit used by the 22.05 TensorRT NGC container.

Changelog

Added

  • Disentangled attention plugin for DeBERTa
  • DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
  • Performance benchmarking mode to HuggingFace demo

Changed

  • Updated base TensorRT version to 8.2.5.1
  • Updated onnx-graphsurgeon to v0.3.19 (CHANGELOG)
  • fp16 support for pillarScatterPlugin
  • #1939 - Fixed path in quantization classification_flow
  • Fixed GPT-2 ONNX export failure caused by the 2G file size limitation
  • Use axis0 as default for deconv in pytorch-quantization toolkit
  • Updated onnx export script for CoordConvAC sample
  • Install devtoolset-8 for updated g++ version in CentOS7 container

Removed

  • Usage of deprecated TensorRT APIs in samples removed
  • quant_bert.py module removed from pytorch-quantization

22.04

2 years ago

Commit used by the 22.04 TensorRT NGC container.

Changelog

Added

  • TensorRT Engine Explorer v0.1.0 README
  • Detectron 2 Mask R-CNN R50-FPN python sample
  • Model export script for sampleOnnxMnistCoordConvAC

Changed

  • Updated base TensorRT version to 8.2.4.2
  • Updated copyright headers with SPDX identifiers
  • Updated onnx-graphsurgeon to v0.3.17 (CHANGELOG)
  • PyramidROIAlign plugin refactor and bug fixes
  • Fixed MultilevelCropAndResize crashes on Windows
  • #1583 - Sublicensed ieee/half.h under Apache2
  • Updated demo/BERT performance tables for rel-8.2
  • #1774 - Fixed Python hang on IndexError when TF is imported after TensorRT
  • Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
  • Cleaned up sample READMEs

Removed

  • sampleNMT removed from samples

22.03

2 years ago

Commit used by the 22.03 TensorRT NGC container.

Changelog

Added

  • EfficientDet sample enhancements
    • Added support for EfficientDet Lite and AdvProp models.
    • Added dynamic batch support.
    • Added mixed precision engine builder.

Changed

  • Better decoupling of HuggingFace demo tests