NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Commit used by the 22.12 TensorRT NGC container.

- `avg_iter=8` and timing cache to make demoBERT perf more stable

TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.
Key Features and Updates:
- Samples enhancements
  - `sampleINT8API` and `introductory_parser_samples` to use ONNX models over Caffe/UFF
  - `sampleMNIST`, `end_to_end_tensorflow_mnist`, `sampleINT8`, `sampleMNISTAPI`, `sampleUffMNIST`, `sampleUffPluginV2Ext`, `engine_refit_mnist`, `int8_caffe_mnist`, `uff_custom_plugin`, `sampleFasterRCNN`, `sampleUffFasterRCNN`, `sampleGoogleNet`, `sampleSSD`, `sampleUffSSD`, `sampleUffMaskRCNN` and `uff_ssd`.
- Plugin enhancements
- ONNX-TensorRT changes
- Build containers
  - 11.8.0.
- Tooling enhancements

TensorRT OSS release corresponding to TensorRT 8.4.3.1 release.
Key Updates:
- `trtexec`.

Commit used by the 22.08 TensorRT NGC container.

- Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information
- 22.08 tag
- `sampleIOFormats` and `sampleAlgorithmSelector` to use ONNX models over Caffe
- `CustomClipPlugin` plugin
- `disentangledAttentionPlugin` to support DeBERTa v2

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.
Key Features and Updates:
- Samples enhancements
  - `sampleNMT`.
  - EfficientDet sample
  - HuggingFace transformer demo
  - demoBERT enhancements
    - `--duration` flag to perf benchmarking script.
    - `nvinfer_plugins` library in demoBERT on Windows.
  - Torch-QAT toolkit
    - `quant_bert.py` module removed. It is now upstreamed to HuggingFace QDQBERT.
    - `classification_flow` example.
- Plugin enhancements
  - `DisentangledAttention_TRT`, to support DeBERTa model.
  - `MultiscaleDeformableAttnPlugin_TRT`, to support DDETR model.
  - `fp16` support for `pillarScatterPlugin`.
- Build containers
  - 11.6.2.
  - `devtoolset-8` for updated g++ versions in CentOS7 container.
- Tooling enhancements
  - `trtexec` enhancements
    - `--layerPrecisions` and `--layerOutputTypes` flags for specifying layer-wise precision and output type constraints.
    - `--memPoolSize` flag to specify the size of workspace as well as the DLA memory pools via a unified interface. Correspondingly the `--workspace` flag has been deprecated.
    - `enqueueV2()` instead of `enqueue()` when engine has explicit batch dimensions.

Commit used by the 22.05 TensorRT NGC container.
- `classification_flow`
- `quant_bert.py` module removed from pytorch-quantization

Commit used by the 22.04 TensorRT NGC container.

- `PyramidROIAlign` plugin refactor and bug fixes
- `MultilevelCropAndResize` crashes on Windows
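
The `trtexec` flags listed under the 8.4.1.5 GA release (`--layerPrecisions`, `--layerOutputTypes` and `--memPoolSize`, which supersedes `--workspace`) are easier to follow when seen together in one invocation. The following is a minimal Python sketch that only assembles such a command line; it does not run `trtexec`. The helper names, the model path `model.onnx` and the layer names `conv1` and `fc_out` are hypothetical. The value syntax (comma-separated `name:type` pairs, pool sizes in MiB) and the companion `--precisionConstraints=obey` flag are assumptions based on the `trtexec` help text, so check them against `trtexec --help` for your TensorRT version.

```python
import shlex


def format_pairs(mapping):
    """Join {key: value} pairs into the 'key:value,key:value' form."""
    return ",".join(f"{key}:{value}" for key, value in mapping.items())


def build_trtexec_command(onnx_path, layer_precisions=None,
                          layer_output_types=None, mem_pools_mib=None,
                          extra_flags=()):
    """Assemble a trtexec argument list (hypothetical helper, not part of TensorRT)."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if layer_precisions or layer_output_types:
        # Assumption: per-layer constraints are only honored when a
        # precision-constraint mode other than the default is selected.
        cmd.append("--precisionConstraints=obey")
    if layer_precisions:
        cmd.append(f"--layerPrecisions={format_pairs(layer_precisions)}")
    if layer_output_types:
        cmd.append(f"--layerOutputTypes={format_pairs(layer_output_types)}")
    if mem_pools_mib:
        # Unified replacement for the deprecated --workspace flag; sizes in MiB.
        cmd.append(f"--memPoolSize={format_pairs(mem_pools_mib)}")
    cmd.extend(extra_flags)
    return cmd


# Example with hypothetical layer names and pool sizes.
cmd = build_trtexec_command(
    "model.onnx",
    layer_precisions={"conv1": "fp16", "fc_out": "fp32"},
    layer_output_types={"conv1": "fp16"},
    mem_pools_mib={"workspace": 1024, "dlaSRAM": 1},
    extra_flags=("--fp16",),
)
print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the sketch testable on machines without TensorRT installed; the resulting list can be passed to `subprocess.run` where `trtexec` is available.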