TRTorch Versions

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

v0.2.0

3 years ago

TRTorch v0.2.0

Support for PyTorch 1.7.x, Multi Device APIs, Runtime Library, New Converters, Bug Fixes

This is the second beta release of TRTorch, targeting PyTorch 1.7.x, CUDA 11.0 (on x86_64), TensorRT 7.2 and cuDNN 8. TRTorch 0.2.0 for aarch64 targets JetPack 4.5.x. It updates the to_backend integration for PyTorch to reflect changes in the PyTorch API.

TF32, the data format newly introduced on Ampere, is now the default FP32 format used in TRTorch, so a new API has been added to disable it.

APIs have been solidified for runtime configuration of the active CUDA device, letting users choose which device a program is deserialized on. This API will continue to change as we further define the serialization format and work with the PyTorch team to make runtime device configuration more ergonomic. You can follow this work here: https://github.com/NVIDIA/TRTorch/discussions/311. This release also formalizes DLA support in TRTorch, adding APIs and capabilities to target DLA on Jetson and DRIVE platforms.

v0.2.0 also includes a new shared library, libtrtorchrt.so. This library contains only the runtime components of TRTorch and is suitable for situations where device footprint is extremely limited. libtrtorchrt.so can be linked to C++ applications and loaded into Python scripts; it loads all necessary TRTorch runtime components into the torch runtime, allowing users to run TRTorch applications without the full compiler. v0.2.0 also adds support for Python 3.9.
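To illustrate why disabling TF32 can matter for accuracy-sensitive models, the sketch below emulates the reduced precision in plain Python. TF32 keeps the 8-bit exponent of FP32 but only 10 mantissa bits, so the helper clears the low 13 of FP32's 23 mantissa bits. The name truncate_to_tf32 is hypothetical and is not part of the TRTorch API, and plain truncation is a simplifying assumption (hardware rounding may differ).

```python
import struct


def truncate_to_tf32(x: float) -> float:
    """Approximate TF32 precision by dropping the low 13 mantissa bits of an FP32 value.

    Assumption: truncation stands in for whatever rounding the hardware applies.
    """
    # Reinterpret the value as a 32-bit IEEE 754 float bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Keep sign (1 bit), exponent (8 bits) and the top 10 mantissa bits.
    bits &= 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# A difference of 2**-11 is representable in FP32 but falls below TF32 resolution near 1.0.
fine_step = 1.0 + 2.0 ** -11
coarse_step = 1.0 + 2.0 ** -10
print(truncate_to_tf32(fine_step), truncate_to_tf32(coarse_step))
```

Values that differ only below the tenth mantissa bit collapse together, which is the kind of loss that disabling TF32 avoids.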

Dependencies:

  • Bazel 4.0.0
  • Libtorch 1.7.1 (on x86_64), 1.7.0 (on aarch64)
  • CUDA 11.0 (by default, newer CUDA 11 supported with compatible PyTorch build)
  • cuDNN 8.0.5
  • TensorRT 7.2.2

v0.2.0 (2021-02-25)

  • refactor!: Update bazel and trt versions (0618b6b)

Bug Fixes

  • //core/conversion/conversionctx: Fix memory leak in conversion (6f83b41)
  • //core/lowering: fix debug message for bn dim check removal pass (86bb5b7)
  • //py: Fix bounds for enum macros (6b942e5)
  • aten::expand: Fix compiler warning for unused out ITensor (5b0f584)
  • aten::expand: Fix compiler warnings in the expand converter (51b09d4)
  • aten::flatten: Fixing flatten converter to handle dynamic batch (00f2d78)
  • aten::max_pool2d: Suppressing error due to not filling in stride in (ed3c185)
  • aten::zeros: verify zeros produces a tensor correctly (00d2d0c)
  • remove_to: bug in remove_to.cpp, replace outputs()[0] with inputs()[0] (6c5118a)
  • setup.py: Broaden the supported pytorch versions to handle jetson (e94a040)
  • test_op_aliasing: Fix the renamed op (91c3c80)
  • tests: Fix broken elementwise tests (22ed944)

Features

  • support true_divide, floor_divide, max, min, rsub (a35fbf1)
  • //.github: Moving to python directly (ece114c)
  • //core/conversion: Adding a check to detect programs that will (a3d4144)
  • //core/lowering: Adding a new pass to handle new dim checks for (3d14cda)
  • //cpp/api/lib: New runtime only library (6644a9e)
  • //notebooks: Update notebooks container for 0.1.0 (a5851ff)
  • //py: [to_backend] adding device specification support for (6eeba1c), closes #286
  • aten::leaky_relu_: Adding alias for inplace leaky relu (bc53411)
  • aten::softmax: Adding support for any neg index (abc29a2)
  • aten::squeeze|aten::unsqueeze: adding BUILD files for new squeeze (9e0a1d7)
  • aten::sum: Allow for negative indices less than -1 (769bbc9)
  • aten::topk: Add a debug message noting that sorted is always true (81f1e9d)
  • aten::topk: Adding BUILD files for topk op (22e6a6b)
  • disable_tf32: Add a new API to disable TF32 (536983b)
  • interpolate: Adding support for .vec variants and overhauling test (0cda1cc)
  • interpolate: Addressing the linear, scale factor, align corners edge case (92e3818)
  • supportedops: Application to dump a list of supported operators (872d9a3)
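The negative-index support added above for aten::softmax and aten::sum comes down to mapping a negative dimension onto its non-negative equivalent. A minimal sketch of that mapping, assuming Python-style indexing semantics; normalize_dim is a hypothetical helper, not the converter's actual code:

```python
def normalize_dim(dim: int, rank: int) -> int:
    """Map a possibly negative dimension index onto the range [0, rank)."""
    if not -rank <= dim < rank:
        raise IndexError(f"dim {dim} is out of range for a tensor of rank {rank}")
    # For negative values, Python's modulo wraps around to the equivalent axis.
    return dim % rank


# For a rank-4 tensor (for example NCHW), -1 is the last axis and -3 is the channel axis.
print(normalize_dim(-1, 4), normalize_dim(-3, 4))
```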

BREAKING CHANGES

  • Version of Bazel has been bumped to 4.0.0. Version of TensorRT has been bumped to 7.2.2.3.

Signed-off-by: Naren Dasan

  • The device API has now changed. Device settings are configured via a device struct which encapsulates information on selected device ids and types.
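As a rough picture of what a device struct bundling device ids and types might look like, the sketch below builds one as a plain dictionary and validates it. Every field name here (device_type, gpu_id, dla_core, allow_gpu_fallback) and the helper make_device_spec are illustrative assumptions; this changelog does not confirm the actual names, so check the v0.2.0 API documentation before relying on them.

```python
from typing import Optional


def make_device_spec(device_type: str = "gpu",
                     gpu_id: int = 0,
                     dla_core: Optional[int] = None,
                     allow_gpu_fallback: bool = False) -> dict:
    """Build a hypothetical device specification (field names are assumptions)."""
    if device_type not in ("gpu", "dla"):
        raise ValueError(f"unknown device type: {device_type!r}")
    if device_type == "dla" and dla_core is None:
        raise ValueError("a DLA target needs an explicit dla_core")
    if gpu_id < 0:
        raise ValueError("gpu_id must be non-negative")
    return {
        "device_type": device_type,
        "gpu_id": gpu_id,
        "dla_core": dla_core,
        "allow_gpu_fallback": allow_gpu_fallback,
    }


# A DLA target on a Jetson-style device, falling back to the GPU where needed.
print(make_device_spec("dla", gpu_id=0, dla_core=0, allow_gpu_fallback=True))
```

Grouping the settings in one structure means a single object carries both which device to use and what kind of device it is, rather than separate loose arguments.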

Supported Operators in TRTorch v0.2.0

Operators Currently Supported Through Converters

  • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
  • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
  • aten::abs(Tensor self) -> (Tensor)
  • aten::acos(Tensor self) -> (Tensor)
  • aten::acosh(Tensor self) -> (Tensor)
  • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
  • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::asin(Tensor self) -> (Tensor)
  • aten::asinh(Tensor self) -> (Tensor)
  • aten::atan(Tensor self) -> (Tensor)
  • aten::atanh(Tensor self) -> (Tensor)
  • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
  • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::ceil(Tensor self) -> (Tensor)
  • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
  • aten::cos(Tensor self) -> (Tensor)
  • aten::cosh(Tensor self) -> (Tensor)
  • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
  • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
  • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
  • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::erf(Tensor self) -> (Tensor)
  • aten::exp(Tensor self) -> (Tensor)
  • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
  • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
  • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
  • aten::floor(Tensor self) -> (Tensor)
  • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
  • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
  • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
  • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
  • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
  • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
  • aten::log(Tensor self) -> (Tensor)
  • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
  • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::matmul(Tensor self, Tensor other) -> (Tensor)
  • aten::max(Tensor self) -> (Tensor)
  • aten::max.other(Tensor self, Tensor other) -> (Tensor)
  • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::min(Tensor self) -> (Tensor)
  • aten::min.other(Tensor self, Tensor other) -> (Tensor)
  • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
  • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
  • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::neg(Tensor self) -> (Tensor)
  • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
  • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
  • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
  • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
  • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::reciprocal(Tensor self) -> (Tensor)
  • aten::relu(Tensor input) -> (Tensor)
  • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
  • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
  • aten::reshape(Tensor self, int[] shape) -> (Tensor)
  • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
  • aten::sigmoid(Tensor input) -> (Tensor)
  • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
  • aten::sin(Tensor self) -> (Tensor)
  • aten::sinh(Tensor self) -> (Tensor)
  • aten::slice.Tensor(Tensor(a) self, int dim=0, int start=0, int end=9223372036854775807, int step=1) -> (Tensor(a))
  • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
  • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
  • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
  • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
  • aten::sqrt(Tensor self) -> (Tensor)
  • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
  • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::tan(Tensor self) -> (Tensor)
  • aten::tanh(Tensor input) -> (Tensor)
  • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
  • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
  • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
  • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
  • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
  • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
  • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
  • trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

  • aten::Bool.float(float b) -> (bool)
  • aten::Bool.int(int a) -> (bool)
  • aten::Float.Scalar(Scalar a) -> float
  • aten::Float.bool(bool a) -> float
  • aten::Float.int(int a) -> float
  • aten::and(int a, int b) -> (bool)
  • aten::getitem.t(t list, int idx) -> (t(*))
  • aten::is(t1 self, t2 obj) -> bool
  • aten::isnot(t1 self, t2 obj) -> bool
  • aten::not(bool self) -> bool
  • aten::or(int a, int b) -> (bool)
  • aten::__round_to_zero_floordiv(int a, int b) -> (int)
  • aten::xor(int a, int b) -> (bool)
  • aten::add.float(float a, float b) -> (float)
  • aten::add.int(int a, int b) -> (int)
  • aten::add_.t(t self, t[] b) -> (t[])
  • aten::append.t(t self, t(c -> *) el) -> (t)
  • aten::dim(Tensor self) -> int
  • aten::div.float(float a, float b) -> (float)
  • aten::div.int(int a, int b) -> (float)
  • aten::eq.bool(bool a, bool b) -> (bool)
  • aten::eq.float(float a, float b) -> (bool)
  • aten::eq.float_int(float a, int b) -> (bool)
  • aten::eq.int(int a, int b) -> (bool)
  • aten::eq.int_float(int a, float b) -> (bool)
  • aten::floor.float(float a) -> (int)
  • aten::floordiv.float(float a, float b) -> (int)
  • aten::floordiv.int(int a, int b) -> (int)
  • aten::ge.bool(bool a, bool b) -> (bool)
  • aten::ge.float(float a, float b) -> (bool)
  • aten::ge.float_int(float a, int b) -> (bool)
  • aten::ge.int(int a, int b) -> (bool)
  • aten::ge.int_float(int a, float b) -> (bool)
  • aten::gt.bool(bool a, bool b) -> (bool)
  • aten::gt.float(float a, float b) -> (bool)
  • aten::gt.float_int(float a, int b) -> (bool)
  • aten::gt.int(int a, int b) -> (bool)
  • aten::gt.int_float(int a, float b) -> (bool)
  • aten::le.bool(bool a, bool b) -> (bool)
  • aten::le.float(float a, float b) -> (bool)
  • aten::le.float_int(float a, int b) -> (bool)
  • aten::le.int(int a, int b) -> (bool)
  • aten::le.int_float(int a, float b) -> (bool)
  • aten::len.t(t[] a) -> (int)
  • aten::lt.bool(bool a, bool b) -> (bool)
  • aten::lt.float(float a, float b) -> (bool)
  • aten::lt.float_int(float a, int b) -> (bool)
  • aten::lt.int(int a, int b) -> (bool)
  • aten::lt.int_float(int a, float b) -> (bool)
  • aten::mul.float(float a, float b) -> (float)
  • aten::mul.int(int a, int b) -> (int)
  • aten::ne.bool(bool a, bool b) -> (bool)
  • aten::ne.float(float a, float b) -> (bool)
  • aten::ne.float_int(float a, int b) -> (bool)
  • aten::ne.int(int a, int b) -> (bool)
  • aten::ne.int_float(int a, float b) -> (bool)
  • aten::neg.int(int a) -> (int)
  • aten::numel(Tensor self) -> int
  • aten::size(Tensor self) -> (int[])
  • aten::size.int(Tensor self, int dim) -> (int)
  • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
  • aten::sub.float(float a, float b) -> (float)
  • aten::sub.int(int a, int b) -> (int)
  • prim::max.bool(bool a, bool b) -> (bool)
  • prim::max.float(float a, float b) -> (bool)
  • prim::max.float_int(float a, int b) -> (bool)
  • prim::max.int(int a, int b) -> (bool)
  • prim::max.int_float(int a, float b) -> (bool)
  • prim::max.self_int(int[] self) -> (int)
  • prim::min.bool(bool a, bool b) -> (bool)
  • prim::min.float(float a, float b) -> (bool)
  • prim::min.float_int(float a, int b) -> (bool)
  • prim::min.int(int a, int b) -> (bool)
  • prim::min.int_float(int a, float b) -> (bool)
  • prim::min.self_int(int[] self) -> (int)
  • prim::shape(Tensor a) -> (int[])

v0.1.0

3 years ago

TRTorch v0.1.0

Direct PyTorch integration via backend API, support for Ampere, support for simple branch and loop cases

This is the first "beta" release of TRTorch, introducing direct integration into PyTorch via the new Backend API. This release also contains an NGC based Dockerfile for users looking to use TRTorch on Ampere, using NGC's patched version of PyTorch. Note that compiled programs from older versions of TRTorch are not compatible with the TRTorch 0.1.0 runtime due to an ABI change. There are now example Jupyter notebooks which demonstrate various features of the compiler included in the documentation.

New Ops:

  • prelu
  • lstm_cell
  • power
  • conv3d
  • narrow

Dependencies:

  • Bazel 3.4.1
  • Libtorch 1.6.0
  • CUDA 10.2 (by default, CUDA 11 supported with compatible PyTorch build)
  • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatible PyTorch build)
  • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatible PyTorch build)

Changelog

v0.1.0 (2020-10-23)

Bug Fixes

  • added some fixes, trt/jit output still mismatches (723ac1d)

  • added test cases to explicitly check hidden/cell state outputs (d7c3164)

  • cleaned up logic, added case where bias doesn't exist for LSTM cell converter (a3e1093)

  • //core/conversion/evaluator: Custom to IValue that handles int[] (68c934a)

  • //docker: Workaround only shared libraries being available in (50c7eda)

  • //py: Fix long description section of setup.py (efd2099)

  • //tests: Add stride to complete tensors (af5d28e)

  • //tests/accuracy: Fix int8 accuracy test for new PTQ api (a53bea7)

  • //tests/core/converters/activations: Complete tensors in prelu test (0e90f78)

  • docsrc: Update docsrc container for bazel 3.4.1 (4eb53b5)

  • fix(Windows)!: Fix dependency resolution for local builds (858d8c3)

  • chore!: Update dependencies to PyTorch 1.6.0 (8eda27d)

  • chore!: Bumping version numbers to 0.1.0 (b84c90b)

  • refactor(//core)!: Introducing a binding convention that will address (5a105c6)

  • refactor!: Renaming extra info to compile spec to be more consistent (b8fa228)

Features

  • //core/conversion/converters: LSTMCell converter (8c61248)
  • //core/conversion/var: created ITensorOrFreeze() method, to replace functionality of Var::ITensor() (2ccf8d0)
  • //core/converters: Add power layer conversion support and minor README edits (a801506)
  • //core/lowering: Add functionalization pass to replace inplace (90a9ed6), closes #30
  • //docker: Adding CUDA11 based container for Ampere support (970d775)
  • started working on lstm_cell converter (546d790)
  • //py: Initial compliant implementation of the to_backend api for (59113cf)
  • //third_party/tensorrt: Add back TensorRT static lib in a cross (d3c2e7e)
  • aten::prelu: Basic prelu support (8bc4369)
  • aten::prelu: Implement the multi-channel version of prelu and (c066581)
  • finished logic for LSTM cell, now to test (a88cfaf)

BREAKING CHANGES

  • Users on Windows trying to use cuDNN 8 must manually configure third_party/cudnn/local/BUILD to use cuDNN 8.

  • Support for Python 3.5 is being dropped with this update

  • Version is being bumped to version 0.1.0a0 to target PyTorch 1.6.0

  • This changes the "ABI" of compiled TRTorch programs and the runtime, and breaks backwards compatibility between the runtime in 0.1.0+ and programs compiled pre-0.1.0

  • This changes the top level api for setting the specification for compilation, a simple find and replace should allow users to port forward

v0.0.3

3 years ago

TRTorch v0.0.3

aarch64 toolchain, Revised PTQ API, PyTorch 1.5.1, support for cuDNN 8.0, TensorRT 7.1 (with compatible PyTorch build)

This is the third alpha release of TRTorch. It bumps the target PyTorch version to 1.5.1 and introduces support for cuDNN 8.0 and TensorRT 7.1; however, this is only supported when PyTorch has been compiled with the same cuDNN version. This release also introduces formal support for aarch64; however, pre-compiled binaries will not be available until we can deliver Python packages for aarch64 for all supported versions of Python. Note some idiosyncrasies when working with PyTorch on aarch64: if you are using PyTorch compiled by NVIDIA for aarch64, the ABI version is CXX11 instead of the pre-CXX11 ABI found in PyTorch on x86_64. When compiling the Python API for TRTorch, add the --use-cxx11-abi flag to the command, and do not use the --config=pre-cxx11-abi flag when building the C++ library (more instructions on native aarch64 compilation are in the documentation). This release also introduces a breaking change to the C++ API: to use the logging or PTQ APIs, a separate header file must now be included. Look at the implementation of trtorchc or ptq for example usage.
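The ABI guidance above can be summarized as a small decision: which flag goes on which build command depends on whether the installed PyTorch uses the CXX11 ABI. The sketch below only assembles argument lists and runs nothing. The two flags come from the text; the base commands (python3 setup.py install and bazel build //:libtrtorch) and the helper build_commands are assumptions for illustration, so consult the build documentation for the real invocations.

```python
from typing import List, Tuple


def build_commands(torch_uses_cxx11_abi: bool) -> Tuple[List[str], List[str]]:
    """Return (python_api_command, cpp_library_command) for the given PyTorch ABI.

    Assumption: the base commands are placeholders; only the two ABI flags
    are taken from the release notes.
    """
    python_cmd = ["python3", "setup.py", "install"]
    bazel_cmd = ["bazel", "build", "//:libtrtorch"]
    if torch_uses_cxx11_abi:
        # e.g. NVIDIA's PyTorch build for aarch64
        python_cmd.append("--use-cxx11-abi")
    else:
        # e.g. PyTorch with the pre-CXX11 ABI on x86_64
        bazel_cmd.append("--config=pre-cxx11-abi")
    return python_cmd, bazel_cmd


py_cmd, cpp_cmd = build_commands(torch_uses_cxx11_abi=True)
print(" ".join(py_cmd))
print(" ".join(cpp_cmd))
```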

Dependencies:

  • Bazel 3.3.1
  • Libtorch 1.5.1
  • CUDA 10.2
  • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatible PyTorch build)
  • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatible PyTorch build)

Changelog

  • feat!: Lock bazel version (25f4371)
  • refactor(//cpp/api)!: Refactoring ptq to use includes but separate from (d2f8a59)

Bug Fixes

  • //core: Do not compile hidden methods (6bd1a3f)
  • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
  • //core/conversion: Suppress unnecessary debug messages (2b23874)
  • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
  • //core/conversion/conversionctx: In the case of strict types and (3611778)
  • //core/conversion/converters: Fix plugin implementation for TRT 7 (94d6a0f)
  • //core/conversion/converters/impl: 1d case not working (f42562b)
  • //core/conversion/converters/impl: code works for interpolate2d/3d, doesn't work for 1d yet (e4cb117)
  • //core/conversion/converters/impl: Fix interpolate.cpp (b6942a2)
  • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
  • //core/conversion/evaluators: A couple fixes for evaluators (07ba980)
  • //core/lowering: Conv2D -> _convolution pass was triggering conv (ca2b5f9)
  • //cpp: Remove deprecated script namespace (d70760f)
  • //cpp/api: Better initial condition for the dataloader iterator to (8d22bdd)
  • //cpp/api: Remove unnecessary destructor in ptq class (fc70267)
  • //cpp/api: set a default for calibrator (825be69)
  • //cpp/benchmark: reorder benchmark so FP16 bn issue in JIT doesn't (98527d2)
  • //cpp/ptq: Default version of the app should not resize images (de3cbc4)
  • //cpp/ptq: Enable FP16 kernels for INT8 applications (26709cc)
  • //cpp/ptq: Enable FP16 kernels for INT8 applications (e1c5416)
  • //cpp/ptq: remove some logging from ptq app (b989c7f)
  • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
  • //cpp/trtorchc: Refactor trtorchc to use new C++ API (789e1be), closes #132
  • //cpp/trtorchc: Support building trtorchc with the pre_cxx11_abi (172d4d5)
  • //docs: add nojekyll file (2a02cd5)
  • //docs: fix version links (11555f7)
  • //notebooks: Fix WORKSPACE template file to reflect new build system layout (c8ea9b7)
  • //py: Build system issues (c1de126)
  • //py: Ignore generated version file (9e37dc1)
  • //py: Lib path incorrect (ff2b13c)
  • //tests: Duplicated tensorrt dep (5cd697e)
  • //third_party/tensorrt: Fix include dir for library headers (22ed5cf)
  • //third_party/tensorrt: Fix TensorRT paths for local x86 builds (73d804b)
  • aarch64: fixes and issues for aarch64 toolchain (9a6cccd)
  • aten::_convolution: out channels was passed in incorrectly for (ee727f8)
  • aten::_convolution: Pass dummy bias when there is no bias (b20671c)
  • aten::batch_norm: A new batch norm implementation that hopefully (6461872)
  • aten::batchnorm|aten::view: Fix converter implementation for (bf651dd)
  • aten::contiguous: Blacklist aten::contiguous from conversion (b718121)
  • aten::flatten: Fixes dynamic shape for flatten (4eb20bb)
  • fixed FP16 bug, fixed README, addressed some other PR comments (d9c0e84)
  • aten::neg: Fix an index bug in neg (1b2cde4)
  • aten::size, other aten evaluators: Removes aten::size converter in (c83447e)
  • BUILD: modified BUILD (a0d8586)
  • trying to resolve interpolate plugin problems (f0fefaa)
  • core/conversion/converters/impl: fix error message in interpolate (5ddab8b)
  • Address issues in PR (cd24f26)
  • bypass jekyll, also add PR template (a41c400)
  • first commit (4f1a9df)
  • Fix pre CXX11 ABI python builds and regen docs (42013ab)
  • fixed interpolate_plugin to handle dynamically sized inputs for adaptive_pool2d (7794c78)
  • need to fix gather converter (024a6b2)
  • plugin: trying to fix bug in plugin (cafcced)
  • pooling: fix the tests and the 1D pooling cases (a90e6db)
  • RunGraphEngineDynamic fixed to work with dynamically sized input tensors (6308190)

Features

  • //:libtrtorch: Ship trtorchc with the tarball (d647447)
  • //core/compiler: Multiple outputs supported now via tuple (f9af574)
  • //core/conversion: Adds the ability to evaluate loops (dcb1474)
  • //core/conversion: Compiler can now create graphs (9d1946e)
  • //core/conversion: Evaluation of static conditionals works now (6421f3d)
  • //core/conversion/conversionctx: Make op precision available at (78a1c61)
  • //core/conversion/converters: Throw a warning if a converter is (6cce381)
  • //core/conversion/converters/impl: added support for aten::stack (415378e)
  • //core/conversion/converters/impl: added support for linear1d and bilinear2d ops (4416d1f)
  • //core/conversion/converters/impl: added support for trilinear3d op (bb46e70)
  • //core/conversion/converters/impl: all function schemas for upsample_nearest (1b50484)
  • //core/conversion/converters/impl: logic implemented (7f12160)
  • //core/conversion/converters/impl: Round out pooling (7dc4af4)
  • //core/conversion/converters/impl: select converter, which adds support for aten::select.int (5151c34)
  • //core/conversion/converters/impl/plugins: Created interpolate plugin, works for mode='linear' (205ab99)
  • //core/conversion/converters/impl/plugins: interpolate plugin compiles now. time to test it. (58dbaef)
  • //core/conversion/converters/impl/plugins: template for interpolate plugin (7c91dec)
  • //core/conversion/converters/impl/shuffle: Implement aten::resize (353f2d2)
  • //core/conversion/evaluators: A whole bunch of new evaluators (7466b8a)
  • //core/conversion/evaluators: adding support for common evaluation (d351717)
  • //core/conversion/evaluators: Adds new applicability filters for (2cc3226)
  • //core/conversion/evaluators: Allow ITensors to be wrapped in (619e345)
  • //core/execution: Type checking for the executor, now is the (2dd1ba3)
  • //core/lowering: Add tuple lowering pass to remove tuples if (ce6cf75)
  • //core/lowering: Adds peephole optimization pass (0014b84)
  • //core/lowering: Fuse aten::addmm branches into a single (68f0317)
  • //core/lowering: New freeze model pass and new exception (4acc3fd)
  • //core/lowering: Remove aten::contiguous (630b615)
  • //core/quantization: skeleton of INT8 PTQ calibrator (dd443a6)
  • //core/util: New logging level for Graph Dumping (90c44b9)
  • //cpp/api: Adding max batch size setting (1b25542)
  • //cpp/api: Functional Dataloader based PTQ (f022dfe)
  • //cpp/api: Remove the extra includes in the API header (2f86f84)
  • //cpp/benchmark: Increased workspace size for benchmark, may help (8171f79)
  • //cpp/ptq: Add a feature to the dataset to use less than the full (5f36f47)
  • //cpp/ptq: do real benchmarking in the PTQ app instead of rough (65e71c7)
  • //cpp/ptq/training: Training recipe for VGG16 Classifier on (676bf56)
  • //cpp/trtorchc: Adding a new CLI application for TRTorch which (4f349a1)
  • //cpp/trtorchexec: TRTorch exec now supports checking correctness (80808b7)
  • //lowering: centralize lowering and try to use PyTorch Conv2DBN folding (fad4a10)
  • //py: add the option to build python package with CXX11 abi (fdbd7d2)
  • //py: API now produces valid engines that are consumable by (72bc1f7)
  • //py: Initial introduction of the Python API (7088245)
  • //py: Manylinux container and build system for multiple python (639c2a3)
  • //py: register trtorch with torch op library to support (736e914)
  • //py: setup.py now searches for bazel executable (737fe5c)
  • //py: Working portable package (482ef2c)
  • added adaptive_avg_pool2d plugin, and added test for it (fa227b0)
  • //tests: New optional accuracy tests to check INT8 and FP16 (df74136)
  • //toolchains: Adding platform targets for supported platforms (7889ebd)
  • /cpp/api: Working INT8 Calibrator, also resolves #41 (5c0d737)
  • aten::add_t: aten::add_.t evaluator that adds lists together (c4c3ce1)
  • aten::avg_pool2d: Implement Average Pooling 2D (0c39519)
  • aten::cat: Implements aten::cat and completes support for SSD (c2d3a6e)
  • aten::conv_transpose: Add support for dilated and group (48b950a)
  • aten::dropout_: Remove inplace dropout (7aa57c3)
  • aten::flatten: Adds a converter for aten flatten since MM is the (d945eb9)
  • addressed some PR comments, refactored code (141763f)
  • aten::matmul|aten::addmm: Adds support for aten::matmul and (c5b6202)
  • aten::permute: Implement permute support (c7d6b49)
  • aten::size [static]: Implement an aten::size converter for static input size (0548540)
  • started to work on add_.t evaluator, doesn't work yet (f216d3f)
  • aten::to: Remove remaining typecast operators (should be a very (0f63ffa)
  • aten::view: Adds support for ATen view also fixes some tests (24b422e)
  • aten::zeros: Implement aten::zeros evaluator (670817c)
  • conv2d_to_convolution: A pass to map aten::conv2d to _convolution (2c5c0d5)
  • prim::NumToTensor: Implement evaluator for NumToTensor (60df888)
  • tests/util: added RunGraphEngineDynamic to handle dynamic input sized tensors (9458f21)
  • trt_util: from Naren, added unpadDims tool (164a1a6)
  • support for adaptive_avg_pool2d plugin (52be580)
  • Support non cxx11-abi builds for use in python api (83e0ed6)

BREAKING CHANGES

  • Bazel version is now locked to Bazel 3.3.1 and will be bumped manually from now on. Builds will fail on all other versions since now bazel will check the version before it compiles.

Documentation on how to install Bazel has also been added to support aarch64 until Bazel releases binaries for the platform (expected soon).

  • To use ptq you now need to include trtorch/ptq.h in addition to trtorch/trtorch.h, similarly for logging commands you need to include trtorch/logging.h
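The Bazel version lock described in the breaking changes above amounts to an exact-match gate that runs before any compilation starts. A rough sketch of such a gate, assuming simple dotted version strings; parse_version and check_bazel_version are hypothetical helpers and do not reflect how the TRTorch build actually implements the check:

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '3.3.1' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def check_bazel_version(found: str, required: str = "3.3.1") -> None:
    """Raise if the installed version is not exactly the locked one."""
    if parse_version(found) != parse_version(required):
        raise RuntimeError(
            f"Bazel {found} found, but this build is locked to Bazel {required}"
        )


check_bazel_version("3.3.1")  # passes silently
try:
    check_bazel_version("3.4.1")
except RuntimeError as err:
    print(err)
```

An exact match, rather than a minimum version, trades convenience for reproducibility: a newer Bazel is rejected just like an older one until the lock is bumped manually.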

v0.0.2

4 years ago

TRTorch v0.0.2

Python API & PyTorch 1.5.0 Support

  • This is the second alpha release of TRTorch. It bumps support for PyTorch to 1.5.0 and introduces a Python distribution for TRTorch.
  • Also now includes full documentation https://nvidia.github.io/TRTorch
  • Adds support for Post Training Quantization in C++

Dependencies

  • Libtorch 1.5.0
  • CUDA 10.2
  • cuDNN 7.6.5
  • TensorRT 7.0.0

Changelog

Bug Fixes

  • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
  • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
  • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
  • //cpp: Remove deprecated script namespace (d70760f)
  • //cpp/api: Better initial condition for the dataloader iterator to (8d22bdd)
  • //cpp/api: Remove unnecessary destructor in ptq class (fc70267)
  • //cpp/api: set a default for calibrator (825be69)
  • //cpp/ptq: remove some logging from ptq app (b989c7f)
  • Address issues in PR (cd24f26)
  • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
  • //docs: add nojekyll file (2a02cd5)
  • //docs: fix version links (11555f7)
  • //py: Build system issues (c1de126)
  • //py: Ignore generated version file (9e37dc1)
  • bypass jekyll, also add PR template (a41c400)

Features

  • //core/conversion/conversionctx: Make op precision available at (78a1c61)
  • //core/conversion/converters/impl/shuffle: Implement aten::resize (353f2d2)
  • //core/execution: Type checking for the executor, now is the (2dd1ba3)
  • //core/lowering: New freeze model pass and new exception (4acc3fd)
  • //core/quantization: skeleton of INT8 PTQ calibrator (dd443a6)
  • //core/util: New logging level for Graph Dumping (90c44b9)
  • //cpp/api: Adding max batch size setting (1b25542)
  • //cpp/api: Functional Dataloader based PTQ (f022dfe)
  • //cpp/api: Remove the extra includes in the API header (2f86f84)
  • //cpp/ptq: Add a feature to the dataset to use less than the full (5f36f47)
  • //cpp/ptq/training: Training recipe for VGG16 Classifier on (676bf56)
  • //lowering: centralize lowering and try to use PyTorch Conv2DBN folding (fad4a10)
  • //py: API now produces valid engines that are consumable by (72bc1f7)
  • //py: Initial introduction of the Python API (7088245)
  • //py: Manylinux container and build system for multiple python (639c2a3)
  • //py: Working portable package (482ef2c)
  • //tests: New optional accuracy tests to check INT8 and FP16 (df74136)
  • //cpp/api: Working INT8 Calibrator, also resolves #41 (5c0d737)
  • aten::flatten: Adds a converter for aten flatten since MM is the (d945eb9)
  • aten::matmul|aten::addmm: Adds support for aten::matmul and (c5b6202)
  • Support non cxx11-abi builds for use in python api (83e0ed6)
  • aten::size [static]: Implement an aten::size converter for static input size (0548540)
  • conv2d_to_convolution: A pass to map aten::conv2d to _convolution (2c5c0d5)

v0.0.1

4 years ago

TRTorch v0.0.1

Initial Release

  • This is the initial alpha release of TRTorch. It supports basic compilation of TorchScript modules for networks similar to ResNet50 and MobileNet, as well as simple feed-forward networks.
  • C++ Based API
    • Can save converted models to PLAN file for use in TensorRT Apps
    • Compile module and continue running with JIT interpreter accelerated by TensorRT
  • Supports FP32 and FP16 execution
  • Sample application to show how to use the compiler

Dependencies

  • Libtorch 1.4.0
  • CUDA 10.1
  • cuDNN 7.6
  • TensorRT 6.0.1
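Because each release above pins a specific Libtorch, CUDA, cuDNN and TensorRT combination, it can help to keep the default dependency sets in one lookup table. The sketch below transcribes the defaults listed in these notes (x86_64 values where the notes distinguish platforms); the table name RELEASES and the helper release_for_libtorch are hypothetical conveniences, not part of TRTorch.

```python
from typing import Optional

# Default dependency versions per TRTorch release, as listed in the release notes.
# Note: v0.2.0 on aarch64 uses Libtorch 1.7.0 instead of 1.7.1.
RELEASES = {
    "0.2.0": {"libtorch": "1.7.1", "cuda": "11.0", "cudnn": "8.0.5", "tensorrt": "7.2.2"},
    "0.1.0": {"libtorch": "1.6.0", "cuda": "10.2", "cudnn": "7.6.5", "tensorrt": "7.0.0"},
    "0.0.3": {"libtorch": "1.5.1", "cuda": "10.2", "cudnn": "7.6.5", "tensorrt": "7.0.0"},
    "0.0.2": {"libtorch": "1.5.0", "cuda": "10.2", "cudnn": "7.6.5", "tensorrt": "7.0.0"},
    "0.0.1": {"libtorch": "1.4.0", "cuda": "10.1", "cudnn": "7.6", "tensorrt": "6.0.1"},
}


def release_for_libtorch(version: str) -> Optional[str]:
    """Return the TRTorch release whose default Libtorch matches, or None."""
    for release, deps in RELEASES.items():
        if deps["libtorch"] == version:
            return release
    return None


print(release_for_libtorch("1.6.0"))
```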