TRTorch Versions Save

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

v2.2.0

3 months ago

Dynamo Frontend for Torch-TensorRT, PyTorch 2.2, CUDA 12.1, TensorRT 8.6

Torch-TensorRT 2.2.0 targets PyTorch 2.2, CUDA 12.1 (builds for CUDA 11.8 are available via the PyTorch package index - https://download.pytorch.org/whl/cu118) and TensorRT 8.6. This release is the second major release of Torch-TensorRT as the default frontend has changed from TorchScript to Dynamo allowing for users to more easily control and customize the compiler in Python.

The dynamo frontend can support both JIT workflows through torch.compile and AOT workflows through torch.export + torch_tensorrt.compile. It targets the Core ATen Opset (https://pytorch.org/docs/stable/torch.compiler_ir.html#core-aten-ir) and currently has 82% coverage. Just like in Torchscript graphs will be partitioned based on the ability to map operators to TensorRT in addition to any graph surgery done in Dynamo.

Output Format

Through the Dynamo frontend, different output formats can be selected for AOT workflows via the output_format kwarg. The choices are torchscript where the resulting compiled module will be traced with torch.jit.trace, suitable for Pythonless deployments, exported_program a new serializable format for PyTorch models or finally if you would like to run further graph transformations on the resultant model, graph_module will return a torch.fx.GraphModule.

Multi-GPU Safety

To address a long standing source of overhead, single GPU systems will now operate without typical required device checks. This check can be re-added when multiple GPUs are available to the host process using torch_tensorrt.runtime.set_multi_device_safe_mode

# Enables Multi Device Safe Mode
torch_tensorrt.runtime.set_multi_device_safe_mode(True)

# Disables Multi Device Safe Mode [Default Behavior]
torch_tensorrt.runtime.set_multi_device_safe_mode(False)

# Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
    ...

More information can be found here: https://pytorch.org/TensorRT/user_guide/runtime.html

Capability Validators

In the Dynamo frontend, tests can be written and associated with converters to dynamically enable or disable them based on conditions in the target graph.

For example, the convolution converter in dynamo only supports 1D, 2D, and 3D convolution. We can therefore create a lambda which given a convolution FX node can determine if the convolution is supported:

@dynamo_tensorrt_converter(
    torch.ops.aten.convolution.default, 
     capability_validator=lambda conv_node: conv_node.args[7] in ([0], [0, 0], [0, 0, 0])
)  # type: ignore[misc]
def aten_ops_convolution(
    ctx: ConversionContext,
    target: Target,
    args: Tuple[Argument, ...],
    kwargs: Dict[str, Argument],
    name: str,
) -> Union[TRTTensor, Sequence[TRTTensor]]:

In such a case where the Node is not supported, the node will be partitioned out and run in PyTorch. All capability validators are run prior to partitioning, after the lowering phase.

More information on writing converters for the Dynamo frontend can be found here: https://pytorch.org/TensorRT/contributors/dynamo_converters.html

Breaking Changes

Dynamo (torch.export) is now the default frontend for Torch-TensorRT. The TorchScript and FX frontends are now in maintenance mode. Therefore any torch.nn.Modules or torch.fx.GraphModules provided to torch_tensorrt.compile will by default be exported using torch.export then compiled. This default can be overridden by setting the ir=[torchscript|fx] kwarg. Any bugs reported will first be attempted to be resolved in the dynamo stack before attempting other frontends however pull requests for additional functionally in the TorchScript and FX frontends from the community will still be accepted.

What's Changed

chore: Update Torch and Torch-TRT versions and docs on main by @gs-olive in https://github.com/pytorch/TensorRT/pull/1784
fix: Repair invalid schema arising from lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1786
fix: Allow full model compilation with collection inputs (input_signature) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1656
feat(//core/conversion): Add support for aten::size with dynamic shaped models for Torchscript backend. by @peri044 in https://github.com/pytorch/TensorRT/pull/1647
feat: add support for aten::baddbmm by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1806
[feat] Add dynamic conversion path to aten::mul evaluator by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1710
[fix] aten::stack with dynamic inputs by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1804
fix undefined attr issue by @bowang007 in https://github.com/pytorch/TensorRT/pull/1783
fix: Out-Of-Bounds bug in Unsqueeze by @gs-olive in https://github.com/pytorch/TensorRT/pull/1820
feat: Upgrade Docker build to use custom TRT + CUDNN by @gs-olive in https://github.com/pytorch/TensorRT/pull/1805
fix: include str ivalue type conversion by @bowang007 in https://github.com/pytorch/TensorRT/pull/1785
fix: dependency order of inserted long input casts by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1833
feat: Add ts converter support for aten::all.dim by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1840
fix: Error caused by invalid binding name in TRTEngine.to_str() method by @gs-olive in https://github.com/pytorch/TensorRT/pull/1846
fix: Implement aten.mean.default and aten.mean.dim converters by @gs-olive in https://github.com/pytorch/TensorRT/pull/1810
feat: Add converter for aten::log2 by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1866
feat: Add support for aten::where with scalar other by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1855
feat: Add converter support for logical_and by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1856
feat: Refactor FX APIs under dynamo namespace for parity with TS APIs by @peri044 in https://github.com/pytorch/TensorRT/pull/1807
fix: Add version checking for torch._dynamo import in __init__ by @gs-olive in https://github.com/pytorch/TensorRT/pull/1881
fix: Improve Docker build robustness, add validation by @gs-olive in https://github.com/pytorch/TensorRT/pull/1873
fix: Improve input weight handling to acc_ops convolution layers in FX by @gs-olive in https://github.com/pytorch/TensorRT/pull/1886
fix: Upgrade main to TRT 8.6, CUDA 11.8, CuDNN 8.8, Torch Dev by @gs-olive in https://github.com/pytorch/TensorRT/pull/1852
feat: Wrap dynamic size handling in a compilation flag by @peri044 in https://github.com/pytorch/TensorRT/pull/1851
fix: Add torchvision legacy CI parameter by @gs-olive in https://github.com/pytorch/TensorRT/pull/1918
Sync fb internal change to OSS by @wushirong in https://github.com/pytorch/TensorRT/pull/1892
fix: Reorganize Dynamo directory + backends by @gs-olive in https://github.com/pytorch/TensorRT/pull/1928
fix: Improve partitioning + lowering systems in torch.compile path by @gs-olive in https://github.com/pytorch/TensorRT/pull/1879
fix: Upgrade TRT to 8.6.1, parallelize FX tests in CI by @gs-olive in https://github.com/pytorch/TensorRT/pull/1930
feat: Add issue template for Story by @gs-olive in https://github.com/pytorch/TensorRT/pull/1936
feat: support type promotion in aten::cat converter by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1911
Reorg for converters in (FX Converter Refactor [1/N]) by @narendasan in https://github.com/pytorch/TensorRT/pull/1867
fix: Add support for default dimension in aten.cat by @gs-olive in https://github.com/pytorch/TensorRT/pull/1863
Relaxing glob pattern for CUDA12 by @borisfom in https://github.com/pytorch/TensorRT/pull/1950
refactor: Centralizing sigmoid implementation (FX Converter Refactor [2/N]) <Target: converter_reorg_proto> by @narendasan in https://github.com/pytorch/TensorRT/pull/1868
fix: Address .numpy() issue on fake tensors by @gs-olive in https://github.com/pytorch/TensorRT/pull/1949
feat: Add support for passing through build issues in Dynamo compile by @gs-olive in https://github.com/pytorch/TensorRT/pull/1952
fix: int/int=float division by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1957
fix: Support dims < -1 in aten::stack converter by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1947
fix: Resolve issue in isInputDynamic with mixed static/dynamic shapes by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1883
DLFW changes by @apbose in https://github.com/pytorch/TensorRT/pull/1878
feat: Add converter for aten::isfinite by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1841
Reorg for converters in hardtanh(FX Converter Refactor [5/N]) <Target: converter_reorg_proto> by @apbose in https://github.com/pytorch/TensorRT/pull/1901
fix/feat: Add lowering pass to resolve most aten::Int.Tensor uses by @gs-olive in https://github.com/pytorch/TensorRT/pull/1937
fix: Add decomposition for aten.addmm by @gs-olive in https://github.com/pytorch/TensorRT/pull/1953
Reorg for converters tanh (FX Converter Refactor [4/N]) <Target: converter_reorg_proto> by @apbose in https://github.com/pytorch/TensorRT/pull/1900
Reorg for converters leaky_relu (FX Converter Refactor [6/N]) <Target: converter_reorg_proto> by @apbose in https://github.com/pytorch/TensorRT/pull/1902
Upstream 3 features to fx_ts_compat: MS, VC, Optimization Level by @wu6u3tw in https://github.com/pytorch/TensorRT/pull/1935
fix: Add lowering pass to remove output repacking in convert_method_to_trt_engine calls by @gs-olive in https://github.com/pytorch/TensorRT/pull/1945
Fixing aten::slice invalid schema and implementing aten::list evaluator by @apbose in https://github.com/pytorch/TensorRT/pull/1695
fix: Rewrite constant_pad_nd to use a single slice layer for performance by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1970
Adding converter aten::chunk in torchscript by @apbose in https://github.com/pytorch/TensorRT/pull/1802
fix: Repair index used to access tensor bindings by @gs-olive in https://github.com/pytorch/TensorRT/pull/1998
Reorg for converters elu and selu (FX Converter Refactor [7/N]) <Target: converter_reorg_proto> by @apbose in https://github.com/pytorch/TensorRT/pull/1903
chore(deps): bump transformers from 4.17.0 to 4.30.0 in /tests/modules by @dependabot in https://github.com/pytorch/TensorRT/pull/2013
fix: Repair input range on BERT inputs for CI by @gs-olive in https://github.com/pytorch/TensorRT/pull/2017
fix: Refactor assertions in E2E tests for Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2001
chore/fix: Update TRTInterpreter impl in Dynamo compile [1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2002
fix: Repair flaky TopK core test by @gs-olive in https://github.com/pytorch/TensorRT/pull/2022
feat: Add options kwargs for Torch compile [3 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2005
feat: Add support for output data types in TRTInterpreter [2 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2004
chore: Upgrade Torch nightly to 2.1.0.dev20230605 [4 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1975
fix: Repair output binding indexing scheme in TRT by @gs-olive in https://github.com/pytorch/TensorRT/pull/2054
fix: Improve logging and kwarg passing in Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2052
fix: Add support for fake tensors by @gs-olive in https://github.com/pytorch/TensorRT/pull/1955
fix: Repair argument passing in both Dynamo paths by @gs-olive in https://github.com/pytorch/TensorRT/pull/1997
minor fix: Dynamo CI fix due to merge issue by @gs-olive in https://github.com/pytorch/TensorRT/pull/2067
feat: Module-Acceleration in Dynamo [5 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1979
fix/feat: Move convolution core to impl + add feature (FX converter refactor) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1972
chore: Upgrade to CUDA 12.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2020
fix: Repair null bindings issue in TRT Engines by @gs-olive in https://github.com/pytorch/TensorRT/pull/2080
fix: Add python3 symlink in final container by @gs-olive in https://github.com/pytorch/TensorRT/pull/2085
feat: Add support for TorchTensorRTModule in Dynamo [1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2003
fix: Repair import error for legacy TS testing by @gs-olive in https://github.com/pytorch/TensorRT/pull/2091
chore: Update Torch to Jul 3 Nightly by @gs-olive in https://github.com/pytorch/TensorRT/pull/2099
fix: Repair graph naming for FX legacy suite by @gs-olive in https://github.com/pytorch/TensorRT/pull/2111
DLFW changes by @apbose in https://github.com/pytorch/TensorRT/pull/2109
fix: Update CI GPU Class by @gs-olive in https://github.com/pytorch/TensorRT/pull/2116
fix: Replace EliminateExceptions lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1859
chore: Improve error propagation for torch compile by @gs-olive in https://github.com/pytorch/TensorRT/pull/2106
fix: Repair version checking system for Torch by @gs-olive in https://github.com/pytorch/TensorRT/pull/2118
feat: Dynamo refactor by @peri044 in https://github.com/pytorch/TensorRT/pull/2104
feat: Set default ir to dynamo export by @peri044 in https://github.com/pytorch/TensorRT/pull/2029
fix: TRTInterpreter output lacks return value by @gs-olive in https://github.com/pytorch/TensorRT/pull/2114
fix/feat: Add Dynamo-only converter registry by @gs-olive in https://github.com/pytorch/TensorRT/pull/1944
fix: Add support for truncate_long_and_double in Dynamo [8 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1983
docs: Update readme to include TRT as a seperate install dep. by @narendasan in https://github.com/pytorch/TensorRT/pull/2137
fix: Move all aten PRs to Dynamo converter registry by @gs-olive in https://github.com/pytorch/TensorRT/pull/2070
Change python build system to be PEP517 compatible by @narendasan in https://github.com/pytorch/TensorRT/pull/2056
chore: fix the docgen job by @narendasan in https://github.com/pytorch/TensorRT/pull/2158
feat: Implement dynamic shape support for floordiv, NumToTensor, layer_norm by @peri044 in https://github.com/pytorch/TensorRT/pull/2006
examples: Add example usage scripts for torch_tensorrt.dynamo.compile path [1.1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1966
[feat] TS: Add support for dynamic select and masked_fill by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/2115
feat: Added support for aten::unflatten converter by @andi4191 in https://github.com/pytorch/TensorRT/pull/2097
feat: Added a variant for aten::fake_quant_per_tensor by @andi4191 in https://github.com/pytorch/TensorRT/pull/2107
ci: Add automatic GHA job to build + push Docker Container on main by @gs-olive in https://github.com/pytorch/TensorRT/pull/2129
chore: Add pyyaml import to GHA Docker job by @gs-olive in https://github.com/pytorch/TensorRT/pull/2170
feat(torch_tensorrt.dynamo.tools): Tool to calculate coverage of PyTorch by @narendasan in https://github.com/pytorch/TensorRT/pull/2166
chore: Add parallelism to Dynamo tests by @gs-olive in https://github.com/pytorch/TensorRT/pull/2165
feat: Add support for dynamic zeros_like and ones_like by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1847
feat: Added support for aten::tile converter by @andi4191 in https://github.com/pytorch/TensorRT/pull/2105
Improve Python tooling by @narendasan in https://github.com/pytorch/TensorRT/pull/2126
Py38 compatibility by @narendasan in https://github.com/pytorch/TensorRT/pull/2189
abandoned create_plugin() function by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2146
feat: Improve Dynamo partitioning System Performance on Large Models by @gs-olive in https://github.com/pytorch/TensorRT/pull/2175
feat: Improve Logging in Dynamo by @peri044 in https://github.com/pytorch/TensorRT/pull/2194
feat: Add ExportedProgram as an IR by @peri044 in https://github.com/pytorch/TensorRT/pull/2191
feat: Improve layer naming by @peri044 in https://github.com/pytorch/TensorRT/pull/2162
fix: Update aten.embedding to reflect schema by @gs-olive in https://github.com/pytorch/TensorRT/pull/2182
feat: Add _to_copy, operator.get and clone ATen converters by @gs-olive in https://github.com/pytorch/TensorRT/pull/2161
fix: Repair broadcasting utility for aten.where by @gs-olive in https://github.com/pytorch/TensorRT/pull/2228
chore: Fix Logging in torch_compile path by @peri044 in https://github.com/pytorch/TensorRT/pull/2238
feat: Add Selective ATen decompositions by @gs-olive in https://github.com/pytorch/TensorRT/pull/2173
Type mismatch for dynamo aten::where converter by @apbose in https://github.com/pytorch/TensorRT/pull/2198
fix: Set dynamic=False in torch.compile call by @gs-olive in https://github.com/pytorch/TensorRT/pull/2240
fix: Allow rank differences in aten.expand by @gs-olive in https://github.com/pytorch/TensorRT/pull/2234
fix: Address runtimes with 0D inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/2188
feat: support many unary dynamo converters by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2246
fix: Decrease Docker container size by 20% by @gs-olive in https://github.com/pytorch/TensorRT/pull/2257
feat: Add support for device compilation setting by @gs-olive in https://github.com/pytorch/TensorRT/pull/2190
fix: Legacy CI pip installation by @gs-olive in https://github.com/pytorch/TensorRT/pull/2239
feat: support amax dynamo converter by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2241
feat: Exempt default softmax from decomposition by @gs-olive in https://github.com/pytorch/TensorRT/pull/2268
fix: Reorganize Dynamo testing directories by @gs-olive in https://github.com/pytorch/TensorRT/pull/2255
feat: Add support for require_full_compilation in Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2138
fix: Unify layers in Docker Container Cleanup by @gs-olive in https://github.com/pytorch/TensorRT/pull/2275
infra: testing out GHA CI by @narendasan in https://github.com/pytorch/TensorRT/pull/2073
feat: support conv dynamo converter by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2252
Enabling var_mean decomposition by @apbose in https://github.com/pytorch/TensorRT/pull/2273
fix: add an arg in matmul by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2279
feat: support activation dynamo converters by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2254
feat: support torch.ops.aten.sum.(default and dim_IntList) dynamo converter by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2278
tools(opset_coverage): Map default ops to unoverloaded ops by @narendasan in https://github.com/pytorch/TensorRT/pull/2292
add initial support for torch.ops.aten.neg.default converter by @bowang007 in https://github.com/pytorch/TensorRT/pull/2147
fix: Torch Upgrade to 2.2.0.dev by @gs-olive in https://github.com/pytorch/TensorRT/pull/2298
chore: enabling TS FE testing by @narendasan in https://github.com/pytorch/TensorRT/pull/2283
Update _Input.py by @phyboy in https://github.com/pytorch/TensorRT/pull/2293
feat: support many elementwise dynamo converters by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2263
feat: support linear (fully connected layer) dynamo converter by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2253
WAR: Disabling ViT tests until exporting with py311 is fixed by @narendasan in https://github.com/pytorch/TensorRT/pull/2305
neg converter correction by @apbose in https://github.com/pytorch/TensorRT/pull/2307
feat: Add preliminary support for freezing tensors in Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2128
fix: Wrap import of ConstantFold utilities by @gs-olive in https://github.com/pytorch/TensorRT/pull/2312
fix: Move aten.neg test case by @gs-olive in https://github.com/pytorch/TensorRT/pull/2310
small fix: Packaging version switch by @gs-olive in https://github.com/pytorch/TensorRT/pull/2315
fix: Register tensorrt backend name by @gs-olive in https://github.com/pytorch/TensorRT/pull/2311
feat: Transition export workflows to use torch._export APIs by @peri044 in https://github.com/pytorch/TensorRT/pull/2195
fix: Add special cases for clone and to_copy where input of graph is output by @gs-olive in https://github.com/pytorch/TensorRT/pull/2265
fix: Raise error when registering Packet-keyed converter by @gs-olive in https://github.com/pytorch/TensorRT/pull/2285
FX converter documentation by @apbose in https://github.com/pytorch/TensorRT/pull/2039
aten::split converter by @apbose in https://github.com/pytorch/TensorRT/pull/2232
DLFW changes by @apbose in https://github.com/pytorch/TensorRT/pull/2281
feat: Add ATen lowering pass system by @gs-olive in https://github.com/pytorch/TensorRT/pull/2280
fix: Support non -1 end idx and <0 start idx in aten::flatten converter by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/2321
Dynamo converter support for torch.ops.aten.erf.default op by @bowang007 in https://github.com/pytorch/TensorRT/pull/2164
fix: Update Torchvision version to address dependency resolution issue by @gs-olive in https://github.com/pytorch/TensorRT/pull/2339
fix: Remove input aliasing of builtin ops by @gs-olive in https://github.com/pytorch/TensorRT/pull/2276
fix: Allow low rank inputs in Python Runtime by @gs-olive in https://github.com/pytorch/TensorRT/pull/2282
fix: Address multi-GPU issue in engine deserialize by @gs-olive in https://github.com/pytorch/TensorRT/pull/2325
feat: support deconv (1d, 2d, and Nd) dynamo converter by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2337
Update usage of PyTorch's custom op API by @zou3519 in https://github.com/pytorch/TensorRT/pull/2193
feat: support bmm converter in dynamo by @bowang007 in https://github.com/pytorch/TensorRT/pull/2248
feat: support 1D, 2D, and 3D avg and max pooling dynamo converters by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2317
fix: Add support for negative dimensions in reduce by @gs-olive in https://github.com/pytorch/TensorRT/pull/2347
feat: Add tensor type enforcement for converters by @gs-olive in https://github.com/pytorch/TensorRT/pull/2324
fix: Issue in TS dimension-squeeze utility by @gs-olive in https://github.com/pytorch/TensorRT/pull/2336
perf: Add lowering passes to improve TRT runtime on SD by @gs-olive in https://github.com/pytorch/TensorRT/pull/2351
feat: Implement Dynamic shapes + fallback support for export path by @peri044 in https://github.com/pytorch/TensorRT/pull/2271
feat: Add maxpool lowering passes and experimental folder in Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2358
Aten::Index converter by @apbose in https://github.com/pytorch/TensorRT/pull/2277
feat: Implement support for exporting Torch-TensorRT compiled graphs using torch.export serde APIs by @peri044 in https://github.com/pytorch/TensorRT/pull/2249
chore: Switch converter tests to generate standalone ops using fx.symbolic_trace by @peri044 in https://github.com/pytorch/TensorRT/pull/2361
fix/feat: Add and repair multiple converters for SD + other models by @gs-olive in https://github.com/pytorch/TensorRT/pull/2353
feat: support flatten and reshape via shuffle_layer by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2354
feat: support prod, max, min, and mean via reduce layer by @zewenli98 in https://github.com/pytorch/TensorRT/pull/2355
minor fix: Update get_ir prefixes by @gs-olive in https://github.com/pytorch/TensorRT/pull/2369
Dynamo converter cat by @apbose in https://github.com/pytorch/TensorRT/pull/2343
fix: Repair issue in Torch Constant Folder by @gs-olive in https://github.com/pytorch/TensorRT/pull/2375
fix: Repair aten.where with Numpy + Broadcast by @gs-olive in https://github.com/pytorch/TensorRT/pull/2372
Cherry-pick changes from main into release/2.1 by @narendasan in https://github.com/pytorch/TensorRT/pull/2302
cherry-pick: Key converters and documentation to release/2.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2387
cherry-pick: Decompostion fix and documentation updates by @gs-olive in https://github.com/pytorch/TensorRT/pull/2391
feat: Wrap ExportedPrograms transformations with an API, allow dynamo.compile to accept graphmodules. by @peri044 in https://github.com/pytorch/TensorRT/pull/2388
Cherry-pick : Add documentation for dynamo.compile backend (#2389) by @peri044 in https://github.com/pytorch/TensorRT/pull/2416
cherry-pick: Transformer XL fix to release/2.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2414
cherry-pick/fix: Performance benchmarking fixes and Torch version fix by @gs-olive in https://github.com/pytorch/TensorRT/pull/2433
Cherry pick 2420 to release/2.1 by @peri044 in https://github.com/pytorch/TensorRT/pull/2425
cherry-pick/minor fix: Parse out slashes in Docker container name (#2437) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2438
chore: fix docs for export [release/2.1] by @peri044 in https://github.com/pytorch/TensorRT/pull/2448
chore: add additional native BN converter (cherry-pick of #2446) by @peri044 in https://github.com/pytorch/TensorRT/pull/2452
cherry-pick/fix: Docs rendering on PyTorch site (#2440) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2441
minor fix: Update Benchmark values (#2453) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2454
cherry-pick: Wrap perf benchmarks with no_grad (#2466) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2470
chore: Upgrade release to Torch 2.1.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2472
fix: Naming issue in opset coverage tool by @gs-olive in https://github.com/pytorch/TensorRT/pull/2477
cherry-pick: View and slice bugfixes by @gs-olive in https://github.com/pytorch/TensorRT/pull/2500
cherry-pick: Perf + Bugfix PRs by @gs-olive in https://github.com/pytorch/TensorRT/pull/2513
fix: release/2.1 CI Repair by @gs-olive in https://github.com/pytorch/TensorRT/pull/2528
cherry-pick: Safe mode and Build Arguments PRs by @gs-olive in https://github.com/pytorch/TensorRT/pull/2521
cherry-pick: Port most changes from main by @gs-olive in https://github.com/pytorch/TensorRT/pull/2574
chore: clean up AWS credentials PR changes by @peri044 in https://github.com/pytorch/TensorRT/pull/2608
chore: Set return type of compilation to ExportedProgram [release/2.2] by @peri044 in https://github.com/pytorch/TensorRT/pull/2607
cherry-pick: Docker fixes release/2.2 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2628
fix: Upgrade versions for Docker build rel 2.2 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2630
cherry-pick: Remove keyserver fetch from Dockerfile (#2639) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2640
cherry-pick: Remove extraneous argument in compile (#2635) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2638
cherry-pick: Attention converter and linting fixes by @gs-olive in https://github.com/pytorch/TensorRT/pull/2641
small fix: Index validator enable int64 (#2642) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2643

New Contributors

@wushirong made their first contribution in https://github.com/pytorch/TensorRT/pull/1892
@wu6u3tw made their first contribution in https://github.com/pytorch/TensorRT/pull/1935
@phyboy made their first contribution in https://github.com/pytorch/TensorRT/pull/2293
@zou3519 made their first contribution in https://github.com/pytorch/TensorRT/pull/2193

Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.4.0...v2.2.0

v1.4.0

11 months ago

PyTorch 2.0, CUDA 11.8, TensorRT 8.6, Support for the new `torch.compile` API, compatibility mode for FX frontend

Torch-TensorRT 1.4.0 targets PyTorch 2.0, CUDA 11.8, TensorRT 8.5. This release introduces a number of beta features to set the stage for working with PyTorch and TensorRT in the 2.0 ecosystem. Primarily, this includes a new torch.compile backend targeting Torch-TensorRT. It also adds a compatibility layer that allows users of the TorchScript frontend for Torch-TensorRT to seamlessly try FX and Dynamo.

torch.compile` Backend for Torch-TensorRT

One of the most prominent new features in PyTorch 2.0 is the torch.compile workflow, which enables users to accelerate code easily by specifying a backend of their choice. Torch-TensorRT 1.4.0 introduces a new backend for torch.compile as a beta feature, including a convenience frontend to perform accelerated inference. This frontend can be accessed in one of two ways:

import torch_tensorrt
torch_tensorrt.dynamo.compile(model, inputs, ...)

##### OR #####

torch_tensorrt.compile(model, ir="dynamo_compile", inputs=inputs, ...)

For more examples, see the provided sample scripts, which can be found here This compilation method has a couple key considerations:

It can handle models with data-dependent control flow
It automatically falls back to Torch if the TRT Engine Build fails for any reason
It uses the Torch FX aten library of converters to accelerate models
Recompilation can be caused by changing the batch size of the input, or providing an input which enters a new control flow branch
Compiled models cannot be saved across Python sessions (yet) The feature is currently in beta, and we expect updates, changes, and improvements to the above in the future.

`fx_ts_compat` Frontend

As the ecosystem transitions from TorchScript to Dynamo, users of Torch-TensorRT may want start to experiment with this stack. As such we have introduced a new frontend for Torch-TensorRT which exposes the same APIs as the TorchScript frontend but will use the FX/Dynamo compiler stack. You can try this frontend by using the ir="fx_ts_compat" setting

torch_tensorrt.compile(..., ir="fx_ts_compat")

What's Changed

Fix build by @yinghai in https://github.com/pytorch/TensorRT/pull/1479
add circle CI signal in README page by @yinghai in https://github.com/pytorch/TensorRT/pull/1481
fix eisum signature by @yinghai in https://github.com/pytorch/TensorRT/pull/1480
Fix link to CircleCI in README.md by @yinghai in https://github.com/pytorch/TensorRT/pull/1483
Minor changes by @yinghai in https://github.com/pytorch/TensorRT/pull/1482
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1456
chore: upload docs for 1.3.0 by @narendasan in https://github.com/pytorch/TensorRT/pull/1504
fix: Repair Citrinet-1024 compilation issues by @gs-olive in https://github.com/pytorch/TensorRT/pull/1488
refactor: Split elementwise tests by @peri044 in https://github.com/pytorch/TensorRT/pull/1507
[feat] Support 1D topk by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1491
Support aten::sum with bool tensor input by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1512
[fix]Disambiguate cast layer names by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1513
feat: Add functionality for easily benchmarking fx code on key models by @gs-olive in https://github.com/pytorch/TensorRT/pull/1506
[feat]Canonicalize aten::multiply to aten::mul by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1517
broadcast the two input shapes for transposed matmul by @nvpohanh in https://github.com/pytorch/TensorRT/pull/1457
make padding layer converter more efficient by @nvpohanh in https://github.com/pytorch/TensorRT/pull/1470
fix: Change equals-check from reference to value for BERT model not compiling in FX by @gs-olive in https://github.com/pytorch/TensorRT/pull/1539
Update README dependencies section for v1.3.0 by @take-cheeze in https://github.com/pytorch/TensorRT/pull/1540
fix: aten::where with differing-shape inputs bugfix by @gs-olive in https://github.com/pytorch/TensorRT/pull/1533
fix: Automatically send truncated long ints to cuda at shape analysis time by @gs-olive in https://github.com/pytorch/TensorRT/pull/1541
feat: Add functionality to FX benchmarking + Improve documentation by @gs-olive in https://github.com/pytorch/TensorRT/pull/1529
[fix] Fix crash when calling unbind on evaluated tensor by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1554
Update test_flatten_aten and test_reshape_aten due to PT2.0 changed tracer behavior for these ops by @frank-wei in https://github.com/pytorch/TensorRT/pull/1559
fix: Bugfix for align_corners=False- FX interpolate by @gs-olive in https://github.com/pytorch/TensorRT/pull/1561
fix: Properly cast intermediate Int8 tensors to TensorRT Engines in Fallback by @gs-olive in https://github.com/pytorch/TensorRT/pull/1549
Upgrade stack to Pytorch 2.0 + CUDA 11.7 + TRT 8.5 GA by @peri044 in https://github.com/pytorch/TensorRT/pull/1477
feat: Add option to specify int64 as an Input dtype by @gs-olive in https://github.com/pytorch/TensorRT/pull/1551
feat: Support int inputs to aten::max/min and aten::argmax/argmin by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1574
fix: Add aten::full_like evaluator by @gs-olive in https://github.com/pytorch/TensorRT/pull/1584
tools: assign 1 person to a bug instead of all by @narendasan in https://github.com/pytorch/TensorRT/pull/1604
feat: Add support for aten::meshgrid by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1601
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1603
chore: Add FX core test by @peri044 in https://github.com/pytorch/TensorRT/pull/1593
chore: Update dockerfile by @peri044 in https://github.com/pytorch/TensorRT/pull/1581
fix: Replace RemoveDropout lowering pass implementation with modified JIT pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1589
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1625
chore: Update Dockerfile to Ubuntu 20.04 + Crash Resolution by @gs-olive in https://github.com/pytorch/TensorRT/pull/1639
fix: Bugfix in Linear-to-AddMM Fusion Lowering Pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1619
fix: Resolve compilation bug for empty tensors in aten::select by @gs-olive in https://github.com/pytorch/TensorRT/pull/1623
Convolution cast by @apbose in https://github.com/pytorch/TensorRT/pull/1609
fix: Bugfix in TRT Engine deserialization indexing by @gs-olive in https://github.com/pytorch/TensorRT/pull/1646
fix: fix the inappropriate lowering pass of aten::to by @bowang007 in https://github.com/pytorch/TensorRT/pull/1649
Lowering aten::pad to aten::constant_pad_nd/aten::reflection_padXd/aten::replication_padXd by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1588
[fix] Disambiguate element-wise cast layer names by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1630
feat: Add optional tensor domain argument to Input class by @gs-olive in https://github.com/pytorch/TensorRT/pull/1537
Improve batch_norm fp16 accuracy by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1450
add an example of aten2trt, fix batch norm pass by @frank-wei in https://github.com/pytorch/TensorRT/pull/1685
fix: Issue in non-Tensor Input Resolution by @gs-olive in https://github.com/pytorch/TensorRT/pull/1617
Corrected a typo, which was raising an error by @zshn25 in https://github.com/pytorch/TensorRT/pull/1694
Cherry-pick manylinux compatible builds into main by @narendasan in https://github.com/pytorch/TensorRT/pull/1677
fix: Improve input handling for input_signature by @gs-olive in https://github.com/pytorch/TensorRT/pull/1698
Unsqueeze operator with dynamic inout by @apbose in https://github.com/pytorch/TensorRT/pull/1624
[feat] Add converter support for index_select by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1692
[feat] Add converter support for aten::logical_not by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1705
fix: Bugfix in convNd_to_convolution lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1693
[feat] Add converter for aten::any.dim by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1707
[fix] resolve issue for single non-batch index tensor in aten::index by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1700
fix: Handle nonetype pad value for Constant pad by @peri044 in https://github.com/pytorch/TensorRT/pull/1712
infra: Add Torch 1.13.1 testing to nightly CI by @gs-olive in https://github.com/pytorch/TensorRT/pull/1731
fix: Allow full model compilation with collection outputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1599
fix: fix the prim::Loop fallback issue by @bowang007 in https://github.com/pytorch/TensorRT/pull/1691
feat: Add decorator utility to improve error messaging for legacy support by @gs-olive in https://github.com/pytorch/TensorRT/pull/1738
minor fix: Update default minimum torch version for aten tracer by @gs-olive in https://github.com/pytorch/TensorRT/pull/1747
Get windows build working by @bharrisau in https://github.com/pytorch/TensorRT/pull/1711
Update config.yml by @frank-wei in https://github.com/pytorch/TensorRT/pull/1736
fix: Bugfix in shape analysis for multi-GPU systems by @gs-olive in https://github.com/pytorch/TensorRT/pull/1765
fix: Add schemas to convolution lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1728
fix: Update Docker build to automatically adapt Torch version by @gs-olive in https://github.com/pytorch/TensorRT/pull/1732
feat: Upgrade Pytorch and TensorRT versions by @peri044 in https://github.com/pytorch/TensorRT/pull/1759
feat: Merge dynamo additions into release/1.4 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1884
fix: Cherry-pick acc convolution fix to release/1.4 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1910
cherry-pick: Reorganize + Upgrade Dynamo (release/1.4) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1931
fix: Upgrade release/1.4 to Torch 2.0.1 + TensorRT 8.6.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1896
cherry-pick: Dynamo upgrades and bugfixes (release/1.4) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1956

New Contributors

@nvpohanh made their first contribution in https://github.com/pytorch/TensorRT/pull/1457
@take-cheeze made their first contribution in https://github.com/pytorch/TensorRT/pull/1540
@zshn25 made their first contribution in https://github.com/pytorch/TensorRT/pull/1694
@bharrisau made their first contribution in https://github.com/pytorch/TensorRT/pull/1711

Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.3.0...v1.4.0

v1.3.0

1 year ago

PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends

Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for Dynamic Batch Sizes for partially compiled modules using the TorchScript frontend (this is also supported with the FX frontend). It also introduces a new execution profiling utility to understand the execution of specific engine sub blocks that can be used in conjunction with PyTorch profiling tools to understand the performance of your model post compilation. Finally this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate torch.jit.traceable compiled modules.

Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend

A long-standing limitation of the partitioning system in the TorchScript function is lack of support for dynamic shapes. In this release we address a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as the fully compiled workflow where using the torch_tensorrt.Input class, you may define the range of shapes that an input may take during runtime. This is represented as a set of 3 shape sizes: min, max and opt. min and max define the dynamic range of the input Tensor. opt informs TensorRT what size to optimize for provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the opt size. In this release, partially compiled module inputs can vary in shape for the highest order dimension.

For example:

min_shape: (1, 3, 128, 128)
opt_shape: (8, 3, 128, 128)
max_shape: (32, 3, 128, 128)

Is a valid shape range, however:

min_shape: (1, 3, 128, 128)
opt_shape: (1, 3, 256, 256)
max_shape: (1, 3, 512, 512)

is still not supported.

Engine Profiling [Experimental]

This release introduces a number of profiling tools to measure the performance of TensorRT sub blocks in compiled modules. This can be used in conjunction with PyTorch profiling tools to get a picture of the performance of your model. Profiling for any particular sub block can be enabled by the enabled_profiling() method of any __torch__.classes.tensorrt.Engine attribute, or of any torch_tensorrt.TRTModuleNext. The profiler will dump trace files by default in /tmp, though this path can be customized by either setting the profile_path_prefix of __torch__.classes.tensorrt.Engine or as an argument to torch_tensorrt.TRTModuleNext.enable_precision(profiling_results_dir=""). Traces can be visualized using the Perfetto tool (https://perfetto.dev)

Screenshot 2022-11-21 at 6 23 01 PM

Engine Layer information can also be accessed using get_layer_info which returns a JSON string with the layers / fusions that the engine contains.

Unified Runtime for FX and TorchScript Frontends [Experimental]

In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate and each had their distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime to support both FX and TorchScript meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure Python conversion via FX or C++ Torchscript compilation. Both frontends use the same primitives to construct their compiled graphs be it fully compiled or just partially.

Basic Usage

The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.

Note: The runtime ABI version was increased to support this feature, as such models compiled with previous versions of Torch-TensorRT will need to be recompiled

For the FX frontend, the new runtime can be chosen but setting use_experimental_fx_rt=True as part of your compile settings to either torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True) or torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True)

Note: The new runtime only supports explicit batch dimension

TRTModuleNext

The FX frontend will return a torch.nn.Module containing torch_tensorrt.TRTModuleNext submodules instead of torch_tensorrt.fx.TRTModules. The features of these modules are nearly identical but with a few key improvements.

TRTModuleNext profiling dumps a trace visualizable with Perfetto (see above for more details).
TRTModuleNext modules are torch.jit.trace-able, meaning you can save FX compiled modules as TorchScript for python-less / C++ deployment scenarios. Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
TRTModuleNext maintains the same serialization workflows TRTModule supports as well (state_dict / extra_state, torch.save/torch.load)

Examples

model_fx = model_fx.cuda()
inputs_fx = [i.cuda() for i in inputs_fx]
trt_fx_module_f16 = torch_tensorrt.compile(
    model_fx,
    ir="fx",
    inputs=inputs_fx,
    enabled_precisions={torch.float16},
    use_experimental_fx_rt=True,
    explicit_batch_dimension=True
)

# Save model using torch.save 

torch.save(trt_fx_module_f16, "trt.pt")
reload_trt_mod = torch.load("trt.pt")

# Trace and save the FX module in TorchScript
scripted_fx_module = torch.jit.trace(trt_fx_module_f16, example_inputs=inputs_fx)
scripted_fx_module.save("/tmp/scripted_fx_module.ts")
scripted_fx_module = torch.jit.load("/tmp/scripted_fx_module.ts")

... #Get a handle for a TRTModuleNext submodule

# Extract state dictionary
st = trt_mod.state_dict()

# Load the state dict into a new module
new_trt_mod = TRTModuleNext()
new_trt_mod.load_state_dict(st)

Using TRTModuleNext as an arbirary TensorRT engine holder

Using TorchScript you have long been able to embed an arbritrary TensorRT engine from any source in a TorchScript module using torch_tensorrt.ts.embed_engine_in_new_module. Now you can do this at the torch.nn.Module level by directly using TRTModuleNext and access all the benefits enumerated above.

trt_mod = TRTModuleNext(
            serialized_engine,
            name="TestModule",
            input_binding_names=input_names,
            output_binding_names=output_names,
 )

The intention is in a future release to have torch_tensorrt.TRTModuleNext replace torch_tensorrt.fx.TRTModule as the default TensorRT Module implementation. Feedback on this class or how it is used, the runtime in general or associated features (profiler, engine inspector) is welcomed.

What's Changed

chore: Bump version to 1.2.0a0 by @narendasan in https://github.com/pytorch/TensorRT/pull/1044
feat: Extending nox for cxx11 ABI version by @andi4191 in https://github.com/pytorch/TensorRT/pull/1013
docs: Update the documentation theme to PyTorch by @narendasan in https://github.com/pytorch/TensorRT/pull/1063
Adding Code of Conduct file by @facebook-github-bot in https://github.com/pytorch/TensorRT/pull/1061
Update CONTRIBUTING.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1064
feat: Optimize hub.py download by @andi4191 in https://github.com/pytorch/TensorRT/pull/1022
Adding an action to automatically assign reviewers and assignees by @narendasan in https://github.com/pytorch/TensorRT/pull/1078
Add PR assigner support by @narendasan in https://github.com/pytorch/TensorRT/pull/1080
(//core): Align with prim::Enter in module fallback by @andi4191 in https://github.com/pytorch/TensorRT/pull/991
(//core): Added a variant for aten::split by @andi4191 in https://github.com/pytorch/TensorRT/pull/992
feat(nox): Replacing session with environment variable by @andi4191 in https://github.com/pytorch/TensorRT/pull/1057
Refactor the internal codebase from fx2trt_oss to torch_tensorrt by @frank-wei in https://github.com/pytorch/TensorRT/pull/1104
format by buildifier by @frank-wei in https://github.com/pytorch/TensorRT/pull/1106
[fx2trt] Modify lower setting class by @frank-wei in https://github.com/pytorch/TensorRT/pull/1107
Modified the notebooks directory's README file by @svenchilton in https://github.com/pytorch/TensorRT/pull/1102
[FX] Sync to OSS by @frank-wei in https://github.com/pytorch/TensorRT/pull/1118
[fx_acc] Add acc_tracer support for torch.mm by @khabinov in https://github.com/pytorch/TensorRT/pull/1120
Added Triton deployment instructions to documentation by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1116
amending triton deployment docs by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1126
fix: Update broken repo hyperlink by @lamhoangtung in https://github.com/pytorch/TensorRT/pull/1131
fix: Fix keep_dims functionality for aten::max by @peri044 in https://github.com/pytorch/TensorRT/pull/1099
fix(tests/core/partitioning): Fix tests of refactoring segmentation in partitioning by @peri044 in https://github.com/pytorch/TensorRT/pull/1140
feat(//tests): Update rtol and atol based tolerance for test cases by @andi4191 in https://github.com/pytorch/TensorRT/pull/1055
doc: add the explanation for partition phases on docs by @bowang007 in https://github.com/pytorch/TensorRT/pull/1090
feat (//cpp): Using atol and rtol based tolerance threshold for torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1052
CI/CD setup by @frank-wei in https://github.com/pytorch/TensorRT/pull/1137
Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1142
[fx2trt] Engineholder feature improvement, test fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1143
feat (//core/conversion) : Add converter for torch.bitwise_not by @blchu in https://github.com/pytorch/TensorRT/pull/1029
fixed typos by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1098
[FX] --fx-only does not need to check bazel by @frank-wei in https://github.com/pytorch/TensorRT/pull/1147
[FX] refactor the fx path in compile function by @frank-wei in https://github.com/pytorch/TensorRT/pull/1141
[FX] Create getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1145
[FX] move example folder by @frank-wei in https://github.com/pytorch/TensorRT/pull/1149
[FX] Sync enhancement done internally at Meta by @yinghai in https://github.com/pytorch/TensorRT/pull/1161
Update config.yml by @frank-wei in https://github.com/pytorch/TensorRT/pull/1163
Use py3 next() syntax by @ptrblck in https://github.com/pytorch/TensorRT/pull/1159
Add missing comma for proper torch versioning in setup.py by @dabauxi in https://github.com/pytorch/TensorRT/pull/1164
[docs] Update link to relative path by @zhiqwang in https://github.com/pytorch/TensorRT/pull/1171
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1172
fix: fix the model name typo error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1176
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1178
[feat]: support slice with dynamic shape by @inocsin in https://github.com/pytorch/TensorRT/pull/1110
[FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1184
[FX] Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1183
fix: Fix PTQ calibration when there are multiple inputs by @peri044 in https://github.com/pytorch/TensorRT/pull/1191
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1194
[fix]: fix bug in aten::to, when network only have aten::to layer wil… by @inocsin in https://github.com/pytorch/TensorRT/pull/1108
Add .circleci/config.yml by @narendasan in https://github.com/pytorch/TensorRT/pull/1153
feat: Upgrade TRT to 8.4 by @peri044 in https://github.com/pytorch/TensorRT/pull/1152
feat: Update Pytorch version to 1.12 by @peri044 in https://github.com/pytorch/TensorRT/pull/1177
fix: converter renaming already named tensors by @bowang007 in https://github.com/pytorch/TensorRT/pull/1167
feat(//py): Use TensorRT to fill in .so libraries automatically if possible by @narendasan in https://github.com/pytorch/TensorRT/pull/1085
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1204
fix: fix the parsing related model loading bug by @bowang007 in https://github.com/pytorch/TensorRT/pull/1148
feat: support min_block_size != 1 caused fallback nodes re-segmentation by @bowang007 in https://github.com/pytorch/TensorRT/pull/1195
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1208
fix: fix the fallback related issue after merging collection by @bowang007 in https://github.com/pytorch/TensorRT/pull/1206
Add CMake support to build the libraries by @gcuendet in https://github.com/pytorch/TensorRT/pull/1058
Fix typo in EfficientNet-example by @davinnovation in https://github.com/pytorch/TensorRT/pull/1217
fix: fix bug that ListConstruct in TRT subgraph when it's entire graph's output by @bowang007 in https://github.com/pytorch/TensorRT/pull/1220
fix: fix the error that collection input segmented into trt subgraph by @bowang007 in https://github.com/pytorch/TensorRT/pull/1225
feat(//circleci): Adding release automation by @narendasan in https://github.com/pytorch/TensorRT/pull/1215
fix: support int tensor * int scaler in aten::mul by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1095
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1221
Fix errors in unbind and list slice by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1088
Adding a Resnet C++ example by @vinhngx in https://github.com/pytorch/TensorRT/pull/1175
[FX] disable 2 of conv3d and type_as tests by @frank-wei in https://github.com/pytorch/TensorRT/pull/1224
[feat] Add support for integers in aten::abs converter (#35) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1232
Update PTQ example to fix new compile_spec requirements by @ncomly-nvidia in https://github.com/pytorch/TensorRT/pull/1242
feat: support for grouped inputs by @narendasan in https://github.com/pytorch/TensorRT/pull/1201
feat: Added support for custom torch operators and converters in torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1219
Add outputPadding in deconv by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1234
chore: Apply linting and ignore new bazel dirs by @narendasan in https://github.com/pytorch/TensorRT/pull/1223
added qat-ptq workflow notebook by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1239
fix: Update cmake for the new collection files by @narendasan in https://github.com/pytorch/TensorRT/pull/1246
chore: ignore dist dir for pre-commit by @narendasan in https://github.com/pytorch/TensorRT/pull/1249
chore: Aligning bazel version for consistency across different docker… by @andi4191 in https://github.com/pytorch/TensorRT/pull/1250
refactor: Changed the hardcoded values to macros for DLA memory sizes by @andi4191 in https://github.com/pytorch/TensorRT/pull/1247
chore: update jetson pytorch baase by @narendasan in https://github.com/pytorch/TensorRT/pull/1251
[feat] Add automatic type promotion to element-wise ops by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1240
Assorted small fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1259
[FX] remove op_lowering_disallow_list and format revert by @frank-wei in https://github.com/pytorch/TensorRT/pull/1261
fix: fix the "schema not found for node" error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1236
chore: Fix contributing doc by @peri044 in https://github.com/pytorch/TensorRT/pull/1268
feat: support scatter.value and scatter.src by @inocsin in https://github.com/pytorch/TensorRT/pull/1252
Internal workspace workflow by @narendasan in https://github.com/pytorch/TensorRT/pull/1269
Fix typo in README by @davinnovation in https://github.com/pytorch/TensorRT/pull/1273
Support swin/bert with dynamic batch by @Njuapp in https://github.com/pytorch/TensorRT/pull/1270
correct sha256sum of cudnn by @Njuapp in https://github.com/pytorch/TensorRT/pull/1278
Jetson workspace by @narendasan in https://github.com/pytorch/TensorRT/pull/1280
chore(deps): bump @actions/core from 1.8.2 to 1.9.1 in /.github/actions/assigner by @dependabot in https://github.com/pytorch/TensorRT/pull/1287
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1288
chore: Fix dataloader in finetune_qat script by @andi4191 in https://github.com/pytorch/TensorRT/pull/1292
chore: Truncate long and double for ptq CPP path by @andi4191 in https://github.com/pytorch/TensorRT/pull/1291
feat: Add support for aten::square by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1286
fix: fix misleading skipping partitioning msg by @bowang007 in https://github.com/pytorch/TensorRT/pull/1289
fix: Add int support to constant_pad_nd by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1283
fix: Resolve non-determinism in registerSegmentsOutputs by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1284
docs: Update docgen task by @narendasan in https://github.com/pytorch/TensorRT/pull/1294
update fx notebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1297
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1299
fix(tools): Fix linter to not depend on docker by @narendasan in https://github.com/pytorch/TensorRT/pull/1301
Support multiple indices for aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1309
chore: Adding CMake to the CI by @narendasan in https://github.com/pytorch/TensorRT/pull/1310
feat: Upgrade Pytorch to 1.12.1 and TensorRT to 8.4.3.1 by @peri044 in https://github.com/pytorch/TensorRT/pull/1315
Fix bug: correct the output shape of aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
feat (//core/conversion) : Add converter for torch.repeat_interleave ( by @blchu in https://github.com/pytorch/TensorRT/pull/1313
chore: Adding NGC build path by @narendasan in https://github.com/pytorch/TensorRT/pull/1311
Update lower.py by @frank-wei in https://github.com/pytorch/TensorRT/pull/1324
fix!: Fixed Windows compilation failures by @andi4191 in https://github.com/pytorch/TensorRT/pull/1330
[feat] Add support for argmax and argmin by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1312
chore: Adding a guideline to build on Windows platform by @andi4191 in https://github.com/pytorch/TensorRT/pull/1337
chore: Fix data loader issues and nox file paths by @peri044 in https://github.com/pytorch/TensorRT/pull/1281
feat(//tools/perf): Refactor perf_run.py, add fx2trt backend support, usage via CLI arguments by @peri044 in https://github.com/pytorch/TensorRT/pull/1254
refactor(//tests) : Refactor the test suite by @peri044 in https://github.com/pytorch/TensorRT/pull/1329
[feat] add support for aten::reciprocal(int) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1308
[FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1342
Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1343
enable direct call to fx.compile() by @frank-wei in https://github.com/pytorch/TensorRT/pull/1344
fix: add remove_exception pass from torch to fix uninitialized tensor… by @bowang007 in https://github.com/pytorch/TensorRT/pull/1345
chore: apply linting to docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1347
docs: Adding v1.2.0 and v1.1.1 docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1349
Docs for release by @narendasan in https://github.com/pytorch/TensorRT/pull/1350
fix: Fixing pybind error on nightly by @andi4191 in https://github.com/pytorch/TensorRT/pull/1285
Centralizing Partitioning State by @narendasan in https://github.com/pytorch/TensorRT/pull/1263
chore: Fix centralized partititoning by @peri044 in https://github.com/pytorch/TensorRT/pull/1367
chore: Move master to test nightly only by @narendasan in https://github.com/pytorch/TensorRT/pull/1370
[fix] Avoid layer name conflicts in aten::index by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1377
[fix] Fix output dimensions of aten::unbind converter by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1373
Einsum converter by @gs-olive in https://github.com/pytorch/TensorRT/pull/1385
Atan2 converter by @gs-olive in https://github.com/pytorch/TensorRT/pull/1381
[FX] aten2trt and some pass fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1390
feat: Add converter for aten::sign unary op by @gs-olive in https://github.com/pytorch/TensorRT/pull/1391
Add support for aten::squeeze without a dim by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1393
[fix] incorrect casting behavior in floor_divide by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1392
chore: minor fixes by @peri044 in https://github.com/pytorch/TensorRT/pull/1397
fix: torch.std and torch.var support multi-dimensional reductions by @gs-olive in https://github.com/pytorch/TensorRT/pull/1395
fix: fix missing float type in shape analysis by @bowang007 in https://github.com/pytorch/TensorRT/pull/1399
feat: Rsqrt lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1394
Add correct pip install instructions by @msaroufim in https://github.com/pytorch/TensorRT/pull/1400
fix: aten::split behavior with negative indexing by @gs-olive in https://github.com/pytorch/TensorRT/pull/1403
fix: fix compilation stuck bug caused by elimination exception by @bowang007 in https://github.com/pytorch/TensorRT/pull/1409
[FX] Fix clamping float32 boundary values, aten2trt init check-in, fix slice issues by @frank-wei in https://github.com/pytorch/TensorRT/pull/1415
[feat]Add converter for aten::where by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1421
[feat]Add converter support for aten::frobenius_norm by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1422
chore: Update torch installation paths for NGC by @peri044 in https://github.com/pytorch/TensorRT/pull/1435
[feat] Add dependency awareness to torch-trt partitioning by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1304
docs: minor changes in Resnet50 example by @przemb in https://github.com/pytorch/TensorRT/pull/1427
fix: Ensure proper type inheritance in aten::masked_fill by @gs-olive in https://github.com/pytorch/TensorRT/pull/1430
chore: Nox file update from NGC 22.11 release by @peri044 in https://github.com/pytorch/TensorRT/pull/1438
fix: Add check to ensure einsum converter has no more than 2 tensor inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1439
[feat] Add partial converter support for aten::linalg_norm by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1426
chore: Lint noxfile.py by @gs-olive in https://github.com/pytorch/TensorRT/pull/1443
fix: CUDA error 710 bugfix by @gs-olive in https://github.com/pytorch/TensorRT/pull/1424
scalar_to_tensor avoid scalar.to() by @Njuapp in https://github.com/pytorch/TensorRT/pull/1448
feat: rewriting param to a Constant if it's a introduced input by @bowang007 in https://github.com/pytorch/TensorRT/pull/1298
feat: support int64 <=> int32 auto conversion by @bowang007 in https://github.com/pytorch/TensorRT/pull/1407
fix: Device casting issues with certain aten operators by @gs-olive in https://github.com/pytorch/TensorRT/pull/1416
feat(//core/partitioning) : Dynamic shapes + fallback by @peri044 in https://github.com/pytorch/TensorRT/pull/1414
[fix] unmangle_cls_name for variable length mangled tags by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1454
fix: Error with aten::div when using truncation with Int32 tensor inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1442
fix: fix failed test cases caused by partition API changes by @bowang007 in https://github.com/pytorch/TensorRT/pull/1460
fix: Update floor division schema replacement in lowering by @gs-olive in https://github.com/pytorch/TensorRT/pull/1464
feat: Add functionality to performance tooling by @gs-olive in https://github.com/pytorch/TensorRT/pull/1451
Unifying the FX and TS Frontends by @narendasan in https://github.com/pytorch/TensorRT/pull/1404

New Contributors

@facebook-github-bot made their first contribution in https://github.com/pytorch/TensorRT/pull/1061
@frank-wei made their first contribution in https://github.com/pytorch/TensorRT/pull/1064
@khabinov made their first contribution in https://github.com/pytorch/TensorRT/pull/1120
@blchu made their first contribution in https://github.com/pytorch/TensorRT/pull/1029
@yinghai made their first contribution in https://github.com/pytorch/TensorRT/pull/1161
@ptrblck made their first contribution in https://github.com/pytorch/TensorRT/pull/1159
@dabauxi made their first contribution in https://github.com/pytorch/TensorRT/pull/1164
@zhiqwang made their first contribution in https://github.com/pytorch/TensorRT/pull/1171
@gcuendet made their first contribution in https://github.com/pytorch/TensorRT/pull/1058
@davinnovation made their first contribution in https://github.com/pytorch/TensorRT/pull/1217
@dependabot made their first contribution in https://github.com/pytorch/TensorRT/pull/1287
@msaroufim made their first contribution in https://github.com/pytorch/TensorRT/pull/1400
@przemb made their first contribution in https://github.com/pytorch/TensorRT/pull/1427

Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.3.0

v1.2.0

1 year ago

PyTorch 1.12, Collections based I/O, FX Frontend, torchtrtc custom op support, CMake build system and Community Window Support

Torch-TensorRT 1.2.0 targets PyTorch 1.12, CUDA 11.6, cuDNN 8.4 and TensorRT 8.4. This release focuses on a couple key new APIs to handle function I/O that uses collection types which should enable whole new model classes to be compiled by Torch-TensorRT without source code modification. It also introduces the "FX Frontend", a new frontend for Torch-TensorRT which leverages FX, a high level IR built into PyTorch with extensive Python APIs. For uses cases which do not need to be run outside of Python this may be a strong option to try as it is easily extensible in a familar development enviornment. In Torch-TensorRT 1.2.0, the FX frontend should be considered beta level in stability. torchtrtc has received improvements which target the ability to handle operators outside of the core PyTorch op set. This includes custom operators from libraries such as torchvision and torchtext. Similarlly users can provide custom converters to torchtrtc to extend the compilers support from the command line instead of having to write an application to do so. Finally, Torch-TensorRT introduces community supported Windows and CMake support.

New Dependencies

`nvidia-tensorrt`

For previous versions of Torch-TensorRT, users had to install TensorRT via system package manager and modify their LD_LIBRARY_PATH in order to set up Torch-TensorRT. Now users should install the TensorRT Python API as part of the installation proceedure. This can be done via the following steps:

pip install nvidia-pyindex
pip install nvidia-tensorrt==8.4.3.1
pip install torch-tensorrt==1.2.0 -f https://github.com/pytorch/tensorrt/releases

Installing the TensorRT pip package will allow Torch-TensorRT to automatically load the TensorRT libraries without any modification to enviornment variables. It is also a necessary dependency for the FX Frontend.

`torchvision`

Some FX frontend converters are designed to target operators from 3rd party libraries like torchvision. As such, you must have torchvision installed in order to use them. However, this dependency is optional for cases where you do not need this support.

Jetson

Starting from this release we will be distributing precompiled binaries of our NGC release branches for aarch64 (as well as x86_64), starting with ngc/22.11. These releases are designed to be paired with NVIDIA distributed builds of PyTorch including the NGC containers and Jetson builds and are equivalent to the prepackaged distribution of Torch-TensorRT that comes in the containers. They represent the state of the master branch at the time of branch cutting so may lag in features by a month or so. These releases will come separately to minor version releases like this one. Therefore going forward, these NGC releases should be the primary release channel used on Jetson (including for building from source).

NOTE: NGC PyTorch builds are not identical to builds you might install through normal channels like pytorch.org. In the past this has caused issues in portability between pytorch.org builds and NGC builds. Therefore we strongly recommend in workflows such as exporting a TorchScript module on an x86 machine and then compiling on Jetson to ensure you are using the NGC container release on x86 for your host machine operations. More information about Jetson support can be found along side the 22.07 release (https://github.com/pytorch/TensorRT/releases/tag/v1.2.0a0.nv22.07)

Collections based I/O [Experimental]

Torch-TensorRT previously has operated under the assumption that nn.Module forward functions can trivially be reduced to the form forward([Tensor]) -> [Tensor]. Typically this implies functions fo the form forward(Tensor, Tensor, ... Tensor) -> (Tensor, Tensor, ..., Tensor). However as model complexity increases, grouping inputs may make it easier to manage many inputs. Therefore, function signatures similar to forward([Tensor], (Tensor, Tensor)) -> [Tensor] or forward((Tensor, Tensor)) -> (Tensor, (Tensor, Tensor)) might be more common. In Torch-TensorRT 1.2.0, more of these kinds of uses cases are supported using the new experimental input_signature compile spec API. This API allows users to group Input specs similar to how they might group the input Tensors they would use to call the original module's forward function. This informs Torch-TensorRT on how to map a Tensor input from its location in a group to the engine and from the engine into its grouping returned back to the user.

To make this concrete consider the following standard case:

class StandardTensorInput(nn.Module):
    def __init__(self):
        super(StandardTensorInput, self).__init__()

    def forward(self, x, y):
        r = x + y
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = StandardTensorInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    inputs=[
        torch_tensorrt.Input(x.shape),
        torch_tensorrt.Input(y.shape)
    ],
    min_block_size=1
)

out = trt_module(x,y)
print(out)

Here a user has defined two explicit tensor inputs and used the existing list based API to define the input specs.

With Torch-TensorRT the following use cases are now possible using the new input_signature API:

Tuple based input collection

class TupleInput(nn.Module):
    def __init__(self):
        super(TupleInput, self).__init__()

    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r = z[0] + z[1]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=((x, y),), # Note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module((x,y))
print(out)

List based input collection

class ListInput(nn.Module):
    def __init__(self):
        super(ListInput, self).__init__()

    def forward(self, z: List[torch.Tensor]):
        r = z[0] + z[1]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module([x,y])
print(out)

Note how the input specs (in this case just example tensors) are provided to the compiler. The input_signature argument expects a Tuple[Union[torch.Tensor, torch_tensorrt.Input, List, Tuple]] grouped in a format representative of how the function would be called. In these cases its just a list or tuple of specs.

More advanced cases are supported as we:

Tuple I/O

class TupleInputOutput(nn.Module):
    def __init__(self):
        super(TupleInputOutput, self).__init__()

    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r1 = r1 * 10
        r = (r1, r2)
        return r

x = torch.Tensor([1,2,3For previous versions of Torch-TensorRT, users had to install TensorRT via ]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInputOutput()

trt_module = torch_tensorrt.compile(
    module,
    input_signature=((x,y),), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module((x,y))
print(out)

List I/O

class ListInputOutput(nn.Module):
    def __init__(self):
        super(ListInputOutput, self).__init__()

    def forward(self, z: List[torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r = [r1, r2]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInputOutput()

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module((x,y))
print(out)

Multple Groups of Mixed Types

class MultiGroupIO(nn.Module):
    def __init__(self):
        super(MultiGroupIO, self).__init__()

    def forward(self, z: List[torch.Tensor], a: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = a[0] + a[1]
        r3 = r1 - r2
        r4 = [r1, r2]
        return (r3, r4)
    
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = MultiGroupIO().eval.to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],(x,y)), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module([x,y],(x,y))
print(out)

These features are also supported in C++ as well:


torch::jit::Module mod;
try {
  // Deserialize the ScriptModule from a file using torch::jit::load().
  mod = torch::jit::load(path);
} catch (const c10::Error& e) {
  std::cerr << "error loading the model\n";
}
mod.eval();
mod.to(torch::kCUDA);

std::vector<torch::jit::IValue> inputs_;

for (auto in : inputs) {
  inputs_.push_back(torch::jit::IValue(in.clone()));
}

std::vector<torch::jit::IValue> complex_inputs;
auto input_list = c10::impl::GenericList(c10::TensorType::get());
input_list.push_back(inputs_[0]);
input_list.push_back(inputs_[0]);

torch::jit::IValue input_list_ivalue = torch::jit::IValue(input_list);

complex_inputs.push_back(input_list_ivalue);

auto input_shape = torch_tensorrt::Input(in0.sizes(), torch_tensorrt::DataType::kHalf);
auto input_shape_ivalue = torch::jit::IValue(std::move(c10::make_intrusive<torch_tensorrt::Input>(input_shape)));

c10::TypePtr elementType = input_shape_ivalue.type();
auto list = c10::impl::GenericList(elementType);
list.push_back(input_shape_ivalue);
list.push_back(input_shape_ivalue);

torch::jit::IValue complex_input_shape(list);
std::tuple<torch::jit::IValue> input_tuple2(complex_input_shape);
torch::jit::IValue complex_input_shape2(input_tuple2);

auto compile_settings = torch_tensorrt::ts::CompileSpec(complex_input_shape2);
compile_settings.min_block_size = 1;
compile_settings.enabled_precisions = {torch::kHalf};

// // Compile module
auto trt_mod = torch_tensorrt::ts::compile(mod, compile_settings);
auto trt_out = trt_mod.forward(complex_inputs);

Currently this feature should be considered experimental, APIs may be subject to change or folded into existing APIs. There are also limitations introduced by using this feature including the following:

Not all collection types are supported (e.g. Dict, namedtuple)
Not being able to require_full_compilation while using this feature
Certain operators are required to run in PyTorch throughout the graph which may impact performance
The maximum depth of collections nesting is limited.
These limitations will be addressed in subsequent versions.

Adding FX frontend to Torch-TensorRT [Beta]

This release includes the FX as one of its supported IRs to convert torch models to TensorRT through the new FX frontend. At a high level, this path transforms the model into or consumes an FX graph and similar to the TorchScript frontend converts the graph to TensorRT through the use of a library of converters. The key difference is that it is implemented purely in Python. The role of this FX frontend is to supplement the TS lowering path and to provide users better ease of use and easier extensibility in use cases where removing Python as a dependency is not strictly necessary. Detailed user instructions can be find in the document. The FX path examples are located under //examples/fx The FX path unit tests are located under //py/torch_tensorrt/fx/tests

Custom operators and converters in Torch-TensorRT

While both the C++ API and Python API provide systems to include and convert custom operators in your model (for instance those implemented in torchvision) torchtrtc has been limited to the core opset. In Torch-TensorRT 1.2.0 two new flags have been added to torchtrtc.

    --custom-torch-ops                (repeatable) Shared object/DLL containing custom torch operators
    --custom-converters               (repeatable) Shared object/DLL containing custom converters

These arguments accept paths to .so or DLL files which define custom operators for PyTorch or custom converters for Torch-TensorRT. These files will get DL_OPEN'd at runtime to extend the op and converter libraries.

For example:

torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16

Community CMake and Windows support

Thanks to the great work of @gcuendet and others, CMake and consequentially Windows support has been added to the project! Users on Linux and Windows can now build the C++ API using this system and using torch_tensorrt_runtime.dll add support for executing Torch-TensorRT programs on Windows in both Python and C++. Detailed information on how to use this build system can be found here: https://pytorch.org/TensorRT/getting_started/installation.html

Bazel will continue to be the primary build system for the project and all testing and distributed builds will be built and run with Bazel (including future official Windows support) so users should consider this still the canonical version of Torch-TensorRT. However we aim to ensure as best as we can that the CMake system will be able to build the project properly including on Windows. Contributions to continue to grow the support for this build system and Windows as a platform are definitely welcomed.

Known Limitations

Collections I/O
- Not all collection types are supported (e.g. Dict, namedtuple)
- Not being able to require_full_compilation while using this feature
- Certain operators are required to run in PyTorch throughout the graph which may impact performance
- The maximum depth of collections nesting is limited.
FX
- Some of FX operators have limited dynamic shape capability. Please check here.
- Control flow in model could not be handled
Python API via the CMake build system.

Dependencies

- Bazel 5.2.0
- LibTorch 1.12.1
- CUDA 11.6 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
- cuDNN 8.4.1.50
- TensorRT 8.4.3.1

Operators Supported (TorchScript)

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
aten::adaptive_max_pool1d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::argmax(Tensor self, int dim, bool keepdim=False) -> (Tensor)
aten::argmin(Tensor self, int dim, bool keepdim=False) -> (Tensor)
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bitwise_not(Tensor self) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::scatter.src(Tensor self, int dim, Tensor index, Tensor src) -> (Tensor)
aten::scatter.value(Tensor self, int dim, Tensor index, Scalar value) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split.sizes(Tensor(a -> *) self, int[] split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::square(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::and.bool(bool a, bool b) -> (bool)
aten::__derive_index(int idx, int start, int step) -> int
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__range_length(int lo, int hi, int step) -> int
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add.str(str a, str b) -> (str)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::eq.str(str a, str b) -> (bool)
aten::extend.t(t self, t[] other) -> ()
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::format(str self, ...) -> (str)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::pow.float(float a, float b) -> (float)
aten::pow.float_int(float a, int b) -> (float)
aten::pow.int(int a, int b) -> (float)
aten::pow.int_float(int a, float b) -> (float)
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::TupleIndex(Any tup, int i) -> (Any)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

What's Changed

chore: Bump version to 1.2.0a0 by @narendasan in https://github.com/pytorch/TensorRT/pull/1044
feat: Extending nox for cxx11 ABI version by @andi4191 in https://github.com/pytorch/TensorRT/pull/1013
docs: Update the documentation theme to PyTorch by @narendasan in https://github.com/pytorch/TensorRT/pull/1063
Adding Code of Conduct file by @facebook-github-bot in https://github.com/pytorch/TensorRT/pull/1061
Update CONTRIBUTING.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1064
feat: Optimize hub.py download by @andi4191 in https://github.com/pytorch/TensorRT/pull/1022
Adding an action to automatically assign reviewers and assignees by @narendasan in https://github.com/pytorch/TensorRT/pull/1078
Add PR assigner support by @narendasan in https://github.com/pytorch/TensorRT/pull/1080
(//core): Align with prim::Enter in module fallback by @andi4191 in https://github.com/pytorch/TensorRT/pull/991
(//core): Added a variant for aten::split by @andi4191 in https://github.com/pytorch/TensorRT/pull/992
feat(nox): Replacing session with environment variable by @andi4191 in https://github.com/pytorch/TensorRT/pull/1057
Refactor the internal codebase from fx2trt_oss to torch_tensorrt by @frank-wei in https://github.com/pytorch/TensorRT/pull/1104
format by buildifier by @frank-wei in https://github.com/pytorch/TensorRT/pull/1106
[fx2trt] Modify lower setting class by @frank-wei in https://github.com/pytorch/TensorRT/pull/1107
Modified the notebooks directory's README file by @svenchilton in https://github.com/pytorch/TensorRT/pull/1102
[FX] Sync to OSS by @frank-wei in https://github.com/pytorch/TensorRT/pull/1118
[fx_acc] Add acc_tracer support for torch.mm by @khabinov in https://github.com/pytorch/TensorRT/pull/1120
Added Triton deployment instructions to documentation by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1116
amending triton deployment docs by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1126
fix: Update broken repo hyperlink by @lamhoangtung in https://github.com/pytorch/TensorRT/pull/1131
fix: Fix keep_dims functionality for aten::max by @peri044 in https://github.com/pytorch/TensorRT/pull/1099
fix(tests/core/partitioning): Fix tests of refactoring segmentation in partitioning by @peri044 in https://github.com/pytorch/TensorRT/pull/1140
feat(//tests): Update rtol and atol based tolerance for test cases by @andi4191 in https://github.com/pytorch/TensorRT/pull/1055
doc: add the explanation for partition phases on docs by @bowang007 in https://github.com/pytorch/TensorRT/pull/1090
feat (//cpp): Using atol and rtol based tolerance threshold for torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1052
CI/CD setup by @frank-wei in https://github.com/pytorch/TensorRT/pull/1137
Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1142
[fx2trt] Engineholder feature improvement, test fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1143
feat (//core/conversion) : Add converter for torch.bitwise_not by @blchu in https://github.com/pytorch/TensorRT/pull/1029
fixed typos by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1098
[FX] --fx-only does not need to check bazel by @frank-wei in https://github.com/pytorch/TensorRT/pull/1147
[FX] refactor the fx path in compile function by @frank-wei in https://github.com/pytorch/TensorRT/pull/1141
[FX] Create getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1145
[FX] move example folder by @frank-wei in https://github.com/pytorch/TensorRT/pull/1149
[FX] Sync enhancement done internally at Meta by @yinghai in https://github.com/pytorch/TensorRT/pull/1161
Update config.yml by @frank-wei in https://github.com/pytorch/TensorRT/pull/1163
Use py3 next() syntax by @ptrblck in https://github.com/pytorch/TensorRT/pull/1159
Add missing comma for proper torch versioning in setup.py by @dabauxi in https://github.com/pytorch/TensorRT/pull/1164
[docs] Update link to relative path by @zhiqwang in https://github.com/pytorch/TensorRT/pull/1171
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1172
fix: fix the model name typo error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1176
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1178
[feat]: support slice with dynamic shape by @inocsin in https://github.com/pytorch/TensorRT/pull/1110
[FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1184
[FX] Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1183
fix: Fix PTQ calibration when there are multiple inputs by @peri044 in https://github.com/pytorch/TensorRT/pull/1191
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1194
[fix]: fix bug in aten::to, when network only have aten::to layer wil… by @inocsin in https://github.com/pytorch/TensorRT/pull/1108
Add .circleci/config.yml by @narendasan in https://github.com/pytorch/TensorRT/pull/1153
feat: Upgrade TRT to 8.4 by @peri044 in https://github.com/pytorch/TensorRT/pull/1152
feat: Update Pytorch version to 1.12 by @peri044 in https://github.com/pytorch/TensorRT/pull/1177
fix: converter renaming already named tensors by @bowang007 in https://github.com/pytorch/TensorRT/pull/1167
feat(//py): Use TensorRT to fill in .so libraries automatically if possible by @narendasan in https://github.com/pytorch/TensorRT/pull/1085
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1204
fix: fix the parsing related model loading bug by @bowang007 in https://github.com/pytorch/TensorRT/pull/1148
feat: support min_block_size != 1 caused fallback nodes re-segmentation by @bowang007 in https://github.com/pytorch/TensorRT/pull/1195
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1208
fix: fix the fallback related issue after merging collection by @bowang007 in https://github.com/pytorch/TensorRT/pull/1206
Add CMake support to build the libraries by @gcuendet in https://github.com/pytorch/TensorRT/pull/1058
Fix typo in EfficientNet-example by @davinnovation in https://github.com/pytorch/TensorRT/pull/1217
fix: fix bug that ListConstruct in TRT subgraph when it's entire graph's output by @bowang007 in https://github.com/pytorch/TensorRT/pull/1220
fix: fix the error that collection input segmented into trt subgraph by @bowang007 in https://github.com/pytorch/TensorRT/pull/1225
feat(//circleci): Adding release automation by @narendasan in https://github.com/pytorch/TensorRT/pull/1215
fix: support int tensor * int scaler in aten::mul by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1095
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1221
Fix errors in unbind and list slice by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1088
Adding a Resnet C++ example by @vinhngx in https://github.com/pytorch/TensorRT/pull/1175
[FX] disable 2 of conv3d and type_as tests by @frank-wei in https://github.com/pytorch/TensorRT/pull/1224
[feat] Add support for integers in aten::abs converter (#35) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1232
Update PTQ example to fix new compile_spec requirements by @ncomly-nvidia in https://github.com/pytorch/TensorRT/pull/1242
feat: support for grouped inputs by @narendasan in https://github.com/pytorch/TensorRT/pull/1201
feat: Added support for custom torch operators and converters in torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1219
Add outputPadding in deconv by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1234
chore: Apply linting and ignore new bazel dirs by @narendasan in https://github.com/pytorch/TensorRT/pull/1223
added qat-ptq workflow notebook by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1239
fix: Update cmake for the new collection files by @narendasan in https://github.com/pytorch/TensorRT/pull/1246
chore: ignore dist dir for pre-commit by @narendasan in https://github.com/pytorch/TensorRT/pull/1249
chore: Aligning bazel version for consistency across different docker… by @andi4191 in https://github.com/pytorch/TensorRT/pull/1250
refactor: Changed the hardcoded values to macros for DLA memory sizes by @andi4191 in https://github.com/pytorch/TensorRT/pull/1247
chore: update jetson pytorch baase by @narendasan in https://github.com/pytorch/TensorRT/pull/1251
[feat] Add automatic type promotion to element-wise ops by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1240
Assorted small fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1259
[FX] remove op_lowering_disallow_list and format revert by @frank-wei in https://github.com/pytorch/TensorRT/pull/1261
fix: fix the "schema not found for node" error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1236
chore: Fix contributing doc by @peri044 in https://github.com/pytorch/TensorRT/pull/1268
feat: support scatter.value and scatter.src by @inocsin in https://github.com/pytorch/TensorRT/pull/1252
Internal workspace workflow by @narendasan in https://github.com/pytorch/TensorRT/pull/1269
Fix typo in README by @davinnovation in https://github.com/pytorch/TensorRT/pull/1273
Support swin/bert with dynamic batch by @Njuapp in https://github.com/pytorch/TensorRT/pull/1270
Update release 1.2 by @narendasan in https://github.com/pytorch/TensorRT/pull/1275
correct sha256sum of cudnn by @Njuapp in https://github.com/pytorch/TensorRT/pull/1278
Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1279
Jetson workspace by @narendasan in https://github.com/pytorch/TensorRT/pull/1280
chore(deps): bump @actions/core from 1.8.2 to 1.9.1 in /.github/actions/assigner by @dependabot in https://github.com/pytorch/TensorRT/pull/1287
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1288
chore: Fix dataloader in finetune_qat script by @andi4191 in https://github.com/pytorch/TensorRT/pull/1292
chore: Truncate long and double for ptq CPP path by @andi4191 in https://github.com/pytorch/TensorRT/pull/1291
feat: Add support for aten::square by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1286
fix: fix misleading skipping partitioning msg by @bowang007 in https://github.com/pytorch/TensorRT/pull/1289
fix: Add int support to constant_pad_nd by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1283
fix: Resolve non-determinism in registerSegmentsOutputs by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1284
docs: Update docgen task by @narendasan in https://github.com/pytorch/TensorRT/pull/1294
update fx notebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1297
[FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1299
fix(tools): Fix linter to not depend on docker by @narendasan in https://github.com/pytorch/TensorRT/pull/1301
Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1300
Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1307
Support multiple indices for aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1309
chore: Adding CMake to the CI by @narendasan in https://github.com/pytorch/TensorRT/pull/1310
feat: Upgrade Pytorch to 1.12.1 and TensorRT to 8.4.3.1 by @peri044 in https://github.com/pytorch/TensorRT/pull/1315
Fix bug: correct the output shape of aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
feat (//core/conversion) : Add converter for torch.repeat_interleave ( by @blchu in https://github.com/pytorch/TensorRT/pull/1313
chore: Adding NGC build path by @narendasan in https://github.com/pytorch/TensorRT/pull/1311
Update release by @narendasan in https://github.com/pytorch/TensorRT/pull/1320
Update lower.py by @frank-wei in https://github.com/pytorch/TensorRT/pull/1324
fix!: Fixed Windows compilation failures by @andi4191 in https://github.com/pytorch/TensorRT/pull/1330
[feat] Add support for argmax and argmin by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1312
chore: Adding a guideline to build on Windows platform by @andi4191 in https://github.com/pytorch/TensorRT/pull/1337
chore: Fix data loader issues and nox file paths by @peri044 in https://github.com/pytorch/TensorRT/pull/1281
feat(//tools/perf): Refactor perf_run.py, add fx2trt backend support, usage via CLI arguments by @peri044 in https://github.com/pytorch/TensorRT/pull/1254
refactor(//tests) : Refactor the test suite by @peri044 in https://github.com/pytorch/TensorRT/pull/1329
[feat] add support for aten::reciprocal(int) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1308
Update release branch with latest test fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1339
[FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1342
Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1343
enable direct call to fx.compile() by @frank-wei in https://github.com/pytorch/TensorRT/pull/1344
fix: add remove_exception pass from torch to fix uninitialized tensor… by @bowang007 in https://github.com/pytorch/TensorRT/pull/1345
chore: apply linting to docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1347
Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1348

New Contributors

@facebook-github-bot made their first contribution in https://github.com/pytorch/TensorRT/pull/1061
@frank-wei made their first contribution in https://github.com/pytorch/TensorRT/pull/1064
@khabinov made their first contribution in https://github.com/pytorch/TensorRT/pull/1120
@blchu made their first contribution in https://github.com/pytorch/TensorRT/pull/1029
@yinghai made their first contribution in https://github.com/pytorch/TensorRT/pull/1161
@ptrblck made their first contribution in https://github.com/pytorch/TensorRT/pull/1159
@dabauxi made their first contribution in https://github.com/pytorch/TensorRT/pull/1164
@zhiqwang made their first contribution in https://github.com/pytorch/TensorRT/pull/1171
@gcuendet made their first contribution in https://github.com/pytorch/TensorRT/pull/1058
@davinnovation made their first contribution in https://github.com/pytorch/TensorRT/pull/1217
@dependabot made their first contribution in https://github.com/pytorch/TensorRT/pull/1287

Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.2.0

v1.1.1

1 year ago

Adding support for Torch-TensorRT on Jetpack 5.0 Developer Preview

Torch-TensorRT 1.1.1 is a patch release for Torch-TensorRT 1.1 that targets PyTorch 1.11, CUDA 11.4/11.3, TensorRT 8.4 EA/8.2 and cuDNN 8.3/8.2 intended to add support for Torch-TensorRT on Jetson / Jetpack 5.0 DP. As this release is primarily targeted at adding support for Jetpack 5.0DP for the 1.1 feature set we will not be distributing pre-compiled binaries for this release so as not to break compatibility with the current stack for existing users who install directly from GitHub. Please follow the instructions for installation on Jetson in the documentation to install this release: https://pytorch.org/TensorRT/tutorials/installation.html#compiling-from-source

Known Limitations

We have observed in testing, higher than normal numerical instability on Jetpack 5.0 DP. These issues are not observed on x86_64 based platforms. This numerical instability has not been found to decrease model accuracy in our test suite.

What's Changed

feat: Upgrade TensorRT to 8.4 EA by @peri044 in https://github.com/pytorch/TensorRT/pull/1158

Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.1.1

Operators Supported

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
aten::adaptive_max_pool1d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::and.bool(bool a, bool b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__range_length(int lo, int hi, int step) -> int
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add.str(str a, str b) -> (str)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::eq.str(str a, str b) -> (bool)
aten::extend.t(t self, t[] other) -> ()
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::format(str self, ...) -> (str)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::pow.float(float a, float b) -> (float)
aten::pow.float_int(float a, int b) -> (float)
aten::pow.int(int a, int b) -> (float)
aten::pow.int_float(int a, float b) -> (float)
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

v1.1.0

2 years ago

Support for PyTorch 1.11, Various Bug Fixes, Partial `aten::Int` support, New Debugging Tools, Removing Max Batch Size

Torch-TensorRT 1.1.0 targets PyTorch 1.11, CUDA 11.3, cuDNN 8.2 and TensorRT 8.2. Due to recent JetPack upgrades, this release does not support Jetson (Jetpack 5.0DP or otherwise). Jetpack 5.0DP support will arrive in a mid-cycle release (Torch-TensorRT 1.1.x) along with support for TensorRT 8.4. 1.1.0 also drops support for Python 3.6 as it has reached end of life. Following 1.0.0, this release is focused on stabilizing and improving the core of Torch-TensorRT. Many improvements have been made to the partitioning system addressing limitation many users hit while trying to partially compile PyTorch modules. Torch-TensorRT 1.1.0 also addresses a long standing issue with aten::Int operators (albeit) partially. Now certain common patterns which use aten::Int can be handled by the compiler without resorting to partial compilation. Most notably, this means that models like BERT can be run end to end with Torch-TensorRT, resulting in significant performance gains.

New Debugging Tools

With this release we are introducing new syntax sugar that can be used to more easily debug Torch-TensorRT compilation and execution through the use of context managers. For example, in Torch-TensorRT 1.0.0 this may be a common pattern to turn on then turn off debug info:

import torch_tensorrt
...
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
trt_module = torch_tensorrt.compile(my_module, ...)
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Warning)
results = trt_module(input_tensors)

With Torch-TensorRT 1.1.0, this now can be done with the following code:

import torch_tensorrt
...
with torch_tensorrt.logging.debug():
    trt_module = torch_tensorrt.compile(my_module,...)
results = trt_module(input_tensors)

You can also use this API to debug the Torch-TensorRT runtime as well:

import torch_tensorrt
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Error)
...
trt_module = torch_tensorrt.compile(my_module,...)
with torch_tensorrt.logging.warnings():
    results = trt_module(input_tensors)

The following levels are available:


# Only internal TensorRT failures will be logged
with torch_tensorrt.logging.internal_errors():

# Internal TensorRT failures + Torch-TensorRT errors will be logged
with torch_tensorrt.logging.errors():

# All Errors plus warnings will be logged
with torch_tensorrt.logging.warnings():

# First verbosity level, information about major steps occurring during compilation and execution
with torch_tensorrt.logging.info():

# Second verbosity level, each step is logged + information about compiler state will be outputted
with torch_tensorrt.logging.debug():

# Third verbosity level, all above information + intermediate transformations of the graph during lowering
with torch_tensorrt.logging.graphs():

Removing Max Batch Size, Strict Types

In this release we are removing the max_batch_size and strict_types settings. These settings directly corresponded to the TensorRT settings, however were not always respected which often lead to confusion. Therefore we thought it best to disable these features as deterministic behavior could not be ensured.

Porting forward from `max_batch_size`, `strict_types`:

max_batch_size: The first dim in shapes provided to Torch-TensorRT are considered batch dimensions, therefore instead of setting max_batch_size, you can just use the Input objects directly
strict_types: A replacement with more deterministic behavior will come with an upcoming TensorRT release.

Dependencies

- Bazel 5.1.1
- LibTorch 1.11.0
- CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
- cuDNN 8.2.4.15
- TensorRT 8.2.4.2

1.1.0 (2022-05-10)

Bug Fixes

add at::adaptive_avg_pool1d in interpolate plugin and fix #791 (deb9f74)
Added ipywidget dependency to notebook (0b2040a)
Added test case names (296e98a)
Added truncate_long_and_double (417c096)
Adding truncate_long_and_double to ptq tests (3a0640a)
Avoid resolving non-tensor inputs to torch segment_blocks unneccessarily (3e090ee)
Considering rtol and atol in threshold comparison for floating point numbers (0b0ba8d)
Disabled mobilenet_v2 test for DLFW CI (40c611f)
fix bug that python api doesn't pass truncate_long_and_double value to internal.partition_info (828336d)
fix bugs in aten::to (2ecd187)
Fix BUILD file for tests/accuracy (8b0170e)
Fix existing uninstallation of Torch-TRT (9ddd7a8)
Fix for torch scripted module faiure with DLFW (88c02d9)
Fix fuse addmm pass (58e9ea0)
Fix pre_built name change in bazelrc (3ecee21)
fix the bug that introduces kLong Tensor in prim::NumToTensor (2c3e1d9)
Fix when TRT prunes away an output (9465e1d)
Fixed bugs and addressed review comments (588e1d1)
Fixed failures for host deps sessions (ec2232f)
Fixed typo in the path (43fab56)
Getting unsupported ops will now bypass non-schema ops avoiding redundant failures (d7d1511)
Guard test activation for CI testing (6d1a1fd)
Implement a patch for gelu schema change in older NGC containers (9ee3a04)
Missing log severity (6a4daef)
Preempt torch package override via timm in nox session (8964d1b)
refactor the resegmentation for TensorRT segments in ResolveNonTensorInput (3cc2dfb)
remove outdated member variables (0268da2)
Removed models directory dependencies (c4413e1)
Resolve issues in exception elmination pass (99cea1b)
Review comments incorporated (962660d)
Review comments incorporated (e9865c2)
support dict type for input in shape analysis (630f9c4)
truncate_long_and_double incur torchscript inference issues (c83aa15)
Typo fix for test case name (2a516b2)
Update "reduceAxes" variable in GlobalPoolingConverter function and add corresponding uTests (f6f5e3e)
//core/conversion/evaluators: Change how schemas are handled (20e5d41)
Update base container for dockerfile (1b3245a)
//core: Take user setting in the case we can't determine the (01c89d1), closes #814
Update test for new Exception syntax (2357099)
//core/conversion: Add special case for If and Loop (eacde8d)
//core/runtime: Support more delimiter variants (819c911)
//cpp/bin/torchtrtc: Fix mbs (aca175f)
//docsrc: Fix dependencies for docgen (806e663)
//notebooks: Render citrinet (12dbda1)
//py: Constrain the CUDA version in container builds (a21a045)
Use user provided dtype when we can't infer it from the graph (14650d1)

Code Refactoring

removing the strict_types and max_batch_size apis (b30cbd9)
Rename enabled precisions arugment to (10957eb)
Removing the max-batch-size argument (03bafc5)

Features

//core/conversion: Better tooling for debugging (c5c5c47)
//core/conversion/evaluators: aten::pow support (c4fdfcb)
//docker: New base container to let master build in container (446bf18)
//py: Context managers to quickly switch logging level (12e470f)
Add converter files for reflection pad 1d and 2d (406d860)
Add converter files for torch::max (f628aca)
Add converter files for torch::max (569bcde)
Add converter files for torch::max (dd7a44e)
Add converter for reflection pad 1d and 2d operation (2484a43)
Added comprehensive perf benchmark script (a8016ff)
Added compute capability for Orin (af3d0ff)
Added env var for TOP_DIR (c26180e)
Added Python accuracy tests using Nox (6ae8652)
Enable prim::DictConstruct to fallback without conversion check error (01d98c7)
Handle empty schemas for unsupported ops (bf6c929)
Implement fast approximation of Gelu as lowering pass to improve performance (8024ea2)
Implement lowering for aten::to.dtype schema (4b3ae3a)
Implement test case for aten::to.dtype lowering (bde8ee0)
Perf benchmark initial draft (f2d1655)
replace view with reshape during lowering (d39b918)
Review comment incorporated (161ef3d)
support aten::adaptive_max_pool1d, aten::adaptive_avg_pool3d and aten::adaptive_max_pool3d operators (e554dbd)
support aten::div.Tensor_mode (bb3046a)
support aten::extend evaluator (33c523d)
support aten::format evaluator (3a33d33)
Update Pytorch version to 1.11 (c009a1f)
Upgrade TensorRT to 8.2.4.2 (f1f151b)
//tests: Adding BERT to the test suite (7996a10)
aten::__range_length: Adding range length evaluator (11c4608)
aten::add: adding string concat evaluator (65dbf90)
aten::Int: Adding a new pass to remove single use (46ac757)
aten::Int: Lowers out aten::Int (908340f)
core//conversion: Implement converter for torch unbind (268a49b)

BREAKING CHANGES

This commit removes the strict types and max_batch_size apis. We are doing this because the functionality of these APIs in TRT is convoluted and likely to be ignored during building. A replacement for strict types with actual guarantees will be added at a later date.

Signed-off-by: Dheeraj Peri [email protected]

This is a minor change but may cause scripts using torchtrtc to fail. We are renaming enabled-precisions to enable-precision since it makes more sense as the argument can be repeated

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This PR removes --max-batch-size from the CLI as it has no real functional effect

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

Operators Supported

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
aten::adaptive_max_pool1d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::and.bool(bool a, bool b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__range_length(int lo, int hi, int step) -> int
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add.str(str a, str b) -> (str)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::eq.str(str a, str b) -> (bool)
aten::extend.t(t self, t[] other) -> ()
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::format(str self, ...) -> (str)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::pow.float(float a, float b) -> (float)
aten::pow.float_int(float a, int b) -> (float)
aten::pow.int(int a, int b) -> (float)
aten::pow.int_float(int a, float b) -> (float)
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

v1.0.0

2 years ago

New Name!, Support for PyTorch 1.10, CUDA 11.3, New Packaging and Distribution Options, Stabilized APIs, Stabilized Partial Compilation, Adjusted Default Behavior, Usability Improvements, New Converters, Bug Fixes

This is the first stable release of Torch-TensorRT targeting PyTorch 1.10, CUDA 11.3 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0 with backwards compatible source for TensorRT 7.1. On aarch64 TRTorch targets Jetpack 4.6 primarily with backwards compatible source for Jetpack 4.5. This version also removes deprecated APIs such as InputRange and op_precision

New Name

TRTorch is now Torch-TensorRT! TRTorch started out as a small experimental project compiling TorchScript to TensorRT almost two years ago and now as we are hitting v1.0.0 with APIs and major features stabilizing we felt that the name of the project should reflect the ecosystem of tools it is joining with this release, namely TF-TRT (https://blog.tensorflow.org/2021/01/leveraging-tensorflow-tensorrt-integration.html) and MXNet-TensorRT(https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/performance/backend/tensorrt/tensorrt). Since we were already significantly changing APIs with this release to reflect what we learned over the last two years of using TRTorch, we felt this is was the right time to change the name as well.

The overall process to port forward from TRTorch is as follows:

Python
- The library has been renamed from trtorch to torch_tensorrt
- Components that used to all live under the trtorch namespace have now been separated. IR agnostic components: torch_tensorrt.Input, torch_tensorrt.Device, torch_tensorrt.ptq, torch_tensorrt.logging will continue to live under the top level namespace. IR specific components like torch_tensorrt.ts.compile, torch_tensorrt.ts.convert_method_to_trt_engine, torch_tensorrt.ts.TensorRTCompileSpec will live in a TorchScript specific namespace. This gives us space to explore the other IRs that might be relevant to the project in the future. In the place of the old top level compile and convert_method_to_engine are new ones which will call the IR specific versions based on what is provided to them. This also means that you can now provide a raw torch.nn.Module to torch_tensorrt.compile and Torch-TensorRT will handle the TorchScripting step for you. For the most part the sole change that will be needed to change over namespaces is to exchange trtorch to torch_tensorrt
C++
- Similar to Python the namespaces in C++ have changed from trtorch to torch_tensorrt and components specific to the IR like compile, convert_method_to_trt_engine and CompileSpec are in a torchscript namespace, while agnostic components are at the top level. Namespace aliases for torch_tensorrt -> torchtrt and torchscript -> ts are included. Again the port forward process for namespaces should be a find and replace. Finally the libraries libtrtorch.so, libtrtorchrt.so and libtrtorch_plugins.so have been renamed to libtorchtrt.so, libtorchtrt_runtime.so and libtorchtrt_plugins.so respectively.
CLI:
- trtorch has been renamed to torchtrtc

New Distribution Options and Packaging

Starting with nvcr.io/nvidia/pytorch:21.11, Torch-TensorRT will be distributed as part of the container (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). The version of Torch-TensorRT in container will be the state of the master at the time of building. Torch-TensorRT will be validated to run correctly with the version of PyTorch, CUDA, cuDNN and TensorRT in the container. This will serve as the easiest way to have a full validated PyTorch end to end training to inference stack and serves as a great starting point for building DL applications.

Also as part of Torch-TensorRT we are now starting to distribute the full C++ package within the wheel files for the Python packages. By installing the wheel you now get the Python API, the C++ libraries + headers and the CLI binary. This is going to be the easiest way to install Torch-TensorRT on your stack. After installing with pip

pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases

You can add the following to your PATH to set up the CLI

PATH=$PATH:<PATH TO TORCHTRT PYTHON PACKAGE>/bin

Stabilized APIs

Python

Many of the APIs have change slighly in this release to be more self consistent and more usable. These changes begin with the Python API for which compile, convert_method_to_trt_engine and TensorRTCompileSpec now instead of dictionaries use kwargs. As features many features came out of beta and experimental stability the necessity to have multiple levels of nesting in settings has decreased, therefore kwargs make much more sense. You can simply port forward to the new APIs by unwrapping your existing compile_spec dict in the arguments to compile or similar functions.

Example:

compile_settings = {
    "inputs": [torch_tensorrt.Input(
        min_shape=[1, 3, 224, 224],
        opt_shape=[1, 3, 512, 512],
        max_shape=[1, 3, 1024, 1024],
        # For static size shape=[1, 3, 224, 224]
        dtype=torch.half, # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
    )],
    "enabled_precisions": {torch.half}, # Run with FP16
}

trt_ts_module = torch_tensorrt.compile(torch_script_module, **compile_settings)

This release also introduces support for providing tensors as examples to Torch-TensorRT. In place of a torch_tensorrt.Input in the list of inputs you can pass a Tensor. This can only be used to set a static input size. There are also some things to be aware of which will be discussed later in the release notes.

Now that Torch-TensorRT separates components specific to particular IRs to their own namespaces, there is now a replacement for the old compile and convert_method_to_trt_engine functions on the top level. These functions take any PyTorch generated format including torch.nn.Modules and decides the best way to compile it down to TensorRT. In v1.0.0 this means to go through TorchScript and return a Torch.jit.ScriptModule. You can specify the IR to try using the ir arg for these functions.

Due to partial compilation becoming stable in v1.0.0, there are now four new fields which replace the old torch_fallback struct.

old:

complie_spec = {
  "torch_fallback": {
      "enabled": True, # Turn on or turn off falling back to PyTorch if operations are not supported in TensorRT
      "force_fallback_ops": [
          "aten::max_pool2d" # List of specific ops to require running in PyTorch
      ],
      "force_fallback_modules": [
          "mypymod.mytorchmod" # List of specific torch modules to require running in PyTorch
      ],
      "min_block_size": 3 # Minimum number of ops an engine must incapsulate to be run in TensorRT
  }
}

new:

torch_tensorrt.compile(...,
    require_full_compilation=False, 
    min_block_size=3, 
    torch_executed_ops=[ "aten::max_pool2d" ], 
    torch_executed_modules=["mypymod.mytorchmod"])

C++

The changes for the C++ API other than the reorganization and renaming of the namespaces, mostly serve to make Torch-TensorRT consistent between Python and C++ namely by renaming trtorch::CompileGraph to torch_tensorrt::ts::compile and trtorch::ConvertGraphToTRTEngine to torch_tensorrt::ts::convert_method_to_trt_engine. Beyond that similar to Python, the partial compilation struct TorchFallback has been removed and replaced by four fields in torch_tensorrt::ts::CompileSpec

old:

  /**
   * @brief A struct to hold fallback info
   */
  struct TRTORCH_API TorchFallback {
    /// enable the automatic fallback feature
    bool enabled = false;

    /// minimum consecutive operation number that needs to be satisfied to convert to TensorRT
    uint64_t min_block_size = 1;

    /// A list of names of operations that will explicitly run in PyTorch
    std::vector<std::string> forced_fallback_ops;

    /// A list of names of modules that will explicitly run in PyTorch
    std::vector<std::string> forced_fallback_modules;

    /**
     * @brief Construct a default Torch Fallback object, fallback will be off
     */
    TorchFallback() = default;

    /**
     * @brief Construct from a bool
     */
    TorchFallback(bool enabled) : enabled(enabled) {}

    /**
     * @brief Constructor for setting min_block_size
     */
    TorchFallback(bool enabled, uint64_t min_size) : enabled(enabled), min_block_size(min_size) {}
  };

new:

  /**
   * Require the full module be compiled to TensorRT instead of potentially running unsupported operations in PyTorch
   */
  bool require_full_compilation = false;

  /**
   * Minimum number of contiguous supported operators to compile a subgraph to TensorRT
   */
  uint64_t min_block_size = 3;

  /**
   * List of aten operators that must be run in PyTorch. An error will be thrown if this list is not empty but
   * ``require_full_compilation`` is True
   */
  std::vector<std::string> torch_executed_ops;

  /**
   * List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but
   * ``require_full_compilation`` is True
   */
  std::vector<std::string> torch_executed_modules;

CLI

Similarly these partial compilation fields have been renamed in torchtrtc:

    --require-full-compilation        Require that the model should be fully
                                      compiled to TensorRT or throw an error
    --teo=[torch-executed-ops...],
    --torch-executed-ops=[torch-executed-ops...]
                                      (Repeatable) Operator in the graph that
                                      should always be run in PyTorch for
                                      execution (partial compilation must be
                                      enabled)
    --tem=[torch-executed-mods...],
    --torch-executed-mods=[torch-executed-mods...]
                                      (Repeatable) Module that should always
                                      be run in Pytorch for execution (partial
                                      compilation must be enabled)
    --mbs=consecutive_ops,
    --min-block-size=consecutive_ops
                                      Minimum number of contiguous TensorRT
                                      supported ops to compile a subgraph to
                                      TensorRT

Going forward breaking changes to the API the sort of magnitude seen in this release will be accompanied by a major version bump.

Stabilized Partial Compilation

Partial compilation should be considered stable for static input shape and is now enabled by default. In the case of dynamic shape, set require_full_compilation to True.

Adjusted Defaults

Input Types

Default behavior of Torch-TensorRT has shifted slightly. The most important of these changes is the changes to inferred input type. In prior versions the expected input type for a Tensor barring it being set explicitly was based on the op_precision. With that field being removed in this release and being replaced with enabled_precisions introduced in v0.4.0 this sort of behavior no longer makes sense. Therefore now Torch-TensorRT follows these rules to determine Input type for a Tensor.

If no dtype is specified for an Input, Torch-TensorRT will determine the input type by inspecting the uses of this Input. It will trace the lifetime of this tensor to the first tensor operation using weights stored in the provided module. The type of the weights is the inferred type of the Input using the rule that PyTorch requires like types for Tensor operations. The goal with this behavior is to maintain the concept that Torch-TensorRT modules should feel no different than normal PyTorch modules. Therefore you can expect

Weight Type of Model Expected Input Type For Tensor

FP32 FP32

FP16 FP16

Quantization Workflows FP32

Unknown / Ambiguous FP32 w/ Warning
Users can override this behavior to set the Input type to whatever they wish using the dtype field of torch_tensorrt.Input. Torch-TensorRT will always respect the user setting but may throw a warning stating that the model provided expects a different input type. This is mainly to notify you that just dropping the compiled module in place of the raw torch.nn.Module might throw errors and casting before inference might be necessary.
- With Torch-TensorRT v1.0.0 you can provide example torch Tensors to set the input shape. However, this not only sets the input shape but also the input dtype and tensor format as well. So if you provide a half precision 1x3x32x32 contiguous tensor the expected input would be Input(shape=(1, 3, 32, 32), dtype=dtype.half, format=TensorFormat.contiguous). This is subject to the behavior in 2.

Weight Type of Model	Expected Input Type For Tensor
FP32	FP32
FP16	FP16
Quantization Workflows	FP32
Unknown / Ambiguous	FP32 w/ Warning

Workspace Size

Now by default the workspace size is set to 1GB for all GPUs Pascal based and newer (SM capability 6 or above). Maxwell and older cards including Jetson Nano have a workspace of 256MB by default. This value is user settable.

Dependencies

- Bazel 4.2.1
- LibTorch 1.10.0
- CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.4.15
- TensorRT 8.0.3.4

1.0.0 (2021-11-09)

Bug Fixes

aten::gelu call was wrong in test (40bc4e3)
Fix a core partitioning algo bug where non-tensor input segments are not updated correctly (cc10876)
Fix modules_as_engines test case to use trt_mod instead of pyt_mod (282e98a)
Fix plugin registration macro (8afab22)
Fix python API tests for mobilenet v2 (e5a38ff)
Partial compilation translation to internal settings was incorrect (648bad3)
//py: Don't crash harshly on import when CUDA is not available (07e16fd)
Renable backtrace and make it less repetitive (1435845)
//core/lowering: Fixes module level fallback recursion (f94ae8f)
//core/partitioing: Fixing support for paritally compiling (748ecf3)
//docker: Update docker container build script to use release path (9982855)
//py: Add new dirs to remove during clean (d2cc1e9)
//py: Fix some api import issues (840ca89)
//py: Fix trtorch.Device alternate contructor options (fa08311)
//py: Fix trtorch.Device alternate contructor options (ac26841)
Update notebooks with new library name Torch-TensorRT (8274fd9)
aten::conv1d: Update namespace, fix typo in dest IR for conv1d (d53f136)
eval: Rollback 1.11a0 change + namespace issues (ba743f5)
Use scripting instead of tracing for module fallback tests (32e8b53)
Workspace defaults for other apis and centralize cuda api use (930321e)

Features

Add functionality for tests to use precompiled libraries (b5c324a)
Add QAT patch which modifies scale factor dtype to INT32 (4a10673)
Add TF32 override flag in bazelrc for CI-Testing (7a0c9a5)
Add VGG QAT sample notebook which demonstrates end-end workflow for QAT models (8bf6dd6)
Augment python package to include bin, lib, include directories (ddc0685)
handle scalar type of size [] in shape_analysis (fca53ce)
support aten::and.bool evaluator (6d73e43)
support aten::conv1d and aten::conv_transpose1d (c8dc6e9)
support aten::eq.str evaluator (5643972)
support setting input types of subgraph in fallback, handle Tensor type in evaluated_value_map branch in MarkOutputs (4778b2b)
support truncate_long_and_double in fallback subgraph input type (0bc3c05)
Update documentation with new library name Torch-TensorRT (e5f96d9)
Updating the pre_built to prebuilt (51412c7)
//:libtrtorch: Ship a WORKSPACE file and BUILD file with the (7ac6f1c)
//core/partitioning: Improved logging and code org for the (8927e77)
//cpp: Adding example tensors as a way to set input spec (70a7bb3)
//py: Add the git revision to non release builds (4a0a918)
//py: Allow example tensors from torch to set shape (01d525d)
feat!: Changing the default behavior for selecting the input type (a234335)
refactor!: Removing deprecated InputRange, op_precision and input_shapes (621bc67)
feat(//py)!: Porting forward the API to use kwargs (17e0e8a)
refactor(//py)!: Kwargs updates and support for shifting internal apis (2a0d1c8)
refactor!(//cpp): Inlining partial compilation settings since the (19ecc64)
refactor! : Update default workspace size based on platforms. (391a4c0)
feat!: Turning on partial compilation by default (52e2f05)
refactor!: API level rename (483ef59)
refactor!: Changing the C++ api to be snake case (f34e230)
refactor! : Update Pytorch version to 1.10 (cc7d0b7)
refactor!: Updating bazel version for py build container (06533fe)

BREAKING CHANGES

This removes the InputRange Class and op_precision and input shape fields which were deprecated in TRTorch v0.4.0

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This change updates the bazel version to build Torch-TensorRT to 4.2.1.

This was done since the only version of bazel available in our build container for python apis is 4.2.1

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This changes the API for compile settings from a dictionary of settings to a set of kwargs for the various compilation functions. This will break existing code. However there is simple guidance to port forward your code:

Given a dict of valid TRTorch CompileSpec settings

spec = {
	"inputs": ...
	...
}

You can use this same dict with the new APIs by changing your code from:

trtorch.compile(mod, spec)

to:

trtorch.compile(mod, **spec)

which will unpack the dictionary as arguments to the function

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit changes the APIs from a dictionary of arguements to a set of kwargs. You can port forward using

trtorch.compile(mod, **spec)

Also in preparation for partial compilation to be enabled by default settings related to torch fallback have been moved to the top level

instead of

"torch_fallback": {
  "enabled": True,
  "min_block_size" " 3,
  "forced_fallback_ops" : ["aten::add"],
  "forced_fallback_mods" : ["MySubModule"]
}

now there are new settings

require_full_compilation=False,
min_block_size=3,
torch_executed_ops=["aten::add"],
torch_executed_modules=["MySubModule"]

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit changes the API for automatic fallback to inline settings regarding partial compilation in preparation for it to be turned on by default

Now in the compile spec instead of a torch_fallback field with its associated struct, there are four new fields in the compile spec

bool require_full_compilation = true;
uint64_t min_block_size = 3;
std::vector<std::string> torch_executed_ops = {};
std::vector<std::string> torch_executed_modules = {};

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit sets the default workspace size to 1GB for GPU platforms and 256MB for Jetson Nano/TX1 platforms whose compute capability is < 6.

Signed-off-by: Dheeraj Peri [email protected]

This commit turns on partial compilation by default. Unsupported modules will attempt to be run partially in PyTorch and partially in TensorRT

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit renames the namespaces of all TRTorch/Torch-TensorRT APIs. Now torchscript specific functions are segregated into their own torch_tensorrt::torchscript / torch_tensorrt.ts namespaces. Generic utils will remain in the torch_tensorrt namespace. Guidance on how to port forward will follow in the next commits
This changes the C++ API ::ts APIs to be snake case and for CompileModules to become just compile

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit updates the pytorch version to 1.10. To use python API of torch_tensorrt, please upgrade your local pytorch to 1.10 to avoid ABI incompatibility errors. WORKSPACE and requirements files are updated accordingly

Signed-off-by: Dheeraj Peri [email protected]

This commit changes the default behavior of the compiler where if the user does not specify an input data type explicity instead of using the enabled precision, now the compiler will inspect the model provided to infer the data type for the input that will not cause an error if the model was run in torch. In practice this means

If the weights are in FP32 for the first tensor calculation then default input type is FP32
If the weights are in FP16 for the first tensor calculation then default input type is FP16
etc.

If the data type cannot be determined the compiler will default to FP32.

This calculation is done per input tensor so if one input is inferred to use FP32 and another INT32 then the expected types will be the same (FP32, INT32)

As was the same before if the user defines the data type explicitly or provides an example tensor the data type specified there will be respected

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

Operators Supported

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gelu(Tensor self) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::and.bool(bool a, bool b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::eq.str(str a, str b) -> (bool)
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

v0.4.1

2 years ago

TRTorch v0.4.1

Bug Fixes for Module Ignorelist for Partial Compilation, `trtorch.Device`, Version updates for PyTorch, TensorRT, cuDNN

Target Platform Changes

This is the first patch of TRTorch v0.4. It now targets by default PyTorch 1.9.1, TensorRT 8.0.3.4 and cuDNN 8.2.4.15 and CUDA 11.1. Older versions of PyTorch, TensorRT, cuDNN are still supported in the same manner as TRTorch v0.4.0

Module Ignorelist for Partial Compilation

There was an issue with the pass marking modules to be ignored during compilation where it unsafely assumed that methods are named forward all the way down the module tree. While this was fine for 1.8.0, with PyTorch 1.9.0, the TorchScript codegen changed slightly to sometimes use methods of other names for modules which reduce trivially to a functional api. This fix now will identify method calls as the recursion point and then use those method calls to select modules to recurse on. It will also check to verify existence of these modules and methods before recursing. Finally this pass was run by default even if the ignore list was empty causing issues for users not using the feature. Therefore this pass is now disabled unless explicitly enabled

`trtorch.Device`

Some of the constructors for trtorch.Device would not work or incorrectly configure the device. This patch will fix those issues.

Dependencies

- Bazel 4.0.0
- LibTorch 1.9.1
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.3.4
- TensorRT 8.0.3.4

0.4.1 (2021-10-06)

Bug Fixes

//core/lowering: Fixes module level fallback recursion (2fc612d)
Move some lowering passes to graph level logging (0266f41)
//py: Fix trtorch.Device alternate contructor options (ac26841)

Operators Supported

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gelu(Tensor self) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

v0.4.0

2 years ago

TRTorch v0.4.0

Support for PyTorch 1.9, TensorRT 8.0. Introducing INT8 Execution for QAT models, Module Based Partial Compilation, Auto Device Configuration, Input Class, Usability Improvements, New Converters, Bug Fixes

Target Platform Changes

This is the fourth beta release of TRTorch, targeting PyTorch 1.9, CUDA 11.1 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0 with backwards compatible source for TensorRT 7.1. On aarch64 TRTorch targets Jetpack 4.6 primarily with backwards compatibile source for Jetpack 4.5. When building on Jetson, the flag --platforms //toolchains:jetpack_4.x must be now be provided for C++ compilation to select the correct dependency paths. For python by default it is assumed the Jetpack version is 4.6. To override this add the --jetpack-version 4.5 flag when building.

TensorRT 8.0

This release adds support for compiling models trained with Quantization aware training (QAT) allowing users using the TensorRT PyTorch Quantization Toolkit (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization) to compile their models using TRTorch. For more information and a tutorial, refer to https://www.github.com/NVIDIA/TRTorch/tree/v0.4.0/examples/int8/qat. It also adds support for sparsity via the sparse_weights flag in the compile spec. This allows TensorRT to utilize specialized hardware in Ampere GPUs to minimize unnecessary computation and therefore increase computational efficiency.

Partial Compilation

In v0.4.0 the partial compilation feature of TRTorch can now be considered beta level stability. New in this release is the ability to specify entire PyTorch modules to run in PyTorch explicitly as part of partial compilation. This should let users isolate troublesome code easily when compiling. Again, feedback on this feature is greatly appreciated.

Automatic Device Configuration at Runtime

v0.4.0 also changes the "ABI" of TRTorch to now include information about the target device for the program. Programs compiled with v0.4.0 will look for and select the most compatible available device. The rules used are: Any valid device option must have the same SM capability as the device building the engine. From there, TRTorch prefers the same device (e.g. Built on A100 so A100 is better than A30) and finally prefers the same device ID. Users will be warned if this selected device is not the current active device in the course of execution as overhead may be incurred in transferring input tensors from the current device to the target device. Users can then modify their code to avoid this. Due to this ABI change, existing compiled TRTorch programs are incompatible with the TRTorch v0.4.0 runtime. From v0.4.0 onwards an internal ABI version will check program compatibility. This ABI version is only incremented with breaking changes to the ABI.

API Changes (Input, enabled_precisions, Device)

TRTorch v0.4.0 changes the API for specifying Input shapes and data types to provide users more control over configuration. The new API makes use of the class trtorch.Input which lets users set the shape (or shape range) as well as memory layout and expected data type. These input specs are set in the input field of the CompileSpec.

"inputs": [
        trtorch.Input((1, 3, 224, 224)), # Static input shape for input #1
        trtorch.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channel_last,
        ) # Dynamic input shape for input #2, input type int and channel last format
    ],

The legacy input_shapes field and associated usage with lists of tuples/InputRanges should now be considered deprecated. They remain usable in v0.4.0 but will be removed in the next release. Similarly, the compile spec field op_precision is now also deprecated in favor of enabled_precisions. enabled_precisions is a set containing the data types that kernels will be allowed to use. Whereas setting op_precision = torch.int8 would implicitly enable FP32 and FP16 kernels as well, now enabled_precisions should be set as {torch.float32, torch.float16, torch.int8} to do the same. In order to maintain similar behavior to normal PyTorch, if FP16 is the lowest precision enabled but no explicit data type is set for the inputs to the model, the expectation will be that inputs will be in FP16 . For other cases (FP32, INT8) FP32 is the default, similar to PyTorch and previous versions of TRTorch. Finally in the Python API, a class trtorch.Device has been added. While users can continue to use torch.Device or other torch APIs, trtorch.Device allows for better control for the specific use cases of compiling with TRTorch (e.g. setting DLA core and GPU fallback). This class is very similar to the C++ version with a couple additions of syntactic sugar to make the class easier and more familiar to use:

trtorch.Device("dla:0", allow_gpu_fallback=False) #Set device as DLA Core 0 (implicitly sets the GPU managing DLA cores as the GPU and sets fallback to false)

trtorch.Device can be used instead of a dictionary in the compile spec if desired.

trtorchc has been updated to reflect these API changes. Users can set the shape, dtype and format of inputs from the command line using the following format "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]@DTYPE%FORMAT" e.g. (3, 3, 32,32)@f16%NHWC. -p is now a repeatable flag to enable multiple precisions. Also added are repeatable flags --ffm and --ffo to mark specific modules and operators for running in PyTorch respectively. To use these two options, --allow-torch-fallback should be set. Options for embedding serialized engines (--embed-engine) and sparsity (--sparse-weights) added as well.

Usability

Finally, TRTorch v0.4.0 also now includes the ability to provide backtraces for locations in your model which TRTorch does not support. This can help in identifying locations in the model that might need to change for TRTorch support or modules which should run fully in PyTorch via partial compilation.

Dependencies

- Bazel 4.0.0
- LibTorch 1.9.0
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.2.3
- TensorRT 8.0.1.6

0.4.0 (2021-08-24)

feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, (9327cce)
feat(//py)!: Implementing top level python api changes to reflect new (482265f)
feat(//cpp)!: Changes to TRTorch C++ api reflecting Input and (08b4942)
feat!: Pytorch 1.9 version bump (a12d249)
feat(//core/runtime)!: Better and more portable names for engines (6eb3bb2)

Bug Fixes

//core/conversion/conversionctx: Guard final engine building (dfa9ae8)
//core/lowering: use lower_info as parameter (370aeb9)
//cpp/ptq: fixing bad accuracy in just the example code (7efa11d)
//py: Fix python setup.py with new libtrtorch.so location (68ba63c)
//tests: fix optional jetson tests (4c32a83)
//tests: use right type for masked_fill test (4a5c28f)
aten::cat: support neg dim for cat (d8ca182)
aten::select and aten::var: Fix converters to handle negative axes (3a734a2)
aten::slice: Allow slicing of pytorch tensors (50f012e)
aten::tensor: Last dim doesnt always get written right (b68d4aa)
aten::tensor: Last dim doesnt always get written right (38744bc)
Address review comments, fix failing tests due to bool mishandling (13eef91)
Final working version of QAT in TRTorch (521a0cb)
fix aten::sub.scalar operator (9a09514)
Fix linear lowering pass, lift layer_norm scale layer restriction and matmul layer nbdims restriction (930d582)
Fix testcases using old InputRange API (ff87956)
Fix TRT8 engine capability flags (2b69742)
Fix warnings thrown by noexcept functions (c5f7eea)
Fix warnings thrown by noexcept functions (ddc8950)
Minor fixes to qat scripts (b244423)
Restrict TRTorch to compile only forward methods (9f006d5)
Transfer calibration data to gpu when it is not a batch (23739cb)
typo in aten::batch_norm (d47f48f)
qat: Rescale input data for C++ application (9dc6061)
Use len() to get size of dataset (ccc60d5)
device_conf: Devices never actually got swithed in multi device (f1d0a43)
exception_elimination: Exception branches are no longer consistent (d61b667)
to_backend: Clean up to_backend implementation (4e15605)
trtorchc: Allow for workspaces larger than 2G and better debugging (e1e7812)
Using TensorRT 8 new API calls (14691e7)
Using TensorRT 8 new API calls (fa969a5)

Features

//core/conversion: Adding error prefix to python source traceback (4bf2a41)
//core/conversion: Handle adding and wrapping ITensors as (a22e99b)
//core/ir: Implementing new internal input spec type (316df28)
//core/lowering: Adding two passes, one to delimit and one to mark (2e04ce5)
//core/lowering: additional logging in module fallback (ad07645)
//core/plugins: Add adaptive_max_pool2d plugin, enable the plugins to run on GPU (6f4aa40)
//cpp/int8/qat: QAT application release (d8f5d29)
//examples/int8: Implement Makefile based execution for ptq and qat (b7f6d8a)
//examples/int8/qat: Install pytorch-quantization with (1ca1484)
//py: add user level device class in py for embed engine (d99169f)
aten::masked_fill: In progress implementation of masked_fill (fa7d6d9)
aten::ones: Adding support for aten::ones (2b45a3d)
aten::slice: Patching slice for new optional params (a11287f)
aten::sqrt: Adding support for sqrt evaluators (6aaba3b)
aten::std|aten::masked_fill: Implement masked_fill, aten::std (a086a5b)
aten::std|aten::masked_fill: Implement masked_fill, aten::std (2866627)
jetson: Support for Jetpack 4.6 (9760fe3)
to_backend: Updating backend integration preproc function (080b594)
Enable sparsity support in TRTorch (f9e1f2b)
trtorchc: Adding flag for sparse weights (bfdc6f5)
Add aten::full converter, quantization ops testcases (9f2ffd0)
Add aten::type_as lowering pass (b57a6dd)
Add functionality for QAT workflow (fc8eafb)
Add functionality for QAT workflow (f776e76)
Add support for providing input datatypes in TRTorch (a3f4a3c)
Adding automatic casting to compare layers (90af26e)
Enable sparsity support in TRTorch (decd0ed)
Enable TRT 8.0 QAT functionality in TRTorch (c76a28a)
Makefile for trtorchrt.so example (c60c521)
show pytorch code of unsupported operators (2ee2a84)
support aten::Int (5bc977d)
trtorchc: Adding more dtype aliases (652fb13)
trtorchc: Adding new support for dtypes and formats in (c39bf81)
Support fallback options in trtorchc (ad966b7)
Using shared_ptrs to manage TRT resources in runtime (e336630)
trtorchc: Embedding engines in modules from the CLI (2b4b9e3)

BREAKING CHANGES

This commit cleans up the WIP CudaDevice class, simplifying implementation and formalizing the seralized format for CUDA devices.

It also implements ABI Versioning. The first entry in the serialized format of a TRTEngine now records the ABI that the engine was compiled with, defining expected compatibility with the TRTorch runtime. If the ABI version does not match, the runtime will error out asking to recompile the program.

ABI version is a monotonically increasing integer and should be incremented everytime the serialization format changes in some way.

This commit cleans up the CudaDevice class, implementing a number of constructors to replace the various utility functions that populate the struct. Descriptive utility functions remain but solely call the relevant constructor.

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This commit introduces the next iteration of the Python TRTorch API. Starting in TRTorch v0.5.0 support for the "input_shapes" and "op_precision" compile spec keys will be removed. Users should port forward to using the "inputs" key which expects a list of trtorch.Input objects and the "enabled_precisions" key which expects a set of data type specifying enums.

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This change deprecates InputRange, and the CompileSpec fields "input_shapes", "op_precision" and associated contructors and functions. These are replaced wtih Input, "inputs" and "enabled_precisions" respectively. Deprecated components will be removed in TRTorch v0.5.0

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

Updating PyTorch version to 1.9.0 which includes breaking changes to the to_backend api

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

This bumps the TRTorch ABI version to 3 due to a new field for engine name included in the serialized form of TRTEngine. This lets deserialized engines have the same name they serialized with

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]>

Supported Operators in TRTorch v0.4.0

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gelu(Tensor self) -> (Tensor)
aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::t(Tensor self) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::Int.Scalar(Scalar a) -> int
aten::Int.bool(bool a) -> int
aten::Int.float(float a) -> int
aten::Int.int(int a) -> int
aten::and(int a, int b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::floor.float(float a) -> (int)
aten::floor.int(int a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::is_floating_point(Tensor self) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sqrt.float(float a) -> (float)
aten::sqrt.int(int a) -> (float)
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
prim::dtype(Tensor a) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

v0.3.0

3 years ago

TRTorch v0.3.0

Support for PyTorch 1.8.x (by default 1.8.1), Introducing Plugin Library, PTQ from Python, Arbitrary TRT engine embedding, Preview Release of Partial Compilation, New Converters, Bug Fixes

This is the third beta release of TRTorch, targeting PyTorch 1.8.x, CUDA 11.1 (on x86_64), TensorRT 7.2, cuDNN 8. TRTorch 0.3.0 binary releases target PyTorch 1.8.1 specifically, these builds are not compatible with 1.8.0, though the source code remains compatible with any PyTorch 1.8.x version. On aarch64 TRTorch targets JetPack 4.5.x. This release introduces libtrtorch_plugins.so. This library is a portable distribution of all TensorRT plugins used in TRTorch. The intended usecase is to support TRTorch programs that utilize TensorRT plugins deployed on systems with only the runtime library available or in the case that TRTorch was used to create a TensorRT engine to be run outside the TRTorch runtime, which makes uses of TRTorch plugins. An example on how to use this library can be found here: https://www.github.com/NVIDIA/TRTorch/tree/v0.3.0/examples/sample_rt_app. TRTorch 0.3.0 also now allows users to repurpose PyTorch Dataloaders to do post training quantization in Python similar to the workflow supported in C++ currently. It also introduces a new API to wrap arbitrary TensorRT engines in a PyTorch Module wrapper, making the serializable by torch.jit.save and completely compatible with other PyTorch modules. Finally, TRTorch 0.3.0 also includes a preview of the new partial compilation capability of the TRTorch compiler. With this feature, users can now instruct TRTorch to keep operations that are not supported but TRTorch/TensorRT in PyTorch. Partial compilation should be considered alpha stability and we are seeking feedback on bugs, pain points and feature requests surrounding using this feature.

Dependencies:

- Bazel 4.0.0
- LibTorch 1.8.1 (on x86_64), 1.8.0 (on aarch64)
- CUDA 11.1 (on x86_64, by default , newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.1.1
- TensorRT 7.2.3.4

0.3.0 (2021-05-13)

Bug Fixes

//plugins: Readding cuBLAS BUILD to allow linking of libnvinfer_plugin on Jetson (a8008f4)
//tests/../concat: Concat test fix (2432fb8)
//tests/core/partitioning: Fixing some issues with the partition (ff89059)
erase the repetitive nodes in dependency analysis (80b1038)
fix a typo for debug (c823ebd)
fix typo bug (e491bb5)
aten::linear: Fixes new issues in 1.8 that cause script based (c5057f8)
register the torch_fallback attribute in Python API (8b7919f)
support expand/repeat with IValue type input (a4882c6)
support shape inference for add_, support non-tensor arguments for segmented graphs (46950bb)
feat!: Updating versions of CUDA, cuDNN, TensorRT and PyTorch (71c4dcb)
feat(WORKSPACE)!: Updating PyTorch version to 1.8.1 (c9aa99a)

Features

//.github: Linter throws 1 when there needs to be style changes to (a39dea7)
//core: New API to register arbitrary TRT engines in TorchScript (3ec836e)
//core/conversion/conversionctx: Adding logging for truncated (96245ee)
//core/partitioing: Adding ostream for Partition Info (b3589c5)
//core/partitioning: Add an ostream implementation for (ee536b6)
//core/partitioning: Refactor top level partitioning API, fix a bug with (abc63f6)
//core/plugins: Gating plugin logging based on global config (1d5a088)
added user level API for fallback (f4c29b4)
allow users to set fallback block size and ops (6d3064a)
insert nodes by dependencies for nonTensor inputs/outputs (4e32eff)
support aten::arange converter (014e381)
support aten::transpose with negative dim (4a1d2f3)
support Int/Bool and other constants' inputs/outputs for TensorRT segments (54e407e)
support prim::Param for fallback inputs (ec2bbf2)
support prim::Param for input type after refactor (3cebe97)
support Python APIs for Automatic Fallback (100b090)
support the case when the injected node is not supported in dependency analysis (c67d8f6)
support truncate long/double to int/float with option (740eb54)
Try to submit review before exit (9a9d7f0)
update truncate long/double python api (69e49e8)
//docker: Adding Docker 21.03 (9b326e8)
update truncate long/double warning message (60dba12)
//docker: Update CI container (df63467)
//py: Allowing people using the PyTorch backend to use TRTorch/TRT (6c3e0ad)
//py: Catch when bazel is not in path and error out when running (1da999d)
//py: Gate partial compilation from to_backend API (bf1b2d8)
//py: New API to embed engine in new module (88d07a9)
aten::floor: Adds floor.int evaluator (a6a46e5)

BREAKING CHANGES

PyTorch version has been bumped to 1.8.0 Default CUDA version is CUDA 11.1 TensorRT version is TensorRT 7.2.3.4 cuDNN version is now cuDNN 8.1

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

Due to issues with compatability between PyTorch 1.8.0 and 1.8.1 in the Torch Python API, TRTorch 0.3.0 compiled for 1.8.0 does not work with PyTorch 1.8.1 and will show an error about use_input_stats. If you see this error make sure the version of libtorch you are compiling with is PyTorch 1.8.1

TRTorch 0.3.0 will target PyTorch 1.8.1. There is no backwards compatability with 1.8.0. If you need this specific version compile from source with the dependencies in WORKSPACE changed

Signed-off-by: Naren Dasan [email protected] Signed-off-by: Naren Dasan [email protected]

Supported Operators in TRTorch v0.3.0

Operators Currently Supported Through Converters

aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
aten::abs(Tensor self) -> (Tensor)
aten::acos(Tensor self) -> (Tensor)
aten::acosh(Tensor self) -> (Tensor)
aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::asin(Tensor self) -> (Tensor)
aten::asinh(Tensor self) -> (Tensor)
aten::atan(Tensor self) -> (Tensor)
aten::atanh(Tensor self) -> (Tensor)
aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
aten::ceil(Tensor self) -> (Tensor)
aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
aten::cos(Tensor self) -> (Tensor)
aten::cosh(Tensor self) -> (Tensor)
aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::erf(Tensor self) -> (Tensor)
aten::exp(Tensor self) -> (Tensor)
aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
aten::floor(Tensor self) -> (Tensor)
aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
aten::log(Tensor self) -> (Tensor)
aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::matmul(Tensor self, Tensor other) -> (Tensor)
aten::max(Tensor self) -> (Tensor)
aten::max.other(Tensor self, Tensor other) -> (Tensor)
aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::min(Tensor self) -> (Tensor)
aten::min.other(Tensor self, Tensor other) -> (Tensor)
aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
aten::neg(Tensor self) -> (Tensor)
aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
aten::prelu(Tensor self, Tensor weight) -> (Tensor)
aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::reciprocal(Tensor self) -> (Tensor)
aten::relu(Tensor input) -> (Tensor)
aten::relu_(Tensor(a!) self) -> (Tensor(a!))
aten::repeat(Tensor self, int[] repeats) -> (Tensor)
aten::reshape(Tensor self, int[] shape) -> (Tensor)
aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
aten::sigmoid(Tensor input) -> (Tensor)
aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
aten::sin(Tensor self) -> (Tensor)
aten::sinh(Tensor self) -> (Tensor)
aten::slice.Tensor(Tensor(a) self, int dim=0, int start=0, int end=9223372036854775807, int step=1) -> (Tensor(a))
aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
aten::sqrt(Tensor self) -> (Tensor)
aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
aten::tan(Tensor self) -> (Tensor)
aten::tanh(Tensor input) -> (Tensor)
aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

aten::Bool.float(float b) -> (bool)
aten::Bool.int(int a) -> (bool)
aten::Float.Scalar(Scalar a) -> float
aten::Float.bool(bool a) -> float
aten::Float.int(int a) -> float
aten::and(int a, int b) -> (bool)
aten::getitem.t(t list, int idx) -> (t(*))
aten::is(t1 self, t2 obj) -> bool
aten::isnot(t1 self, t2 obj) -> bool
aten::not(bool self) -> bool
aten::or(int a, int b) -> (bool)
aten::__round_to_zero_floordiv(int a, int b) -> (int)
aten::xor(int a, int b) -> (bool)
aten::add.float(float a, float b) -> (float)
aten::add.int(int a, int b) -> (int)
aten::add_.t(t self, t[] b) -> (t[])
aten::append.t(t self, t(c -> *) el) -> (t)
aten::dim(Tensor self) -> int
aten::div.float(float a, float b) -> (float)
aten::div.int(int a, int b) -> (float)
aten::eq.bool(bool a, bool b) -> (bool)
aten::eq.float(float a, float b) -> (bool)
aten::eq.float_int(float a, int b) -> (bool)
aten::eq.int(int a, int b) -> (bool)
aten::eq.int_float(int a, float b) -> (bool)
aten::floor.float(float a) -> (int)
aten::floordiv.float(float a, float b) -> (int)
aten::floordiv.int(int a, int b) -> (int)
aten::ge.bool(bool a, bool b) -> (bool)
aten::ge.float(float a, float b) -> (bool)
aten::ge.float_int(float a, int b) -> (bool)
aten::ge.int(int a, int b) -> (bool)
aten::ge.int_float(int a, float b) -> (bool)
aten::gt.bool(bool a, bool b) -> (bool)
aten::gt.float(float a, float b) -> (bool)
aten::gt.float_int(float a, int b) -> (bool)
aten::gt.int(int a, int b) -> (bool)
aten::gt.int_float(int a, float b) -> (bool)
aten::le.bool(bool a, bool b) -> (bool)
aten::le.float(float a, float b) -> (bool)
aten::le.float_int(float a, int b) -> (bool)
aten::le.int(int a, int b) -> (bool)
aten::le.int_float(int a, float b) -> (bool)
aten::len.t(t[] a) -> (int)
aten::lt.bool(bool a, bool b) -> (bool)
aten::lt.float(float a, float b) -> (bool)
aten::lt.float_int(float a, int b) -> (bool)
aten::lt.int(int a, int b) -> (bool)
aten::lt.int_float(int a, float b) -> (bool)
aten::mul.float(float a, float b) -> (float)
aten::mul.int(int a, int b) -> (int)
aten::ne.bool(bool a, bool b) -> (bool)
aten::ne.float(float a, float b) -> (bool)
aten::ne.float_int(float a, int b) -> (bool)
aten::ne.int(int a, int b) -> (bool)
aten::ne.int_float(int a, float b) -> (bool)
aten::neg.int(int a) -> (int)
aten::numel(Tensor self) -> int
aten::size(Tensor self) -> (int[])
aten::size.int(Tensor self, int dim) -> (int)
aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
aten::sub.float(float a, float b) -> (float)
aten::sub.int(int a, int b) -> (int)
prim::max.bool(bool a, bool b) -> (bool)
prim::max.float(float a, float b) -> (bool)
prim::max.float_int(float a, int b) -> (bool)
prim::max.int(int a, int b) -> (bool)
prim::max.int_float(int a, float b) -> (bool)
prim::max.self_int(int[] self) -> (int)
prim::min.bool(bool a, bool b) -> (bool)
prim::min.float(float a, float b) -> (bool)
prim::min.float_int(float a, int b) -> (bool)
prim::min.int(int a, int b) -> (bool)
prim::min.int_float(int a, float b) -> (bool)
prim::min.self_int(int[] self) -> (int)
prim::shape(Tensor a) -> (int[])

TRTorch Versions Save

v2.2.0

Dynamo Frontend for Torch-TensorRT, PyTorch 2.2, CUDA 12.1, TensorRT 8.6

Output Format

Multi-GPU Safety

Capability Validators

Breaking Changes

What's Changed

New Contributors

v1.4.0

PyTorch 2.0, CUDA 11.8, TensorRT 8.6, Support for the new torch.compile API, compatibility mode for FX frontend

torch.compile` Backend for Torch-TensorRT

fx_ts_compat Frontend

What's Changed

New Contributors

v1.3.0

PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends

Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend

Engine Profiling [Experimental]

Unified Runtime for FX and TorchScript Frontends [Experimental]

Basic Usage

TRTModuleNext

Examples

Using TRTModuleNext as an arbirary TensorRT engine holder

What's Changed

New Contributors

v1.2.0

PyTorch 1.12, Collections based I/O, FX Frontend, torchtrtc custom op support, CMake build system and Community Window Support

New Dependencies

nvidia-tensorrt

torchvision

Jetson

Collections based I/O [Experimental]

Adding FX frontend to Torch-TensorRT [Beta]

Custom operators and converters in Torch-TensorRT

Community CMake and Windows support

Known Limitations

Dependencies

Operators Supported (TorchScript)

Operators Currently Supported Through Converters

Operators Currently Supported Through Evaluators

What's Changed

New Contributors

v1.1.1

Adding support for Torch-TensorRT on Jetpack 5.0 Developer Preview

Known Limitations

What's Changed

Operators Supported

Operators Currently Supported Through Converters

Operators Currently Supported Through Evaluators

v1.1.0

Support for PyTorch 1.11, Various Bug Fixes, Partial aten::Int support, New Debugging Tools, Removing Max Batch Size

New Debugging Tools

Removing Max Batch Size, Strict Types

Porting forward from max_batch_size, strict_types:

Dependencies

1.1.0 (2022-05-10)

Bug Fixes

Code Refactoring

Features

BREAKING CHANGES

Operators Supported

Operators Currently Supported Through Converters

Operators Currently Supported Through Evaluators

v1.0.0

New Name!, Support for PyTorch 1.10, CUDA 11.3, New Packaging and Distribution Options, Stabilized APIs, Stabilized Partial Compilation, Adjusted Default Behavior, Usability Improvements, New Converters, Bug Fixes

New Name

Python

C++

CLI:

New Distribution Options and Packaging

Stabilized APIs

Python

Example:

C++

CLI

Stabilized Partial Compilation

Adjusted Defaults

Input Types

Workspace Size

PyTorch 2.0, CUDA 11.8, TensorRT 8.6, Support for the new `torch.compile` API, compatibility mode for FX frontend

`fx_ts_compat` Frontend

`nvidia-tensorrt`

`torchvision`

Support for PyTorch 1.11, Various Bug Fixes, Partial `aten::Int` support, New Debugging Tools, Removing Max Batch Size

Porting forward from `max_batch_size`, `strict_types`:

Bug Fixes for Module Ignorelist for Partial Compilation, `trtorch.Device`, Version updates for PyTorch, TensorRT, cuDNN

`trtorch.Device`