PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Torch-TensorRT 2.2.0 targets PyTorch 2.2, CUDA 12.1 (builds for CUDA 11.8 are available via the PyTorch package index - https://download.pytorch.org/whl/cu118) and TensorRT 8.6. This is the second major release of Torch-TensorRT: the default frontend has changed from TorchScript to Dynamo, allowing users to more easily control and customize the compiler in Python.
The Dynamo frontend supports both JIT workflows through torch.compile and AOT workflows through torch.export + torch_tensorrt.compile. It targets the Core ATen Opset (https://pytorch.org/docs/stable/torch.compiler_ir.html#core-aten-ir) and currently has 82% coverage. Just like in TorchScript, graphs will be partitioned based on the ability to map operators to TensorRT, in addition to any graph surgery done in Dynamo.
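As a rough, hedged sketch of the two workflows (the module name, input shape and the registered backend name "torch_tensorrt" are assumptions to check against the documentation, not verbatim from this release note):
import torch
import torch_tensorrt

model = MyModel().eval().cuda()               # hypothetical torch.nn.Module
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# JIT workflow: TensorRT engines are built lazily on the first call
jit_trt_model = torch.compile(model, backend="torch_tensorrt")
jit_trt_model(*inputs)

# AOT workflow: export first, then compile the exported program
exported_program = torch.export.export(model, tuple(inputs))
aot_trt_model = torch_tensorrt.dynamo.compile(exported_program, inputs=inputs)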
Through the Dynamo frontend, different output formats can be selected for AOT workflows via the output_format kwarg. The choices are torchscript, where the resulting compiled module will be traced with torch.jit.trace (suitable for Python-less deployments); exported_program, a new serializable format for PyTorch models; and finally graph_module, which returns a torch.fx.GraphModule if you would like to run further graph transformations on the resultant model.
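For illustration, a hedged sketch of selecting an output format on the AOT path (the module and input shape are placeholders; verify the exact kwargs against the 2.2 documentation):
import torch
import torch_tensorrt

model = MyModel().eval().cuda()                       # hypothetical module
inputs = [torch_tensorrt.Input((1, 3, 224, 224))]

# Request a TorchScript module back from the Dynamo AOT path
trt_ts = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs, output_format="torchscript")
torch.jit.save(trt_ts, "trt_model.ts")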
To address a long-standing source of overhead, single-GPU systems will now operate without the typically required device checks. This check can be re-added when multiple GPUs are available to the host process using torch_tensorrt.runtime.set_multi_device_safe_mode:
# Enables Multi Device Safe Mode
torch_tensorrt.runtime.set_multi_device_safe_mode(True)
# Disables Multi Device Safe Mode [Default Behavior]
torch_tensorrt.runtime.set_multi_device_safe_mode(False)
# Enables Multi Device Safe Mode, then resets the safe mode to its prior setting
with torch_tensorrt.runtime.set_multi_device_safe_mode(True):
    ...
More information can be found here: https://pytorch.org/TensorRT/user_guide/runtime.html
In the Dynamo frontend, tests can be written and associated with converters to dynamically enable or disable them based on conditions in the target graph.
For example, the convolution converter in Dynamo only supports 1D, 2D, and 3D convolution. We can therefore create a lambda which, given a convolution FX node, determines whether the convolution is supported:
@dynamo_tensorrt_converter(
torch.ops.aten.convolution.default,
capability_validator=lambda conv_node: conv_node.args[7] in ([0], [0, 0], [0, 0, 0])
) # type: ignore[misc]
def aten_ops_convolution(
ctx: ConversionContext,
target: Target,
args: Tuple[Argument, ...],
kwargs: Dict[str, Argument],
name: str,
) -> Union[TRTTensor, Sequence[TRTTensor]]:
    ...  # converter implementation
In such a case, where the Node is not supported, the node will be partitioned out and run in PyTorch.
All capability validators are run prior to partitioning, after the lowering phase.
More information on writing converters for the Dynamo frontend can be found here: https://pytorch.org/TensorRT/contributors/dynamo_converters.html
torch.nn.Modules or torch.fx.GraphModules provided to torch_tensorrt.compile will by default be exported using torch.export and then compiled. This default can be overridden by setting the ir=[torchscript|fx] kwarg. Any reported bugs will first be addressed in the Dynamo stack before other frontends are attempted; however, pull requests from the community for additional functionality in the TorchScript and FX frontends will still be accepted.
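As a small, hedged example of overriding the default frontend via the ir kwarg (the module and shape are placeholders):
# Force the legacy TorchScript frontend instead of the default Dynamo path
trt_ts_module = torch_tensorrt.compile(
    model,                                             # hypothetical torch.nn.Module
    ir="torchscript",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
)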
main by @gs-olive in https://github.com/pytorch/TensorRT/pull/1784
input_signature) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1656
TRTEngine.to_str() method by @gs-olive in https://github.com/pytorch/TensorRT/pull/1846
aten.mean.default and aten.mean.dim converters by @gs-olive in https://github.com/pytorch/TensorRT/pull/1810
torch._dynamo import in __init__ by @gs-olive in https://github.com/pytorch/TensorRT/pull/1881
acc_ops convolution layers in FX by @gs-olive in https://github.com/pytorch/TensorRT/pull/1886
main to TRT 8.6, CUDA 11.8, CuDNN 8.8, Torch Dev by @gs-olive in https://github.com/pytorch/TensorRT/pull/1852
torch.compile path by @gs-olive in https://github.com/pytorch/TensorRT/pull/1879
aten.cat by @gs-olive in https://github.com/pytorch/TensorRT/pull/1863
.numpy() issue on fake tensors by @gs-olive in https://github.com/pytorch/TensorRT/pull/1949
aten::Int.Tensor uses by @gs-olive in https://github.com/pytorch/TensorRT/pull/1937
aten.addmm by @gs-olive in https://github.com/pytorch/TensorRT/pull/1953
convert_method_to_trt_engine calls by @gs-olive in https://github.com/pytorch/TensorRT/pull/1945
TRTInterpreter impl in Dynamo compile [1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2002
options kwargs for Torch compile [3 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2005
TRTInterpreter [2 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2004
2.1.0.dev20230605 [4 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1975
impl + add feature (FX converter refactor) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1972
TorchTensorRTModule in Dynamo [1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/2003
truncate_long_and_double in Dynamo [8 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1983
aten PRs to Dynamo converter registry by @gs-olive in https://github.com/pytorch/TensorRT/pull/2070
torch_tensorrt.dynamo.compile path [1.1 / x] by @gs-olive in https://github.com/pytorch/TensorRT/pull/1966
main by @gs-olive in https://github.com/pytorch/TensorRT/pull/2129
pyyaml import to GHA Docker job by @gs-olive in https://github.com/pytorch/TensorRT/pull/2170
aten.embedding to reflect schema by @gs-olive in https://github.com/pytorch/TensorRT/pull/2182
_to_copy, operator.get and clone ATen converters by @gs-olive in https://github.com/pytorch/TensorRT/pull/2161
aten.where by @gs-olive in https://github.com/pytorch/TensorRT/pull/2228
dynamic=False in torch.compile call by @gs-olive in https://github.com/pytorch/TensorRT/pull/2240
aten.expand by @gs-olive in https://github.com/pytorch/TensorRT/pull/2234
pip installation by @gs-olive in https://github.com/pytorch/TensorRT/pull/2239
require_full_compilation in Dynamo by @gs-olive in https://github.com/pytorch/TensorRT/pull/2138
clone and to_copy where input of graph is output by @gs-olive in https://github.com/pytorch/TensorRT/pull/2265
get_ir prefixes by @gs-olive in https://github.com/pytorch/TensorRT/pull/2369
aten.where with Numpy + Broadcast by @gs-olive in https://github.com/pytorch/TensorRT/pull/2372
release/2.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2387
release/2.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2414
release to Torch 2.1.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2472
release/2.1 CI Repair by @gs-olive in https://github.com/pytorch/TensorRT/pull/2528
main by @gs-olive in https://github.com/pytorch/TensorRT/pull/2574
release/2.2 by @gs-olive in https://github.com/pytorch/TensorRT/pull/2628
compile (#2635) by @gs-olive in https://github.com/pytorch/TensorRT/pull/2638
Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.4.0...v2.2.0
torch.compile API, compatibility mode for FX frontend
Torch-TensorRT 1.4.0 targets PyTorch 2.0, CUDA 11.8, TensorRT 8.5. This release introduces a number of beta features to set the stage for working with PyTorch and TensorRT in the 2.0 ecosystem. Primarily, this includes a new torch.compile backend targeting Torch-TensorRT. It also adds a compatibility layer that allows users of the TorchScript frontend for Torch-TensorRT to seamlessly try FX and Dynamo.
One of the most prominent new features in PyTorch 2.0 is the torch.compile workflow, which enables users to accelerate code easily by specifying a backend of their choice. Torch-TensorRT 1.4.0 introduces a new backend for torch.compile as a beta feature, including a convenience frontend to perform accelerated inference. This frontend can be accessed in one of two ways:
import torch_tensorrt
torch_tensorrt.dynamo.compile(model, inputs, ...)
##### OR #####
torch_tensorrt.compile(model, ir="dynamo_compile", inputs=inputs, ...)
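The backend can also be used through torch.compile directly; this is a hedged sketch, and the registered backend name ("torch_tensorrt") and options dict are assumptions to verify against the 1.4.0 documentation:
import torch
import torch_tensorrt  # importing registers the backend

model = MyModel().eval().cuda()               # hypothetical module
inputs = [torch.randn(1, 3, 224, 224).cuda()]

optimized_model = torch.compile(model, backend="torch_tensorrt",
                                options={"enabled_precisions": {torch.half}})
optimized_model(*inputs)  # engines are built on the first call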
For more examples, see the provided sample scripts, which can be found here. This compilation method has a couple of key considerations:
aten library of converters to accelerate models
fx_ts_compat Frontend
As the ecosystem transitions from TorchScript to Dynamo, users of Torch-TensorRT may want to start experimenting with this stack. As such, we have introduced a new frontend for Torch-TensorRT which exposes the same APIs as the TorchScript frontend but uses the FX/Dynamo compiler stack. You can try this frontend by using the ir="fx_ts_compat" setting:
torch_tensorrt.compile(..., ir="fx_ts_compat")
aten::where with differing-shape inputs bugfix by @gs-olive in https://github.com/pytorch/TensorRT/pull/1533
align_corners=False - FX interpolate by @gs-olive in https://github.com/pytorch/TensorRT/pull/1561
aten::full_like evaluator by @gs-olive in https://github.com/pytorch/TensorRT/pull/1584
RemoveDropout lowering pass implementation with modified JIT pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1589
aten::select by @gs-olive in https://github.com/pytorch/TensorRT/pull/1623
input_signature by @gs-olive in https://github.com/pytorch/TensorRT/pull/1698
release/1.4 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1884
acc convolution fix to release/1.4 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1910
release/1.4) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1931
release/1.4 to Torch 2.0.1 + TensorRT 8.6.1 by @gs-olive in https://github.com/pytorch/TensorRT/pull/1896
release/1.4) by @gs-olive in https://github.com/pytorch/TensorRT/pull/1956
Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.3.0...v1.4.0
Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for dynamic batch sizes for partially compiled modules using the TorchScript frontend (this is also supported with the FX frontend). It also introduces a new execution profiling utility to understand the execution of specific engine sub-blocks, which can be used in conjunction with PyTorch profiling tools to understand the performance of your model post-compilation. Finally, this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate torch.jit.trace-able compiled modules.
A long-standing limitation of the partitioning system in the TorchScript frontend is its lack of support for dynamic shapes. In this release we address a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as in the fully compiled workflow: using the torch_tensorrt.Input class, you may define the range of shapes that an input may take during runtime. This is represented as a set of three shape sizes: min, opt and max. min and max define the dynamic range of the input Tensor, while opt informs TensorRT what size to optimize for, provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the opt size. In this release, partially compiled module inputs can vary in shape for the highest-order dimension.
For example:
min_shape: (1, 3, 128, 128)
opt_shape: (8, 3, 128, 128)
max_shape: (32, 3, 128, 128)
Is a valid shape range, however:
min_shape: (1, 3, 128, 128)
opt_shape: (1, 3, 256, 256)
max_shape: (1, 3, 512, 512)
is still not supported.
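A hedged sketch of declaring such a range for a partially compiled module (the module is a placeholder; the kwargs follow the torch_tensorrt.Input API described above):
import torch
import torch_tensorrt

trt_module = torch_tensorrt.compile(
    scripted_module,                     # hypothetical, partially supported module
    inputs=[
        torch_tensorrt.Input(
            min_shape=(1, 3, 128, 128),
            opt_shape=(8, 3, 128, 128),
            max_shape=(32, 3, 128, 128),
            dtype=torch.float32,
        )
    ],
    require_full_compilation=False,      # allow unsupported ops to stay in PyTorch
)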
This release introduces a number of profiling tools to measure the performance of TensorRT sub-blocks in compiled modules. This can be used in conjunction with PyTorch profiling tools to get a picture of the performance of your model. Profiling for any particular sub-block can be enabled by the enable_profiling() method of any __torch__.classes.tensorrt.Engine attribute, or of any torch_tensorrt.TRTModuleNext. The profiler will dump trace files by default in /tmp, though this path can be customized by either setting the profile_path_prefix of __torch__.classes.tensorrt.Engine or as an argument to torch_tensorrt.TRTModuleNext.enable_precision(profiling_results_dir=""). Traces can be visualized using the Perfetto tool (https://perfetto.dev).
Engine layer information can also be accessed using get_layer_info, which returns a JSON string with the layers / fusions that the engine contains.
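For instance, a hedged sketch of inspecting that JSON (assuming trt_mod is a TRTModuleNext, or an object holding a __torch__.classes.tensorrt.Engine attribute, that exposes get_layer_info as described above; the accessor name is an assumption):
import json

layer_info = json.loads(trt_mod.get_layer_info())   # assumed accessor, see the runtime docs
print(json.dumps(layer_info, indent=2))             # layers / fusions contained in the engine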
In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate and each had their distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime to support both FX and TorchScript, meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure Python conversion via FX or C++ TorchScript compilation. Both frontends use the same primitives to construct their compiled graphs, be they fully or just partially compiled.
The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.
Note: The runtime ABI version was increased to support this feature; as such, models compiled with previous versions of Torch-TensorRT will need to be recompiled.
For the FX frontend, the new runtime can be chosen by setting use_experimental_fx_rt=True as part of your compile settings, either via torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True) or torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True).
Note: The new runtime only supports explicit batch dimension.
The FX frontend will return a torch.nn.Module containing torch_tensorrt.TRTModuleNext submodules instead of torch_tensorrt.fx.TRTModules. The features of these modules are nearly identical but with a few key improvements.
TRTModuleNext profiling dumps a trace visualizable with Perfetto (see above for more details).
TRTModuleNext modules are torch.jit.trace-able, meaning you can save FX-compiled modules as TorchScript for Python-less / C++ deployment scenarios. Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
TRTModuleNext supports the same serialization workflows that TRTModule supports as well (state_dict / extra_state, torch.save/torch.load).
model_fx = model_fx.cuda()
inputs_fx = [i.cuda() for i in inputs_fx]
trt_fx_module_f16 = torch_tensorrt.compile(
model_fx,
ir="fx",
inputs=inputs_fx,
enabled_precisions={torch.float16},
use_experimental_fx_rt=True,
explicit_batch_dimension=True
)
# Save model using torch.save
torch.save(trt_fx_module_f16, "trt.pt")
reload_trt_mod = torch.load("trt.pt")
# Trace and save the FX module in TorchScript
scripted_fx_module = torch.jit.trace(trt_fx_module_f16, example_inputs=inputs_fx)
scripted_fx_module.save("/tmp/scripted_fx_module.ts")
scripted_fx_module = torch.jit.load("/tmp/scripted_fx_module.ts")
... #Get a handle for a TRTModuleNext submodule
# Extract state dictionary
st = trt_mod.state_dict()
# Load the state dict into a new module
new_trt_mod = TRTModuleNext()
new_trt_mod.load_state_dict(st)
Using TorchScript, you have long been able to embed an arbitrary TensorRT engine from any source in a TorchScript module using torch_tensorrt.ts.embed_engine_in_new_module. Now you can do this at the torch.nn.Module level by directly using TRTModuleNext and access all the benefits enumerated above.
trt_mod = TRTModuleNext(
serialized_engine,
name="TestModule",
input_binding_names=input_names,
output_binding_names=output_names,
)
The intention is, in a future release, to have torch_tensorrt.TRTModuleNext replace torch_tensorrt.fx.TRTModule as the default TensorRT module implementation. Feedback on this class and how it is used, the runtime in general, or associated features (profiler, engine inspector) is welcome.
aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
torch.std and torch.var support multi-dimensional reductions by @gs-olive in https://github.com/pytorch/TensorRT/pull/1395
aten::split behavior with negative indexing by @gs-olive in https://github.com/pytorch/TensorRT/pull/1403
aten::masked_fill by @gs-olive in https://github.com/pytorch/TensorRT/pull/1430
noxfile.py by @gs-olive in https://github.com/pytorch/TensorRT/pull/1443
aten operators by @gs-olive in https://github.com/pytorch/TensorRT/pull/1416
aten::div when using truncation with Int32 tensor inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1442
Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.3.0
Torch-TensorRT 1.2.0 targets PyTorch 1.12, CUDA 11.6, cuDNN 8.4 and TensorRT 8.4. This release focuses on a couple of key new APIs to handle function I/O that uses collection types, which should enable whole new model classes to be compiled by Torch-TensorRT without source code modification. It also introduces the "FX Frontend", a new frontend for Torch-TensorRT which leverages FX, a high-level IR built into PyTorch with extensive Python APIs. For use cases which do not need to run outside of Python, this may be a strong option to try as it is easily extensible in a familiar development environment. In Torch-TensorRT 1.2.0, the FX frontend should be considered beta level in stability. torchtrtc has received improvements which target the ability to handle operators outside of the core PyTorch op set. This includes custom operators from libraries such as torchvision and torchtext. Similarly, users can provide custom converters to torchtrtc to extend the compiler's support from the command line instead of having to write an application to do so. Finally, Torch-TensorRT introduces community-supported Windows and CMake support.
nvidia-tensorrt
For previous versions of Torch-TensorRT, users had to install TensorRT via the system package manager and modify their LD_LIBRARY_PATH in order to set up Torch-TensorRT. Now users should install the TensorRT Python API as part of the installation procedure. This can be done via the following steps:
pip install nvidia-pyindex
pip install nvidia-tensorrt==8.4.3.1
pip install torch-tensorrt==1.2.0 -f https://github.com/pytorch/tensorrt/releases
Installing the TensorRT pip package will allow Torch-TensorRT to automatically load the TensorRT libraries without any modification to environment variables. It is also a necessary dependency for the FX frontend.
torchvision
Some FX frontend converters are designed to target operators from 3rd party libraries like torchvision. As such, you must have torchvision installed in order to use them. However, this dependency is optional for cases where you do not need this support.
Starting from this release, we will be distributing precompiled binaries of our NGC release branches for aarch64 (as well as x86_64), starting with ngc/22.11. These releases are designed to be paired with NVIDIA-distributed builds of PyTorch, including the NGC containers and Jetson builds, and are equivalent to the prepackaged distribution of Torch-TensorRT that comes in the containers. They represent the state of the master branch at the time of branch cutting, so they may lag in features by a month or so. These releases will come separately from minor version releases like this one. Therefore, going forward, these NGC releases should be the primary release channel used on Jetson (including for building from source).
NOTE: NGC PyTorch builds are not identical to builds you might install through normal channels like pytorch.org. In the past this has caused issues in portability between pytorch.org builds and NGC builds. Therefore, in workflows such as exporting a TorchScript module on an x86 machine and then compiling on Jetson, we strongly recommend using the NGC container release on x86 for your host machine operations. More information about Jetson support can be found alongside the 22.07 release (https://github.com/pytorch/TensorRT/releases/tag/v1.2.0a0.nv22.07)
Torch-TensorRT has previously operated under the assumption that nn.Module forward functions can trivially be reduced to the form forward([Tensor]) -> [Tensor]. Typically this implies functions of the form forward(Tensor, Tensor, ... Tensor) -> (Tensor, Tensor, ..., Tensor). However, as model complexity increases, grouping inputs may make it easier to manage many inputs. Therefore, function signatures similar to forward([Tensor], (Tensor, Tensor)) -> [Tensor] or forward((Tensor, Tensor)) -> (Tensor, (Tensor, Tensor)) might be more common. In Torch-TensorRT 1.2.0, more of these kinds of use cases are supported using the new experimental input_signature compile spec API. This API allows users to group Input specs similar to how they might group the input Tensors they would use to call the original module's forward function. This informs Torch-TensorRT on how to map a Tensor input from its location in a group to the engine, and from the engine back into its grouping returned to the user.
To make this concrete consider the following standard case:
class StandardTensorInput(nn.Module):
    def __init__(self):
        super(StandardTensorInput, self).__init__()
    def forward(self, x, y):
        r = x + y
        return r
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = StandardTensorInput().eval().to("cuda")
trt_module = torch_tensorrt.compile(
module,
inputs=[
torch_tensorrt.Input(x.shape),
torch_tensorrt.Input(y.shape)
],
min_block_size=1
)
out = trt_module(x,y)
print(out)
Here a user has defined two explicit tensor inputs and used the existing list based API to define the input specs.
With Torch-TensorRT, the following use cases are now possible using the new input_signature API:
class TupleInput(nn.Module):
    def __init__(self):
        super(TupleInput, self).__init__()
    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r = z[0] + z[1]
        return r
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInput().eval().to("cuda")
trt_module = torch_tensorrt.compile(
module,
input_signature=((x, y),), # Note how inputs are grouped with the new API
min_block_size=1
)
out = trt_module((x,y))
print(out)
class ListInput(nn.Module):
    def __init__(self):
        super(ListInput, self).__init__()
    def forward(self, z: List[torch.Tensor]):
        r = z[0] + z[1]
        return r
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInput().eval().to("cuda")
trt_module = torch_tensorrt.compile(
module,
input_signature=([x,y],), # Again, note how inputs are grouped with the new API
min_block_size=1
)
out = trt_module([x,y])
print(out)
Note how the input specs (in this case just example tensors) are provided to the compiler. The input_signature argument expects a Tuple[Union[torch.Tensor, torch_tensorrt.Input, List, Tuple]] grouped in a format representative of how the function would be called. In these cases it's just a list or tuple of specs.
More advanced cases are supported as well:
class TupleInputOutput(nn.Module):
    def __init__(self):
        super(TupleInputOutput, self).__init__()
    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r1 = r1 * 10
        r = (r1, r2)
        return r
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInputOutput()
trt_module = torch_tensorrt.compile(
module,
input_signature=((x,y),), # Again, note how inputs are grouped with the new API
min_block_size=1
)
out = trt_module((x,y))
print(out)
class ListInputOutput(nn.Module):
    def __init__(self):
        super(ListInputOutput, self).__init__()
    def forward(self, z: List[torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r = [r1, r2]
        return r
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInputOutput()
trt_module = torch_tensorrt.compile(
module,
input_signature=([x,y],), # Again, note how inputs are grouped with the new API
min_block_size=1
)
out = trt_module([x,y])
print(out)
class MultiGroupIO(nn.Module):
    def __init__(self):
        super(MultiGroupIO, self).__init__()
    def forward(self, z: List[torch.Tensor], a: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = a[0] + a[1]
        r3 = r1 - r2
        r4 = [r1, r2]
        return (r3, r4)
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = MultiGroupIO().eval().to("cuda")
trt_module = torch_tensorrt.compile(
module,
input_signature=([x,y],(x,y)), # Again, note how inputs are grouped with the new API
min_block_size=1
)
out = trt_module([x,y],(x,y))
print(out)
These features are supported in C++ as well:
torch::jit::Module mod;
try {
// Deserialize the ScriptModule from a file using torch::jit::load().
mod = torch::jit::load(path);
} catch (const c10::Error& e) {
std::cerr << "error loading the model\n";
}
mod.eval();
mod.to(torch::kCUDA);
std::vector<torch::jit::IValue> inputs_;
for (auto in : inputs) {
inputs_.push_back(torch::jit::IValue(in.clone()));
}
std::vector<torch::jit::IValue> complex_inputs;
auto input_list = c10::impl::GenericList(c10::TensorType::get());
input_list.push_back(inputs_[0]);
input_list.push_back(inputs_[0]);
torch::jit::IValue input_list_ivalue = torch::jit::IValue(input_list);
complex_inputs.push_back(input_list_ivalue);
auto input_shape = torch_tensorrt::Input(in0.sizes(), torch_tensorrt::DataType::kHalf);
auto input_shape_ivalue = torch::jit::IValue(std::move(c10::make_intrusive<torch_tensorrt::Input>(input_shape)));
c10::TypePtr elementType = input_shape_ivalue.type();
auto list = c10::impl::GenericList(elementType);
list.push_back(input_shape_ivalue);
list.push_back(input_shape_ivalue);
torch::jit::IValue complex_input_shape(list);
std::tuple<torch::jit::IValue> input_tuple2(complex_input_shape);
torch::jit::IValue complex_input_shape2(input_tuple2);
auto compile_settings = torch_tensorrt::ts::CompileSpec(complex_input_shape2);
compile_settings.min_block_size = 1;
compile_settings.enabled_precisions = {torch::kHalf};
// Compile module
auto trt_mod = torch_tensorrt::ts::compile(mod, compile_settings);
auto trt_out = trt_mod.forward(complex_inputs);
Currently this feature should be considered experimental; APIs may be subject to change or folded into existing APIs. There are also limitations introduced by using this feature, including the following:
Collection types such as Dict and namedtuple are not supported.
require_full_compilation cannot be used while using this feature.
This release includes FX as one of its supported IRs, used to convert torch models to TensorRT through the new FX frontend. At a high level, this path transforms the model into or consumes an FX graph and, similar to the TorchScript frontend, converts the graph to TensorRT through the use of a library of converters. The key difference is that it is implemented purely in Python. The role of this FX frontend is to supplement the TS lowering path and to provide users better ease of use and easier extensibility in use cases where removing Python as a dependency is not strictly necessary. Detailed user instructions can be found in the documentation.
The FX path examples are located under //examples/fx
The FX path unit tests are located under //py/torch_tensorrt/fx/tests
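A hedged sketch of invoking the FX frontend through the unified API (the module is a placeholder, and the kwargs accepted by the FX path may differ from the TorchScript path):
import torch
import torch_tensorrt

model = MyModel().eval().cuda()               # hypothetical module
inputs = [torch.randn(1, 3, 224, 224).cuda()]

trt_fx_module = torch_tensorrt.compile(
    model,
    ir="fx",
    inputs=inputs,
    enabled_precisions={torch.float16},
)
out = trt_fx_module(*inputs)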
While both the C++ API and Python API provide systems to include and convert custom operators in your model (for instance those implemented in torchvision), torchtrtc has been limited to the core opset. In Torch-TensorRT 1.2.0, two new flags have been added to torchtrtc:
--custom-torch-ops (repeatable) Shared object/DLL containing custom torch operators
--custom-converters (repeatable) Shared object/DLL containing custom converters
These arguments accept paths to .so or DLL files which define custom operators for PyTorch or custom converters for Torch-TensorRT. These files will get DL_OPEN'd at runtime to extend the op and converter libraries.
For example:
torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
Thanks to the great work of @gcuendet and others, CMake and consequently Windows support has been added to the project! Users on Linux and Windows can now build the C++ API using this system, and using torch_tensorrt_runtime.dll adds support for executing Torch-TensorRT programs on Windows in both Python and C++. Detailed information on how to use this build system can be found here: https://pytorch.org/TensorRT/getting_started/installation.html
Bazel will continue to be the primary build system for the project, and all testing and distributed builds will be built and run with Bazel (including future official Windows support), so users should consider this still the canonical version of Torch-TensorRT. However, we aim to ensure as best we can that the CMake system will be able to build the project properly, including on Windows. Contributions to continue to grow the support for this build system and Windows as a platform are definitely welcomed.
Known limitations: collection types such as Dict and namedtuple are not supported, and require_full_compilation cannot be used while using this feature.
- Bazel 5.2.0
- LibTorch 1.12.1
- CUDA 11.6 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
- cuDNN 8.4.1.50
- TensorRT 8.4.3.1
aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.2.0
Torch-TensorRT 1.1.1 is a patch release for Torch-TensorRT 1.1 that targets PyTorch 1.11, CUDA 11.4/11.3, TensorRT 8.4 EA/8.2 and cuDNN 8.3/8.2 intended to add support for Torch-TensorRT on Jetson / Jetpack 5.0 DP. As this release is primarily targeted at adding support for Jetpack 5.0DP for the 1.1 feature set we will not be distributing pre-compiled binaries for this release so as not to break compatibility with the current stack for existing users who install directly from GitHub. Please follow the instructions for installation on Jetson in the documentation to install this release: https://pytorch.org/TensorRT/tutorials/installation.html#compiling-from-source
Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.1.1
aten::Int support, New Debugging Tools, Removing Max Batch Size
Torch-TensorRT 1.1.0 targets PyTorch 1.11, CUDA 11.3, cuDNN 8.2 and TensorRT 8.2. Due to recent JetPack upgrades, this release does not support Jetson (Jetpack 5.0DP or otherwise). Jetpack 5.0DP support will arrive in a mid-cycle release (Torch-TensorRT 1.1.x) along with support for TensorRT 8.4. 1.1.0 also drops support for Python 3.6 as it has reached end of life. Following 1.0.0, this release is focused on stabilizing and improving the core of Torch-TensorRT. Many improvements have been made to the partitioning system, addressing limitations many users hit while trying to partially compile PyTorch modules. Torch-TensorRT 1.1.0 also addresses a long-standing issue with aten::Int operators (albeit partially). Now certain common patterns which use aten::Int can be handled by the compiler without resorting to partial compilation. Most notably, this means that models like BERT can be run end to end with Torch-TensorRT, resulting in significant performance gains.
With this release we are introducing new syntax sugar that can be used to more easily debug Torch-TensorRT compilation and execution through the use of context managers. For example, in Torch-TensorRT 1.0.0 this may be a common pattern to turn on then turn off debug info:
import torch_tensorrt
...
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
trt_module = torch_tensorrt.compile(my_module, ...)
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Warning)
results = trt_module(input_tensors)
With Torch-TensorRT 1.1.0, this now can be done with the following code:
import torch_tensorrt
...
with torch_tensorrt.logging.debug():
    trt_module = torch_tensorrt.compile(my_module,...)
results = trt_module(input_tensors)
You can also use this API to debug the Torch-TensorRT runtime as well:
import torch_tensorrt
torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Error)
...
trt_module = torch_tensorrt.compile(my_module,...)
with torch_tensorrt.logging.warnings():
    results = trt_module(input_tensors)
The following levels are available:
# Only internal TensorRT failures will be logged
with torch_tensorrt.logging.internal_errors():
# Internal TensorRT failures + Torch-TensorRT errors will be logged
with torch_tensorrt.logging.errors():
# All Errors plus warnings will be logged
with torch_tensorrt.logging.warnings():
# First verbosity level, information about major steps occurring during compilation and execution
with torch_tensorrt.logging.info():
# Second verbosity level, each step is logged + information about compiler state will be outputted
with torch_tensorrt.logging.debug():
# Third verbosity level, all above information + intermediate transformations of the graph during lowering
with torch_tensorrt.logging.graphs():
In this release we are removing the max_batch_size and strict_types settings. These settings corresponded directly to TensorRT settings but were not always respected, which often led to confusion. Therefore we thought it best to disable these features, as deterministic behavior could not be ensured.
max_batch_size: The first dim in shapes provided to Torch-TensorRT is considered the batch dimension, so instead of setting max_batch_size you can just use the Input objects directly.
strict_types: A replacement with more deterministic behavior will come with an upcoming TensorRT release.
- Bazel 5.1.1
- LibTorch 1.11.0
- CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
- cuDNN 8.2.4.15
- TensorRT 8.2.4.2
--max-batch-size has been removed from the CLI as it has no real functional effect.
This is the first stable release of Torch-TensorRT, targeting PyTorch 1.10, CUDA 11.3 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0, with backwards-compatible source for TensorRT 7.1. On aarch64, TRTorch targets Jetpack 4.6 primarily, with backwards-compatible source for Jetpack 4.5. This version also removes deprecated APIs such as InputRange and op_precision.
TRTorch is now Torch-TensorRT! TRTorch started out as a small experimental project compiling TorchScript to TensorRT almost two years ago, and now, as we hit v1.0.0 with APIs and major features stabilizing, we felt that the name of the project should reflect the ecosystem of tools it is joining with this release, namely TF-TRT (https://blog.tensorflow.org/2021/01/leveraging-tensorflow-tensorrt-integration.html) and MXNet-TensorRT (https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/performance/backend/tensorrt/tensorrt). Since we were already significantly changing APIs with this release to reflect what we learned over the last two years of using TRTorch, we felt this was the right time to change the name as well.
The overall process to port forward from TRTorch is as follows:
Python: Rename trtorch to torch_tensorrt. Components that used to live under the trtorch namespace have now been separated. IR-agnostic components (torch_tensorrt.Input, torch_tensorrt.Device, torch_tensorrt.ptq, torch_tensorrt.logging) will continue to live under the top-level namespace. IR-specific components like torch_tensorrt.ts.compile, torch_tensorrt.ts.convert_method_to_trt_engine and torch_tensorrt.ts.TensorRTCompileSpec will live in a TorchScript-specific namespace. This gives us space to explore the other IRs that might be relevant to the project in the future. In the place of the old top-level compile and convert_method_to_engine are new ones which will call the IR-specific versions based on what is provided to them. This also means that you can now provide a raw torch.nn.Module to torch_tensorrt.compile and Torch-TensorRT will handle the TorchScripting step for you. For the most part, the sole change needed to switch over namespaces is to exchange trtorch for torch_tensorrt.
C++: Rename trtorch to torch_tensorrt; components specific to the IR like compile, convert_method_to_trt_engine and CompileSpec are in a torchscript namespace, while agnostic components are at the top level. Namespace aliases for torch_tensorrt -> torchtrt and torchscript -> ts are included. Again, the port-forward process for namespaces should be a find and replace. Finally, the libraries libtrtorch.so, libtrtorchrt.so and libtrtorch_plugins.so have been renamed to libtorchtrt.so, libtorchtrt_runtime.so and libtorchtrt_plugins.so respectively.
CLI: trtorch has been renamed to torchtrtc.
Starting with nvcr.io/nvidia/pytorch:21.11, Torch-TensorRT will be distributed as part of the container (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). The version of Torch-TensorRT in the container will be the state of master at the time of building. Torch-TensorRT will be validated to run correctly with the version of PyTorch, CUDA, cuDNN and TensorRT in the container. This will serve as the easiest way to have a fully validated PyTorch end-to-end training-to-inference stack and serves as a great starting point for building DL applications.
Also as part of Torch-TensorRT we are now starting to distribute the full C++ package within the wheel files for the Python packages. By installing the wheel you now get the Python API, the C++ libraries + headers and the CLI binary. This is going to be the easiest way to install Torch-TensorRT on your stack. After installing with pip
pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
You can add the following to your PATH to set up the CLI:
PATH=$PATH:<PATH TO TORCHTRT PYTHON PACKAGE>/bin
Many of the APIs have changed slightly in this release to be more self-consistent and more usable. These changes begin with the Python API, where compile, convert_method_to_trt_engine and TensorRTCompileSpec now use kwargs instead of dictionaries. As many features came out of beta and experimental stability, the necessity to have multiple levels of nesting in settings has decreased, so kwargs make much more sense. You can simply port forward to the new APIs by unwrapping your existing compile_spec dict in the arguments to compile or similar functions.
compile_settings = {
"inputs": [torch_tensorrt.Input(
min_shape=[1, 3, 224, 224],
opt_shape=[1, 3, 512, 512],
max_shape=[1, 3, 1024, 1024],
# For static size shape=[1, 3, 224, 224]
dtype=torch.half, # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
)],
"enabled_precisions": {torch.half}, # Run with FP16
}
trt_ts_module = torch_tensorrt.compile(torch_script_module, **compile_settings)
This release also introduces support for providing tensors as examples to Torch-TensorRT. In place of a torch_tensorrt.Input in the list of inputs, you can pass a Tensor. This can only be used to set a static input size. There are also some things to be aware of which will be discussed later in the release notes.
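As a brief, hedged illustration of this example-tensor form (reusing torch_script_module from the snippet above; the shape and dtype are placeholders):
example_input = torch.randn(1, 3, 224, 224).half().to("cuda")

trt_ts_module = torch_tensorrt.compile(
    torch_script_module,
    inputs=[example_input],               # static shape and dtype taken from the tensor
    enabled_precisions={torch.half},
)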
Now that Torch-TensorRT separates components specific to particular IRs into their own namespaces, there is a replacement for the old compile and convert_method_to_trt_engine functions at the top level. These functions accept any PyTorch-generated format, including torch.nn.Modules, and decide the best way to compile it down to TensorRT. In v1.0.0 this means going through TorchScript and returning a torch.jit.ScriptModule. You can specify the IR to try using the ir arg for these functions.
Due to partial compilation becoming stable in v1.0.0, there are now four new fields which replace the old torch_fallback struct.
compile_spec = {
    "torch_fallback": {
        "enabled": True, # Turn on or turn off falling back to PyTorch if operations are not supported in TensorRT
        "force_fallback_ops": [
            "aten::max_pool2d" # List of specific ops to require running in PyTorch
        ],
        "force_fallback_modules": [
            "mypymod.mytorchmod" # List of specific torch modules to require running in PyTorch
        ],
        "min_block_size": 3 # Minimum number of ops an engine must encapsulate to be run in TensorRT
    }
}
torch_tensorrt.compile(...,
require_full_compilation=False,
min_block_size=3,
torch_executed_ops=[ "aten::max_pool2d" ],
torch_executed_modules=["mypymod.mytorchmod"])
The changes for the C++ API, other than the reorganization and renaming of the namespaces, mostly serve to make Torch-TensorRT consistent between Python and C++, namely by renaming trtorch::CompileGraph to torch_tensorrt::ts::compile and trtorch::ConvertGraphToTRTEngine to torch_tensorrt::ts::convert_method_to_trt_engine. Beyond that, similar to Python, the partial compilation struct TorchFallback has been removed and replaced by four fields in torch_tensorrt::ts::CompileSpec.
/**
* @brief A struct to hold fallback info
*/
struct TRTORCH_API TorchFallback {
/// enable the automatic fallback feature
bool enabled = false;
/// minimum consecutive operation number that needs to be satisfied to convert to TensorRT
uint64_t min_block_size = 1;
/// A list of names of operations that will explicitly run in PyTorch
std::vector<std::string> forced_fallback_ops;
/// A list of names of modules that will explicitly run in PyTorch
std::vector<std::string> forced_fallback_modules;
/**
* @brief Construct a default Torch Fallback object, fallback will be off
*/
TorchFallback() = default;
/**
* @brief Construct from a bool
*/
TorchFallback(bool enabled) : enabled(enabled) {}
/**
* @brief Constructor for setting min_block_size
*/
TorchFallback(bool enabled, uint64_t min_size) : enabled(enabled), min_block_size(min_size) {}
};
/**
* Require the full module be compiled to TensorRT instead of potentially running unsupported operations in PyTorch
*/
bool require_full_compilation = false;
/**
* Minimum number of contiguous supported operators to compile a subgraph to TensorRT
*/
uint64_t min_block_size = 3;
/**
* List of aten operators that must be run in PyTorch. An error will be thrown if this list is not empty but
* ``require_full_compilation`` is True
*/
std::vector<std::string> torch_executed_ops;
/**
* List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but
* ``require_full_compilation`` is True
*/
std::vector<std::string> torch_executed_modules;
Similarly, these partial compilation fields have been renamed in torchtrtc:
--require-full-compilation Require that the model should be fully
compiled to TensorRT or throw an error
--teo=[torch-executed-ops...],
--torch-executed-ops=[torch-executed-ops...]
(Repeatable) Operator in the graph that
should always be run in PyTorch for
execution (partial compilation must be
enabled)
--tem=[torch-executed-mods...],
--torch-executed-mods=[torch-executed-mods...]
(Repeatable) Module that should always
be run in Pytorch for execution (partial
compilation must be enabled)
--mbs=consecutive_ops,
--min-block-size=consecutive_ops
Minimum number of contiguous TensorRT
supported ops to compile a subgraph to
TensorRT
Going forward, breaking changes to the API of the magnitude seen in this release will be accompanied by a major version bump.
Partial compilation should be considered stable for static input shape and is now enabled by default. In the case of dynamic shape, set require_full_compilation to True.
The default behavior of Torch-TensorRT has shifted slightly. The most important of these changes is the change to the inferred input type. In prior versions, the expected input type for a Tensor, barring it being set explicitly, was based on op_precision. With that field being removed in this release and replaced by enabled_precisions (introduced in v0.4.0), this sort of behavior no longer makes sense. Therefore Torch-TensorRT now follows these rules to determine the input type for a Tensor.
If no dtype is specified for an Input, Torch-TensorRT will determine the input type by inspecting the uses of this Input. It will trace the lifetime of this tensor to the first tensor operation using weights stored in the provided module. The type of the weights is the inferred type of the Input using the rule that PyTorch requires like types for Tensor operations. The goal with this behavior is to maintain the concept that Torch-TensorRT modules should feel no different than normal PyTorch modules. Therefore you can expect
Weight Type of Model | Expected Input Type For Tensor
---|---
FP32 | FP32
FP16 | FP16
Quantization Workflows | FP32
Unknown / Ambiguous | FP32 w/ Warning
Users can override this behavior to set the Input type to whatever they wish using the dtype field of torch_tensorrt.Input. Torch-TensorRT will always respect the user setting but may throw a warning stating that the model provided expects a different input type. This is mainly to notify you that just dropping the compiled module in place of the raw torch.nn.Module might throw errors and casting before inference might be necessary.
Input(shape=(1, 3, 32, 32), dtype=dtype.half, format=TensorFormat.contiguous). This is subject to the behavior described above.
Now by default the workspace size is set to 1GB for all Pascal-based and newer GPUs (SM capability 6 or above). Maxwell and older cards, including Jetson Nano, have a workspace of 256MB by default. This value is user settable.
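A hedged sketch of overriding that default (the workspace_size kwarg name and the module are assumptions to check against the docs):
trt_module = torch_tensorrt.compile(
    my_module,                                        # hypothetical module
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    workspace_size=1 << 28,                           # 256MB instead of the default
)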
- Bazel 4.2.1
- LibTorch 1.10.0
- CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.4.15
- TensorRT 8.0.3.4
Add functionality for tests to use precompiled libraries (b5c324a)
Add QAT patch which modifies scale factor dtype to INT32 (4a10673)
Add TF32 override flag in bazelrc for CI-Testing (7a0c9a5)
Add VGG QAT sample notebook which demonstrates end-end workflow for QAT models (8bf6dd6)
Augment python package to include bin, lib, include directories (ddc0685)
handle scalar type of size [] in shape_analysis (fca53ce)
support aten::and.bool evaluator (6d73e43)
support aten::conv1d and aten::conv_transpose1d (c8dc6e9)
support aten::eq.str evaluator (5643972)
support setting input types of subgraph in fallback, handle Tensor type in evaluated_value_map branch in MarkOutputs (4778b2b)
support truncate_long_and_double in fallback subgraph input type (0bc3c05)
Update documentation with new library name Torch-TensorRT (e5f96d9)
Updating the pre_built to prebuilt (51412c7)
//:libtrtorch: Ship a WORKSPACE file and BUILD file with the (7ac6f1c)
//core/partitioning: Improved logging and code org for the (8927e77)
//cpp: Adding example tensors as a way to set input spec (70a7bb3)
//py: Add the git revision to non release builds (4a0a918)
//py: Allow example tensors from torch to set shape (01d525d)
feat!: Changing the default behavior for selecting the input type (a234335)
refactor!: Removing deprecated InputRange, op_precision and input_shapes (621bc67)
feat(//py)!: Porting forward the API to use kwargs (17e0e8a)
refactor(//py)!: Kwargs updates and support for shifting internal apis (2a0d1c8)
refactor!(//cpp): Inlining partial compilation settings since the (19ecc64)
refactor! : Update default workspace size based on platforms. (391a4c0)
feat!: Turning on partial compilation by default (52e2f05)
refactor!: API level rename (483ef59)
refactor!: Changing the C++ api to be snake case (f34e230)
refactor! : Update Pytorch version to 1.10 (cc7d0b7)
refactor!: Updating bazel version for py build container (06533fe)
This was done since the only version of bazel available in our build container for python apis is 4.2.1
Given a dict of valid TRTorch CompileSpec settings
spec = {
"inputs": ...
...
}
You can use this same dict with the new APIs by changing your code from:
trtorch.compile(mod, spec)
to:
trtorch.compile(mod, **spec)
which will unpack the dictionary as arguments to the function
Also, in preparation for partial compilation being enabled by default, settings related to torch fallback have been moved to the top level.
instead of
"torch_fallback": {
    "enabled": True,
    "min_block_size": 3,
    "forced_fallback_ops": ["aten::add"],
    "forced_fallback_mods": ["MySubModule"]
}
now there are new settings
require_full_compilation=False,
min_block_size=3,
torch_executed_ops=["aten::add"],
torch_executed_modules=["MySubModule"]
Now in the compile spec, instead of a torch_fallback field with its associated struct, there are four new fields in the compile spec:
bool require_full_compilation = true;
uint64_t min_block_size = 3;
std::vector<std::string> torch_executed_ops = {};
std::vector<std::string> torch_executed_modules = {};
If the data type cannot be determined the compiler will default to FP32.
This calculation is done per input tensor, so if one input is inferred to use FP32 and another INT32, then the expected types will be (FP32, INT32).
As before, if the user defines the data type explicitly or provides an example tensor, the data type specified there will be respected.
trtorch.Device, Version updates for PyTorch, TensorRT, cuDNN
This is the first patch release of TRTorch v0.4, now targeting by default PyTorch 1.9.1, TensorRT 8.0.3.4, cuDNN 8.2.4.15 and CUDA 11.1. Older versions of PyTorch, TensorRT and cuDNN are still supported in the same manner as in TRTorch v0.4.0.
There was an issue with the pass marking modules to be ignored during compilation, where it unsafely assumed that methods are named forward all the way down the module tree. While this was fine for 1.8.0, with PyTorch 1.9.0 the TorchScript codegen changed slightly to sometimes use methods of other names for modules which reduce trivially to a functional API. This fix will now identify method calls as the recursion point and then use those method calls to select modules to recurse on. It will also check to verify the existence of these modules and methods before recursing. Finally, this pass was run by default even if the ignore list was empty, causing issues for users not using the feature. Therefore this pass is now disabled unless explicitly enabled.
trtorch.Device
Some of the constructors for trtorch.Device would not work or would incorrectly configure the device. This patch fixes those issues.
- Bazel 4.0.0
- LibTorch 1.9.1
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.3.4
- TensorRT 8.0.3.4
This is the fourth beta release of TRTorch, targeting PyTorch 1.9, CUDA 11.1 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0, with backwards-compatible source for TensorRT 7.1. On aarch64, TRTorch targets Jetpack 4.6 primarily, with backwards-compatible source for Jetpack 4.5. When building on Jetson, the flag --platforms //toolchains:jetpack_4.x must now be provided for C++ compilation to select the correct dependency paths. For Python, by default it is assumed the Jetpack version is 4.6; to override this, add the --jetpack-version 4.5 flag when building.
This release adds support for compiling models trained with Quantization Aware Training (QAT), allowing users of the TensorRT PyTorch Quantization Toolkit (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization) to compile their models using TRTorch. For more information and a tutorial, refer to https://www.github.com/NVIDIA/TRTorch/tree/v0.4.0/examples/int8/qat. It also adds support for sparsity via the sparse_weights flag in the compile spec. This allows TensorRT to utilize specialized hardware in Ampere GPUs to minimize unnecessary computation and therefore increase computational efficiency.
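A hedged sketch of turning this on with the 0.4.x dict-style compile spec (the module and input shape are placeholders):
import torch
import trtorch

compile_spec = {
    "inputs": [trtorch.Input((1, 3, 224, 224))],
    "enabled_precisions": {torch.half},
    "sparse_weights": True,   # let TensorRT use Ampere sparsity kernels where applicable
}
trt_mod = trtorch.compile(scripted_model, compile_spec)   # hypothetical scripted module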
In v0.4.0 the partial compilation feature of TRTorch can now be considered beta level stability. New in this release is the ability to specify entire PyTorch modules to run in PyTorch explicitly as part of partial compilation. This should let users isolate troublesome code easily when compiling. Again, feedback on this feature is greatly appreciated.
v0.4.0 also changes the "ABI" of TRTorch to now include information about the target device for the program. Programs compiled with v0.4.0 will look for and select the most compatible available device. The rules used are: Any valid device option must have the same SM capability as the device building the engine. From there, TRTorch prefers the same device (e.g. Built on A100 so A100 is better than A30) and finally prefers the same device ID. Users will be warned if this selected device is not the current active device in the course of execution as overhead may be incurred in transferring input tensors from the current device to the target device. Users can then modify their code to avoid this. Due to this ABI change, existing compiled TRTorch programs are incompatible with the TRTorch v0.4.0 runtime. From v0.4.0 onwards an internal ABI version will check program compatibility. This ABI version is only incremented with breaking changes to the ABI.
TRTorch v0.4.0 changes the API for specifying input shapes and data types to provide users more control over configuration. The new API makes use of the class trtorch.Input, which lets users set the shape (or shape range) as well as the memory layout and expected data type. These input specs are set in the inputs field of the CompileSpec.
"inputs": [
trtorch.Input((1, 3, 224, 224)), # Static input shape for input #1
trtorch.Input(
min_shape=(1, 224, 224, 3),
opt_shape=(1, 512, 512, 3),
max_shape=(1, 1024, 1024, 3),
dtype=torch.int32,
format=torch.channels_last,
) # Dynamic input shape for input #2, input type int and channel last format
],
The legacy input_shapes field and its associated usage with lists of tuples/InputRanges should now be considered deprecated. They remain usable in v0.4.0 but will be removed in the next release. Similarly, the compile spec field op_precision is now also deprecated in favor of enabled_precisions. enabled_precisions is a set containing the data types that kernels will be allowed to use. Whereas setting op_precision = torch.int8 would implicitly enable FP32 and FP16 kernels as well, now enabled_precisions should be set as {torch.float32, torch.float16, torch.int8} to do the same. In order to maintain similar behavior to normal PyTorch, if FP16 is the lowest precision enabled but no explicit data type is set for the inputs to the model, the expectation will be that inputs will be in FP16. For other cases (FP32, INT8), FP32 is the default, similar to PyTorch and previous versions of TRTorch. Finally, in the Python API, a class trtorch.Device has been added. While users can continue to use torch.Device or other torch APIs, trtorch.Device allows for better control for the specific use cases of compiling with TRTorch (e.g. setting DLA core and GPU fallback). This class is very similar to the C++ version, with a couple of additions of syntactic sugar to make the class easier and more familiar to use:
trtorch.Device("dla:0", allow_gpu_fallback=False) #Set device as DLA Core 0 (implicitly sets the GPU managing DLA cores as the GPU and sets fallback to false)
trtorch.Device can be used instead of a dictionary in the compile spec if desired.
torchtrtc has been updated to reflect these API changes. Users can set the shape, dtype and format of inputs from the command line using the format "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]@DTYPE%FORMAT", e.g. (3, 3, 32,32)@f16%NHWC. -p is now a repeatable flag to enable multiple precisions. Also added are the repeatable flags --ffm and --ffo to mark specific modules and operators, respectively, for running in PyTorch. To use these two options, --allow-torch-fallback should be set. Options for embedding serialized engines (--embed-engine) and sparsity (--sparse-weights) have been added as well.
Finally, TRTorch v0.4.0 also now includes the ability to provide backtraces for locations in your model which TRTorch does not support. This can help in identifying locations in the model that might need to change for TRTorch support or modules which should run fully in PyTorch via partial compilation.
- Bazel 4.0.0
- LibTorch 1.9.0
- CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.2.2.3
- TensorRT 8.0.1.6
It also implements ABI Versioning. The first entry in the serialized format of a TRTEngine now records the ABI that the engine was compiled with, defining expected compatibility with the TRTorch runtime. If the ABI version does not match, the runtime will error out asking to recompile the program.
ABI version is a monotonically increasing integer and should be incremented every time the serialization format changes in some way.
This commit cleans up the CudaDevice class, implementing a number of constructors to replace the various utility functions that populate the struct. Descriptive utility functions remain but solely call the relevant constructor.
This is the third beta release of TRTorch, targeting PyTorch 1.8.x, CUDA 11.1 (on x86_64), TensorRT 7.2 and cuDNN 8. TRTorch 0.3.0 binary releases target PyTorch 1.8.1 specifically; these builds are not compatible with 1.8.0, though the source code remains compatible with any PyTorch 1.8.x version. On aarch64, TRTorch targets JetPack 4.5.x. This release introduces libtrtorch_plugins.so. This library is a portable distribution of all TensorRT plugins used in TRTorch. The intended use case is to support TRTorch programs that utilize TensorRT plugins deployed on systems with only the runtime library available, or the case where TRTorch was used to create a TensorRT engine that makes use of TRTorch plugins and is run outside the TRTorch runtime. An example of how to use this library can be found here: https://www.github.com/NVIDIA/TRTorch/tree/v0.3.0/examples/sample_rt_app. TRTorch 0.3.0 also now allows users to repurpose PyTorch DataLoaders to do post-training quantization in Python, similar to the workflow currently supported in C++. It also introduces a new API to wrap arbitrary TensorRT engines in a PyTorch Module wrapper, making them serializable by torch.jit.save and completely compatible with other PyTorch modules. Finally, TRTorch 0.3.0 also includes a preview of the new partial compilation capability of the TRTorch compiler. With this feature, users can now instruct TRTorch to keep operations that are not supported by TRTorch/TensorRT in PyTorch. Partial compilation should be considered alpha stability, and we are seeking feedback on bugs, pain points and feature requests surrounding using this feature.
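A heavily hedged sketch of the Python PTQ workflow built on a DataLoader (the calibrator class, argument names, enum and compile spec keys are assumptions to verify against the 0.3.0 documentation; the dataloader and module are placeholders):
import torch
import trtorch

calibrator = trtorch.ptq.DataLoaderCalibrator(
    testing_dataloader,                                   # hypothetical torch DataLoader
    cache_file="./calibration.cache",
    use_cache=False,
    algo_type=trtorch.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

trt_mod = trtorch.compile(scripted_model, {               # hypothetical scripted module
    "input_shapes": [(1, 3, 32, 32)],
    "op_precision": torch.int8,
    "calibrator": calibrator,
})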
- Bazel 4.0.0
- LibTorch 1.8.1 (on x86_64), 1.8.0 (on aarch64)
- CUDA 11.1 (on x86_64, by default , newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
- cuDNN 8.1.1
- TensorRT 7.2.3.4
//plugins: Readding cuBLAS BUILD to allow linking of libnvinfer_plugin on Jetson (a8008f4)
//tests/../concat: Concat test fix (2432fb8)
//tests/core/partitioning: Fixing some issues with the partition (ff89059)
erase the repetitive nodes in dependency analysis (80b1038)
fix a typo for debug (c823ebd)
fix typo bug (e491bb5)
aten::linear: Fixes new issues in 1.8 that cause script based (c5057f8)
register the torch_fallback attribute in Python API (8b7919f)
support expand/repeat with IValue type input (a4882c6)
support shape inference for add_, support non-tensor arguments for segmented graphs (46950bb)
feat!: Updating versions of CUDA, cuDNN, TensorRT and PyTorch (71c4dcb)
feat(WORKSPACE)!: Updating PyTorch version to 1.8.1 (c9aa99a)
TRTorch 0.3.0 will target PyTorch 1.8.1. There is no backwards compatibility with 1.8.0. If you need this specific version, compile from source with the dependencies in WORKSPACE changed.