Open deep learning compiler stack for CPU, GPU, and specialized accelerators
The TVM community has worked since the v0.9 release to deliver the following new exciting improvements!
- MetaSchedule: database changes, `tune_relay`
- TIR: `PadEinsum`
- TIR: `DeclBuffer`
And many other general improvements to code quality, TVMScript, and more! Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.9.0...v0.10.0rc0.
These RFCs have been merged in apache/tvm-rfcs since the last release.
Note that this list is not comprehensive of all PRs and discussions since v0.9. A non-truncated summary can be found here: https://github.com/apache/tvm/issues/12979
The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new quarterly release schedule and includes many highlights, such as:
- AOT: `tvm.relay.build` parameters `runtime=` and `executor=` (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
- `schedule.transform_layout` - Applies a layout transformation to a buffer as specified by an IndexMap.
- `schedule.transform_block_layout` - Applies a schedule transformation to a block as specified by an IndexMap.
- `schedule.set_axis_separators` - Sets axis separators in a buffer to lower to multi-dimensional memory (e.g. texture memory).
- `transform.InjectSoftwarePipeline` - Transforms an annotated loop nest into a pipeline prologue, body, and epilogue where producers and consumers are overlapped.
- `transform.CommonSubexprElimTIR` - Implements common-subexpression elimination for TIR.
- `transform.InjectPTXAsyncCopy` - Rewrites global-to-shared memory copies in CUDA with async copy when annotated with `tir::attr::async_scope`.
- `transform.LowerCrossThreadReduction` - Enables support for reductions across threads on GPUs.

These RFCs have been merged in apache/tvm-rfcs since the last release.
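To give a sense of what an IndexMap-driven layout transformation such as `schedule.transform_layout` does to a buffer, here is a pure-Python sketch (this is not the TVM API; the mapping lambda and shapes are hypothetical examples) that remaps a row-major 8x8 buffer into a tiled (2, 8, 4) layout via the index map `(i, j) -> (i // 4, j, i % 4)`:

```python
# Pure-Python sketch of an IndexMap-style layout transformation.
# Illustrative only: it mimics the index remapping that a
# transform_layout-style primitive applies to a buffer.

def transform_layout(src, index_map, new_shape):
    """Copy a 2-D list `src` into a nested list laid out per `index_map`."""
    d0, d1, d2 = new_shape
    dst = [[[None] * d2 for _ in range(d1)] for _ in range(d0)]
    for i in range(len(src)):
        for j in range(len(src[0])):
            a, b, c = index_map(i, j)  # where element (i, j) lands
            dst[a][b][c] = src[i][j]
    return dst

# Example index map: tile the row dimension by 4 -> (i // 4, j, i % 4).
src = [[i * 8 + j for j in range(8)] for i in range(8)]
tiled = transform_layout(src, lambda i, j: (i // 4, j, i % 4), (2, 8, 4))

# Element (5, 3) of the original buffer now lives at (1, 3, 1).
assert tiled[1][3][1] == src[5][3]
```

The same idea generalizes to hardware-specific layouts such as texture memory, where the separated trailing axes map onto the physical dimensions of the storage.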
- `PackedFunc` into TVM Object System (#51) (2e0de6c)

Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: https://github.com/apache/tvm/compare/v0.8.0...v0.9.0.rc0.
- `conv2d_backward_weight` op (without topi)
- `PackedFunc` into TVM Object System
- `get_input_info` to graph_executor
- `access_ptr` rewriting, add a GPU test with depth 4

Apache TVM v0.8 brings several major exciting experimental features, including:
In addition, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:
Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.
The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:
- `tir.allocate` nodes
- Schedule primitives: `compute-inline`, `reverse-compute-inline`, `fuse`, `split`, `rfactor`, `storage-align`, `vectorize`, `unroll`, `bind`, `reorder`, `cache-read`, `cache-write`, `compute-at`, `reverse-compute-at`, `decompose-reduction` #8170 #8467 #8544 #8693 #8716 #8767 #8863 #8943 #9041
- `specialize` #8354
- `PointerType` #8017 #8366 #8463
- `set_output_zero_copy` in graph executor #8497
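To make the scheduling primitives above concrete, the pure-Python sketch below (illustrative only, not the TVM scheduling API) shows the iteration-space rewrites behind `split` and `fuse`: splitting a loop of extent 12 by a factor of 4 visits the same points in the same order, and fusing two nested loops reproduces the row-major order of the original nest:

```python
# Illustrative sketch of the loop transformations behind the `split`
# and `fuse` schedule primitives (not the TVM API itself).

def split(n, factor):
    # split: one loop of extent n -> outer loop of n // factor, inner of factor
    out = []
    for outer in range(n // factor):
        for inner in range(factor):
            out.append(outer * factor + inner)
    return out

def fuse(n, m):
    # fuse: two nested loops of extents n and m -> one loop of extent n * m
    out = []
    for fused in range(n * m):
        i, j = fused // m, fused % m
        out.append((i, j))
    return out

# split preserves the iteration order of the original loop of extent 12
assert split(12, 4) == list(range(12))
# fuse visits exactly the row-major order of the original 3x4 nest
assert fuse(3, 4) == [(i, j) for i in range(3) for j in range(4)]
```

In a real schedule these rewrites change how loop variables map to threads, vector lanes, or cache tiles without changing what is computed.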
LLVM backend: recover LLVM support on Windows; support target feature strings in function attributes; atomic support in NVPTX and ROCm; compatibility with LLVM 12+ #9305 #9223 #9138 #8860 #8958 #6763 #6698 #6717 #6738 #8293 #6907 #7051
ROCm 3.9 bitcode files search #6865
Vulkan and SPIR-V refactoring and major improvements in codegen and runtime. A critical bug fix in the SPIR-V codegen allows the Vulkan backend to produce correct outputs on more hardware and drivers. Added support for querying device-specific hardware parameters and capabilities, dynamic shapes, irregular ops such as sorting and NMS, UBO, fp16, and vectorization. We can now run complicated models like MaskRCNN on Vulkan end to end. #8904 #7833 #7717 #7681 #8746 #8813 #7609 #8882 #7607 #7591 #7574 #7572 #7833 #6662 #7969 #8013 #8048 #8098 #8102 #8107 #8127 #8151 #8196 #8320 #8588 #8332 #8333 #8348 #8528
Metal language version upgrade (`MTLLanguageVersion2_3`), better codegen support, int64 support, various bug fixes #7830 #7819 #7714 #7118 #7116 #7105 #7980 #8054 #8175 #8202 #8206 #8313
OpenCL, VTA, Verilator: refactored code generator, better error messages, various bug fixes #7834 #7777 #7761 #7100 #6125 #6126 #6191 #7834 #8256 #8257 #8731 #8756 #8973
CUDA: enable `__launch_bounds__`, dynamic shared memory, TensorCore, BF16, half2, NVCC version upgrade #9341 #8678 #7561 #7273 #7146 #7147 #7099 #7065 #7033 #7014 #7907 #7964 #9087 #8135 #8137 #8457 #8466 #8571
ARM: CMSIS-NN, Ethos-N #8653 #7628 #8951 #7506 #7443 #7858 #6982 #8795 #8806 #8833 #9147 #9159 #9160 #9162 #9163 #9167 #9209 #9386 #9387
Hexagon: build, compilation, model launcher, more target options and better runtime #7784 #6718 #8821 #8822 #9033 #8823 #8859 #8865 #8915 #8954 #9024 #9025 #8960 #8986 #9010 #9011 #9189 #9220 #9355 #9356
WASM: Update support for latest emcc, add ffi test. #6751
`--disable-pass` and `--config` options #7816 #8253
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Apache TVM (incubating) 0.6.1 is a maintenance release incorporating important bug fixes and important performance improvements. All users of Apache TVM (incubating) 0.6.0 are advised to upgrade. Please review the following release notes to learn about the bug fixes.
NOTE: This is a pre-Apache-incubation release.
This release features several major improvements. Some of the highlights are: an arbitrary-bit quantization algorithm, and a high-level auto-differentiable programming IR, Relay (NNVM v2).
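At its core, quantization to an arbitrary bit width maps float values onto a signed b-bit integer grid. The sketch below is a minimal illustration, not TVM's quantization pass; the max-abs scale heuristic is an assumption chosen for simplicity:

```python
def quantize(xs, bits, scale):
    """Uniform symmetric quantization to a signed `bits`-bit grid."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(qmin, min(qmax, round(x / scale))) for x in xs]

def dequantize(qs, scale):
    return [q * scale for q in qs]

xs = [0.5, -1.25, 3.0, -3.0]
# Simple max-abs heuristic for the scale (illustrative assumption).
scale = max(abs(x) for x in xs) / (2 ** (8 - 1) - 1)
qs = quantize(xs, 8, scale)

# Quantized values stay within the signed 8-bit range.
assert all(-128 <= q <= 127 for q in qs)
# Round-tripping introduces at most half a quantization step of error.
assert all(abs(x - y) <= scale / 2 for x, y in zip(xs, dequantize(qs, scale)))
```

Changing `bits` trades accuracy for storage and compute cost, which is what makes an arbitrary-bit scheme useful across different hardware targets.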
The community welcomes new reviewers @nishi-t @were @siju-samuel @jroesch @xqdan @zhiics @grwlf @ajtulloch @vinx13 @junrushao1994 @FrozenGene @liangfu, and new committers @srkreddy1238 @eqy @masahi @nhynes @phisiart @merrymercy @Laurawly @adityaatluri @Huyuwei
Code reviewers
Code contributions
NOTE: This is a pre-Apache-incubation release.
This release features several major improvements. The high-level graph optimizer is now part of the TVM repo. Some of the highlights are: initial support for AutoTVM for automated optimization, and the customized accelerator backend VTA. Please also check out tvm.ai for the latest blog posts.
The community welcomes new reviewers @kazum @alex-weaver @masahi @zhreshold @PariksheetPinjari909 @srkreddy1238 @eqy, new code owner @merrymercy, and new committer @yzhliu
See the complete list here. Thanks to all the contributors who contributed to this release.
Code reviewers
Compiler
TOPI, graph optimization
Frontends
Deploy
NOTE: This is a pre-Apache-incubation release.
This release features numerous improvements in TOPI and backends. We take the first step toward object detection support in TOPI, featuring operators necessary for YOLO and SSD. TOPI now supports a numpy-style API and operator overloading. RPC is significantly improved to support resource allocation and using a pool of devices. We are adding two new backends: WebGL for running GPUs in the browser, and Vulkan for running on the next-generation graphics API. Please also check out the TVM blog for the latest posts.
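The numpy-style API and operator overloading mean that expressions like `A + B * C` build a symbolic computation rather than computing eagerly. The toy sketch below illustrates how operator overloading can construct such an expression tree; it is purely illustrative and not TOPI's actual implementation:

```python
class Expr:
    """Toy symbolic tensor expression supporting numpy-style operators."""
    def __init__(self, op, args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Expr("add", [self, other])

    def __mul__(self, other):
        return Expr("mul", [self, other])

    def __repr__(self):
        if self.op == "var":
            return self.args[0]
        return f"{self.op}({', '.join(map(repr, self.args))})"

def var(name):
    return Expr("var", [name])

A, B, C = var("A"), var("B"), var("C")
expr = A + B * C  # overloading builds the tree; nothing is computed yet
assert repr(expr) == "add(A, mul(B, C))"
```

Deferring computation this way is what lets a compiler stack inspect, optimize, and lower the whole expression before any code runs.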
See the complete list here. Thanks to all the contributors who contributed to this release.
Code Reviewers
TOPI:
Compiler: