Apache MXNet (Incubating) Release Notes

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more

1.8.0

3 years ago

Features

CUDA Graphs

  • Enable CUDA Graphs for TRT (#19184)
  • CUDA graphs support (#19142)
  • Update cudnn version. (#19375)
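
A minimal sketch of enabling the feature, assuming the MXNET_ENABLE_CUDA_GRAPHS environment variable introduced by #19142 and a CUDA-enabled build; the network and shapes are illustrative only:

```python
# Hedged sketch: enable CUDA Graphs via the environment variable from #19142.
# The variable must be set before mxnet is imported.
import os
os.environ['MXNET_ENABLE_CUDA_GRAPHS'] = '1'

import mxnet as mx
from mxnet.gluon import nn

net = nn.Dense(10)
net.initialize(ctx=mx.gpu(0))
# Graph capture needs a static computation graph, so hybridize with
# static allocation and static shapes.
net.hybridize(static_alloc=True, static_shape=True)
out = net(mx.nd.ones((32, 64), ctx=mx.gpu(0)))
```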

CUDA 11 Support

  • Update CUB and include it only for CUDA < 11 #18799 (#18975)
  • Add new CI pipeline for building and testing with cuda 11.0. (#19149)
  • Enable CUDA 11.0 on nightly development builds (#19314)

TensorRT

  • TensorRT: add int8 with calibration (#19011)
  • Add TRT verbose mode (#19100)
  • Backporting TensorRT-Gluon Partition API (and TensorRT 7 support) (#18916)
  • Backport TRT test update #19296 (#19298)

OneDNN

  • Upgrade to oneDNN v1.6.3 (#19153) (#19161)
  • Update oneDNN to official v1.6 release (#18867)
  • Upgrade to oneDNN v1.6 (#18822)
  • bumped version to v1.6.5 (#19437)
  • Upgrade to oneDNN v1.7 (#19560)

IntGemm

  • Backport of intgemm #17559 (#19099)
  • Change intgemm to a submodule instead of fetch. (#19406)

Subgraph API

  • Backport Fix for duplicate subgraph inputs/outputs (#16131) (#19112)

Extensions

  • Backport #19103 (#19117)
  • Backporting #19016 (#19069)
  • Backport: Change Partition API's options_map to std::unordered_map #18929 (#18964)
  • Backporting #18779 to v1.x (#18894)
  • Backport extension bug fixes to v1.8.x (#19469) (#19504)
  • fix for MX_ERROR_MSG namespace (#19756)

ONNX

  • Update onnx support to work with onnx 1.7.0 with most CV models (#19017)
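
A hedged illustration of the import path this PR exercises; the model file name and the input name 'data' are hypothetical:

```python
# Minimal sketch of importing an ONNX model into MXNet's symbolic API.
import mxnet as mx
from mxnet.contrib import onnx as onnx_mxnet

sym, arg_params, aux_params = onnx_mxnet.import_model('resnet18-v1.onnx')
mod = mx.mod.Module(symbol=sym, data_names=['data'], label_names=None)
mod.bind(data_shapes=[('data', (1, 3, 224, 224))], for_training=False)
mod.set_params(arg_params, aux_params, allow_missing=True)
```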

Large Tensor

  • Fix linalg_potri and linalg_potrf operators for large tensor. (#18752)
  • Add forward, backward test for linalg.gemm2 (#18784)
  • Add large matrix tests for linalg ops: det, inverse, trsm, trmm (#18744)
  • Add Large Tensor Test for linalg_syrk (#18782)
  • Add Large Dim Checks for linalg Operators (#18816)
  • Add forward & backward linalg.gemm test for large size (#18825)
  • Adding error message when attempting to use Large tensor with linalg_syevd (#18807)

Website Improvements

  • v1.8 website patch (#19212)
  • Automate website artifacts uploading (#19244)

Documentation

  • Fix mxnet.test_utils.check_numeric_gradient documentation (#19060)
  • Update windows_setup.md (#18874)

License

  • Stop packaging GPL libquadmath.so (#19055)
  • Remove mention of nightly in pypi (#18635) (#18884)
  • Mkldnn header fix v1x for nightly binaries (#18797)
  • Update LICENSE for all submodules. (#19440)
  • LICENSE update (#19443)
  • Update LICENSE (#19704) (#19707)

CI Improvements

  • Upgrade unix gpu toolchain (#18186) (#18785)
  • Fix CI in v1.x branch (#18907)
  • Remove extra --build-arg causing docker command to fail. (#19412)
  • Fix CI builds failing due to invalid GPG keys. (#19377) (#19388)

Bug Fixes

  • Backport #19656 - fix R builds (#19658)
  • remove cleanup on side threads (#19557)
  • Don't use namespace for pow() function, since it is built into cuda math library, and cast the second argument so it will find an acceptable form. (#19533)
  • Remove temporary fix for RNN (#19451)
  • backport #19393 to v1.8.x (#19398)
  • Fix SoftReLU fused operator numerical stability (#17849) (#19390)
  • Temporary fix for RNN with oneDNN seg faults/core dumps (#19308)
  • Fix MKLDNN BatchNorm with even number of channels (#19150) #19299 #19425 (#19428)
  • Relaxing type requirements for broadcast_like (#17977) (#19448)
  • Backporting: Fixed setting attributes in reviewSubgraph (#19278)
  • Include oneDNN gemm fix (#19251)
  • Fix for breaking change introduced in #17123 when batch_axis=0 (#19283)
  • Backport PR #19272 to v1.8.x (#19273)
  • Backport PRs in v1.7.x missing from v1.x to v1.8.x (#19262)
  • Delete executor before reallocating it memory (#19222)
  • Nightly Large Tensor test cherrypicks (#19194) (#19215)
  • Tweaking syntax to be closer to other tests (#19186) (#19206)
  • ElementWiseSum fix for oneDNN (#18777) (#19200)
  • Fix flaky intgemm test in v1.8.x too (#19204)
  • Revert "Fix memory leaks in Gluon (#18328) (#18359)" (#19181)
  • Improve environment variable handling in unittests (#18424) (#19173)
  • Backport Unittest tolerance handling improvements (#18694). Also test seeding (#18762). (#19148)
  • Fix the error of gradient of np.pad (#19044) (#19167)
  • Backport Add cmake flag USE_FATBIN_COMPRESSION, ON by default (#19123) (#19158)
  • SymbolBlock.imports ignore_extra & allow_missing (#19156)
  • Fix race condition in NaiveEngine::PushAsync (#19108) (#19122)
  • Empty list cannot be cleared issue fixed. (#14882)
  • Update base_module.py (#19096)
  • Fix block.export (#17970) (#19075)
  • Support for fp16 in SpM x DnsM on GPU (#18930) (#19074)
  • Backport of Fix LeakyRelu behaviour on empty input (#18934) (#19009)
  • Get rid of monkey patching in LossScaler overflow handling (#18959) (#18973)
  • Remove upper bound (#18857) (#18910)
  • Fix gelu to use erf based algorithm (#18827) (#18946)
  • Cherry-pick #18635 to v1.7.x (#18935) (#18945)
  • Backporting backward inference from 2.x #18348 and #18378 (#18895)
  • Backport Invoke mkldnn and cudnn BatchNorm when axis != 1 to v1.7.x (#18676) (#18890)
  • Bump version to 1.8.0 (#18899)
  • Fixing ONNX spatial export for batchnorm (#17711) (#18846)
  • Fix softmax, logsoftmax failed on empty ndarray (#18602) (#18708)
  • Add unit tests for potri and potrf backward and check output shape in unit tests. (#18803)
  • Add syrk test shape check (#18812)
  • Back port optimization to broadcast_axis to MXNet1.x (#18773)
  • Fix crash when accessing already destructed static variables (#18768) (#18778)
  • Cherrypick #18677 #18713 (#18742)

v2.0.0.alpha.rc3

3 years ago

v2.0.0 Alpha RC3

v2.0.0.alpha.rc2

3 years ago

v2.0.0.alpha.rc0

3 years ago

v2.0.0.alpha.rc1

3 years ago

1.7.0

3 years ago

New features

MXNet Extensions: custom operators, partitioning, and graph passes

Adds support for extending MXNet with custom operators, partitioning strategies, and graph passes. All are implemented in a library that is compiled separately from the MXNet codebase and dynamically loaded at runtime into any prebuilt installation of MXNet.
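
A minimal sketch of how such an extension is used; the library file name and operator name below are hypothetical, while mx.library.load is the loading entry point described above:

```python
# Hedged sketch: load a compiled extension library at runtime.
import mxnet as mx

# Registers any custom operators, partitioners, and graph passes in the library.
mx.library.load('libcustomop_lib.so')

# A custom operator then behaves like any built-in operator (name is illustrative).
a = mx.nd.ones((2, 3))
b = mx.nd.ones((3, 2))
out = mx.nd.my_gemm(a, b)
```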

  • fix for number of inputs/outputs for backward custom ops (#17069)
  • Enhancements for custom subgraph op (#17194)
  • Disable flaky test_custom_op_fork (#17481)
  • fix custom op makefile (#17516)
  • Update CustomOp doc with changes for GPU support (#17486)
  • [WIP] MXNet Extensions enhancements (#17885) (#18128)
  • Dynamic subgraph property (#17034)
  • Dynamic subgraph property doc (#17585)
  • [1.7] Backport MXNet Extension PRs (#17623, #17569, #17762) #18063 (#18069)

OpPerf utility enabled in the binary distribution

  • [OpPerf] Add Neural network loss ops (#17482)
  • [OpPerf] Fixes the issue when you pass NDArray to run_perf_test (#17508)
  • [OpPerf] Fix markdown for native profile and add profile param in function desc (#17494)
  • [OpPerf] Add Indexing ops (#16253)
  • [OpPerf] Implement remaining random sampling ops (#17502)
  • [OpPerf] Implement remaining GEMM ops (#17501)
  • [OpPerf] Implement all linalg ops (#17528)
  • [OpPerf] Fixed native output ordering, added warmup & runs command line args (#17571)
  • [OpPerf] Add norm, cast ops, remaining optimizer ops (#17542)
  • [Large Tensor] Fixed Embedding op (#17599)
  • [OpPerf] Fixed Python profiler bug (#17642)

MKL-DNN

MKL-DNN as the default CPU backend in binary distribution

Branding change to DNNL

  • Upgrade MKL-DNN dependency to v1.1 (#16823)

Support bfloat16 datatype

  • Add bfloat16 floating-point format support based on AMP (#17265)
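
A minimal sketch, assuming AMP accepts 'bfloat16' as a target dtype after #17265 (the bfloat16 path primarily targets CPU/MKL-DNN); not a definitive recipe:

```python
# Hedged sketch: initialize AMP with bfloat16 instead of the default float16.
from mxnet.contrib import amp

amp.init(target_dtype='bfloat16')  # cast-eligible float32 ops may run in bfloat16
```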

New operators

  • [New Op] Add deformable conv v2 (#16341)
  • Add MXNet Ops for fast multihead attention (#16408)
  • Support boolean elemwise/broadcast binary add, multiply and true_divide (#16728)
  • add gammaln, erf, erfinv (#16811)
  • add aligned roi introduced in Detectron2 (#16619)
  • Implement atleast_1d/2d/3d (#17099)
  • Interleaved MHA for CPU path (#17138)
  • Lamb optimizer update (#16715)
  • Quantized Embedding (#16691)
  • Add gelu fuse ops (#18082) (#18092)

Feature improvements

NumPy compatible interface (experimental)

  • [NumPy] NumPy support for linalg.inv (#16730)
  • add numpy op nan_to_num (#16717)
  • [Numpy] Add sampling method for bernoulli (#16638)
  • Fix numpy-compatible mean output type for integer inputs (#16792)
  • [Numpy] Fix collect_params().zero_grad() in gluon numpy interface (#16716)
  • [Numpy][Operator] 'where' Implementation in MXNet (#16829)
  • [Numpy] Random.normal() with backward (#16330)
  • Add OP diag [numpy] (#16786)
  • Mixed precision binary op backward (use in) for numpy (#16791)
  • add numpy op diagflat [numpy] (#16813)
  • add op bitwise_or [numpy] (#16801)
  • [Numpy] Implementation npx.{sample}n (#16876)
  • [Numpy] Add NumPy support for np.linalg.det and np.linalg.slogdet (#16800)
  • Op Unravel_index PR [Numpy] (#16862)
  • [Numpy] Fix imperative basic indexing in numpy (#16902)
  • [Numpy] Basic indexing in symbolic interface of DeepNumpy (#16621)
  • [Numpy] add op full_like, c++ impl, fix zeros_like, ones_like type inference (#16804)
  • [Numpy] Implement numpy operator 'average' (#16720)
  • [Bugfix] [Numpy] Add kAddTo and kNullOp to Transpose (#16979)
  • set rtol = 1e-2 and atol = 1e-4 when dtype == np.float32 in test_numpy_op.py:test_np_linalg_solve (#17025)
  • Op_Diagonal [Numpy] (#16989)
  • numpy bincount (#16965)
  • [numpy] add op bitwise_not (#16947)
  • [Numpy] Modify np.random.shuffle to enable inplace by default (#17133)
  • [numpy] fix argsort typo (#17150)
  • [numpy] add op round (#17175)
  • [numpy] Add op delete (#17023)
  • [numpy] add op flipud, fliplr (#17192)
  • [CI] Re-enable testing with numpy 1.18 (#17200)
  • [Numpy] Add broadcast_to scalar case (#17233)
  • [Numpy] Random.gamma() implemented (#16152)
  • [Numpy] add row_stack (=vstack) (#17171)
  • [Numpy] Add infra for performing constraint check (#17272)
  • porting numpy-compatible hstack to master and add dstack for interoperability (#17030)
  • adding asnumpy() to output of gather(implicitly called) to fix gather test in large vector and tensor tests (#17290)
  • [numpy] add op random.exponential (#17280)
  • [NumPy] Add NumPy support for norm (#17014)
  • [numpy] add op random.lognormal (#17415)
  • Add numpy random weibull operator (#17505)
  • [numpy] Add np.random.pareto and np.random.power (#17517)
  • [Numpy] Add sort op (#17393)
  • [numpy] implement exponential backward (#17401)
  • [Numpy] Where operator scalar version (#17249)
  • [numpy] add op matmul (#16990)
  • [numpy] add op random.logistic, random.gumbel (#17302)
  • [numpy][Do Not Review] add op insert (#16865)
  • [numpy] add op random.rayleigh (#17541)
  • [numpy] add fallback ops (#17609)
  • [numpy] add op pad (#17328)
  • [numpy] add op fabs, sometrue, round (#17619)
  • Add arange_like to npx (#16883)
  • try to move shape_array to npx (#16897)
  • support np.argsort (#16949)
  • np.broadcast_to extension (#17358)
  • support bitwise_and (#16861)
  • fix np.argmax/argmin output data type (#17476)
  • add op random.beta (#17390)
  • add op isnan isinf (#17535)
  • array_split pr (#17032)
  • Mixed data type binary ops (#16699)
  • randn implemented (#17141)
  • refactor and reduce float types for some functions, also add bitwise_xor (#16827)
  • any/all (#17087)
  • amax (#17176)
  • fix format (#17100)
  • add op empty_like, add nan_to_num to dispatch (#17169)
  • handle array_like fill_value for np.full; add unit test coverage (#17245)
  • add np.amin (#17538)
  • add npx.gather_nd (#17477)
  • add np.random.chisquare (#17524)
  • add polyval (#17416)
  • add isposinf isneginf isfinite (#17563)
  • Support broadcast assign for npi_boolean_mask_assign_tensor (#17131)
  • Implement Weibull backward (#17590)
  • support np.dsplit, fix some error msgs and corner cases for hsplit and vsplit, add interoperability tests for h/v/dsplit (#17478)
  • add np.product (#17489)
  • Implement np.random.pareto backward (#17607)
  • add np.ediff1d (#17624)
  • more support for boolean indexing and assign (#18352)
  • Fix einsum gradient (#18482)
  • [v1.7.x] Backport PRs of numpy features (#18653)
  • [v1.7.x] backport mixed type binary ops to v1.7.x (#18649)
  • revise activations (#18700)

Large tensor support

  • [Large Tensor] Add support to Random Sample & Pdf ops (#17445)
  • [Large Tensor] Add LT support for NN optimizers and 1 activation function (#17444)
  • [Large Tensor] Fixed SoftmaxActivation op (#17634)
  • [Large Tensor] Fixed col2im op (#17622)
  • [Large Tensor] Fixed Spatial Transformer op (#17617)
  • [Large Tensor] Fix ravel_multi_index op (#17644)
  • Sparse int64 Large tensor support (#16898)
  • Re-Enabling Large Tensor Nightly on GPU (#16164)
  • enabling build stage gpu_int64 to enable large tensor nightly runs (#17546)

MKL-DNN enhancement

  • MKLDNN FC : Add error info when mkldnn fc bias dimension is wrong (#16692)
  • [MKLDNN] support mkldnn gelu (#16710)
  • [MKLDNN] Fix int8 convolution/fc bias overflow (#16734)
  • [MKLDNN] use dim_t instead of int in slice/transpose operators (#16737)
  • Mkldnn fullyConnect bwd bug fix (#16890)
  • Revert Mkldnn fullyConnect bwd bug fix (#16890) (#16907)
  • [MKLDNN] Use MKLDNNRun (#16772)
  • [MKLDNN] mkldnn RNN operator enhancement (#17075)
  • [MKLDNN] enable MaxPooling with full pooling convention (#16860)
  • update mkldnn to v1.1.2 (#17165)
  • improve mkldnn doc (#17198)
  • [MKLDNN] Fix _copyto (#17173)
  • [MKLDNN] Support channel wise quantization for FullyConnected (#17187)
  • fixed seed for mkldnn test (#17386)
  • add mkldnn softmax backward (#17170)
  • cmake: copy dnnl headers to include/mkldnn (#17647)
  • [mkldnn] Mkldnn bn opt backport from master to 1.7x (#18009)
  • [v1.x] Update 3rdparty/mkldnn remote URL and pin to v1.3 (#17972) (#18033)
  • [v1.x] backport #17900 [MKLDNN] support using any format in pooling backward (#18067)
  • Static link MKL-DNN library (#16731)
  • Add large tensor nightly tests for MKL-DNN operators (#16184)
  • [MKL-DNN] Enable and Optimization for s8 eltwise_add (#16931)
  • [MKL-DNN] Enhance Quantization Method (#17161)
  • Static Build and CD for mxnet-cu102/mxnet-cu102mkl (#17074)
  • MKL-DNN RNN backward path enhancement (#17183)
  • cmake: check USE_OPENMP and pass proper MKL-DNN build flags (#17356)
  • update mkl to 2020.0 (#17355)
  • Enable MKL-DNN by default in pip packages (#16899)
  • Enable MKL-DNN FullyConnected backward (#17318)
  • Softmax primitive cache and in-place computation (#17152)
  • boolean_mask_assign with start_axis (#16886)
  • use identity_with_cast (#16913)
  • change error tolerance for bf16 bn (#18110)
  • [v1.x] Backport #17689 and #17884 to v1.x branch (#18064)
  • refactor codes and add an option to skip/check weight's version to reduce overhead (#17707) (#18039)
  • [v1.x] Backport #17702 and #17872 to v1.x branch (#18038)

TensorRT integration

  • Update TensorRT tutorial to build-from-source. (#14860)
  • Minor fix, use RAII for TensorRT builder and network object (#17189)

Quantization

  • Add silent option to quantization script (#17094)

Profiler

  • Implemented final two binary ops, added default params for functionality (#17407)
  • Implement remaining nn_activation ops in opperf (#17475)
  • Implement all miscellaneous ops (#17511)
  • Implement remaining nn_basic ops in opperf (#17456)

ONNX

  • Fix memory leak reported by ASAN in NNVM to ONNX conversion (#15516)
  • ONNX export: Gather (#15995)
  • ONNX export: Slice op - Handle None value for ends (#14942)

New models

  • [Model] Implement Neural Collaborative Filtering with MXNet (#16689)
  • Further optimization for NCF model (#17148)
  • HMM Model (#17120)

Operator improvements

  • Faster GPU NMS operator (#16542)
  • [MXNET-1421] Added (CuDNN)BatchNorm operator to the list of mirrored operators (#16022)
  • dynamic custom operator support (#15921)
  • Multi Precision Lamb Update operator (#16885)
  • Add im2col and col2im operator (#16502)
  • Quantized Elemwise Mul Operator (#17147)
  • Enhancements for MXTensor for custom operators (#17204)
  • Enabling large tensor support for binary broadcast operators (#16755)
  • Fix operators lying about their number of inputs (#17049)
  • [WIP] Fallback mechanism for mx.np operators (#16923)
  • Dynamic custom operator GPU support (#17270)
  • Fix flaky - test_operator_gpu.test_np_insert (#17620)
  • MXNet FFI for Operator Imperative Invocation (#17510)
  • [MXNET-978] Higher Order Gradient Support logp1, expm1, square. (#15416)
  • [MXNET-978] Higher Order Gradient Support arcsin, arccos. (#15515)
  • [MXNET-978] Higher Order Gradient Support rsqrt, rcbrt. (#15476)
  • gather_nd: check bound and wrap negative indices (#17208)
  • Remove dilation restriction for conv3d (#17491)
  • Fix storage type infer of softmax backward (#17576)
  • Fix and optimize handling of vectorized memory accesses (#17767) (#18113)
  • Cherry-pick of #17995 and #17937 to 1.x branch (#18041)
  • No tensor cores for fp32 interleaved attention, remove div by 8 restriction (#17994) (#18085)
  • GPU gemms true fp16 (#17466) (#18023)
  • Add support for boolean inputs to FusedOp (#16796)

Bug fixes

  • [BUG FIX] Always preserve batch dimension in batches returned from dataloader (#16233)
  • Fix SliceChannel Type inference (#16748)
  • change _generate_op_module_signature get_module_file open with encoding=utf-8,it fix some encode error in Chinese windows system. (#16738)
  • Fix rtrue_divide grad (#16769)
  • fix inv test flakiness using random matrices generated by SVD (#16782)
  • [MXNET-1426] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (#16234)
  • Fix (#16781)
  • fix expand_dims fall back when input's ndim is 0 (#16837)
  • [fix] missing input log higher order. (#15331)
  • Fix IndentationError in setup.py (#16857)
  • Fix a few np issues (#16849)
  • Fix InferAttr/InferShapeAttr not calling inference for all nodes in a graph (#16836)
  • fix for enable model parallelism for non-fp32 data (#16683)
  • Fix NDArrayIter iteration bug when last_batch_handle='pad' (#16166)
  • Fix crashing on Windows in ObjectPool ~ctor (#16941)
  • Fix NDArrayIter cant pad when size is large (#17001)
  • fix axis=-1 bug (#17016)
  • Fix CUDNN detection for CMake build (#17019)
  • Fix omp assert issue (#17039)
  • mshadow: fix vector access (#17021)
  • [BUGFIX] Fix race condition in kvstore.pushpull (#17007)
  • [BUGFIX] Fix trainer param order (#17068)
  • [BugFix] fix filter channel calculation in ModulatedDeformableConvV2 (#17070)
  • Fix reshape interoperability test (#17155)
  • fix norm sparse fallback (#17149)
  • fix py27 quantization (#17153)
  • fix int8 add ut (#17166)
  • Fix and clean up Ubuntu build from source instructions (#17229)
  • fix lstm layer with projection save params (#17266)
  • Fix rendering of ubuntu_setup.md codeblocks (#17294)
  • Fix #17267, add expected and got datatype for concat error msgs (#17271)
  • [BUGFIX] fix model zoo parallel download (#17372)
  • fix use int8, uint8, int32, int64 (#17188)
  • [Fix] Add ctx to the original ndarray and revise the usage of context to ctx (#16819)
  • Fix ndarray indexing bug (#16895)
  • fix requantize flaky test (#16709)
  • Initial checkin (#16856)
  • Fix flakey test_ndarray.py:test_reduce (#17312)
  • fix flaky test: boolean index and fix bugs (#17222)
  • Fix IOT Devices section of Get Started page (#17326)
  • add logic for no batch size while getting data arrays from executors (#17772) (#18122)
  • Fix reverse shape inference in LayerNorm (#17683)
  • fix full and full_like when input is boolean (#17668)
  • Fix MBCC inference (#17660)
  • Additional fix for vector access. (#17230)
  • Cherrypick Fix nightly large_vector test caused by incorrect with_seed path (#18178) (#18220)
  • [1.7] Pass args fix3 (#18237)
  • fixing batch_norm and layer_norm for large tensors (#17805) (#18261)
  • [1.7.x] Backport of LSTM and GRU fix (#17898) and RNN op (#17632) (#18316)
  • [v1.7.x] backport #18500 - [Bug Fixed] Fix batch norm when grad_req is add (#18517)
  • Fix the monitor_callback invalid issue during calibration with variable input shapes (#18632) (#18703)

Front end API

  • Fix the problem in printing feature in c++ API examples : feature_extract (#15686)
  • updating MXNet version to 1.6.0 in base.h for C APIs (#16905)
  • [API] unified API for custom kvstores (#17010)
  • fix parameter names in the estimator api (#17051)
  • adding docs for 64bit C APIs of large tensor (#17309)
  • Add API docs to INT64 APIs (#16617)

Gluon

  • [Quantization] Enhance gluon quantization API (#16695)
  • [Gluon] Improve estimator usability and fix logging logic (#16810)
  • Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (#16834)
  • [Gluon] Update contrib.Estimator LoggingHandler to support logging per batch interval (#16922)
  • Include eval_net the validation model in the gluon estimator api (#16957)
  • Fix Gluon Estimator nightly test (#17042)
  • [MXNET-1431] Multiple channel support in Gluon PReLU (#16262)
  • Fix gluon.Trainer regression if no kvstore is used with sparse gradients (#17199)
  • refactor gluon.utils.split_data() following np.array_split() (#17123)
  • Add RandomApply in gluon's transforms (#17242)
  • Partitioning Gluon HybridBlocks (#15969)
  • Random rotation (#16794)
  • bump up atol for gradient check (#16843)
  • Extend estimator.evaluate() to support event handlers (#16971)
  • [MXNET-1438] Adding SDML loss function (#17298)

Symbol

  • Add unoptimized symbol to executor for sharing (#16798)
  • Enforces NDArray type in get_symbol (#16871)
  • Fix #17164 symbolblock with BatchNorm inside during cast to fp16 (#17212)
  • autograd video and image link fixes and removing symbol tutorials (#17227)
  • Fix CosineEmbeddingLoss in when symbol API is used (#17308)
  • Fix Horovod build error due to missing exported symbols (#17348)
  • Update symbol.py (#17408)
  • update symbol to json (#16948)

Language Bindings

Python

  • Python 2 compatibility fix in base.py
  • adding stacktrace in Jenkinsfile_utils.groovy to inspect Python2 failure cause in CI (#17065)
  • Fix image display in python autograd tutorial (#17243)
  • Fix Python 3 compatibility in example/speech_recognition (#17354)
  • Stop testing Python 2 on CI (#15990)
  • Docs: Python tutorials doc fixes (#17435)
  • pin python dependencies (#17556)
  • Python 2 cleanup (#17583)

C/C++

  • Simplify C++ flags (#17413)

R

  • fix R docs (#16733)
  • [R package] Make R package compilation support opencv 4.0 (#16934)
  • Support R-package with cmake build and fix installation instructions (#17228)
  • Fix R-package/src/Makevars for OpenCV 4 (#17404)
  • Fix typo in Install the MXNet Package for R (#17340)

Clojure

Julia

  • [MXNET-1440] julia: porting current_context (#17142)
  • julia: porting context.empty_cache (#17172)
  • pin Markdown version to 3.1 in Julia doc build (#17549)

Perl

  • [Perl] - ndarray operator overloading enhancements (#16779)
  • MXNET-1447 [Perl] Runtime features and large tensor support. (#17610)

Scala

  • Fix scala publish & nvidia-docker cublas issue (#16968)
  • Fix publishing scala gpu with cpu instance (#16987)
  • swap wget to curl in Scala scripts (#17041)
  • [Scala/Java] Remove unnecessary data slicing (#17544)
  • quantile_scalar (#17572)
  • Fix get_started scala gpu (#17434)
  • Fix MBCC & scala publish pipeline (#17643)
  • Bump up additional scala 1.x branch to 1.7.0 (#17765)

Performance improvements

  • Build.py improvement (#16976)
  • Improvements to config.cmake (#17639)
  • [Done] BilinearResize2D optimized (#16292)
  • Speed fused_op compilation by caching ptx and jit-compiled functions (#16783)
  • Improve the speed of the pointwise fusion graph pass (#17114)
  • broadcast_axis optimization (#17091)
  • Optimize AddTakeGrad Tensor Sum (#17906) (#18045)

Example and tutorials

  • Add CustomOp tutorial doc (#17241)
  • Correct the grammar in 1-ndarray tutorial (#17513)

Website and documentation

  • Website edits (#17050)
  • [Website 2.0] Nightly Build for v1.x (#17956)
  • [docs] Fix runtime feature detection documentation (#16746)
  • Adding user guidelines for using MXNet built with Large Tensor Support (#16894)
  • fix typo and doc (#16921)
  • large tensor faq doc fix (#16953)
  • [DOC] Add a few tips for running horovod (#17235)
  • Update NOTICE to fix copyright years (#17330)
  • [DOC] Fix tutorial link, and better error msg (#17057)
  • doc fix for argmax & argmin (#17604)

CI/CD

  • support mixed-precision true_divide (#16711)
  • Try to fix CI (#16908)
  • mixed precision for power (#16859)
  • Fix desired precision for test_ndarray.py:test_reduce (#16992)
  • [reproducibility] multi_sum_sq review, AtomicAdd removal (#17002)
  • fix precision problem in linalg_solve, linalg_tensorinv, linalg_cholesky op test (#16981)
  • grouping large array tests based on type and updating nightly CI function (#17305)
  • [LICENSE] fix cpp predict license (#17377)
  • [CI] Fix static build pipeline (#17474)
  • skipping tests that cannot fit in nightly CI machine corrected imports (#17450)
  • Update Windows CI scripts to use syntax compatible with Win 2019 server powershell. (#17526)
  • Fix Non-ASCII character in docstring (#17600)
  • [CI] Follow redirects when downloading apache-maven-3.3.9-bin.tar.gz (#17608)
  • [CI] Upgrade sphinx and autodocsumm (#17594)
  • Reduce load on CI due to excessive log flood (#17629)
  • Enable users to specify BLAS (#17648)
  • [CI] Add AMI id to instance info on builds (#17649)
  • [v1.7.x] Backport staggered CI builds (#17999 & #18119) (#18142)
  • [v1.7.x] Backport #17177 to 1.7.x (Fix incorrect calculation results when the C locale is set to a locale that uses commas as the decimal separator) (#18147)
  • Fix formatting and typos in CD README.md (#16703)
  • [CD] dynamic libmxet pipeline fix + small fixes (#16966)
  • [CD] enable s3 publish for nightly builds in cd (#17112)
  • [CD] fix CD pipeline (#17259)
  • [CD] update publish path (#17453)
  • fix CD and remove leftover from #15990 (#17551)
  • Fix nightly build (#16773)
  • Update pypi_publish.py to disable nightly build upload to Pypi (#17082)
  • [v1.7.x] update jetson dockerfile to support CUDA 10.0 (#18339)
  • Remove manually created symbolic link to ninja-build (#18437) (#18456)
  • Increase staggered build timeout to 180 min (#18568) (#18585)

License

  • Don't relicense FindCUDAToolkit.cmake (#17334)
  • fix license and copyright issues (#17364)
  • Update ps-lite LICENSE (#17351)
  • remove unused file with license issue (#17371)
  • Update LICENSE for fonts (#17365)
  • license np_einsum file under bsd (#17367)
  • Update Apache License for mshadow (#18109) (#18134)

Miscellaneous changes

  • Link fixes4 (#16764)
  • Refactoring names for mxnet version of nnvm to avoid conflicting with the original tvm/nnvm. (#15303)
  • minor typo fix (#17008)
  • Add micro averaging strategy to pearsonr metric (#16878)
  • introduce gradient update handler to the base estimator (#16900)
  • fix latency calculation and print issue (#17217)
  • add inference benchmark script (#16978)
  • change the wording and log level to be more in line with the general use (#16626)
  • Updated logos. (#16719)
  • Pinning rvm version to satisfy Jekyll build (#18016)
  • Workaround gnu_tls handshake error on Ubuntu 14.04 Nvidia Docker (#18044)

How to build MXNet

Please follow the instructions at https://mxnet.incubator.apache.org/get_started

List of submodules used by Apache MXNet (Incubating) and when they were updated last

Name Commit-id Last update in MXNet Last update in module
dlpack 3efc489 Jan 20, 2020 Feb 16, 2020
dmlc-core b3a4c71 Dec 10, 2019 Apr 25, 2020
googletest eb9225c Jan 14, 2019 Apr 16, 2020
mkldnn 07579e6 Mar 31, 2020 Apr 24, 2020
nvidia_cub c3cceac Feb 16, 2018 Jul 17, 2019
onnx-tensorrt f4745fc Jul 12, 2019 Apr 23, 2020
openmp b76842e Jul 18, 2019 Oct 15, 2019
ps-lite f601054 Jan 24, 2020 Feb 28, 2020
tvm 9bd2c7b Jan 23, 2020 Apr 26, 2020

v0.9.2

3 years ago

WARNING: THIS IS NOT AN APACHE SOFTWARE FOUNDATION RELEASE OF MXNET AS IT PREDATES MXNET JOINING THE APACHE SOFTWARE FOUNDATION

v0.9.1

3 years ago

WARNING: THIS IS NOT AN APACHE SOFTWARE FOUNDATION RELEASE OF MXNET AS IT PREDATES MXNET JOINING THE APACHE SOFTWARE FOUNDATION

1.6.0

4 years ago

Deprecation of Python 2

The MXNet community voted to no longer support Python 2 in future releases of MXNet. Therefore, the MXNet 1.6 release will be the last MXNet release to support Python 2.

New features

NumPy compatible interface and using TVM to generate operators

NumPy has long been established as the standard math library in Python, the most prevalent language in the deep learning community. With this library as the cornerstone, the largest ecosystem and community for scientific computing has grown around it. NumPy's popularity comes from its flexibility and generality.

In #14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience and committed to a major endeavor of providing NumPy-compatible operators.

The primary goal of the projects below is to provide NumPy's usability and expressiveness in MXNet to facilitate deep learning model development. This not only helps existing deep learning practitioners but also gives the existing NumPy community a shortcut for getting started in deep learning. These efforts also serve a secondary goal: enabling the existing NumPy ecosystem to utilize GPUs and accelerators to speed up large-scale computation.
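
A minimal sketch of the resulting mx.np/mx.npx interface (the array values are illustrative):

```python
# Hedged sketch: NumPy-compatible arrays and operators in MXNet.
from mxnet import np, npx

npx.set_np()  # switch to NumPy-compatible semantics (shapes, dtypes, indexing)

x = np.arange(6).reshape(2, 3)
y = np.ones((2, 3))
print((x * y).sum(axis=1))  # NumPy-style broadcasting and reductions
print(np.dot(x, y.T))       # linear algebra on mxnet.numpy ndarrays
```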

  • Infra to use tvm write op kernels (#15550)
  • fix boolean_mask for 0-size output (#15731)
  • fix tvm cmake (#15781)
  • Numpy-compatible Infra (#15581)
  • [MXNET-1206] Support NDArray indexing with None and Ellipsis (#13143)
  • numpy-compatible sum (#15810)
  • [Numpy] Numpy compatible slicing (#15798)
  • Numpy Tensordot and Dot Operator (#15820)
  • numpy linspace (#15852)
  • tvm infra for op attrs (#15854)
  • Port several np ops to master (#15867)
  • numpy-compatible split upstream (#15841)
  • Numpy-compatible concatenate upstream (#15894)
  • Numpy-compatible stack upstream (#15842)
  • [Numpy] Numpy behavior random.uniform() (#15858)
  • Tvm broadcast backward (#15938)
  • np elemwise unary ops upstream (#15831)
  • [Numpy] random.randint() implemented (#15956)
  • Refines NDArray indexing and adds numpy ndarray indexing [READY FOR REVIEW] (#15942)
  • Port ops from np branch (#16018)
  • numpy-compatible cumsum upstream (#15924)
  • NumPy-compatible infrastructure on Gluon (#16024)
  • [OP] Support range as advanced index for ndarrays (#16047)
  • Numpy compatible max min (#16046)
  • NumPy-compatible Mean, Std and Var (#16014)
  • Add fluent methods mean, std, var for ndarray (#16077)
  • numpy multinomial op (#15878)
  • add numpy operator remainder (#16080)
  • [Numpy] Random.choice implemented (#16089)
  • Fix sample.normal shape inference
  • Numpy add numpy op indices (#15837)
  • [Numpy] Numpy copysign (#15851)
  • numpy operator ravel, derive from reshape (#16016)
  • Add __array_function__
  • Improved error messages
  • Fix np.choice
  • add exception check for numpy reshape (#16180)
  • [Numpy] Numpy behavior normal distribution (#16109)
  • fix multinomial bug on gpu (#16204)
  • [Numpy] Differentiable svd (#15795)
  • add epsilon to sum(pvalue) upperbound (#16211)
  • np compatible vstack (#15850)
  • Numpy add numpy op roll (#15902)
  • add numpy compatible trace (#16008)
  • add numpy op hanning, hamming, blackman (#15815)
  • [Numpy]flip (#15819)
  • numpy operator around (#16126)
  • numpy operator arctan2 (#15890)
  • numpy operator nonzero (#15838)
  • numpy operator hypot (#15901)
  • tvm numpy operator deg2rad && rad2deg (#16015)
  • numpy op unique
  • try to fix bug
  • fix memory bug and disable some test
  • fix according to review
  • Numpy operators: lcm, tril, identity and take (#16264)
  • [numpy] Cosmetic improvement on mxnet.numpy builtin op signature in documentation (#16305)
  • Disable Pylint false error in numpy_op_signature (#16370)
  • boolean_mask_assign operator for future boolean indexing (#16361)
  • Implements ldexp. (#15845)
  • Numpy Operators: Inner, Outer, vdot (#15846)
  • Numpy det and slogdet operators (#15861)
  • Fix random op signature
  • fix choice signature
  • add raise test for shape
  • Add boolean ndarray (#15940)
  • global numpy shape flag (#16335)
  • numpy-compatible histogram (#16266)
  • [Numpy] Numpy compatible dstack (#15871)
  • numpy eye op (#16132)
  • Numpy compatible vsplit; minor changes to split (#15983)
  • add numpy op logspace (#15825)
  • add numpy op bitwise_xor, hsplit, moveaxis, rot90 (#16257)
  • Fix optimizer bug for np attribute (#16494)
  • Tests of NumPy interoperability (#16469)
  • improve unary and binary operator handling and refactor tests (#16423)
  • [DOC] Fix numpy op doc (#16504)
  • [Numpy] More numpy dispatch tests (#16426)
  • [Numpy] einsum (#15911)
  • Add test pipeline for USE_TVM_OP=OFF on Unix (#16450)
  • Numpy dispatch test of ...... (#16422)
  • setup and concatenate, copy, expand_dims, expm1 (#16493)
  • add sum for boolean type in mainline (#16436)
  • [Numpy] SVD outputs tuple (#16530)
  • numpy op doc: max, min, prod (#16506)
  • add interface for rand
  • Fix numpy bugs (#16537)
  • pickler override for np ndarrays (#16561)
  • [numpy]op test in new pattern (#16556)
  • Enforce adding documentation for builtin numpy operators (#16575)
  • [Numpy] Support N_D(N>=3) batch_dot (#16586)
  • [Numpy] Loading numpy-incompatible NDArray in numpy-compatible mode (#16597)
  • Fix index overflow bug in einsum (#16589)
  • add npx reshape (#16640)
  • add type switch to weight tensor (#16543)
  • numpy doc enhancement (#16637)
  • Infra for tvm op runtime dispatch (#16100)
  • [NumPy][Operator] NumPy operator may_share_memory and shares_memory (#16533)
  • [Numpy] Numpy operator diff (#15906)
  • Miscellaneous fix for several numpy issues (#16664)
  • [Numpy] implement np.column_stack (#16594)
  • [numpy] add numpy operator : append (#16564)
  • Backport of #16711, #16737, #16408 to 1.6 branch (#16763)
  • Backport to 1.6 (#16773, #16781, #16783, #16716, #16699, #16728, #16769, #16792) (#16832)
  • [Backport][v1.6.x] Fix the wrong result of sum, mean, argmin, argmax when inputs contain inf or nan (#16884)
  • Backport of #16827, #16791 and #16888 to 1.6 branch (#16901)
  • port shape op to 1.6.x (#16912)
  • [Numpy] Fix imperative basic indexing in numpy (#16902) (#16919)
  • Backport #16895, #16922, #16878, #16979 and #16900 to 1.6 (#17029)

Graph optimizations

Pointwise fusion for GPU

DL models, besides compute-intensive operations like convolutions and fully connected layers, feature a lot of simple pointwise (aka elementwise) operations, such as elementwise addition. Performance of those operations is fully memory-bandwidth bound, which limits the speedups available from newer GPU hardware with its typically high compute-to-memory-bandwidth ratio. When several such operations are chained one after another, the result is a series of unnecessary stores and loads, as well as potentially increased memory usage to hold the intermediate results. Pointwise fusion alleviates those problems through just-in-time generation of fused operators that do not store intermediate results in memory, improving both performance and memory usage.
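
A minimal sketch of a fusion-friendly network, assuming a GPU build with the fusion pass available (the MXNET_USE_FUSION environment variable toggles it):

```python
# Hedged sketch: chained pointwise ops that the fusion pass can combine
# into a single generated kernel on GPU.
import mxnet as mx
from mxnet.gluon import nn

class Pointwise(nn.HybridBlock):
    def hybrid_forward(self, F, x):
        # Three chained elementwise ops: a candidate for one fused kernel.
        return F.relu(x * 2.0 + 1.0)

net = Pointwise()
net.initialize(ctx=mx.gpu(0))
net.hybridize(static_alloc=True)  # fusion operates on the hybridized graph
y = net(mx.nd.ones((1024, 1024), ctx=mx.gpu(0)))
```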

  • Pointwise fusion for GPU (#15167)
  • Backport #16798, #16836 and #16838 to 1.6 (#16874)
  • Add support for boolean inputs to FusedOp (#16796) (#16892)
  • Workaround problem with fusion in CUDA 9 (#17028) (#17035)

Eliminate common subexpressions

  • Eliminate common expressions (#15657)

Default MKLDNN Subgraph fusion

  • [MKLDNN] Enable subgraph backend mkldnn by default. (#15518)
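
A hedged sketch of opting out, assuming the MXNET_SUBGRAPH_BACKEND environment variable controls the default backend:

```python
# Hedged sketch: disable the default MKL-DNN subgraph fusion for one run.
# The variable must be set before mxnet is imported.
import os
os.environ['MXNET_SUBGRAPH_BACKEND'] = 'NONE'

import mxnet as mx
print(mx.nd.ones((2, 2)) + 1)  # executes without the fused MKL-DNN subgraphs
```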

New operators

  • [OP] Add a new arange_like operator to contrib (#15400)
  • PDF operators for each distribution for which we have a random sampler (plus also the PDF of the Dirichlet). Supports probabilities and log-probabilities, as well as gradients. (#14617)
  • Group Normalization (#14959)
  • Add RROIAlign (#16017)
  • Add fast implementation of LARS (#16122)
  • Round and sign straight-through-estimators C operators. (#16373)
  • New ops for RCNN + old ops improvements for RCNN (#16215)
  • Comparison ops implemented using mshadow (#16414)
  • Add mask target generator operator for Mask-RCNN (#16268)
  • Move MRCNNMaskTarget op to contrib (#16486)
  • Mxnet allclose (#14443)
  • Aggregated adamw update (#16398)
  • Make mrcnn_mask_target arg mask_size a 2d tuple (#16567)
  • Dgl ops 2 (#16416)
  • Lamb optimizer update (#16715)
  • [OP] changing data type of 't' to int in lamb_update_phase1 (#16903)
  • Multi Precision Lamb Update operator (#16885)
  • Interleaved MHA for CPU path (#17138) (#17211)

Feature improvements

Automatic Mixed Precision

  • [AMP] Move topk from FP16_FP32_FUNCS to FP32_FUNCS (#15342)
  • Conversion from FP32 model to Mixed Precision model (#15118)
  • Update fp16 docs: Block.cast is inplace (#15458)
  • FP16 Support for C Predict API (#15245)
  • Add AMP Conversion support for BucketingModule (#15528)
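
A minimal sketch of the FP32-to-mixed-precision conversion added in #15118 (the checkpoint name is hypothetical):

```python
# Hedged sketch: convert a trained FP32 symbolic model to mixed precision.
import mxnet as mx
from mxnet.contrib import amp

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet50', 0)
fp16_sym, fp16_args, fp16_aux = amp.convert_model(
    sym, arg_params, aux_params, target_dtype='float16')
```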

Gluon Fit API

  • Fixing build for gluon estimator test, including libtvm in pack libs (#16148)
  • [Estimator] handle composite metrics in estimator (#16676)
  • [Estimator] refactor estimator to allow overriding evaluate/fit of a batch (#16678)
  • [Estimator] refactor estimator and clarify docs (#16694)
  • [Gluon] Improve estimator usability and fix logging logic (#16810) (#16846)
  • Backport Gluon estimator changes to 1.6 (#17048)
  • fix parameter names in the estimator api (#17051) (#17162)

MKLDNN

  • Upgrade MKL-DNN submodule to v0.20 release (#15422)
  • Fix quantized concat when inputs are mixed int8 and uint8 (#15693)
  • [MKLDNN]Enhance Quantization APIs and Tutorial (#15448)
  • Add quantization support for GluonCV (#15754)
  • add int8 bn mkldnn implementation and test (#15664)
  • [Quantization]support exclude operators while quantization (#15910)
  • [MKLDNN]Support fullyconnected and element-wise ops fusion (#15950)
  • Disable test coverage for Clang MKLDNN (#15977)
  • update support MKLDNN BN conditions (#15870)
  • [MKLDNN] Fix out of bound access of req vector (#16000)
  • add uint8 bn mkldnn implementation (#16003)
  • Improve quantization flow (#15961)
  • [MKLDNN] fix uint8 batch norm memory misuse (#16034)
  • MKL-DNN RNN checks NDArray version (#16071)
  • Float64 fallback for mkldnn subgraph and rnn op (#15853)
  • Update MKL-DNN dependency (#16073)
  • Integrate MKL-DNN leakyrelu (#16075)
  • [MKLDNN] NDArray reorder in C API and deconv (#16265)
  • Fix mkldnn reshape (#16455)
  • [MKLDNN] Fix uint quantized fc when not fusing with requantize (#16523)
  • [MKLDNN]Fix reorder2default (#16602)
  • Upgrade MKL-DNN dependency to v1.0 (#16555)
  • Revert "[MKLDNN]Fix reorder2default (#16602)" (#16697)
  • [v1.6.x] Backport #16837 into v1.6.x (#16847)
  • Initial checkin (#16856) (#16872)

Large tensor support

  • [MXNET-1413] Adding Large Tensor support for sort operators (#15170)
  • Large Index Support for Slice (#15593)
  • Add large tensor support binary arithmetic (#15785)
  • Large tensor support for random ops (#15783)
  • Add Large Tensor Support for Sequence, NN Ops (#15807)
  • Add power, exponent, log ops large tensor support (#15794)
  • removing unnecessary int64 C apis that were added to support Large Tensors and Vectors (#15944)
  • creating ndarray directly using mxnet ndarray primitives to reduce memory footprint of tests for topk, sort and argsort (#15900)
  • Adding tests to verify support for Large Tensors in additional Ops along with new C_Apis supporting 64bit indexing (#15895)
  • Added tests to verify Large Vector Support for initial set of ops (#15943)
  • Added more tests for Large Indices (#15960)
  • Add Large tensor vector test cases (#15941)
  • Test large vector mean operator and fix a few bugs (#16079)
  • Reducing memory footprint of one_hot for Large Array Testing (#16136)
  • removing MXNDArrayLoadFromBuffer64 and MXNDArrayLoad64 (#16203)
  • Fix large array tests (#16328)
  • added more tests to verify support for large vector (#16477)
  • added support for large tensors for Dropout operator and tests to verify support for more operators (#16409)
  • adding large tensor support for add_n and tests for more ops (#16476)
  • adding large tensor support for pad operator (#15126)
  • Added large tensor support and test for gather_nd (#16371)
  • Large Vector tests for DGL Ops Part 2 (#16497)
  • Showing proper error message when an attempt is made to create large tensor but MXNet is not built with it (#16570)

TensorRT integration

  • enable TensorRT integration with cpp api (#15335)
  • Add unit tests for TensorRT integration and fix some bugs (#15399)

Higher order gradient support

  • [MXNET-978] Higher order gradient for sigmoid (#15288)
  • [MXNET-978] Higher Order Gradient Support reciprocal, abs. (#15413)
  • [MXNET-978] Add higher order gradient support tan, tanh (#15253)
  • [MXNET-978] Higher Order Gradient Support arctan, arctanh, radians. (#15531)
  • [MXNET-978] Higher Order Gradient Support sqrt, cbrt. (#15474)
  • [MXNET-978] Higher Order Gradient Support clip, dropout. (#15746)
  • [MXNET-978] Higher Order Gradient Support sinh, cosh. (#15412)
  • [MXNET-978] n-th order gradient test support. (#15611)
  • [MXNET-978] Fully connected, higher order grad (#14779)
  • [MXNET-978] Higher Order Gradient Support arcsinh, arccosh. (#15530)
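
A minimal sketch of using this support through autograd, taking the sigmoid case from #15288:

```python
# Hedged sketch: second-order gradient of sigmoid via create_graph=True.
import mxnet as mx

x = mx.nd.array([0.5, 1.0, 2.0])
x.attach_grad()
with mx.autograd.record():
    y = mx.nd.sigmoid(x)
    # First derivative, recorded so it can be differentiated again.
    dy_dx = mx.autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
dy_dx.backward()  # populates x.grad with the second derivative
print(x.grad)     # equals sigmoid(x) * (1 - sigmoid(x)) * (1 - 2 * sigmoid(x))
```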

Operator improvements

  • broadcast axis is alias to broadcast axes; doc fix (#15546)
  • Utility to help developers debug operators: Tensor Inspector (#15490)
  • Softmax with length (#15169)
  • in-place reshape ops (#14053)
  • Add missing default axis value to symbol.squeeze op (#15707)
  • Add matrix determinant operator in linalg (#15007)
  • Add fp16 support for topk (#15560)
  • [MXNET-1399] multiclass-mcc metric enhancements (#14874)
  • new raise mode for nd.take and fix backward for wrap mode (#15887)

Profiler

  • Fixing duplication in operator profiling (#15240)
  • Custom Operator Profiling Enhancement (#15210)
  • [Opperf] Make module/namespace of the operator parameterized (#15226)
  • Opperf: Support Python<3.6 (#15487)
  • Add transpose_conv, sorting and searching operator benchmarks to Opperf (#15475)
  • Deprecate USE_PROFILER flag (#15595)
  • Update profiler.md (#15477)
  • [Opperf] Add array rearrange operators to opperf (#15606)
  • [OpPerf] PDF Random ops fix (#15661)
  • [Opperf] Add optimizer update operator benchmarks to opperf (#15522)
  • fix broadcast op param (#15714)
  • [OpPerf] Profiler flag for Python, Cpp (#15881)
  • [Opperf] Filter out deprecated ops (#15541)
  • [OpPerf] Handle positional arguments (#15761)
  • [OpPerf] Take care of 4d param (#15736)
  • Add Median,p50,p99 to python profiler (#15953)
  • adding "total" (total time) to profiler aggregate stats sorting criteria (#16055)

ONNX import/export

  • Correct ONNX documentation (#15914)
  • [MXNET-895] ONNX import/export: TopK (#13627)

Runtime discovery of features

  • Making Features as a singleton for improved caching (#15835)
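
A minimal sketch of querying the cached feature set:

```python
# Hedged sketch: runtime discovery of compile-time features.
import mxnet.runtime

features = mxnet.runtime.Features()   # cached singleton after #15835
print(features.is_enabled('MKLDNN'))  # True if the binary was built with MKL-DNN
print([f.name for f in mxnet.runtime.feature_list() if f.enabled])
```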

Bug fixes

  • [bug] fix higher grad log (#15120)
  • Showing proper error when csr array is not 2D in shape. (#15242)
  • add 'asnumpy' dtype option to check_symbolic_backward (#15186)
  • point fix the vector declaration in MultiBoxDetection (#15300)
  • Temporarily Commenting out Flaky Test (#15436)
  • Fix memory leak in NaiveEngine (#15405)
  • fix nightly CI failure (#15452)
  • Small typo fixes in batch_norm-inl.h (#15527)
  • Bypass cuda/cudnn checks if no driver. (#15551)
  • Julia path patch (#15561)
  • Fix AMP Tutorial failures (#15526)
  • Fix warnings in CLang: (#15270)
  • Fix dumps for Constant initializer (#15150)
  • fix normalize mean error bug (#15539)
  • [fix] print self in warning. (#15614)
  • [MXNET-1411] solve pylint error issue#14851 (#15113)
  • [Flaky test] Skip test_operator_gpu.test_convolution_independent_gradients (#15631)
  • Fix subgraph with custom_op (#15671)
  • Fix USE_BLAS == openblas check (#15691)
  • update previous flaky naive engine test (#15651)
  • make TransposeShape infer shape form both sides (#15713)
  • Skip Flaky Test (#15722)
  • Revert "Dynamic Library Loading Support" (#15755)
  • Fix flaky test test_global_metric (#15756)
  • Fix PR #15489 (Dynamic Library Loading Support) (#15760)
  • Refactor LibraryInitializer so it's thread safe. Fixes random sporadic concurrency crashes. (#15762)
  • Fix backward_clip num inputs and type of clip params (#15688)
  • fixing problem with existing Singleton Caching (#15868)
  • Allow operators with multiple outputs in get_atomic_symbol (#15740)
  • Fix ConcatType backward type inference (#15829)
  • Add disable attr to subgraph property (#15926)
  • Re-enable flaky test_prelu (#15777)
  • declare explicitly the tblob default assign operator and copy constructor (#15937)
  • Discard needless test cases in test_convolution_independent_gradients (#15939)
  • fix naive engine for multi-threaded inference (#15574)
  • Fix get_rows_per_block (#15979)
  • Fix a memory misalignment in topk operator (#15948)
  • Decouple dtype from shape for Random multinomial (#15980)
  • Fix dtype inference in arange_like operator (#15930)
  • Disable laop_6 (#15976)
  • Fix flaky clojure profile test (#16058)
  • fix test_pick test time is too long (#16066)
  • [fix] Support nullop in transpose (#15865)
  • fix flaky test (#16074)
  • fix some test files test time is too long (#16067)
  • Fix gradient tensor mutate in {adam/ftrl/rmprop/rmspropalex}_update. (#15768)
  • Fix unary operator ceil/floor/trunc when data type is integer (#14251)
  • Fix failing tests (#16117)
  • Fixes NAG optimizer #15543 (#16053)
  • avoid test relu at the origin due to discontinuous gradient (#16133)
  • Fix remaining errors reported by D2L (#16157)
  • use 1E-4 in groupnorm test(#16169)
  • Sequence last fix (#16156)
  • fixing test for model compatibility checker (#16159)
  • assert_allclose -> rtol=1e-10 (#16198)
  • [MEMORY] retry GPU memory allocation if fragmented (#16194)
  • improve dataloader signals and messages (#16114)
  • Update ndarray.py (#16205)
  • fix flaky test (#16191)
  • Solve #14116, #15143 (#15144)
  • [MXNET-1422] Fix wrong results of min([inf, inf]) and max([-inf,-inf]) (#16226)
  • Fix inconsistent interpolation method values (#16212)
  • set fixed seed for profiler (#16155)
  • Fix MXNDArrayGetData (#16289)
  • fix atol for test_preloaded_multi_sgd (#16356)
  • Fix windows flakiness (#16415)
  • cuDNN non-persistant bidirectional RNN dgrad sync fix (#16391)
  • [BUGFIX] Minor type issues in Squeeze (#16448)
  • Fix Nightly Tests for Binaries (#16451)
  • Fix dtype bug (#16467)
  • Fix flakey pylint CI failures (#16462)
  • Load NDArray only to GPU if GPU is present (#16432)
  • Bug fix for the input of same axes of the swapaxes operator (#16513)
  • Fix learning rate scheduler being unexpectedly overwritten by optimizer's default value (#16487)
  • disable tests (#16536)
  • fix pylint in CI (#16540)
  • image crop gpu (#16464)
  • Build dmlc-core with old thread_local implementation (#16526)
  • fix doc for topk (#16571)
  • RNNOp to call cudaEventCreate lazily (#16584)
  • add encoding to the stub files for potential utf8 char in doc strings (#16580)
  • Surpress subgraph log in CI (#16607)
  • Fix dequantize memory corruption (#16606)
  • Fix for wrong reqs set after switching from training to inference (#16553)
  • Disables test_bulking_operator_gpu due to flakiness (#16611)
  • Imagenet inference to nightly fix (#16599)
  • Move some subgraph verbose to MXNET_SUBGRAPH_VERBOSE=2 (#16622)
  • RNNOp only call cuda/cudnn if GPU ctx is requested (#16632)
  • fix bad encode (#16641)
  • Disable float16 test (#16643)
  • Fix GetMKLDNNData for delay alloc (#16618)
  • Move ops which don't support FP16 dtype to FP32 list (#16668)
  • no such method => modified function args (#16610)
  • fix cuDNN RNN dtype_with_fallback_ bug (#16671)
  • Add check if scipy is imported in sparse.py (#16574)
  • Added launch bounds to the reduce kernels (#16397)
  • fix install dir (#16690)
  • fix binary dependencies in CD and nightly (#16693)
  • Fix SliceChannel Type inference (#16748) (#16797)
  • fix flakiness of test_np_mixed_precision_binary_funcs (#16873)
  • Fix test_gluon.py:test_sync_batchnorm when number of GPUS > 4 (#16835)
  • Omp fork numthreads fix 1.6 (#17000)
  • [BUGFIX] Fix race condition in kvstore.pushpull (#17007) (#17052)
  • Backport #17002, #17068 and #17114 to 1.6 branch (#17137)
  • Backport 3rdparty/openmp fixes (#17193)
  • fix norm sparse fallback (#17149)

Front end API

  • Expose get_all_registered_operators and get_operator_arguments in the… (#15364)
  • Add magic method abs to NDArray and Symbol. (#15680)
  • Dynamic Library Loading Support (#15489)
  • [MXNET-1294] Add KVSTORE PushPull API (#15559)

Gluon

  • [Dataset] Add take, filter, sample API to dataset (#16078)
  • Add register_op_hook for gluon (#15839)
  • [Dataset] add shard API (#16175)
  • Add list_ctx to ParameterDict (#16185)
  • [Gluon] Support None argument in HybridBlock (#16280)
  • Aggregated zero grad (#16446)
  • try to fix block (#16465)
  • [Gluon] Don't serialize shared parameters twice (#16582)
  • Initializer.__eq__ (#16680)
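
A minimal sketch of the new Dataset methods (take/filter/sample from #16078, shard from #16175); the toy data is illustrative:

```python
# Hedged sketch: composable dataset transforms in gluon.data.
from mxnet.gluon.data import ArrayDataset

dataset = ArrayDataset(list(range(100)))
first10 = dataset.take(10)                     # first 10 samples
evens = dataset.filter(lambda x: x % 2 == 0)   # keep samples passing the predicate
shard0 = dataset.shard(num_shards=4, index=0)  # one of four contiguous shards
print(len(first10))
```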

Symbol

  • Add symbol api for randn and fix shape issue for randn ndarray and symbol api (#15772)
  • Graph Partition API (#15886)

Language Bindings

Python

The MXNet community voted to no longer support Python 2 in future releases of MXNet. Therefore, the MXNet 1.6 release will be the last MXNet release to support Python 2.

C/C++

  • [C++] Improve inference script to support benchmark on Imagenet (#15164)
  • C Api for simplebind, fix comment for trigoops, add atol to assert (#16585)

Clojure

  • Extend Clojure BERT example (#15023)
  • [Clojure] Add fastText example (#15340)
  • make clojure api generator tests less brittle (#15579)

Julia

  • add julia env settings (#15523)
  • julia: bump window prebult binary version to v1.5.0 (#15608)
  • julia: remove Travis CI related files (#15616)
  • julia: bump binding version to v1.6.0 (#15607)
  • julia: rename build env var MXNET_HOME to MXNET_ROOT (#15568)
  • Revert "julia: rename build env var MXNET_HOME to MXNET_ROOT (#15568)" (#16147)
  • julia: fix mx.forward kwargs checking (#16138)
  • julia: implement context.num_gpus (#16236)
  • julia: add AbstractMXError as parent type (#16235)
  • [MXNET-1430] julia: implement context.gpu_memory_info (#16324)
  • julia/docs: more DRY on page rendering (#16396)

Perl

  • [Perl] - simplify aliasing strategy (#15395)
  • [Perl] - ndarray to native array conversion fix (#16635)

Scala

  • Add Sparse NDArray support for Scala (#15378)
  • fix the bug on Scala Sparse (#15500)
  • fix heap-use-after-free in scala (#15503)
  • Bump Scala version to 1.6 (#15660)
  • Fix Scala Symbolic API some/Some typo (#15687)
  • Faster Scala NDArray to BufferedImage function (#16219)

Performance improvements

  • Proper bulking of ops not using FCompute (#15272)
  • improve layernorm CPU performance (#15313)
  • Efficient MXNet sampling in the multinomial distribution (#15311)
  • Revert default return type for indices in argsort() and topk() back to float32 (#15360)
  • Use omp threads for cpu data loader (#15379)
  • Accelerate ROIPooling layer (#14894)
  • Avoid memory copy for dropout inference (#15521)
  • Add omp parallel optimization for _contrib_BilinearReisze2D (#15584)
  • Softmax optimization for GPU (#15545)
  • Speed up group executor (#16069)
  • FullyConnected Bias performance improvement on GPU (#16039)
  • Embedding gradient performance optimization on GPU (#16355)
  • Faster Transpose 2D (#16104)
  • Pseudo 2D transpose kernel (#16229)
  • Faster general take (#16615)

Example and tutorials

  • [TUTORIAL] Gluon performance tips and tricks (#15427)
  • Updating profiler tutorial to include new custom operator profiling (#15403)
  • [TUTORIAL] Gluon and Sparse NDArray (#15396)
  • [TUTORIAL] Revise Naming tutorial (#15365)
  • Revise Symbol tutorial (#15343)
  • Two fixes for info_gan.md example Code (#15323)
  • Rebase #13757 to master (#15189)
  • Tensor Inspector Tutorial (#15517)
  • logging (#15106)
  • update profiler tutorial (#15580)
  • [MXNET-1358] Fit api tutorial (#15353)
  • Tutorials nighly fix (#16179)
  • Update add_op_in_backend.md (#16403)
  • typo fix in r doc lstm tutorial (#16546)
  • [MKL-DNN] Add mxnet mkldnn cmake tutorial (#16688)

Website and documentation

  • [DOC] Clarify that global pooling is going to reset padding (#15269)
  • Update sparse_retain Documentation (#15394)
  • nano instructions (#15117)
  • remove comments from nano instructions (#15433)
  • README MTCNN Link URL Error in original website (#15020)
  • Update Horovod docs links in README (#15366)
  • fix doc for sort and argsort (#15317)
  • fix comment (#15481)
  • Improve docs for AMP (#15455)
  • [Doc] Add MKL install method apt/yum into tutorial (#15491)
  • Julia docs (#15454)
  • Docs: Fix misprints (#15505)
  • website build for julia: fix path to be static (#15554)
  • some minor typos/clarifications (#15538)
  • refine Nano setup directions (#15524)
  • [Doc] add squeeze to Array change shape (#15549)
  • fix typo (#15648)
  • Fix url (404 error) (#15683)
  • update julia install doc (#15609)
  • [DOC] refine autograd docs (#15109)
  • [DOC] Fix many arguments in the doc: reshape_like, arange_like, shape_array (#15752)
  • Add Gather_nd Scatter_nd to NDArray API category doc (#15689)
  • [Dependency Update] [Doc] move the general prerequisite software to the top (#15896)
  • typo in docs (#16094)
  • [WIP] New Website: New Docs [1/3] (#15884)
  • [DOC] Fix doc for nn.Embedding, nn.Dense and nd.Embedding (#15869)
  • [DOC] Consistent capitalization: mxnet -> MXNet, scala -> Scala (#16041)
  • New Website: Remove Old Content [2/3] (#15885)
  • New Website: New Pipeline [3/3] (#15883)
  • Update KL Divergence formula (#16170)
  • fix broken links (#16255)
  • redirect to the 404 page (#16287)
  • add google-analytics config (#16271)
  • Fixing links for website + Fixing search (#16284)
  • Minor fix in ToTensor documentation. (#16299)
  • adding redirects so that old website API links surfaced from searches (#16342)
  • Fix code block formatting in Why MXNet doc page (#16334)
  • Julia: add API docs back (#16363)
  • Change mailing list url in footer to point to instructions about how to subscribe instead (#16384)
  • Add instructions to report a security vulnerability (#16383)
  • [DOC] fix installation selector wrong history (#16381)
  • Beta build (#16411)
  • [WIP] Improving Python Docs API (#16392)
  • fix autodoc for spurrious toggles (#16452)
  • [Doc] Update the download page with 1.5.1 release (#16442)
  • Fixing broken links (#16500)
  • add binary and docs build command options (#16514)
  • add option to remove indexes (#16525)
  • Correct Google Analytics Tracker (#16490)
  • [Doc] Use mirror link in the download page (#16501)
  • checking broken link fixes work (#16538)
  • detect number of procs during sphinx build (#16512)
  • fixed broken links across multiple files (#16581)
  • fix missing docs due to git add issues (#16496)
  • second round of fixing broken links in multiple files (#16598)
  • Python Docstring Convention (#16550)
  • [MXNET-1434] Fix a broken link for basic C++ tutorial (#16461)
  • Fix python doc build issue (#16630)
  • fixing broken links in multiple files - round 3 (#16634)

CI/CD

  • Fix build_ccache_wrappers: (#14631)
  • Remove mhard-float option. This is already deprecated by Google. (#15435)
  • CI: upgrade Julia version from 1.0.3 to 1.0.4 (#15502)
  • Add -R option to ci/build.py to avoid rebuilding containers (#15426)
  • [Dependency Update] Bump up the CI Nvidia docker to CUDA 10.1 (#14986)
  • fixed config.mk and Makefile bugs for installing mkl (#15424)
  • Add -DMXNET_USE_OPENMP to Makefiles so libinfo gets updated accordingly (#15498)
  • [Dependency Update] Dependency update doc (#15045)
  • Remove Scala package test on build (#15915)
  • Refactor for windows CI 'out of heap space' errors (#15922)
  • Fix Nightly Maven GPU (#15989)
  • Windows cmake flags cleanup (#16013)
  • Disable flaky test in test_amp_conversion (#16031)
  • Updates git_init Jenkins utility function to support checking out a particular commit id
  • Adds artifact repository scripts
  • Adds CD pipeline framework
  • Adds static libmxnet release pipeline
  • Updates CD pipeline
  • Adds documentation
  • Updates kvstore functions to use pushd and popd
  • Throws exceptions instead of magic numbers
  • Updates artifact repository cli to use --libtype instead of --static or --dynamic
  • Clarifies ci_utils and cd_utils origin remark
  • Adds clarifying note on why ubuntu 14.04 is being used for compilation
  • Removes MXNET_SHA
  • Removes set_release_job_name
  • Adds license headers
  • Updates artifact repository to expect licenses
  • Moves ci/cd to cd directory
  • Takes downstream job name from environment
  • Updates order of parameters
  • Updates job type parameter to dropdown
  • Adds libmxnet feature extraction code comments
  • Removes ccache setup from static build
  • Disable test coverage of C++ codebase on CI (#15981)
  • Update readme and project.clj comment (#16084)
  • Enable tvm_op for ci (#15889)
  • Not to search for coverage files when none exist (#16107)
  • Fixes openblas installation for static build
  • Update python dependencies (#16105)
  • CD Fixes (#16127)
  • Adds dynamic libmxnet to CD pipeline (#16163)
  • Fix README Build Status (#16183)
  • subscribe to build and CD changes (#16192)
  • [CD] Add COMMIT_ID param to release job (#16202)
  • Fix lack of dylib support in Makefile when use lapack (#15813)
  • Removes git status update stop gap solution (#16285)
  • add mkl installation temp fix (#16304)
  • add 'Release' cmake flag (#16294)
  • S3 upload artifacts (#16336)
  • Fix nightly scala pipeline (#16362)
  • remove redundant branch name (#16372)
  • Skipping installing nightly test (#16418)
  • Adds PyPI CD Pipeline (#16190)
  • upgrade the pytest version (#16429)
  • Revert "add mkl installation temp fix (#16304)" (#16369)
  • increase docker cache timeout (#16430)
  • Adds pip requirements file to nightly gpu ci image (#16472)
  • [CD] Adds python docker pipeline (#16547)
  • Move imagenet inference to nightly (#16577)
  • Backport #16980 #17031 #17018 #17019 to 1.6 branch (#17213)

Misc

  • update committer info (#15289)
  • Typo fix in plan_memory relase -> release. (#15299)
  • indent changes (#15321)
  • Had a few PRs merged. Hope to become an official contributor and potentially a committer. (#15451)
  • cuda/cuDNN lib version checking. Force cuDNN v7 usage. (#15449)
  • Improve diagnose.py, adding build features info and binary library path. (#15499)
  • update ratcheck for apache-rat 0.13 release (#15417)
  • add myself to interested modules (#15590)
  • 1.5.0 news (#15137)
  • bump up version from 1.5.0 to 1.6.0 on master (#15072)
  • Remove myself from CODEOWNERS (#15617)
  • remove mshadow submodule
  • import mshadow source tree
  • cuDNN support cleanup (#15812)
  • Remove requests_failed_to_import handling
  • Update CODEOWNERS. (#15972)
  • Improve diagnose.py to display environment variables (#15715)
  • Update README.md (#16035)
  • [Dev] update ps-lite dependency (#15936)
  • Typedef cleanup (#15899)
  • add KEY for Tao Lv (#16081)
  • remove 'foo' and other print msg from test (#16088)
  • Revert accidental change to CMakelists (#16040)
  • Update env_var.md (#16145)
  • Update dmlc-core (#16149)
  • adding codeowners (#16165)
  • Factorize CUDA_KERNEL_LOOP used in CUDA kernels (#16197)
  • add code of conduct and conflict resolution (#16343)
  • simple typo error in NEWS.md (#16344)
  • update NEWS.md and README.md (#16385)
  • split issue templates (#16558)
  • Create SECURITY.md (#16573)

How to build MXNet

Please follow the instructions at https://mxnet.incubator.apache.org/get_started

Users who build MXNet from source are advised to build release 1.6.0 without jemalloc, to avoid incompatibilities with LLVM's OpenMP library (details in issue #17043 and PR #17324). For cmake builds, set USE_JEMALLOC "OFF" in ./CMakeLists.txt; for make builds, set "USE_JEMALLOC = 0" in make/config.mk.

1.5.1

4 years ago

Apache MXNet (incubating) 1.5.1 is a maintenance release incorporating important bug fixes and performance improvements. All users of Apache MXNet (incubating) 1.5.0 are advised to upgrade. You can install Apache MXNet (incubating) 1.5.1 at the usual place. Please review these release notes to learn about the bug fixes.

Bug-fixes

  • add deconv in TRT subgraph (#15666) (#16043)
  • Update TRT tutorial with new APIs (#16044)
  • Fix _copy_to on MKLDNN backend (#15637) (#15803)
  • Benchmark doc fix (#15769) (#16029)
  • remove Julia cat image for license issue (#15964) (#16026)
  • added check for empty params file and unknown param (not arg/aux) (#15917)
  • fix license issues (#15806) (#15860)
  • prevent TRT_Logger to be destroyed before TRT engine (#14898) (#15877)
  • [MXNET-1086] added sub and mul to ONNX->TensorRT conversion (#15344) (#15875)
  • handle fix_gamma in tensorrt subgraph conversion correctly (#15645) (#15874)
  • fix LinearRegressionOutput with empty label (#15620) (#15873)
  • [v1.5.x] [MKLDNN] Independent gradients requests check with respect to weights… (#15805)
  • fix dropout mask output (#15697) (#15804)
  • fix fp32 flatten issue (#15351) (#15802)
  • Clojure package remove source images (#15828)
  • changed constructor args (#15601) (#15827)
  • Add MKLDNN 4c layout to fix gluoncv se_resnext101_64x4d (#15692) (#15801)
  • Fix the bug of MXEnginePushAsyncND and MXEnginePushSyncND (#15751) (#15792)

How to build MXNet

Please follow the instructions at https://mxnet.incubator.apache.org/get_started

List of submodules used by Apache MXNet (Incubating) and when they were updated last

Name Commit-id Last update in MXNet Last update in module
dlpack 10892ac Oct 30, 2017 Aug 12, 2019
dmlc-core 3943914 May 14, 2019 Sep 2, 2019
googletest eb9225c Jan 14, 2019 Aug 29, 2019
mkldnn 41bee20 May 14, 2019 Aug 27, 2019
mshadow 1d79ecf May 13, 2019 Aug 4, 2019
nvidia_cub c3cceac Feb 16, 2018 Jul 17, 2019
onnx-tensorrt 1e209e5 Jan 3, 2019 Aug 22, 2019
openmp 37c7212 Nov 14, 2017 Aug 28, 2019
ps-lite 8a76389 Apr 25, 2018 Sep 2, 2019
tvm 21935dc May 21, 2019 Sep 2, 2019