AMDMIGraphX Versions

AMD's graph optimization engine.
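For orientation, here is a minimal sketch of how the engine is typically driven from Python: parse a model, let MIGraphX optimize and compile the graph for a target, then execute it. This is illustrative only, not taken from the release notes below; it assumes the migraphx Python module from a GPU-enabled build is installed, and the model path "resnet50.onnx" is a placeholder.

```python
# Illustrative sketch: parse an ONNX model, compile (optimize) the graph for the
# GPU target, and run it with random inputs.
# Assumes the migraphx Python module from a GPU-enabled build is installed;
# "resnet50.onnx" is a placeholder path.
import numpy as np
import migraphx

prog = migraphx.parse_onnx("resnet50.onnx")       # build MIGraphX IR from the ONNX graph
prog.compile(migraphx.get_target("gpu"))          # run optimization passes and compile kernels

# Fill every graph parameter with random float32 data of the matching shape.
params = {}
for name, shape in prog.get_parameter_shapes().items():
    params[name] = migraphx.argument(
        np.random.rand(*shape.lens()).astype(np.float32))

outputs = prog.run(params)                        # returns a list of migraphx.argument
print(np.array(outputs[0]).shape)                 # arguments are convertible to numpy
```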

rocm-5.4.2

1 year ago

ROCm release v5.4.2

rocm-5.4.1

1 year ago

ROCm release v5.4.1

rocm-5.4.0

1 year ago

ROCm release v5.4.0

788ce62be11e8a79410e127bbc2a081ebf2e1db0 Use minimum block size of 64 threads (#1427)
4d471bdaac896d41d963936dba5f4df8551575df Add JIT pad (#1411) (#1441)
360b1801207aba3d7f1c97aeb0dead85695ed3c8 Updates for RC1 (#1425)
83784c521b92f7b7d4ca7a0db86e0afd8cce2246 memset fix (#1414)
01d0ecfc640e4af60f9e3d8720a0c88b548e9b9a Fix rank 2 batch norm (#1412)
32f6388c01e57021e32a5fae2df372f058ddb88c Refactor dynamic padding mode (#1387)
be309bfbc4f58d2decdff0ac0220202c698a6cb9 Rewrite TF batch norm; remove batch_norm_inference (#1371)
4f3cc4176235d2c81c1ea2128620fd0dfb5d95cc Simplify unit algebraic ops (#1281)
f7d987baa34e30423cafb6edd5ca32eee5f97841 Stream sync Changset (#1358)
a9a47402dfc7b6ff6286ac13343a0335a35121fd Fast softmax (#1290)
c9ffb38dce2505538866a1a5c6a2178a2c7d7a1a Add output_alias and runs_on_offload_target flags for the custom ops (#1309)
e19f78ae8382fce9d832ccc8cd0ec74a44b348ee Use find_2.0 API for the convolution (#1346)
c2842c1ec9fd067af1381aac0c8860ad04a07fb7 Fix invalid program in debug mode from find_splits (#1390)
70e63960f08484cfe4b6df7575522a7645d1bdaf Add compute_fp32 flag for quant_gemm tests (#1360)
40118191909f98070b5863d0b58a8bfbfc7c9418 Add onnx mod operator gpu cpu (#1306)
c00f82028809ad4e740c8b16eff4c46feb322015 Rewrite ONNX parse batch norm (#1362)
492c4a6c602094975fcbebdc22cc28a824ab9c7a Use larger vector size instead of preloading for broadcasted inputs (#1389)
66bbff1e8adf3044ce76499c87fd37f55ab81005 Upgrade cppcheck to 2.9 (#1400)
94bc41dc6fd9a45b3069608c44463c5adb7a13d3 check concurrency on PR level with one running and one pending performance tests (#1401)
1b575b5ced5cd17daf1c993f789ce792de6affdf update codecov version (#1402)
8ea8473d716d4d9958998606520599edcb3e159d Remove unused device functions (#1394)
d9578ba62ccbae472f39c19abe38272c33595029 Parameterize epsilon for layernorm kernel (#1367)
9a70050b5080620cc486cf87e0a13800948f8fe1 Multibroadcast find_mul_conv (#1384)
97a1ed2de7cd5103aed50b10c634adfe041e7681 Improve layernorm and reductions performance (#1348)
34c08db7680d1bfe5c8c8991e32419cdb55f258d Disabled concurrency, queue added to perf-test.yml (#1386)
10f37f49bd25d41e23ec866517acfe618becc4b6 Fix typo for add_sigmoid (#1385)
255fb11abba993636f7e8b4f5ed2e49927138ff5 Update deprecated Pybind constructor (#1382)
e1e36cdc5875f5f20e33f719b8ab80cc03c52ec2 [mlir] Replaced find_library with find_package to locate MLIR static library (#1373)
333860cef89a05b2eee5599641a733a63018bf63 Reduce problem size of unbatched_gemm tests (#1383)
4b76dd0d344db8f902f61e95366fad333929f226 Fix split_reshape for slice len of 1 (#1379)
7662d9c03e311695328e1ce1145cd91171be3763 Implement concat using jit compilation (#1356)
827baeec9f99b76804b2ac3b70ef1a1786630ebe expose underlying migraphx::argument data pointer in pybind (#1376)
a10a8ef1182424abd62c4612ba3b22d16504a262 Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354)
d78bcdfb5f4d030e85ad81a8f42cb953b70c788d Bump version to 2.4 (#1375)
ed2c73ac0df1fbcc25f7742d61ca97de42429deb Remove unused headers (#1363)
f2667056315799936a2b2562f0f9a71be0b9f917 Fix TF literal parsing for relu6 (#1370)
60aa0e4865627114c829b21fc5be06819bdb56fd Fix accuracy bug when vectorizing slices (#1364)
d37a4df96a5033ac73843e949bf6642f8d8f9466 Enable cppcheck rule for 'not', 'or' keywords (#1361)
794a433549c7e8f69742dc5fa8442e2f27e096e9 Add pass to rewrite gelu as fast gelu (#1299)
ed7973d1d178974b59719dcc04a27a4b7835e29e Insert contiguous for reshape as necessary (#1351)
349635ce1602772f4ece9a3c5aea1060a0cef799 Show kernel time when using gpu-driver (#1289)
8752875ae10635745e72ac8609e79e4b2c47ccc6 Improvements to handling and add constant passed to dot operator (#1280)
af7f22d8ff3dc4c340ce742e1a1c22f64d82a2e4 Fix test suite compile in Ubuntu 22.04 (#1353)
1704bb044c189c0c01624c9692b4516ebd363d8f fix bug size_t -> std::size_t (#1350)
fa3c21fa0d37473e369bcb6fed077a4aa674804c Dynamic ref NMS (#1288)
79e15ca9c6b2a403022c885d880c01673e85cb5d Update is_supported (#1334)
b691abdd04d510b7c5f1d59190cec9a895e733aa Enable tidy for fpga backend (#1347)
3c133f818b7dd51e2ee2ec8f68e5211de736f2f2 Remove print (#1345)
ac507c6413b9fa7afa5bfc4d7ed56b58dc77715b Fix json strings in driver models (#1341)
8045f7c8d5ebf28a6b9e31cc1146c681961fac2a pybind updates for torch_migraphx library (#1323)
7c8f2690938efb3747fcbd3f2b04c6df0c367269 run performance benchmarks on types (#1343)
1784584e0bedc81d00add1b4c663c06f3c84f77b Add jit layernorm fusion (#1301)
18e4a2c6d55b651b302e8f05ca2e1f5f7b743372 Improve horizontal fusion of contiguous (#1292)
0e17a72403fe899b0c0c40dc1fe625c416d45ca1 Fix softmax accuracy issues (#1342)
cb53687eda5eda62267eac6341f43e9477414af3 formatting (#1339)
bab9502af87e774daf20b143d943a75b34b6f17f Remove prints (#1338)
55cb7d3a3f9202fea5719e67e1f886e09a882861 Enable switching to bare pointer ABI for MLIR (#1333)
7ecb2de4c827538ae038c3f10773299e7a640d3e onnxruntime renamed master to main (#1336)

rocm-5.3.3

1 year ago

ROCm release v5.3.3

rocm-5.3.2

1 year ago

ROCm release v5.3.2

rocm-5.3.1

1 year ago

ROCm release v5.3.1

rocm-5.3.0

1 year ago

Improvements include...

Accuracy update (#1374)
Final performance improvements for release (#1369)
53merge v2 (#1357)
Allow license_stamper.py to be ran from any directory (#1332)
Explicitly set rocblas_pointer_mode in examples (#1331)
Imply type of literal returned based on input protobuff for zero elem… (#1326)
Dynamic ref convolution op (#1224)
Update README.md (#1327)
Improve help and error reporting in driver (#1258)
Add support for tuning db access in mlir kernel (#1307)
Add accuracy checker tool (#1315)
Avoid registering host buffer ptr multiple times during hip copies (#1245)
Add node name to debug output of PARSE_IF (#1318)
Fix literal type in the instance_norm parsing (#1317)
Add onnx mod operator (#1302)
Add fpga target (#1304)
Add performance testing yamls (#1313)
Improve error reporting in the API (#1274)
Change ownership to company email (#1310)
Dynamic check_shapes (#1295)
Fix TF parsing for creating literals and Fix name lookups for input params (#1298)
Dynamic dimension input onnx parser (#1249)
Fix op includes (#1308)
Fix test case for min & max operators (#1305)
Reduce header inclusion in op headers (#1271)
Add tests for C API (#1266)
create the dev package (#1293)
change to a cached github repo for blaze prereq (#1291)
Use current device when constructng context (#1294)
Add restrict to jit kernel params (#1300)
Improve kernel code generation (#1285)
Update perf report to show the number of operators and per operator avg time in summary (#1287)
Add env var to enable debug symbols for gpu kernels (#1284)
Add is_supported and get_target_assignments (#1269)
Dyn shape update (#1199)
Add a step to unsqeeze axis (#1242)
Verify load and save (#1265)
Add jit softmax (#1243)
Horizontally fuse contiguous operators (#1232)
Add mlir fusion (#1251)
Add method to insert multiple instructions (#1178)
Invalid parameter for yolov4 example (#1275)
NMS refactor, enable nonstandard shape (#1257)
Update driver models to use json strings (#1244)
Custom Op example using MIOpen calls (#1208)
Custom Op example using rocBLAS calls (#1211)
Custom Op example using HIP kernel (#1200)
Get parent module in the pass manager (#1181)
bug fix: register the miopen_fusion op. (#1267)
Use jit for contiguous operator (#1217)
Adding in check_stamped.py to tools/ (#1255)
Add compute_method for the experimental custom op (#1194)
remove eliminate_workspace pass (#1254)
Fix code block issue with .ipynb files. (#1263)
Update license files (#1248)
Fixing misspelled macro to enable MIOpen hidden find mode API (#1250)
Update lowering of Dot operator (#1247)
Update tf_parser to have add_common_op() for parse_relu6 (#1241)
Create allocate op and replace_allocate pass (#1183)
Instruction distance check fix (#1237)
Use env var for creds
Add vectorized reduce (#1202)
Prioritizing int8 over int8x4 when it is applicable (#1218)
Group code objects by kernel name in perf report summary (#1234)
Fix compilation on Debian bookworm/sid (#1229)
Fix dangling reference with gemm add fusion (#1233)
Update protobuf version (#1228)
Bump tensorflow from 2.6.4 to 2.7.2 in /examples/nlp/python_bert_squad (#1227)
Improve eliminate contiguous pass (#1223)
renamed to main from master (#1226)
Parallelize evaluations in propagate_constant (#1220)
Upgrade to cppcheck 2.8 and fix new issues found (#1225)
Used wrong path to download the bertsquad-10.onnx model (#1221)
Bump tensorflow from 2.5.3 to 2.6.4 in /examples/nlp/python_bert_squad (#1219)
Improve applicable batched gemms (#1214)
Remove std references in runtime compilation (#1186)
Fuse gemm add with pointwise fusions (#1213)
Fix onnx mean parsing for integral inputs (#1209)
Rename pointwise ops (#1145)
Improve matching with has_value when there are convert operators (#1212)
renamed variables for module from p to m (#1204)
Update install_prereqs.sh for individual use (#1197)
Prefuse layernorm for gpu (#1190)
Updated a path to the bert-squad onnx file after upstream changed path (#1201)
Expose add_literal in C and Python API (#1173)
Refactor vectorization and preloading for pointwise fusions (#1184)
upgrade docker images to ROCm 5.0.2 (#1133)
Add compile tests for gpu math functions (#1182)
Cppcheck fixes (#1195)
Extend lifetimes in C++ API (#1139)
Bumping version to support next ROCm release (#1192)

rocm-5.2.3

1 year ago

No changes from rocm-5.2.1

rocm-5.2.1

1 year ago

Enabled the devel migraphx package
Resolved a bug where migraphx could not run its own binary output files
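
The "binary output files" mentioned above are serialized MIGraphX programs. As a rough illustration of the save/load round trip that the fix concerns, here is a sketch using the Python bindings; it assumes migraphx.save and migraphx.load are available in the installed build, and the file names are placeholders.

```python
# Sketch only: serialize a compiled program to MIGraphX's binary format and
# load it back for execution. Assumes migraphx.save()/migraphx.load() exist in
# the installed Python bindings; "model.onnx" and "model.mxr" are placeholders.
import migraphx

prog = migraphx.parse_onnx("model.onnx")
prog.compile(migraphx.get_target("gpu"))
migraphx.save(prog, "model.mxr")          # write the compiled program as a binary file

reloaded = migraphx.load("model.mxr")     # the bug above concerned running files like this
print(reloaded.get_parameter_shapes())
```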

rocm-5.2.0

1 year ago

Improvements include...

Add GatherND operator (#1089)
Add lane reduction (#1180)
Expose get_queue method for context in API (#1161)
ReverseSequence op (#1177)
Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152)
Reduce with runtime compilation (#1150)
Half2 overloads (#1157)
Fix file download for resnet50 example (#1164)
Fix problem with incomplete types with older clang versions (#1174)
Fix out-of-bounds access when generate uses nonpacked tensors (#1160)
parallelize the ref implementation of the gemm operator (#1142)
scatter operator refactoring to include reduction (#1124)
fix a bug in create tensor_view with vec data type (#1155)
Fix comparisons in migraphx::value class (#1146)
Python Binding for the Manual Graph Buidling (#1143)
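
The last entry above (#1143) adds Python bindings for building a graph by hand rather than importing it from ONNX or TF. The following is a rough sketch of what that looks like, assuming the builder API (program, get_main_module, add_parameter, add_instruction, add_return) matches the installed version; the "ref" target is used so no GPU is required.

```python
# Hedged sketch of manual graph building via the Python bindings (#1143).
# Assumes migraphx.program / get_main_module / add_parameter / add_instruction /
# add_return behave as in the upstream tests; "ref" is the CPU reference target.
import numpy as np
import migraphx

p = migraphx.program()
mm = p.get_main_module()

# Two float32 parameters of shape [3], added elementwise.
x = mm.add_parameter("x", migraphx.shape(type="float_type", lens=[3]))
y = mm.add_parameter("y", migraphx.shape(type="float_type", lens=[3]))
total = mm.add_instruction(migraphx.op("add"), [x, y])
mm.add_return([total])

p.compile(migraphx.get_target("ref"))
result = p.run({
    "x": migraphx.argument(np.array([1, 2, 3], dtype=np.float32)),
    "y": migraphx.argument(np.array([4, 5, 6], dtype=np.float32)),
})
print(np.array(result[0]))  # expected: [5. 7. 9.]
```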