Halide Versions

a language for fast, portable data-parallel computation

v17.0.1

3 months ago

What's Changed

Changes to make WebGPU code compliant with recent versions of Emscripten (#8106)
Fix rfactor adding too many pure loops (#8107)
Forward the partition methods from generator outputs (#8090)
Fix reduce_expr_modulo of vector in Solve.cpp (#8107)

Full Changelog: https://github.com/halide/Halide/compare/v17.0.0...v17.0.1

v17.0.0

3 months ago

Changes Of Note

ParamMap has been removed entirely from the public API. All users of ParamMap should migrate to Callable instead.
Halide::Parameter has been moved to the public Halide API (it was formerly "internal" and not intended for public use).
New scheduling primitives:
- Func::partition() and friends: Set the loop partition policy, which controls how/whether a loop is split into three loops (prologue/steady-state/epilogue). Loop partitioning can be useful to optimize boundary conditions (e.g. clamp_edge).
- Func::hoist_storage() and friends: Allow a function's storage to be moved to a given loop level. Unlike Func::store_at(), no optimizations (e.g. sliding window) are triggered.
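To make the prologue/steady-state/epilogue idea concrete, here is a minimal plain C++ sketch of a 3-tap sum over an input with a clamp-to-edge boundary. It does not use the Halide API; the function names `blur_unpartitioned` and `blur_partitioned` are hypothetical and only model the loop shape that a partition policy selects between.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Read in[i] with the index clamped to the valid range (clamp-to-edge).
static int clamped(const std::vector<int> &in, int i) {
    const int n = static_cast<int>(in.size());
    return in[std::min(std::max(i, 0), n - 1)];
}

// One loop: every iteration pays for the clamp, even in the interior.
std::vector<int> blur_unpartitioned(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    for (int x = 0; x < n; x++) {
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    return out;
}

// Three loops: only the prologue and epilogue clamp their indices.
std::vector<int> blur_partitioned(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    const int steady_begin = std::min(1, n);
    const int steady_end = std::max(steady_begin, n - 1);
    for (int x = 0; x < steady_begin; x++) {  // prologue
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    for (int x = steady_begin; x < steady_end; x++) {  // steady state
        out[x] = in[x - 1] + in[x] + in[x + 1];  // no clamping needed here
    }
    for (int x = steady_end; x < n; x++) {  // epilogue
        out[x] = clamped(in, x - 1) + clamped(in, x) + clamped(in, x + 1);
    }
    return out;
}
```

Both functions compute the same values; the partitioned form simply keeps the boundary handling out of the hot interior loop.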
New TailStrategy options for existing scheduling directives:
- ShiftInwardsAndBlend: Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory location that would be stored to, modifying only the elements not contained within the overlap, and then storing the blended result. Unlike ShiftInwards, this is valid to use in update definitions.
- RoundUpAndBlend: Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memory location that would be stored to, modifying only the elements within the region being computed, and then storing the blended result. Unlike RoundUp, this is valid to use on non-outermost splits in update definitions.
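The difference between ShiftInwards and ShiftInwardsAndBlend on an update definition can be sketched in plain C++. This is a model of the behavior described above, not Halide code: the vector width `kWidth`, the update `f[x] += 1`, and both function names are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kWidth = 4;  // hypothetical vector width

// Model of ShiftInwards applied to the update f[x] += 1.
// The tail vector is shifted inwards, so it overlaps the previous vector
// and the overlapping elements are updated twice.
std::vector<int> update_shift_inwards(std::vector<int> f) {
    const int n = static_cast<int>(f.size());
    assert(n >= kWidth);  // the shifted vector must fit inside the buffer
    for (int base = 0; base < n; base += kWidth) {
        const int start = std::min(base, n - kWidth);
        for (int lane = 0; lane < kWidth; lane++) {
            f[start + lane] += 1;
        }
    }
    return f;
}

// Model of ShiftInwardsAndBlend: load what is already stored, compute the
// new values, keep the old value for lanes inside the overlap, and store
// the blended vector.
std::vector<int> update_shift_inwards_and_blend(std::vector<int> f) {
    const int n = static_cast<int>(f.size());
    assert(n >= kWidth);
    for (int base = 0; base < n; base += kWidth) {
        const int start = std::min(base, n - kWidth);
        int blended[kWidth];
        for (int lane = 0; lane < kWidth; lane++) {
            const int old_value = f[start + lane];
            const int new_value = old_value + 1;
            const bool in_overlap = (start + lane) < base;
            blended[lane] = in_overlap ? old_value : new_value;
        }
        for (int lane = 0; lane < kWidth; lane++) {
            f[start + lane] = blended[lane];
        }
    }
    return f;
}
```

With six elements and a width of four, the plain shifted version increments elements 2 and 3 twice, while the blended version increments every element exactly once.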
Substantially improved performance and display in the VizIR output.
Profiler improvements:
- Substantially nicer text output
- Injects timing into calls for copy_to_host and copy_to_device so you can measure host<->device copy overhead
- Allows optional sorting via the HL_PROFILER_SORT env var
Substantially faster codegen for several GPU backends.
Experimental serialization/deserialization feature allows for saving of Halide IR code.
Various bug fixes and improvements in the Anderson2021 autoscheduler.
Improved ARM codegen, including: better patterns for sdot/udot; improved shift/mul codegen.
Support for Zen4 architecture in the x86 backend.
Updates to the ONNX app.
Various fixes and improvements to sliding-window and storage-folding.
Improvements to slow gather operations for some x86 variants.
Improvements to correctness for the .async() scheduling directive.
Improved codegen for float16 conversion, especially on x86.
Several compile-time warnings of dubious usefulness disabled.
WebAssembly codegen now defaults to assuming that the saturating-float-to-int and sign-extension instruction sets are always available.
Target now does some reality-checking to ensure that it doesn't contain obviously nonsensical Feature combinations.
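For readers unfamiliar with the saturating float-to-int conversions mentioned in the WebAssembly note above, the following plain C++ sketch models the semantics of WebAssembly's `i32.trunc_sat_f32_s` instruction: NaN converts to 0 and out-of-range inputs clamp to the nearest representable value instead of trapping. The helper name is hypothetical and this is not Halide code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Saturating float -> int32 conversion (model of i32.trunc_sat_f32_s).
int32_t trunc_sat_f32_to_i32(float v) {
    if (std::isnan(v)) {
        return 0;  // NaN maps to zero rather than trapping
    }
    if (v >= 2147483648.0f) {  // 2^31 and above clamps to INT32_MAX
        return std::numeric_limits<int32_t>::max();
    }
    if (v <= -2147483648.0f) {  // -2^31 and below clamps to INT32_MIN
        return std::numeric_limits<int32_t>::min();
    }
    return static_cast<int32_t>(v);  // in range: truncate toward zero
}
```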

What's Changed

Misc changes and fixes to RISCV codegen
Revise LLVM fix to work when no V8 or WABT available by @steven-johnson in https://github.com/halide/Halide/pull/7635
Be more careful about overflow in trim_bounds_using_alignment by @abadams in https://github.com/halide/Halide/pull/7645
Add a compositing example app by @abadams in https://github.com/halide/Halide/pull/7646
Get the ASAN toolchain working again by @steven-johnson in https://github.com/halide/Halide/pull/7604
Upgrade clang-format and clang-tidy to use v16 by @steven-johnson in https://github.com/halide/Halide/pull/7660
Enable the misc-use-anonymous-namespace clang-tidy check by @steven-johnson in https://github.com/halide/Halide/pull/7661
Enable clang-tidy's modernize-use-default-member-init check by @steven-johnson in https://github.com/halide/Halide/pull/7662
Update onnx app to Adams2019 autoscheduler and new autoscheduler API by @abadams in https://github.com/halide/Halide/pull/7673
Remove ParamMap by @steven-johnson in https://github.com/halide/Halide/pull/7675
Fix correctness_float16_t for ASAN builds by @steven-johnson in https://github.com/halide/Halide/pull/7687
Add a select overload for tuples by @abadams in https://github.com/halide/Halide/pull/7672
Add Sanitizer details to README_cmake.md by @steven-johnson in https://github.com/halide/Halide/pull/7688
Fix quadratic algorithm in simplify_correlated_differences by @abadams in https://github.com/halide/Halide/pull/7686
Fix float16 under asan, attempt #2 by @steven-johnson in https://github.com/halide/Halide/pull/7691
Add a warning if a Generator declares any Outputs before the final Input (Fixes #7669) by @steven-johnson in https://github.com/halide/Halide/pull/7697
Fixed the regularization for BGU. by @mcourteaux in https://github.com/halide/Halide/pull/7684
Fix clang and llvm versions in scripts by @TH3CHARLie in https://github.com/halide/Halide/pull/7702
Fix leaks caused by self-referential parameter constraints by @abadams in https://github.com/halide/Halide/pull/7700
Fix float16 warning for older clangs by @abadams in https://github.com/halide/Halide/pull/7701
Upgrade Halide main branch for LLVM18 by @steven-johnson in https://github.com/halide/Halide/pull/7710
Improved profiler result printing. by @mcourteaux in https://github.com/halide/Halide/pull/7709
Default WITH_TEST_FUZZ to OFF by @steven-johnson in https://github.com/halide/Halide/pull/7695
Throw an error if split is called with the same older and inner var name by @TH3CHARLie in https://github.com/halide/Halide/pull/7715
Making HLSL code-gen a couple orders of magnitude faster... by @slomp in https://github.com/halide/Halide/pull/7719
Making Metal code-gen a bit faster by @slomp in https://github.com/halide/Halide/pull/7720
Fix handling of thread features for scalars in Anderson2021 by @aekul in https://github.com/halide/Halide/pull/7726
Change default generator timeout to infinite by @abadams in https://github.com/halide/Halide/pull/7718
Remove unused using decl by @abadams in https://github.com/halide/Halide/pull/7730
[Hexagon] - Fix problems in sim_host.cpp by @pranavb-ca in https://github.com/halide/Halide/pull/7725
Fix RDom usage in anderson2021_test_apps_autoscheduler (Fixes #7729) by @steven-johnson in https://github.com/halide/Halide/pull/7734
Fix leak on cloning functions with update defs by @abadams in https://github.com/halide/Halide/pull/7735
Ignore code in src/runtime/hexagon_remote/bin/src for clang-format by @steven-johnson in https://github.com/halide/Halide/pull/7736
Clean up really long line lengths in Anderson2021 by @steven-johnson in https://github.com/halide/Halide/pull/7728
Revise labels on autoscheduler tests by @steven-johnson in https://github.com/halide/Halide/pull/7732
Speedup the VizIR HTML. by @mcourteaux in https://github.com/halide/Halide/pull/7713
Run clang-tidy on macOS runners instead of Linux by @steven-johnson in https://github.com/halide/Halide/pull/7746
Fix infinite recursion in loop partitioning by @abadams in https://github.com/halide/Halide/pull/7743
Fix leaks in test/correctness/memoize.cpp by @abadams in https://github.com/halide/Halide/pull/7705
Allow optional sorting of profiler output via HL_PROFILER_SORT env var (Fixes #7638) by @steven-johnson in https://github.com/halide/Halide/pull/7639
Permit llvm 15 on windows by @abadams in https://github.com/halide/Halide/pull/7744
Revert accidental typo change in #7746 by @steven-johnson in https://github.com/halide/Halide/pull/7747
[vulkan] Fix heap buffer overflow in Vulkan extension handling discovered by ASAN by @derek-gerstmann in https://github.com/halide/Halide/pull/7740
[vulkan] Fix SPIR-V IR references causing leaks by @derek-gerstmann in https://github.com/halide/Halide/pull/7739
Improve error-handling in Anderson2021, and ensure build deps are cor… by @steven-johnson in https://github.com/halide/Halide/pull/7748
StmtViz: Search for tooltip only in the child node by @antonysigma in https://github.com/halide/Halide/pull/7754
Experimental serializer by @TH3CHARLie in https://github.com/halide/Halide/pull/7594
Define cast<i32>(u32) overflow behavior by @rootjalex in https://github.com/halide/Halide/pull/7769
Fix vector reduce HTML by @mcourteaux in https://github.com/halide/Halide/pull/7773
Remove fragile simd_op_check test for mlal/mlsl on ARM by @rootjalex in https://github.com/halide/Halide/pull/7775
Speedup page loading of VizStmt. by @mcourteaux in https://github.com/halide/Halide/pull/7755
Try to fix remaining ASAN-reported leaks by @steven-johnson in https://github.com/halide/Halide/pull/7767
Fix out of bounds access in anderson2021_test_apps_autoscheduler by @aekul in https://github.com/halide/Halide/pull/7771
Don't introduce reinterprets in find/lower intrinsics by @rootjalex in https://github.com/halide/Halide/pull/7776
[Hexagon] - Build Hexagon runtime components using the Hexagon SDK (Clone of #7671) by @pranavb-ca in https://github.com/halide/Halide/pull/7741
slice IRMatcher should only match on slices by @abadams in https://github.com/halide/Halide/pull/7772
Don't inject undef() in the simplifier by @abadams in https://github.com/halide/Halide/pull/7791
Fix for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/7798
[ARM] Distribute shifts as muls by @rootjalex in https://github.com/halide/Halide/pull/7790
[ARM] support new udot/sdot patterns by @rootjalex in https://github.com/halide/Halide/pull/7800
Remove some unused includes by @abadams in https://github.com/halide/Halide/pull/7799
Add support to the makefile for serialization by @abadams in https://github.com/halide/Halide/pull/7762
[wasm] Enable PIC for WebAssembly on LLVM v18.x by @derek-gerstmann in https://github.com/halide/Halide/pull/7803
Update WebGPU to latest Emscripten/Dawn API by @steven-johnson in https://github.com/halide/Halide/pull/7804
Add jump-buttons to get from Stmt directly to Assembly by @mcourteaux in https://github.com/halide/Halide/pull/7793
Update clang-tidy action to stop breaking by @steven-johnson in https://github.com/halide/Halide/pull/7808
[serialization] Add serialization support to generator interface by @derek-gerstmann in https://github.com/halide/Halide/pull/7792
Ensure that multitarget AOT builds have consistent random sequence by @steven-johnson in https://github.com/halide/Halide/pull/7717
Move clang-tidy checks back to Linux by @steven-johnson in https://github.com/halide/Halide/pull/7817
Update 'Check CMake file lists' action by @steven-johnson in https://github.com/halide/Halide/pull/7809
Remove dead auto-schedule label in CMake by @steven-johnson in https://github.com/halide/Halide/pull/7818
Don't return an undefined Stmt() from IfThenElse visitor by @abadams in https://github.com/halide/Halide/pull/7816
Avoid generating name collisions in CSE by @abadams in https://github.com/halide/Halide/pull/7821
Add a check that PredicateLoads must be used in the outermost split of a dimension by @TH3CHARLie in https://github.com/halide/Halide/pull/7788
Enable emission of float16/32 casts on x86 by @abadams in https://github.com/halide/Halide/pull/7837
Iterate over lets in the correct order in VectorizeLoops by @vksnk in https://github.com/halide/Halide/pull/7830
Zen4 support by @abadams in https://github.com/halide/Halide/pull/7840
Update arguments in driver.cpp to match what correctness/simd_op_check has by @vksnk in https://github.com/halide/Halide/pull/7842
[tutorials] Add tutorial on JIT compile/execute performance by @derek-gerstmann in https://github.com/halide/Halide/pull/7838
[api] Promote Internal::Parameter to Halide::Parameter by @derek-gerstmann in https://github.com/halide/Halide/pull/7829
[Hexagon] - Fix 8-bit unsigned saturating downcasts for HVX (Fixes #7806) by @pranavb-ca in https://github.com/halide/Halide/pull/7825
Handle nested vectorization in store predicates by @abadams in https://github.com/halide/Halide/pull/7864
Respect input buffer constraints in root-level bounds inference exprs by @abadams in https://github.com/halide/Halide/pull/7865
Prevent use of uninitialized scalar Parameters in JIT code (#7847, partial) by @steven-johnson in https://github.com/halide/Halide/pull/7853
Handle unreachable code in bounds inference by @abadams in https://github.com/halide/Halide/pull/7866
[serialization] Add support to serialize to memory, and a basic serialization tutorial by @derek-gerstmann in https://github.com/halide/Halide/pull/7760
Don't deduce unreachability from predicated out of bounds stores by @abadams in https://github.com/halide/Halide/pull/7874
Validate for types when fusing Vars with RVars by @abadams in https://github.com/halide/Halide/pull/7877
Consider all dimensions before deciding to slide over a new dimension by @abadams in https://github.com/halide/Halide/pull/7875
Update onnx app to work with newer versions of protobuf by @abadams in https://github.com/halide/Halide/pull/7879
HTML Stmt IR with conceptual code and device code. by @mcourteaux in https://github.com/halide/Halide/pull/7843
Update README.md to include RISCV in llvm build instructions by @abadams in https://github.com/halide/Halide/pull/7878
Implement elementwise complex value division by @antonysigma in https://github.com/halide/Halide/pull/7848
Explicitly name the allocgroups on GPU schedules "allocgroup__..." by @mcourteaux in https://github.com/halide/Halide/pull/7883
Generate simpler LLVM IR for shuffles that recursively become broadcasts by @abadams in https://github.com/halide/Halide/pull/7902
Check for overflow in Type constructor by @abadams in https://github.com/halide/Halide/pull/7889
Mutating if branches in isolation can break reachability analysis by @abadams in https://github.com/halide/Halide/pull/7895
Disable warning for mismatched new/delete by @abadams in https://github.com/halide/Halide/pull/7897
Assignment is not associative by @abadams in https://github.com/halide/Halide/pull/7894
Don't lift loop vars outside of their loops in sliding window by @abadams in https://github.com/halide/Halide/pull/7896
Stop interleaver from expanding the scope of letstmts by @abadams in https://github.com/halide/Halide/pull/7908
Highlight groups for the HTML Stmt file and tooltips to reveal types. by @mcourteaux in https://github.com/halide/Halide/pull/7887
Static analysis (MSVC) fixes for device_buffer_utils.h by @slomp in https://github.com/halide/Halide/pull/7904
Check returned result in the test by @vksnk in https://github.com/halide/Halide/pull/7911
Fix read-after-write hazard analysis in storage folding by @abadams in https://github.com/halide/Halide/pull/7910
Turn off SLP vectorization for avx512 only by @abadams in https://github.com/halide/Halide/pull/7918
Scheduling directive to hoist the storage of the function by @vksnk in https://github.com/halide/Halide/pull/7915
Improve the error message if you store_at without a compute_at by @vksnk in https://github.com/halide/Halide/pull/7923
Loop Partitioning Policy through Stage::partition(VarOrRVar, LoopPartitionPolicy) by @mcourteaux in https://github.com/halide/Halide/pull/7914
Remove use of dynamic_cast. by @zvookin in https://github.com/halide/Halide/pull/7931
Add special build for testing serialization via a serialization roundtrip in JIT compilation and fix serialization leaks by @TH3CHARLie in https://github.com/halide/Halide/pull/7763
Add missing serialization of Dim::partition_policy by @TH3CHARLie in https://github.com/halide/Halide/pull/7935
Make sure all Halide arithmetic scalar types can be named from the Generator interface. by @zvookin in https://github.com/halide/Halide/pull/7934
Remove the deprecated API llvm::Type::getInt8PtrTy usage. by @hokein in https://github.com/halide/Halide/pull/7937
More targeted fix for gather instructions being slow on intel processors by @abadams in https://github.com/halide/Halide/pull/7945
Track likely values through lets in loop partitioning by @abadams in https://github.com/halide/Halide/pull/7930
Add missing condition to if renesting rule by @abadams in https://github.com/halide/Halide/pull/7952
Always call lower_round_to_nearest_ties_to_even on arm32 by @vksnk in https://github.com/halide/Halide/pull/7957
Improve code size and compile time for local laplacian app by @abadams in https://github.com/halide/Halide/pull/7927
[serialization] Serialize stub definitions of external parameters. by @derek-gerstmann in https://github.com/halide/Halide/pull/7926
[WebGPU] Update to latest native headers by @jrprice in https://github.com/halide/Halide/pull/7932
Return values from stub functions in Deserialization by @steven-johnson in https://github.com/halide/Halide/pull/7963
Make the fast inverse test throughput-limited rather than latency-limited by @abadams in https://github.com/halide/Halide/pull/7958
Attempt to fix nested vectorization gemm performance on new build bot by @abadams in https://github.com/halide/Halide/pull/7959
Update instructions to include generated schedules by @antonysigma in https://github.com/halide/Halide/pull/7928
[serialization] Add Halide version and serialization version in serialization format by @TH3CHARLie in https://github.com/halide/Halide/pull/7905
Handle many more intrinsics in Bounds.cpp by @steven-johnson in https://github.com/halide/Halide/pull/7823
Disallow async nestings that violate read after write dependencies by @abadams in https://github.com/halide/Halide/pull/7868
complete_x86_target() should enable F16C and FMA when AVX2 is present by @steven-johnson in https://github.com/halide/Halide/pull/7971
Add two new tail strategies for update definitions by @abadams in https://github.com/halide/Halide/pull/7949
Add appropriate mattrs for arm-32 extensions by @abadams in https://github.com/halide/Halide/pull/7978
Move canonical version numbers into source, not build system (#7980) by @steven-johnson in https://github.com/halide/Halide/pull/7981
Silence useless "Insufficient parallelism" autoscheduler warning by @steven-johnson in https://github.com/halide/Halide/pull/7990
Add a notebook with a visualization of the approx_* functions and their errors by @vksnk in https://github.com/halide/Halide/pull/7974
Make narrowing float->int casts on wasm go via wider ints by @abadams in https://github.com/halide/Halide/pull/7973
Fix handling of assert statements whose conditions get vectorized by @abadams in https://github.com/halide/Halide/pull/7989
Fix all "unscheduled update()" warnings in our code by @steven-johnson in https://github.com/halide/Halide/pull/7991
Silence useless 'Outer dim vectorization of var' warning in Mullapudi… by @steven-johnson in https://github.com/halide/Halide/pull/7992
Make wasm +sign-ext and +nontrapping-fptoint the default by @steven-johnson in https://github.com/halide/Halide/pull/7995
Teach unrolling to exploit conditions in enclosing ifs by @abadams in https://github.com/halide/Halide/pull/7969
Do some basic validation of Target Features (#7986) by @steven-johnson in https://github.com/halide/Halide/pull/7987
Inject profiling for function calls to 'halide_copy_to_host' and 'halide_copy_to_device'. by @mcourteaux in https://github.com/halide/Halide/pull/7913
bounds_of_nested_lanes assumed that one layer of nested vectorization could be removed at a time, but failed in situations with unusual nesting structures. by @abadams in https://github.com/halide/Halide/pull/8039 and #8055
We now track whether let expressions failed to solve in the solver; failing to do this meant we performed unhelpful transformations in some cases, which led to exploding compile times. by @abadams in https://github.com/halide/Halide/pull/7982

Full Changelog: https://github.com/halide/Halide/compare/v16.0.0...v17.0.0

v16.0.0

10 months ago

What's Changed

General Notes

Support for the Vulkan API (w/SPIR-V codegen)
Support for WebGPU (experimental)
Improved Halide IR HTML Visualization
Fixed a regression in the Adams2019 auto-scheduler that disabled sub-tiling
Added GPU auto-scheduler (Anderson2021)

Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU. Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley. Proceedings of the ACM on Programming Languages (OOPSLA 2021).

Deprecations / Removals

OpenGLCompute has been deprecated
ParamMap has been deprecated
Deprecated HVX_shared_object feature has been removed
References to deprecated fixed-point operators have been removed
Deprecated halide_target_feature_disable_llvm_loop_opt has been removed
Deprecated MIPS device support has been removed

Notable Fixes & Changes

Generate dot() in the Metal backend by @vksnk in https://github.com/halide/Halide/pull/7085
Add evaluate() and evaluate_may_gpu() to Python bindings by @steven-johnson in https://github.com/halide/Halide/pull/7108
Add support for generating LLVM vector predication intrinsics. by @zvookin in https://github.com/halide/Halide/pull/7111
RISC V vector predication intrinsics support by @zvookin in https://github.com/halide/Halide/pull/7119
Add range-checking to Buffer objects in Python by @steven-johnson in https://github.com/halide/Halide/pull/7128
Fix Python buffer handling by @steven-johnson in https://github.com/halide/Halide/pull/7125
[WASM] Use rounding_mul_shift_right for q15mulr_sat_s pattern by @rootjalex in https://github.com/halide/Halide/pull/7134
[x86] Generate AVX512 fixed-point instructions by @rootjalex in https://github.com/halide/Halide/pull/7129
Fix readnone attribute for llvm 16 by @abadams in https://github.com/halide/Halide/pull/7152
Call cache.clear between internal functions in CG_C by @steven-johnson in https://github.com/halide/Halide/pull/7155
Add bfloat support to halide_type_to_string() by @steven-johnson in https://github.com/halide/Halide/pull/7154
Factor simd_op_check into separate files by architecture. by @zvookin in https://github.com/halide/Halide/pull/7163
Slightly improve error message for non-integer RDom min/extent by @abadams in https://github.com/halide/Halide/pull/7151
Migrate from MCJIT to ORC JIT by @dkurt in https://github.com/halide/Halide/pull/7166
Use n32:64 in RISC-V data layout by @dkurt in https://github.com/halide/Halide/pull/7175
Don't attempt to use makecontext()/swapcontext() on Android by @steven-johnson in https://github.com/halide/Halide/pull/7196
Add bridging for clang _Float16 type. by @zvookin in https://github.com/halide/Halide/pull/7201
Fix issue with vector predicated comparison and select instructions. by @zvookin in https://github.com/halide/Halide/pull/7205
Add RISC V zvl flag for LLVM version 16 or greater. by @zvookin in https://github.com/halide/Halide/pull/7209
Extend LLVM IR type mangling to handle scalars. by @zvookin in https://github.com/halide/Halide/pull/7212
Fix bitrot in PowerPC testing by @steven-johnson in https://github.com/halide/Halide/pull/7211
Use aligned_alloc() as default allocator for HalideBuffer.h on most platforms by @steven-johnson in https://github.com/halide/Halide/pull/7190
Tighten alignment promises for halide_malloc() by @steven-johnson in https://github.com/halide/Halide/pull/7222
Fix some sources of signed integer overflow in the compiler by @abadams in https://github.com/halide/Halide/pull/7231
Explicitly stage strided loads by @abadams in https://github.com/halide/Halide/pull/7230
Remove deprecated halide_target_feature_disable_llvm_loop_opt by @steven-johnson in https://github.com/halide/Halide/pull/7247
Conditional allocations shouldn't fail for size=0 in C++ backend (#7255) by @steven-johnson in https://github.com/halide/Halide/pull/7256
Inline into extern function args during bounds inference by @abadams in https://github.com/halide/Halide/pull/7261
Use ::aligned_alloc() instead of std::aligned_alloc() in HalideBuffer.h by @steven-johnson in https://github.com/halide/Halide/pull/7268
Optimize Module::compile() for some edge cases by @steven-johnson in https://github.com/halide/Halide/pull/7269
Drop support for MIPS (#7287) by @steven-johnson in https://github.com/halide/Halide/pull/7289
Emit prototypes for destructor functions in C Backend by @steven-johnson in https://github.com/halide/Halide/pull/7296
[HVX] Fix EliminateInterleaves by @rootjalex in https://github.com/halide/Halide/pull/7279
Remove dependency on platform threads library by @alexreinking in https://github.com/halide/Halide/pull/7297
Fix error of add_halide_generator in cross-compilation by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7283
Fix issue in add_halide_runtime in cross-compilation by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7284
Add workaround for the const-or-not user_context issue (#635) by @steven-johnson in https://github.com/halide/Halide/pull/7291
[x86 & wasm] Split up double saturating-narrows from i32 by @rootjalex in https://github.com/halide/Halide/pull/7280
Hoist vector slices using rewrite rules by @abadams in https://github.com/halide/Halide/pull/7243
Improved halide_popcount by @Aelphy in https://github.com/halide/Halide/pull/7225
halide_popcount<uint64_t> is broken by @steven-johnson in https://github.com/halide/Halide/pull/7313
Fix segfault by nonconstant bound in Adams2019 by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7321
Make auto scheduler libs available in HalideHelpers package by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7285
Improve support for Arm baremetal compilation and runtime by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7286
Remove deprecated HVX_shared_object feature by @steven-johnson in https://github.com/halide/Halide/pull/7331
Fix a subtle uninitialized-memory-read in Buffer::for_each_value() by @steven-johnson in https://github.com/halide/Halide/pull/7330
Add a hook to Codegen_C::compile() by @steven-johnson in https://github.com/halide/Halide/pull/7335
Tiny improvements in codegen in C backend by @steven-johnson in https://github.com/halide/Halide/pull/7337
Devirtualize the protected compile() methods in Codegen_C by @steven-johnson in https://github.com/halide/Halide/pull/7341
Fix tuple output bounds checks by @abadams in https://github.com/halide/Halide/pull/7345
Change early-bound default args in Python bindings to late-bound by @steven-johnson in https://github.com/halide/Halide/pull/7347
Fix Python error handling by @steven-johnson in https://github.com/halide/Halide/pull/7352
Permit vectorization of non-recursive atomic operations by @abadams in https://github.com/halide/Halide/pull/7346
Update WABT to 1.0.32; Increase stack size for WASM AOT apps by @steven-johnson in https://github.com/halide/Halide/pull/7373
Bounds visitors for min/max were missing single_point mutated case by @abadams in https://github.com/halide/Halide/pull/7377
Fix overflow in x86 absd lowering by @abadams in https://github.com/halide/Halide/pull/7407
Add initial support for WebGPU by @jrprice in https://github.com/halide/Halide/pull/6492
Use pmaddubsw for non-RDom horizontal widening adds by @abadams in https://github.com/halide/Halide/pull/7440
Compute comparison masks in narrower types if possible by @abadams in https://github.com/halide/Halide/pull/7392
Fix bugs in PyTorch codegen. by @Yongqi-Zhuo in https://github.com/halide/Halide/pull/7443
Remove references to deprecated variants of fixed-point operators by @steven-johnson in https://github.com/halide/Halide/pull/7457
Add GPU autoscheduler by @aekul in https://github.com/halide/Halide/pull/6856
d3d12 runtime: replacing spinlocks by mutex objects by @slomp in https://github.com/halide/Halide/pull/7489
Feature Enhancement: Halide IR HTML Visualization by @maaz139 in https://github.com/halide/Halide/pull/7421
Deprecate ParamMap (#7121) by @steven-johnson in https://github.com/halide/Halide/pull/7357
Forbid assigning to Buffer(Expr) by introducing an intermediate type. by @abadams in https://github.com/halide/Halide/pull/7517
[vulkan phase2] Vulkan Runtime by @derek-gerstmann in https://github.com/halide/Halide/pull/6924
Add libfuzzer compatible fuzz harness by @silvergasp in https://github.com/halide/Halide/pull/7512
fuzz: Port correctness/cse fuzzer over to libfuzzer by @silvergasp in https://github.com/halide/Halide/pull/7543
metal : replacing spinlock by mutex by @slomp in https://github.com/halide/Halide/pull/7532
Fix save_tiff() PlanarConfig assignment for monochrome inputs by @philboske in https://github.com/halide/Halide/pull/7568
Fix various compilation errors with AppleClang 14.0.3 by @steven-johnson in https://github.com/halide/Halide/pull/7578
fuzz: Add libfuzzer compatible bounds fuzzer by @silvergasp in https://github.com/halide/Halide/pull/7549
Significant change to RISC V and scalable vector code generation. by @zvookin in https://github.com/halide/Halide/pull/7616
Fix inverted may_subtile checks by @abadams in https://github.com/halide/Halide/pull/7626
Deprecate OpenGLCompute for Halide 16 by @shoaibkamil in https://github.com/halide/Halide/pull/7627

New Contributors

@sashashura made their first contribution in https://github.com/halide/Halide/pull/7136
@twesterhout made their first contribution in https://github.com/halide/Halide/pull/7315
@terryheo made their first contribution in https://github.com/halide/Halide/pull/7323
@adrian-lebioda made their first contribution in https://github.com/halide/Halide/pull/7379
@Ttayu made their first contribution in https://github.com/halide/Halide/pull/7402
@Yongqi-Zhuo made their first contribution in https://github.com/halide/Halide/pull/7443
@aekul made their first contribution in https://github.com/halide/Halide/pull/6856
@zhen8838 made their first contribution in https://github.com/halide/Halide/pull/7494
@maaz139 made their first contribution in https://github.com/halide/Halide/pull/7421
@silvergasp made their first contribution in https://github.com/halide/Halide/pull/7512
@dbabokin made their first contribution in https://github.com/halide/Halide/pull/7545
@philboske made their first contribution in https://github.com/halide/Halide/pull/7568

Full Changelog: https://github.com/halide/Halide/compare/v15.0.1...v16.0.0

v15.0.1

1 year ago

What's Changed

The Python binding of compile_to_callable() was not properly copying from device to host for output buffers, so output was typically black (or garbage) when used with a GPU target. (#7213)
The bin directory was missing from the installs.
Upgraded LLVM to 15.0.7
New in 15.0.0, but restated here for visibility: The target flag disable_llvm_loop_opt is deprecated, as it's now the default behavior. This means that we have turned off llvm's autovectorization and loop unrolling. This should not affect any schedules with manually-specified vectorization and unrolling, other than trimming code size a little. However, schedules that do not vectorize or unroll may slow down because they were (intentionally or not) relying on llvm to do it automatically. If you see a performance regression with Halide 15, try turning on the enable_llvm_loop_opt target flag.

v15.0.0

1 year ago

What's Changed

General Notes

Support for RISC V Vector architectures.
Python-related:
- Halide builds for Python are now being built and provided to PyPI, so it is now possible to use the Halide Python bindings simply by pip install halide
- Major improvements were made to the Python bindings, with many missing or incomplete sections of the API added or filled in.
- We now support the use of Generators from Python (for both JIT and AOT usage).
- The standard CMake rules now support generating a Python extension directly.
- Support for Python was removed from Halide's Makefiles; you must use CMake to build the Python bindings
Halide::Func now allows you to (optionally) constrain the type(s) of Exprs that the Func can contain, and/or the dimensionality of the Func.
Added a new way to use the JIT (compile_to_callable) that allows calling a jitted function with the same syntax as for AOT-compiled functions, allowing more control over JIT lifespan, as well as thread-safe arguments without requiring ParamMap
General improvements to SIMD codegen
Several rarely-used parts of the C++ Generator API were deprecated, and the way that autoschedulers are specified for AOT compilation is now completely different (but better for future expandability).
CMake builds now require >= v3.22
WABT usage requires >= v1.0.30
LLVM 12 is no longer supported
The target flag disable_llvm_loop_opt is deprecated, as it's now the default behavior. This means that we have turned off LLVM's autovectorization and loop unrolling. This should not affect any schedules with manually-specified vectorization and unrolling, other than trimming code size a little. However, schedules that do not vectorize or unroll may slow down because they were (intentionally or not) relying on LLVM to do it automatically. If you see a performance regression with Halide 15, try turning on the enable_llvm_loop_opt target flag.

Notable bug fixes

Make Halide::round behave as documented (https://github.com/halide/Halide/pull/7012)
Incorrect folding of saturating_sub (https://github.com/halide/Halide/issues/6883)
The check for race conditions didn't consider RDom::where() predicates (https://github.com/halide/Halide/issues/6808)
Performance regression for x86 for certain LLVM versions (https://github.com/halide/Halide/issues/6783)
Fusing a specialization drops compute_withs from generated code (https://github.com/halide/Halide/pull/6770)
Incorrect output when realize condition depends on tuple call (https://github.com/halide/Halide/pull/6915)
Python extensions should default to throwing exceptions rather than calling abort() for errors (https://github.com/halide/Halide/pull/6986)
Python bindings didn't support bool buffers (https://github.com/halide/Halide/pull/7006)
Python bindings didn't support float16 buffers (https://github.com/halide/Halide/pull/7060)
Python extensions that executed on GPU didn't copy back to host properly (https://github.com/halide/Halide/pull/6869)
Fix bugs in div_round_to_zero and fast_integer_divide_round_to_zero (https://github.com/halide/Halide/pull/7008)
Bugs in add_requirement() (https://github.com/halide/Halide/pull/7045)
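Several of the fixes above concern the numeric semantics of round, saturating_sub and div_round_to_zero. The plain-Python sketch below models the intended behavior, assuming that the documented behavior of Halide::round is to resolve ties toward the nearest even integer, that saturating_sub clamps to the range of the result type, and that div_round_to_zero truncates like C integer division. It is a reference model for checking expectations, not Halide code, and the explicit lo/hi bounds are an illustrative stand-in for a Halide type.

```python
def round_half_even(x: float) -> int:
    # Python's built-in round() resolves ties toward the even integer,
    # unlike C's round(), which rounds ties away from zero.
    return round(x)


def saturating_sub(a: int, b: int, lo: int, hi: int) -> int:
    # Clamp the exact difference into [lo, hi] instead of letting it wrap.
    return max(lo, min(hi, a - b))


def div_round_to_zero(a: int, b: int) -> int:
    # Truncating division; Python's // floors instead.
    # b must be nonzero in this sketch.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


if __name__ == "__main__":
    print([round_half_even(v) for v in (0.5, 1.5, 2.5, -2.5)])  # [0, 2, 2, -2]
    print(saturating_sub(10, 20, 0, 255))                       # 0
    print(div_round_to_zero(-7, 2), -7 // 2)                    # -3 -4
```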

Major changes

Augment Halide::Func to allow for constraining Type and Dimensionality by @steven-johnson in https://github.com/halide/Halide/pull/6734 and https://github.com/halide/Halide/pull/6735
Add Target support for architectures with implementation specific vector size. by @zvookin in https://github.com/halide/Halide/pull/6786
Add support for vscale vector code generation. by @zvookin in https://github.com/halide/Halide/pull/6802
Remove Python bindings from Makefiles by @alexreinking in https://github.com/halide/Halide/pull/6821
Add a new, alternate JIT-call convention by @steven-johnson in https://github.com/halide/Halide/pull/6777
Pip packaging by @alexreinking in https://github.com/halide/Halide/pull/6886 and https://github.com/halide/Halide/pull/6938
Define a Generator framework in Python by @steven-johnson in https://github.com/halide/Halide/pull/6764
Make Halide::round behave as documented by @abadams in https://github.com/halide/Halide/pull/7012

Minor changes

-mtune=/-mcpu= support for x86 AMD CPUs by @LebedevRI in https://github.com/halide/Halide/pull/6655
Enable deprecations warnings by @steven-johnson in https://github.com/halide/Halide/pull/6555
Fix GPU depredication/scalarization by @shoaibkamil in https://github.com/halide/Halide/pull/6669
Allow PyPipeline and PyFunc to realize() scalar buffers by @steven-johnson in https://github.com/halide/Halide/pull/6674
Future-proof 'processortotune processor` by @LebedevRI in https://github.com/halide/Halide/pull/6673
Fix ctors for Realization by @steven-johnson in https://github.com/halide/Halide/pull/6675
-mtune=native CPU autodetection for AMD Zen 3 CPU by @LebedevRI in https://github.com/halide/Halide/pull/6648
Clean up Python extensions in python_bindings by @steven-johnson in https://github.com/halide/Halide/pull/6670
Halide::Tools::save_image() should accept buffers with const types by @steven-johnson in https://github.com/halide/Halide/pull/6679
Fix "set but not used" warnings/errors by @steven-johnson in https://github.com/halide/Halide/pull/6683
Drop support for LLVM12 by @steven-johnson in https://github.com/halide/Halide/pull/6686
Upgrade to clang-format 13 by @steven-johnson in https://github.com/halide/Halide/pull/6689
Always mark _ucon as 'unused' in Codegen_C by @steven-johnson in https://github.com/halide/Halide/pull/6691
Add break to avoid 'possible unintentional fallthru' warning by @steven-johnson in https://github.com/halide/Halide/pull/6694
Silence "unknown warning" in Clang 13 by @steven-johnson in https://github.com/halide/Halide/pull/6693
Fixes for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/6697
Python: make Func implicitly convertible to Stage (#6702) by @steven-johnson in https://github.com/halide/Halide/pull/6704
llvm no longer wants a type suffix on vst intrinsics by @abadams in https://github.com/halide/Halide/pull/6701
Fix type-mangling for vst on arm32 for LLVM15 by @steven-johnson in https://github.com/halide/Halide/pull/6705
Remove the last remaining call to getPointerElementType() by @steven-johnson in https://github.com/halide/Halide/pull/6715
ARM vst mangling needs to be conditional on opaque ptrs by @steven-johnson in https://github.com/halide/Halide/pull/6716
Combine string constants in combine_strings() by @steven-johnson in https://github.com/halide/Halide/pull/6717
Update CodeGen_PTX_Dev to use new PassManager by @steven-johnson in https://github.com/halide/Halide/pull/6718
Closure functions for parallel tasks should be internal, not external by @steven-johnson in https://github.com/halide/Halide/pull/6720
Smarten type_of<> for fn ptrs; fix async_parallel for C backend by @steven-johnson in https://github.com/halide/Halide/pull/6719
Remove legacy::FunctionPassManager usage in Codegen_PTX_Dev by @steven-johnson in https://github.com/halide/Halide/pull/6722
get_amd_processor(): implement detection for the rest of supported AMD CPUs by @LebedevRI in https://github.com/halide/Halide/pull/6711
Add Func::output_type() method by @steven-johnson in https://github.com/halide/Halide/pull/6724
Grab-bag of minor Python fixes by @steven-johnson in https://github.com/halide/Halide/pull/6725
Remove rounding_halving_sub and non-existent arm rhsub instructions by @rootjalex in https://github.com/halide/Halide/pull/6723
Faster widening_mul(int16x, int16x) -> int32x for x86 (AVX2 and SSE2) by @rootjalex in https://github.com/halide/Halide/pull/6677
Add missing #include in ThreadPool.h by @steven-johnson in https://github.com/halide/Halide/pull/6738
Fix regression from #6734 by @steven-johnson in https://github.com/halide/Halide/pull/6739
Add forwarding for the recently-added Func::output_type() method by @steven-johnson in https://github.com/halide/Halide/pull/6741
Silence "unscheduled update stage" warnings in msan_generator.cpp by @steven-johnson in https://github.com/halide/Halide/pull/6740
Add pycache to toplevel .gitignore file by @steven-johnson in https://github.com/halide/Halide/pull/6743
Silence "may be used uninitialized" in Buffer::for_each_element() by @steven-johnson in https://github.com/halide/Halide/pull/6747
Update WABT to 1.0.29 by @steven-johnson in https://github.com/halide/Halide/pull/6748
Update hannk README link to hosted models page by @steven-johnson in https://github.com/halide/Halide/pull/6749
Add a HalideError base class to Python bindings by @steven-johnson in https://github.com/halide/Halide/pull/6750
Add GeneratorFactoryProvider to generate_filter_main() by @steven-johnson in https://github.com/halide/Halide/pull/6755
Minor metadata-related cleanups by @steven-johnson in https://github.com/halide/Halide/pull/6759
Expand the x86 SIMD variants tested in correctness_vector_reductions by @steven-johnson in https://github.com/halide/Halide/pull/6762
Fix Param<T>::set_estimate for T=void by @steven-johnson in https://github.com/halide/Halide/pull/6766
add_python_aot_extension should use FUNCTION_NAME for the .so output … by @steven-johnson in https://github.com/halide/Halide/pull/6767
Fix fundamental confusion about target/tune CPU by @LebedevRI in https://github.com/halide/Halide/pull/6765
Fix annoying typo in Func.h by @steven-johnson in https://github.com/halide/Halide/pull/6774
Add execute_generator() API by @steven-johnson in https://github.com/halide/Halide/pull/6771
Allow overriding of Generator::init_from_context() for debug purposes by @steven-johnson in https://github.com/halide/Halide/pull/6760
Convert some assert-only usage of output_types() -> types() by @steven-johnson in https://github.com/halide/Halide/pull/6779
[miscompile] Don't de-negate and change direction of shifts-by-unsigned by @LebedevRI in https://github.com/halide/Halide/pull/6782
Move some options from execute_generator back to generate_filter_main by @steven-johnson in https://github.com/halide/Halide/pull/6787
LLVM codegen: register AA pipeline if LLVM is older than 14 by @LebedevRI in https://github.com/halide/Halide/pull/6785
Update the list of fused_pairs and run validate_fused_group for specialization definitions too by @vksnk in https://github.com/halide/Halide/pull/6770
halide_type_of<>() should always be constexpr by @steven-johnson in https://github.com/halide/Halide/pull/6790
Define an AbstractGenerator interface by @steven-johnson in https://github.com/halide/Halide/pull/6637
hexagon_scatter test should run only if target has HVX by @steven-johnson in https://github.com/halide/Halide/pull/6793
slow tests should support sharding by @steven-johnson in https://github.com/halide/Halide/pull/6780
Add missing include to test_sharding.h by @steven-johnson in https://github.com/halide/Halide/pull/6795
Pacify clang-tidy by @steven-johnson in https://github.com/halide/Halide/pull/6796
Silence a "possibly uninitialized" warning by @steven-johnson in https://github.com/halide/Halide/pull/6797
Make all tests default to -fvisibility=hidden by @steven-johnson in https://github.com/halide/Halide/pull/6799
Minor typedef cleanup by @steven-johnson in https://github.com/halide/Halide/pull/6800
Fix auto_schedule/machine_params parsing by @steven-johnson in https://github.com/halide/Halide/pull/6804
Rewrite strided loads of 4 in AlignLoads by @vksnk in https://github.com/halide/Halide/pull/6806
Fix two minor bugs triggered by an or reduction with early-out by @abadams in https://github.com/halide/Halide/pull/6807
[CMake] Mark multi-threaded tests as such by @LebedevRI in https://github.com/halide/Halide/pull/6810
Rework .gitignore by @alexreinking in https://github.com/halide/Halide/pull/6822
Update presets to format version 3 by @alexreinking in https://github.com/halide/Halide/pull/6824
Fix for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/6825
Tweak python apps for better Blaze/Bazel compatibility by @steven-johnson in https://github.com/halide/Halide/pull/6823
Apply CMAKE_C_COMPILER_LAUNCHER to initmod clang calls by @alexreinking in https://github.com/halide/Halide/pull/6831
Scrub Python from Makefile after buildbot update by @alexreinking in https://github.com/halide/Halide/pull/6833
Remove unused function in callable_generator.cpp by @steven-johnson in https://github.com/halide/Halide/pull/6834
Disable testing for apps/linear_algebra on x86-32-linux/Make by @steven-johnson in https://github.com/halide/Halide/pull/6836
Rearrange subdirectories in python_bindings by @steven-johnson in https://github.com/halide/Halide/pull/6835
Better lowering of halving_sub and rounding_halving_add by @abadams in https://github.com/halide/Halide/pull/6827
Check RDom::where predicates for race conditions by @alexreinking in https://github.com/halide/Halide/pull/6842
Remove Generator::value_tracker and friends by @steven-johnson in https://github.com/halide/Halide/pull/6845
Add placeholder code for bfloat16 in Python (#6849) by @steven-johnson in https://github.com/halide/Halide/pull/6850
Fix the PLUGINS argument to properly join multiple arguments by @steven-johnson in https://github.com/halide/Halide/pull/6851
Add autoscheduling to the generator_aot_stubuser test by @steven-johnson in https://github.com/halide/Halide/pull/6855
Silence Adams2019 Autoscheduler by @steven-johnson in https://github.com/halide/Halide/pull/6854
[vulkan phase0] Add adts for containers and memory allocation to runtime by @derek-gerstmann in https://github.com/halide/Halide/pull/6829
Promote Reinterpret Intrinsic into a Reinterpret IR Node by @LebedevRI in https://github.com/halide/Halide/pull/6853
Python source reorg by @alexreinking in https://github.com/halide/Halide/pull/6867
Fix simd_op_check for top-of-tree LLVM by @steven-johnson in https://github.com/halide/Halide/pull/6874
Use pmaddubsw 8-bit horizontal widening adds (Fixes #6859) by @rootjalex in https://github.com/halide/Halide/pull/6873
[Codegen_LLVM] Radically simplify visit(const Reinterpret *op) by @LebedevRI in https://github.com/halide/Halide/pull/6865
[Codegen] Fail to codegen Call::undef, just like Call::signed_integer_overflow by @LebedevRI in https://github.com/halide/Halide/pull/6871
Fix error in Makefile for Adams2019 on OSX by @steven-johnson in https://github.com/halide/Halide/pull/6877
Refactor/cleanup in Autoscheduler code by @steven-johnson in https://github.com/halide/Halide/pull/6858
Ensure $CMAKE_{lang}_OUTPUT_EXTENSION is set before using it by @shoaibkamil in https://github.com/halide/Halide/pull/6879
#6863 - Fixes to make address sanitizer happy for internal runtime classes by @derek-gerstmann in https://github.com/halide/Halide/pull/6880
[Codegen_LLVM] Define all the things by @LebedevRI in https://github.com/halide/Halide/pull/6866
Add set-host-dirty/copy-to-host to PythonExtensionGen by @steven-johnson in https://github.com/halide/Halide/pull/6869
Rewrite PythonExtensionGen to be C++ based by @steven-johnson in https://github.com/halide/Halide/pull/6888
Fixes to allow compiling with LLVM16 by @steven-johnson in https://github.com/halide/Halide/pull/6889
Add support for generating x86 sum-of-absolute-difference reductions by @abadams in https://github.com/halide/Halide/pull/6872
Remove (most) of the env var usage from Adams2019 by @steven-johnson in https://github.com/halide/Halide/pull/6861
[vulkan phase1] Add SPIR-V IR by @derek-gerstmann in https://github.com/halide/Halide/pull/6882
Add auto_schedule label to Adams2019 and Li2018 tests in CMake by @steven-johnson in https://github.com/halide/Halide/pull/6898
[Simplify] Drop no-op single-input identity shuffles by @LebedevRI in https://github.com/halide/Halide/pull/6901
[Codegen_LLVM] Annotate LLVM IR functions with nounwind/mustprogress attributes by @LebedevRI in https://github.com/halide/Halide/pull/6897
Don't try to fold saturating_sub of VectorReduce by @rootjalex in https://github.com/halide/Halide/pull/6896
Upgrade clang-format and clang-tidy to v14 (v2) by @steven-johnson in https://github.com/halide/Halide/pull/6902
Allow AMX instructions with K dimension larger than 4 bytes by @frengels in https://github.com/halide/Halide/pull/6582
Fix autoscheduling trivial lut wrappers by @abadams in https://github.com/halide/Halide/pull/6905
Fix broken Makefile rules for autoschedulers on OSX by @steven-johnson in https://github.com/halide/Halide/pull/6906
LICENSE.txt: Include full text of Apache 2.0 license (not just the 'header' version) by @steven-johnson in https://github.com/halide/Halide/pull/6912
LICENSE.txt: add spirv license by @steven-johnson in https://github.com/halide/Halide/pull/6913
LICENSE.txt: add BLAS license. by @steven-johnson in https://github.com/halide/Halide/pull/6914
Upgrade CMake minimum version to 3.22 by @steven-johnson in https://github.com/halide/Halide/pull/6916
Remove unused GHA and packaging workflows. by @alexreinking in https://github.com/halide/Halide/pull/6917
Fix two warnings found with clang 16 by @steven-johnson in https://github.com/halide/Halide/pull/6918
Fix bug when realize condition depends on tuple call by @abadams in https://github.com/halide/Halide/pull/6915
Fix wrong install path for *.py files by @steven-johnson in https://github.com/halide/Halide/pull/6921
Make use of CMake 3.22 features by @alexreinking in https://github.com/halide/Halide/pull/6919
Make saturating_cast an intrinsic by @rootjalex in https://github.com/halide/Halide/pull/6900
Halide::Error should not extend std::runtime_error by @steven-johnson in https://github.com/halide/Halide/pull/6927
Rework internal PYTHONPATH maintenance by @steven-johnson in https://github.com/halide/Halide/pull/6922
Tutorial 10 needs to be skipped for Python when targeting Wasm (just as non-Python does) by @steven-johnson in https://github.com/halide/Halide/pull/6932
Add build & test presets for release and debug CMake builds by @steven-johnson in https://github.com/halide/Halide/pull/6934
Add ASAN support to CMake via toolchain file by @steven-johnson in https://github.com/halide/Halide/pull/6920
Fix badly-merged CMakePresets.json file by @steven-johnson in https://github.com/halide/Halide/pull/6936
Add minimal useful implementation of extracting and concatenating bits by @abadams in https://github.com/halide/Halide/pull/6928
Export HalidePythonExtensionHelpers.cmake for installs by @steven-johnson in https://github.com/halide/Halide/pull/6941
Add/update Python Readme by @steven-johnson in https://github.com/halide/Halide/pull/6939
Don't throw an exception from generate_filter_main by @steven-johnson in https://github.com/halide/Halide/pull/6946
Handle saturating_cast in compute_expr_cost() by @rootjalex in https://github.com/halide/Halide/pull/6947
Two quick build fixes by @alexreinking in https://github.com/halide/Halide/pull/6950
Remove add_python_aot_extension() rule in CMake by @steven-johnson in https://github.com/halide/Halide/pull/6949
Build fixes for manylinux2014 by @alexreinking in https://github.com/halide/Halide/pull/6953
Remove add_python_stub_extension(), adding the functionality to add_halide_generator() instead by @steven-johnson in https://github.com/halide/Halide/pull/6952
[HVX] Fix state_var issue by @rootjalex in https://github.com/halide/Halide/pull/6894
Fix RPATH for Python wheels on macOS by @alexreinking in https://github.com/halide/Halide/pull/6958
Python: don't crash for repr(Expr()) by @steven-johnson in https://github.com/halide/Halide/pull/6962
Some minor top-level CMakeLists.txt reorganization by @alexreinking in https://github.com/halide/Halide/pull/6957
CMake packaging fixes by @alexreinking in https://github.com/halide/Halide/pull/6966
Use CMake target to handle vendored SPIRV headers by @alexreinking in https://github.com/halide/Halide/pull/6968
Don't cache Halide_ASAN_ENABLED by @alexreinking in https://github.com/halide/Halide/pull/6969
Lower saturating_cast in bounds inference by @rootjalex in https://github.com/halide/Halide/pull/6970
Small refactor to remove confusion between CodeGen_LLVM and CodeGen_Internal. by @zvookin in https://github.com/halide/Halide/pull/6973
Fix XCode by wrapping weights in an OBJECT library by @alexreinking in https://github.com/halide/Halide/pull/6977
Add test for _Halide_target_export_single_symbol by @steven-johnson in https://github.com/halide/Halide/pull/6983
Fix markdown links by @alexreinking in https://github.com/halide/Halide/pull/6988
Improve error-handling in Python Extensions by @steven-johnson in https://github.com/halide/Halide/pull/6986
Refactor buffer-unpacking code in PythonExtensionGen by @steven-johnson in https://github.com/halide/Halide/pull/6991
Fixes for Xcode "new" build system. by @alexreinking in https://github.com/halide/Halide/pull/6993
Fix compiler warnings in Elf.cpp by @steven-johnson in https://github.com/halide/Halide/pull/6992
[Codegen] Adapt ModuleAddressSanitizerPass/ModuleSanitizerCoveragePass renaming by @MaskRay in https://github.com/halide/Halide/pull/6996
Apply _Halide_place_dll() to _Halide_gengen (#6999) by @steven-johnson in https://github.com/halide/Halide/pull/7000
Log target info in performance_fast_pow (#6997) by @steven-johnson in https://github.com/halide/Halide/pull/6998
Clean up Adams2019 CMake file by @steven-johnson in https://github.com/halide/Halide/pull/7003
Prohibit C99 VLA usage in runtime code by @steven-johnson in https://github.com/halide/Halide/pull/7005
Couple small fixes to update RISC V to current LLVM flags and enable vscale use. by @zvookin in https://github.com/halide/Halide/pull/6995
Fix Python handling of boolean buffers by @steven-johnson in https://github.com/halide/Halide/pull/7006
Fix some bugs in div_round_to_zero by @abadams in https://github.com/halide/Halide/pull/7008
[HVX] Simplify constant factor before distributing by @rootjalex in https://github.com/halide/Halide/pull/7009
Add one-sided widening intrinsics. by @rootjalex in https://github.com/halide/Halide/pull/6967
Rework Python Extension C++ code (again) by @steven-johnson in https://github.com/halide/Halide/pull/7010
Add minimum GitHub token permissions for workflow by @varunsh-coder in https://github.com/halide/Halide/pull/7011
Revert "[HVX] Simplify constant factor before distributing" by @steven-johnson in https://github.com/halide/Halide/pull/7013
Fix SpecificExpr canonicalization by @rootjalex in https://github.com/halide/Halide/pull/7016
Appease Python linter by @steven-johnson in https://github.com/halide/Halide/pull/7022
Don't use -g for EMCC by @steven-johnson in https://github.com/halide/Halide/pull/7025
Temporarily disable testing for apps/fft (#7033) by @steven-johnson in https://github.com/halide/Halide/pull/7035
Add reinterpret simplifications by @rootjalex in https://github.com/halide/Halide/pull/7029
Codegen_C for user_context by @steven-johnson in https://github.com/halide/Halide/pull/7031
Fix Wasm BulkMemory Codegen + Minor fixes to apps/HelloWasm by @steven-johnson in https://github.com/halide/Halide/pull/7026
Add stack-size-canary test to apps/fft's CMake file by @steven-johnson in https://github.com/halide/Halide/pull/7034
Handle widen_right_* intrinsics in bounds inference by @vksnk in https://github.com/halide/Halide/pull/7039
Revert "Temporarily disable testing for apps/fft (#7033)" by @steven-johnson in https://github.com/halide/Halide/pull/7040
Fix PyExt error handling by @steven-johnson in https://github.com/halide/Halide/pull/7042
add_requirement() maintenance by @steven-johnson in https://github.com/halide/Halide/pull/7045
Fix false positive use after free warning. by @zvookin in https://github.com/halide/Halide/pull/7046
Allow call_intrin to call an LLVM intrinsic with void return type. by @zvookin in https://github.com/halide/Halide/pull/7048
Allow CodeGen_LLVM::codegen_buffer_pointer to support vectors. by @zvookin in https://github.com/halide/Halide/pull/7049
Don't mutate GeneratorParams in PythonGenerators by @steven-johnson in https://github.com/halide/Halide/pull/7052
Allow redefinition of Generators when in interactive mode by @steven-johnson in https://github.com/halide/Halide/pull/7053
Upgrade wabt to 1.0.30 by @steven-johnson in https://github.com/halide/Halide/pull/7058
Add support for float16 buffer in python extension by @stevesuzuki-arm in https://github.com/halide/Halide/pull/7060
Add a terminate_handler to try to report unhandled exceptions by @steven-johnson in https://github.com/halide/Halide/pull/7038
Improve MSAN under JIT by @steven-johnson in https://github.com/halide/Halide/pull/7059
Autoscheduler test reorg, part 1 by @steven-johnson in https://github.com/halide/Halide/pull/7064
Autoscheduler test reorg, part 2 by @steven-johnson in https://github.com/halide/Halide/pull/7065
Autoscheduler test reorg, part 3 by @steven-johnson in https://github.com/halide/Halide/pull/7067
pacify clang-tidy by removing unused "using" by @steven-johnson in https://github.com/halide/Halide/pull/7071
Add pip packaging workflow to GHA by @alexreinking in https://github.com/halide/Halide/pull/6938
[HVX] Fix DistributeShiftsAsMuls by @rootjalex in https://github.com/halide/Halide/pull/7083
Support added for dot() instructions in Metal backend
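A number of the entries above touch fixed-point intrinsics (halving_sub, rounding_halving_add, widening_mul, saturating_cast). As a hedged reference for what these names are generally understood to compute, the sketch below models them with Python's unbounded integers standing in for the widened intermediate type. The definitions are assumptions based on the usual meaning of the names (difference or sum halved with floor division, a product held in a type twice as wide, a clamp to the destination range) and are not taken from these release notes.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def halving_sub(a: int, b: int) -> int:
    # (a - b) / 2 with floor rounding, computed without intermediate overflow.
    return (a - b) >> 1


def rounding_halving_add(a: int, b: int) -> int:
    # (a + b + 1) / 2: the average of a and b, with ties rounded up.
    return (a + b + 1) >> 1


def widening_mul_i16(a: int, b: int) -> int:
    # The product of two int16 values always fits in an int32.
    assert INT16_MIN <= a <= INT16_MAX and INT16_MIN <= b <= INT16_MAX
    product = a * b
    assert INT32_MIN <= product <= INT32_MAX
    return product


def saturating_cast(value: int, lo: int, hi: int) -> int:
    # Clamp to the destination type's range instead of wrapping.
    return max(lo, min(hi, value))


if __name__ == "__main__":
    print(halving_sub(127, -128))            # 127
    print(rounding_halving_add(1, 2))        # 2
    print(widening_mul_i16(-32768, -32768))  # 1073741824
    print(saturating_cast(300, 0, 255))      # 255
```

Note that halving_sub(127, -128) stays within the int8 range even though the plain difference (255) would not, which is the point of computing the halved result in a wider type.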

Changes to public API since last release

Add add_halide_python_extension_library() rule by @steven-johnson in https://github.com/halide/Halide/pull/6979
Add add_halide_runtime rule by @steven-johnson in https://github.com/halide/Halide/pull/6985
Remove deprecated Halide::Output type by @steven-johnson in https://github.com/halide/Halide/pull/6685
Remove deprecated build() support from Generators by @steven-johnson in https://github.com/halide/Halide/pull/6684
Remove deprecated versions of Func::prefetch() by @steven-johnson in https://github.com/halide/Halide/pull/6698
Remove deprecated JIT handler setters by @steven-johnson in https://github.com/halide/Halide/pull/6699
Drop support for Matlab extensions by @steven-johnson in https://github.com/halide/Halide/pull/6696
Revise PyStub calling convention for GeneratorParams by @steven-johnson in https://github.com/halide/Halide/pull/6742
Change stub module names in Python to be _pystub rather than _stub by @steven-johnson in https://github.com/halide/Halide/pull/6830

New Deprecations (Upcoming API changes)

Deprecate variadic-template version of Realization ctor by @steven-johnson in https://github.com/halide/Halide/pull/6695
Deprecate GeneratorContext getters with get_ prefix by @steven-johnson in https://github.com/halide/Halide/pull/6753
Deprecate disable_llvm_loop_opt (#4113) by @steven-johnson in https://github.com/halide/Halide/pull/6754
Add Func::type()/types(), deprecate Func::output_type()/output_types() by @steven-johnson in https://github.com/halide/Halide/pull/6772
Deprecate/remove Generator::get_externs_map() and friends by @steven-johnson in https://github.com/halide/Halide/pull/6844
Rework autoscheduler API (#6788) by @steven-johnson in https://github.com/halide/Halide/pull/6838

Other Notes

Although there are commits relating to a Vulkan backend, this release of Halide doesn't provide Vulkan support (it's still a work in progress)
It's possible that the changes in https://github.com/halide/Halide/pull/6754 can cause performance degradation (but usually only for poorly scheduled Halide code).

New Contributors

@frengels made their first contribution in https://github.com/halide/Halide/pull/6582
@varunsh-coder made their first contribution in https://github.com/halide/Halide/pull/7011

v14.0.0

2 years ago

What's Changed

Major changes

@abadams
- Add ability to pass a user context in JIT mode (#6313)
- Reenable warning about unscheduled update definitions (#6602)
@alexreinking
- Add helper for cross-compiling Halide generators. (#6366)
@LebedevRI
- Implement SanitizerCoverage support (Refs. #6513) (#6517)
@steven-johnson
- Expand optional static-typing for Buffer to include dimensionality (#6574)
- Deprecate the Generator::build() method (#6580)
- Move GeneratorContext into a standalone class (#6618)
- Python Bindings didn't allow for zero-D Funcs, ImageParams, Buffers (#6633)
@zvookin
- Timer based profiler (#6642)

Minor changes

@abadams
- Deprecate JIT runtime override methods that take void * (#6344)
- Allow users to use their own cuda contexts and streams in JIT mode (#6345)
- Add --help flag to rungenmain, fixing #5323 (#6354)
- Do target-specific lowering of lerp (#6432)
- Reduce overhead of sampling profiler by having only one thread do it (#6433)
- Skip custom cuda context test on older GPUs (#6437)
- Avoid needless gather in fast_integer_divide lowering (#6441)
- Fixes for c++20 (#6446)
- Add a fast integer divide that rounds to zero (#6455)
- Let lerp lowering incorporate a final cast. (#6480)
- Try removing optional buffer added to closure (#6481)
- rounding shift rights should use rounding halving add (#6494)
- Make random faster by putting the innermost var last (#6504)
- Make it possible to interpret a wide type as multiple smaller elements (#6506)
- Handle mixed-width args to mul-shift-right (#6526)
- Attempted redo of faster noise (#6539)
- Better default lowering of absd (#6545)
- Make HALIDE_REGISTER_GENERATOR work with multiple template args (#6556)
- Rename Output to OutputFileType and deprecate Output (#6568)
- Remove incorrect not-multiple-of-16 claim (#6573)
- Fix bug in mul_shift_right matching (#6610)
@alexreinking
- Add super-build for cross-compiling HANNK (#6374)
- Fix empty INSTALL_COMMAND in hannk super-build (#6387)
- Remove halide_config.cmake from Makefile build. Fixes #6615 (#6616)
- Make IRComparer consider nans to be less than non-nans. (#6626)
@ashishUthama
- Include LICENSE.txt in package (#6428)
@dsharletg
- Fix description of rounding_shift_left/rounding_shift_right (#6549)
@Elarnon
- Only commutative reductions can be parallelized (#6609)
@jinderek
- Support new warp shuffle intrinsics after CUDA Volta architecture (#6505)
@knzivid
- python_bindings: Fix SIGSEGV in HalidePythonCompileTimeErrorReporter (#6635)
@LebedevRI
- [CMake] Deduplicate Halide_LLVM_VERSION and LLVM_PACKAGE_VERSION (#6646)
@masahi
- [APP] Fix hexagon_benchmarks build (use two-var prefetch) (#6563)
@mcleary
- Add support for AMX instructions (#5818)
@mcourteaux
- Include GPU source kernels in Stmt and StmtHtml file. (#6444)
- Syntax highlighting for embedded PTX code. (#6447)
@mgharbi
- Fixes the Pytorch Wrapper Codegen for CPU-only machines. (#6590)
@OmarEmaraDev
- Fix default device wrap native function (#6310)
- Fix wrong type in Ramp CodeGen for OpenGLCompute (#6349)
- Vectorize Ramp in OpenGLCompute backend (#6372)
- Support vectorization in OpenGLCompute backend (#6348)
- Support vectorized Select in OpenGLCompute backend (#6371)
@rootjalex
- Make bounds of let visitor use unique_name() (#6583)
- Remove incorrect docs on widening_add (#6625)
- Disallow Type::narrow() and Type::widen() from producing bitwidths between 1 and 8 bits (#6622)
- Wild match object should not be foldable (#6623)
- Clear bounds info on casts when value bounds are undefined for overflow types (#6640)
@slomp
- decommissioning StackPrinter (#6470)
@steven-johnson
- [hannk] Fix MeanOp (#6336)
- Add using OpVisitor::visit; to various OpVisitors to avoid overload warnings for some compilers (#6337)
- [hannk] Add a prepare() method for ops and interp (#6338)
- Fix WASM datalayout for top-of-tree LLVM (#6339)
- Make halide_type_t and halide_type_of constexpr (#6340)
- Harvest IWYU changes for LLVM, WABT (#6341)
- Fix HelloWasm (#6342)
- Fix Makefile for LLVM11 (injection from #5818) (#6343)
- [hannk] requantize() should never skip the operation (#6350)
- [hannk] augment SoftmaxOp to allow specifying axis (#6351)
- Use Node instead of d8 for Wasm AOT testing (#6356)
- [hannk] Add missing call to Interpreter::prepare in benchmark app (#6358)
- [hannk] Allow disabling TFLite+Delegate build in CMake (#6360)
- [hannk] Add support for building/running for wasm (#6361)
- Update Emscripten settings (#6362)
- [hannk] Clean up aliasing (v2) (#6364)
- [hannk] tests should only process .tflite files (#6368)
- Revamp Hannk IR (#6379)
- Fix for top-of-tree LLVM (#6380)
- Remove halide_assert() from halide_default_device_wrap_native (#6381)
- Rename halide_assert -> halide_abort_if_false (#6382)
- Convert various halide_assert -> static_assert (#6383)
- Fix for top-of-tree LLVM (#6386)
- Check results of all runtime function calls (#6389)
- Add halide_debug_assert() macro (#6390)
- [hannk] Have CMake emit .s, .stmt, .ll files (#6392)
- [hannk] Upgrade hannk to use TFLite 2.7.0 by default (#6393)
- Clean up CodeGen_LLVM names to match ASAN nomenclature changes (#6395)
- Drop support for LLVM11 (#6396)
- Move PyTorch test into standalone tests (#6397)
- Remove halide_abort_if_false() usage in runtime/metal (#6398)
- Fix OGLC debug builds (#6399)
- Add defensive checks to halide_buffer_copy_already_locked (#6401)
- _halide_buffer_crop() needs to check for runtime failures (v2) (#6403)
- Fix broken ASAN code (#6408)
- [hannk] Pacify clang-tidy (#6412)
- One more ASAN fix (#6413)
- [hannk] Fix lower_tflite_fullyconnected (#6414)
- Fix Introspection issues (#6424)
- Don't remap the function name or the target in the metadata (#6430)
- Set up SANITIZER_FLAGS and OPTIMIZE for apps/Makefile.inc (#6435)
- Ensure that halide_start_clock() is called before halide_current_time… (#6438)
- Codegen_C: buffer compilation needs to special-case scalar buffers (#6442)
- Add operator<< for Closure (#6443)
- Re-enable performance_async_gpu for D3D12Compute (#6450)
- Tweak Hexagon codegen output (#6461)
- Add LinkageType::ExternalPlusArgv (#6452) (#6463)
- Fix Closure API (#6464)
- Move null check from Printer to halide_string_to_string() (#6467)
- Deal with Printer::scratch (#6469) (#6472)
- Restore support for using V8 as the Wasm JIT interpreter (#6478)
- Fail if no_bounds_query specified for HL_JIT_TARGET (#6489)
- Document the usage of llvm::legacy::PassManager (#6491)
- Update WABT to 1.0.25 (#6497)
- Grab Bag of minor cleanups to LowerParallelTasks (#6498)
- Update simd_op_check for arm64 uzp1 code generation (#6499) (#6500)
- Fix size_t -> int conversion warning (#6501)
- Fix simd-op-check for top-of-tree LLVM (#6529)
- Revert "Make random faster by putting the innermost var last" (#6538)
- Fix GeneratorOutput_Buffer::set_estimates() (#6540)
- Revert "Make it possible to interpret a wide type as multiple smaller elements" (#6541)
- Convert apps/hannk/Elementwise to use generate() (#6543)
- Fixes for top-of-tree LLVM (#6546) (#6548)
- Fix deprecation warnings in Python tutorials (#6552)
- Use add_halide_generator() everywhere in apps/ (#6554)
- Fix for top-of-tree LLVM (#6561)
- Enable simd_op_check test for wasm i8x16.popcnt (#6562)
- Revert "Fix for top-of-tree LLVM" (#6564)
- wasm simd cleanup (#6566)
- Add support for wasm-simd ops for integer-integer widening (#6567)
- Add explicit to a handful of Generator-related ctors. (#6569)
- Fix typo in comment in HalideBuffer.h (#6570)
- Allow calling scheduling methods on Output<Buffer[]> (#6577)
- Fix for top-of-tree LLVM (#6579)
- Fix Win32-specific breakage in top-of-tree LLVM (#6581)
- Convert apps/ to use static Buffer dims where useful (#6585)
- Various fixes to static-dimensioned Buffer (#6589)
- Convert Buffer<> usage in python_bindings/ to use static dimensions (#6591)
- Convert Buffer<> usage in test/generators to use static dimensions (#6592)
- Rename BufferDimsUnconstrained -> AnyDims (#6594)
- Allow building with LLVM15 (#6603)
- Update WasmExecutor for WABT API changes (#6612)
- Minor Generator cleanup (#6613)
- Unbreak WABT again by using main instead of a commit (#6614)
- Update apps/hannk to use TFLite 2.8.0 (#6617)
- Update WABT version to the just-released 1.0.27 (instead of main) (#6619)
- Clean up python_binding Makefile (#6634)
- Fix const-correctness in C/C++ backend (Issue #6636) (#6638)
- Convert most remaining Generators to prefer statically-dimensioned In… (#6641)
- Allow profiler feature under wasm iff wasm_threads is enabled (#6643)
- Fix UB in hannk FillWithRandom operation. (#6645)
- Update initialization of WABT store field to work with top-of-tree (#6649)
- Fix apparent typo in PR #6294 (#6653)
- Eliminate some unnecessary clamping in ClampUnsafeAccesses (#6297) (#6654)
- Python Bindings: fix Python bool -> Expr implicit conversion (#6657)
- Fix 'variable set but not used' warning/error (#6658)
- Allow make test_apps to work with ASAN (#6659)
- Add optional runtime H::R::Buffer access checks (#6660)
- Add ldscript code for Python extensions in CMake (#6665)
- Remove the nobuild/partialbuildmethod tests from python_bindings/ (#6668)
@TH3CHARLie
- Add support for CUDA capability 8.6 (#6334)
- Fix cuda-debug logging (#6346)
@vksnk
- Scheduling directive to set an explicit storage bound (#6327)
- Add include for size_t in constants.h (#6353)
- Add missing widening_absd patterns (#6359)
- Change implementation of round_f* in CodeGen_C to use nearbyint() to match CodeGen_LLVM (#6406)
- Rewrite integer lerp using intrinsics (#6426)
- Avoid double narrowing in widening_add/widening_sub if type is 8-bit (#6629)
@zvookin
- Move parallel/async lowering from LLVM codegen to a standard Halide IR lowering pass. (#6195)
- Fixes to support LLVM with opaque pointers. (#6608)

New Contributors

@TH3CHARLie made their first contribution in (#6334)
@OmarEmaraDev made their first contribution in (#6310)
@mcourteaux made their first contribution in (#6444)
@jinderek made their first contribution in (#6505)
@masahi made their first contribution in (#6563)
@knzivid made their first contribution in (#6635)

v13.0.4

2 years ago

This is a patch release that fixes a single bug relating to multiple outputs that depend on each other (#6375).

v13.0.3

2 years ago

This is a patch release with some added build system capabilities and a handful of backported stability improvements. Please see the PR list below for more details.

What's Changed

Build system
- The Mullapudi 2016 autoscheduler no longer assert-rejects unsupported targets. #6520
- Fixed invalid headers in the linear algebra app on RISC-V. #6503
- Fixed CMake export bug when custom-built LLVM has multiple include directories. #6519
- Python artifacts, when built, are now installed in the Halide_Python CPack component. Targets are not (yet) exported. #6530 #6523
- Added SOVERSION override for libHalide to support advanced package maintenance workflows. #6534
Stability improvements
- Fixed a missing clamp on inputs which might be read out of bounds via undefined overcompute values. #6352 #6508
- Fixed an internal use-after-free bug. #6527
- Fixed a free-order bug in Halide::Runtime that affected CUDA targets. #6511
- Python bindings now correctly acquire/release the GIL. #6525 #6537
Other changes
- Functions are tagged with the LLVM MSAN attribute when the MSAN feature is enabled. #6516
- CMake documentation has been updated. #6535

Full Changelog: https://github.com/halide/Halide/compare/v13.0.2...v13.0.3

v13.0.2

2 years ago

This is a patch release to support official Debian packaging. No changes have been made to the compiler library or runtime.

Apps

The linear algebra app now correctly checks for the availability of SSE/AVX headers. #6471

v13.0.1

2 years ago

This is a hotfix for v13.0.0.

Bugs fixed:

Fix obscure bug in widening let substitution. #6405
x86_cpuid_halide must preserve all 64 bits of rbx/rsi. #6409