a language for fast, portable data-parallel computation
Full Changelog: https://github.com/halide/Halide/compare/v17.0.0...v17.0.1
ParamMap has been removed entirely from the public API. All users of ParamMap should migrate to Callable instead.
Halide::Parameter has been moved to the public Halide API (it was formerly "internal" and not intended for public use).
Func::partition() and friends: Set the loop partition policy, which controls how/whether a loop is split into three loops (prologue/steady-state/epilogue). Loop partitioning can be useful to optimize boundary conditions (e.g. clamp_edge).
Func::hoist_storage() and friends: Allows a function's storage to be moved to a given loop level. Unlike Func::store_at(), no optimizations are triggered (e.g. sliding window).
TailStrategy options for existing scheduling directives:
ShiftInwardsAndBlend: Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory location that would be stored to, modifying only the elements not contained within the overlap, and then storing the blended result. Unlike ShiftInwards, this is valid to use in update definitions.
RoundUpAndBlend: Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memory location that would be stored to, modifying only the elements within the region being computed, and then storing the blended result. Unlike RoundUp, this is valid to use on non-outermost splits in update definitions.
copy_to_host and copy_to_device so you can measure host<->device copy overhead
HL_PROFILER_SORT env var
Anderson2021 autoscheduler.
.async() scheduling directive.
Target now does some reality-checking that it doesn't contain obviously nonsensical Feature combinations
cast<i32>(u32) overflow behavior by @rootjalex in https://github.com/halide/Halide/pull/7769
auto-schedule label in CMake by @steven-johnson in https://github.com/halide/Halide/pull/7818
llvm::Type::getInt8PtrTy usage. by @hokein in https://github.com/halide/Halide/pull/7937
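The Func::partition() entry above describes splitting a loop into prologue, steady-state, and epilogue. As a rough illustration of that idea only, here is a minimal plain C++ sketch (no Halide involved) of a 3-tap sum with clamped edges. The function names blur_unpartitioned and blur_partitioned are hypothetical, and this is not the code Halide generates; it only shows why the steady-state loop can drop the boundary handling.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Unpartitioned: every iteration pays for the clamp, even in the interior.
std::vector<int> blur_unpartitioned(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    for (int x = 0; x < n; x++) {
        const int l = in[std::max(0, std::min(x - 1, n - 1))];
        const int r = in[std::max(0, std::min(x + 1, n - 1))];
        out[x] = l + in[x] + r;
    }
    return out;
}

// Partitioned: only the prologue and epilogue clamp; the steady-state
// loop reads its neighbours directly because they are known to be in range.
std::vector<int> blur_partitioned(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    if (n == 0) return out;
    auto clamped = [&](int x) { return in[std::max(0, std::min(x, n - 1))]; };

    const int steady_begin = std::min(1, n);
    const int steady_end = std::max(steady_begin, n - 1);

    // Prologue: x - 1 may fall off the left edge.
    for (int x = 0; x < steady_begin; x++) {
        out[x] = clamped(x - 1) + in[x] + clamped(x + 1);
    }
    // Steady state: no boundary condition needed.
    for (int x = steady_begin; x < steady_end; x++) {
        out[x] = in[x - 1] + in[x] + in[x + 1];
    }
    // Epilogue: x + 1 may fall off the right edge.
    for (int x = steady_end; x < n; x++) {
        out[x] = clamped(x - 1) + in[x] + clamped(x + 1);
    }
    return out;
}
```

Both functions compute the same result for any input; the partitioned form simply moves the clamping cost out of the loop that runs for most of the iterations, which is the kind of trade-off a partition policy controls.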
Full Changelog: https://github.com/halide/Halide/compare/v16.0.0...v17.0.0
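The ShiftInwardsAndBlend and RoundUpAndBlend descriptions in the v17.0.0 notes can be easier to follow with a concrete model. The following is a plain C++ sketch (no Halide involved) of an update of the form "f(x) += 1" processed in blocks of a fixed size. The constant kFactor and all function names are hypothetical, and the loops only mimic the load/modify/store behaviour described in the notes; they are not Halide's generated code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int kFactor = 4;  // hypothetical split factor, e.g. a vector width

// ShiftInwards-style tail: the last block is moved back so it ends at n.
// Elements in the overlap are updated twice, which is wrong for "+= 1".
void update_shift_inwards(std::vector<int> &f) {
    const int n = static_cast<int>(f.size());
    assert(n >= kFactor);
    for (int base = 0; base < n; base += kFactor) {
        const int start = std::min(base, n - kFactor);
        for (int i = 0; i < kFactor; i++) f[start + i] += 1;
    }
}

// ShiftInwardsAndBlend-style tail: load the whole block, modify only the
// elements outside the overlap, then store the blended block back.
void update_shift_inwards_and_blend(std::vector<int> &f) {
    const int n = static_cast<int>(f.size());
    assert(n >= kFactor);
    for (int base = 0; base < n; base += kFactor) {
        const int start = std::min(base, n - kFactor);
        int block[kFactor];
        for (int i = 0; i < kFactor; i++) block[i] = f[start + i];  // load
        for (int i = 0; i < kFactor; i++) {
            if (start + i >= base) block[i] += 1;  // skip the overlap
        }
        for (int i = 0; i < kFactor; i++) f[start + i] = block[i];  // store
    }
}

// RoundUpAndBlend-style tail: blocks run past `extent`, so the buffer must be
// padded up to a multiple of kFactor. Elements beyond `extent` are loaded and
// stored back unchanged.
void update_round_up_and_blend(std::vector<int> &f, int extent) {
    const int rounded = (extent + kFactor - 1) / kFactor * kFactor;
    assert(static_cast<int>(f.size()) >= rounded);
    for (int base = 0; base < rounded; base += kFactor) {
        int block[kFactor];
        for (int i = 0; i < kFactor; i++) block[i] = f[base + i];  // load
        for (int i = 0; i < kFactor; i++) {
            if (base + i < extent) block[i] += 1;  // only the computed region
        }
        for (int i = 0; i < kFactor; i++) f[base + i] = block[i];  // store
    }
}
```

With 10 elements and a block size of 4, the plain shift-inwards loop increments elements 6 and 7 twice, while the blended variant increments every element exactly once. In the round-up variant, padding values past the extent come back out unchanged.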
Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU. Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley. Proceedings of the ACM on Programming Languages (OOPSLA 2021).
OpenGLCompute has been deprecated
ParamMap has been deprecated
HVX_shared_object feature has been removed
halide_target_feature_disable_llvm_loop_opt has been removed
MIPS device support has been removed
bfloat support to halide_type_to_string() by @steven-johnson in https://github.com/halide/Halide/pull/7154
HVX_shared_object feature by @steven-johnson in https://github.com/halide/Halide/pull/7331
Full Changelog: https://github.com/halide/Halide/compare/v15.0.1...v16.0.0
compile_to_callable() was not properly copying from device to host for output buffers, so output was typically black (or garbage) when used with a GPU target. (#7213)
bin directory was missing from the installs.
Support for RISC-V Vector architectures.
Python-related:
pip install halide
Halide::Func now allows you to (optionally) constrain the type(s) of Exprs that the Func can contain, and/or the dimensionality of the Func.
Added a new way to use the JIT (compile_to_callable) that allows calling a jitted function with the same syntax as for AOT-compiled functions, allowing more control over JIT lifespan, as well as thread-safe arguments without requiring ParamMap.
General improvements to SIMD codegen
Several rarely-used parts of the C++ Generator API were deprecated, and the way that autoschedulers are specified for AOT compilation is now completely different (but better for future expandability).
CMake builds now require >= v3.22
WABT usage requires >= v1.0.30
LLVM 12 is no longer supported
The target flag disable_llvm_loop_opt is deprecated, as it's now the default behavior. This means that we have turned off LLVM's autovectorization and loop unrolling. This should not affect any schedules with manually-specified vectorization and unrolling, other than trimming code size a little. However, schedules that do not vectorize or unroll may slow down because they were (intentionally or not) relying on LLVM to do it automatically. If you see a performance regression with Halide 15, try turning on the enable_llvm_loop_opt target flag.
bool buffers (https://github.com/halide/Halide/pull/7006)
float16 buffers (https://github.com/halide/Halide/pull/7060)
div_round_to_zero and fast_integer_divide_round_to_zero (https://github.com/halide/Halide/pull/7008)
add_requirement() (https://github.com/halide/Halide/pull/7045)
-mtune=/-mcpu= support for x86 AMD CPU's by @LebedevRI in https://github.com/halide/Halide/pull/6655
to tune processor` by @LebedevRI in https://github.com/halide/Halide/pull/6673
-mtune=native CPU autodetection for AMD Zen 3 CPU by @LebedevRI in https://github.com/halide/Halide/pull/6648
const types by @steven-johnson in https://github.com/halide/Halide/pull/6679
break to avoid 'possible unintentional fallthru' warning by @steven-johnson in https://github.com/halide/Halide/pull/6694
get_amd_processor(): implement detection for the rest of supported AMD CPU's by @LebedevRI in https://github.com/halide/Halide/pull/6711
rounding_halving_sub and non-existent arm rhsub instructions by @rootjalex in https://github.com/halide/Halide/pull/6723
widening_mul(int16x, int16x) -> int32x for x86 (AVX2 and SSE2) by @rootjalex in https://github.com/halide/Halide/pull/6677
HalideError base class to Python bindings by @steven-johnson in https://github.com/halide/Halide/pull/6750
Generator::init_from_context() for debug purposes by @steven-johnson in https://github.com/halide/Halide/pull/6760
-fvisibility=hidden by @steven-johnson in https://github.com/halide/Halide/pull/6799
visit(const Reinterpret *op) by @LebedevRI in https://github.com/halide/Halide/pull/6865
Call::undef, just like Call::signed_integer_overflow by @LebedevRI in https://github.com/halide/Halide/pull/6871
auto_schedule label to Adams2019 and Li2018 tests in CMake by @steven-johnson in https://github.com/halide/Halide/pull/6898
nounwind/mustprogress attributes by @LebedevRI in https://github.com/halide/Halide/pull/6897
-g for EMCC by @steven-johnson in https://github.com/halide/Halide/pull/7025
add_halide_python_extension_library() rule by @steven-johnson in https://github.com/halide/Halide/pull/6979
add_halide_runtime rule by @steven-johnson in https://github.com/halide/Halide/pull/6985
Halide::Output type by @steven-johnson in https://github.com/halide/Halide/pull/6685
build() support from Generators by @steven-johnson in https://github.com/halide/Halide/pull/6684
Func::prefetch() by @steven-johnson in https://github.com/halide/Halide/pull/6698
_pystub rather than _stub by @steven-johnson in https://github.com/halide/Halide/pull/6830
get_ prefix by @steven-johnson in https://github.com/halide/Halide/pull/6753
Halide_LLVM_VERSION and LLVM_PACKAGE_VERSION (#6646)
hexagon_benchmarks build (use two-var prefetch) (#6563)
Type::narrow() and Type::widen() from producing bitwidths between 1 and 8 bits (#6622)
using OpVisitor::visit; to various OpVisitors to avoid overload warnings for some compilers (#6337)
explicit to a handful of Generator-related ctors. (#6569)
store field to work with top-of-tree (#6649)
bool -> Expr implicit conversion (#6657)
make test_apps to work with ASAN (#6659)
This is a patch release that fixes a single bug relating to multiple outputs that depend on each other (#6375).
This is a patch release with some added build system capabilities and a handful of backported stability improvements. Please see the PR list below for more details.
Halide_Python CPack component. Targets are not (yet) exported. #6530 #6523
Halide::Runtime that affected CUDA targets. #6511
Full Changelog: https://github.com/halide/Halide/compare/v13.0.2...v13.0.3
This is a hotfix for v13.0.0.
Bugs fixed: