Taichi Versions

Productive, portable, and performant GPU programming in Python.

v1.7.1

2 weeks ago

Highlights:

Bug fixes
- Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
Documentation
- Update offset.md (#8470) (by Kenshi Takayama)
- Update math_module.md (#8471) (by Kenshi Takayama)
- Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
Miscellaneous
- Bump version to 1.7.1 (by Haidong Lan)
- Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)

Full changelog:

[Misc] Bump version to 1.7.1 (by Haidong Lan)
[bug] Fix abs on unsigned types (#8476) (by Lin Jiang)
[Doc] Update offset.md (#8470) (by Kenshi Takayama)
[Doc] Update math_module.md (#8471) (by Kenshi Takayama)
[Doc] Update accelerate_pytorch.md | Fix typo in recap: Eeasy -> Easy (#8475) (by Aryan Garg)
[Misc] Bump taichi version to v1.8.0 (#8458) (by Zhanlue Yang)
[lang] Warn about non-contiguous gradient tensors (#8450) (by Bob Cao)
[autodiff] Fix the type of cmp statements in autodiff (#8452) (by Lin Jiang)
[Bug] Fix CFG aliasing error with matrix of matrix (#8445) (by Zhanlue Yang)
[misc] Add flag to disable taichi header print (#8413) (by Chaoming Wang)

v1.7.0

5 months ago

1. New features

1.1 Real Function

We are excited to announce the stabilization of the Real Function feature in Taichi Lang v1.7.0. Initially introduced as an experimental feature in v1.0.0, it has now matured with enhanced capabilities and usability.

Key Updates

Decorator Change: The Real Function now uses @ti.real_func. The previous decorator, @ti.experimental.real_func, is deprecated.
Performance Improvements: Real Functions, unlike Taichi inline functions (@ti.func), are compiled as separate entities, akin to CUDA's device functions. This separation allows for recursive runtime calls and significantly faster compilation. For instance, the Cornell box example's compilation time is reduced from 2.34s to 1.01s on an i9-11900K when switching from inline to real functions.
Enhanced Functionality: Real Functions support multiple return statements, offering greater flexibility in coding.

Limitations

Backend Support: Real Functions are currently only compatible with LLVM-based backends, including CPU and CUDA.
Parallel Loops: Writing parallel loops within Real Functions is not supported. However, if called within a parallel loop in a kernel, the Real Function will be parallelized accordingly.

Important Note on Usage: Ensure all arguments and return values in Real Functions are explicitly type-hinted.

Usage Example

The following example demonstrates the recursive capability of Real Functions. The sum_func Real Function is used to calculate the sum of numbers from 1 to n, showcasing its ability to handle multiple return statements and variable recursion depths.

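import taichi as ti

# Real Functions require an LLVM-based backend (CPU or CUDA)
ti.init(arch=ti.cpu)

@ti.real_func
def sum_func(n: ti.i32) -> ti.i32:
    # Base case and recursive case use separate return statements
    if n == 0:
        return 0
    return sum_func(n - 1) + n

@ti.kernel
def sum(n: ti.i32) -> ti.i32:
    return sum_func(n)

print(sum(100))  # 5050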

You can find more examples of the real function in the repository.

1.2 Enhancements in Kernel Arguments and Return Values

Support for Multiple Return Values in Taichi Kernel:

In this update, we've introduced the capability to return multiple values from a Taichi kernel. This can be achieved by specifying a tuple as the return type. You can use (ti.f32, s0) directly as the type hint, or write the type hint in the Python manner, such as typing.Tuple[ti.f32, s0] or, for Python 3.9 and above, tuple[ti.f32, s0]. The following example illustrates this new feature:

import taichi as ti

# Real Functions require an LLVM-based backend (CPU or CUDA)
ti.init(arch=ti.cpu)

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)

@ti.real_func
def foo() -> (ti.f32, s0):
    return 1, s0(a=ti.math.vec3([100, 0.5, 3]), b=1)

@ti.kernel
def bar() -> (ti.f32, s0):
    return foo()

ret1, ret2 = bar()
print(ret1)  # 1.0
print(ret2)  # {'a': [100.0, 0.5, 3.0], 'b': 1}

Removal of Size Limit on Kernel Arguments and Return Values:

We have eliminated the size restrictions on kernel arguments and return values. However, it's crucial to remember that keeping these small is advisable. Large argument or return value sizes can lead to substantially longer compile times. While we support larger sizes, we haven't thoroughly tested arguments and return values exceeding 4KB and cannot guarantee their flawless functionality.

1.3 Argument Pack

Taichi now introduces Argument Packs (argpacks). An argpack caches parameters that stay unchanged between multiple kernel calls, which makes kernels more convenient to launch and also improves performance.

Key Advantages

Argument Pack: User-defined data types that encapsulate multiple parameters into a single, manageable unit.
Buffering Capability: Store and reuse parameters that remain constant across kernel calls, reducing the overhead of repeated parameter passing.
Device-level Caching: Taichi optimizes performance by caching argpacks directly on the device.

Usage Example

import taichi as ti
ti.init()

# Defining a custom argument type using "ti.types.argpack"
view_params_tmpl = ti.types.argpack(view_mtx=ti.math.mat4, proj_mtx=ti.math.mat4, far=ti.f32)

# Declaration of a Taichi kernel leveraging Argument Packs
@ti.kernel
def p(view_params: view_params_tmpl) -> ti.f32:
    return view_params.far

# Instantiation of the argument pack
view_params = view_params_tmpl(
    view_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    proj_mtx=ti.math.mat4(
        [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]),
    far=1)

# Executing the kernel with the Argument Pack
print(p(view_params))  # Outputs: 1.0

Supported Data Types

Argument Packs are currently compatible with a variety of data types, including scalar, matrix, vector, Ndarray, and Struct.

Limitations

Please note that Argument Packs currently do not support the following features and data types:

Ahead-of-Time (AOT) Compilation and Compute Graph
ti.template
ti.data_oriented

2. Improvements

2.1 CUDA Memory Allocation Improvements

Dynamic VRAM Allocation:

In our latest update, the CUDA backend has been optimized to dynamically allocate Video RAM (VRAM), significantly reducing the initial preallocation requirement. Now, less than 50MB is preallocated upon ti.init.

Changes in `device_memory_GB` and `device_memory_fraction` Usage:

These settings now apply only to preallocating memory for sparse data structures, such as ti.pointer. This preallocation occurs only once a sparse data structure is detected in your code.

Impact on VRAM Consumption:

Users can expect a noticeable decrease in VRAM usage with these enhancements. For instance:

diffmpm3d: 3866 MB --> 3190 MB
nerf_train_deploy: 5618 MB --> 4664 MB

2.2 CUDA SIMT APIs

Added the following ti.simt.block APIs:

ti.simt.block.sync_any_nonzero
ti.simt.block.sync_all_nonzero
ti.simt.block.sync_count_nonzero

2.3 Sparse grid APIs

Added helper function to create a 2D/3D sparse grid, for example:

    # create a 2D sparse grid
    grid = ti.sparse.grid(
        {
            "pos": ti.math.vec2,
            "mass": ti.f32,
            "grid2particles": ti.types.vector(20, ti.i32),
        },
        shape=(10, 10),
    )

    # access
    grid[0, 0].pos = ti.math.vec2(1, 2)
    grid[0, 0].mass = 1.0
    grid[0, 0].grid2particles[2] = 123

2.4 GGUI

Added Metal backend support for GGUI

2.5 AOT

Added C-APIs of ti_import_cpu_memory() and ti_import_cuda_memory()
Added support for multiple AOT runtime devices
Added support for matrix/vector in compute graph in C-API
Added support for matrix/vector in compute graph in Python

2.6 Error reporting

Improved the quality and coverage of error messages

2.7 Autodiff

Added support for passing vector/matrix arguments in autodiff kernels
Added support for autodiff with torch Tensor and Taichi ndarray on CPU and CUDA
Added support for passing grad tensors to primal kernels

3. Bug Fixes

3.1 Autodiff Bugfixes

Fixed a few bugs with use of ti.ad.Tape
Fixed a bug with random seed for loss

3.2 AOT Bugfixes

Fixed a few bugs with compute graph
Fixed a few bugs with C-API

3.3 API Bugfixes

Fixed a bunch of bugs related to Matrix/Vector
Fixed an error with Ndarray type check
Fixed a few errors with taichi.math APIs
Fixed an error with SNode destruction
Fixed an error with dataclass support for struct with matrix
Fixed an error with ti.func
Fixed a few errors with ti.struct and struct field
Fixed a few errors with Sparse Matrix

3.4 Build & Environment Bugfixes

Fixed a few compilation issues on Windows platform
Fixed an issue with cusolver dependency

3.5 GGUI Bugfixes

Fixed vec_to_euler, which broke GGUI cameras, and improved the handling of camera logic
Fixed the ImGui widget size on HiDPI

4. Deprecation Notice

We have removed the CC backend because it is rarely used, and it lacks maintenance.
We are deprecating ti.experimental.real_func because it is no longer experimental. Please use ti.real_func instead.

5. Full changelog

Highlights:
   - **Bug fixes**
      - Fix macro error with ti_import_cpu_memory (#8401) (by **Zhanlue Yang**)
      - Fix argpack nesting issues (by **listerily**)
      - Convert matrices to structs in argpack type members, Fixing layout error (by **listerily**)
      - Fix error when returning a struct field member when the return … (#8271) (by **秋云未云**)
      - Fix Erroneous handling of ndarray in real function in CFG (#8245) (by **Lin Jiang**)
      - Fix issue with passing python-scope Matrix as ti.func argument (#8197) (by **Zhanlue Yang**)
      - Fix incorrect CFG Graph structure due to missing Block wiith OffloadedStmts on LLVM backend (#8113) (by **Zhanlue Yang**)
      - Fix type inference error with LowerMatrixPtr pass (#8105) (by **Zhanlue Yang**)
      - Set initial value for Cuda device allocation (#8063) (by **Zhanlue Yang**)
      - Fix the insertion position of the access chain (#7957) (by **Lin Jiang**)
      - Fix wrong datatype size when writing to ndarray from Python scope (by **Ailing Zhang**)
   - **CUDA backend**
      - Warn driver version if it doesn't support memory pool. (#7912) (by **Haidong Lan**)
   - **Documentation**
      - Fixing typo in impl.py on ti.grouped function documentation (#8407) (by **Quentin Warnant**)
      - Update doc about kernels and functions (#8400) (by **Lin Jiang**)
      - Update documentation (#8089) (by **Zhao Liang**)
      - Update docstring for inverse func (#8170) (by **Zhao Liang**)
      - Update type.md, add descriptions of the vector (#8048) (by **Chenzhan Shang**)
      - Fix a bug in faq.md (#7992) (by **Zhao Liang**)
      - Fix problems in type_system.md (#7949) (by **秋云未云**)
      - Add doc about struct arguments (#7959) (by **Lin Jiang**)
      - Fix docstring of mix function (#7922) (by **Zhao Liang**)
      - Update faq and ggui, and add them to CI (#7861) (by **Zhao Liang**)
      - Add kernel sync doc (#7831) (by **Zhao Liang**)
   - **Error messages**
      - Warn before calling the external function (#8177) (by **Lin Jiang**)
      - Add option to print full traceback in Python (#8160) (by **Lin Jiang**)
      - Let to_primitive_type throw an error if the type is a pointer (by **lin-hitonami**)
      - Update deprecation warning of the graph arguments (#7965) (by **Lin Jiang**)
   - **Language and syntax**
      - Add clz instruction (#8276) (by **Jett Chen**)
      - Move real function out of the experimental module (#8399) (by **Lin Jiang**)
      - Fix error with loop unique analysis for MatrixPtrStmt (#8307) (by **Zhanlue Yang**)
      - Pass DebugInfo from Python to C++ for ndarray and field (#8286) (by **魔法少女赵志辉**)
      - Support TensorType for SharedArray (#8258) (by **Zhanlue Yang**)
      - Use ErrorEmitter in type check passes (#8285) (by **魔法少女赵志辉**)
      - Implement struct DebugInfo and ErrorEmitter (#8284) (by **魔法少女赵志辉**)
      - Add TensorType support for Constant Folding (#8250) (by **Zhanlue Yang**)
      - Support TensorType for irpass::alg_simp() (#8225) (by **Zhanlue Yang**)
      - Support vector/matrix ndarray arguments in real function (by **Lin Jiang**)
      - Fix error on ndarray type check (by **Lin Jiang**)
      - Support real function in data-oriented classes (by **lin-hitonami**)
      - Let kernel support return type annotated with 'typing.Tuple' (by **lin-hitonami**)
      - Support tuple return value for kernel and real function (by **lin-hitonami**)
      - Let static assert be in static scope (#8217) (by **Lin Jiang**)
      - Avoid scalarization for AOS GlobalPtrStmt (#8187) (by **Zhanlue Yang**)
      - Support matrix return value for real function (by **lin-hitonami**)
      - Support ndarray argument for real function (by **lin-hitonami**)
      - Cast the scalar arguments and return values of ti.func if the type hints exist (#8193) (by **Lin Jiang**)
      - Handle MatrixPtrStmt for uniquely_accessed_pointers() (#8165) (by **Zhanlue Yang**)
      - Support struct arguments for real function (by **lin-hitonami**)
      - Merge irpass::half2_vectorize() with irpass::scalarize() (#8102) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after optimize_bit_struct_stores & determine_ad_stack_size (#8097) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_operations() (#8096) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::lower_access() (#8091) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_block_local() (#8090) (by **Zhanlue Yang**)
      - Support TensorType for Dead-Store-Elimination (#8065) (by **Zhanlue Yang**)
      - Optimize alias checking conditions for store-to-load forwarding (#8079) (by **Zhanlue Yang**)
      - Support TensorType for Load-Store-Forwarding (#8058) (by **Zhanlue Yang**)
      - Fix TensorTyped error with irpass::make_thread_local() (#8051) (by **Zhanlue Yang**)
      - Fix numerical issue with auto_diff() (#8025) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_mesh_block_local() (#8030) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::make_thread_local() (#8028) (by **Zhanlue Yang**)
      - Support allocate with cuda memory pool and reduce preallocation size accordingly (#7929) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_no_access_mesh_fors() (#7956) (by **Zhanlue Yang**)
      - Fix error with irpass::check_out_of_bound() for TensorTyped ExternalPtrStmt (#7997) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::demote_atomics() (#7943) (by **Zhanlue Yang**)
      - Separate out preallocation logics for runtime objects (#7938) (by **Zhanlue Yang**)
      - Remove deprecated funcs in __init__.py (#7941) (by **Lin Jiang**)
      - Remove deprecated sparse_matrix_builder function (#7942) (by **Lin Jiang**)
      - Remove deprecated compile option ndarray_use_cached_allocator (#7937) (by **Zhanlue Yang**)
      - Migrate irpass::scalarize() after irpass::detect_read_only() (#7939) (by **Zhanlue Yang**)
      - Remove deprecated funcs in ti.ui (#7940) (by **Lin Jiang**)
      - Remove the support for 'is' (#7930) (by **Lin Jiang**)
      - Migrate irpass::scalarize() after irpass::offload() (#7919) (by **Zhanlue Yang**)
      - Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by **Lin Jiang**)
      - Remove a.atomic(b) (#7925) (by **Lin Jiang**)
      - Cancel deprecating native min/max (#7928) (by **Lin Jiang**)
      - Fix the api doc search problem (#7918) (by **Zhao Liang**)
      - Move irpass::scalarize() after irpass::auto_diff() (#7902) (by **Zhanlue Yang**)
      - Fix Ndarray fill with Matrix/Vector typed values (#7901) (by **Zhanlue Yang**)
      - Add cast to field.fill() interface (#7899) (by **Zhanlue Yang**)
      - Let nested data classes have methods (#7909) (by **Lin Jiang**)
      - Let kernel argument support matrix nested in a struct (by **lin-hitonami**)
      - Support the functions of dataclass as kernel argument and return value (#7865) (by **Lin Jiang**)
      - Fix a bug on PosixPath (#7860) (by **Zhao Liang**)
      - Postpone MatrixType scalarization to irpass::differentiation_validation_check() (#7839) (by **Zhanlue Yang**)
      - Postpone MatrixType scalarization to irpass::gather_meshfor_relation_types() (#7838) (by **Zhanlue Yang**)
   - **Miscellaneous**
      - Make clang-tidy happy on 'explicit' (#7999) (by **秋云未云**)
   - **OpenGL backend**
      - Fix: runtime caught error cannot be displayed in opengl (#7998) (by **秋云未云**)
   - **IR optimization passes**
      - Make merging casts int(int(x)) less aggressive (#7944) (by **Ailing**)
      - Fix redundant clone of stmts across offloaded tasks (#7927) (by **Ailing**)
   - **Refactor**
      - Refactor the argument passing logic of rwtexture and remove extra_args (#7914) (by **Lin Jiang**)

v1.6.0

11 months ago

Deprecation Notice

We removed some APIs that were deprecated a long time ago. See the table below:

Removed API	Replace with
Using atomic operations like a.atomic_add(b)	ti.atomic_add(a, b) or a += b
Using is and is not inside Taichi kernel and Taichi function	Not supported
Ndrange for loop with the number of the loop variables not equal to the dimension of the ndrange	Not supported
ti.ui.make_camera()	ti.ui.Camera()
ti.ui.Window.write_image()	ti.ui.Window.save_image()
ti.SOA	ti.Layout.SOA
ti.AOS	ti.Layout.AOS
ti.print_profile_info	ti.profiler.print_scoped_profiler_info
ti.clear_profile_info	ti.profiler.clear_scoped_profiler_info
ti.print_memory_profile_info	ti.profiler.print_memory_profiler_info
ti.CuptiMetric	ti.profiler.CuptiMetric
ti.get_predefined_cupti_metrics	ti.profiler.get_predefined_cupti_metrics
ti.print_kernel_profile_info	ti.profiler.print_kernel_profiler_info
ti.query_kernel_profile_info	ti.profiler.query_kernel_profiler_info
ti.clear_kernel_profile_info	ti.profiler.clear_kernel_profiler_info
ti.kernel_profiler_total_time	ti.profiler.get_kernel_profiler_total_time
ti.set_kernel_profiler_toolkit	ti.profiler.set_kernel_profiler_toolkit
ti.set_kernel_profile_metrics	ti.profiler.set_kernel_profiler_metrics
ti.collect_kernel_profile_metrics	ti.profiler.collect_kernel_profiler_metrics
ti.VideoManager	ti.tools.VideoManager
ti.PLYWriter	ti.tools.PLYWriter
ti.imread	ti.tools.imread
ti.imresize	ti.tools.imresize
ti.imshow	ti.tools.imshow
ti.imwrite	ti.tools.imwrite
ti.ext_arr	ti.types.ndarray
ti.any_arr	ti.types.ndarray
ti.Tape	ti.ad.Tape
ti.clear_all_gradients	ti.ad.clear_all_gradients
ti.linalg.sparse_matrix_builder	ti.types.sparse_matrix_builder

We no longer deprecate the builtin min/max functions in Taichi kernels.
We are deprecating the following arguments in the argument declarations of the compute graph. They will be removed in v1.7.0:
- element_shape argument for scalar and ndarray
- shape, channel_format and num_channels arguments for texture
The CC backend will be removed in the next release (v1.7.0).

New features

Struct arguments

You can now use struct arguments in all backends. Structs can be nested, and they can contain matrices and vectors. Here's an example:

import taichi as ti

ti.init()

transform_type = ti.types.struct(R=ti.math.mat3, T=ti.math.vec3)
pos_type = ti.types.struct(x=ti.math.vec3, trans=transform_type)

@ti.kernel
def kernel_with_nested_struct_arg(p: pos_type) -> ti.math.vec3:
    return p.trans.R @ p.x + p.trans.T

trans = transform_type(ti.math.mat3(1), [1, 1, 1])
p = pos_type(x=[1, 1, 1], trans=trans)
print(kernel_with_nested_struct_arg(p))  # [4., 4., 4.]

Ndarray

Added support for reading and writing 0-dim ndarrays in Python scope
Fixed a bug when writing into ndarray from Python scope

Improvements

Added support for the rsqrt operator in autodiff
Added an assembly printer for the CPU backend (by Zhanlue Yang)
Added support for CUDA shared array allocation over 48 KiB

Performance

Improved vectorization support on CPU backend, with significant performance gains for specific applications

New Examples

2D euler fluid simulation example by Lee-abcde

Misc

Python 3.11 support
ti.frexp is supported on CUDA, Vulkan, Metal, OpenGL backends.
ti.math.popcnt intrinsic by Garry Ling
Fixed a memory leak issue during SNodeTree destruction (by Zhanlue Yang)
Added validation and improved error reporting for ti.Field finalization (by Zhanlue Yang)
Fixed a memory leak issue with the CUDA backend in C-API (by Zhanlue Yang)
Added support for formatted printing with str.format() and f-strings (by Tianyi Liu)
Changed Python code formatter from yapf to black

Developer Experience

build.py script for preparing build & testing environment

Full changelog

Highlights:

Bug fixes
- Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
CUDA backend
- Warn driver version if it doesn't support memory pool. (#7912) (by Haidong Lan)
- Better handling shared array shape check (#7818) (by Haidong Lan)
- Support large shared memory for CUDA backend (#7452) (by Haidong Lan)
Documentation
- Add doc about struct arguments (#7959) (by Lin Jiang)
- Fix docstring of mix function (#7922) (by Zhao Liang)
- Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
- Update doc for dynamic snode (#7804) (by Zhao Liang)
- Update field.md (#7819) (by zhoooou)
- Update readme (#7808) (by yanqingzhang)
- Update write_test.md (#7745) (by Qian Bao)
- Update performance.md (#7720) (by Zhao Liang)
- Update readme (#7673) (by Zhao Liang)
- Update tutorial.md (#7512) (by Chenzhan Shang)
- Update gui_system.md (#7628) (by Qian Bao)
- Remove deprecated api docstrings (#7596) (by pengyu)
- Fix the cexp docstring (#7588) (by Zhao Liang)
- Add doc about returning struct (#7556) (by Lin Jiang)
Error messages
- Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
Language and syntax
- Remove deprecated funcs in init.py (#7941) (by Lin Jiang)
- Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
- Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
- Remove the support for 'is' (#7930) (by Lin Jiang)
- Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
- Remove a.atomic(b) (#7925) (by Lin Jiang)
- Cancel deprecating native min/max (#7928) (by Lin Jiang)
- Let nested data classes have methods (#7909) (by Lin Jiang)
- Let kernel argument support matrix nested in a struct (by lin-hitonami)
- Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
- Fix a bug on PosixPath (#7860) (by Zhao Liang)
- Seprate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
- Fix pylance warning (#7805) (by Zhao Liang)
- Support taking structs as kernel arguments (by lin-hitonami)
- Fix math module circular import bugs (#7762) (by Zhao Liang)
- Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
- Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
- Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
- Support writing sparse matrix as matrix market file (#7529) (by pengyu)
Vulkan backend
- Fix repeated generation of array ranges in spirv codegen. (#7625) (by Haidong Lan)

Full changelog:

[CUDA] Warn driver version if it doesn't support memory pool. (#7912) (by Haidong Lan)
[Doc] Add doc about struct arguments (#7959) (by Lin Jiang)
[Error] Update deprecation warning of the graph arguments (#7965) (by Lin Jiang)
[windows] Workaround C++ mangling special chars (#7964) (by Ailing)
[Lang] Remove deprecated funcs in init.py (#7941) (by Lin Jiang)
[build] Remove redundant C-API shared object in wheel (#7950) (by Proton)
[test] Do not test cc backend (by Proton)
[Lang] Remove deprecated sparse_matrix_builder function (#7942) (by Lin Jiang)
[Lang] Remove deprecated funcs in ti.ui (#7940) (by Lin Jiang)
[Lang] Remove the support for 'is' (#7930) (by Lin Jiang)
[Lang] Raise error when the dimension of the ndrange does not equal to the number of the loop variable (#7933) (by Lin Jiang)
[Lang] Remove a.atomic(b) (#7925) (by Lin Jiang)
[Lang] Cancel deprecating native min/max (#7928) (by Lin Jiang)
[Doc] Fix docstring of mix function (#7922) (by Zhao Liang)
[example] Fix ti example bugs (#7903) (by Zhao Liang)
[ci] Build.py: Source generated env in new spawned shell (by Proton)
[misc] Fix changelog commit extract code (by Proton)
[ci] More robust build.py bootstrapping (#7920) (by Proton)
[Lang] [bug] Let nested data classes have methods (#7909) (by Lin Jiang)
[cuda] Only set CU_LIMIT_STACK_SIZE when necessary (#7906) (by Ailing)
[Lang] Let kernel argument support matrix nested in a struct (by lin-hitonami)
[Bug] Fix wrong datatype size when writing to ndarray from Python scope (by Ailing Zhang)
[lang] Support 0 dim ndarray read & write in python scope (by Ailing Zhang)
[Lang] Support the functions of dataclass as kernel argument and return value (#7865) (by Lin Jiang)
[spirv] Support struct as kernel argument (by Lin Jiang)
[spirv] Fix the ret type of frexp (by lin-hitonami)
[ci] Build.py: Do not try to bootstrap pip (too many issues) (#7897) (by Proton)
[ci] Build.py quirks fix (#7894) (by Proton)
[Doc] Update faq and ggui, and add them to CI (#7861) (by Zhao Liang)
[build] Remove unused apt pkg 'libmirclient-dev' to make 'build.py' run properly on ubuntu 22.04 (#7871) (by Yu Zhang)
[Lang] Fix a bug on PosixPath (#7860) (by Zhao Liang)
[ci] Polishing build.py, wave 4 (#7857) (by Proton)
[build] Use LLVM without zstd dependency on M1 Macs (#7856) (by Proton)
[doc] Update dev_install.md to reflect build.py usage (#7848) (by Proton)
[ci] Polishing build.py, wave 3 (#7845) (by Proton)
[lang] Add popcnt to llvm intrinsic support (#7772) (by Garry Ling)
[Doc] Update doc for dynamic snode (#7804) (by Zhao Liang)
[ci] Fix release build failure (#7834) (by Proton)
[ci] More robust build.py bootstrapping (#7833) (by Proton)
[Doc] Update field.md (#7819) (by zhoooou)
[autodiff] Remove redundant autodiff mode in kernel name (#7829) (by Ailing)
[lang] Migrate Caching Allocation logics from CudaDevice/AmdgpuDevice to DeviceMemoryPool (#7793) (by Zhanlue Yang)
[misc] Resolve code formatter frictions (#7828) (by Proton)
[Lang] Seprate out the scalarization for MatrixOfMatrixPtrStmt and MatrixOfGlobalPtrStmt (#7803) (by Zhanlue Yang)
[bug] Fix imgui_context in destroying multiple GGUI windows (#7812) (by Ailing)
[misc] Update git-blame-ignore-revs (#7825) (by Proton)
[ci] Complete doc test list, remove redundant default prelude (#7823) (by Proton)
[misc] Relax Black formatter line length limit to 120 (#7824) (by Proton)
[Doc] Update readme (#7808) (by yanqingzhang)
[misc] Switch code formatter from yapf to black (#7785) (by Proton)
[CUDA] Better handling shared array shape check (#7818) (by Haidong Lan)
[misc] Improve ::liong::json::deserialize() (by PGZXB)
[bug] Fix gen_offline_cache_key (#7810) (by PGZXB)
[ci] Fix build.py ensurepip (#7811) (by Proton)
[Lang] Fix pylance warning (#7805) (by Zhao Liang)
[lang] Support frexp on spirv-based backends (#7770) (by Ailing)
[lang] Split MemoryPool into DeviceMemoryPool and HostMemoryPool (#7786) (by Zhanlue Yang)
[misc] Optimize import overhead: pytorch and get_clangpp (#7797) (by Haidong Lan)
[ci] [doc] Tighten up document testing (#7801) (by Proton)
[ci] Polishing build.py, wave 2 (#7800) (by Proton)
[aot] Remove unused AotDataConverter (#7799) (by Lin Jiang)
[perf] Fix Taichi CPU backend compile parameter to pair performance with Numba. (#7731) (by zhengxianli)
[ci] Polishing build.py (#7794) (by Proton)
[bug] Returning nan for ti.sym_eig on identity matrix (#7443) (by Yimin Tang)
[Lang] Support taking structs as kernel arguments (by lin-hitonami)
[ir] Add 'create_load' to ArgLoadStmt (by lin-hitonami)
[ir] Let the src of GetElementStmt be a pointer (by lin-hitonami)
[lang] Clean up runtime allocation functions (#7773) (by Zhanlue Yang)
[lang] Migrate CUDA preallocation logic to CudaMemoryPool (#7746) (by Zhanlue Yang)
[gfx] Fix runtime buffer/image copy barrier semantics (#7781) (by Bob Cao)
[misc] Remove unnecessary TaskCodeGenLLVM::task_counter (#7777) (by PGZXB)
[ci] Temporarily force Windows release builds to run on sm70 nodes (#7767) (by Proton)
[refactor] Remove Kernel::lowered_ (#7765) (by PGZXB)
[gui] Fluid visualization utilities (#7682) (by Qian Bao)
[Lang] Fix math module circular import bugs (#7762) (by Zhao Liang)
[misc] Make pre-commit happy (#7768) (by Proton)
[ci] Build iOS AOT static library (by Proton)
[misc] Wrap path with std::filesystem::path (#7754) (by Bob Cao)
[lang] Support vector and matrix dtypes in ti.field (#7761) (by Ailing)
[ir] Remove unnecessary field_dims_ in ArgLoadStmt (#7755) (by Ailing)
[refactor] Remove Kernel::task_counter_ (#7751) (by PGZXB)
[ci] Build.py: Introduce TAICHI_CMAKE_ARGS manager for better log readability (by Proton)
[ci] Reorganize build.py code (by Proton)
[refactor] Let KernelCompilationManager manage kernel compilation in gfx::AotModuleBuilderImpl (#7715) (by PGZXB)
[misc] Remove unused FullSimplifyPass::Args::program (#7750) (by PGZXB)
[refactor] Re-impl LlvmAotModule using LLVM::KernelLauncher (#7744) (by PGZXB)
[lang] Implement experimental CG(Conjugate Gradient) solver in Taichi-lang (#7690) (by Qian Bao)
[lang] Transform bit_shr to bit_sar for uint (#7757) (by Ailing)
[ir] Postpone scalarize and lower_matrix_ptr to after bit loop vectorization (#7726) (by 魔法少女赵志辉)
[ci] Isolate post sm70 tests (#7740) (by Proton)
[cuda] Suppport using SparseMatrix on more CUDA versions (#7724) (by Yu Zhang)
[cuda] Update the data layout of CUDA (#7748) (by Lin Jiang)
[ci] Ignore dup benchmark data points (#7749) (by Proton)
[bug] Fix reduction of atomic max (#7747) (by Lin Jiang)
[Doc] Update write_test.md (#7745) (by Qian Bao)
[refactor] Remove 'args' from 'RuntimeContext' (by lin-hitonami)
[gfx] Let gfx backends use LaunchContextBuilder to build arguments in struct type (by lin-hitonami)
[gfx] [refactor] Convert f16 in LaunchContextBuilder (by lin-hitonami)
[gfx] Record the struct type of arguments and results in KernelContextAttributes (by lin-hitonami)
[gfx] Compile struct type of result and arguments in gfx backends (by lin-hitonami)
[refactor] Implement CompiledKernelData::check() (#7743) (by PGZXB)
[doc] [test] Update docs for printing with f-strings and formatted strings (#7733) (by 魔法少女赵志辉)
[lang] Improve error message for mismatched index for ndarrays in python scope (#7737) (by Ailing)
[bug] Avoid redundant cache loading (#7741) (by PGZXB)
[refactor] Let KernelCompilationManager manage kernel compilation in LlvmAotModuleBuilder (#7714) (by PGZXB)
[ci] Skip large shared memory test for Turing GPUs. (#7739) (by Haidong Lan)
[cuda] Remove deprecated cusparse functions (#7725) (by Yu Zhang)
[misc] Update pull_request_template.md (#7738) (by Ailing)
[misc] Remove TI_WARN for cuda in memory_pool.cpp (#7734) (by Ailing)
[CUDA] Support large shared memory for CUDA backend (#7452) (by Haidong Lan)
[vulkan] Update SPIR-V codegen to emit FP16 consts (#7676) (by Bob Cao)
[lang] Support frexp on cuda backend (#7721) (by Ailing)
[refactor] Unify implementation of ProgramImpl::compile() (by PGZXB)
[refactor] Introduce LLVM::KernelLauncher (by PGZXB)
[refactor] Introduce gfx::KernelLauncher (by PGZXB)
[test] Enable test offline cache on amdgpu and dx11 (#7703) (by PGZXB)
[lang] Refactor ownership and inheritance of allocators (#7685) (by Zhanlue Yang)
[ci] Fix git cache quirks (#7722) (by Proton)
[lang] Improve error msg in create ndarray (#7709) (by Garry Ling)
[Doc] Update performance.md (#7720) (by Zhao Liang)
[bug] Switch the gallery image used by README. (#7716) (by Chengchen(Rex) Wang)
[lang] Merge AMDGPUCachingAllocator to the generic CachingAllocator (#7717) (by Zhanlue Yang)
[bug] Invalid Field cache, RWAccessors cache, and Kernel cache upon SNodeTree destruction (#7704) (by Zhanlue Yang)
[ci] [test] Enable cc test on CI (by lin-hitonami)
[test] [cc] Skip tests that cc backend doesn't support (by lin-hitonami)
[test] Exclude the cc backend from tests that involve dynamic indexing (#7705) (by 魔法少女赵志辉)
[bug] Fix camera controls (#7681) (by liblaf)
[bug] [cc] Fix comparison op in cc backend (by Lin Jiang)
[bug] [cc] Set external ptr for cc backend (by lin-hitonami)
[lang] Merged VirtualMemoryAllocator into MemoryPool for LLVM-CPU backend (#7671) (by Zhanlue Yang)
[misc] Remove useless JITEvaluatorId (#7700) (by PGZXB)
[bug] Fixed building with clang on Windows failed (#7699) (by PGZXB)
[Lang] Support formatted printing in str.format() and f-strings (#7686) (by 魔法少女赵志辉)
[ci] Git caching proxy in CI (#7692) (by Proton)
[build] Let msvc generate pdb for cpp & c_api tests (by lin-hitonami)
[refactor] Stop storing pointers to array devallocs in kernel args (by lin-hitonami)
[aot] Implement bin2c in AOT cppgen (#7687) (by PENGUINLIONG)
[cpu] Remove atomics demotion for single-thread CPU targets. (#7631) (by Haidong Lan)
[aot] Export templated kernels (#7683) (by PENGUINLIONG)
[ci] Revive /benchmark (#7680) (by Proton)
[Doc] Update readme (#7673) (by Zhao Liang)
[misc] Device API public headers and CMake rework part 1 (#7624) (by Bob Cao)
[misc] Move optimize cpu module to KernelCodeGen (#7667) (by PGZXB)
[lang] [ir] Extract and save the format specifiers in str.format() (#7660) (by 魔法少女赵志辉)
[example] Add 2D euler fluid simulation example (#7568) (by Lee-abcde)
[wasm] Remove WASM backend (by lin-hitonami)
[build] Fix ssize_t type undefined errors when building with TI_WITH_LLVM=OFF on windows (#7665) (by Yu Zhang)
[misc] Remove unused Kernel::is_evaluator (#7669) (by PGZXB)
[misc] Remove unused Program::jit_evaluator_cache and Program::jit_evaluator_cache_mut (#7668) (by PGZXB)
[misc] Simplify test_offline_cache.py (#7663) (by PGZXB)
[lang] Improve error reporting for FieldsBuilder finalization (#7640) (by Zhanlue Yang)
[misc] Rename taichi::lang::llvm to taichi::lang::LLVM (#7659) (by PGZXB)
[refactor] Remove MemoryPool daemon in LLVM runtime (#7648) (by Zhanlue Yang)
[opt] Cleanup unnecessary options in constant fold pass (#7661) (by Ailing)
[ci] Use build.py to prepare testing environment on Windows (#7658) (by Proton)
[opt] Move binary jit evaluator to host (by Ailing Zhang)
[test] Update C++ constant fold tests to test operator one by one (by Ailing Zhang)
[aot] Avoid shared library file being packaged into wheel data (#7652) (by Chenzhan Shang)
[ci] Fix scipy install (#7649) (by Proton)
[misc] Remove an unnecessary parameter of KernelCompilationManager::make_filename (by PGZXB)
[refactor] Remove some unnecessary functions of KernelCodeGen (by PGZXB)
[refactor] Re-impl JIT and Offline Cache on LLVM backends (by PGZXB)
[refactor] Implement llvm::KernelCompiler (by PGZXB)
[refactor] Gen code for KernelCodeGen::ir instead of KernelCodeGen::kernel->ir (by PGZXB)
[Doc] Update tutorial.md (#7512) (by Chenzhan Shang)
[ci] Test manylinux2014 build on PR (#7647) (by Proton)
[bug] Fix logical comparison returns -1 (#7641) (by Ailing)
[doc] Fix gui_system.md tests (#7646) (by Proton)
[Doc] Update gui_system.md (#7628) (by Qian Bao)
[aot] Hand-written CMake target script (#7644) (by PENGUINLIONG)
[ci] Do not use Android toolchain for perf testing (#7642) (by Proton)
[ci] Support Python 3.11 (#7627) (by Proton)
[build] Setup Android SDK environment for performance bot (#7635) (by Zhanlue Yang)
[ci] Update perf mon image (#7639) (by Proton)
[ci] Fix perf mon break (#7638) (by Proton)
[doc] Add documentation on using ghstack (#7632) (by Proton)
[build] Static linking libstdc++ on Linux (by Proton)
[ci] Rewrite Dockerfiles (by Proton)
[ci] Resolve "Needed single revision" workaround failure when the repo directory is empty (#7633) (by Proton)
[Vulkan] Fix repeated generation of array ranges in spirv codegen. (#7625) (by Haidong Lan)
[build] Switch to use docker with Android-SDK for performance bot (#7630) (by Zhanlue Yang)
[opengl] glfw finalize crash fix (by Proton)
[ci] build.py: Android support, entering shell, export env (by Proton)
[ci] Do not run tests with mixed backends (by Proton)
[refactor] Use f16 function from external lib (by lin-hitonami)
[refactor] Migrate members from RuntimeContext to LaunchContextBuilder (by lin-hitonami)
[bug] Fix setting arguments exceeding the max arg num (by lin-hitonami)
[cpu] Explicitly make cpu multithreading loop for range-fors. (#7593) (by Haidong Lan)
[aot] Fixed generator for compute graph (#7626) (by PENGUINLIONG)
[ir] Postpone scalarize and lower_matrix_ptr to after typecheck (#7589) (by 魔法少女赵志辉)
[aot] Header generator completed (#7609) (by PENGUINLIONG)
[amdgpu] Initialize AMDGPUContext with defaults (by Proton)
[build] Remove libSPIRV-Tools-shared.(so|dll) in wheel (by Proton)
[lang] Removed cpu_device(), cuda_device(), and amdgpu_device() from LlvmRuntimeExecutor (#7544) (by Zhanlue Yang)
[refactor] Remove the get/set functions in RuntimeContext (by lin-hitonami)
[aot] Pass LaunchContextBuilder to CompiledGraph::init_runtime_context (by lin-hitonami)
[gfx] Let GfxRuntime use LaunchContextBuilder (by lin-hitonami)
Let LaunchContextBuilder be the argument of the kernel launch function (by lin-hitonami)
[llvm] [refactor] Set the llvm runtime when executing (by lin-hitonami)
[refactor] Migrate {set, get}_{arg, ret} functions from RuntimeContext (by lin-hitonami)
[bug] Fix compilation error (#7606) (by PGZXB)
[aot] Hide map memory failure (#7604) (by PENGUINLIONG)
[refactor] Fix KernelCodeGen::kernel from Kernel * to const Kernel * (by PGZXB)
[refactor] Remove legacy implementation of llvm offline cache (by PGZXB)
[refactor] Impl llvm::CompiledKernelData (by PGZXB)
[bug] Type check for logical not op with real type inputs (#7600) (by Ailing)
[bug] Improve ndarray creation to fix segmentation fault (#7577) (by pengyu)
[lang] Add assembly printer for CPU backend (#7590) (by Zhanlue Yang)
[misc] Update docker filer (#7598) (by Zeyu Li)
[aot] Fix absolute path in generated TaichiTargets.cmake (#7597) (by Chenzhan Shang)
[Doc] Remove deprecated api docstrings (#7596) (by pengyu)
[llvm] Compile the kernel arguments to a StructType (by Lin Jiang)
[lang] Fix issue with llvm opaque pointer (#7557) (by Zhanlue Yang)
[opt] Constant folding for unary ops on host (#7573) (by Ailing)
[bug] Type check for bit_not op with real type inputs (#7592) (by Ailing)
[Doc] Fix the cexp docstring (#7588) (by Zhao Liang)
[Lang] Replace internal representation of Python-scope ti.Matrix with numpy arrays (#7559) (by Yi Xu)
[bug] Avoid cuda compilation via clang and ship pre-compiled .bc file instead (#7570) (by Zhanlue Yang)
[aot] Taichi kernel AOT command (#7565) (by PENGUINLIONG)
[bug] Fix struct members registered to StructField class (#7574) (by Ailing)
[aot] Mobile platform AOT build scripts (#7567) (by PENGUINLIONG)
[misc] Revert "Security upgrade ipython from 7.34.0 to 8.10.0 (#7341)" (#7571) (by Proton)
[test] Add cpp tests for constant folding pass (#7566) (by Ailing)
[misc] Security upgrade ipython from 7.34.0 to 8.10.0 (#7341) (by Chengchen(Rex) Wang)
[lang] Refactor CudaCachingAllocator into a more generic caching allocator (#7531) (by Zhanlue Yang)
[aot] Load GfxRuntime140 module from TCM (#7539) (by PENGUINLIONG)
[lang] Fixed useless serial shader to blit ExternalTensorShapeAlongAxisStmt on Metal (#7562) (by PENGUINLIONG)
[aot] Enable Vulkan 8bit storage (#7564) (by PENGUINLIONG)
[bug] Fix crashing on printing FrontendFuncCallStmt with no return value (by lin-hitonami)
[refactor] Remove LaunchContextBuilder::set_arg_raw (by lin-hitonami)
[llvm] Generalize TaskCodeGenLLVM::create_return to set_struct_to_buffer (by lin-hitonami)
[bug] Fix Cuda memory leak during TiRuntime destruction (#7345) (by Zhanlue Yang)
[ir] Let void struct type represent void type (by lin-hitonami)
[aot] Let C-API use LaunchContextBuilder to manage RuntimeContext (by lin-hitonami)
[ir] Let the reference type declare a pointer argument (by lin-hitonami)
[Doc] Add doc about returning struct (#7556) (by Lin Jiang)
[bug] Fix returning struct containing vec3 (#7552) (by Lin Jiang)
[lang] [ir] Extract and save the format specifiers in the f-string (#7514) (by 魔法少女赵志辉)
[Lang] Stop letting ti.Struct inherit from TaichiOperations (#7474) (by Yi Xu)
[aot] Recover AOT CI branch names (#7543) (by PENGUINLIONG)
[aot] Put TiRT in Python wheel and CMake script to find it in wheel (#7537) (by PENGUINLIONG)
[refactor] Remove the difficult-to-implement CompiledKernelData::size() (#7540) (by PGZXB)
[bug] Implement the missing clone function for FrontendFuncCallStmt (#7538) (by PGZXB)
[misc] Bump version to v1.6.0 (#7536) (by Haidong Lan)
[doc] Handle 2 digit minor versions correctly (#7535) (by Ritoban Roy-Chowdhury)
[aot] GfxRuntime140 convention docs (#7527) (by PENGUINLIONG)
[rhi] Refactor allocate_memory API to use RhiResult (#7463) (by Bob Cao)
[metal] Choose the proper msl version according to the device capability (#7506) (by Yu Zhang)
[Lang] Support writing sparse matrix as matrix market file (#7529) (by pengyu)

v1.5.0

1 year ago

Deprecation Notice

ndarray no longer accepts field_dim; use the ndim argument instead.
[RFC] The ti.cc backend is being deprecated in favor of TiRT and its C API. If you have any concerns, please let us know at https://github.com/taichi-dev/taichi/issues/7629

New features

AOT

Taichi Runtime (TiRT) now supports Apple's Metal API and OpenGL ES for compatibility with older mobile platforms, so Taichi programs can now be deployed to any mainstream consumer device. NOTE: Taichi program deployment on mobile platforms is experimental. Please contact us at [email protected] for long-term services.
Taichi AOT now fully supports float16 dtype.

Ndarray

Out-of-bounds checks are now supported on ndarrays.

Improvements

Python Frontend

We now support returning a struct on LLVM-based backends (CPU and CUDA). The struct can contain vectors and matrices, and it can also nest other structs. Here's an example:

import taichi as ti

ti.init(arch=ti.cpu)  # struct returns need an LLVM-based backend (CPU or CUDA)

s0 = ti.types.struct(a=ti.math.vec3, b=ti.i16)
s1 = ti.types.struct(a=ti.f32, b=s0)

@ti.kernel
def foo() -> s1:
    return s1(a=1, b=s0(a=ti.math.vec3(100, 0.2, 3), b=1))

print(foo())  # {'a': 1.0, 'b': {'a': [100.0, 0.2, 3.0], 'b': 1}}

Performance

Atomic operations on half2 are now supported on the CUDA backend (with compute capability > 60). You can enable this with ti.init(half2_vectorization=True). This feature can effectively accelerate the NeRF training process; please refer to this repo for details.

GGUI

GGUI now has no computing backend restrictions! You can now use Metal, OpenGL, AMDGPU, or DirectX 11, in addition to the CPU, CUDA, and Vulkan backends that GGUI previously supported.
GGUI has now been validated on Mesa's software rasterizer, lavapipe. You can use this solution for headless server visualization or on servers with no graphics capabilities (such as A100).
Added the fps_limit option, which adjusts the maximum frame rate in GGUI.

Full changelog:

Highlights:
   - **AMDGPU backend**
      - Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
      - Add print kernel amdgcn (#7357) (by **Zeyu Li**)
      - Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
   - **Aot module**
      - Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
      - Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
   - **Bug fixes**
      - Fix copy_from() of StructField (#7294) (by **Yi Xu**)
      - Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
      - Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
      - Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
   - **Documentation**
      - Update GGUI docs with correct API (#7525) (by **pengyu**)
      - Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
      - Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
      - Fix typo in API doc (#7511) (by **pengyu**)
      - Update math_module (#7405) (by **Zhao Liang**)
      - Update hello_world.md (#7400) (by **Zhao Liang**)
      - Update debugging.md (#7401) (by **Zhao Liang**)
      - Update hello_world.md (#7380) (by **Zhao Liang**)
      - Update type.md (#7376) (by **Zhao Liang**)
      - Update kernel_function.md (#7375) (by **Zhao Liang**)
      - Update hello_world.md (#7369) (by **Zhao Liang**)
      - Update hello_world.md (#7368) (by **Zhao Liang**)
      - Update data_oriented_class.md (#6790) (by **Zhao Liang**)
      - Update hello_world.md (#7367) (by **Zhao Liang**)
      - Update kernel_function.md (#7364) (by **Zhao Liang**)
      - Update hello_world.md (#7354) (by **Zhao Liang**)
      - Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
      - Update profiler.md (#7358) (by **Zhao Liang**)
      - Update kernel_function.md (#7356) (by **Zhao Liang**)
      - Update tut.md (#7352) (by **Gabriel Vainer**)
      - Update type.md (#7350) (by **Zhao Liang**)
      - Update hello_world.md (#7337) (by **Zhao Liang**)
      - Update append docstring (#7265) (by **Zhao Liang**)
      - Update ndarray.md (#7236) (by **Gabriel Vainer**)
      - Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
      - Remove doc tutorial (#7198) (by **Olinaaaloompa**)
      - Rename tutorial doc (#7186) (by **Zhao Liang**)
      - Update tutorial.md (#7176) (by **Zhao Liang**)
      - Update math_module.md (#7175) (by **Zhao Liang**)
      - Update debugging.md (#7173) (by **Zhao Liang**)
      - Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
      - Update doc regarding dynamic index (#7148) (by **Yi Xu**)
      - Move glossary to top level (#7118) (by **Zhao Liang**)
      - Update type.md (#7038) (by **Zhao Liang**)
      - Fix docstring (#7065) (by **Zhao Liang**)
   - **Error messages**
      - Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
      - Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
      - Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
      - Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
      - Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
      - Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
      - Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
      - Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
      - Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
   - **GUI**
      - GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
   - **Intermediate representation**
      - Unified type system for internal operations (#6337) (by **daylily**)
   - **Language and syntax**
      - Keep ti.pyfunc (#7530) (by **Lin Jiang**)
      - Type check assignments between tensors (#7480) (by **Yi Xu**)
      - Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
      - Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
      - Fix pylance warnings by ti.random (#7439) (by **Zhao Liang**)
      - Fix pylance types warning (#7417) (by **Zhao Liang**)
      - Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
      - Simplify the swizzle generator (#7216) (by **Zhao Liang**)
      - Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
      - Remove deprecated packed switch (#7104) (by **Yi Xu**)
      - Raise errors when using the packed switch (#7125) (by **Yi Xu**)
      - Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
      - Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
      - Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
      - Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
      - Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)
   - **Miscellaneous**
      - Strictly check ndim with external array (#7126) (by **Haidong Lan**)

Full changelog:
   - [cc] Add deprecation notice for cc backend (#7651) (by **Ailing**)
   - [misc] Cherry pick struct return related commits (#7575) (by **Haidong Lan**)
   - [Lang] Keep ti.pyfunc (#7530) (by **Lin Jiang**)
   - [bug] Fix symbol conflicts with taichi_cpp_tests (#7528) (by **Zhanlue Yang**)
   - [bug] Fix numerical issue with TensorType'd arithmetics (#7526) (by **Zhanlue Yang**)
   - [aot] Enable Metal AOT test (#7461) (by **PENGUINLIONG**)
   - [Doc] Update GGUI docs with correct API (#7525) (by **pengyu**)
   - [misc] Implement KernelCompilationManager::clean_offline_cache (#7515) (by **PGZXB**)
   - [ir] Except shared array from demote atomics pass. (#7513) (by **Haidong Lan**)
   - [bug] Fix error with windows-clang compilation for cuda_runtime.cu (#7519) (by **Zhanlue Yang**)
   - [misc] Deprecate field dim and update deprecation warnings (#7491) (by **Haidong Lan**)
   - [build] Fix build failure without nvcc (#7521) (by **Ailing**)
   - [Doc] Fix typos and improve example code in data_oriented_class.md (#7520) (by **pengyu**)
   - [aot] Kernel argument count limit (#7518) (by **PENGUINLIONG**)
   - [Doc] Update gui_system.md, remove unnecessary example (#7487) (by **NextoneX**)
   - [AOT] [llvm] Let AOT kernel inherit CallableBase and use LaunchContextBuilder (by **lin-hitonami**)
   - [llvm] Let the offline cache record the type info of arguments and return values (by **lin-hitonami**)
   - [ir] Separate LaunchContextBuilder from Kernel (by **lin-hitonami**)
   - [Doc] Fix typo in API doc (#7511) (by **pengyu**)
   - [aot] Build Runtime C-API by default (#7508) (by **PENGUINLIONG**)
   - [bug] Fix run_tests.py --with-offline-cache (#7507) (by **PGZXB**)
   - [vulkan] Support printing constant strings containing % (#7499) (by **魔法少女赵志辉**)
   - [ci] Fix nightly version number, 2nd try (#7501) (by **Proton**)
   - [aot] Fixed memory leak in metal backend (#7500) (by **PENGUINLIONG**)
   - [ci] Fix nightly version number issue (#7498) (by **Proton**)
   - [example] Remove cv2, cairo dependency (#7496) (by **Zhao Liang**)
   - [type] Let Type * be serializable (by **lin-hitonami**)
   - [ci] Second attempt at permission check for ghstack landing (#7490) (by **Proton**)
   - [docs] Reword words of warning about building from source (#7488) (by **Anselm Schüler**)
   - [lang] Fixed double release of Metal command buffer (#7484) (by **PENGUINLIONG**)
   - [ci] Switch Android bots lock redis to bot-master (#7482) (by **Proton**)
   - [ci] Status check of ghstack CI bot (#7479) (by **Proton**)
   - [Lang] Type check assignments between tensors (#7480) (by **Yi Xu**)
   - [doc] Fix typo in ndarray.md (#7476) (by **Chenzhan Shang**)
   - [opt] Enable half2 optimization for atomic_add operations on CUDA backend (#7465) (by **Zhanlue Yang**)
   - [Lang] Fix pylance warnings raised by ti.static (#7437) (by **Zhao Liang**)
   - Let the LaunchContextBuilder manage the result buffer (by **lin-hitonami**)
   - [ci] Fix nightly build failure, and minor improvements (#7475) (by **Proton**)
   - [ci] Fix duplicated names in aot tests (#7471) (by **Ailing**)
   - [lang] Improve float16 support from Taichi type system (#7402) (by **Zhanlue Yang**)
   - [Lang] Deprecate arithmetic operations and fill() on ti.Struct (#7456) (by **Yi Xu**)
   - [misc] Add out of bound check for ndarray (#7458) (by **Ailing**)
   - [aot] Remove graph kernel interfaces (#7466) (by **PENGUINLIONG**)
   - [llvm] Let the RuntimeContext use the host result buffer (by **lin-hitonami**)
   - [gui] Fix 3d line drawing & add test (#7454) (by **Bob Cao**)
   - [lang] Fixed texture assertions (#7450) (by **PENGUINLIONG**)
   - [aot] Fixed header generator (#7455) (by **PENGUINLIONG**)
   - [aot] AOT module convention GfxRuntime140 (#7440) (by **PENGUINLIONG**)
   - [misc] Add an explicit error in cc backend codegen for dynamic indexing (#7449) (by **Ailing**)
   - [ci] Lower C++ tests concurrency (#7451) (by **Proton**)
   - [aot] Properly handle texture attributes (#7433) (by **PENGUINLIONG**)
   - [Lang] Fix pylance warnings by ti.random (#7439) (by **Zhao Liang**)
   - [ir] Get the StructType of the kernel parameters (by **lin-hitonami**)
   - [ci] Report failure (not throwing exception) when C++ tests fail (#7435) (by **Proton**)
   - [llvm] Allocate the result buffer from preallocated memory (by **lin-hitonami**)
   - [vulkan] Fix GGUI and vulkan swapchain on AMD drivers (#7382) (by **Bob Cao**)
   - [autodiff] Handle return statement (#7389) (by **Mingrui Zhang**)
   - [misc] Remove unnecessary functions of gfx::AotModuleBuilderImpl (#7425) (by **PGZXB**)
   - [bug] Fix offline_cache::clean_offline_cache_files (ti cache clean) (#7426) (by **PGZXB**)
   - [test] Refactor C++ tests runner (#7421) (by **Proton**)
   - [ci] Adjust perfmon GPU freq (#7429) (by **Proton**)
   - [misc] Remove AotModuleParams::enable_lazy_loading (#7424) (by **PGZXB**)
   - [aot] Use graphs.json instead of TCB (#7392) (by **PENGUINLIONG**)
   - [refactor] Introduce KernelCompilationManager (#7409) (by **PGZXB**)
   - [IR] Unified type system for internal operations (#6337) (by **daylily**)
   - [lang] Add is_lvalue() to Expr to check writeback_binary operand (#7414) (by **魔法少女赵志辉**)
   - [bug] Fix get_error_string ret type typo (#7418) (by **Zeyu Li**)
   - [aot] Reorganize graph argument creation process (#7412) (by **PENGUINLIONG**)
   - [Amdgpu] Enable shared array on amdgpu backend (#7403) (by **Zeyu Li**)
   - [Lang] Fix pylance types warning (#7417) (by **Zhao Liang**)
   - [aot] Simplify device capability assignment (#7407) (by **PENGUINLIONG**)
   - [Doc] Update math_module (#7405) (by **Zhao Liang**)
   - [ci] Lock GPU frequency in perf benchmarking (#7413) (by **Proton**)
   - [ci] Add 'Needed single revision' workaround to all tasks (#7408) (by **Proton**)
   - [Doc] Update hello_world.md (#7400) (by **Zhao Liang**)
   - [refactor] Introduce KernelCompiler and implement spirv::KernelCompiler (#7371) (by **PGZXB**)
   - [Amdgpu] Add print kernel amdgcn (#7357) (by **Zeyu Li**)
   - [Doc] Update debugging.md (#7401) (by **Zhao Liang**)
   - [refactor] Disable ASTSerializer::allow_undefined_visitor (#7391) (by **PGZXB**)
   - [amdgpu] Enable llvm FpOpFusion option on AMDGPU backend (#7398) (by **Zeyu Li**)
   - [aot] Add test for shared array (#7387) (by **Ailing**)
   - [vulkan] Change command list submit error message & misc device API cleanups (#7395) (by **Bob Cao**)
   - [bug] Fix arch_uses_spirv (#7399) (by **PGZXB**)
   - [gui] Fix ggui & vulkan swapchain sizes on HiDPI displays (#7394) (by **Bob Cao**)
   - [Doc] Update hello_world.md (#7380) (by **Zhao Liang**)
   - [aot] Remove support for depth24stencil8 format on Metal (#7377) (by **PENGUINLIONG**)
   - [bug] Add DeviceCapabilityConfig to offline cache key (#7384) (by **PGZXB**)
   - [Doc] Update type.md (#7376) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Callable::program in cpp tests (#7373) (by **PGZXB**)
   - [lang] Experimental support of conjugate gradient solver (#7035) (by **pengyu**)
   - [aot] Metal interop APIs (#7366) (by **PENGUINLIONG**)
   - [Doc] Update kernel_function.md (#7375) (by **Zhao Liang**)
   - [gui] Add `fps_limit` for GGUI (#7374) (by **Bob Cao**)
   - [Doc] Update hello_world.md (#7369) (by **Zhao Liang**)
   - [aot] Fix blockers in static library build with XCode (#7365) (by **PENGUINLIONG**)
   - [vulkan] Remove GLFW from Vulkan rhi dependency (#7351) (by **Bob Cao**)
   - [misc] Remove useless semicolon in llvm_program.h (#7372) (by **PGZXB**)
   - [Doc] Update hello_world.md (#7368) (by **Zhao Liang**)
   - [Amdgpu] Add amdgpu backend profiler (#7330) (by **Zeyu Li**)
   - [lang] Stop broadcasting scalar cond in select statements (#7344) (by **魔法少女赵志辉**)
   - [bug] Fix validation errors due to inactive VK_KHR_16bit_storage (#7360) (by **Zhanlue Yang**)
   - [aot] Support texture in Metal (#7363) (by **PENGUINLIONG**)
   - [Doc] Update data_oriented_class.md (#6790) (by **Zhao Liang**)
   - [Doc] Update hello_world.md (#7367) (by **Zhao Liang**)
   - [refactor] Introduce lang::CompiledKernelData (#7340) (by **PGZXB**)
   - [bug] Fix matrix initialization error with numpy.floating data (#7362) (by **Zhanlue Yang**)
   - [Doc] Update kernel_function.md (#7364) (by **Zhao Liang**)
   - [test] [amdgpu] Fix bug with allocs bb in function body (#7308) (by **Zeyu Li**)
   - [Doc] Update hello_world.md (#7354) (by **Zhao Liang**)
   - [aot] Fixed C-API docs (#7361) (by **PENGUINLIONG**)
   - [refactor] Remove dependencies on Callable::program in lang::CompiledGraph::run (#7288) (by **PGZXB**)
   - [DOC] Update llvm_sparse_runtime.md (#7323) (by **Gabriel Vainer**)
   - [Doc] Update profiler.md (#7358) (by **Zhao Liang**)
   - [Doc] Update kernel_function.md (#7356) (by **Zhao Liang**)
   - [aot] Improve Taichi C++ wrapper implementation (#7347) (by **PENGUINLIONG**)
   - [Doc] Update tut.md (#7352) (by **Gabriel Vainer**)
   - [ci] Add doc snippet CI requirements (#7355) (by **Proton**)
   - [amdgpu] Update device memory free (#7346) (by **Zeyu Li**)
   - [Doc] Update type.md (#7350) (by **Zhao Liang**)
   - [aot] Enable 16-bit dtype support for Taichi AOT (#7315) (by **Zhanlue Yang**)
   - [example] Re-implement the Cornell Box demo with shorter lines of code (#7252) (by **HK-SHAO**)
   - [aot] AOT CI refactorization (#7339) (by **PENGUINLIONG**)
   - [llvm] Let the kernel return struct (by **lin-hitonami**)
   - [Doc] Update hello_world.md (#7337) (by **Zhao Liang**)
   - [ci] Reduce doc test concurrency (#7336) (by **Proton**)
   - [ir] Refactor result fetching (by **lin-hitonami**)
   - [ir] Get the offsets of elements in StructType (by **lin-hitonami**)
   - [misc] Delete test.py (#7332) (by **Bob Cao**)
   - [vulkan] More subgroup operations (#7328) (by **Bob Cao**)
   - [vulkan] Add vulkan profiler (#7295) (by **Haidong Lan**)
   - [refactor] Move TaichiLLVMContext::runtime_jit_module and TaichiLLVMContext::create_jit_module() to LlvmRuntimeExecutor (#7320) (by **PGZXB**)
   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in TaskCodeGenLLVM (#7321) (by **PGZXB**)
   - [ci] Checkout with privileged token when landing ghstack PRs (#7331) (by **Proton**)
   - [ir] Add fields to StructType (by **lin-hitonami**)
   - [gui] Remove renderable reuse & make renderable immediate (#7327) (by **Bob Cao**)
   - [Gui] GGUI use shader "factory" (GGUI rework n/N) (#7271) (by **Bob Cao**)
   - [bug] Fix u64 field cannot be assigned value >= 2 ** 63 (#7319) (by **Lin Jiang**)
   - [type] Let the compute type of quant uint be unsigned int (by **lin-hitonami**)
   - [doc] Replace slack with discord (#7318) (by **yanqingzhang**)
   - [refactor] Change print statement to warnings.warn in taichi.lang.util.warning (#7301) (by **Jett Chen**)
   - [ci] ChatOps: ghstack land (#7314) (by **Proton**)
   - [refactor] Remove TaichiLLVMContext::lookup_function_pointer() (#7312) (by **PGZXB**)
   - [misc] Update MSVC flags (#7254) (by **Bob Cao**)
   - [doc] [ci] Cover code snippets in docs (#7309) (by **Proton**)
   - [refactor] Remove dependencies on LlvmProgramImpl::get_llvm_context() in KernelCodeGen (#7289) (by **PGZXB**)
   - [rhi] Device upload readback functions (#7278) (by **Bob Cao**)
   - [aot] Fixed external project inclusion (#7297) (by **PENGUINLIONG**)
   - [Doc] Update append docstring (#7265) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Callable::program in lang::get_hashed_offline_cache_key (#7287) (by **PGZXB**)
   - [ci] [amdgpu] Enable amdgpu backend python unit tests (#7293) (by **Zeyu Li**)
   - [Bug] Fix copy_from() of StructField (#7294) (by **Yi Xu**)
   - [ci] Adapt new Android phone behavior (#7306) (by **Proton**)
   - [Bug] Fix caching same loop invariant global vars inside nested fors (#7285) (by **Lin Jiang**)
   - [amdgpu] Part5 enable the api of amdgpu (#7202) (by **Zeyu Li**)
   - [amdgpu] Enable struct for on amdgpu backend (#7247) (by **Zeyu Li**)
   - [misc] Update external/asset which was accidentally downgraded in #7248 (#7284) (by **Lin Jiang**)
   - [amdgpu] Update runtime module (#7248) (by **Zeyu Li**)
   - [llvm] Remove unused argument 'arch' in LlvmProgramImpl::get_llvm_context (#7282) (by **Lin Jiang**)
   - [misc] Remove deprecated kwarg in rw_texture type annotations (#7267) (by **Ailing**)
   - [ci] Tolerate duplicates when registering version (#7281) (by **Proton**)
   - [gui] Fix GGUI destruction order (#7279) (by **Bob Cao**)
   - [doc] Rename /doc/ndarray_android to /doc/tutorial (#7273) (by **Lin Jiang**)
   - [llvm] Unify the llvm context of host and device (#7249) (by **Lin Jiang**)
   - [misc] Fix manylinux2014 warning not printing (#7270) (by **Proton**)
   - [ci] Building: add complete PATH set for conda (#7268) (by **Proton**)
   - [autodiff] Support rsqrt operator (#7259) (by **Mingrui Zhang**)
   - [ci] Update pre-commit repos version (#7257) (by **Proton**)
   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (Part2) (#7253) (by **PGZXB**)
   - [refactor] Fix "const CompileConfig *" to "const CompileConfig &" (#7243) (by **PGZXB**)
   - [aot] Added third-party render thread task injection for Unity (#7151) (by **PENGUINLIONG**)
   - [aot] Support statically linked C-API library on MacOS (#7207) (by **Zhanlue Yang**)
   - [gui] Force GGUI to go through host memory (nuking interops) (#7218) (by **Bob Cao**)
   - [Error] Allow IfExp on matrices when the condition is scalar (#7241) (by **Lin Jiang**)
   - [bug] Fix the parity of the RNG (#7239) (by **Lin Jiang**)
   - [Lang] Add better error message for dynamic snode (#7238) (by **Zhao Liang**)
   - [DOC] Update ndarray.md (#7236) (by **Gabriel Vainer**)
   - [Error] Remove deprecations in ti.ui in 1.6.0 (#7229) (by **Lin Jiang**)
   - [Doc] Update llvm_sparse_runtime.md (#7215) (by **Zhao Liang**)
   - [lang] Add validation checks for subscripts to reject negative indices (#7212) (by **Zhanlue Yang**)
   - [refactor] Remove legacy num_bits and acc_offsets from AxisExtractor (#7227) (by **Yi Xu**)
   - [Error] Remove deprecated ti.linalg.sparse_matrix_builder in 1.6.0 (#7228) (by **Lin Jiang**)
   - [Error] Remove deprecations in ASTTransformer in 1.6.0 (#7226) (by **Lin Jiang**)
   - [misc] Export DeviceAllocation into Python & support devalloc in field_info (#7233) (by **Bob Cao**)
   - [gui] Use templated bulk copy to simplify VBO preparation (#7234) (by **Bob Cao**)
   - [rhi] Add create_image_unique stub & misc RHI bug fixes (#7232) (by **Bob Cao**)
   - [opengl] Fix GLFW global context issue (#7230) (by **Bob Cao**)
   - [examples] Remove dependency on `ti.u8` compute type for ngp (#7220) (by **Bob Cao**)
   - [refactor] Remove Kernel::offload_to_executable (#7210) (by **PGZXB**)
   - [opengl] RW image binding & FP16 support (#7219) (by **Bob Cao**)
   - [Error] Remove deprecated a.atomic_op(b) in Taichi v1.6.0 (#7225) (by **Lin Jiang**)
   - [Error] Remove deprecations in taichi/__init__.py in v1.6.0 (#7222) (by **Lin Jiang**)
   - [Error] Raise error when using deprecated ifexp on matrices (#7224) (by **Lin Jiang**)
   - [refactor] Remove legacy BitExtractStmt (#7221) (by **Yi Xu**)
   - [amdgpu] Part4 link bitcode file (#7180) (by **Zeyu Li**)
   - [example] Reorganize example oit_renderer (#7208) (by **Lin Jiang**)
   - [aot] Fix ndarray aot with information from type hints (#7214) (by **Ailing**)
   - [gui] Fix wide line support on macOS (#7205) (by **Bob Cao**)
   - [Lang] Simplify the swizzle generator (#7216) (by **Zhao Liang**)
   - [refactor] Split constructing and compilation of lang::Function (#7209) (by **PGZXB**)
   - [doc] Fix netlify build command (#7217) (by **Ailing**)
   - [ci] M1 buildbot release tag (#7213) (by **Proton**)
   - [misc] Remove unused task_funcs (#7211) (by **PGZXB**)
   - [refactor] Program::this_thread_config() -> Program::compile_config() (#7199) (by **PGZXB**)
   - [doc] Fix format issues of windows debugging (#7197) (by **Olinaaaloompa**)
   - [aot] More OpenGL interop in C-API (#7204) (by **PENGUINLIONG**)
   - [metal] Disable a kernel test in offline cache to unblock CI (#7154) (by **Ailing**)
   - [ci] Switch Windows build script to build.py (#6993) (by **Proton**)
   - [misc] Update submodule taichi_assets (#7203) (by **Lin Jiang**)
   - [mac] Use ObjectLinkingLayer instead of RTDyldObjectLinkingLayer for aarch64 mac (#7201) (by **Ailing**)
   - [misc] Remove unused Program::jit_evaluator_id (#7200) (by **PGZXB**)
   - [misc] Remove legacy latex generation (#7196) (by **Yi Xu**)
   - [Lang] Remove the deprecated dynamic_index switch (#7195) (by **Yi Xu**)
   - [bug] Fix check_matched() failure with Ndarray holding TensorType'd element (#7178) (by **Zhanlue Yang**)
   - [Doc] Remove doc tutorial (#7198) (by **Olinaaaloompa**)
   - [bug] Fix example circle-packing (#7194) (by **Lin Jiang**)
   - [aot] C-API opengl runtime interop (#7120) (by **damnkk**)
   - [Error] Better error message when creating sparse snodes on backends that do not support sparse (#7191) (by **Lin Jiang**)
   - [example] Fix ti gallery close warning (#7187) (by **Zhao Liang**)
   - [lang] Interface refactors for MatrixType and VectorType (#7143) (by **Zhanlue Yang**)
   - [aot] Find Taichi in python wheel (#7181) (by **PENGUINLIONG**)
   - [gui] Update circles rendering to use quads (#7163) (by **Bob Cao**)
   - [Doc] Rename tutorial doc (#7186) (by **Zhao Liang**)
   - [ir] Fix gcc cannot compile inline template specialization (#7179) (by **Lin Jiang**)
   - [Doc] Update tutorial.md (#7176) (by **Zhao Liang**)
   - [aot] Replace std::exchange with local implementation for C++11 (#7170) (by **PENGUINLIONG**)
   - [ci] Fix near cache urls (missing comma) (#7158) (by **Proton**)
   - [docs] Create windows_debug.md (#7164) (by **Bob Cao**)
   - [Doc] Update math_module.md (#7175) (by **Zhao Liang**)
   - [aot] FindTaichi CMake module to help outside project integration (#7168) (by **PENGUINLIONG**)
   - [aot] Removed unused archs in C-API (#7167) (by **PENGUINLIONG**)
   - [Doc] Update debugging.md (#7173) (by **Zhao Liang**)
   - [refactor] Remove dependencies on Program::this_thread_config() in irpass::constant_fold (#7159) (by **PGZXB**)
   - [Doc] Fix C++ tutorial does not display on doc site (#7174) (by **Zhao Liang**)
   - [aot] C++ wrapper for memory slice and memory allocation with host access (#7171) (by **PENGUINLIONG**)
   - [aot] Fixed ti_get_last_error signature (#7165) (by **PENGUINLIONG**)
   - [misc] Log to stderr instead of stdout (#7166) (by **PENGUINLIONG**)
   - [aot] C-API get version wrapper (#7169) (by **PENGUINLIONG**)
   - [doc] Fix spelling of "paticle_field" (#7024) (by **Xiang (Kevin) Li**)
   - [misc] Remove useless Program::sync (#7160) (by **PGZXB**)
   - [doc] Update accelerate_python.md to use ti.max (#7161) (by **Tao Jin**)
   - [doc] Add doc ndarray (#7157) (by **Olinaaaloompa**)
   - [mac] Add .dylib and .cmake to built wheel (#7156) (by **Ailing**)
   - [refactor] Remove dependencies on Program::this_thread_config() in some tests (#7155) (by **PGZXB**)
   - [refactor] Remove dependencies on Program::this_thread_config() in llvm backends codegen (#7153) (by **PGZXB**)
   - [Lang] Remove deprecated packed switch (#7104) (by **Yi Xu**)
   - [example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by **Zhao Liang**)
   - [doc] Update field.md (Fields advanced) (#6867) (by **Gabriel Vainer**)
   - [ci] Use make_changelog.py to generate the full changelog (#7152) (by **Lin Jiang**)
   - [refactor] Rename Callable::*arg* to Callable::*param* (#7133) (by **PGZXB**)
   - [aot] Introduce new AOT deployment tutorial (#7144) (by **PENGUINLIONG**)
   - [bug] Unify error message matching with/without validation layers for CapiTest.FailMapDeviceOnlyMemory (#7110) (by **Zhanlue Yang**)
   - [lang] Remove redundant TensorType expansion for function returns (#7124) (by **Zhanlue Yang**)
   - [lang] Sign python library for Apple M1 (#7138) (by **PENGUINLIONG**)
   - [gui] Fix particle size limits (#7149) (by **Bob Cao**)
   - [lang] Migrate TensorType expansion in MatrixType/VectorType from Python code to Frontend IR (#7127) (by **Zhanlue Yang**)
   - [aot] Support texture arguments for AOT kernels (#7142) (by **Zhanlue Yang**)
   - [metal] Retain Metal commandBuffers & build command buffers directly (#7137) (by **Bob Cao**)
   - [rhi] Update `create_pipeline` API and add support of VkPipelineCache (#7091) (by **Bob Cao**)
   - [autodiff] Support grad in ndarray (#6906) (by **PhrygianGates**)
   - [Doc] Update doc regarding dynamic index (#7148) (by **Yi Xu**)
   - [refactor] Remove dependencies on Program::this_thread_config() in spirv::lower (#7134) (by **PGZXB**)
   - [Misc] Strictly check ndim with external array (#7126) (by **Haidong Lan**)
   - [ci] Run test when pushing to rc branches (#7146) (by **Lin Jiang**)
   - [refactor] Remove dependencies on Program::this_thread_config() in KernelCodeGen (#7086) (by **PGZXB**)
   - [ci] Disable backward_cpp on macOS (#7145) (by **Proton**)
   - [gui] Fix scene line renderable (#7131) (by **Bob Cao**)
   - [refactor] Remove useless Kernel::from_cache_ (#7132) (by **PGZXB**)
   - [cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by **Ailing**)
   - [Lang] Raise errors when using the packed switch (#7125) (by **Yi Xu**)
   - [ci] Temporarily disable ad_external_array on Metal (#7136) (by **Bob Cao**)
   - [Error] Raise errors when using metal sparse (#7113) (by **Lin Jiang**)
   - [aot] AOT compat test in workflow (#7033) (by **damnkk**)
   - [Lang] Fix cannot use taichi in REPL (#7114) (by **Zhao Liang**)
   - [lang] Free ndarray memory when it's GC-ed in Python (#7072) (by **Ailing**)
   - [lang] Migrate TensorType expansion for FuncCallExpression from Python code to Frontend IR (#6980) (by **Zhanlue Yang**)
   - [amdgpu] Part2 add runtime (#6482) (by **Zeyu Li**)
   - [refactor] Remove dependencies on Program::this_thread_config() in codegen_cc.cpp (#7088) (by **PGZXB**)
   - [refactor] Remove dependencies on Program::this_thread_config() in gfx::run_codegen (#7089) (by **PGZXB**)
   - [Bug] Fix num_splits in parallel_struct_for (#7121) (by **Yi Xu**)
   - [Doc] Move glossary to top level (#7118) (by **Zhao Liang**)
   - [metal] Update Metal RHI impl & add support for shared arrays (#7107) (by **Bob Cao**)
   - [ci] Update amdgpu ci (#7117) (by **Zeyu Li**)
   - [refactor] Move Kernel::lower() outside the taichi::lang::Kernel (#7048) (by **PGZXB**)
   - [amdgpu] Part1 add codegen (#6469) (by **Zeyu Li**)
   - [Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by **Haidong Lan**)
   - [refactor] Remove Program::current_ast_builder() (#7075) (by **PGZXB**)
   - [aot] Switch Metal to SPIR-V codegen (#7093) (by **PENGUINLIONG**)
   - [Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by **Yi Xu**)
   - [doc] Modified some errors in the function examples (#7094) (by **welann**)
   - [ci] More Windows git hacks (#7102) (by **Proton**)
   - [Lang] Remove filename kwarg in aot Module save() (#7085) (by **Ailing**)
   - [aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by **PENGUINLIONG**)
   - [Lang] Remove sourceinspect deprecation warning message (#7081) (by **Zhao Liang**)
   - [example] Remove gui warning message (#7090) (by **Zhao Liang**)
   - [refactor] Remove unnecessary Kernel::arch (#7074) (by **PGZXB**)
   - [refactor] Remove unnecessary parameter of irpass::scalarize (#7087) (by **PGZXB**)
   - [Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by **Yi Xu**)
   - [lang] Migrate TensorType expansion for TextureOpExpression from Python code to Frontend IR (#6968) (by **Zhanlue Yang**)
   - [lang] Migrate TensorType expansion for ReturnStmt from Python code to Frontend IR (#6946) (by **Zhanlue Yang**)
   - [doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by **Haidong Lan**)
   - [amdgpu] Update amdgpu module call (#7022) (by **Zeyu Li**)
   - [amdgpu] Add convert addressspace pass related unit test (#7023) (by **Zeyu Li**)
   - [ir] Let real function return nested StructType (by **lin-hitonami**)
   - [ir] Replace FuncCallExpression with FrontendFuncCallStmt (by **lin-hitonami**)
   - [example] Update gallery images (#7053) (by **Zhao Liang**)
   - [Doc] Update type.md (#7038) (by **Zhao Liang**)
   - [misc] Bump version to v1.5.0 (#7077) (by **Lin Jiang**)
   - [rhi] Update Stream `new_command_list` API (#7073) (by **Bob Cao**)
   - [Doc] Fix docstring (#7065) (by **Zhao Liang**)
   - [ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by **Proton**)
   - [Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by **Yi Xu**)

v1.4.1

1 year ago

Highlights:

Full changelog:

[ci] Tolerate duplicates when registering version (#7281) (by Proton)
[misc] Fix manylinux2014 warning not printing (#7270) (by Proton)
[misc] Bump version to 1.4.1 (by Lin Jiang)
[misc] Update submodule taichi_assets (#7203) (by Lin Jiang)
[bug] Fix example circle-packing (#7194) (by Lin Jiang)

v1.4.0

1 year ago

Deprecation Notice

Support for sparse SNodes on the Metal backend has been removed.
ti.Matrix.rotation2d() has been removed.
The packed switch in ti.init() has been removed.
The dynamic_index switch in ti.init() is now deprecated and will be removed in v1.5.0. See the feature introduction below for details.
Slicing a single row/column of a matrix (e.g. a[x, a:b]) now returns a vector instead of a matrix.

New features

AOT

Taichi AOT is officially available in Taichi v1.4.0, along with a native Taichi Runtime (TiRT) library taichi_c_api. Native applications can now load compiled AOT modules and launch Taichi kernels without a Python interpreter.

In this release, TiRT has stabilized the Vulkan backend on desktop platforms and Android. You can find prebuilt TiRT binaries on the release page. You can refer to a comprehensive tutorial on the doc site; the detailed TiRT C-API documentation is available at https://docs.taichi-lang.org/docs/taichi_core.

Ndarray

Taichi ndarray is now formally released in v1.4.0. The ndarray is an array object that holds contiguous multi-dimensional data to allow easy exchange with external libraries. See documentation for more details.

Dynamic index

Before v1.4.0, when you wanted to access a vector/matrix with a runtime variable instead of a compile-time constant, you had to set ti.init(dynamic_index=True). However, that option only works for LLVM-based backends (CPU & CUDA) and may slow down runtime performance because all matrices are affected. Starting from v1.4.0, that option is no longer needed. You can use variable indices whenever necessary on all backends without affecting the performance of those matrices with only constant indices.

Improvements

Performance

Compilation is now roughly 2x faster.

Example list & ti gallery

Since v1.0.0, we have been enriching our Taichi example collection, bringing the number of demos in the gallery window from eight to twelve. Run ti gallery to check out some new demos!

Bug fixes

Incorrect behavior of struct fors on sparse SNodes in certain cases has been fixed. (#7121)
CUDA will no longer allocate extra device memory when performing to_numpy() and from_numpy(). (#7008)
StructType is now allowed as a type hint to ti.func. (#6964)
Incorrect recompilation caused by filling in a matrix field with the same matrix has been fixed. (#6951)
Matrix type inference has been fixed. (#6928)
Getting 64-bit data from ndarrays in the Python scope is now handled correctly. (#6836)
Name collision problem in ti.dataclass has been fixed. (#6737)

Highlights:

Aot module
- Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
Bug fixes
- Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
- Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
- Fix getting 64-bit data from ndarray in Python scope (#6836) (by Yi Xu)
- Avoid overwriting global tmp with dynamic_index=True (#6820) (by Yi Xu)
Build system
- Deprecate export_core (#7028) (by Zhanlue Yang)
Command line interface
- Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
Documentation
- Update tutorial.md (#7176) (by Zhao Liang)
- Update math_module.md (#7175) (by Zhao Liang)
- Update debugging.md (#7173) (by Zhao Liang)
- Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
- Update doc regarding dynamic index (#7148) (by Yi Xu)
- Move glossary to top level (#7118) (by Zhao Liang)
- Update type.md (#7038) (by Zhao Liang)
- Fix docstring (#7065) (by Zhao Liang)
- Remove packed mode in doc (#7030) (by Zhao Liang)
- Minor doc update (#6952) (by Zhao Liang)
- Glossary (#6101) (by Olinaaaloompa)
- Update dac (#6875) (by Gabriel Vainer)
- Update faq.md (#6921) (by Zhao Liang)
- Update dataclass.md (#6876) (by Gabriel Vainer)
- Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
- Stop mentioning packed mode (#6755) (by Yi Xu)
Error messages
- Raise errors when using metal sparse (#7113) (by Lin Jiang)
- Do not show warning when the offline cache path does not exist (#7005) (by PGZXB)
GUI
- Support colored texts (#7036) (by Dunfan Lu)
Intermediate representation
- Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
Language and syntax
- Raise errors when using the packed switch (#7125) (by Yi Xu)
- Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
- Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
- Remove filename kwarg in aot Module save() (#7085) (by Ailing)
- Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
- Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
- Deprecate the dynamic_index switch (#7071) (by Yi Xu)
- Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
- Fix gui docstring (#7003) (by Zhao Liang)
- Support dynamic indexing in spirv (#6990) (by Yi Xu)
- Support dynamic indexing in metal (#6985) (by Yi Xu)
- Support LU sparse solver on CUDA backend (#6967) (by pengyu)
- Fix struct type problem (#6949) (by Zhao Liang)
- Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
- Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
- Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
- Remove the real_matrix switch (#6885) (by Yi Xu)
- Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
- Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
- Add bool type in python as an alias to i32 (#6742) (by daylily)
- Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
Metal backend
- Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
Miscellaneous
- Strictly check ndim with external array (#7126) (by Haidong Lan)
- Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by Zhanlue Yang)

Full changelog:

[Doc] Update tutorial.md (#7176) (by Zhao Liang)
[aot] (cherry-pick) Removed unused archs in C-API (#7167), FindTaichi CMake module to help outside project integration (#7168) (#7177) (by PENGUINLIONG)
[docs] Create windows_debug.md (#7164) (by Bob Cao)
[Doc] Update math_module.md (#7175) (by Zhao Liang)
[Doc] Update debugging.md (#7173) (by Zhao Liang)
[Doc] Fix C++ tutorial does not display on doc site (#7174) (by Zhao Liang)
[doc] Fix spelling of "paticle_field" (#7024) (by Xiang (Kevin) Li)
[doc] Update accelerate_python.md to use ti.max (#7161) (by Tao Jin)
[aot] Fixed ti_get_last_error signature (#7165) (by PENGUINLIONG)
[example] Update quaternion arithmetics in fractal_3d_ggui (#7139) (by Zhao Liang)
[doc] Add doc ndarray (#7157) (by Olinaaaloompa)
[doc] Update field.md (Fields advanced) (#6867) (by Gabriel Vainer)
[ci] Use make_changelog.py to generate the full changelog (#7152) (by Lin Jiang)
[aot] Introduce new AOT deployment tutorial (#7144) (by PENGUINLIONG)
[Doc] Update doc regarding dynamic index (#7148) (by Yi Xu)
[Misc] Strictly check ndim with external array (#7126) (by Haidong Lan)
[ci] Run test when pushing to rc branches (#7146) (by Lin Jiang)
[ci] Disable backward_cpp on macOS (#7145) (by Proton)
[gui] Fix scene line renderable (#7131) (by Bob Cao)
[Lang] Raise errors when using the packed switch (#7125) (by Yi Xu)
[cpu] Reuse VirtualMemoryAllocator for CPU ndarray memory allocation (#7128) (by Ailing)
[ci] Temporarily disable ad_external_array on Metal (#7136) (by Bob Cao)
[Error] Raise errors when using metal sparse (#7113) (by Lin Jiang)
[misc] Cherry-pick #7072 into rc-v1.4.0 (#7135) (by Ailing)
[aot] Rename device capability atomic_i64 to atomic_int64 for consistency (#7095) (by PENGUINLIONG)
[Lang] Fix cannot use taichi in REPL (#7114) (by Zhao Liang)
[Bug] Fix num_splits in parallel_struct_for (#7121) (by Yi Xu)
[Doc] Move glossary to top level (#7118) (by Zhao Liang)
[Aot] Deprecate element shape and field dim for AOT symbolic args (#7100) (by Haidong Lan)
[Lang] Remove deprecated ti.Matrix.rotation2d() (#7098) (by Yi Xu)
[doc] Modified some errors in the function examples (#7094) (by welann)
[ci] More Windows git hacks (#7102) (by Proton)
[Lang] Remove filename kwarg in aot Module save() (#7085) (by Ailing)
[Lang] Remove sourceinspect deprecation warning message (#7081) (by Zhao Liang)
[example] Remove gui warning message (#7090) (by Zhao Liang)
[Bug] Fix ret_type and cast_type of UnaryOpStmt in Scalarize (#7082) (by Yi Xu)
[doc] Update ndarray deprecation warning to 1.5.0 (#7083) (by Haidong Lan)
[example] Update gallery images (#7053) (by Zhao Liang)
[Doc] Update type.md (#7038) (by Zhao Liang)
[Doc] Fix docstring (#7065) (by Zhao Liang)
[Lang] Make slicing a single row/column of a matrix return a vector (#7068) (by Yi Xu)
[ci] Workaround windows checkout 'Needed a single revision' issue (#7078) (by Proton)
[lang] Make sure ndarrays created in python frontend are initialized as zero (#7060) (by Ailing)
[Lang] Deprecate the dynamic_index switch (#7071) (by Yi Xu)
[misc] Update python package metadata (#7063) (by Proton)
[bug] Fixed compilation error caused by #7047 (#7069) (by PGZXB)
[opt] Automatically identify allocas to scalarize (#7055) (by Yi Xu)
[refactor] Remove ir parameter of KernelCodeGen::KernelCodeGen(Kernel *kernel, IRNode *ir) (#7046) (by PGZXB)
[refactor] Remove unnecessary IRNode::kernel (#7047) (by PGZXB)
[refactor] Remove dependencies on Program::current_ast_builder() in C++ side (#7044) (by PGZXB)
[ci] Version sanity check before publishing (#7062) (by Proton)
[ci] Make changelog generation working again (#7058) (by Proton)
[rhi] Update CommandList dispatch API (#7052) (by Bob Cao)
[aot] C-API versioning (#7050) (by PENGUINLIONG)
[refactor] Remove offloaded parameter of Program::compile() (#7045) (by PGZXB)
[lang] Migrate TensorType expansion for subscription indices from Python to Frontend IR (#6942) (by Zhanlue Yang)
[opt] Add ExtractPointers pass for dynamic index (#7051) (by Yi Xu)
[Lang] Add irpass::eliminate_immutable_local_vars() test cases for TensorType (#7043) (by Zhanlue Yang)
[Lang] Fix gui docstring (#7003) (by Zhao Liang)
[rhi] Update compute CommandList APIs (except dispatch) (#7037) (by Bob Cao)
[ir] Let GetElementExpression&Statement support index list (#7049) (by Lin Jiang)
[aot] C-API opengl runtime interop (#7042) (by PENGUINLIONG)
[ci] Pin pre-commit python version to 3.10 (#7041) (by Proton)
[opengl] Enable more gles tests in CI (#7031) (by Ailing)
[ci] Tuning headless demo VRAM usage (#7039) (by Proton)
[Build] Deprecate export_core (#7028) (by Zhanlue Yang)
[GUI] Support colored texts (#7036) (by Dunfan Lu)
[aot] Revert "C-API opengl runtime interop (#7014)" (#7032) (by Proton)
[ci] Update pre-commit app versions (#7025) (by Proton)
[Doc] Remove packed mode in doc (#7030) (by Zhao Liang)
Revert "[opengl] Enable more gles tests in CI" (#7029) (by Ailing)
[build] Remove libexport_core.so dependency for Android App CI (#6997) (by Zhanlue Yang)
[opengl] Enable more gles tests in CI (#7010) (by Ailing)
[aot] C-API opengl runtime interop (#7014) (by damnkk)
[misc] Add macro to control amdgpu-related header file (#7021) (by Zeyu Li)
[bug] Fix device memory allocation for numpy array on CUDA backend (#7008) (by Zhanlue Yang)
[ci] Try enabling MSVC and check build times (#6905) (by Bob Cao)
[gfx] Update Device API: Splitting ResourceBinder into seperate Shade… (#7020) (by Proton)
[gfx] Revert "Update Device API: Splitting ResourceBinder into sepera… (#7019) (by Proton)
[amdgpu] Update amdgpu device to new API (#7018) (by Bob Cao)
[perf] Fix fill ndarray size problem. (#6992) (by Haidong Lan)
[cuda] Fix LLVM15 rsqrt perf regression (#7012) (by Haidong Lan)
[gfx] Update Device API: Splitting ResourceBinder into separate ShaderResourceSet & RasterResources (#6954) (by Bob Cao)
[opt] Add ImmediateIRModifier to provide amortized constant-time replace_usages_with() (#7001) (by Yi Xu)
[amdgpu] Part0 add render hardware interface (#6464) (by Zeyu Li)
[Error] Do not show warning when the offline cache path does not exist (#7005) (by PGZXB)
[Lang] [spirv] Support dynamic indexing in spirv (#6990) (by Yi Xu)
[misc] Remove unnecessary CompileConfig::lazy_compilation (#7009) (by PGZXB)
[ci] Add C++ tests on AMDGPU RHI (#6597) (by Zeyu Li)
[ci] Update taichi-release-tests branch (disable QuanTaichi GOL) (#7011) (by Proton)
[amdgpu] Part3 update runtime module (#6486) (by Zeyu Li)
[opengl] Fix tests running both on opengl and vulkan (#7006) (by Ailing)
[ir] Record the return types to a StructType (#6995) (by Lin Jiang)
[lang] Get the CHI-IR struct type in python (#6994) (by Lin Jiang)
[ir] Change type maps to unordered maps and add mutexes (#7000) (by Lin Jiang)
[ir] Add struct type to CHI-IR (#6982) (by Lin Jiang)
[misc] Add repography activity stats (#6991) (by Proton)
[aot] Enable validation layers for C-API tests (#6893) (by Zhanlue Yang)
[opengl] Add ti.gles arch and enable tests (#6988) (by Ailing)
[Lang] [metal] Support dynamic indexing in metal (#6985) (by Yi Xu)
[opengl] Reset opengl context when taichi program resets (#6987) (by Ailing)
[Lang] Support LU sparse solver on CUDA backend (#6967) (by pengyu)
[misc] Keeping up with new python-wheel implementation (#6986) (by Proton)
[aot] Recover AOT CI script (#6970) (by PENGUINLIONG)
[lang] Migrate TensorType expansion for svd from Python code to Frontend IR (#6972) (by Zhanlue Yang)
[misc] Adding XCode project support (#6976) (by Bob Cao)
[bug] Fix taichi_ngp starting from ti example (#6973) (by Ailing)
[ci] Revert "Fix missing c_api.so in linux nightly" (#6974) (by Ailing)
[ci] Build: auto install vulkan on Linux (#6969) (by Proton)
[ci] Auto setup miniforge3 env when build (#6966) (by Proton)
[Lang] Fix struct type problem (#6949) (by Zhao Liang)
[aot] C-API breaking changes! (#6955) (by PENGUINLIONG)
[lang] Fix scalarization for PrintStmt (#6945) (by Zhanlue Yang)
[bug] Allow StructType as type hint to ti.func (#6964) (by Yi Xu)
[refactor] Remove legacy code for dynamic index (#6961) (by Yi Xu)
[aot] Fix rwtexture with template_args (#6960) (by Ailing)
[ci] Fix missing c_api.so in linux nightly (#6962) (by Ailing)
[lang] Migrate TensorType expansion for SNode indices from Python to Frontend IR (#6934) (by Zhanlue Yang)
[doc] New FAQ added (#6963) (by Olinaaaloompa)
[ci] Sync CI cache script & workflow (#6959) (by Proton)
[ci] Update release test branch, reduce running time (#6944) (by Proton)
[ci] Remove redundant tests (#6947) (by Proton)
[bug] Fix recompilation of filling a matrix field with the same matrix (#6951) (by Yi Xu)
[aot] Fixed C-API behavior tests (#6939) (by PENGUINLIONG)
[refactor] Remove _PyScopeMatrixImpl (#6943) (by Yi Xu)
[aot] Fix validation warning: OpImageFetch should operate on OpImage instead of OpSampledImage (#6925) (by Zhanlue Yang)
[CLI] Add "ti cache clean" command to clean the offline cache files manually (#6937) (by PGZXB)
[Doc] Minor doc update (#6952) (by Zhao Liang)
[ci] Fix forgotten build script paths (#6941) (by Proton)
[opt] Add pass eliminate_immutable_local_vars (#6926) (by Yi Xu)
[ci] Fix pre-commit errors (#6940) (by Proton)
[doc] Editorial updates (#6935) (by Olinaaaloompa)
[ci] Workflow Rewrite: Building on Linux (#6848) (by Proton)
[refactor] Remove _IntermediateMatrix and _MatrixFieldElement (#6932) (by Yi Xu)
[aot] C_API behavior test (#6904) (by damnkk)
[lang] Fix matrix type inference and remove _MatrixEntriesInitializer (#6928) (by Yi Xu)
[lang] Reorder sparse matrix before solving (#6886) (by pengyu)
[Doc] Glossary (#6101) (by Olinaaaloompa)
[aot] Refactor C-API error tests (#6890) (by Zhanlue Yang)
[doc] Update layout.md (Fields) (#6868) (by Gabriel Vainer)
[Doc] Update dac (#6875) (by Gabriel Vainer)
[lang] Support 'len' with Matrix-typed operands (#6923) (by Zhanlue Yang)
[doc] Update sparse.md (#6908) (by Gabriel Vainer)
[doc] Update performance.md (#6911) (by Gabriel Vainer)
[doc] Update debugging.md (#6909) (by Gabriel Vainer)
[doc] Update profiler.md (#6910) (by Gabriel Vainer)
[bug] Add GetElementExpression to offline cache key (#6918) (by PGZXB)
[ci] Reenable AMDGPU CI, disable OpenGL tests in AMDGPU task (#6887) (by Proton)
[lang] Fix accidental changes during matrix refactor (#6914) (by Yi Xu)
[example] Add circle-packing example (#6870) (by Zhao Liang)
[Doc] Update faq.md (#6921) (by Zhao Liang)
[misc] Show suggestion when locking metadata.lock fails (#6919) (by PGZXB)
[doc] New FAQs (#6055) (by Olinaaaloompa)
[example] Add Poisson disk sampling example (#6852) (by Zhao Liang)
[vulkan] Improve Vulkan RHI impl with lower overhead internal implementations (#6912) (by Bob Cao)
[doc] Link to LLVM 15 built for Visual Studio 2022 (#6916) (by PENGUINLIONG)
[lang] Fix issue of IfExpr with TensorTyped operands (#6897) (by Zhanlue Yang)
[doc] Update hello_world.md (#6889) (by Gabriel Vainer)
[IR] Allow a maximum of 12 SNode indices (#6901) (by Dunfan Lu)
[doc] Update odop.md (#6874) (by Gabriel Vainer)
[doc] Update external.md (#6869) (by Gabriel Vainer)
[Doc] Update dataclass.md (#6876) (by Gabriel Vainer)
[doc] Update cloth_simulation.md (#6898) (by Vissidarte-Herman)
[example] Update marching squares example (#6851) (by Zhao Liang)
[Lang] Add warning message when converting dynamic snode to numpy (#6853) (by Zhao Liang)
[Lang] Deprecate sourceinspect dependency (#6894) (by Zhao Liang)
[aot] Added C-API behavior tests (#6871) (by damnkk)
[aot] Gather satellite repo URLs (#6860) (by PENGUINLIONG)
[refactor] Remove _TiScopeMatrixImpl (#6892) (by Yi Xu)
[ci] Python test minor fixes (#6891) (by Proton)
[ir] Add ir_traits namespace to use less dynamic casts & Run CFG only ever once (#6812) (by Bob Cao)
[Lang] Warn users if ndarray size is out of int32 boundary (#6846) (by Yi Xu)
[build] Enable strip for libtaichi_c_api.so with Release Build (#6845) (by Zhanlue Yang)
[Lang] Remove the real_matrix switch (#6885) (by Yi Xu)
[build] Turn on function level linking for taichi_c_api (#6840) (by Zhanlue Yang)
[test] Remove tests with real_matrix=True and real_matrix_scalarize=True (#6873) (by Yi Xu)
[misc] Revert back to master after #6843 merged (#6883) (by Bob Cao)
[vulkan] Cleanup spdlog related logging from Vulkan RHI (#6843) (by Bob Cao)
[ci] Temporarily disable AMDGPU CI (#6872) (by Proton)
[Lang] Enable real_matrix and real_matrix_scalarize by default (#6801) (by Zhanlue Yang)
[bug] MatrixType bug fix: Fix error with static-grouped-ndrange (#6839) (by Zhanlue Yang)
[example] Fix jacobian example (#6849) (by Mingrui Zhang)
[bug] Fix flaky mass_spring_game_ggui.py on Mac M1 by setting up default values for VulkanCapabilities (#6850) (by Zhanlue Yang)
[example] Solve implicit fem using sparse solver (#6827) (by pengyu)
[build] Migrate cmake targets from OBJECT to STATIC for libtaichi_c_api.so (#6831) (by Zhanlue Yang)
[Bug] Fix getting 64-bit data from ndarray in Python scope (#6836) (by Yi Xu)
[test] Avoid constant folding in overflow tests (#6835) (by Ailing)
[aot] Added C-API behavior test (#6837) (by damnkk)
[bug] Matrix refactor bug fix: Fix cross scope matrix operations (#6822) (by Zhanlue Yang)
[build] Refactored and removed RuntimeCUDA and RuntimeCUDAInjector (#6830) (by Zhanlue Yang)
[bug] Matrix refactor bug fix: Fix logical binary operations with TensorTyped operands (#6817) (by Zhanlue Yang)
[example] Add order-independent transparency example (#6829) (by Lin Jiang)
[opt] Re-enable constant folding when debug=True (#6824) (by Ailing)
[Bug] Avoid overwriting global tmp with dynamic_index=True (#6820) (by Yi Xu)
[bug] Matrix refactor bug fix: Fix restrictions on BinaryOp/TernaryOp operands' broadcasting (#6805) (by Zhanlue Yang)
[aot] C-API Device capability improvements (#6773) (by PENGUINLIONG)
[misc] Headers dependency cleanup from RHI (#6699) (by Bob Cao)
[ci] Revert "Temporarily disable desktop headless tests (#6811)" (#6816) (by Proton)
[misc] Bump version to v1.4.0 (#6804) (by PENGUINLIONG)
[ci] Add AMDGPU relected ci (#6743) (by Zeyu Li)
[test] Remove unnecessary duplicated python runtime test runs (#6808) (by Ailing)
[Lang] Raise an error for the semantic change of transpose() (#6813) (by Yi Xu)
[refactor] Remove unnecessary checks in program (#6802) (by Ailing)
[vulkan] Support texture type args in aot add_kernel (#6796) (by Ailing)
[ci] Temporarily disable desktop headless tests (#6811) (by Proton)
[bug] Fix name collision in ti.dataclass (#6737) (by Yi Xu)
[bug] MatrixType bug fix: Add additional restrictions for unpacking a Matrix (#6795) (by Zhanlue Yang)
[doc] Update docstring for grad replaced (#6800) (by Mingrui Zhang)
[build] Add MSBuild option to setup.py (#6724) (by Bob Cao)
[Lang] [type] Add bool type in python as an alias to i32 (#6742) (by daylily)
[lang] Use less gpu memory when building sparse matrix (#6781) (by pengyu)
[example] Add cuda options for sparse matrix examples (#6785) (by pengyu)
[misc] Remove usage of deprecated num_channels/channel_format type hint in rw_texture in codebase (#6791) (by Ailing)
[bug] MatrixType bug fix: Fix error with BLS (#6664) (by Zhanlue Yang)
[vulkan] Support rw_texture in aot add_kernel (#6789) (by Ailing)
[bug] MatrixType bug fix: Fix error with quant (#6776) (by Yi Xu)
[bug] MatrixType bug fix: Fix test_ad_gdar_diffmpm (#6786) (by Yi Xu)
[vulkan] Deprecate num_channels and channel_format args in rw_texture type annotation (#6782) (by Ailing)
[misc] Remove the default potential_bug label on bug report issues (#6784) (by Ailing)
[bug] MatrixType bug fix: Fix error with texture (#6775) (by Yi Xu)
[vulkan] Make sure kernel recompiles when texture dtype changes (#6774) (by Ailing)
[aot] Clean up exported symbols for libtaichi_c_api.so (#6140) (by Zhanlue Yang)
[Misc] Refactored flattend_values() to avoid potential conflicts in flattened statements (#6749) (by Zhanlue Yang)
[aot] Warn the user about out-of-range access in C++ wrapper (#6492) (by PENGUINLIONG)
[build] Initial distributed compiling support (#6762) (by Proton)
[aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
[aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
[aot] C-API to get available archs (#6766) (by PENGUINLIONG)
[doc] Update sparse matrix document (#6719) (by pengyu)
[autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
[bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
[Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
[doc] Update dev install about clang version (#6759) (by Ailing)
[build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
[Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
[lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
[aot] JSON serde (#6754) (by PENGUINLIONG)
[bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
[Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
[lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
[llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
[Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
[build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
[aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
[ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
[doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
[misc] Fix warnings of taichi examples (#6740) (by PGZXB)
[example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
[build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)

v1.3.0

1 year ago

Deprecation Notice

Using sparse data structures on the Metal backend is now deprecated. The support for Dynamic SNode has been removed in v1.3.0, and the support for Pointer/Bitmasked SNode will be removed in v1.4.0.
The packed switch in ti.init() is now deprecated and will be removed in v1.4.0. See the feature introduction below for details.
ti.Matrix.rotation2d() is now deprecated and will be removed in v1.4.0. Use ti.math.rotation2d() instead.
To clearly distinguish vectors from matrices, transpose() on a vector is no longer allowed. If you want something like a @ b.transpose(), write a.outer_product(b) instead.
Ndarray: The element_dim, element_shape, and field_dim arguments of the ndarray type annotation will be deprecated in v1.4.0. field_dim is renamed to ndim to make it more intuitive. element_dim and element_shape are replaced by passing a matrix type as the dtype argument. For example, ti.types.ndarray(element_dim=2, element_shape=(3,3)) becomes ti.types.ndarray(dtype=ti.math.mat3).
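For the transpose() item in the list above, it may help to see what an outer product computes. The following is a plain-Python sketch, not Taichi code: the helper name outer_product and the list-of-rows representation are illustrative assumptions. It builds the same matrix that a @ b.transpose() would produce for two column vectors.

```python
def outer_product(a, b):
    """Return the matrix M with M[i][j] = a[i] * b[j], as a list of rows."""
    return [[ai * bj for bj in b] for ai in a]


a = [1, 2, 3]
b = [4, 5]
m = outer_product(a, b)
print(m)  # [[4, 5], [8, 10], [12, 15]]
```

The result has len(a) rows and len(b) columns, which is why no explicit transpose of b is needed.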

New features

Dynamic SNode

To support variable-length fields, Taichi provides dynamic SNodes. You can now use the dynamic SNode on fields of different data types, even struct fields and matrix fields. You can use x[i].append(...) to append an element, use x[i].length() to get the length, and use x[i].deactivate() to clear the list as shown in the following code snippet.

pair = ti.types.struct(a=ti.i16, b=ti.i64)
pair_field = pair.field()

block = ti.root.dense(ti.i, 4)
pixel = block.dynamic(ti.j, 100, chunk_size=4)
pixel.place(pair_field)
l = ti.field(ti.i32)
ti.root.dense(ti.i, 5).place(l)

@ti.kernel
def dynamic_pair():
    for i in range(4):
        pair_field[i].deactivate()
        for j in range(i * i):
            pair_field[i].append(pair(i, j + 1))
        # pair_field = [[],
        #              [(1, 1)],
        #              [(2, 1), (2, 2), (2, 3), (2, 4)],
        #              [(3, 1), (3, 2), ... , (3, 8), (3, 9)]]
        l[i] = pair_field[i].length()  # l = [0, 1, 4, 9]
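
To cross-check the values in the comments above, the same logic can be modelled with ordinary Python lists. This is only a reference sketch: it does not use Taichi, and the function name expected_pair_field is hypothetical.

```python
def expected_pair_field(n=4):
    """Model each dynamic row as a Python list of (a, b) tuples."""
    rows = []
    for i in range(n):
        row = []  # corresponds to pair_field[i].deactivate(), which clears the list
        for j in range(i * i):
            row.append((i, j + 1))  # corresponds to pair_field[i].append(pair(i, j + 1))
        rows.append(row)
    return rows


rows = expected_pair_field()
lengths = [len(row) for row in rows]  # corresponds to pair_field[i].length()
print(lengths)  # [0, 1, 4, 9]
```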

Packed Mode

Packed mode was introduced in v0.8.0 to allow users to trade runtime performance for memory usage. In v1.3.0, after the elimination of runtime overhead in common cases, packed mode has become the default mode. There's no longer any automatic padding behavior behind the scenes, so users can use fields and SNodes without surprise.
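
A rough sketch of what the old padding could cost in memory, assuming the non-packed layout rounded each axis up to the next power of two. That rounding rule is an assumption taken from older Taichi documentation, not something stated in these notes, and the helper names are illustrative.

```python
from math import prod


def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_shape(shape):
    """Shape after rounding every axis up to a power of two (assumed legacy behavior)."""
    return tuple(next_pow2(n) for n in shape)


shape = (18, 65)
padded = padded_shape(shape)
print(padded)  # (32, 128)
print(prod(padded), "cells allocated for", prod(shape), "cells requested")
```

With packed mode as the default, the allocated shape is simply the requested shape.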

Sparse Matrix

We introduce the experimental sparse matrix and sparse solver on the CUDA backend. The API is the same as on the CPU backend. Currently, only the f32 data type and the LLT linear solver are supported on CUDA, and only ti.ndarray can be used for SpMV and linear solver operations. Support for the f64 data type and other linear solvers is in progress.

Improvements

Python Frontend

Matrix slicing now supports augmented assign (e.g. +=) besides assign.

Taichi Examples

Our user https://github.com/Linyou contributed an excellent example of an instant NGP renderer (PR #6673). Run taichi_ngp to check it out!

[Developers only] LLVM15 upgrade

Starting from v1.3.0, Taichi has upgraded its LLVM dependency to version 15.0.0. If you're interested in contributing or simply building Taichi from source, please follow our installation doc for developers. Note this change has no impact on Taichi users.

Highlights

Documentation
- Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
- Stop mentioning packed mode (#6755) (by Yi Xu)
Language and syntax
- Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
Metal backend
- Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)

Full changelog

[aot] Revert C-API Device capability improvements (#6772) (by PENGUINLIONG)
[aot] C-API Device capability improvements (#6702) (by PENGUINLIONG)
[aot] C-API to get available archs (#6766) (by PENGUINLIONG)
[doc] Update sparse matrix document (#6719) (by pengyu)
[autodiff] Separate non-linear operators to an individual class (#6700) (by Mingrui Zhang)
[bug] Fix dereferencing nullptr (#6763) (by Yi Xu)
[Doc] Update the documentation about Dynamic SNode (#6752) (by Lin Jiang)
[doc] Update dev install about clang version (#6759) (by Ailing)
[build] Improve TI_WITH_CUDA guards for CUDA related test cases (#6698) (by Zhanlue Yang)
[Lang] Add deprecation warning for the removal of the packed switch (#6753) (by Yi Xu)
[lang] Improve sparse matrix building on GPU (#6748) (by pengyu)
[aot] JSON serde (#6754) (by PENGUINLIONG)
[bug] MatrixType bug fix: Fix error with to_numpy() and from_numpy() (#6726) (by Zhanlue Yang)
[Doc] Stop mentioning packed mode (#6755) (by Yi Xu)
[lang] Get the length of dynamic SNode by x.length() (#6750) (by Lin Jiang)
[llvm] Support nested struct with matrix return value on real function (#6734) (by Lin Jiang)
[Metal] [error] Raise deprecate warning and error when using sparse snodes on metal (#6739) (by Lin Jiang)
[build] Integrate backward_cpp to test targets for enabling C++ stack trace (#6697) (by Zhanlue Yang)
[aot] Load AOT module from memory (#6692) (#6714) (by PENGUINLIONG)
[ci] Add dockerfile.ubuntu-18.04.amdgpu (#6736) (by Zeyu Li)
[doc] Update LLVM10 -> LLVM15 in installation guide (#6747) (by Zhanlue Yang)
[misc] Fix warnings of taichi examples (#6740) (by PGZXB)
[example] Ti-example: instant ngp renderer (#6673) (by Youtian Lin)
[build] Use a separate prebuilt llvm15 binary for manylinux environment (#6732) (by Ailing)

v1.2.2

1 year ago

Molten-vk version is downgraded to v1.1.10 to fix a few GGUI issues.

Full changelog:

[build] Downgrade molten-vk version to v1.1.10 (#6564) (by Zhanlue Yang)

v1.2.1

1 year ago

This is a bug fix release for v1.2.0.

Full changelog:

[mesh] Fix MeshTaichi warnings in CUDA backend (#6369) (by Chang Yu)
[Bug] Fix cache_loop_invariant_global_vars pass (#6462) (by Lin Jiang)

v1.2.0

1 year ago

Starting from the v1.2.0 release, Taichi follows semantic versioning: regular releases cut from the master branch bump the MINOR version, and the PATCH version is bumped only when cherry-picking critical bug fixes.
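
As a small illustration of this policy, the sketch below computes the next version string for each kind of release. The function name bump and the "minor"/"patch" labels are illustrative assumptions, not part of any Taichi tooling.

```python
def bump(version, kind):
    """Return the next version for a regular ("minor") or bug-fix ("patch") release."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if kind == "minor":   # regular release cut from master
        return f"v{major}.{minor + 1}.0"
    if kind == "patch":   # cherry-picked critical bug fixes
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release kind: {kind}")


print(bump("v1.2.0", "patch"))  # v1.2.1
print(bump("v1.2.0", "minor"))  # v1.3.0
```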

Deprecation Notice

Indexing multi-dimensional ti.ndrange() with a single loop index will be disallowed in future releases.

Highlights

New features

Offline Cache

We introduced the offline cache on CPU and CUDA backends in v1.1.0. In this release, we support this feature on other backends, including Vulkan, OpenGL, and Metal.

If your code behaves abnormally, disable offline cache by setting the environment variable TI_OFFLINE_CACHE=0 or offline_cache=False in the ti.init() method call and file an issue with us on Taichi's GitHub repo.
See Offline cache for more information.
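
A minimal sketch of the two ways to switch the cache off. It assumes the environment variable has to be set before ti.init() runs, which is an assumption rather than something these notes state. The Taichi lines are left as comments so the snippet runs even where Taichi is not installed.

```python
import os

# Option 1: disable the offline cache through the environment.
# Assumed to take effect only if set before ti.init() is called.
os.environ["TI_OFFLINE_CACHE"] = "0"

# Option 2: disable it in code (requires Taichi to be installed):
# import taichi as ti
# ti.init(offline_cache=False)

print(os.environ["TI_OFFLINE_CACHE"])  # 0
```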

GDAR (Global Data Access Rule)

A checker is provided for detecting potential violations of global data access rules.

The checker only works in debug mode. To enable it, set debug=True when calling ti.init().
Set validation=True when using ti.ad.Tape() to validate the kernels captured by ti.ad.Tape(). If a violation occurs, the checker pinpoints the line of code breaking the rules.

For example:

import taichi as ti
ti.init(debug=True)

N = 5
x = ti.field(dtype=ti.f32, shape=N, needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
b = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def func_1():
    for i in range(N):
        loss[None] += x[i] * b[None]

@ti.kernel
def func_2():
    b[None] += 100

b[None] = 10
with ti.ad.Tape(loss, validation=True):
    func_1()
    func_2()

"""
taichi.lang.exception.TaichiAssertionError:
(kernel=func_2_c78_0) Breaks the global data access rule. Snode S10 is overwritten unexpectedly.
File "across_kernel.py", line 16, in func_2:
    b[None] += 100
    ^^^^^^^^^^^^^^
"""

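To see why overwriting b here matters, the plain-Python sketch below mimics the example without Taichi. It assumes the reverse pass re-reads b from the field instead of using a saved copy, which is an illustrative simplification of how the gradient goes wrong. All variable names are hypothetical.

```python
N = 5
x = [1.0] * N

b_at_forward = 10.0          # value of b that func_1 reads
b = b_at_forward
loss = sum(xi * b for xi in x)

b += 100                     # func_2 overwrites b after func_1 has used it

# d(loss)/d(x[i]) equals the value of b that the forward pass multiplied by.
correct_grad = [b_at_forward] * N
# A reverse pass that re-reads the field sees the overwritten value instead.
stale_grad = [b] * N

print(correct_grad[0], stale_grad[0])  # 10.0 110.0
```

The checker reports this kind of overwrite rather than letting an incorrect gradient pass silently.
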
Improvements

Performance

Improved Vulkan performance with loops (#6072) (by Lin Jiang)

Python Frontend

PrefixSumExecutor is added to improve the performance of prefix-sum operations. The legacy prefix-sum function allocates auxiliary GPU buffers on every call, which causes a noticeable performance problem. The new PrefixSumExecutor avoids these repeated allocations: for arrays of the same length, it needs to be initialized only once and can then run any number of prefix-sum operations without redundant field allocations. The prefix-sum operation is currently supported only on the CUDA backend. (#6132) (by Yu Zhang)

Usage:
```
import taichi as ti

ti.init(arch=ti.cuda)  # prefix sum is currently CUDA-only

dtype = ti.i32  # element type used in this example
N = 100
arr0 = ti.field(dtype, N)
arr1 = ti.field(dtype, N)
arr2 = ti.field(dtype, N)
arr3 = ti.field(dtype, N)
arr4 = ti.field(dtype, N)

# initialize arr0, arr1, arr2, arr3, arr4, ...
# ...

# Perform an inclusive, in-place parallel prefix sum.
# Only one executor is needed for a given array length.
executor = ti.algorithms.PrefixSumExecutor(N)
executor.run(arr0)
executor.run(arr1)
executor.run(arr2)
executor.run(arr3)
executor.run(arr4)
```
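
For reference, an inclusive prefix sum replaces each element with the sum of itself and all elements before it. The CPU-only sketch below uses the standard library rather than Taichi and can serve as a baseline when checking executor results. The helper name is illustrative, and unlike the executor it returns a new list instead of working in place.

```python
from itertools import accumulate


def inclusive_prefix_sum(values):
    """Return a list whose k-th entry is values[0] + ... + values[k]."""
    return list(accumulate(values))


arr = [1, 2, 3, 4]
result = inclusive_prefix_sum(arr)
print(result)  # [1, 3, 6, 10]
```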
Runtime integer overflow detection on addition, subtraction, multiplication and shift left operators on Vulkan, CPU and CUDA backends is now available when debug mode is on. To use overflow detection on Vulkan backend, you need to enable printing, and the overflow detection of 64-bit multiplication on Vulkan backend requires NVIDIA driver 510 or higher. (#6178) (#6279) (by Lin Jiang)

For the following program:
```
import taichi as ti

ti.init(debug=True)

@ti.kernel
def add(a: ti.u64, b: ti.u64) -> ti.u64:
    return a + b

add(2 ** 63, 2 ** 63)
```
The following warning is printed at runtime:
```
Addition overflow detected in File "/home/lin/test/overflow.py", line 7, in add:
    return a + b
           ^^^^^
```
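
The arithmetic behind this warning can be reproduced in plain Python, whose integers do not overflow. The sketch below assumes unsigned 64-bit addition wraps modulo 2**64. The helper add_u64 is illustrative and is not a Taichi API.

```python
U64_MAX = 2**64 - 1


def add_u64(a, b):
    """Return (wrapped result, overflow flag) for unsigned 64-bit addition."""
    exact = a + b
    return exact & U64_MAX, exact > U64_MAX


wrapped, overflowed = add_u64(2**63, 2**63)
print(wrapped, overflowed)  # 0 True
```

The exact sum 2**64 does not fit in 64 bits, so the stored result wraps to 0, which is the situation the debug-mode check reports.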
Printing is now supported on the Vulkan backend on Unix/Windows platforms. To enable printing on the Vulkan backend, follow the instructions at https://docs.taichi-lang.org/docs/master/debugging#applicable-backends (#6075) (by Ailing)

GGUI

Setting the initial position of the GGUI window is now supported. See https://docs.taichi-lang.org/docs/master/ggui#create-a-window for details and usage. (#6156) (by Mocki)

Taichi Examples

Three new examples from community contributors are also merged in this release. They include:

Animating the fundamental solution of a Laplacian equation (#6302) (by @bismarckkk)
Animating the Kármán vortex street using LBM (#6249) (by @hietwl)
Animating the two-stream instability (#6185) (by JiaoLuhuai)

You can view these examples by running ti example in a terminal and selecting the corresponding index.

Important bug fixes

"ti.data_oriented" class instance now correctly releases its allocated memory upon garbage collection. (#6256) (by Zhanlue Yang)
"ti.fields" can now be correctly indexed using non-i32 typed indices. (#6276) (by Zhanlue Yang)
"ti.select" and "ti.ifte" can now be printed correctly in Taichi Kernels. (#6297) (by Zhanlue Yang)
Before this release, setting u64 arguments to numbers greater than 2^63 raised an error, and u64 return values were treated as i64 in Python (integers greater than 2^63 were returned as negative numbers). This release fixes both bugs. (#6267) (#6364) (by Lin Jiang)
Taichi now raises an error when the number of the loop variables does not match the dimension of the ndrange for loop instead of malfunctioning. (#6360) (by Lin Jiang)
Calling ti.append with a vector/matrix now raises a more informative error message. (#6322) (by Ailing)
Division on unsigned integers now works properly on LLVM backends. (#6128) (by Yi Xu)
Operator ">>=" now works properly. (#6153) (by Yi Xu)
Numpy int is now allowed for SNode shape setting. (#6211) (by Yi Xu)
Dimension check for GlobalPtrStmt is now aware of whether it is a cell access. (#6275) (by Yi Xu)
Before this release, Taichi autodiff could fail in cases where the condition of an if statement depends on the index of an outer for-loop. The bug has been fixed in this release. (#6207) (by Mingrui Zhang)
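
For the u64 item in the list above, the negative values came from reading an unsigned 64-bit pattern as a signed one. The plain-Python sketch below reproduces that reinterpretation. The helper name u64_as_i64 is illustrative and is not a Taichi API.

```python
def u64_as_i64(value):
    """Reinterpret the 64-bit pattern of an unsigned integer as two's-complement signed."""
    return int.from_bytes(value.to_bytes(8, "little"), "little", signed=True)


print(u64_as_i64(2**63))      # -9223372036854775808
print(u64_as_i64(2**64 - 1))  # -1
print(u64_as_i64(42))         # 42
```

Values below 2**63 are unaffected, which is why the bug only showed up for large u64 results.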

Full changelog:

[Error] Deprecate ndrange with number of the loop variables != the dimension of the ndrange (#6422) (by Lin Jiang)
Adjust aot_demo.sh (by jim19930609)
[error] Warn Linux users about manylinux2014 build on startup (#6416) (by Proton)
[misc] Bug fix (by jim19930609)
[misc] Bump version (by jim19930609)
[vulkan] [bug] Stop using the buffer device address feature on macOS (#6415) (by Yi Xu)
[Lang] [bug] Allow filling a field with Expr (#6391) (by Yi Xu)
[misc] Rc v1.2.0 cherry-pick PR number 2 (#6384) (by Zhanlue Yang)
[misc] Revert PR 6360 (#6386) (by Zhanlue Yang)
[misc] Rc v1.2.0 c1 (#6380) (by Zhanlue Yang)
[bug] Fix potential bug in #6362 (#6363) (#6371) (by Zhanlue Yang)
[example] Add example "laplace equation" (#6302) (by 猫猫子Official)
[ci] Android Demo: leave Docker containers intact for debugging (#6357) (by Proton)
[autodiff] Skip gradient kernel compilation for validation kernel (#6356) (by Mingrui Zhang)
[autodiff] Move autodiff gdar checker to release (#6355) (by Mingrui Zhang)
[aot] Removed constraint on same-allocation copy (#6354) (by PENGUINLIONG)
[ci] Add new performance monitoring (#6349) (by Proton)
[dx12] Only use llvm to compile dx12. (#6339) (by Xiang Li)
[opengl] Fix with_opengl when TI_WITH_OPENGL is off (#6353) (by Ailing)
[Doc] Add instructions about running clang-tidy checks locally (by Ailing Zhang)
[build] Enable readability-redundant-member-init in clang-tidy check (by Ailing Zhang)
[build] Enable TI_WITH_VULKAN and TI_WITH_OPENGL for clang-tidy checks (by Ailing Zhang)
[build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
[autodiff] Recover kernel autodiff mode after validation (#6265) (by Mingrui Zhang)
[test] Adjust rtol for sparse_linear_solver tests (#6352) (by Ailing)
[lang] MatrixType bug fix: Fix array indexing with MatrixType-index (#6323) (by Zhanlue Yang)
[Lang] MatrixNdarray refactor part13: Add scalarization for TernaryOpStmt (#6314) (by Zhanlue Yang)
[Lang] MatrixNdarray refactor part12: Add scalarization for AtomicOpStmt (#6312) (by Zhanlue Yang)
[build] Enable a few modernize checks in clang-tidy (by Ailing Zhang)
[build] Enable google-explicit-constructor check in clang-tidy (by Ailing Zhang)
[build] Enable google-build-explicit-make-pair check in clang-tidy (by Ailing Zhang)
[build] Enable a few bugprone related rules in clang-tidy (by Ailing Zhang)
[build] Enable modernize-use-override in clang-tidy (by Ailing Zhang)
[ci] Use .clang-tidy for check_static_analyzer job (by Ailing Zhang)
[mesh] Support arm64 backend for MeshTaichi (#6329) (by Chang Yu)
[lang] Throw proper error message if calling ti.append with vector/matrix (#6322) (by Ailing)
[aot] Fixed buffer device address import (#6326) (by PENGUINLIONG)
[aot] Fixed export of get_instance_proc_addr (#6324) (by PENGUINLIONG)
[build] Allow building test when LLVM is off (#6327) (by Ailing)
[bug] Fix generating LLVM AOT module for the second time failed (#6311) (by PGZXB)
[aot] Per-parameter documentation in C-API header (#6317) (by PENGUINLIONG)
[ci] Revert "Add end-to-end CI tests for meshtaichi (#6321)" (#6325) (by Proton)
[ci] Add end-to-end CI tests for meshtaichi (#6321) (by yixu)
[doc] Update the document about offline cache (#6313) (by PGZXB)
[aot] Include taichi_cpu.h in taich.h (#6315) (by Zhanlue Yang)
[Vulkan] [bug] Change the format string of 64bit unsigned integer type from %llu to %lu (#6308) (by Lin Jiang)
[mesh] Refactor MeshTaichi API (#6306) (by Chang Yu)
[lang] MatrixType bug fix: Allow dynamic_index=True when real_matrix_scalarize=True (#6304) (by Yi Xu)
[lang] MatrixType bug fix: Enable irpass::cfg_optimization if real_matrix_scalarize is on (#6300) (by Zhanlue Yang)
[metal] Enable offline cache by default on Metal (#6307) (by PGZXB)
[Vulkan] Add overflow detection on vulkan when debug=True (#6279) (by Lin Jiang)
[aot] Inline documentations (#6301) (by PENGUINLIONG)
[aot] Support exporting interop info for TiMemory on Cpu/Cuda backends (#6242) (by Zhanlue Yang)
[lang] MatrixType bug fix: Avoid checks for legacy Matrix-class when real_matrix is on (#6292) (by Zhanlue Yang)
[aot] Support setting vector/matrix argument in C++ wrapper of C-API (#6298) (by Ailing)
[lang] MatrixType bug fix: Fix MatrixType validations in build_call_if_is_type() (#6294) (by Zhanlue Yang)
[bug] Fix asserting failed when registering kernels with same name on Metal (#6271) (by PGZXB)
[ci] Add more release tests (#5839) (by Proton)
[lang] MatrixType bug fix: Allow indexing a matrix r-value (#6291) (by Yi Xu)
[bug] Fix duplicate runs with 'run_tests.py --cpp -k' when selecting AOT tests (#6296) (by Zhanlue Yang)
[bug] Fix segmentation fault with TextureOpStmt ir_printer (#6297) (by Zhanlue Yang)
[ci] Add taichi-aot-demo headless demos (#6280) (by Proton)
[bug] Serialize missing fields of metal::TaichiKernelAttributes and metal::KernelAttributes (#6270) (by PGZXB)
[metal] Implement offline cache cleaning on metal (#6272) (by PGZXB)
[aot] Reorganized C-API headers (#6199) (by PENGUINLIONG)
[lang] [bug] Fix setting integer arguments within u64 range but greater than i64 range (#6267) (by Lin Jiang)
[autodiff] Skip gdar checking for user defined grad kernel (#6273) (by Mingrui Zhang)
[bug] Fix AotModuleBuilder::add_compiled_kernel (#6287) (by PGZXB)
[Bug] [lang] Make dimension check for GlobalPtrStmt aware of whether it is a cell access (#6275) (by Yi Xu)
[refactor] Move setting visible device to vulkan instance initialization (by Ailing Zhang)
[bug] Add unit test to detect memory leak from data_oriented classes (#6278) (by Zhanlue Yang)
[aot] Ship runtime *.bc files with C-API for LLVM AOT (#6285) (by Zhanlue Yang)
[bug] Convert non-i32 type indices to i32 for GlobalPtrStmt (#6276) (by Zhanlue Yang)
[Doc] Renamed syntax.md to kernel_function.md, plus miscellaneous edits (#6277) (by Vissidarte-Herman)
[lang] Fixed validation scope (#6262) (by PENGUINLIONG)
[bug] Prevent ti.kernel from directly caching the passed-in arguments to avoid memory leak (#6256) (by Zhanlue Yang)
[autodiff] Add demote atomics before gdar checker (#6266) (by Mingrui Zhang)
[autodiff] Add grad check feature and related test (#6245) (by PhrygianGates)
[lang] Fixed contraction cast (#6255) (by PENGUINLIONG)
[Example] Add karman vortex street example (#6249) (by Zhao Liang)
[ci] Lift GitHub CI timeout (#6260) (by Proton)
[metal] Support offline cache on metal (#6227) (by PGZXB)
[dx12] Add DirectX-Headers as a submodule (#6259) (by Xiang Li)
[bug] Fix link error with TI_WITH_OPENGL:BOOL=ON but TI_WITH_VULKAN:BOOL=OFF (#6257) (by PGZXB)
[dx12] Disable DX12 for cpu only test. (#6253) (by Xiang Li)
[Lang] MatrixNdarray refactor part11: Fuse ExternalPtrStmt and PtrOffsetStmt (#6189) (by Zhanlue Yang)
[Doc] Rename index.md to hello_world.md (#6244) (by Vissidarte-Herman)
[Doc] Update syntax.md (#6236) (by Zhao Liang)
[spirv] Generate OpBitFieldUExtract for BitExtractStmt (#6208) (by Yi Xu)
[Bug] [lang] Allow numpy int as snode dimension (#6211) (by Yi Xu)
[doc] Update document about building and running Taichi C++ tests (#6228) (by PGZXB)
[misc] Disable the offline cache if printing ir is enabled (#6234) (by PGZXB)
[vulkan] [opengl] Enable offline cache by default on Vulkan and OpenGL (#6233) (by PGZXB)
[Doc] Update math_module.md (#6235) (by Zhao Liang)
[Doc] Update debugging.md (#6238) (by Zhao Liang)
[dx12] Add ti.dx12. (#6174) (by Xiang Li)
[lang] Set ret_type for AtomicOpStmt (#6213) (by Ailing)
[Doc] Update global settings (#6201) (by Olinaaaloompa)
[doc] Editorial updates (#6216) (by Vissidarte-Herman)
[Doc] Update hello world (#6191) (by Olinaaaloompa)
[Doc] Update math module (#6203) (by Olinaaaloompa)
[Doc] Update profiler (#6214) (by Olinaaaloompa)
[autodiff] Store if condition in adstack (#6207) (by Mingrui Zhang)
[Doc] Update debugging.md (#6212) (by Zhao Liang)
[Doc] Update debugging.md (#6200) (by Zhao Liang)
[bug] Fixed type inference error with ExternalPtrStmt (#6210) (by Zhanlue Yang)
[example] Request to add my code into examples (#6185) (by JiaoLuhuai)
[Lang] MatrixNdarray refactor part10: Remove redundant MatrixInitStmt generated from scalarization (#6171) (by Zhanlue Yang)
[aot] Apply ti_get_last_error_message() for all C-API test cases (#6195) (by Zhanlue Yang)
[llvm] [refactor] Merge create_call and call (#6192) (by Lin Jiang)
[build] Support executing manually-specified cpp tests for run_tests.py (#6206) (by Zhanlue Yang)
[doc] Editorial updates to field.md (#6202) (by Vissidarte-Herman)
[Lang] MatrixNdarray refactor part9: Add scalarization for AllocaStmt (#6168) (by Zhanlue Yang)
[Lang] Support GPU solve with analyzePattern and factorize (#6158) (by pengyu)
[Lang] MatrixField refactor 9/n: Allow dynamic index of matrix field when real_matrix=True (#6194) (by Yi Xu)
[Doc] Fixed broken links (#6193) (by Olinaaaloompa)
[ir] MatrixField refactor 8/n: Rename PtrOffsetStmt to MatrixPtrStmt (#6187) (by Yi Xu)
[Doc] Update field.md (#6182) (by Zhao Liang)
[bug] Relax dependent Pillow version (#6170) (by Ailing)
[Doc] Update data_oriented_class.md (#6181) (by Zhao Liang)
[Doc] Update kernels and functions (#6176) (by Zhao Liang)
[Doc] Update type.md (#6180) (by Zhao Liang)
[Doc] Update getting started (#6175) (by Zhao Liang)
[llvm] MatrixField refactor 7/n: Simplify codegen for TensorType allocation and access (#6169) (by Yi Xu)
[LLVM] Add runtime overflow detection on LLVM-based backends (#6178) (by Lin Jiang)
Revert "[LLVM] Add runtime overflow detection on LLVM-based backends" (#6177) (by Ailing)
[dx12] Add aot for dx12. (#6099) (by Xiang Li)
[LLVM] Add runtime overflow detection on LLVM-based backends (#6166) (by Lin Jiang)
[doc] C-API documentation & generator (#5736) (by PENGUINLIONG)
[gui] Support for setting the initial position of GGUI window (#6156) (by Mocki)
[metal] Maintain a print string table per kernel (#6160) (by PGZXB)
[Lang] MatrixNdarray refactor part8: Add scalarization for BinaryOpStmt with TensorType-operands (#6086) (by Zhanlue Yang)
[Doc] Refactor debugging (#6102) (by Olinaaaloompa)
[doc] Updated the position of Sparse Matrix (#6167) (by Vissidarte-Herman)
[Doc] Refactor global settings (#6071) (by Zhao Liang)
[Doc] Refactor external arrays (#6065) (by Zhao Liang)
[Doc] Refactor simt (#6151) (by Zhao Liang)
[Doc] Refactor Profiler (#6142) (by Olinaaaloompa)
[Doc] Add doc for math module (#6145) (by Zhao Liang)
[aot] Fixed texture interop (#6164) (by PENGUINLIONG)
[misc] Remove TI_UI namespace macros (#6163) (by Lin Jiang)
[llvm] Add comment about the structure of the CodeGen (#6150) (by Lin Jiang)
[Bug] [lang] Fix augmented assign for sar (#6153) (by Yi Xu)
[Test] Add scipy to test GPU sparse solver (#6162) (by pengyu)
[bug] Fix crashing when loading old offline cache files (for gfx backends) (#6157) (by PGZXB)
[lang] Remove print at the end of parallel sort (#6161) (by Haidong Lan)
[misc] Move some offline cache utils from analysis/ to util/ (#6155) (by PGZXB)
[Lang] Matrix/Vector refactor: support basic matrix ops (#6077) (by Mike He)
[misc] Remove namespace macros (#6154) (by Lin Jiang)
[Doc] Update gui_system (#6152) (by Zhao Liang)
[aot] Track layouts for imported image & tests (#6138) (by PENGUINLIONG)
[ci] Fix build cache problems (#6149) (by Proton)
[Misc] Add prefix sum executor to avoid multiple field allocations (#6132) (by YuZhang)
[opt] Cache loop-invariant global vars to local vars (#6072) (by Lin Jiang)
[aot] Improve C++ wrapper implementation (#6146) (by PENGUINLIONG)
[doc] Refactored ODOP (#6143) (by Vissidarte-Herman)
[Lang] Support basic sparse matrix operations on GPU. (#6082) (by Jiafeng Liu)
[Lang] MatrixField refactor 6/n: Add tests for MatrixField scalarization (#6137) (by Yi Xu)
[vulkan] Fix SPV physical ptr load alignment (#6139) (by Bob Cao)
[bug] Let every thread has its own CompileConfig (#6124) (by Lin Jiang)
[refactor] Remove redundant codegen of floordiv (#6135) (by Yi Xu)
[doc] Miscellaneous editorial updates (#6131) (by Vissidarte-Herman)
Revert "[spirv] Fixed OpLoad with physical address" (#6136) (by Lin Jiang)
[bug] [llvm] Fix is_same_type when the suffix of a type is the prefix of the suffix of the other type (#6126) (by Lin Jiang)
[bug] [vulkan] Only enable non_semantic_info cap when validation layer is on (#6129) (by Ailing)
[Llvm] Fix codegen for div (unsigned) (#6128) (by Yi Xu)
[Lang] MatrixField refactor 5/n: Lower access of matrix field element into CHI IR (#6119) (by Yi Xu)
[Lang] Fix invalid assertion for matrix values (#6125) (by Zhanlue Yang)
[opengl] Fix GLES support (#6121) (by Ailing)
[Lang] MatrixNdarray refactor part7: Add scalarization for UnaryOpStmt with TensorType-operand (#6080) (by Zhanlue Yang)
[doc] Editorial updates (#6116) (by Vissidarte-Herman)
[misc] Allow more commits in changelog generation (#6115) (by Yi Xu)
[aot] Import MoltenVK (#6090) (by PENGUINLIONG)
[vulkan] Instruct users to install vulkan sdk if they want to use validation layer (#6098) (by Ailing)
[ci] Use local caches on self-hosted runners, and code refactoring. (#5846) (by Proton)
[misc] Bump version to v1.1.4 (#6112) (by Taichi Gardener)
[doc] Fixed a broken link (#6111) (by Vissidarte-Herman)
[doc] Update explanation on data-layout (#6110) (by Qian Bao)
[Doc] Move developer utilities to contribution (#6109) (by Olinaaaloompa)
[Doc] Added Accelerate PyTorch (#6106) (by Vissidarte-Herman)
[Doc] Refactor ODOP (#6013) (by Zhao Liang)
[opengl] Support offline cache on opengl (#6104) (by PGZXB)
[build] Fix building with TI_WITH_OPENGL:BOOL=OFF and TI_WITH_DX11:BOOL=ON failed (#6108) (by PGZXB)
