Productive, portable, and performant GPU programming in Python.
Highlights:
Full changelog:
This is a bug fix release for v1.1.0. Full changelog:
High-resolution simulations can deliver great visual quality, but they are often limited by the capacity of onboard GPU memory. This release adds quantized data types, allowing you to define your own integers, fixed-point numbers, or floating-point numbers of arbitrary bit width, so that you can strike a balance between hardware limits and simulation quality. See Using quantized data types for a comprehensive introduction.
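As a minimal sketch of the workflow (signatures follow the v1.1 documentation; consult Using quantized data types for the authoritative API), you define a quantized type with `ti.types.quant` and pack fields into a fixed number of bits with `ti.BitpackedFields`:

```python
import taichi as ti

ti.init()

# A 10-bit fixed-point type covering [-20.0, 20.0] and a 5-bit unsigned integer.
fixed10 = ti.types.quant.fixed(frac=10, range=20.0)
u5 = ti.types.quant.int(bits=5, signed=False)

x = ti.field(dtype=fixed10)
y = ti.field(dtype=u5)

# Pack both fields into a single 32-bit word per element.
bitpack = ti.BitpackedFields(max_num_bits=32)
bitpack.place(x, y)
ti.root.dense(ti.i, 1024).place(bitpack)
```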
A Taichi kernel is implicitly compiled the first time it is called. The compilation results are kept in an online in-memory cache to reduce the overhead of subsequent calls: as long as the kernel function is unchanged, it can be loaded and launched directly. The cache, however, does not survive the program's termination, so on the next run Taichi has to re-compile all kernel functions and reconstruct the in-memory cache, making the first launch of each kernel slow again due to compilation overhead. To address this problem, this release adds the offline cache feature, which dumps the compilation cache to disk for future runs, drastically reducing the first-launch overhead in subsequent runs. Taichi now constructs and maintains an offline cache by default. The following table shows the launch overhead of running `cornell_box` on the CUDA backend with and without the offline cache:
| | Time spent on compilation and cached data loading |
| --- | --- |
| Offline cache disabled | 24.856s |
| Offline cache enabled (1st run) | 25.435s |
| Offline cache enabled (2nd run) | 0.677s |
Note that, for now, the offline cache feature works only on the CPU and CUDA backends. If your code behaves abnormally, disable the offline cache by setting the environment variable `TI_OFFLINE_CACHE=0` or `ti.init(offline_cache=False)`, and file an issue with us on Taichi's GitHub repo. See Offline cache for more information.
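For example, to opt out for a single program (using the `offline_cache` flag mentioned above):

```python
import taichi as ti

# The offline cache is on by default; disable it explicitly when debugging.
ti.init(arch=ti.cuda, offline_cache=False)
```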
Adds forward-mode automatic differentiation via `ti.ad.FwdMode`. Unlike the existing reverse-mode automatic differentiation, which computes the vector-Jacobian product (vJp), forward mode computes the Jacobian-vector product (Jvp) when evaluating derivatives. Therefore, forward mode is much more efficient in situations where the number of a function's outputs is greater than the number of its inputs. Read this example, which demonstrates Jacobian matrix computation in forward mode and reverse mode.
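A minimal sketch of the usage, assuming the v1.1 forward-mode API (fields allocated with `needs_dual=True`, derivatives read back from the `dual` counterpart; see the linked example for the authoritative version):

```python
import taichi as ti

ti.init()

x = ti.field(ti.f32, shape=(), needs_dual=True)
y = ti.field(ti.f32, shape=(), needs_dual=True)

@ti.kernel
def compute():
    y[None] = x[None] ** 2

x[None] = 3.0
# Forward mode propagates a derivative "seed" from inputs to outputs.
with ti.ad.FwdMode(loss=y, param=x):
    compute()

print(y.dual[None])  # dy/dx = 2 * x = 6.0
```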
A GPU's shared memory is a small, fast memory region visible to all threads in the same thread block (or workgroup in Vulkan). It is widely used in scenarios where performance is a crucial concern. To give you access to your GPU's shared memory, this release adds the `SharedArray` API under the namespace `ti.simt.block`.

The following diagram illustrates the performance benefits of Taichi's `SharedArray`. With `SharedArray`, Taichi Lang is comparable to or even outperforms the equivalent CUDA code.
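A minimal sketch of the pattern (this assumes that a range-for on CUDA maps consecutive indices to consecutive threads in a block, so `i % BLOCK` acts as the thread index; the names `tile` and `neighbor_sum` are illustrative):

```python
import taichi as ti

ti.init(arch=ti.cuda)

BLOCK = 64
N = 1024
src = ti.field(ti.f32, shape=N)
dst = ti.field(ti.f32, shape=N)

@ti.kernel
def neighbor_sum():
    ti.loop_config(block_dim=BLOCK)
    for i in range(N):
        tid = i % BLOCK
        # Per-block scratchpad living in shared memory.
        tile = ti.simt.block.SharedArray((BLOCK,), ti.f32)
        tile[tid] = src[i]
        ti.simt.block.sync()
        # Neighbor reads now hit fast shared memory instead of global memory.
        dst[i] = tile[tid] + tile[(tid + 1) % BLOCK]
```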
Taichi now supports texture bilinear sampling and raw texel fetch on both the Vulkan and OpenGL backends. This feature leverages the hardware texture unit and removes the need to hand-write bilinear interpolation code in image processing tasks. It also provides an easy way to do texture mapping in tasks such as rasterization or ray-tracing. On the Vulkan backend, Taichi additionally supports image load and store: you can directly manipulate the texels of an image and use that image in subsequent texture mapping.

Note that the current texture and image APIs are at an early stage and subject to change. In the future, we plan to support bindless textures to extend to tasks such as ray-tracing, and to extend full texture support to all backends with texture APIs.

Run `ti example simple_texture` to see an example of texture support!
GGUI improvements:
- Supports fetching the depth information of the current scene into a Taichi field with `ti.ui.Window.get_depth_buffer(field)`, or as a NumPy array with `ti.ui.Window.get_depth_buffer_as_numpy()`.
- Supports drawing 3D lines with `Scene.lines(vertices, width)`.
- Supports drawing mesh instances: pass a field of transform matrices (`ti.Matrix.field(4, 4, ti.f32, shape=N)`) and call `ti.ui.Scene.mesh_instance(vertices, transforms=TransformMatrixField)` to put various mesh instances at different places.
- Supports showing the wireframe of a mesh in `Scene.mesh()` or `Scene.mesh_instance()` by setting `show_wireframe=True`.

Taichi dataclass: Taichi now recommends using the `@ti.dataclass` decorator to define struct types, and even attach functions to them. See Taichi dataclasses for more information.
```python
import math

import taichi as ti
from taichi.math import vec3

ti.init()

@ti.dataclass
class Sphere:
    center: vec3
    radius: ti.f32

    @ti.func
    def area(self):
        # A function to run in the Taichi scope.
        return 4 * math.pi * self.radius * self.radius

    def is_zero_sized(self):
        # A Python-scope function.
        return self.radius == 0.0
```
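You can then instantiate the struct and call its Taichi-scope method from inside a kernel. A small usage sketch based on the definition above (`total_area` is an illustrative name, not part of the release):

```python
@ti.kernel
def total_area() -> ti.f32:
    s = Sphere(center=vec3(0.0, 0.0, 0.0), radius=2.0)
    return s.area()

print(total_area())  # 4 * pi * 2**2 ≈ 50.27
```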
As shown in the dataclass example above, `vec2`, `vec3`, and `vec4` in the `taichi.math` module (same for `ivec` and `uvec`) can be directly used as type hints. The numeric precision of these types is determined by `default_ip` or `default_fp` in `ti.init()`.
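These types work as annotations on kernel signatures as well; a small sketch (the `normalized` kernel is illustrative, not part of the release):

```python
import taichi as ti
from taichi.math import vec3

ti.init()

@ti.kernel
def normalized(v: vec3) -> vec3:
    # vec3 doubles as a type annotation; its precision follows default_fp.
    return v / v.norm()

print(normalized(vec3(3.0, 0.0, 4.0)))  # [0.6, 0.0, 0.8]
```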
More flexible instantiation for a `struct` or `dataclass`:

In earlier releases, to instantiate a `taichi.types.struct` or `taichi.dataclass`, you had to explicitly spell out a complete list of member-value pairs:
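(The snippets below assume a `Ray` dataclass along the following lines; the comments on member meanings are inferred from the example and are not taken from the release notes.)

```python
import taichi as ti
from taichi.math import vec3

ti.init()

@ti.dataclass
class Ray:
    ro: vec3   # ray origin
    rd: vec3   # ray direction
    t: ti.f32  # parameter along the ray
```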
```python
ray = Ray(ro=vec3(0), rd=vec3(1, 0, 0), t=1.0)
```
As of this release, you are given more options. The positional arguments are passed to the struct members in the order they are defined; the keyword arguments set the corresponding struct members. Unspecified struct members are automatically set to zero. For example:
```python
# Use positional arguments to set struct members in order.
ray = Ray(vec3(0), vec3(1, 0, 0), 1.0)
# ro is set to vec3(0) and t is set to 0.
ray = Ray(vec3(0), rd=vec3(1, 0, 0))
# Both ro and rd are set to vec3(0).
ray = Ray(t=1.0)
# ro is set to vec3(1), rd to vec3(0), and t to 0.0.
ray = Ray(1)
# All members are set to 0.
ray = Ray()
```
Supports calling `fill()` from both the Python scope and the Taichi scope.

In earlier releases, you could only call `fill()`, a method of the `ScalarField` and `MatrixField` classes, from the Python scope. As of this release, you can call this method from either the Python scope or the Taichi scope. See the following code snippet:
```python
x = ti.field(int, shape=(10, 10))
x.fill(1)  # In the Python scope

@ti.kernel
def test():
    x.fill(-1)  # In the Taichi scope
```
More flexible initialization for customized matrix types:

As the following code snippet shows, matrix types created with `taichi.types.matrix()` or `taichi.types.vector()` can be initialized more flexibly: Taichi automatically combines the inputs and converts them to a matrix whose shape matches the target matrix type.
```python
# mat2 and vec3 are also predefined types in the ti.math module.
mat2 = ti.types.matrix(2, 2, float)
vec3 = ti.types.vector(3, float)

m = mat2(1)               # [[1., 1.], [1., 1.]]
m = mat2(1, 2, 3, 4)      # [[1., 2.], [3., 4.]]
m = mat2([1, 2], [3, 4])  # [[1., 2.], [3., 4.]]
m = mat2([1, 2, 3, 4])    # [[1., 2.], [3., 4.]]
v = vec3(1, 2, 3)
m = mat2(v, 4)            # [[1., 2.], [3., 4.]]
```
Makes `ti.f32(x)` syntax sugar for `ti.cast(x, ti.f32)`, provided that `x` is neither a literal nor of a compound data type. The same holds for other primitive types such as `ti.i32`, `ti.u8`, and `ti.f64`.
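A small illustration (the kernel name is illustrative):

```python
import taichi as ti

ti.init()

@ti.kernel
def cast_demo() -> ti.f32:
    a = 42         # A runtime i32 variable, not a literal.
    b = ti.f32(a)  # Syntax sugar for ti.cast(a, ti.f32).
    return b
```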
More convenient axes order adjustment: A common way to improve the performance of a Taichi program is to adjust the order of axes when laying out field data in memory. In earlier releases, this required in-depth knowledge of the data definition language (the SNode system) and could become an extra burden in situations where sparse data structures are not needed. As of this release, Taichi supports specifying the order of axes directly when defining a Taichi field.
```python
M, N = 64, 32

# Before
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, M).dense(ti.j, N).place(x)  # row-major
ti.root.dense(ti.j, N).dense(ti.i, M).place(y)  # column-major

# New syntax
x = ti.field(ti.i32, shape=(M, N), order='ij')
y = ti.field(ti.i32, shape=(M, N), order='ji')

# SoA vs. AoS example
p = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.SOA)
q = ti.Vector.field(3, ti.i32, shape=(M, N), order='ji', layout=ti.Layout.AOS)
```
Related fixes and improvements:
- `pow()` with a negative exponent (#5275)
- `ti.ndrange` (#4478)

New APIs in this release:
- `ti.BitpackedFields`
- `ti.from_paddle`
- `ti.to_paddle`
- `ti.FieldsBuilder.lazy_dual`
- `ti.math` module
- `ti.Texture`
- `ti.ref`
- `ti.dataclass`
- `ti.simt.block.SharedArray`
| Old API | New API |
| --- | --- |
| `ti.clear_all_gradients` | `ti.ad.clear_all_gradients` |
| `ti.Tape` | `ti.ad.Tape` |
| `ti.FieldsBuilder.bit_array` | `ti.FieldsBuilder.quant_array` |
| `ti.ui.Window.write_image` | `ti.ui.Window.save_image` |
| `ti.ui.Window.GUI` | `ti.ui.Window.get_gui` |
Deprecated `ti.ui.make_camera`: please construct cameras with `ti.ui.Camera` instead.

As announced in the v1.0.0 release, we no longer provide official Python 3.6 wheels through PyPI. Users who need Taichi with Python 3.6 may still build from source, but its support is not guaranteed.
The `taichi_glsl` package on PyPI will no longer be maintained as of this release. GLSL-related features will be implemented in the official `taichi.math` module, which includes data types and handy functions for daily math and shader development:
- Vector types `vec2`, `vec3`, and `vec4`.
- Matrix types `mat2`, `mat3`, and `mat4`.
- GLSL functions such as `step()`, `clamp()`, and `smoothstep()`.

Official support for macOS Mojave (10.14, released in 2018) will be dropped starting from v1.2.0. Please upgrade your macOS if possible, or let us know if you have any concerns.
Highlights:
Full changelog:
- `OpLoad` with physical address (#5212) (by PENGUINLIONG)

Highlights:
Full changelog:
Highlights:
The v1.0.2 release is a patch fix that improves Taichi's stability on multiple platforms, especially for GGUI and the Vulkan backend.
Full changelog:
Highlights:
Full changelog:
v1.0.0 was released on April 13, 2022.
Taichi's license is changed from MIT to Apache-2.0 after a public vote in #4607.
This release supports Python 3.10 on all supported operating systems (Windows, macOS, and Linux).
Before v1.0.0, Taichi worked only on Linux distributions that support glibc 2.27+ (for example, Ubuntu 18.04+). As of v1.0.0, in addition to the normal Taichi wheels, Taichi provides manylinux2014-compatible wheels that work on most modern Linux distributions, including CentOS 7.
Deprecates `ti.ext_arr()`; use `ti.types.ndarray()` instead. `ti.types.ndarray()` supports both Taichi Ndarrays and external arrays, for example NumPy arrays.

By working together with OPPO US Research Center, Taichi delivers Taichi AOT, a solution for deploying kernels in non-Python environments, such as mobile devices.
Compiled Taichi kernels can be saved from a Python process, then loaded and run by the provided C++ runtime library. With a set of APIs, your Python/Taichi code can be easily deployed in any C++ environment. We demonstrate the simplicity of this workflow by porting the implicit FEM (finite element method) demo released in v0.9.0 to an Android application. Download the Android package and find out what Taichi AOT has to offer! If you want to try out this solution, please also check out the taichi-aot-demo repo.
```python
# In Python: app.py
module = ti.aot.Module(ti.vulkan)
module.add_kernel(my_kernel, template_args={'x': x})
module.save('my_app')
```
The following code snippet shows the C++ workflow for loading the compiled AOT modules.
```cpp
// Initialize the Vulkan program pipeline
taichi::lang::vulkan::VulkanDeviceCreator::Params evd_params;
evd_params.api_version = VK_API_VERSION_1_2;
auto embedded_device =
    std::make_unique<taichi::lang::vulkan::VulkanDeviceCreator>(evd_params);

std::vector<uint64_t> host_result_buffer;
host_result_buffer.resize(taichi_result_buffer_entries);
taichi::lang::vulkan::VkRuntime::Params params;
params.host_result_buffer = host_result_buffer.data();
params.device = embedded_device->device();
auto vulkan_runtime =
    std::make_unique<taichi::lang::vulkan::VkRuntime>(std::move(params));

// Load the AOT module saved from Python
taichi::lang::vulkan::AotModuleParams aot_params{"my_app", vulkan_runtime.get()};
auto module = taichi::lang::aot::Module::load(taichi::Arch::vulkan, aot_params);
auto my_kernel = module->get_kernel("my_kernel");

// Allocate a device buffer
taichi::lang::Device::AllocParams alloc_params;
alloc_params.host_write = true;
alloc_params.size = /*Ndarray size for `x`*/;
alloc_params.usage = taichi::lang::AllocUsage::Storage;
auto devalloc_x = embedded_device->device()->allocate_memory(alloc_params);

// Execute my_kernel without a Python environment
taichi::lang::RuntimeContext host_ctx;
host_ctx.set_arg_devalloc(/*arg_id=*/0, devalloc_x, /*shape=*/{128}, /*element_shape=*/{3, 1});
my_kernel->launch(&host_ctx);
```
Note that Taichi only supports the Vulkan backend in the C++ runtime library. The Taichi team is working on supporting more backends.
All Taichi functions are inlined into the Taichi kernel during compile time. However, the kernel becomes lengthy and requires longer compile time if it has too many Taichi function calls. This becomes especially obvious if a Taichi function involves compile-time recursion. For example, the following code calculates the Fibonacci numbers recursively:
```python
@ti.func
def fib_impl(n: ti.template()):
    if ti.static(n <= 0):
        return 0
    if ti.static(n == 1):
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.template()):
    print(fib_impl(n))
```
In this code, `fib_impl()` recursively calls itself until `n` reaches `1` or `0`. The total number of calls to `fib_impl()` grows exponentially with `n`, so the length of the inlined kernel grows exponentially as well. When `n` reaches `25`, it takes more than a minute to compile the kernel.
This release introduces the "real function", a new type of Taichi function that is compiled independently instead of being inlined into the kernel. It is an experimental feature and supports only scalar arguments and a scalar return value for now. You can use it by decorating the function with `@ti.experimental.real_func`. For example, the following is the real-function version of the code above:
```python
@ti.experimental.real_func
def fib_impl(n: ti.i32) -> ti.i32:
    if n <= 0:
        return 0
    if n == 1:
        return 1
    return fib_impl(n - 1) + fib_impl(n - 2)

@ti.kernel
def fibonacci(n: ti.i32):
    print(fib_impl(n))
```
The length of the kernel does not increase as `n` grows because the kernel only makes a call to the function instead of inlining its whole body. As a result, the code takes far less than a second to compile, regardless of the value of `n`.
The main differences between a normal Taichi function and a real function are listed below:
- A normal Taichi function is inlined into the kernel at compile time, while a real function is compiled independently and called at runtime.
- A real function supports runtime recursion and allows return statements inside runtime `if`/`for`/`while` statements, which is not allowed in a normal Taichi function.

Previously, you could not explicitly give a type to a literal. For example:
```python
@ti.kernel
def foo():
    a = 2891336453  # i32 overflow (> 2^31 - 1)
```
In the code snippet above, `2891336453` is first turned into the default integer type (`ti.i32` if not changed), which causes an overflow. Starting from v1.0.0, you can write type annotations for literals:
```python
@ti.kernel
def foo():
    a = ti.u32(2891336453)  # similar to 2891336453u in C
```
You can use `ti.loop_config` to control the behavior of the subsequent top-level for-loop. Available parameters are:

- `block_dim`: Sets the number of threads in a block on the GPU.
- `parallelize`: Sets the number of threads to use on the CPU.
- `serialize`: If you set `serialize` to `True`, the for-loop runs serially, and you can write break statements inside it (this applies only to range/ndrange for-loops). Setting `serialize` to `True` is equivalent to setting `parallelize` to `1`.

Here are two examples:
```python
@ti.kernel
def break_in_serial_for() -> ti.i32:
    a = 0
    ti.loop_config(serialize=True)
    for i in range(100):  # This loop runs serially
        a += i
        if i == 10:
            break
    return a

break_in_serial_for()  # returns 55
```
```python
n = 128
val = ti.field(ti.i32, shape=n)

@ti.kernel
def fill():
    ti.loop_config(parallelize=8, block_dim=16)
    # If the kernel runs on the CPU backend, 8 threads will be used.
    # If the kernel runs on the CUDA backend, each block will have 16 threads.
    for i in range(n):
        val[i] = i
```
`math` module: This release adds a `math` module to support GLSL-standard vector operations and to make it easier to port GLSL shader code to Taichi. For example, vector types, including `vec2`, `vec3`, `vec4`, `mat2`, `mat3`, and `mat4`, and functions, including `mix()`, `clamp()`, and `smoothstep()`, act similarly to their counterparts in GLSL. See the following examples:
You can use the `rgba`, `xyzw`, and `uvw` properties to get and set vector entries:
```python
import taichi as ti
import taichi.math as tm

ti.init()

@ti.kernel
def example():
    v = tm.vec3(1.0)  # (1.0, 1.0, 1.0)
    w = tm.vec4(0.0, 1.0, 2.0, 3.0)
    v.rgg += 1.0      # v = (2.0, 3.0, 1.0)
    w.zxy += tm.sin(v)
```
Each Taichi vector is implemented as a column vector, so make sure that you put the matrix before the vector in a matrix-vector multiplication.
```python
@ti.kernel
def example():
    M = ti.Matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
    v = tm.vec3(1, 2, 3)
    w = (M @ v).xyz  # [1, 2, 3]
```
```python
@ti.kernel
def example():
    v = tm.vec3(0., 1., 2.)
    w = tm.smoothstep(0.0, 1.0, v.xyz)
    w = tm.clamp(w, 0.2, 0.8)
```
`ti gallery` command: This release introduces the CLI command `ti gallery`, which lets you select and run Taichi examples in a pop-up window. To do so, run:

```
ti gallery
```

A window with the example gallery pops up.
As of v1.0.0, Taichi accepts matrix or vector types as parameters and return values. You can use `ti.types.matrix` or `ti.types.vector` as the type annotations.

Taichi also supports basic, read-only matrix slicing. Use the `mat[:, :]` syntax to quickly retrieve a specific portion of a matrix. See Slicings for more information.
The following code example shows how to get the numbers at the four corners of a 3x3 matrix `mat`:
```python
import taichi as ti

ti.init()

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32)) -> ti.types.matrix(2, 2, ti.i32):
    corners = mat[::2, ::2]
    return corners

mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
corners = foo(mat)  # [[1 3] [7 9]]
```
Note that in a slice, the lower bound, the upper bound, and the stride must be constant integers. If you want to use a variable index together with a slice, set `ti.init(dynamic_index=True)`. For example:
```python
import taichi as ti

ti.init(dynamic_index=True)

@ti.kernel
def foo(mat: ti.types.matrix(3, 3, ti.i32), ind: ti.i32) -> ti.types.matrix(3, 1, ti.i32):
    col = mat[:, ind]
    return col

mat = ti.Matrix([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
col = foo(mat, 2)  # [3 6 9]
```
Flexibility is key to the user experience of an automatic differentiation (AD) system. Before v1.0.0, Taichi's AD system required that a differentiable Taichi kernel consist only of multiple simply nested for-loops (shown in `task1` below). This was once called the Kernel Simplicity Rule (KSR). KSR prevented users from writing differentiable kernels with multiple serial for-loops (shown in `task2` below) or with a mixture of a serial for-loop and non-for statements (shown in `task3` below).
```python
# OK: multiple simply nested for-loops
@ti.kernel
def task1():
    for i in range(2):
        for j in range(3):
            for k in range(3):
                y[None] += x[None]

# Error: multiple serial for-loops
@ti.kernel
def task2():
    for i in range(2):
        for j in range(3):
            y[None] += x[None]
        for j in range(3):
            y[None] += x[None]

# Error: a mixture of a serial for-loop and non-for statements
@ti.kernel
def task3():
    for i in range(2):
        y[None] += x[None]
        for j in range(3):
            y[None] += x[None]
```
With KSR removed in this release, code with different kinds of for-loop structures can be differentiated, as shown in the snippet below.
```python
# OK: a complicated control flow that is still differentiable in Taichi
for j in range(2):
    for i in range(3):
        y[None] += x[None]
    for i in range(3):
        for ii in range(2):
            y[None] += x[None]
        for iii in range(2):
            y[None] += x[None]
            for iv in range(2):
                y[None] += x[None]
    for i in range(3):
        for ii in range(2):
            for iii in range(2):
                y[None] += x[None]
```
Taichi provides a demo to demonstrate how to implement a differentiable simulator using this enhanced Taichi AD system.
f-strings in `assert` statements: This release supports including an f-string in an `assert` statement as its error message. You can include scalar variables in the f-string. See the example below:
```python
import taichi as ti

ti.init(debug=True)

@ti.kernel
def assert_is_zero(n: ti.i32):
    assert n == 0, f"The number is {n}, not zero"

assert_is_zero(42)  # TaichiAssertionError: The number is 42, not zero
```
Note that the `assert` statement works only in debug mode.
This release comes with the first version of the Taichi language specification, which attempts to provide an exhaustive description of the syntax and semantics of the Taichi language. It serves as a reference for Taichi's users and developers when they need to determine whether a specific behavior is correct, buggy, or undefined.
| Deprecated | Replaced by |
| --- | --- |
| `ti.ext_arr()` | `ti.types.ndarray()` |
Highlights:
Full changelog:
Highlights:
Full changelog: