Integer + Floating Point Compression Filter
- Fastest transpose/shuffle
  - :new: (2019.11) ALL TurboTranspose functions are now available on 64-bit ARMv8, including NEON SIMD.
  - Byte/Nibble transpose/shuffle for improving compression of binary data (e.g. floating point data)
  - :sparkles: Scalar/SIMD Transpose/Shuffle 8,16,32,64,... bits
  - :+1: Dynamic CPU detection and JIT scalar/sse/avx2 switching
  - 100% C (C++ headers), usage as simple as memcpy
- Byte Transpose
  - Fastest byte transpose
  - :new: (2019.11) 2D,3D,4D transpose
- Nibble Transpose
  - nearly as fast as byte transpose
  - more efficient, up to 10 times faster than Bitshuffle
  - :new: better compression (w/ lz77) and 10 times faster than SPDP, one of the best floating-point compressors
  - can compress/decompress (w/ lz77) better and faster than other domain-specific floating point compressors
  - Scalar and SIMD Transform
- Delta encoding for sorted lists
- Zigzag encoding for unsorted lists
- Xor encoding
- :new: lossy floating point compression with user-defined error
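
The delta, zigzag and xor encodings listed above can be illustrated with a small scalar sketch. This is not the library's code: the function names (`delta_enc32`, `zigzag_delta_enc32`, `xor_enc32`, ...) are hypothetical, and the sketch assumes 32-bit elements and that zigzag is applied to the differences between consecutive values.

```c
#include <stddef.h>
#include <stdint.h>

/* Delta: store differences between consecutive values (small for sorted input). */
static void delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev;          /* wraps modulo 2^32, so decoding is exact */
        prev = in[i];
    }
}

static void delta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = prev;
    }
}

/* Zigzag: fold the sign into bit 0, so 0,-1,1,-2,2 become 0,1,2,3,4. */
static uint32_t zigzag_enc32(uint32_t d) { return (d << 1) ^ (0u - (d >> 31)); }
static uint32_t zigzag_dec32(uint32_t z) { return (z >> 1) ^ (0u - (z & 1u)); }

/* Zigzag of deltas: unsorted input gives signed differences of small magnitude. */
static void zigzag_delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = zigzag_enc32(in[i] - prev);
        prev = in[i];
    }
}

static void zigzag_delta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += zigzag_dec32(in[i]);
        out[i] = prev;
    }
}

/* Xor: consecutive values that share high bits (e.g. similar floats) leave leading zeros. */
static void xor_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] ^ prev;
        prev = in[i];
    }
}

static void xor_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev ^= in[i];
        out[i] = prev;
    }
}
```

All three transforms turn slowly varying input into values with many zero high bytes, which is what a subsequent byte or nibble transpose plus lz77 stage can exploit.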
Transpose Benchmark:
- Benchmark Intel CPU: Skylake i7-6700 3.4GHz gcc 9.2 single thread
- Benchmark ARM: ARMv8 A73-ODROID-N2 1.8GHz
- Speed test
Benchmark w/ 16k buffer
BOLD = pareto frontier.
E:Encode, D:Decode
`./tpbench -s# file -B16K` (# = 8,4,2)

| E cycles/byte | D cycles/byte | Transpose 64 bits AVX2 |
|---:|---:|:---|
| .199 | .134 | TurboTranspose Byte |
| .326 | .201 | Blosc byteshuffle |
| .394 | .260 | TurboTranspose Nibble |
| .848 | .478 | Bitshuffle 8 |


| E cycles/byte | D cycles/byte | Transpose 32 bits AVX2 |
|---:|---:|:---|
| .121 | .102 | TurboTranspose Byte |
| .451 | .139 | Blosc byteshuffle |
| .345 | .229 | TurboTranspose Nibble |
| .773 | .476 | Bitshuffle |


| E cycles/byte | D cycles/byte | Transpose 16 bits AVX2 |
|---:|---:|:---|
| .095 | .071 | TurboTranspose Byte |
| .640 | .108 | Blosc byteshuffle |
| .329 | .198 | TurboTranspose Nibble |
| .758 | 1.177 | Bitshuffle 2 |
| .067 | .067 | memcpy |


| E MB/s | D MB/s | 16 bits ARM 2019.11 |
|---:|---:|:---|
| 8192 | 16384 | TurboTranspose Byte |
| 8192 | 8192 | blosc byteshuffle |
| 1638 | 2341 | TurboTranspose Nibble |
| 356 | 287 | blosc bitshuffle |
| 16384 | 16384 | memcpy |


| E MB/s | D MB/s | 32 bits ARM 2019.11 |
|---:|---:|:---|
| 8192 | 8192 | TurboTranspose Byte |
| 8192 | 8192 | blosc byteshuffle |
| 1820 | 2341 | TurboTranspose Nibble |
| 372 | 252 | blosc bitshuffle |


| E MB/s | D MB/s | 64 bits ARM 2019.11 |
|---:|---:|:---|
| 4096 | 8192 | TurboTranspose Byte |
| 5461 | 5461 | blosc byteshuffle |
| 1490 | 1490 | TurboTranspose Nibble |
| 372 | 260 | blosc bitshuffle |

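
The 16k-buffer results above are given in cycles/byte for the Intel CPU and in MB/s for the ARM CPU. To compare them, a rough conversion can be sketched as follows. It assumes the nominal clock rates stated in the benchmark header (3.4 GHz and 1.8 GHz), ignores turbo and frequency scaling, and uses MB = 1,000,000 bytes; the helper names are hypothetical.

```c
/* Convert cycles/byte to MB/s (MB = 1,000,000 bytes) for a given clock in Hz. */
static double cycles_per_byte_to_mbps(double cycles_per_byte, double clock_hz) {
    return clock_hz / cycles_per_byte / 1e6;
}

/* Convert MB/s (MB = 1,000,000 bytes) to cycles/byte for a given clock in Hz. */
static double mbps_to_cycles_per_byte(double mbps, double clock_hz) {
    return clock_hz / (mbps * 1e6);
}
```

For example, `cycles_per_byte_to_mbps(0.134, 3.4e9)` gives about 25,370 MB/s, and `mbps_to_cycles_per_byte(8192, 1.8e9)` gives about 0.22 cycles/byte. These are estimates only, since the actual clock during the benchmark is not stated.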
Transpose/Shuffle benchmark w/ large files (100MB).
MB/s: 1,000,000 bytes/second
`./tpbench -s# file` (# = 8,4,2)

| E MB/s | D MB/s | Transpose 16 bits AVX2 2019.11 |
|---:|---:|:---|
| 9208 | 9795 | TurboTranspose Byte |
| 8382 | 7689 | Blosc byteshuffle |
| 9377 | 9584 | TurboTranspose Nibble |
| 2750 | 2530 | Blosc bitshuffle |
| 13725 | 13900 | memcpy |


| E MB/s | D MB/s | Transpose 32 bits AVX2 2019.11 |
|---:|---:|:---|
| 9718 | 9713 | TurboTranspose Byte |
| 9181 | 9030 | Blosc byteshuffle |
| 8750 | 9472 | TurboTranspose Nibble |
| 2767 | 2942 | Blosc bitshuffle 4 |


| E MB/s | D MB/s | Transpose 64 bits AVX2 2019.11 |
|---:|---:|:---|
| 8998 | 9573 | TurboTranspose Byte |
| 8721 | 8586 | Blosc byteshuffle 2 |
| 8252 | 9222 | TurboTranspose Nibble |
| 2711 | 2053 | Blosc bitshuffle 2 |


| E MB/s | D MB/s | 16 bits ARM 2019.11 |
|---:|---:|:---|
| 872 | 3998 | TurboTranspose Byte |
| 678 | 3852 | blosc byteshuffle |
| 1365 | 2195 | TurboTranspose Nibble |
| 357 | 280 | blosc bitshuffle |
| 3921 | 3913 | memcpy |


| E MB/s | D MB/s | 32 bits ARM 2019.11 |
|---:|---:|:---|
| 1828 | 3768 | TurboTranspose Byte |
| 1769 | 3713 | blosc byteshuffle |
| 1456 | 2299 | TurboTranspose Nibble |
| 374 | 243 | blosc bitshuffle |


| E MB/s | D MB/s | 64 bits ARM 2019.11 |
|---:|---:|:---|
| 1793 | 3572 | TurboTranspose Byte |
| 1784 | 3544 | blosc byteshuffle |
| 1176 | 1267 | TurboTranspose Nibble |
| 331 | 203 | blosc bitshuffle |

- Compression test (transpose/shuffle+lz4)
:new: Download IcApp, a new benchmark for TurboPFor+TurboTranspose,
for testing almost all integer and floating point file types.
Note: the lossy compression benchmark is available with IcApp only.
- Speed test (file msg_sweep3d)

| C size | ratio % | C MB/s | D MB/s | Name AVX2 |
|---:|---:|---:|---:|:---|
| 11,348,554 | 18.1 | 2276 | 4425 | TurboTranspose Nibble+lz |
| 22,489,691 | 35.8 | 1670 | 3881 | TurboTranspose Byte+lz |
| 43,471,376 | 69.2 | 348 | 402 | SPDP |
| 44,626,407 | 71.0 | 1065 | 2101 | bitshuffle+lz |
| 62,865,612 | 100.0 | 13300 | 13300 | memcpy |

`./tpbench -s4 -z *.sp`

| File | File size | lz % | Tp8lz | Tp4lz | BSlz | spdp1 | | spdp9 | Tp4lzt | eTp4lzt |
|:---|---:|---:|---:|---:|---:|---:|---|---:|---:|---:|
| msg_bt | 133194716 | 94.3 | 70.4 | 66.4 | 73.9 | 70.0 | | 67.4 | 54.7 | 32.4 |
| msg_lu | 97059484 | 100.4 | 77.1 | 70.4 | 75.4 | 76.8 | | 74.0 | 61.0 | 42.2 |
| msg_sppm | 139497932 | 11.7 | 11.6 | 12.6 | 15.4 | 14.4 | | 13.7 | 9.0 | 5.6 |
| msg_sp | 145052928 | 100.3 | 68.8 | 63.7 | 68.1 | 67.9 | | 65.3 | 52.6 | 24.9 |
| msg_sweep3d | 62865612 | 98.7 | 35.8 | 18.1 | 71.0 | 69.6 | | 13.7 | 9.8 | 3.8 |
| num_brain | 70920000 | 100.4 | 76.5 | 71.1 | 77.4 | 79.1 | | 73.9 | 63.4 | 32.6 |
| num_comet | 53673984 | 92.4 | 79.0 | 77.6 | 82.1 | 84.5 | | 84.6 | 70.1 | 41.7 |
| num_control | 79752372 | 99.4 | 89.5 | 90.7 | 88.1 | 98.3 | | 98.5 | 81.4 | 51.2 |
| num_plasma | 17544800 | 100.4 | 0.7 | 0.7 | 75.5 | 30.7 | | 2.9 | 0.3 | 0.2 |
| obs_error | 31080408 | 89.2 | 73.1 | 70.0 | 76.9 | 78.3 | | 49.4 | 20.5 | 12.2 |
| obs_info | 9465264 | 93.6 | 70.2 | 61.9 | 72.9 | 62.4 | | 43.8 | 27.3 | 15.1 |
| obs_spitzer | 99090432 | 98.3 | 90.4 | 95.6 | 93.6 | 100.1 | | 100.7 | 80.2 | 52.3 |
| obs_temp | 19967136 | 100.4 | 89.5 | 92.4 | 91.0 | 99.4 | | 100.1 | 84.0 | 55.8 |

Tp8 = Byte transpose, Tp4 = Nibble transpose, BS = Bitshuffle, lz = lz4
eTp4lzt = lossy compression with lzturbo and allowed error = 0.0001 (1e-4)
Slow but best compression: spdp9 and lzt (lzt = lzturbo,39)

| File | File size | lz % | Tp8lz | Tp4lz | BSlz | spdp1 | | spdp9 | Tp4lzt | eTp4lzt |
|:---|---:|---:|---:|---:|---:|---:|---|---:|---:|---:|
| msg_bt | 266389432 | 94.5 | 77.2 | 76.5 | 81.6 | 77.9 | | 75.4 | 69.9 | 16.0 |
| msg_lu | 194118968 | 100.4 | 82.7 | 81.0 | 83.7 | 83.3 | | 79.6 | 75.5 | 21.0 |
| msg_sppm | 278995864 | 18.9 | 14.5 | 14.9 | 19.5 | 21.5 | | 19.8 | 11.2 | 2.8 |
| msg_sp | 290105856 | 100.4 | 79.2 | 77.5 | 80.2 | 78.8 | | 77.1 | 71.3 | 12.4 |
| msg_sweep3d | 125731224 | 98.7 | 50.7 | 36.7 | 80.4 | 76.2 | | 33.2 | 27.3 | 1.9 |
| num_brain | 141840000 | 100.4 | 82.6 | 81.1 | 84.5 | 87.8 | | 83.3 | 77.0 | 16.3 |
| num_comet | 107347968 | 92.8 | 83.3 | 78.8 | 76.3 | 86.5 | | 86.0 | 69.8 | 21.2 |
| num_control | 159504744 | 99.6 | 92.2 | 90.9 | 89.4 | 97.6 | | 98.9 | 85.5 | 25.8 |
| num_plasma | 35089600 | 75.2 | 0.7 | 0.7 | 84.5 | 77.3 | | 3.0 | 0.3 | 0.1 |
| obs_error | 62160816 | 78.7 | 81.0 | 77.5 | 84.4 | 87.9 | | 62.3 | 23.4 | 6.3 |
| obs_info | 18930528 | 92.3 | 75.4 | 70.6 | 82.4 | 81.7 | | 51.2 | 33.1 | 7.7 |
| obs_spitzer | 198180864 | 95.4 | 93.2 | 93.7 | 86.4 | 100.1 | | 102.4 | 78.0 | 26.9 |
| obs_temp | 39934272 | 100.4 | 93.1 | 93.8 | 91.7 | 98.0 | | 97.4 | 88.2 | 28.8 |

eTp4lzt = lossy compression with allowed error = 0.0001
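
To give an intuition for why an allowed error improves the ratio so much, here is a minimal sketch of one common lossy technique: clearing low mantissa bits of each float so that the relative error stays within a bound. This is an assumption for illustration only. The README does not say whether the allowed error is relative or absolute, nor how the library implements it, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Number of low mantissa bits of a float32 (23 mantissa bits) that can be
   cleared while keeping the relative error <= max_rel_err.
   Truncating to `keep` mantissa bits gives a relative error below 2^-keep. */
static unsigned droppable_bits32(double max_rel_err) {
    unsigned keep = 0;
    double bound = 1.0;
    while (bound > max_rel_err && keep < 23) {
        bound *= 0.5;
        keep++;
    }
    return 23 - keep;
}

/* Clear the `drop` lowest mantissa bits of x (truncation toward zero). */
static float truncate_mantissa32(float x, unsigned drop) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);                 /* well-defined type punning */
    uint32_t expo = (u >> 23) & 0xFFu;
    if (expo == 0 || expo == 0xFFu)
        return x;                             /* leave zero, denormals, Inf, NaN untouched */
    u &= ~((1u << drop) - 1u);
    memcpy(&x, &u, sizeof x);
    return x;
}
```

With a bound of 1e-4 this sketch clears the 9 lowest mantissa bits of every value. After a nibble or byte transpose, those cleared bits form long runs of zeros that an lz77 stage compresses very well.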
Compile:

    git clone https://github.com/powturbo/TurboTranspose.git
    cd TurboTranspose

Linux + Windows MinGW

    make

or

    make AVX2=1

Windows Visual C++

    nmake /f makefile.vs

or

    nmake AVX2=1 /f makefile.vs

Testing:
- benchmark "transpose" functions

      ./tpbench [-s#] [-z] file
      -s# = element size in bytes, # = 2,4,8,16,... (default 4)
      -z  = only lz77 compression benchmark (bitshuffle package mandatory)

Function usage:
Byte transpose:

    void tpenc(unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
    void tpdec(unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

    in    : input buffer
    n     : number of bytes
    out   : output buffer
    esize : element size in bytes (2,4,8,...)

Nibble transpose:

    void tp4enc(unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
    void tp4dec(unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

    in    : input buffer
    n     : number of bytes
    out   : output buffer
    esize : element size in bytes (2,4,8,...)

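
To make the effect of a byte transpose concrete, the following is a minimal scalar sketch with the same parameter order as `tpenc`/`tpdec`. It is not the library's implementation: `ref_tpenc` and `ref_tpdec` are hypothetical names, and the sketch assumes the conventional shuffle layout (byte k of every element gathered into plane k) with any trailing bytes that do not fill a whole element copied unchanged.

```c
/* Hypothetical scalar stand-in for a byte transpose (NOT the library code).
   Gathers byte k of every element into contiguous plane k. esize must be > 0. */
static void ref_tpenc(const unsigned char *in, unsigned n,
                      unsigned char *out, unsigned esize) {
    unsigned nelem = n / esize;               /* number of whole elements */
    unsigned i, k;
    for (i = 0; i < nelem; i++)
        for (k = 0; k < esize; k++)
            out[k * nelem + i] = in[i * esize + k];
    for (i = nelem * esize; i < n; i++)       /* tail: copy leftover bytes as-is */
        out[i] = in[i];
}

/* Inverse of ref_tpenc: scatter each plane back into interleaved elements. */
static void ref_tpdec(const unsigned char *in, unsigned n,
                      unsigned char *out, unsigned esize) {
    unsigned nelem = n / esize;
    unsigned i, k;
    for (i = 0; i < nelem; i++)
        for (k = 0; k < esize; k++)
            out[i * esize + k] = in[k * nelem + i];
    for (i = nelem * esize; i < n; i++)
        out[i] = in[i];
}
```

With the library linked, the equivalent call for an array of 32-bit floats would be `tpenc(in, n, out, 4)` followed by `tpdec(out, n, back, 4)`, per the signatures above. The input and output buffers must not overlap in this sketch.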
Environment:
OS/Compiler (64 bits):
- Linux: GNU GCC (>=4.6)
- Linux: Clang (>=3.2)
- Windows: MinGW-w64 makefile
- Windows: Visual C++ (>=VS2008) - makefile.vs (for nmake)
- Windows: Visual Studio project file - vs/vs2017 - Thanks to PavelP
- Linux ARM: 64 bits aarch64 ARMv8: gcc (>=6.3)
- Linux ARM: 64 bits aarch64 ARMv8: clang
Multithreading:
- All TurboTranspose functions are thread safe
Last update: 25 Oct 2019