OAP Versions

Optimized Analytics Package for Spark* Platform

v0.8.4-spark-2.4.4

3 years ago

OAP 0.8.4-Spark-2.4.4 is the fourth maintenance release based on the branch-0.8-spark-2.4.x of OAP. Compared with 0.8.3-Spark-2.4.4, this release recommends the Plasma backend as the cache strategy for SQL Data Source Cache, decouples SQL Data Source Cache from the Spark source code, fixes bugs, and enhances stability.

What’s New in Version 0.8.4?

Features

#1865 [OAP-CACHE] Decouple Spark code, including DataSourceScanExec.scala, OneApplicationResource.scala, VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java, for OAP-0.8.4
#1813 [OAP-cache] Package redis client jar into oap-cache

Bugs Fixed

#2044 [OAP-CACHE] Build error due to synchronizedSet on branch 0.8
#2027 [oap-shuffle] Should load native library from jar directly
#1981 [OAP-CACHE] Error running q32 binary cache
#1980 [SDLe][RPMem-Shuffle]Issues from Static Code Analysis with Klocwork need to be fixed
#1828 OAP PmofShuffleManager log info error
#1918 [OAP-CACHE] Plasma throws exception: get an invalid value - branch 0.8

Credit

Thanks to everyone who contributed to OAP development and to those who provided the necessary support. Your effort made this release happen, and we look forward to your continued contributions to future releases.

v1.0.0-spark-3.0.0

3 years ago

OAP 1.0.0 is the third major release after OAP has become an umbrella product. This release mainly optimized Native SQL Engine and Intel MLlib performance, decoupled SQL Data Source Cache from the Spark source code, fixed bugs, and enhanced stability.

What’s New in Version 1.0.0?

Features

#1588 [OAP-CACHE] Make Parquet file splittable
#1337 [oap-cache] Discard OAP data format
#1679 [OAP-CACHE] Remove the code related to reading and writing OAP data format
#1680 [OAP-CACHE] Decouple Spark code, including FileFormatDataWriter, FileFormatWriter and OutputWriter
#1846 [oap-native-sql] spark sql unit test
#1811 [OAP-cache] Provide one-step starting scripts for services like plasma-server and redis-server
#1519 [oap-native-sql] upgrade cmake
#1835 [oap-native-sql] Support ColumnarBHJ to Build and Broadcast HashRelation in driver side
#1848 [OAP-CACHE] Decouple Spark code, including OneApplicationResource.scala
#1824 [OAP-CACHE] Decouple Spark code, including DataSourceScanExec.scala
#1838 [OAP-CACHE] Decouple Spark code, including VectorizedColumnReader.java, VectorizedPlainValuesReader.java, VectorizedRleValuesReader.java and OnHeapColumnVector.java
#1839 [oap-native-sql] Add prefetch to columnar shuffle split
#1756 [Intel MLlib] Add Kmeans "tolerance" support and test cases
#1818 [OAP-Cache] Make Spark webUI OAP Tab more user friendly
#1831 [oap-native-sql] ColumnarWindow: Support reusing same window spec in multiple functions
#1765 [oap-native-sql] Support WSCG in nativesql
#1517 [oap-native-sql] implement SortMergeJoin
#1654 [oap-native-sql] Columnar shuffle TPCDS enabling
#1700 [oap-native-sql] Support inside join condition project
#1717 [oap-native-sql] support null in columnar literal and subquery
#1704 [oap-native-sql] Add ColumnarUnion and ColumnarExpand
#1647 [oap-native-sql] row to columnar for decimal
#1638 [oap-native-sql] adding full TPC-DS support
#1498 [oap-native-sql] stddev_samp support
#1547 [oap-native-sql] adding metrics for input/output batches

Performance

#1955 [OAP-CACHE] Plasma shows lower performance compared with vanilla Spark
#2023 [OAP-MLlib] Use oneAPI official release instead of beta versions
#1829 [oap-native-sql] Optimize columnar shuffle and option to use AVX512
#1734 [oap-native-sql] use non-codegen for sort with one key
#1706 [oap-native-sql] Optimize columnar shuffle write

Bugs Fixed

#2054 [OAP-MLlib] Failed to run Intel MLlib after updating the version of oneAPI
#2012 [SQL Data Source Cache] The task will be suspended when using plasma cache.
#1640 [SQL Data Source Cache] The task will be suspended when using plasma cache and starting 2 executors per worker.
#2028 [OAP-Cache] When using Plasma, the cache metrics on the Spark webUI OAP Tab are not right
#1979 [SDLe][native-sql-engine] Issues from Static Code Analysis with Klocwork need to be fixed
#1938 [oap-native-sql] Stability test failed when running TPCH for 10 rounds.
#1924 [OAP-CACHE] Decouple heartbeat message and use conf to determine whether to report locality information
#1921 [SDLe][rpmem-shuffle] The master branch and branch-1.0-spark-3.0 can't pass BDBA analysis with libsqlitejdbc dependency.
#1743 [oap-native-sql] Error not reported when creating CodeGenerator instance
#1864 [oap-native-sql] hash conflict in hashagg
#1934 [oap-native-sql] backport to 1.0
#1929 [oap-native-sql] memleak in non-codegen aggregate
#1907 [OAP-cache] Cannot find the class of redis-client
#1742 [oap-native-sql] SortArraysToIndicesKernel: incorrect null ordering with multiple sort keys
#1854 [oap-native-sql] Fix columnar shuffle file not deleted
#1844 [oap-native-sql] Fix columnar shuffle spilled file not deleted
#1580 [oap-native-sql] Hash Collision in multiple keys scenario
#1754 [Intel MLlib] Improve LibLoader creating temp dir name with UUID
#1825 Fail to run PMemBlockPlatformTest when building oap cache
#1815 [oap-native-sql] Memory management: Error on task end if there are unclosed child allocators
#1808 [oap-native-sql] ColumnarWindow: Memory leak on converting input/output batches
#1806 [oap-native-sql] Fix Columnar Shuffle Memory Leak
#1783 [oap-native-sql] ColumnarWindow: Rank() returns wrong result when input row number >= 65536
#1776 [oap-native-sql] memory leakage in native code
#1760 [oap-native-sql] fix columnar sorting on string
#1733 [oap-native-sql] TPCH Q18 memory leakage
#1694 [oap-native-sql] TPC-H q15 failed for ConditionedProbeArraysVisitorImpl MakeResultIterator does not support dependency type other than Batch
#1682 [oap-native-sql] fix aggregate without codegen
#1707 [oap-native-sql] Fix collect batch metric
#1669 [oap-native-sql] TPCH Q1 results are not correct w/ hashagg codegen off
#1629 [oap-native-sql] clean up building steps
#1602 [oap-native-sql] rework copyfromjar function
#1599 [oap-native-sql] Columnar BHJ fail on TPCH-Q15

Credit

Thanks to everyone who contributed to OAP development and to those who provided the necessary support. Your effort made this release happen, and we look forward to your continued contributions to future releases.

v0.8.3-spark-2.4.4

3 years ago

OAP 0.8.3-Spark-2.4.4 is the third maintenance release based on the branch-0.8-spark-2.4.x of OAP. Compared with 0.8.2-Spark-2.4.4, this release supports dynamically turning on/off the SQL Data Source Cache feature. In addition, we refined documentation & scripts for better user experience.

What’s New in Version 0.8.3?

Feature improvements

[#1731] Add OAP cache runtime enable

Documentation & Scripts

[#1698] Highlight that Spark with the NUMA patch should be built against the correct Hadoop version for the user environment
[#1762] Add doc for disabling OAP cache at runtime
[#1758] Update build scripts
[#1764] Update docs for OAP 0.8.3

v0.9.0-spark-3.0.0

3 years ago

OAP 0.9.0 is the second major release after OAP has become an umbrella product. This release mainly adds three features (Intel MLlib, Unified Arrow Data Source, and Native SQL Engine), supports Apache Spark 3.0.0 for all OAP modules, and offers a Conda package solution to help users automatically build and install OAP.

What’s New in Version 0.9.0?

New features

Intel MLlib

[#1405] Add Intel MLlib to OAP in oap-mllib directory, including oneDAL and oneCCL-based optimized K-Means, Scala and Python version examples and K-Means benchmark for HiBench datasets
[#1454] Add failonerror to maven-antrun-plugin build-natives and change CCL include and lib dir for oneapi beta07 and change to dynamic links (libccl.a dep on icc)
[#1470] Package all oneCCL libs into jar and change related loading logic
[#1495] Support using a batch iterator to set data from JVM to native
[#1496] Combine multiple numeric tables into one in the executor process
[#1570] Optimize the toArray operator in the iterator batch
[#1571] Remove unnecessary stages for kmeans-hibench
[#1575] Kmeans-hibench default init to Random, remove daal from example pom
[#1664] Add platform compatibility check
[#1702] Fix Intel-MLlib UnsatisfiedLinkError: org.apache.spark.ml.util.OneDAL$.cAddNumericTable

Unified Arrow Data Source and Native SQL Engine

[#1327][#1413][#1431][#1436] Add Unified Arrow Data Source and Native SQL Engine to OAP
[#1603] Fix extract resource from jar

Other feature improvements and code refinements

SQL Index and Data Source Cache feature

[#1310][#1412] SQL Index and Data Source Cache feature supports Spark-3.0.0
[#1123] SQL Index supports building index for different partitions concurrently
[#1438] Fix OAP not applying ReuseExchange optimization
[#1456] Remove unnecessary memcopy in plasma code path
[#1550] Support DPP for SQL Data Source Cache
[#1229] Cache backend fallback

Remote Shuffle

[#1356] Remote shuffle manager for Spark-3.0.0
[#1424] Shuffle reader batch fetch
[#1451] Spark-3.0.0 support followup

Shuffle Remote PMem Extension

[#1432] Integrate RPMem shuffle into OAP 0.9
[#1491] Bug fix to support decision support query
[#1606] Remove Eclipse IDE's preference settings
[#1542] Add shuffle block removing operation in one Spark context
[#1692] Resolve vulnerabilities that prevented SDL scanning from completing successfully
[#1746] Load the Java-side native library directly from the jar

RDD Cache PMem Extension

[#1409] Support Spark-3.0.0 for RDD Cache

Conda Package for OAP

[#1272][#1352][#1443][#1507][#1650][#1703][#1718] Provide a Conda package to help users automatically install and build OAP

Credit

This release would not have been possible without the following contributors: @carsonwang(Wang Carson, Intel) @Eugene-Mark(Ma Eugene, Intel) @gczsjdy (Guo Chenzhao, Intel) @haojinIntel(Jin Hao, Intel) @HongW2019(Wang Hong, Intel) @jerrychenhf(Chen Haifeng, Intel) @Jian-Zhang(Zhang Jian, Intel) @JiayiChen785 @jikunshang(Ji Kunshang, Intel) @JkSelf(Jia Ke, Intel) @jovany-wang(Wang Qing, Baidu) @lidinghao (Li Hao, Baidu) @LuciferYang (Yang Jie, Baidu) @offthewall123(Xu Dingyu, Intel) @rongma1997(Ma Rong, Intel) @rui-mo(Mo Rui, Intel) @winningsix(Xu Cheng, Intel) @xuechendi(Xue Chendi, Intel) @xwu99(Wu Xiaochang, Intel) @yao531441(Yao Qing, Intel) @yeyuqiang (Ye Yuqiang, Intel) @yma11(Ma Yan, Intel) @zhixingheyi-tian (Shen Xiangxiang, Intel) @zhouyuan(Zhou Yuan, Intel) @zhztheplayer(Zhang Hongze, Intel)

v0.8.2-spark-2.4.4

3 years ago

OAP 0.8.2-Spark-2.4.4 is the second maintenance release based on the branch-0.8-spark-2.4.x of OAP. Compared with 0.8.1-Spark-2.4.4, this release fixed the conflict between multiple module jars in the classpath. For Shuffle Remote PMem Extension, we resolved the SparkContext resource release issue. In addition, we optimized Plasma backend performance, fixed the issue of ReuseExchange being unavailable, improved the OAP tab in the web UI, and refined documentation and scripts for a better user experience.

What’s New in Version 0.8.2?

Feature improvements

[#1478] Decouple Spark source code from OAP SQL Index and Data Source Cache
[#1472] Initialize PMem with AppDirect mode and KMemDax mode in block manager
[#1456] Optimize Plasma usage and add Plasma memory manager
[#1485] Modify Spark OAP tab web UI

BugFix

[#1612] Add shuffle block removing operation within one Spark context
[#1480] Set NUMA id in spark conf instance
[#1513] Fix oap-perf-suite to pass oap-cache daily test due to SparkEnv.scala decouple
[#1534] Fix the issue of OAP not applying ReuseExchange optimization
[#1585] Fix installation scripts for users

Credit

This release would not have been possible without the following contributors: @Eugene-Mark(Ma Eugene, Intel) @haojinIntel(Jin Hao, Intel) @HongW2019(Wang Hong, Intel) @jerrychenhf(Chen Haifeng, Intel) @jikunshang(Ji Kunshang, Intel) @lidinghao (Li Hao, Baidu) @LuciferYang (Yang Jie, Baidu) @winningsix(Xu Cheng, Intel) @yao531441(Yao Qing, Intel) @yeyuqiang (Ye Yuqiang, Intel) @zhixingheyi-tian (Shen Xiangxiang, Intel)

v0.8.1-spark-2.4.4

3 years ago

OAP 0.8.1-Spark-2.4.4 is a maintenance release based on the branch-0.8-spark-2.4.x of OAP. Compared with 0.8.0-Spark-2.4.4, this release introduces one new feature, Shuffle Remote PMem Extension, so OAP 0.8.1-Spark-2.4.4 now includes three major PMem-related features; the other two are SQL Data Source Cache and RDD Cache PMem Extension. In addition, we restructured the entire documentation framework for a better user experience.

What’s New in Version 0.8.1?

New features

Shuffle Remote PMem Extension

[#1375] Introduce Shuffle Remote PMem Extension to OAP
[#1383][#1386] Bug fix for Spark2.4.4
[#1391][#1398] Modify pom.xml to add RPMem shuffle as a sub project and rename RPMem shuffle jar file
[#1396] Distribute jnipmdk dll through jar

Credit

This release would not have been possible without the following contributors: @Eugene-Mark(Ma Eugene, Intel) @haojinIntel(Jin Hao, Intel) @HongW2019(Wang Hong, Intel) @jerrychenhf(Chen Haifeng, Intel) @Jian-Zhang (Zhang Jian, Intel) @jikunshang(Ji Kunshang, Intel) @winningsix(Xu Cheng, Intel) @xuechendi(Xue Chendi, Intel) @yao531441(Yao Qing, Intel) @yeyuqiang (Ye Yuqiang, Intel) @yma11(Ma Yan, Intel) @zhixingheyi-tian (Shen Xiangxiang, Intel)

v0.8.0-spark-2.4.4

3 years ago

OAP 0.8.0 is the seventh major release and is also the first major release after OAP has become an umbrella product. This release mainly adds two features, Remote-Shuffle and Plasma external cache service, introduces a Beta feature, RDD cache PMem extension, and supports DAX KMEM mode on DCPMM.

What’s New in Version 0.8.0?

New features

Remote-Shuffle

[#1156] Add Remote Shuffle to OAP
[#1172] Introduce a new performance evaluation tool that starts multiple threads to saturate the I/O bottleneck, and deprecate the old micro-benchmark
[#1206][#1233] Refactor remote shuffle and allow producing a test jar with dependencies
[#1239] Disable hash-based shuffle writer by default and fix the hash shuffle writer to return the right ID when there are no records
[#1234] Reuse file handle in reduce stage

Plasma external cache service

[#1200] Support Plasma and introduce external cache strategy
[#1248] Fix a bug on external cache initialization throwing 'unsupported cache' exception
[#1260] Fix plasma metrics issue on WebUI
[#1284] Update arrow-plasma version and fix API change
[#1300][#1318] Change arrow-plasma groupId and version
[#1317] Add a configuration to decide whether to cache data when using external cache

Beta feature RDD cache PMem extension

[#1346] Move RDD cache PMem extension into OAP
[#1349] Fix NUMA changes in RDD Cache

Support DAX KMEM mode

[#1210] Support DAX KMEM mode

New module oap-common

[#1218] Rename common package to OAP-Common

Other feature improvements and refinement

[#1229] Add cacheFallBackDetect to enable cache backend fallback and add detectPM function
[#1224][#1287][#1302][#1325][#1339] Add scripts to help users build all OAP modules and dependencies needed

Credit

This release would not have been possible without the following contributors: @gczsjdy(Guo Chenzhao, Intel) @haojinIntel(Jin Hao, Intel) @HongW2019(Wang Hong, Intel) @intelkevinputnam(Kevin Putnam, Intel) @jerrychenhf(Chen Haifeng, Intel) @jikunshang(Ji Kunshang, Intel) @offthewall123(Xu Dingyu, Intel) @winningsix(Xu Cheng, Intel) @yao531441(Yao Qing, Intel) @yeyuqiang (Ye Yuqiang, Intel) @zhixingheyi-tian (Shen Xiangxiang, Intel)

v0.7.0-spark-2.4.4

4 years ago

OAP 0.7.0 is the sixth major release. This release supports Binary and ColumnVector cache for Parquet & ORC, introduces vmemcache and non-evictable cache strategy on DCPMM, refactors cache management framework, enables Index/Data cache separation with different cache strategies on the same or different cache media, and uses an independent OAP cache pool for DRAM. This release has great performance improvements with Binary + Vmemcache strategy. Other major updates include adding the cluster performance test suite, decoupling 3 source code files from Spark, and addressing index issues in production.

What’s New in Version 0.7.0?

New features

Refactor cache management framework

[#1113] Refactor OAP MemoryManager and FiberCacheManager

Support Binary Cache for Parquet

[#1118] Add new DataFiberId named BinaryDataFiberId, rename the old DataFiberId to VectorDataFiberId
[#1121] Import ParquetFileReader source code with version 1.10.1
[#1125] Introduce ParquetCacheableFileReader to support caching BinaryDataFiber
[#1133] Enable Parquet binary cache to work in queries

Support Binary Cache for ORC

[#1144] [#1146] Derive subclasses from RecordReaderImpl and RecordReaderUtils for implementing ORC Binary Cache
[#1148] Enable ORC binary cache to work in queries
[#1157] [#1170] Make RequestedColIds work in ORC Binary Cache and ORC Index, and fix the ORC BinaryCache FiberCache release issue

Introduce vmemcache and non-evictable cache strategy based on DCPMM

[#1137] Introduce vmemcache and non-evictable cache strategy based on DCPMM
[#1152] Fix the issue of the index not working when vmemcache is enabled
[#1179] [#1197] [#1204] [#1242] Initialize MemoryManager based on cache strategy
[#1174] [#1192] [#1195] [#1199] Fix vmemcache issue on cache guardian using too much memory

Support Index and Data cache separation with different cache backend

[#1159] Enable index/data cache separation with different cache backend, and add mix cache backend

Use independent OAP cache pool for DRAM

[#1117] Make the OAP DRAM cache independent of Spark offHeap for better cache management and avoiding Spark Executor launch issues

Bump up Guava version to 18.0

[#1147] Bump up Guava version to 18.0

Add cluster performance test suite

[#1181] Add cluster performance test suite oap-perf-suite to OAP, which can be run as functionality tests for OAP

Other feature improvements and code refinement

[#1136] Lazily initialize the input stream of Parquet file reader to avoid unnecessary filesystem operations
[#1126] Make DropIndexCommand execute concurrently on the Executor side to improve performance
[#1140] Call OrcFilters.createFilter directly and remove OrcFiltersAdapter

Decouple 3 source code files from Spark

[#1128] [#1130] Remove SqlBase.g4 which relies on the source code of Spark SQL. Inject custom SqlParser on OapExtensions class and add a new OapSqlBase.g4 file which only contains the grammar of OAP
[#1120] Decouple HiveThriftServer2 and SparkSQLCLIDriver from Spark

Credit

This release would not have been possible without the following contributors:

@HongW2019(Wang Hong, Intel) @haojinIntel(Jin Hao, Intel) @intelkevinputnam(Kevin Putnam, Intel) @ivoson (Huang Tengfei, Baidu) @jerrychenhf(Chen Haifeng, Intel) @jikunshang(Ji Kunshang, Intel) @LuciferYang (Yang Jie, Baidu) @lidinghao (Li Hao, Baidu) @linnashuang (Shuang Linna, Intel) @offthewall123(Xu Dingyu, Intel) @winningsix(Xu Cheng, Intel) @yeyuqiang (Ye Yuqiang, Intel) @yma11(Ma Yan, Intel) @zhixingheyi-tian (Shen Xiangxiang, Intel)

v0.7.0-spark-2.4.4-rc1

4 years ago

v0.6.1-spark-2.4.0-cdh6.3.1-rc1

4 years ago