LakeSoul Versions

LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.

2.1.0

v2.1.0 Release Notes

LakeSoul 2.1.0 brings a new Flink CDC sink implementation that can sync all tables (with different schemas) of an entire MySQL database in a single Flink job, with automatic schema synchronization and evolution, automatic creation of newly discovered tables, and an exactly-once guarantee. The currently supported Flink version is 1.14.

In 2.1.0 we also reimplemented the Spark catalog so that it can be used as a standalone catalog rather than as a session catalog extension. This change avoids some inconsistencies in Spark's v2 table commands; for example, SHOW TABLES does not support v2 tables until Spark 3.3.
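As a minimal sketch, registering the standalone catalog in Spark could look like the following. The class names and the catalog name `lakesoul` are assumptions based on common LakeSoul setups; verify them against your release.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register LakeSoul as a standalone catalog named "lakesoul".
// The class names below are assumptions; check the LakeSoul docs for the
// exact values shipped with your release.
val spark = SparkSession.builder()
  .appName("lakesoul-standalone-catalog")
  .config("spark.sql.extensions",
    "com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension")
  .config("spark.sql.catalog.lakesoul",
    "org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog")
  .getOrCreate()

// v2 table commands now run against the standalone catalog:
spark.sql("SHOW TABLES IN lakesoul.default").show()
```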

Packages for Spark and Flink are separated into two Maven submodules. The Maven coordinates are com.dmetasoul:lakesoul-spark:2.1.0-spark-3.1.2 and com.dmetasoul:lakesoul-flink:2.1.0-flink-1.14. All required transitive dependencies have already been shaded into the released jars.
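For sbt users, the same coordinates map directly onto library dependencies (a sketch; plain `%` is used because the published artifact ids carry no Scala suffix):

```scala
// build.sbt -- pulling in the shaded LakeSoul artifacts listed above.
libraryDependencies ++= Seq(
  "com.dmetasoul" % "lakesoul-spark" % "2.1.0-spark-3.1.2",
  "com.dmetasoul" % "lakesoul-flink" % "2.1.0-flink-1.14"
)
```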

Merged Pull Requests

New Contributors

Full Changelog: https://github.com/meta-soul/LakeSoul/commits/2.1.0

v2.0.1-spark-3.1.2

What's Changed

v2.0.0-spark-3.1.2

1. Catalog refactoring

  1. Replaced the Cassandra protocol with the PostgreSQL protocol
  2. Rewrote the metadata layer's table, partition, and data operations on top of the PostgreSQL protocol, using its transaction mechanism for commit conflict detection to guarantee ACID properties (see the sketch after this list)
  3. Added an interface layer between Spark and the metadata service that translates Spark metadata operations into the underlying API, separating the upper compute engine from the underlying metadata storage layer
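The commit conflict detection in item 2 can be illustrated with a conceptual sketch. This is not LakeSoul's actual metadata code; the table and column names (`partition_info`, `version`) are hypothetical, and the point is only how a PostgreSQL transaction can reject a commit whose base version is stale:

```scala
import java.sql.DriverManager

// Conceptual sketch only -- not LakeSoul's actual metadata code.
// A writer remembers the partition version it read; the commit succeeds only
// if that version is still current, otherwise the transaction rolls back and
// the writer retries. The transaction makes the check-and-bump atomic.
def tryCommit(tableId: String, partitionDesc: String, readVersion: Long): Boolean = {
  val conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/lakesoul_meta")
  conn.setAutoCommit(false)
  try {
    // Lock the (hypothetical) partition row so concurrent committers serialize here.
    val check = conn.prepareStatement(
      "SELECT version FROM partition_info WHERE table_id = ? AND partition_desc = ? FOR UPDATE")
    check.setString(1, tableId)
    check.setString(2, partitionDesc)
    val rs = check.executeQuery()
    val currentVersion = if (rs.next()) rs.getLong(1) else -1L

    if (currentVersion != readVersion) {
      conn.rollback()                  // someone else committed first: conflict
      false
    } else {
      val bump = conn.prepareStatement(
        "UPDATE partition_info SET version = version + 1 WHERE table_id = ? AND partition_desc = ?")
      bump.setString(1, tableId)
      bump.setString(2, partitionDesc)
      bump.executeUpdate()             // the new file list would be recorded here as well
      conn.commit()                    // accepted atomically
      true
    }
  } finally conn.close()
}
```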

2. DDL

  1. Adapted Spark SQL DDL statements (CREATE, ALTER, etc.) to the new catalog
  2. Adapted Spark DataFrame/Dataset DDL operations (save, etc.)
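A hedged sketch of the adapted DDL path. The `USING lakesoul` provider name and the DDL options shown are assumptions; consult the LakeSoul documentation for the exact syntax your release accepts:

```scala
// `spark` is a SparkSession configured with the LakeSoul extension/catalog
// (see the 2.1.0 notes above). Provider name and options are assumptions.
spark.sql("""
  CREATE TABLE IF NOT EXISTS user_events (
    user_id  BIGINT,
    event    STRING,
    event_dt STRING
  )
  USING lakesoul
  PARTITIONED BY (event_dt)
  LOCATION 's3://my-bucket/lakesoul/user_events'
""")

// Schema evolution through the same DDL path:
spark.sql("ALTER TABLE user_events ADD COLUMNS (country STRING)")
```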

3. Data Writing

  1. Adapted Spark SQL DML statements (INSERT INTO, UPDATE, etc.)
  2. Adapted Spark DataFrame/Dataset DML operations (the write API, etc.)
  3. Reworked the LakeSoulTable upsert function
  4. Reworked the LakeSoulTable compaction function, with support for mounting compacted data as a Hive table
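A hedged sketch of the upsert and compaction entry points mentioned above. The import path `com.dmetasoul.lakesoul.tables.LakeSoulTable` and the method names are assumptions and may differ between releases:

```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable   // assumed import path

// `spark` is an existing SparkSession with LakeSoul configured.
val tablePath = "s3://my-bucket/lakesoul/user_events"
val changes   = spark.read.parquet("s3://my-bucket/staging/user_events_delta")

val table = LakeSoulTable.forPath(tablePath)   // assumed entry point
table.upsert(changes)      // merge the batch by the table's primary/hash key
table.compaction()         // compact small files; Hive mounting, if supported,
                           // would be requested here
```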

4. Data Reading

  1. Adapted the various ParquetScan implementations, removed the write-version sorting mechanism, and adapted to the new UUID-based metadata file list format
  2. Added a snapshot read to LakeSoulTable to read historical data of a specified partition at a given version
  3. Added a rollback function to LakeSoulTable to roll back a specified partition to a historical version
  4. Added and revised the default MergeOperator to make it easier for users to work with merge results
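A hedged sketch of the snapshot read and partition rollback features. The reader option names and the rollback method are assumptions intended only to show the shape of the API:

```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable   // assumed import path

// Option and method names below are assumptions, not the authoritative API.

// Snapshot read: load a partition as of a point in time.
val snapshot = spark.read.format("lakesoul")
  .option("partitionDesc", "event_dt=2022-06-01")   // assumed option name
  .option("readEndTime", "2022-06-01 12:00:00")     // assumed option name
  .option("readType", "snapshot")                   // assumed option name
  .load("s3://my-bucket/lakesoul/user_events")

// Rollback: restore a partition to an earlier state (assumed method name).
LakeSoulTable.forPath("s3://my-bucket/lakesoul/user_events")
  .rollbackPartition("event_dt=2022-06-01", "2022-06-01 12:00:00")
```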