LakeSoul Versions

LakeSoul is an end-to-end, real-time and cloud-native Lakehouse framework with fast data ingestion, concurrent updates and incremental data analytics on cloud storage for both BI and AI applications.

2.1.0

v2.1.0 Release Notes

LakeSoul 2.1.0 brings a new Flink CDC sink implementation that can sync all tables (with different schemas) of an entire MySQL database in a single Flink job, with automatic schema synchronization and evolution, automatic creation of newly discovered tables, and an exactly-once guarantee. The currently supported Flink version is 1.14.

In 2.1.0 we also reimplemented the Spark catalog so that it can be used as a standalone catalog rather than as a session catalog extension. This change avoids some inconsistencies in Spark's v2 table commands; for example, SHOW TABLES does not support v2 tables until Spark 3.3.
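As a minimal sketch, registering the standalone catalog in Spark could look like the following. The class names and the catalog name `lakesoul` are assumptions based on common LakeSoul setups; verify them against your release.

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register LakeSoul as a standalone catalog named "lakesoul".
// The class names below are assumptions; check the LakeSoul docs for the
// exact values shipped with your release.
val spark = SparkSession.builder()
  .appName("lakesoul-standalone-catalog")
  .config("spark.sql.extensions",
    "com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension")
  .config("spark.sql.catalog.lakesoul",
    "org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog")
  .getOrCreate()

// v2 table commands now run against the standalone catalog:
spark.sql("SHOW TABLES IN lakesoul.default").show()
```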

Packages for Spark and Flink are separated into two Maven submodules. The Maven coordinates are com.dmetasoul:lakesoul-spark:2.1.0-spark-3.1.2 and com.dmetasoul:lakesoul-flink:2.1.0-flink-1.14. All required transitive dependencies have already been shaded into the released jars.
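For sbt users, the same coordinates map directly onto library dependencies (a sketch; plain `%` is used because the published artifact ids carry no Scala suffix):

```scala
// build.sbt -- pulling in the shaded LakeSoul artifacts listed above.
libraryDependencies ++= Seq(
  "com.dmetasoul" % "lakesoul-spark" % "2.1.0-spark-3.1.2",
  "com.dmetasoul" % "lakesoul-flink" % "2.1.0-flink-1.14"
)
```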

Merged Pull Requests

New Contributors

Full Changelog: https://github.com/meta-soul/LakeSoul/commits/2.1.0

v2.0.1-spark-3.1.2

What's Changed

v2.0.0-spark-3.1.2

1. Catalog refactoring

  1. Replaced the Cassandra protocol with the PostgreSQL protocol
  2. Rewrote the metadata layer's table, partition, and data operations on top of the PostgreSQL protocol, using its transaction mechanism for commit conflict detection to guarantee ACID properties (see the sketch after this list)
  3. Added an interface layer between Spark and the metadata service that translates Spark metadata operations into the underlying API, separating the upper compute engine from the underlying metadata storage layer
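The commit conflict detection in item 2 can be illustrated with a conceptual sketch. This is not LakeSoul's actual metadata code; the table and column names (`partition_info`, `version`) are hypothetical, and the point is only how a PostgreSQL transaction can reject a commit whose base version is stale:

```scala
import java.sql.DriverManager

// Conceptual sketch only -- not LakeSoul's actual metadata code.
// A writer remembers the partition version it read; the commit succeeds only
// if that version is still current, otherwise the transaction rolls back and
// the writer retries. The transaction makes the check-and-bump atomic.
def tryCommit(tableId: String, partitionDesc: String, readVersion: Long): Boolean = {
  val conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/lakesoul_meta")
  conn.setAutoCommit(false)
  try {
    // Lock the (hypothetical) partition row so concurrent committers serialize here.
    val check = conn.prepareStatement(
      "SELECT version FROM partition_info WHERE table_id = ? AND partition_desc = ? FOR UPDATE")
    check.setString(1, tableId)
    check.setString(2, partitionDesc)
    val rs = check.executeQuery()
    val currentVersion = if (rs.next()) rs.getLong(1) else -1L

    if (currentVersion != readVersion) {
      conn.rollback()                  // someone else committed first: conflict
      false
    } else {
      val bump = conn.prepareStatement(
        "UPDATE partition_info SET version = version + 1 WHERE table_id = ? AND partition_desc = ?")
      bump.setString(1, tableId)
      bump.setString(2, partitionDesc)
      bump.executeUpdate()             // the new file list would be recorded here as well
      conn.commit()                    // accepted atomically
      true
    }
  } finally conn.close()
}
```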

2. DDL

  1. Adapted Spark SQL DDL statements (CREATE, ALTER, etc.) to the new catalog
  2. Adapted Spark DataFrame/Dataset DDL operations (save, etc.)
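A hedged sketch of the adapted DDL path. The `USING lakesoul` provider name and the DDL options shown are assumptions; consult the LakeSoul documentation for the exact syntax your release accepts:

```scala
// `spark` is a SparkSession configured with the LakeSoul extension/catalog
// (see the 2.1.0 notes above). Provider name and options are assumptions.
spark.sql("""
  CREATE TABLE IF NOT EXISTS user_events (
    user_id  BIGINT,
    event    STRING,
    event_dt STRING
  )
  USING lakesoul
  PARTITIONED BY (event_dt)
  LOCATION 's3://my-bucket/lakesoul/user_events'
""")

// Schema evolution through the same DDL path:
spark.sql("ALTER TABLE user_events ADD COLUMNS (country STRING)")
```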

3. Data Writing

  1. Adapted Spark SQL DML statements (INSERT INTO, UPDATE, etc.)
  2. Adapted Spark DataFrame/Dataset DML operations (the write API, etc.)
  3. Reworked the LakeSoulTable upsert function
  4. Reworked the LakeSoulTable compaction function, with support for mounting compacted data as a Hive table
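A hedged sketch of the upsert and compaction entry points mentioned above. The import path `com.dmetasoul.lakesoul.tables.LakeSoulTable` and the method names are assumptions and may differ between releases:

```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable   // assumed import path

// `spark` is an existing SparkSession with LakeSoul configured.
val tablePath = "s3://my-bucket/lakesoul/user_events"
val changes   = spark.read.parquet("s3://my-bucket/staging/user_events_delta")

val table = LakeSoulTable.forPath(tablePath)   // assumed entry point
table.upsert(changes)      // merge the batch by the table's primary/hash key
table.compaction()         // compact small files; Hive mounting, if supported,
                           // would be requested here
```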

4. Data Reading

  1. Adapted the various ParquetScan implementations, removed the write-version sorting mechanism, and adapted to the new UUID-based metadata file list format
  2. Added a snapshot read to LakeSoulTable to read historical data of a specified partition at a given version
  3. Added a rollback function to LakeSoulTable to roll back a specified partition to a historical version
  4. Added and revised the default MergeOperator to make it easier for users to work with merge results
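A hedged sketch of the snapshot read and partition rollback features. The reader option names and the rollback method are assumptions intended only to show the shape of the API:

```scala
import com.dmetasoul.lakesoul.tables.LakeSoulTable   // assumed import path

// Option and method names below are assumptions, not the authoritative API.

// Snapshot read: load a partition as of a point in time.
val snapshot = spark.read.format("lakesoul")
  .option("partitionDesc", "event_dt=2022-06-01")   // assumed option name
  .option("readEndTime", "2022-06-01 12:00:00")     // assumed option name
  .option("readType", "snapshot")                   // assumed option name
  .load("s3://my-bucket/lakesoul/user_events")

// Rollback: restore a partition to an earlier state (assumed method name).
LakeSoulTable.forPath("s3://my-bucket/lakesoul/user_events")
  .rollbackPartition("event_dt=2022-06-01", "2022-06-01 12:00:00")
```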