An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as language APIs.
We are excited to announce the release of Delta Lake 3.1.0. This release includes several exciting new features.
Details for each component are below.
Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
The key features of this release are:
Other notable changes include:
This release of Delta adds a new module called delta-sharing-spark, which enables reading Delta tables shared using the Delta Sharing protocol in Apache Spark™. The module has been migrated from the https://github.com/delta-io/delta-sharing/tree/main/spark repository to https://github.com/delta-io/delta/tree/master/sharing. The last version of delta-sharing-spark released from the previous location is 1.0.4; starting with this release, delta-sharing-spark is versioned together with Delta itself, i.e. 3.1.0.
Supported read types are: reading a snapshot of the table, incrementally reading the table using streaming, and reading the changes (Change Data Feed) between two versions of the table.
“Delta Format Sharing” is newly introduced in delta-sharing-spark 3.1; it supports reading shared Delta tables with advanced Delta features such as deletion vectors and column mapping.
Below is an example of reading a Delta table shared using the Delta Sharing protocol in a Spark environment. For more examples refer to the documentation.
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("...")
.master("...")
.config(
"spark.sql.extensions",
"io.delta.sql.DeltaSparkSessionExtension"
).config(
"spark.sql.catalog.spark_catalog",
"org.apache.spark.sql.delta.catalog.DeltaCatalog"
).getOrCreate()
val tablePath = "<profile-file-path>#<share-name>.<schema-name>.<table-name>"
// Batch query
spark.read
.format("deltaSharing")
.option("responseFormat", "delta")
.load(tablePath)
.show(10)
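The `tablePath` string above packs four coordinates into one value: the profile file path before the `#`, then the share, schema, and table names separated by dots. As an illustration of that convention, here is a small parser; the helper itself is hypothetical and not part of delta-sharing-spark:

```python
# Hypothetical helper illustrating the Delta Sharing table-path convention:
# "<profile-file-path>#<share-name>.<schema-name>.<table-name>"

def parse_sharing_path(table_path: str) -> dict:
    """Split a Delta Sharing table path into its four components."""
    profile, sep, coordinates = table_path.partition("#")
    if not sep:
        raise ValueError("expected '<profile>#<share>.<schema>.<table>'")
    parts = coordinates.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected '<share>.<schema>.<table>' after '#'")
    share, schema, table = parts
    return {"profile": profile, "share": share, "schema": schema, "table": table}
```

For example, `parse_sharing_path("/tmp/profile.share#sales.emea.orders")` yields the profile path plus the three-level table coordinate.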
Delta Universal Format (UniForm) allows you to read Delta tables from Iceberg and Hudi (coming soon) clients. Delta 3.1.0 provides the following improvements:
- Support for the LIST and MAP data types, improving compatibility with popular Iceberg reader clients.

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.
- Support for column mapping id mode. Tables with column mapping id mode can now be read by Kernel.
- Support for slf4j logging.

For more information, refer to:
The key features of this release are
There are no updates to Standalone in this release.
Ala Luszczak, Allison Portis, Ami Oka, Amogh Akshintala, Andreas Chatzistergiou, Bart Samwel, BjarkeTornager, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Dhruv Arya, EJ Song, Eric Maynard, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Haejoon Lee, Hao Jiang, Jared Wang, Jiaheng Tang, Jing Wang, Johan Lasperas, Kaiqi Jin, Kam Cheung Ting, Lars Kroll, Li Haoyi, Lin Zhou, Lukas Rupprecht, Mark Jarvin, Max Gekk, Ming DAI, Nick Lanham, Ole Sasse, Paddy Xu, Patrick Leahey, Peter Toth, Prakhar Jain, Renan Tomazoni Pinzon, Rui Wang, Ryan Johnson, Sabir Akhadov, Scott Sandre, Serge Rielau, Shixiong Zhu, Tathagata Das, Thang Long Vu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wei Luo, Wenchen Fan, Xin Zhao, jintao shen, panbingkun
We are excited to announce the preview release of Delta Lake 3.1.0. This release includes several exciting new features.
Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
A delta-spark PyPI preview artifact is also available (see the artifacts section below on how to install it locally). The key features of this release are:
This release of Delta adds a new module called delta-sharing-spark, which enables reading Delta tables shared using the Delta Sharing protocol in Apache Spark™. The module has been migrated from the https://github.com/delta-io/delta-sharing/tree/main/spark repository to https://github.com/delta-io/delta/tree/master/sharing. The last version of delta-sharing-spark released from the previous location is 1.0.4; starting with this release, delta-sharing-spark is versioned together with Delta itself, i.e. 3.1.0.
Supported read types are: reading a snapshot of the table, incrementally reading the table using streaming, and reading the changes (Change Data Feed) between two versions of the table.
“Delta Format Sharing” is newly introduced in delta-sharing-spark 3.1; it supports reading shared Delta tables with advanced Delta features such as deletion vectors and column mapping.
Below is an example of reading a Delta table shared using the Delta Sharing protocol in a Spark environment. For more examples refer to the documentation.
import org.apache.spark.sql.SparkSession
val spark = SparkSession
.builder()
.appName("...")
.master("...")
.config(
"spark.sql.extensions",
"io.delta.sql.DeltaSparkSessionExtension"
).config(
"spark.sql.catalog.spark_catalog",
"org.apache.spark.sql.delta.catalog.DeltaCatalog"
).getOrCreate()
val tablePath = "<profile-file-path>#<share-name>.<schema-name>.<table-name>"
// Batch query
spark.read
.format("deltaSharing")
.option("responseFormat", "delta")
.load(tablePath)
.show(10)
Delta Universal Format (UniForm) allows you to read Delta tables from Iceberg and Hudi (coming soon) clients. Delta 3.1.0 provides the following improvements:
- Support for the LIST and MAP data types, improving compatibility with popular Iceberg reader clients.
- UniForm information is now shown in DESCRIBE EXTENDED TABLE.
- Support for REORG TABLE table APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2)) on existing Delta tables.

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.

Delta 3.0.0 released the first version of Kernel. In this release, read support is further enhanced and the APIs are solidified based on feedback from connectors that tried out the first version in Delta 3.0.0.
For more information, refer to:
Delta-Flink 3.1.0 is built on top of Apache Flink™ 1.16.1.
The key features of this release are
There are no updates to Delta Standalone in this release.
Ala Luszczak, Allison Portis, Ami Oka, Amogh Akshintala, Andreas Chatzistergiou, Bart Samwel, BjarkeTornager, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Dhruv Arya, EJ Song, Eric Maynard, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Haejoon Lee, Hao Jiang, Jared Wang, Jiaheng Tang, Jing Wang, Johan Lasperas, Kaiqi Jin, Kam Cheung Ting, Lars Kroll, Li Haoyi, Lin Zhou, Lukas Rupprecht, Mark Jarvin, Max Gekk, Ming DAI, Nick Lanham, Ole Sasse, Paddy Xu, Patrick Leahey, Peter Toth, Prakhar Jain, Renan Tomazoni Pinzon, Rui Wang, Ryan Johnson, Sabir Akhadov, Scott Sandre, Serge Rielau, Shixiong Zhu, Tathagata Das, Thang Long Vu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wei Luo, Wenchen Fan, Xin Zhao, jintao shen, panbingkun
Download Spark 3.5.0 from https://spark.apache.org/downloads.html
For this preview, we have published the artifacts to a staging repository. Here’s how you can use them:
Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1133 to the command line arguments.
Example:
spark-submit --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories \
https://oss.sonatype.org/content/repositories/iodelta-1133 examples/examples.py
Currently, Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.1.0 by just providing the --packages io.delta:delta-spark_2.12:3.1.0 argument.
bin/spark-shell --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1133 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
bin/spark-sql --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1133 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
<repositories>
<repository>
<id>staging-repo</id>
<url>https://oss.sonatype.org/content/repositories/iodelta-1133</url>
</repository>
</repositories>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.1.0</version>
</dependency>
libraryDependencies += "io.delta" %% "delta-spark" % "3.1.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1133"
pip install ~/Downloads/delta_spark-3.1.0-py3-none-any.whl
pip show delta-spark
should show output similar to the below:

Name: delta-spark
Version: 3.1.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: <user-home>/.conda/envs/delta-release/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
We are excited to announce the preview release of Delta Lake 3.1.0. This release includes several exciting new features.
Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
A delta-spark PyPI preview artifact is also available (see the artifacts section below on how to install it locally).
- Fixed the VACUUM command to be Delta protocol compliant, so that the command garbage-collects only the files that are truly not needed anymore.

Delta Universal Format (UniForm) will allow you to read Delta tables with Hudi and Iceberg clients. Delta 3.1.0 provides the following improvements:
- Support for REORG TABLE table APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2)) to upgrade existing Delta tables to UniForm.

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.
Delta 3.0.0 released the first version of Kernel. In this release, read support is further enhanced and the APIs are solidified based on feedback from connectors that tried out the first version in Delta 3.0.0.
For more information, refer to:
Delta-Flink 3.1.0 is built on top of Apache Flink™ 1.16.1.
The key features of this release are
There are no updates to Delta Standalone in this release.
Ala Luszczak, Allison Portis, Ami Oka, Amogh Akshintala, Andreas Chatzistergiou, Bart Samwel, BjarkeTornager, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Dhruv Arya, EJ Song, Eric Maynard, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Haejoon Lee, Hao Jiang, Jared Wang, Jiaheng Tang, Jing Wang, Johan Lasperas, Kaiqi Jin, Kam Cheung Ting, Lars Kroll, Li Haoyi, Lin Zhou, Lukas Rupprecht, Mark Jarvin, Max Gekk, Ming DAI, Nick Lanham, Ole Sasse, Paddy Xu, Patrick Leahey, Peter Toth, Prakhar Jain, Renan Tomazoni Pinzon, Rui Wang, Ryan Johnson, Sabir Akhadov, Scott Sandre, Serge Rielau, Shixiong Zhu, Tathagata Das, Thang Long Vu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wei Luo, Wenchen Fan, Xin Zhao, ericm-db, jintao shen, panbingkun
Download Spark 3.5.0 from https://spark.apache.org/downloads.html
For this preview, we have published the artifacts to a staging repository. Here’s how you can use them:
Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1132 to the command line arguments.
Example:
spark-submit --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories \
https://oss.sonatype.org/content/repositories/iodelta-1132 examples/examples.py
Currently, Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.1.0 by just providing the --packages io.delta:delta-spark_2.12:3.1.0 argument.
bin/spark-shell --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1132 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
bin/spark-sql --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1132 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
<repositories>
<repository>
<id>staging-repo</id>
<url>https://oss.sonatype.org/content/repositories/iodelta-1132</url>
</repository>
</repositories>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.1.0</version>
</dependency>
libraryDependencies += "io.delta" %% "delta-spark" % "3.1.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1132"
pip install ~/Downloads/delta_spark-3.1.0-py3-none-any.whl
pip show delta-spark
should show output similar to the below:

Name: delta-spark
Version: 3.1.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: <user-home>/.conda/envs/delta-release/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
We are excited to announce the preview release of Delta Lake 3.1.0. This release includes several exciting new features.
Delta Spark 3.1.0 is built on Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
A delta-spark PyPI preview artifact is also available (see the artifacts section below on how to install it locally).

Delta Universal Format (UniForm) will allow you to read Delta tables with Hudi and Iceberg clients. Delta 3.1.0 provides the following improvements:
- Support for REORG TABLE table APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2)) to upgrade existing Delta tables to UniForm.

The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.
Delta 3.0.0 released the first version of Kernel. In this release, read support is further enhanced and the APIs are solidified based on feedback from connectors that tried out the first version in Delta 3.0.0.
For more information, refer to:
Delta-Flink 3.1.0 is built on top of Apache Flink™ 1.16.1.
The key features of this release are
There are no updates to Delta Standalone in this release.
Ala Luszczak, Allison Portis, Ami Oka, Amogh Akshintala, Andreas Chatzistergiou, Bart Samwel, BjarkeTornager, Christos Stavrakakis, Costas Zarifis, Daniel Tenedorio, Dhruv Arya, EJ Song, Eric Maynard, Felipe Pessoto, Fred Storage Liu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Haejoon Lee, Hao Jiang, Jared Wang, Jiaheng Tang, Jing Wang, Johan Lasperas, Kaiqi Jin, Kam Cheung Ting, Lars Kroll, Li Haoyi, Lin Zhou, Lukas Rupprecht, Mark Jarvin, Max Gekk, Ming DAI, Nick Lanham, Ole Sasse, Paddy Xu, Patrick Leahey, Peter Toth, Prakhar Jain, Renan Tomazoni Pinzon, Rui Wang, Ryan Johnson, Sabir Akhadov, Scott Sandre, Serge Rielau, Shixiong Zhu, Tathagata Das, Thang Long Vu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wei Luo, Wenchen Fan, Xin Zhao, ericm-db, jintao shen, panbingkun
Download Spark 3.5.0 from https://spark.apache.org/downloads.html
For this preview, we have published the artifacts to a staging repository. Here’s how you can use them:
Add --repositories https://oss.sonatype.org/content/repositories/iodelta-1129 to the command line arguments.
Example:
spark-submit --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories \
https://oss.sonatype.org/content/repositories/iodelta-1129 examples/examples.py
Currently, Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 3.1.0 by just providing the --packages io.delta:delta-spark_2.12:3.1.0 argument.
bin/spark-shell --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1129 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
bin/spark-sql --packages io.delta:delta-spark_2.12:3.1.0 \
--repositories https://oss.sonatype.org/content/repositories/iodelta-1129 \
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
<repositories>
<repository>
<id>staging-repo</id>
<url>https://oss.sonatype.org/content/repositories/iodelta-1129</url>
</repository>
</repositories>
<dependency>
<groupId>io.delta</groupId>
<artifactId>delta-spark_2.12</artifactId>
<version>3.1.0</version>
</dependency>
libraryDependencies += "io.delta" %% "delta-spark" % "3.1.0"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1129"
pip install ~/Downloads/delta_spark-3.1.0-py3-none-any.whl
pip show delta-spark
should show output similar to the below:

Name: delta-spark
Version: 3.1.0
Summary: Python APIs for using Delta Lake with Apache Spark
Home-page: https://github.com/delta-io/delta/
Author: The Delta Lake Project Authors
Author-email: [email protected]
License: Apache-2.0
Location: <user-home>/.conda/envs/delta-release/lib/python3.8/site-packages
Requires: importlib-metadata, pyspark
We are excited to announce the final release of Delta Lake 3.0.0. This release includes several exciting new features and artifacts.
Here are the most important aspects of 3.0.0:
Unlike the initial preview release, Delta Spark is now built on top of Apache Spark™ 3.5. See the Delta Spark section below for more details.
Delta Universal Format (UniForm) will allow you to read Delta tables with Hudi and Iceberg clients. Iceberg support is available with this release. UniForm takes advantage of the fact that all table storage formats, such as Delta, Iceberg, and Hudi, actually consist of Parquet data files and a metadata layer. In this release, UniForm automatically generates Iceberg metadata and commits it to the Hive metastore, allowing Iceberg clients to read Delta tables as if they were Iceberg tables. Create a UniForm-enabled table using the following command:
CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES (
'delta.universalFormat.enabledFormats' = 'iceberg');
Every write to this table will automatically keep Iceberg metadata updated. See the documentation here for more details, and the key implementations here and here.
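UniForm works because Delta, Iceberg, and Hudi tables all consist of Parquet data files plus a metadata layer; UniForm simply maintains Iceberg metadata alongside the Delta transaction log over the same data files. The directory names below (`_delta_log/` for Delta, `metadata/` for Iceberg) are the real on-disk conventions, but the helper itself is only an illustrative sketch:

```python
import os

# Illustrative sketch: a UniForm-enabled Delta table directory contains both
# a Delta transaction log (_delta_log/) and Iceberg metadata (metadata/),
# describing the same Parquet data files.

def detect_table_formats(table_dir: str) -> set:
    """Report which metadata layers are present in a table directory."""
    formats = set()
    if os.path.isdir(os.path.join(table_dir, "_delta_log")):
        formats.add("delta")
    if os.path.isdir(os.path.join(table_dir, "metadata")):
        formats.add("iceberg")
    return formats
```

On a plain Delta table this returns `{"delta"}`; once UniForm has generated Iceberg metadata, both formats are reported.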
The Delta Kernel project is a set of Java libraries (Rust will be coming soon!) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.
You can use this library to do the following:
Reading a Delta table with Kernel APIs is as follows.
TableClient myTableClient = DefaultTableClient.create(); // define a client
Table myTable = Table.forPath(myTableClient, "/delta/table/path"); // define what table to scan
Snapshot mySnapshot = myTable.getLatestSnapshot(myTableClient); // define which version of table to scan
Predicate scanFilter = ... // define the predicate
Scan myScan = mySnapshot.getScanBuilder(myTableClient) // specify the scan details
.withFilters(scanFilter)
.build();
Scan.readData(...) // returns the table data
Full example code can be found here.
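One protocol detail Kernel hides is how a table's latest snapshot version is found: each commit is a JSON file in `_delta_log/` named by a zero-padded version number. The naming convention is part of the Delta protocol; the sketch below is deliberately simplified (it ignores checkpoints, which a real reader such as Kernel also consults):

```python
import os
import re

# Simplified sketch of Delta log version discovery: commit files are named
# _delta_log/00000000000000000000.json, 00000000000000000001.json, ...
# The latest snapshot version is the highest commit number present.
# (A real reader would also consult checkpoints; Kernel handles all of this.)

COMMIT_RE = re.compile(r"^(\d{20})\.json$")

def latest_version(table_dir: str):
    """Return the highest commit version in the table's Delta log, or None."""
    log_dir = os.path.join(table_dir, "_delta_log")
    versions = [
        int(m.group(1))
        for name in os.listdir(log_dir)
        if (m := COMMIT_RE.match(name))
    ]
    return max(versions) if versions else None
```

This is exactly the bookkeeping that `myTable.getLatestSnapshot(myTableClient)` spares a connector author from writing.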
For more information, refer to:
This release of Delta contains the Kernel Table API and default TableClient API definitions and implementation which allow:
All previous connectors from https://github.com/delta-io/connectors have been moved to this repository (https://github.com/delta-io/delta) as we aim to unify our Delta connector ecosystem structure. This includes Delta-Standalone, Delta-Flink, Delta-Hive, PowerBI, and SQL-Delta-Import. The repository https://github.com/delta-io/connectors is now deprecated.
Delta Spark 3.0.0 is built on top of Apache Spark™ 3.5. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13. Note that the Delta Spark Maven artifact has been renamed from delta-core to delta-spark.
The key features of this release are:
- Removed redundant Path.toUri.toString calls for each row in a table, resulting in a several-hundred-percent speed boost on DELETE operations (only when Deletion Vectors have been enabled on the table).
- Support for reading tables where DROP COLUMN and RENAME COLUMN have been used. This includes streaming support for Change Data Feed. See the documentation here for more details.
- Support for specifying the columns for which Delta collects file-skipping statistics via the table property delta.dataSkippingStatsColumns. Previously, Delta would only collect file-skipping statistics for the first N columns in the table schema (defaulting to 32). Now, users can easily customize this.
- Support for converting Iceberg tables in place using CONVERT TO DELTA. This feature was excluded from the Delta Lake 2.4 release since Iceberg did not yet support Apache Spark 3.4 (or 3.5). This command generates a Delta table in the same location and does not rewrite any parquet files.
- Casting behavior on writes now follows spark.sql.storeAssignmentPolicy instead of spark.sql.ansi.enabled.

Other notable changes include:
- Support for qualified table names in commands, e.g. OPTIMIZE <catalog>.<db>.<tbl> will work.
- Block using overwriteSchema when partitionOverwriteMode is set to dynamic.
- Support writing parquet data files to a data subdirectory via the SQL configuration spark.databricks.delta.write.dataFilesToSubdir. This is used to add UniForm support on BigQuery.

Delta-Flink 3.0.0 is built on top of Apache Flink™ 1.16.1.
The key features of this release are
Other notable changes include
The key features in this release are:
- Support for disabling checkpointing by setting io.delta.standalone.checkpointing.enabled to false. This is only safe and suggested if another job will periodically perform the checkpointing.
- Support for reading SHALLOW CLONEs and creating Delta tables with external files.

Adam Binford, Ahir Reddy, Ala Luszczak, Alex, Allen Reese, Allison Portis, Ami Oka, Andreas Chatzistergiou, Animesh Kashyap, Anonymous, Antoine Amend, Bart Samwel, Bo Gao, Boyang Jerry Peng, Burak Yavuz, CabbageCollector, Carmen Kwan, ChengJi-db, Christopher Watford, Christos Stavrakakis, Costas Zarifis, Denny Lee, Desmond Cheong, Dhruv Arya, Eric Maynard, Eric Ogren, Felipe Pessoto, Feng Zhu, Fredrik Klauss, Gengliang Wang, Gerhard Brueckl, Gopi Krishna Madabhushi, Grzegorz Kołakowski, Hang Jia, Hao Jiang, Herivelton Andreassa, Herman van Hovell, Jacek Laskowski, Jackie Zhang, Jiaan Geng, Jiaheng Tang, Jiawei Bao, Jing Wang, Johan Lasperas, Jonas Irgens Kylling, Jungtaek Lim, Junyong Lee, K.I. (Dennis) Jung, Kam Cheung Ting, Krzysztof Chmielewski, Lars Kroll, Lin Ma, Lin Zhou, Luca Menichetti, Lukas Rupprecht, Martin Grund, Min Yang, Ming DAI, Mohamed Zait, Neil Ramaswamy, Ole Sasse, Olivier NOUGUIER, Pablo Flores, Paddy Xu, Patrick Pichler, Paweł Kubit, Prakhar Jain, Pulkit Singhal, RunyaoChen, Ryan Johnson, Sabir Akhadov, Satya Valluri, Scott Sandre, Shixiong Zhu, Siying Dong, Son, Tathagata Das, Terry Kim, Tom van Bussel, Venki Korukanti, Wenchen Fan, Xinyi, Yann Byron, Yaohua Zhao, Yijia Cui, Yuhong Chen, Yuming Wang, Yuya Ebihara, Zhen Li, aokolnychyi, gurunath, jintao shen, maryannxue, noelo, panbingkun, windpiger, wwang-talend, sherlockbeard
We are excited to announce the preview release of Delta Lake 3.0.0. This release includes several exciting new features and artifacts.
Here are the most important aspects of 3.0.0.
Delta Universal Format (UniForm) will allow you to read Delta tables with Hudi and Iceberg clients. Iceberg support is available with this preview and Hudi will be coming soon. UniForm takes advantage of the fact that all table storage formats (Delta, Iceberg, and Hudi) actually consist of Parquet data files and a metadata layer. In this release, UniForm automatically generates Iceberg metadata, allowing Iceberg clients to read Delta tables as if they were Iceberg tables. Create a UniForm-enabled table using the following command:
CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES (
'delta.universalFormat.enabledFormats' = 'iceberg');
Every write to this table will automatically keep Iceberg metadata updated. See the documentation here for more details.
The Delta Kernel project is a set of Java libraries (Rust will be coming soon) for building Delta connectors that can read (and, soon, write to) Delta tables without the need to understand the Delta protocol details.
You can use this library to do the following:
Here is an example of a simple table scan with a filter:
TableClient myTableClient = DefaultTableClient.create(); // define a client (more details below)
Table myTable = Table.forPath("/delta/table/path"); // define what table to scan
Snapshot mySnapshot = myTable.getLatestSnapshot(myTableClient); // define which version of table to scan
Predicate scanFilter = ... // define the predicate
Scan myScan = mySnapshot.getScanBuilder(myTableClient) // specify the scan details
.withFilters(scanFilter)
.build();
Scan.readData(...) // returns the table data
For more information, refer to Delta Kernel Github docs.
All previous connectors from https://github.com/delta-io/connectors have been moved to this repository (https://github.com/delta-io/delta) as we aim to unify our Delta connector ecosystem structure. This includes Delta-Standalone, Delta-Flink, Delta-Hive, PowerBI, and SQL-Delta-Import. The repository https://github.com/delta-io/connectors is now deprecated.
Delta Spark 3.0.0 is built on top of Apache Spark™ 3.4. Similar to Apache Spark, we have released Maven artifacts for both Scala 2.12 and Scala 2.13. Note that the Delta Spark Maven artifact has been renamed from delta-core to delta-spark.

The key features of this release are:
- Removed redundant Path.toUri.toString calls for each row in a table, resulting in a several-hundred-percent speed boost on DELETE operations (only when Deletion Vectors have been enabled on the table).
- Support for reading tables where DROP COLUMN and RENAME COLUMN have been used. This includes streaming support for Change Data Feed. See the documentation here for more details.
- Support for specifying the columns for which Delta collects file-skipping statistics via the table property delta.dataSkippingStatsColumns. Previously, Delta would only collect file-skipping statistics for the first N columns in the table schema (defaulting to 32). Now, users can easily customize this.
- Support for converting Iceberg tables in place using CONVERT TO DELTA. This feature was excluded from the Delta Lake 2.4 release since Iceberg did not yet support Apache Spark 3.4. This command generates a Delta table in the same location and does not rewrite any parquet files.

Other notable changes include:
- Support writing parquet data files to a data subdirectory via the SQL configuration spark.databricks.delta.write.dataFilesToSubdir. This is used to add UniForm support on BigQuery.

Delta-Flink 3.0.0 is built on top of Apache Flink™ 1.16.1.
The key features of this release are:
- Support for Flink SQL and the Delta Catalog: users can now CREATE Delta tables, SELECT data from them (uses the Delta Source), and INSERT new data into them (uses the Delta Sink). Note: for correct operations on Delta tables, you must first configure the Delta Catalog using CREATE CATALOG before running a SQL command on Delta tables. For more information, please see the documentation here.

The key features in this release are:
- Support for disabling checkpointing by setting io.delta.standalone.checkpointing.enabled to false. This is only safe and suggested if another job will periodically perform the checkpointing.
- Support for reading SHALLOW CLONEs and creating Delta tables with external files.

Liquid Clustering, a new effort to revamp how clustering works in Delta, addresses the shortcomings of Hive-style partitioning and current ZORDER clustering. This feature will be available to preview soon; meanwhile, for more information, please refer to Liquid Clustering #1874.
Ahir Reddy, Ala Luszczak, Alex, Allen Reese, Allison Portis, Antoine Amend, Bart Samwel, Boyang Jerry Peng, CabbageCollector, Carmen Kwan, Christos Stavrakakis, Denny Lee, Desmond Cheong, Eric Ogren, Felipe Pessoto, Fred Liu, Fredrik Klauss, Gerhard Brueckl, Gopi Krishna Madabhushi, Grzegorz Kołakowski, Herivelton Andreassa, Jackie Zhang, Jiaheng Tang, Johan Lasperas, Junyong Lee, K.I. (Dennis) Jung, Kam Cheung Ting, Krzysztof Chmielewski, Lars Kroll, Lin Ma, Luca Menichetti, Lukas Rupprecht, Ming DAI, Mohamed Zait, Ole Sasse, Olivier Nouguier, Pablo Flores, Paddy Xu, Patrick Pichler, Paweł Kubit, Prakhar Jain, Ryan Johnson, Sabir Akhadov, Satya Valluri, Scott Sandre, Shixiong Zhu, Siying Dong, Son, Tathagata Das, Terry Kim, Tom van Bussel, Venki Korukanti, Wenchen Fan, Yann Byron, Yaohua Zhao, Yuhong Chen, Yuming Wang, Yuya Ebihara, aokolnychyi, gurunath, jintao shen, maryannxue, noelo, panbingkun, windpiger, wwang-talend
We are excited to announce the release of Delta Lake 2.4.0 on Apache Spark 3.4. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
- Support for Deletion Vectors for the DELETE command. Previously, when deleting rows from a Delta table, any file with at least one matching row would be rewritten. With Deletion Vectors these expensive rewrites can be avoided. See What are deletion vectors? for more details.
- Support PURGE to remove Deletion Vectors from the current version of a Delta table by rewriting any data files with deletion vectors. See the documentation for more details.
- Support REPLACE WHERE expressions in SQL to selectively overwrite data. Previously, “replaceWhere” options were only supported in the DataFrameWriter APIs.
- Support WHEN NOT MATCHED BY SOURCE clauses in SQL for the Merge command.
- Support omitting generated columns from the column list for INSERT INTO queries. Delta will automatically generate the values for any unspecified generated columns.
- Support for the TimestampNTZ data type added in Spark 3.3. Using TimestampNTZ requires a Delta protocol upgrade; see the documentation for more information.
- Allow changing a char or varchar column to a compatible type in the ALTER TABLE command. The new behavior is the same as in Apache Spark and allows upcasting from char or varchar to varchar or string.
- Block using overwriteSchema with dynamic partition overwrite. This can corrupt the table, as not all the data may be removed, and the schema of the newly written partitions may not match the schema of the unchanged partitions.
- Return an empty DataFrame for Change Data Feed reads when there are no commits within the timestamp range provided. Previously an error would be thrown.

Note: the Delta Lake 2.4.0 release does not include the Iceberg to Delta converter because iceberg-spark-runtime does not support Spark 3.4 yet. The Iceberg to Delta converter is still supported when using Delta 2.3 with Spark 3.3.
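The Deletion Vector idea above can be sketched in a few lines: instead of rewriting a Parquet file to drop rows, a DELETE records the positions of the deleted rows in a per-file vector, and readers filter those positions out at scan time. This is only a conceptual sketch; Delta's real implementation stores RoaringBitmap-encoded vectors referenced from the transaction log:

```python
# Conceptual sketch of deletion vectors: mark deleted row positions per file
# instead of rewriting the file. (Delta's real implementation stores
# RoaringBitmap-encoded vectors referenced from the transaction log.)

def delete_rows(deletion_vector: set, positions) -> set:
    """A DELETE just adds row positions to the file's deletion vector."""
    return deletion_vector | set(positions)

def read_file(rows, deletion_vector: set):
    """Readers skip positions present in the deletion vector."""
    return [row for pos, row in enumerate(rows) if pos not in deletion_vector]
```

Because the data file is untouched, a DELETE touching a handful of rows no longer pays the cost of rewriting the whole file; PURGE is the operation that later folds the vectors back into rewritten files.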
Alkis Evlogimenos, Allison Portis, Andreas Chatzistergiou, Anton Okolnychyi, Bart Samwel, Bo Gao, Carl Fu, Chaoqin Li, Christos Stavrakakis, David Lewis, Desmond Cheong, Dhruv Shah, Eric Maynard, Fred Liu, Fredrik Klauss, Haejoon Lee, Hussein Nagree, Jackie Zhang, Jintian Liang, Johan Lasperas, Lars Kroll, Lukas Rupprecht, Matthew Powers, Ming DAI, Ming Dai, Naga Raju Bhanoori, Paddy Xu, Prakhar Jain, Rahul Shivu Mahadev, Rui Wang, Ryan Johnson, Sabir Akhadov, Satya Valluri, Scott Sandre, Shixiong Zhu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wenchen Fan, Xi Liang, Yaohua Zhao, Yuming Wang
We are excited to announce the release of Delta Lake 2.3.0 on Apache Spark 3.3. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

The key features of this release are:

- Zero-copy convert to Delta from Iceberg tables using CONVERT TO DELTA. This generates a Delta table in the same location and does not rewrite any parquet files. See the documentation for details.
- Support SHALLOW CLONE for Delta, Parquet, and Iceberg tables to clone a source table without copying the data files. SHALLOW CLONE creates a copy of the source table's definition but refers to the source table's data files.
- Support idempotent writes for INSERT/DELETE/UPDATE/MERGE etc. operations using the SQL configurations spark.databricks.delta.write.txnAppId and spark.databricks.delta.write.txnVersion.
- Support "when not matched by source" clauses for the MERGE command through the Scala, Python, and Java DeltaTable APIs. SQL support will be added in Spark 3.4.
- Support CREATE TABLE LIKE to create empty Delta tables using the definition and metadata of an existing table or view.
- Support reading Change Data Feed (CDF) in SQL queries using the table_changes table-valued function.
- Unblock Change Data Feed (CDF) batch reads on column mapping enabled tables when DROP COLUMN and RENAME COLUMN have been used. See the documentation for more details.
- Faster file listing on S3: set the Hadoop configuration delta.enableFastS3AListFrom to true to enable it.

Other notable changes include:

- Record VACUUM operations in the transaction log. With this feature, VACUUM operations and their associated metrics (e.g. numDeletedFiles) will now show up in table history.
- Support schema evolution in MERGE for UPDATE SET <assignments> and INSERT (...) VALUES (...) actions. Previously, schema evolution was only supported for UPDATE SET * and INSERT * actions.
- Add .show() support for COUNT(*) aggregate pushdown.
- Support idempotent writes for df.saveAsTable for overwrite and append mode.
- Support generated column partition filters for the trunc and date_trunc functions.
- Support generated column partition filters for the date_format function with format yyyy-MM-dd.
- Fix replaceWhere with the DataFrame V2 overwrite API to correctly evaluate less than conditions.
- Fix INSERT OVERWRITE with complex data types when the source schema is read incompatible.
- Fix a bug in VACUUM where sometimes the default retention period was used to remove files instead of the retention period specified in the table properties.
- Support the deltaTable.details() Python/Scala/Java API.
- Support VACUUM table_name DRY RUN.

Credits: Allison Portis, Andreas Chatzistergiou, Andrew Li, Bo Zhang, Brayan Jules, Burak Yavuz, Christos Stavrakakis, Daniel Tenedorio, Dhruv Shah, Felipe Pessoto, Fred Liu, Fredrik Klauss, Gengliang Wang, Haejoon Lee, Hussein Nagree, Jackie Zhang, Jiaheng Tang, Jintian Liang, Johan Lasperas, Jungtaek Lim, Kam Cheung Ting, Koki Otsuka, Lars Kroll, Lin Ma, Lukas Rupprecht, Ming DAI, Mitchell Riley, Ole Sasse, Paddy Xu, Prakhar Jain, Pranav, Rahul Shivu Mahadev, Rajesh Parangi, Ryan Johnson, Scott Sandre, Serge Rielau, Shixiong Zhu, Slim Ouertani, Tobias Fabritz, Tom van Bussel, Tushar Machavolu, Tyson Condie, Venki Korukanti, Vitalii Li, Wenchen Fan, Xinyi Yu, Yaohua Zhao, Yingyi Bu
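The spark.databricks.delta.write.txnAppId and spark.databricks.delta.write.txnVersion configurations mentioned in this release deduplicate retried writes per application. Below is a minimal pure-Python sketch of that rule (hypothetical class name, not Delta's implementation): a transaction is applied only when its version is greater than the latest version already recorded for its appId.

```python
# Sketch of the idempotent-write rule behind txnAppId / txnVersion
# (hypothetical helper, not Delta's code): a transaction commits only when
# its version exceeds the highest version already recorded for its appId,
# so retries of an already-committed write are silently skipped.
class IdempotentLog:
    def __init__(self):
        self.last_version = {}  # appId -> highest committed txnVersion

    def try_commit(self, app_id: str, version: int) -> bool:
        if version <= self.last_version.get(app_id, -1):
            return False  # duplicate or stale retry: skipped
        self.last_version[app_id] = version
        return True

log = IdempotentLog()
print(log.try_commit("etl-job", 1))  # True  -> first attempt is applied
print(log.try_commit("etl-job", 1))  # False -> retry with same version is skipped
print(log.try_commit("etl-job", 2))  # True  -> next version is applied
```

In Delta itself the (appId, version) pair is recorded in the table's transaction log, which is what makes a retried INSERT/DELETE/UPDATE/MERGE safe to re-run.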
We are excited to announce the preview release of Delta Lake 2.3.0 on Apache Spark 3.3. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.

The key features of this release are:

- Zero-copy convert to Delta from Iceberg tables using CONVERT TO DELTA. This generates a Delta table in the same location and does not rewrite any parquet files.
- Support SHALLOW CLONE for Delta, Parquet, and Iceberg tables to clone a source table without copying the data files. SHALLOW CLONE creates a copy of the source table's definition but refers to the source table's data files.
- Support idempotent writes for INSERT/DELETE/UPDATE/MERGE etc. operations using the SQL configurations spark.databricks.delta.write.txnAppId and spark.databricks.delta.write.txnVersion.
- Support "when not matched by source" clauses for the MERGE command through the Scala, Python, and Java DeltaTable APIs. SQL support will be added in Spark 3.4.
- Support CREATE TABLE LIKE to create empty Delta tables using the definition and metadata of an existing table or view.
- Support reading Change Data Feed (CDF) in SQL queries using the table_changes table-valued function.
- Unblock Change Data Feed (CDF) batch reads on column mapping enabled tables when DROP COLUMN and RENAME COLUMN have been used.
- Faster file listing on S3: set the Hadoop configuration delta.enableFastS3AListFrom to true to enable it.

Other notable changes include:

- Record VACUUM operations in the transaction log. With this feature, VACUUM operations and their associated metrics (e.g. numDeletedFiles) will now show up in table history.
- Support schema evolution in MERGE for UPDATE SET <assignments> and INSERT (...) VALUES (...) actions. Previously, schema evolution was only supported for UPDATE SET * and INSERT * actions.
- Add .show() support for COUNT(*) aggregate pushdown.
- Support idempotent writes for df.saveAsTable for overwrite and append mode.
- Support generated column partition filters for the trunc and date_trunc functions.
- Support generated column partition filters for the date_format function with format yyyy-MM-dd.
- Fix replaceWhere with the DataFrame V2 overwrite API to correctly evaluate less than conditions.
- Fix INSERT OVERWRITE with complex data types when the source schema is read incompatible.
- Fix a bug in VACUUM where sometimes the default retention period was used to remove files instead of the retention period specified in the table properties.
- Support the deltaTable.details() Python/Scala/Java API.
- Support VACUUM table_name DRY RUN.

For this preview we have published the artifacts to a staging repository. Here's how you can use them:

- spark-submit: add --repositories https://oss.sonatype.org/content/repositories/iodelta-1066/ to the command line arguments. For example:

spark-submit --packages io.delta:delta-core_2.12:2.3.0rc1 --repositories https://oss.sonatype.org/content/repositories/iodelta-1066/ examples/examples.py

- Spark shell/PySpark: run with 2.3.0rc1 by just providing the --packages io.delta:delta-core_2.12:2.3.0rc1 argument.

- Maven project:

<repositories>
  <repository>
    <id>staging-repo</id>
    <url>https://oss.sonatype.org/content/repositories/iodelta-1066/</url>
  </repository>
</repositories>

<dependency>
  <groupId>io.delta</groupId>
  <artifactId>delta-core_2.12</artifactId>
  <version>2.3.0rc1</version>
</dependency>

- SBT project:

libraryDependencies += "io.delta" %% "delta-core" % "2.3.0rc1"
resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1066/"

- Python (delta-spark): install the release candidate from TestPyPI:

pip install -i https://test.pypi.org/simple/ delta-spark==2.3.0rc1

Credits: Allison Portis, Andreas Chatzistergiou, Andrew Li, Bo Zhang, Brayan Jules, Burak Yavuz, Christos Stavrakakis, Daniel Tenedorio, Dhruv Shah, Felipe Pessoto, Fred Liu, Fredrik Klauss, Gengliang Wang, Haejoon Lee, Hussein Nagree, Jackie Zhang, Jiaheng Tang, Jintian Liang, Johan Lasperas, Jungtaek Lim, Kam Cheung Ting, Koki Otsuka, Lars Kroll, Lin Ma, Lukas Rupprecht, Ming DAI, Mitchell Riley, Ole Sasse, Paddy Xu, Prakhar Jain, Pranav, Rahul Shivu Mahadev, Rajesh Parangi, Ryan Johnson, Scott Sandre, Serge Rielau, Shixiong Zhu, Slim Ouertani, Tobias Fabritz, Tom van Bussel, Tushar Machavolu, Tyson Condie, Venki Korukanti, Vitalii Li, Wenchen Fan, Xinyi Yu, Yaohua Zhao, Yingyi Bu
We are excited to announce the release of Delta Lake 2.0.2 on Apache Spark 3.2. This release contains important bug fixes and a few high-demand usability improvements over 2.0.1 and it is recommended that users update to 2.0.2. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
This release includes the following bug fixes and improvements:
- Record VACUUM operations in the transaction log; VACUUM operations and their associated metrics (e.g. numDeletedFiles) will now show up in table history.
- Support idempotent writes using the SQL configurations spark.databricks.delta.write.txnAppId and spark.databricks.delta.write.txnVersion.
- Support passing Hadoop configurations via the DeltaTable API:
from delta.tables import DeltaTable
hadoop_config = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "...",
"fs.azure.account.oauth2.client.id": "...",
"fs.azure.account.oauth2.client.secret": "...",
"fs.azure.account.oauth2.client.endpoint": "..."
}
delta_table = DeltaTable.forPath(spark, <table-path>, hadoop_config)
The same Hadoop configurations can also be passed via DeltaTableBuilder.
- Add an executeZOrderBy Java API overload which allows users to pass in varargs instead of a List.
- Fail fast on malformed files in _delta_log. For example, an add action with a missing } would previously be skipped. Now, queries will fail fast, preventing inaccurate results.

Credits: Helge Brügner, Jiaheng Tang, Mitchell Riley, Ryan Johnson, Scott Sandre, Venki Korukanti, Jintao Shen, Yann Byron
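The fail-fast behavior for malformed log files can be illustrated with plain Python (a sketch, not Delta's actual log reader): a truncated add action is not valid JSON, so parsing raises an error instead of the action being silently dropped.

```python
import json

# A well-formed and a truncated (missing '}') log entry, as described in the
# note above. This is illustrative JSON only, not an exact Delta log action.
good_action = '{"add": {"path": "part-00000.parquet", "size": 1024}}'
bad_action = '{"add": {"path": "part-00000.parquet", "size": 1024}'  # missing '}'

print(json.loads(good_action)["add"]["path"])  # part-00000.parquet

# Failing fast: the malformed entry raises instead of being skipped, which is
# the behavior change the release note describes.
try:
    json.loads(bad_action)
except json.JSONDecodeError as e:
    print("fail fast:", type(e).__name__)  # fail fast: JSONDecodeError
```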