Spark Acid Versions

ACID Data Source for Apache Spark based on Hive ACID

3 years ago

Features/Improvements

Add MERGE command support. (issue-26)
Improve performance of the ACID writer. (issue-88)
Parallelize split computation for ACID tables. (issue-78)
Add a partition pruner as a fallback when Hive Metastore (HMS) pruning fails. (issue-21)
Don't allocate writeIds for read transactions.
Reduce the default wait time for lock acquisition to around 5 minutes and make it configurable. (issue-55)
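
To illustrate the last item, here is a minimal Python sketch of a bounded lock wait with a configurable limit. It is not the library's implementation: the names `acquire_with_timeout`, `LockTimeoutError`, `try_acquire`, and `poll_interval_seconds` are hypothetical, and the 5-minute default is only taken from the note above.

```python
import time

# Assumption: mirrors the "around 5 mins" default mentioned in the release note.
DEFAULT_LOCK_WAIT_SECONDS = 5 * 60


class LockTimeoutError(Exception):
    """Raised when the lock is not acquired within the configured wait time."""


def acquire_with_timeout(try_acquire,
                         max_wait_seconds=DEFAULT_LOCK_WAIT_SECONDS,
                         poll_interval_seconds=1.0,
                         clock=time.monotonic,
                         sleep=time.sleep):
    """Poll try_acquire() until it returns True or the deadline passes.

    try_acquire is a hypothetical callable that returns True on success.
    Returns the number of attempts that were made.
    """
    deadline = clock() + max_wait_seconds
    attempts = 0
    while True:
        attempts += 1
        if try_acquire():
            return attempts
        if clock() >= deadline:
            raise LockTimeoutError(
                f"lock not acquired after {max_wait_seconds} seconds")
        # Never sleep past the deadline.
        sleep(min(poll_interval_seconds, deadline - clock()))
```

Passing the clock and sleep functions as parameters keeps the sketch testable without real waiting; a caller would normally rely on the defaults and override only `max_wait_seconds`.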

Notable Bug Fixes:

4 years ago

Major Changes:

Support for INSERT into Full ACID and Insert-Only tables in ORC.
Support for UPDATE and DELETE on Full ACID tables in ORC, including SQL support.
Support for the Spark native reader for ACID tables in ORC.
Support for Structured Streaming writes, with ACID tables as a streaming sink.

Notable Improvements:

Add a check for transaction validity after the transaction acquires its lock, so that it works on a consistent state of the table.
Support for dynamically setting the Hive and Spark versions.
Fix reading ACID tables with row IDs.
Support for writing Date and Timestamp columns.
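
The transaction-validity check in the first item above can be pictured with a small in-memory Python sketch. It is a toy model, not the project's code: `ToyTxnManager`, `run_with_lock`, and `TransactionInvalidError` are hypothetical names, and "valid" is assumed here to mean "still open".

```python
class TransactionInvalidError(Exception):
    """Raised when a transaction is no longer open once its lock is held."""


class ToyTxnManager:
    """Hypothetical in-memory stand-in for a transaction manager."""

    def __init__(self):
        self._next_id = 1
        self._open = set()

    def open_txn(self):
        txn_id = self._next_id
        self._next_id += 1
        self._open.add(txn_id)
        return txn_id

    def abort_txn(self, txn_id):
        self._open.discard(txn_id)

    def is_open(self, txn_id):
        return txn_id in self._open


def run_with_lock(manager, txn_id, acquire_lock, work):
    """Acquire the lock, re-check the transaction, then do the work."""
    acquire_lock()  # may block; the transaction can be aborted meanwhile
    if not manager.is_open(txn_id):
        # Stop before touching the table: its state may no longer match
        # what this transaction expected when it was opened.
        raise TransactionInvalidError(f"transaction {txn_id} is not open")
    return work()
```

The point of the ordering is that the check runs after the potentially long lock wait, so work never starts on behalf of a transaction that was aborted while waiting.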

Credit: Rajkumar Iyer, Amogh Margoor, Abhishek Dixit, Megha Thakkar, Herbert Liao, Mahesh Kumar Behera