Lakehouse Engine Versions

The Lakehouse Engine is a configuration-driven Spark framework, written in Python, that serves as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.

v1.19.0

2 months ago

v1.18.0

3 months ago
  • Added a feature to collect Lakehouse Engine Usage Statistics
  • Improve Lakehouse Engine test performance by increasing Spark driver memory
  • Added audit-dep-safety to assess safety of library dependencies
  • Upgrade paramiko library from 2.12.0 to 3.4.0
  • Upgrade transitive dependencies

v1.17.0

7 months ago
  • Upgrade pyspark from 3.3.2 to 3.4.1
  • Upgrade delta-spark from 2.2.0 to 2.4.0
  • Upgrade ydata-profiling from 4.5.1 to 4.6.0 (and update all transitive dependencies accordingly)
  • Fix Hash Masker transformer, which was wrongly dropping the original columns
  • Fix the 1000-object limit when listing/deleting S3 objects by implementing pagination
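
The pagination fix in the last item can be illustrated with a minimal sketch. It is not the engine's actual code: the helper names `iter_keys` and `delete_prefix` are hypothetical, and `client` is assumed to be a boto3-style S3 client exposing `list_objects_v2` and `delete_objects`. S3 returns at most 1000 keys per list call and accepts at most 1000 keys per delete call, so both sides have to loop.

```python
from typing import Any, Iterator


def iter_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following continuation tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            break
        # Ask for the next page, starting where this one stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def delete_prefix(
    client: Any, bucket: str, prefix: str = "", batch_size: int = 1000
) -> int:
    """Delete all keys under `prefix` in batches; return the number of keys."""
    # Collect the keys first so the listing is not disturbed by deletions.
    keys = list(iter_keys(client, bucket, prefix))
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
    return len(keys)
```

With a real client this would be called as `delete_prefix(boto3.client("s3"), "my-bucket", "some/prefix/")`, where the bucket and prefix are placeholders.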

v1.16.1

7 months ago
  • Allow both batch and streaming sensors for Delta data formats (only streaming was allowed previously)
  • Fix the expect_column_values_to_be_date_not_older_than expectation, which was not handling Timestamps properly
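
The release note does not say what the Timestamp defect was, so the following is only a sketch of the general idea behind a "date not older than" check that accepts both dates and timestamps. The function names `to_date` and `is_not_older_than` are hypothetical and are not the engine's or Great Expectations' API. In plain Python, comparing a `date` with a `datetime` raises `TypeError`, so the sketch normalizes values before comparing.

```python
from datetime import date, datetime, timedelta
from typing import Union


def to_date(value: Union[date, datetime]) -> date:
    """Reduce a timestamp to its calendar date; pass plain dates through."""
    # datetime is a subclass of date, so it must be checked first.
    return value.date() if isinstance(value, datetime) else value


def is_not_older_than(
    value: Union[date, datetime], reference_date: date, max_age: timedelta
) -> bool:
    """Return True if `value` is no more than `max_age` before `reference_date`."""
    oldest_allowed = reference_date - max_age
    return to_date(value) >= oldest_allowed
```

For example, `is_not_older_than(datetime(2024, 1, 10, 23, 59), date(2024, 1, 15), timedelta(days=5))` accepts the timestamp because its calendar date falls exactly on the five-day boundary.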

v1.16.0

7 months ago

!!!First Open Source Version!!! 🚀

  • Migrated Great Expectations from 0.16.5 to 0.17.11
  • Migrated from pandas_profiling 3.4.0 to ydata_profiling 4.5.1
  • Solved pytest warnings
  • Added new custom expectation expect_column_values_to_be_date_not_older_than
  • Added S3 Glacier archival related capabilities to the File Manager class