Kartothek Versions

A consistent table management library in python

v5.3.0

Version 5.3.0 (2021-12-10)

  • Add deprecation warnings and migration helpers to facilitate the migration to Kartothek 6.0.0.
  • Remove the warning for distinct categoricals (#501)

v5.2.0

Version 5.2.0 (2021-11-22)

  • Remove support for Python 3.6
  • Allow pyarrow<7 as a dependency.

v5.1.0

Version 5.1.0 (2021-07-05)

  • Add kartothek.io.eager.copy_dataset to copy and optionally rename datasets within one store or between stores (eager only)
  • Add a renaming option to kartothek.io.eager_cube.copy_cube
  • Add a predicates-to-cube-conditions converter to kartothek.utils.predicate_converter
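
The converter above takes kartothek predicates as input. These are written in disjunctive normal form: the outer list is OR-ed, and each inner list of (column, operator, value) tuples is AND-ed. As a minimal sketch of that format only, the hypothetical helper evaluate_predicates below checks a plain dict row against such predicates. It is not kartothek's converter or filter code, and it supports only the operators listed in _OPS.

```python
import operator
from typing import Any, Dict, List, Tuple

# A term is a (column, operator, value) tuple; an inner list is a conjunction
# (AND) of terms; the outer list is a disjunction (OR) of conjunctions.
Term = Tuple[str, str, Any]
Predicates = List[List[Term]]

# Operators supported by this sketch (an assumption, not kartothek's full set).
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda actual, allowed: actual in allowed,
}


def evaluate_predicates(row: Dict[str, Any], predicates: Predicates) -> bool:
    """Hypothetical helper: True if at least one conjunction fully matches row."""
    return any(
        all(_OPS[op](row[column], value) for column, op, value in conjunction)
        for conjunction in predicates
    )


if __name__ == "__main__":
    predicates = [
        [("country", "==", "DE"), ("year", ">=", 2020)],  # DE and recent, or
        [("country", "in", ["FR", "IT"])],                # FR/IT in any year
    ]
    print(evaluate_predicates({"country": "DE", "year": 2021}, predicates))  # True
    print(evaluate_predicates({"country": "DE", "year": 2019}, predicates))  # False
    print(evaluate_predicates({"country": "IT", "year": 2000}, predicates))  # True
```

An empty predicate list matches nothing here, because any() over no conjunctions is False.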

v5.0.0

Version 5.0.0 (2021-06-23)

This release reverts all changes introduced in 4.x, returning kartothek to the state of 3.20.0.

Because 5.0 is incompatible with 4.0, upgrading will be an issue for some users. We therefore encourage you to use the very stable kartothek 3.20.0 rather than any 4.x version.

Please refer to issue #471 for further information.

v4.0.3

Kartothek 4.0.3 (2021-06-10)

  • Pin dask to not use 2021.5.1 and 2021.6.0 (#475)

v5.0.0rc1

Version 5.0.0 (2021-05-xx)

This release reverts all changes introduced in 4.x, returning kartothek to the state of 3.20.0.

Because 5.0 is incompatible with 4.0, upgrading will be an issue for some users. We therefore encourage you to use the very stable kartothek 3.20.0 rather than any 4.x version.

Please refer to issue #471 for further information.

v4.0.1

Kartothek 4.0.1 (2021-04-13)

  • Fixed dataset corruption after updates when table names other than "table" are used (#445).

v4.0.0

Kartothek 4.0.0 (2021-03-17)

This is a major release of kartothek with breaking API changes.

  • Removal of complex user input (see #427)
  • Removal of the multi-table feature
  • Removal of the kartothek.io.merge module
  • The class kartothek.core.dataset.DatasetMetadata now has an attribute schema, which replaces the previous attribute table_meta and returns only a single schema
  • All outputs that previously returned a sequence of dictionaries, in which each key-value pair mapped a table name to its data, now return a plain pandas.DataFrame in place of each such dictionary
  • All read pipelines now infer the table to read automatically, so it is no longer necessary to pass table or table_name as an argument
  • All write pipelines that previously supported a complex user input type now expose an argument table_name, which can be used to keep working with legacy datasets (i.e. datasets with an intrinsic, non-trivial table name). This usage is discouraged; we recommend that users migrate to the default table name (i.e. leave it as None or set it to "table")
  • All pipelines that previously accepted an argument tables to select the subset of tables to load no longer accept this keyword. Instead, the table to load is inferred
  • Trying to read a multi-table dataset now raises an exception stating that this is no longer supported in kartothek 4.0
  • The dict schema used by kartothek.core.dataset.DatasetMetadataBase.to_dict and kartothek.core.dataset.DatasetMetadata.from_dict has changed: the dictionary previously stored under table_meta is replaced by the single schema
  • All pipeline arguments that previously accepted a dictionary of sequences to describe a table-specific subset of columns now accept plain sequences (e.g. columns, categoricals)
  • Remove the following deprecated arguments from the io pipelines:
    • label_filter
    • central_partition_metadata
    • load_dynamic_metadata
    • load_dataset_metadata
    • concat_partitions_on_primary_index
  • Remove output_dataset_uuid and df_serializer from kartothek.io.eager.commit_dataset, since these arguments had no effect
  • Remove metadata, df_serializer, overwrite and metadata_merger from kartothek.io.eager.write_single_partition
  • kartothek.io.eager.store_dataframes_as_dataset now requires a list as input
  • The default value of the argument date_as_object is now True everywhere. The behaviour for False will be deprecated and removed in the next major release
  • delete_scope can no longer be passed as a delayed object to kartothek.io.dask.dataframe.update_dataset_from_ddf
  • kartothek.io.dask.dataframe.update_dataset_from_ddf and kartothek.io.dask.dataframe.store_dataset_from_ddf now return a dd.core.Scalar object. This enables all dask.DataFrame graph optimizations by default.
  • Remove the argument table_name from kartothek.io.dask.dataframe.collect_dataset_metadata
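
While migrating code written against 3.x, the return-type and argument changes listed above can be bridged with small shims. The sketch below is hypothetical: unwrap_single_table and flatten_columns are not kartothek functions, plain strings stand in for DataFrames, and it assumes legacy results use the default table name "table".

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, List, Optional, Union

# Assumption: legacy datasets use kartothek's default table name "table".
DEFAULT_TABLE = "table"


def unwrap_single_table(
    results: Sequence[Mapping[str, Any]], table_name: str = DEFAULT_TABLE
) -> List[Any]:
    """Hypothetical shim: turn [{table_name: df}, ...] into [df, ...].

    Raises ValueError for multi-table results, which 4.0 no longer supports.
    """
    unwrapped = []
    for result in results:
        if set(result) != {table_name}:
            raise ValueError(
                f"expected only table {table_name!r}, got {sorted(result)}"
            )
        unwrapped.append(result[table_name])
    return unwrapped


def flatten_columns(
    columns: Union[Mapping[str, Sequence[str]], Sequence[str], None],
    table_name: str = DEFAULT_TABLE,
) -> Optional[List[str]]:
    """Hypothetical shim: turn {table_name: [cols]} into a plain list of columns."""
    if columns is None:
        return None
    if isinstance(columns, Mapping):
        # 3.x style: per-table mapping; keep only the single table's columns.
        return list(columns[table_name])
    # 4.x style: already a plain sequence.
    return list(columns)


if __name__ == "__main__":
    legacy = [{"table": "df_0"}, {"table": "df_1"}]  # strings stand in for DataFrames
    print(unwrap_single_table(legacy))               # ['df_0', 'df_1']
    print(flatten_columns({"table": ["a", "b"]}))    # ['a', 'b']
```

Failing loudly on unexpected table names mirrors the 4.0 behaviour of rejecting multi-table datasets instead of silently dropping data.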

v3.20.0

Version 3.20.0 (2021-03-15)

This will be the final release in the 3.X series. Please make sure your existing codebase does not raise any DeprecationWarning from kartothek, and migrate your import paths to the new kartothek.api modules ahead of time, so that the migration to 4.X goes smoothly.

  • Introduce kartothek.api as the public definition of the API. See also the versioning documentation.
  • Introduce DatasetMetadataBase.schema to prepare the deprecation of table_meta
  • kartothek.io.eager.read_dataset_as_dataframes and kartothek.io.iter.read_dataset_as_dataframes__iterator now correctly return categoricals as requested for misaligned categories.
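
One way to act on the advice above to check that your code raises no DeprecationWarning is to escalate such warnings to exceptions while exercising it. The sketch below uses only the standard warnings module; legacy_helper is a hypothetical stand-in for a deprecated call, and emits_deprecation is an illustrative helper, not part of kartothek.

```python
import warnings
from typing import Any, Callable


def legacy_helper() -> int:
    """Hypothetical stand-in for a deprecated library call."""
    warnings.warn("legacy_helper is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def emits_deprecation(func: Callable[..., Any], *args: Any, **kwargs: Any) -> bool:
    """Return True if calling func emits a DeprecationWarning."""
    with warnings.catch_warnings():
        # Inside this block only, turn DeprecationWarning into an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func(*args, **kwargs)
        except DeprecationWarning:
            return True
    return False


if __name__ == "__main__":
    print(emits_deprecation(legacy_helper))  # True
    print(emits_deprecation(lambda: None))   # False
```

For a whole test suite, running Python (or pytest) with -W error::DeprecationWarning has the same effect globally.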

v3.19.1

Version 3.19.1 (2021-02-24)

  • Allow pyarrow==3 as a dependency.
  • Fix a bug in kartothek.io_components.utils.align_categories for dataframes with missing values and of non-categorical dtype.
  • Fix an issue with the cube index validation introduced in v3.19.0 (#413).
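
To illustrate what aligning categories involves, here is a rough sketch. It is not kartothek's align_categories. The hypothetical align_category_codes builds one sorted category list from several plain (non-categorical) columns, ignoring missing values, and encodes each column against that shared list. Following the pandas convention for categorical codes, it uses -1 for a missing value, represented here as None.

```python
from typing import List, Optional, Sequence, Tuple


def align_category_codes(
    columns: Sequence[Sequence[Optional[str]]],
) -> Tuple[List[str], List[List[int]]]:
    """Encode every column against one shared, sorted category list.

    None marks a missing value and is encoded as -1 (pandas' convention).
    """
    # Union of all non-missing values across columns, in sorted order.
    categories = sorted({v for col in columns for v in col if v is not None})
    index = {cat: i for i, cat in enumerate(categories)}
    codes = [[-1 if v is None else index[v] for v in col] for col in columns]
    return categories, codes


if __name__ == "__main__":
    cats, codes = align_category_codes([["b", None, "a"], ["c", "a"]])
    print(cats)   # ['a', 'b', 'c']
    print(codes)  # [[1, -1, 0], [2, 0]]
```

Because every column shares the same category list, equal codes mean equal values across columns, which is what allows such columns to be concatenated as categoricals.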