Kedro Versions

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

0.19.4

2 weeks ago

Major features and improvements

  • Kedro commands now work from any subdirectory within a Kedro project.
  • Kedro CLI now provides a better error message when project commands, e.g. kedro run, are run outside of a project.
  • Added the --telemetry flag to kedro new, allowing the user to register consent to have user analytics collected at the same time as the project is created.
  • Improved the performance of Pipeline object creation and summing.
  • Improved suggestions to resume failed pipeline runs.
  • Dropped the dependency on toposort in favour of the built-in graphlib module.
  • Cookiecutter errors are shown in short format without the --verbose flag.
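
The move from toposort to graphlib mentioned above relies on the standard library's TopologicalSorter (available since Python 3.9). The following is a minimal sketch of how that class orders a dependency graph; the node names are made up for illustration, and this is not Kedro's internal code.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key depends on the nodes in its value set.
dependencies = {
    "train_model": {"split_data"},
    "split_data": {"preprocess"},
    "preprocess": set(),
    "evaluate": {"train_model"},
}

# static_order() yields nodes so that every node comes after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Because this example graph is a single chain, only one valid ordering exists; graphs with independent branches may be returned in any dependency-respecting order.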

Bug fixes and other changes

  • Updated kedro pipeline create and kedro pipeline delete to read the base environment from the project settings.
  • Updated CLI command kedro catalog resolve to read credentials properly.
  • Changed the path where pipeline tests are generated with kedro pipeline create from <project root>/src/tests/pipelines/<pipeline name> to <project root>/tests/pipelines/<pipeline name>.
  • Updated .gitignore to prevent pushing the MLflow local runs folder to a remote forge when using mlflow and git.
  • Fixed error handling message for malformed yaml/json files in OmegaConfigLoader.
  • Fixed a bug in node creation that allowed self-dependencies when using transcoding, that is, datasets named like name@format.
  • Improved error message when passing wrong value to node.
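
To illustrate the transcoding fix in the list above: two dataset names such as sales@pandas and sales@spark refer to the same underlying dataset, so a node that reads one and writes the other depends on itself. The sketch below shows the idea only; strip_transcoding and find_self_dependencies are hypothetical helpers, not Kedro functions.

```python
def strip_transcoding(name: str) -> str:
    """Return the base dataset name, dropping any '@format' suffix."""
    return name.split("@", 1)[0]


def find_self_dependencies(inputs: list[str], outputs: list[str]) -> set[str]:
    """Return base names that appear among both the inputs and the outputs."""
    input_bases = {strip_transcoding(name) for name in inputs}
    output_bases = {strip_transcoding(name) for name in outputs}
    return input_bases & output_bases


# A node reading sales@pandas and writing sales@spark touches 'sales' twice.
print(find_self_dependencies(["sales@pandas"], ["sales@spark"]))
```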

Breaking changes to the API

  • Methods _is_project and _find_kedro_project have been moved to kedro.utils. We recommend not using private methods in your code, but if you do, please update your code to use the new location.

Documentation changes

  • Added missing description for merge_strategy argument in OmegaConfigLoader.
  • Added documentation on best practices for testing nodes and pipelines.
  • Clarified docs around using custom resolvers without a full Kedro project.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.3

2 months ago

Major features and improvements

  • Create the debugging line magic %load_node for Jupyter Notebook and Jupyter Lab.
  • Add better IPython, VSCode Notebook support for %load_node and minimal support for Databricks.
  • Add full Kedro Node input syntax for %load_node.


Bug fixes and other changes

  • Updated CLI Command kedro catalog resolve to work with dataset factories that use PartitionedDataset.
  • Addressed arbitrary file write via archive extraction security vulnerability in micropackaging.
  • Added the _EPHEMERAL attribute to AbstractDataset and other Dataset classes that inherit from it.
  • Added new JSON Schema that works with Kedro versions 0.19.*

Breaking changes to the API

Documentation changes

  • Enable read-the-docs search when user presses Command/Ctrl + K.
  • Added documentation for kedro-telemetry and the data collected by it.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.2

3 months ago

Bug fixes and other changes

  • Removed example pipeline requirements when examples are not selected in tools.
  • Allowed modern versions of JupyterLab and Jupyter Notebooks.
  • Removed setuptools dependency.
  • Added source_dir explicitly in pyproject.toml for non-src layout project.
  • MemoryDataset entries are now included in free outputs.
  • Removed black dependency and replaced its functionality with ruff format.
  • Added logging about not using async mode in SequentialRunner and ParallelRunner.

Breaking changes to the API

  • Changed input format for tools option obtained from --config file from numbers to short names.

Documentation changes

  • Added documentation about bootstrap_project and configure_project.
  • Added documentation about kedro run and hook execution order.

0.19.1

4 months ago

Release 0.19.1

What's Changed

0.19.0

4 months ago

:rocket: Major Features and improvements

  • Dropped Python 3.7 support.
  • Introduced project tools and example to the kedro new CLI flow.
  • The new spaceflights starters, spaceflights-pandas, spaceflights-pandas-viz, spaceflights-pyspark, and spaceflights-pyspark-viz can be used with the kedro new command with the --starter flag.
  • Added the --conf-source option to %reload_kedro, allowing users to specify a source for project configuration.
  • Added the functionality to choose a merging strategy for config files loaded with OmegaConfigLoader.
  • Modified the mechanism for importing datasets to raise a more explicit error when dependencies are missing.
  • Added validation for configuration file used to override run commands via the CLI.
  • Moved the default environment base and local from config loader to _ProjectSettings. This enables the use of config loader as a standalone class without affecting existing Kedro Framework users.
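
The merging strategy mentioned in the list above is chosen per configuration type through CONFIG_LOADER_ARGS in the project's settings.py. The fragment below is a sketch of what that might look like; the configuration keys "parameters" and "spark" are assumptions picked for illustration, and "soft" and "destructive" are the two strategy names.

```python
# settings.py (fragment)
# Assumed example: merge parameters softly across environments,
# but let a later environment fully replace the spark config.
CONFIG_LOADER_ARGS = {
    "merge_strategy": {
        "parameters": "soft",
        "spark": "destructive",
    }
}
```

With a soft merge, keys from the overriding environment are combined with the base configuration; with a destructive merge, the overriding file replaces the base entry wholesale.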

:beetle: Bug fixes and other changes

  • Added a new field tools to pyproject.toml when a project is created.
  • Reduced spaceflights data to minimise waiting times during tutorial execution.
  • Added validation to node tags to be consistent with node names.
  • Removed pip-tools as a dependency.
  • Accepted path-like filepaths more broadly for datasets.

:boom: Breaking changes

  • Removed ConfigLoader and TemplatedConfigLoader.
  • Removed kedro.extras.datasets and tests (use kedro-datasets instead).
  • Removed PartitionedDataset and IncrementalDataset from kedro.io (import them from kedro-datasets instead).
  • logging is removed from OmegaConfigLoader in favour of the environment variable KEDRO_LOGGING_CONFIG.
  • Removed support for defining the layer attribute at top-level within DataCatalog.
  • Renamed data_set and DataSet to dataset and Dataset everywhere.
  • Removed the create_default_data_set() method in the Runner in favour of using dataset factories to create default dataset instances.
  • The default project template now has only one pyproject.toml at the root of the project (containing both the packaging metadata and the Kedro build config).

:writing_hand: Documentation changes

  • Added new top navigation to easily switch between Framework, Viz, and Datasets.
  • Added new search-as-you-type to improve the search experience.

New Contributors

Full Changelog: https://github.com/kedro-org/kedro/compare/0.18.14...0.19.0

:rotating_light: If you are upgrading from Kedro 0.18, have a look at the migration guide.

We welcome every community contribution, large or small. See what we're working on now and report bugs or suggest future features. Until next time, The Kedro Team :yellow_heart:

0.18.14

6 months ago

Release 0.18.14

Major features and improvements

  • Allowed the use of custom cookiecutter templates for creating pipelines, either with the --template flag for kedro pipeline create or via the template/pipeline folder.
  • Allowed overriding of configuration keys with runtime parameters using the runtime_params resolver with OmegaConfigLoader.

Bug fixes and other changes

  • Updated dataset factories to resolve nested catalog config properly.
  • Updated OmegaConfigLoader to handle paths containing dots outside of conf_source.
  • Made settings.py optional.

Documentation changes

  • Added documentation to clarify execution order of hooks.
  • Added a notebook example for spaceflights to illustrate how to incrementally add Kedro features.
  • Moved documentation for the standalone-datacatalog starter into its README file.
  • Added new documentation about deploying a Kedro project with Amazon EMR.
  • Added new documentation about how to publish a Kedro-Viz project to make it shareable.
  • Added new TSC members to the page, which now also lists each member's organisation.
  • Plus some minor bug fixes and changes across the documentation.

Upcoming deprecations for Kedro 0.19.0

  • All dataset classes will be removed from the core Kedro repository (kedro.extras.datasets). Install and import them from the kedro-datasets package instead.
  • All dataset classes ending with DataSet are deprecated and will be removed in Kedro 0.19.0 and kedro-datasets 2.0.0. Instead, use the updated class names ending with Dataset.
  • The starters pandas-iris, pyspark-iris, pyspark, and standalone-datacatalog are deprecated and will be archived in Kedro 0.19.0.
  • PartitionedDataset and IncrementalDataset have been moved to kedro-datasets and will be removed in Kedro 0.19.0. Install and import them from the kedro-datasets package instead.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.18.13

8 months ago

Release 0.18.13

Major features and improvements

  • Added support for Python 3.11. This includes tackling challenges like dependency pinning and test adjustments to ensure a smooth experience. Detailed migration tips are provided below for further context.
  • Added new OmegaConfigLoader features:
    • Allowed registering of custom resolvers to OmegaConfigLoader through CONFIG_LOADER_ARGS.
    • Added support for global variables to OmegaConfigLoader.
  • Added kedro catalog resolve CLI command that resolves dataset factories in the catalog with any explicit entries in the project pipeline.
  • Implemented a flat conf/ structure for modular pipelines, and accordingly, updated the kedro pipeline create and kedro catalog create command.
  • Updated new Kedro project template and Kedro starters:
    • Changed Kedro starters and new Kedro projects to use OmegaConfigLoader.
    • Converted setup.py in new Kedro project template and Kedro starters to pyproject.toml and moved flake8 configuration to dedicated file .flake8.
    • Updated the spaceflights starter to use the new flat conf/ structure.
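
Registering custom resolvers through CONFIG_LOADER_ARGS, as described in the list above, amounts to mapping resolver names to callables in settings.py. The fragment below is a sketch; the resolver names "add" and "today" are illustrative choices, not names Kedro defines.

```python
# settings.py (fragment)
from datetime import date

CONFIG_LOADER_ARGS = {
    "custom_resolvers": {
        # Hypothetical resolver that sums its arguments.
        "add": lambda *values: sum(values),
        # Hypothetical resolver that returns today's date as an ISO string.
        "today": lambda: date.today().isoformat(),
    }
}
```

A configuration file would then reference a resolver by name using OmegaConf interpolation syntax, for example a value of the form ${add:1,2}.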

Bug fixes and other changes

  • Updated OmegaConfigLoader to ignore config from hidden directories like .ipynb_checkpoints.

Documentation changes

  • Revised the data section to restructure beginner and advanced pages about the Data Catalog and datasets.
  • Moved contributor documentation to the GitHub wiki.
  • Updated example of using generator functions in nodes.
  • Added migration guide from the ConfigLoader and the TemplatedConfigLoader to the OmegaConfigLoader. The ConfigLoader and the TemplatedConfigLoader are deprecated and will be removed in the 0.19.0 release.

Migration Tips for Python 3.11:

  • PyTables on Windows: Users on Windows with Python >=3.8 should note we've pinned pytables to 3.8.0 due to compatibility issues.
  • Spark Dependency: We've set an upper version limit for pyspark at <3.4 due to breaking changes in 3.4.
  • Testing with Python 3.10: The latest moto version now supports parallel test execution for Python 3.10, resolving previous issues.

Breaking changes to the API

Upcoming deprecations for Kedro 0.19.0

  • Renamed abstract dataset classes, in accordance with the Kedro lexicon. Dataset classes ending with "DataSet" are deprecated and will be removed in 0.19.0. Note that all of the below classes are also importable from kedro.io; only the module where they are defined is listed as the location.
    Type                       Deprecated Alias           Location
    AbstractDataset            AbstractDataSet            kedro.io.core
    AbstractVersionedDataset   AbstractVersionedDataSet   kedro.io.core
  • Using the layer attribute at the top level is deprecated; it will be removed in Kedro version 0.19.0. Please move layer inside the metadata -> kedro-viz attributes.

Community contributions

Thanks to Laíza Milena Scheid Parizotto and Jonathan Cohen.

0.18.12

9 months ago

Release 0.18.12

Major features and improvements

  • Added dataset factories feature which uses pattern matching to reduce the number of catalog entries.
  • Activated all built-in resolvers by default for OmegaConfigLoader except for oc.env.
  • Added kedro catalog rank CLI command that ranks dataset factories in the catalog by matching priority.
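
The pattern matching behind dataset factories can be pictured as matching a dataset name against a template with named placeholders. The sketch below uses a regular expression to show the idea; the pattern "{name}_csv" and the helper match_dataset are assumptions made for illustration, and this is not how Kedro implements the feature.

```python
import re
from typing import Optional


def match_dataset(name: str, pattern: str) -> Optional[dict]:
    """Match a dataset name against a '{placeholder}' pattern.

    Returns the captured placeholder values, or None if there is no match.
    """
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)  # literal text, matched exactly
        else:
            regex += f"(?P<{part}>.+)"  # placeholder, captured by name
    found = re.fullmatch(regex, name)
    return found.groupdict() if found else None


# One pattern entry can stand in for many explicit catalog entries.
print(match_dataset("companies_csv", "{name}_csv"))
print(match_dataset("companies_parquet", "{name}_csv"))
```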

Bug fixes and other changes

  • Consolidated dependencies and optional dependencies in pyproject.toml.
  • Made validation of unique node outputs much faster.
  • Updated kedro catalog list to show datasets generated with factories.

Documentation changes

  • Recommended ruff as the linter and removed mentions of pylint, isort, flake8.

Community contributions

Thanks to Laíza Milena Scheid Parizotto and Chris Schopp.

Breaking changes to the API

Upcoming deprecations for Kedro 0.19.0

  • ConfigLoader and TemplatedConfigLoader will be deprecated. Please use OmegaConfigLoader instead.

0.18.11

10 months ago

Release 0.18.11

Major features and improvements

  • Added databricks-iris as an official starter.

Bug fixes and other changes

  • Reworked micropackaging workflow to use standard Python packaging practices.
  • Made kedro micropkg package accept --verbose.

Documentation changes

  • Significant improvements to the documentation that covers working with Databricks and Kedro, including a new page for workspace-only development, and a guide to choosing the best workflow for your use case.
  • Updated documentation for deploying with Prefect for version 2.0.

0.18.10

10 months ago

Major features and improvements

  • Rebrand across all documentation and Kedro assets.
  • Added support for variable interpolation in the catalog with the OmegaConfigLoader.