Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
* `kedro run --params` now updates interpolated parameters correctly when using `OmegaConfigLoader`.
* Added `metadata` attribute to `kedro.io` datasets. This is ignored by Kedro, but may be consumed by users or external plugins.
* Added `kedro.logging.RichHandler`. This replaces the default `rich.logging.RichHandler` and is more flexible; users can turn off the `rich` traceback if needed.
* `OmegaConfigLoader` will return a `dict` instead of `DictConfig`.
* `OmegaConfigLoader` does not show a `MissingConfigError` when the config files exist but are empty.
* `kedro package` does not produce `.egg` files anymore, and now relies exclusively on `.whl` files.

Many thanks to the following Kedroids for contributing PRs to this release:
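The new `metadata` attribute is a free-form annotation: Kedro stores it but never interprets it, leaving it for users and plugins. A minimal sketch of the pattern (hypothetical `MyCSVDataSet`, not Kedro's implementation):

```python
from typing import Any, Dict, Optional


class MyCSVDataSet:
    """Toy dataset: ``metadata`` is stored verbatim and never interpreted
    by the framework itself."""

    def __init__(self, filepath: str, metadata: Optional[Dict[str, Any]] = None):
        self.filepath = filepath
        self.metadata = metadata  # ignored by the framework, free for plugins

    def load(self) -> str:
        # A real dataset would read the file; the sketch just echoes the path.
        return f"loaded {self.filepath}"


# A hypothetical plugin consuming the annotation:
ds = MyCSVDataSet("data/companies.csv", metadata={"owner": "data-team"})
owner = (ds.metadata or {}).get("owner")
```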
* Added `KEDRO_LOGGING_CONFIG` environment variable, which can be used to configure logging from the beginning of the `kedro` process.
* Added the `kedro run` CLI command to the session store to improve run reproducibility using `Kedro-Viz` experiment tracking.
* `flake8` configuration.
* `kedro.extras.datasets`.
* Added `kedro jupyter setup` to setup Jupyter Kernel for Kedro.
* `kedro package` now includes the project configuration in a compressed `tar.gz` file.
* Added support to `OmegaConfigLoader` to load configuration from compressed files of `zip` or `tar` format. This feature requires `fsspec>=2023.1.0`.
* `_ProjectPipeline`.
* `s3a` or `s3n` filepaths.
* `--params` flag.
* `Kedro-Viz` experiment tracking.

A regression introduced in Kedro version `0.18.5` caused the `Kedro-Viz` console to fail to show experiment tracking correctly. If you experienced this issue, you will need to:

1. upgrade to Kedro version `0.18.6`
2. delete the `<project-path>/data/session_store.db` file

Thanks to Kedroids tomohiko kato, tsanikgr and maddataanalyst for very detailed reports about the bug.
NOTE: This version of Kedro introduced a bug that caused the `Kedro-Viz` console to fail to show experiment tracking correctly. We recommend that you don't use it and instead use Kedro version `0.18.6`.
* Added new `OmegaConfigLoader` which uses `OmegaConf` for loading and merging configuration.
* Added the `--conf-source` option to `kedro run`, allowing users to specify a source for project configuration for the run.
* Added `omegaconf` syntax as an option for `--params`. Keys and values can now be separated by colons or equals signs.
* Added support for nodes that `yield` instead of `return`.
* Output datasets are saved after each `yield` before proceeding with the next chunk.
* `OmegaConfigLoader`.
* Added `--namespace` flag to `kedro run` to enable filtering by node namespace.
* Added `node` for all four dataset hooks.
* Added `kedro run` flags `--nodes`, `--tags`, and `--load-versions` to replace `--node`, `--tag`, and `--load-version`.
* Fixed the `kedro run` options which take a list of nodes as inputs (`--from-nodes` and `--to-nodes`).
* Fixed bug where the `micropkg` manifest section in `pyproject.toml` wasn't recognised as allowed configuration.
* Fixed `load_ipython_extension` not to register the `%reload_kedro` line magic when called in a directory that does not contain a Kedro project.
* Added `anyconfig`'s `ac_context` parameter to `kedro.config.commons` module functions for more flexible `ConfigLoader` customizations.
* Replaced the `kedro.pipeline.Pipeline` object throughout the test suite with the `kedro.modular_pipeline.pipeline` factory.
* Fixed bug causing the `after_dataset_saved` hook to be called for only one output dataset when multiple are saved in a single node and async saving is in use.
* Lowered a log level from `WARNING` to `DEBUG`.
* Updated `micropkg pull` to fix a vulnerability caused by CVE-2007-4559.
* `kedro run`.
Many thanks to the following Kedroids for contributing PRs to this release:
* `project_version` will be deprecated in `pyproject.toml`; please use `kedro_init_version` instead.
* Deprecated `kedro run` flags `--node`, `--tag`, and `--load-version` in favour of `--nodes`, `--tags`, and `--load-versions`.
* Datasets are now resolved from `kedro_datasets` with higher priority than `kedro.extras.datasets`. `kedro_datasets` is the namespace for the new `kedro-datasets` Python package.
* Config loaders now subclass `UserDict` and the configuration is accessed through `conf_loader['catalog']`.
* Config file patterns can now be specified in `settings.py` without creating a custom config loader.
* Added the following new datasets:

| Type | Description | Location |
| --- | --- | --- |
| `svmlight.SVMLightDataSet` | Work with svmlight/libsvm files using scikit-learn library | `kedro.extras.datasets.svmlight` |
| `video.VideoDataSet` | Read and write video files from a filesystem | `kedro.extras.datasets.video` |
| `video.video_dataset.SequenceVideo` | Create a video object from an iterable sequence to use with `VideoDataSet` | `kedro.extras.datasets.video` |
| `video.video_dataset.GeneratorVideo` | Create a video object from a generator to use with `VideoDataSet` | `kedro.extras.datasets.video` |
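The `conf_loader['catalog']` access pattern follows from the config loaders subclassing `UserDict`: the loader behaves like a dictionary keyed by config type. A minimal sketch of that idea (hypothetical `TinyConfigLoader`, not Kedro's classes):

```python
from collections import UserDict
from typing import Any, Dict


class TinyConfigLoader(UserDict):
    """Dict-like config loader: ``loader['catalog']`` looks up config by key."""

    def __init__(self, sources: Dict[str, Dict[str, Any]]):
        super().__init__()
        self._sources = sources  # stand-in for config files on disk

    def __getitem__(self, key: str) -> Dict[str, Any]:
        if key not in self.data:  # load lazily on first access
            self.data[key] = self._sources.get(key, {})
        return self.data[key]


conf_loader = TinyConfigLoader({"catalog": {"companies": {"type": "pandas.CSVDataSet"}}})
catalog_conf = conf_loader["catalog"]
```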
* Updated `dask.ParquetDataSet` to work with the `dask.to_parquet` API.
* Added support to `kedro micropkg pull` for packages on PyPI.
* Added `format` in `save_args` for `SparkHiveDataSet`; previously it didn't allow you to save it as delta format.
* Fixed `TensorFlowModelDataset` when used without versioning; previously, it wouldn't overwrite an existing model.
* Added support for `tf.device` in `TensorFlowModelDataset`.
* Updated `VersionNotFoundError` to handle insufficient permission issues for cloud storage.
* `local_ns` is now used rather than a global variable.
* Moved `ShelveStore` to its own module to ensure multiprocessing works with it.
* `kedro.extras.datasets.pandas.SQLQueryDataSet` now takes optional argument `execution_options`.
* Removed the `attrs` upper bound to support newer versions of Airflow.
* Capped the `setuptools` dependency to `<=61.5.1`.
* `kedro test` and `kedro lint` will be deprecated.

We are grateful to the following for submitting PRs that contributed to this release: jstammers, FlorianGD, yash6318, carlaprv, dinotuku, williamcaicedo, avan-sh, Kastakin, amaralbf, BSGalvan, levimjoseph, daniel-falk, clotildeguinard, avsolatorio, and picklejuicedev for comments and input to documentation changes.
Implemented autodiscovery of project pipelines. A pipeline created with `kedro pipeline create <pipeline_name>` can now be accessed immediately without needing to explicitly register it in `src/<package_name>/pipeline_registry.py`, either individually by name (e.g. `kedro run --pipeline=<pipeline_name>`) or as part of the combined default pipeline (e.g. `kedro run`). By default, the simplified `register_pipelines()` function in `pipeline_registry.py` looks like:

```python
def register_pipelines() -> Dict[str, Pipeline]:
    """Register the project's pipelines.

    Returns:
        A mapping from pipeline names to ``Pipeline`` objects.
    """
    pipelines = find_pipelines()
    pipelines["__default__"] = sum(pipelines.values())
    return pipelines
```
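`sum(pipelines.values())` works because pipelines support `+`; since `sum` starts from `0`, the class must also handle `0 + pipeline` via `__radd__`. A toy illustration of the protocol (not Kedro's actual `Pipeline`):

```python
class ToyPipeline:
    """Minimal object whose instances can be combined with ``sum``."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def __add__(self, other):
        return ToyPipeline(self.nodes + other.nodes)

    def __radd__(self, other):
        # sum() starts from 0; treat it as the empty pipeline.
        if other == 0:
            return self
        return NotImplemented


combined = sum([ToyPipeline(["a"]), ToyPipeline(["b", "c"])])
```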
* The Kedro IPython extension should now be loaded with `%load_ext kedro.ipython`.
* The line magic `%reload_kedro` now accepts keyword arguments, e.g. `%reload_kedro --env=prod`.
* Improved resume pipeline suggestion for `SequentialRunner`; it will now backtrack to the closest persisted inputs to resume from.
* Changed the default value of rich logging `show_locals` to `False`, to make sure credentials and other sensitive data isn't shown in logs.
* `rich`.
* When using `kedro run -n [some_node]`, if `some_node` is missing a namespace the resulting error message will suggest the correct node name.
* `rich` logging.
* Relaxed the `delta-spark` upper bound to allow compatibility with Spark 3.1.x and 3.2.x.
* Added `gdrive` to the list of cloud protocols, enabling Google Drive paths for datasets.
* Deprecated `%load_ext kedro.extras.extensions.ipython`; use `%load_ext kedro.ipython` instead.
* `kedro jupyter convert`, `kedro build-docs`, `kedro build-reqs` and `kedro activate-nbstripout` will be deprecated.
* Added `abfss` to the list of cloud protocols, enabling `abfss` paths.
* `conf/base/logging.yml` is now optional. See our documentation for details.
* Added the `kedro.starters` entry point. This enables plugins to create custom starter aliases used by `kedro starter list` and `kedro new`.
* Reduced the `kedro new` prompts to just one question asking for the project name.
* Bumped the `pyyaml` upper bound to make Kedro compatible with the pyodide stack.
* Documentation is now built with `myst_parser` instead of `recommonmark`.
* Reduced the log level from `INFO` to `DEBUG` for low priority messages.
* The `info.log`/`errors.log` files are no longer created in your project root, and running Kedro on read-only file systems such as Databricks Repos is now possible.
* The `root` logger is now set to the Python default level of `WARNING` rather than `INFO`. Kedro's logger is still set to emit `INFO` level messages.
* `SequentialRunner` now has consistent execution order across multiple runs with sorted nodes.
* `kedro jupyter notebook/lab` no longer reuses a Jupyter kernel.
* Required `cookiecutter>=2.1.1` to address a known command injection vulnerability.
* The session store no longer fails if a username cannot be found with `getpass.getuser`.
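The logging split described above (root at `WARNING`, Kedro's logger at `INFO`) is plain `logging` behaviour: loggers without an explicit level inherit from root, so third-party chatter is suppressed while the framework's own messages still pass:

```python
import logging

# Root logger at Python's default WARNING: unconfigured loggers inherit this,
# so third-party INFO messages are dropped.
logging.getLogger().setLevel(logging.WARNING)

# Kedro's own logger is still set to emit INFO-level messages.
logging.getLogger("kedro").setLevel(logging.INFO)
```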
* Added docstrings to `AbstractDataSet` and `AbstractVersionedDataSet`, as well as typing to all datasets.
* `kedro.config.default_logger` no longer exists; default logging configuration is now set automatically through `kedro.framework.project.LOGGING`. Unless you explicitly import `kedro.config.default_logger` you do not need to make any changes.
* `kedro.extras.ColorHandler` will be removed in 0.19.0.
* Added a new hook `after_context_created` that passes the `KedroContext` instance as `context`.
* Added a new CLI hook `after_command_run`.
* Improved the `ParserError` exception error message.
* Extended `SparkDataSet` to specify a `schema` load argument that allows for supplying a user-defined schema as opposed to relying on the schema inference of Spark.
* Relaxed `CONFIG_LOADER_CLASS` validation so that `TemplatedConfigLoader` can be specified in `settings.py`. Any `CONFIG_LOADER_CLASS` must be a subclass of `AbstractConfigLoader`.
* The `run_params` dictionary used in pipeline hooks.
* Fixed `Jinja2` syntax loading with `TemplatedConfigLoader` using `globals.yml`.
* Removed `_active_session`, `_activate_session` and `_deactivate_session`. Plugins that need to access objects such as the config loader should now do so through `context` in the new `after_context_created` hook.
* `config_loader` is available as a public read-only attribute of `KedroContext`.
* Made the `hook_manager` argument optional for `runner.run`.
* `kedro docs` now opens an online version of the Kedro documentation instead of a locally built version.
* `kedro docs` will be removed in 0.19.0.