Build, run and manage your data pipelines with Python or SQL on any cloud
VDK DAG (previously vdk-meta-jobs) is the new official name of the plugin that allows users to express dependencies between data jobs. It is released as Beta, with improvements in stability, usability, and documentation.
Check out the plugin page for more details.
Now users can share links with filters applied:
Users can now access the VDK UI using quickstart-vdk. The VDK UI has also been made much more configurable:
Users can now specify the Python version their job should run with when deployed in the VDK Control Service runtime:
vdk deploy --python-version 3.7 ..
Or in job config.ini
[job]
python_version = 3.7
Users can also see which Python versions the VDK Control Service currently supports:
vdk info
would return something like
Getting control service information...
VDK Control service version: PipelinesControlService/0.0.1-SNAPSHOT/5f078fe
...
Supported python versions: 3.9 3.8
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.13...v0.14
An installed Generative Data Pack plugin automatically expands the data sent for ingestion.
The GDP plugin detects the execution ID of the running Data Job and decorates your data product with it, making it possible to correlate a data record with the particular Data Job execution that ingested it.
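As an illustration (the exact attribute name and value are assumptions here; the plugin documentation has the authoritative details), a payload could be decorated like this:
# Payload as produced by the data job:
payload = {"user_count": 42}
# After the plugin decorates it (attribute name and value are hypothetical):
decorated = {"user_count": 42, "vdk_gdp_execution_id": "my-job-1680000000-abcde"}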
For more information see the plugin documentation
Each job in a DAG can now be passed arguments:
{
"job_name": "name-of-job",
"team_name": "team-of-job",
"fail_meta_job_on_error": false,
"arguments": <ARGUMENTS IN DICTIONARY FORMAT HERE>,
"depends_on": ["name-of-job1", "name-of-job2"]
}
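For illustration, a job entry passing concrete (hypothetical) arguments might look like this:
{
    "job_name": "name-of-job",
    "team_name": "team-of-job",
    "fail_meta_job_on_error": false,
    "arguments": {"target_db": "dev", "batch_size": 100},
    "depends_on": ["name-of-job1", "name-of-job2"]
}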
Users can develop jobs entirely in a Notebook file, with all features of VDK available out of the box. After installing vdk-notebook, users have access to the job_input interface to execute templates, ingest data, and everything else.
To keep production and development code separate, the vdk-notebook integration provides a way for users to mark which cells are deployable and part of their production code and which are not, as sketched below.
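As a sketch, assuming the cell-tag convention described in the plugin documentation (treat the tag name as an assumption), a deployable cell is marked via its metadata in the .ipynb file:
{
    "cell_type": "code",
    "metadata": {"tags": ["vdk"]},
    "source": ["job_input.execute_query(\"SELECT 1\")"]
}
Cells without the tag are treated as development-only and are not deployed as part of the production job.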
When installing quickstart-vdk, VDK Server is available for local testing and now includes the UI:
pip install quickstart-vdk
vdk server --install
For more information see here
The Versatile Data Kit Frontend provides 2 npm (Angular) libraries which can be used to integrate the VDK UI with your own screens:
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.12...v0.13
The VDK Operations UI enables data practitioners to efficiently manage (operate and monitor) their data jobs. It has been used internally in VMware for some time, and the team open-sourced it last month.
Check out more details at the Operations UI VEP
Look forward to the official launch soon.
Significantly simplified and improved the main README and CONTRIBUTING.md, thanks to @gary-tai and @zverulacis.
Implemented a limit on how many jobs a Meta Job can start at once:
META_JOBS_MAX_CONCURRENT_RUNNING_JOBS=<number>
Learn more about the VDK Meta Jobs features in VDK Meta Jobs VEP
We are working on introducing an optional python_version property to the Control Service API, which allows users to specify the Python version they want to use for their job deployment. This means users no longer have to rely on the service administrator to make changes to the configuration and can deploy their jobs with the version they need.
See more information in the Multiple Python Versions VEP
So far, the recommended way to store secrets in VDK was the Properties API. Though it works well, it doesn't really meet the criteria for storing properly restricted, and likely confidential, data.
The team is working on providing a Secrets interface, similar to the Properties one, backed by HashiCorp Vault.
See more information in the Vault Integration For Secrets Storage VEP
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.11...v0.12
Allow quality checks to be made before the data is inserted into the target table. Currently, the checks done in the processing step do not cover whether the semantics of the data are correct; therefore, bad data could end up in the target table, which could be unwanted behavior.
Example:
def sample_check(tmp_table_name):
    # Reject the staged data if the temporary table is flagged as bad
    return False if "bad" in tmp_table_name else True

template_args["check"] = sample_check
job_input.execute_template(
    template_name="load/dimension/scd1",
    template_args=template_args,
)
When querying information about jobs, users of the Jobs Query API can now use wildcard matching (for example *search*) in GraphQL filters for job name and team name, in addition to the exact matching of search strings supported before.
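A sketch of such a query (the filter shape here is an assumption; consult the Jobs Query API documentation for the exact schema):
query {
  jobs(pageNumber: 1, pageSize: 20, filter: [{property: "jobName", pattern: "*search*"}]) {
    content {
      jobName
    }
  }
}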
Users want to be able to determine where requests originated from when analyzing and browsing telemetry data about VDK Control Service usage. The user agent can now be set explicitly:
export VDK_CONTROL_SERVICE_USER_AGENT=foo
or in config.ini
[vdk]
vdk_control_service_user_agent=foo
If not set, it defaults to "vdk-control-cli/{version} ({os.name}; {sys.platform})" plus the Python version.
A new VDK plugin that supports running data jobs which consist of .ipynb files. See the VDK Notebook plugin page for more information.
This extension introduces a magic command for Jupyter. The command enables users to load job_input for their current data job and use it freely while working in Jupyter. See the VDK IPython plugin page for more information.
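A usage sketch (magic and helper names follow the plugin's documentation; treat them as assumptions if your version differs):
%load_ext vdk.plugin.ipython
%reload_VDK
job_input = VDK.get_initialized_job_input()
job_input.send_object_for_ingestion(payload={"value": 1}, destination_table="demo_table")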
Check the installation page
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.10...v0.11
Introduces thread-dump capabilities in Data Jobs.
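A usage sketch (the configuration option name is taken from the plugin documentation at the time of writing and should be treated as an assumption):
export VDK_TROUBLESHOOT_UTILITIES_TO_USE="thread-dump"
vdk run <job-name>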
See more details in the plugin home page and the VDK Enhancement Proposal
Introduces support for Python 3.11 in vdk-core and other plugins
See installation instructions here. The versions of VDK components released under VDK 0.10 are:
control-service 1.5.707959356
vdk-core==0.3.723457889
vdk-lineage-model==0.0.723435904
vdk-meta-jobs==0.1.723435904
vdk-sqlite==0.1.730902357
vdk-jobs-troubleshooting==0.2.741769066
vdk-lineage==0.3.723435904
vdk-control-cli==1.3.736732752
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.9...v0.10
Using this plugin you can specify dependencies between data jobs as a directed acyclic graph (DAG).
For example:
def run(job_input):
    jobs = [
        {
            "job_name": "name-of-job",
            "team_name": "team-of-job",
            "fail_meta_job_on_error": True,  # or False
            "depends_on": ["name-of-job1", "name-of-job2"]
        },
        ...
    ]
    MetaJobInput().run_meta_job(jobs)
See more details in the plugin home page
During installation of the Control Service, administrators can limit what types of files can be uploaded as part of a data job.
A new configuration option called uploadValidationFileTypesAllowList is added. It is a comma-separated list of file types.
For example, setting
uploadValidationFileTypesAllowList=image/png,text/plain
means only PNG images and plain text files can be uploaded; any other upload request will fail.
See more details in helm chart documentation
This plugin allows for the configuration of the format of VDK logs.
Before, there were separate plugins for each format, but they are now deprecated in favour of this one.
The plugin introduces a new configuration option LOGGING_FORMAT with possible values 'json', 'ltsv' and 'text':
export LOGGING_FORMAT=json
For embedded Control Service metadata storage, the Bitnami PostgreSQL chart has been added as an option.
Now users can install it with:
helm install vdk-control-service --set postgresql.enabled=true --set cockroachdb.enabled=false
See installation instructions here. The versions of VDK components released under VDK 0.9 are:
control-service 1.5.707959356
vdk-core==0.3.692414765
vdk-logging-json==0.1.693641831
vdk-meta-jobs==0.1.684477187
vdk-postgres==0.0.692283840
vdk-trino==0.4.703555598
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.8...v0.9
This plugin provides the ability to audit and potentially limit user operations; it requires Python 3.8 or newer. The audited operations can be deep within the Python runtime or standard libraries, such as dynamic code compilation, module imports, or OS command invocations.
If we want to forbid some os.* operations we can do it like this:
export AUDIT_HOOK_ENABLED=true
export AUDIT_HOOK_FORBIDDEN_EVENTS_LIST='os.removexattr;os.rename;os.rmdir;os.scandir'
export AUDIT_HOOK_EXIT_ON_FORBIDDEN_EVENT=true
vdk run <job-name>
See more details in the vdk-audit plugin page
Jobs deployed by the Control Service can now automatically use any version of Python, not just 3.7.
This template can be used to load raw data from a Data Lake to a target table in a Data Warehouse. In summary, it appends all records from the source table to the target table. As with all other SQL modeling templates, schema validation is performed, and table refreshes and statistics are computed when necessary.
Example:
def run(job_input):
    # . . .
    template_args = {
        'source_schema': 'source',
        'source_view': 'view_source',
        'target_schema': 'target',
        'target_table': 'destination_table'
    }
    job_input.execute_template('insert', template_args)
See more details in the template documentation page
See installation instructions here. The versions of VDK components released under VDK 0.8 are:
control-service 1.5.671965442
vdk-core==0.3.662978536
vdk-ingest-http==0.2.670842377
vdk-impala==0.4.672320306
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.7...v0.8
Since template executions are autonomous data job runs, we need to be able to determine whether a template is running at any time, for example to distinguish between root data job finalization and template finalization.
For example, if we want to send telemetry somewhere:
@hookimpl
def finalize_job(self, context: JobContext) -> None:
    template = context.core_context.state.get(ExecutionStateStoreKeys.TEMPLATE_NAME)
    if template:
        telemetry.send(phase="finalize_template", template_name=template)
    else:
        telemetry.send(phase="finalize_job", job_name=context.name)
Enable users to override log levels per module, temporarily (e.g. for debugging or prototyping, to increase the verbosity of a certain module).
For example, assuming the default log level is INFO, we can enable verbose logs for 2 modules, "vdk.api" and "custom.module":
export LOG_LEVEL_MODULE="vdk.api=DEBUG;custom.module=DEBUG"
vdk run job-name
Or in specific job config.ini:
[vdk]
log_level_module=vdk.api=DEBUG;custom.module=DEBUG
A simple plugin that allows a developer or presenter to quickly store properties on the local FS.
It can be used to store secrets/configuration for a dev/demo session without requiring the entire Control Service to be installed and running. It can also be used to test a job run locally without updating the state of the deployed job.
Example:
export PROPERTIES_DEFAULT_TYPE="fs-properties-client"
or in a specific job's config.ini
[vdk]
properties_default_type=fs-properties-client
Now properties are stored in a local file. The file location can be further configured using FS_PROPERTIES_FILENAME and FS_PROPERTIES_DIRECTORY.
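Once configured, the standard job_input properties interface works unchanged; for example:
def run(job_input):
    # With fs-properties-client these calls read and write a local file
    # instead of going through the Control Service Properties API.
    job_input.set_all_properties({"api_token": "demo-token"})
    token = job_input.get_property("api_token")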
Create a new plugin skeleton very easily:
cookiecutter https://github.com/tozka/cookiecutter-vdk-plugin.git
and follow the instructions
A job (or a template) can now be canceled from any step, and all remaining steps in the job (or template) will be skipped. For example, if a data job depends on processing data from a source that has indicated no new entries since the last run, the remaining steps can be skipped.
Example:
from vdk.api.job_input import IJobInput

def run(job_input: IJobInput):
    data = get_last_delta()
    if not data:
        # Nothing new to process - skip the rest of the job's steps
        job_input.skip_remaining_steps()
See installation instructions here. The versions of VDK components released under VDK 0.7 are:
control-service 1.5.622899758
vdk-control-cli==1.3.626767210
vdk-core==0.3.652866366
vdk-properties-fs==0.0.651770458
vdk-kerberos-auth==0.3.631374202
vdk-impala==0.4.651849986
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.6...v0.7
Previously, configuration options had to be prefixed with "VDK_" when set as environment variables in order to be recognized. This was very error prone, since the options are documented without the prefix.
Now they can be set without a prefix as well.
The following are equivalent:
export VDK_DB_DEFAULT_TYPE='impala'
export DB_DEFAULT_TYPE='impala'
If both are set, the "prefixed" variable has a higher priority.
The VDK Lineage Model plugin aims to abstract emitting lineage data from VDK data jobs, so that different lineage loggers can be configured at run time in any plugin that supports emitting lineage data.
Check out more at the plugin page.
Alongside vdk ingest-csv, which enables users to import (or ingest) CSV data into a table, users can now export the result of a SQL query to CSV with a simple command:
vdk export-csv -q "select * from my_table" --file "output.csv"
Check out more at the plugin page
Until now, properties required the Control Service in order to work. Sometimes, for prototyping and testing purposes, you do not need to connect to external services. Properties can now be kept in memory:
In a specific job's config file (config.ini):
[vdk]
properties_default_type = memory
Or as an environment variable
export PROPERTIES_DEFAULT_TYPE="memory"
An example of how to anonymize data being ingested with VDK, using a plugin.
Check out more at the example page
An example of how to create dependencies between data jobs in Airflow.
Check out more at the example page
See installation instructions here. The versions of VDK components released under VDK 0.6 are:
control-service 1.5.620438292
vdk-core==0.3.620677184
airflow-provider-vdk==0.0.602273476
vdk-lineage-model==0.0.581430542
vdk-kerberos-auth==0.3.584577337
vdk-ingest-http==0.2.616713987
vdk-impala==0.4.613570906
vdk-lineage==0.3.604201902
vdk-trino==0.4.605101952
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.5...v0.6
The hooks enable users to add behavior around existing SQL queries without modifying the code itself. A hook is invoked before and after each query, enabling its full execution to be tracked. For example:
import time
from typing import Optional

@hookimpl(hookwrapper=True)
def db_connection_execute_operation(execution_cursor: ExecutionCursor) -> Optional[int]:
    start = time.time()
    outcome = yield  # yield so that the query itself is executed
    end = time.time()
    log.info(f"Query duration: {end - start}s.")
Users can integrate with Apache Airflow to orchestrate Data Jobs in a DAG (workflow). Check out more at airflow-provider-vdk
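A minimal DAG sketch using the provider's operator (based on the provider's documented example; the connection id, job name and team name are hypothetical):
from datetime import datetime

from airflow import DAG
from vdk_provider.operators.vdk import VDKOperator

with DAG("example_vdk_dag", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    run_job = VDKOperator(
        conn_id="vdk-default",    # Airflow connection pointing to the VDK Control Service
        job_name="example-job",   # the deployed data job to trigger
        team_name="example-team",
        task_id="run_example_job",
    )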
Full Changelog: https://github.com/vmware/versatile-data-kit/compare/v0.4...v0.5