Release Highlights
Potential Downtime
This release introduces substantial improvements to search functionality, which require existing indices to be reindexed.
During the reindexing:
- a system-update job will set indices to read-only and create a backup/clone of each index
- new components will be prevented from start-up until the reindex completes
- Helm deployments will go into read-only mode and new ingestion runs will fail
This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect about 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
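As a back-of-the-envelope illustration of the estimate above, the following sketch converts an entity count into an approximate reindex duration. The function name, the linear scaling, and the use of the 5-minute figure as a floor are assumptions made for illustration; actual times depend on your cluster.

```python
# Rough reindex-time estimate based on the figures quoted above.
# Assumption: duration scales linearly with entity count.
ENTITIES_PER_HOUR = 2_300_000  # "1 hour for every 2.3 million entities"
MINIMUM_MINUTES = 5            # lower bound quoted in the notes


def estimate_reindex_minutes(entity_count: int) -> float:
    """Return an approximate reindex duration in minutes."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    linear_minutes = entity_count / ENTITIES_PER_HOUR * 60
    # Never report less than the quoted minimum duration.
    return max(MINIMUM_MINUTES, linear_minutes)
```

For example, a deployment with 4.6 million entities would be estimated at roughly two hours.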
If you are deploying containers yourself
If you're deploying the Docker containers yourself (without Helm or the Docker Compose Quickstart), you must first run the acryldata/datahub-upgrade Docker image (v0.10.0 tag) with the required environment variables set.
Run the container with the following command:
docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
For the full set of environment variables required, check out the default docker.env provided for Docker Compose deployments.
This will run the required reindex against your Elasticsearch instance, after which other DataHub components should start correctly. If you do not run the datahub-upgrade container successfully, other components in the stack will fail to start correctly.
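If you script your deployment, the upgrade step can be wrapped as in this minimal sketch. It assumes the required variables are stored in a local env file (the path is a placeholder you supply) and passes that file with Docker's --env-file option; the helper names are hypothetical and not part of DataHub.

```python
import subprocess
from typing import List

UPGRADE_IMAGE = "acryldata/datahub-upgrade:v0.10.0"


def build_upgrade_command(env_file: str) -> List[str]:
    """Build the docker invocation for the SystemUpdate job.

    env_file is an assumed path to a file holding the required
    environment variables (for example, a copy of the default docker.env).
    """
    return [
        "docker", "run",
        "--env-file", env_file,
        UPGRADE_IMAGE,
        "-u", "SystemUpdate",
    ]


def run_system_update(env_file: str) -> None:
    """Run the upgrade container and raise if it exits with a non-zero status."""
    subprocess.run(build_upgrade_command(env_file), check=True)
```

Calling run_system_update with your env file before starting the other components makes a failed reindex job surface as an error instead of a stack that silently fails to start.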
User Experience
We have some really exciting improvements to the DataHub user experience in this release!
Improved documentation editor, contributed by @ngamanda and the Grab Team.
This work provides a much more intuitive documentation editing experience within the UI, providing “what you see is what you get” formatting & removing the need for markdown expertise.
Additionally, you can easily:
- Add links to other entities/users within DataHub
- Embed and resize tables & images
- Toggle between font sizes and formats
- Embed syntax-highlighted code blocks
Filter lineage graphs based on time windows
You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot data issues in the past.
Improvements in Search
As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:
- Stemming & Synonyms
- Search by full or partial URN
- Autocomplete improvements
- Quoted search analyzer for exact & prefix match
Metadata Ingestion
Here are some of the most notable ingestion-related improvements:
- Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
- PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
- BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
- Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias
Developer Experience
- This release introduces DataHub Lite, a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use-cases such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. Check out the docs here.
Breaking Changes
#7103 This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more info, see our docs on Configuring Kafka in DataHub. Previously these variables were suffixed with _TOPIC; the correct suffix is now _TOPIC_NAME. This change should not affect any user who is using the default Kafka topic names.
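To check whether a custom configuration is affected, a small script can flag variables that still end in _TOPIC. The sketch below is a heuristic: it assumes every variable ending in _TOPIC is a kafka-setup topic setting, and the function name and the variable name in the usage note are made up for illustration.

```python
from typing import Dict, Mapping

OLD_SUFFIX = "_TOPIC"
NEW_SUFFIX = "_TOPIC_NAME"


def find_renamed_topic_vars(env: Mapping[str, str]) -> Dict[str, str]:
    """Map each variable using the old suffix to its expected new name.

    Heuristic only: any key ending in _TOPIC is treated as a topic
    setting, so review the result before renaming anything.
    """
    return {
        name: name[: -len(OLD_SUFFIX)] + NEW_SUFFIX
        for name in env
        if name.endswith(OLD_SUFFIX)
    }
```

Passing os.environ, or a dictionary parsed from your env file, returns the variables to rename; a hypothetical EXAMPLE_EVENT_TOPIC would be reported as needing the name EXAMPLE_EVENT_TOPIC_NAME, while variables that already end in _TOPIC_NAME are left out.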
What's Changed
- fix(ci): only scan on master branch by @anshbansal in https://github.com/datahub-project/datahub/pull/7047
- fix(ci): use trivy offline scanning by @anshbansal in https://github.com/datahub-project/datahub/pull/7050
- docs(get-started) Simplify copy on Get Started landing page by @maggiehays in https://github.com/datahub-project/datahub/pull/7043
- fix(ingest/kafka): fix ResourceType import error for confluent_kafka<1.9.0 by @mayurinehate in https://github.com/datahub-project/datahub/pull/7046
- docs(dbt): fix indentation in dbt meta mapping docs by @jx2lee in https://github.com/datahub-project/datahub/pull/7045
- fix(ingest): temporarily disable vertica tests by @hsheth2 in https://github.com/datahub-project/datahub/pull/7059
- feat(editor): improve documentation editor using Remirror by @ngamanda in https://github.com/datahub-project/datahub/pull/6631
- fix(bootstrap): add EDIT_LINEAGE privilege to some default policies by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7060
- feat(ingest): add entity registry in codegen by @hsheth2 in https://github.com/datahub-project/datahub/pull/6984
- feat(ingest): extract powerbi endorsements to tags by @looppi in https://github.com/datahub-project/datahub/pull/6638
- feat(ingestion): pull metabase database, schema names from raw query and api by @remisalmon in https://github.com/datahub-project/datahub/pull/7039
- fix(ingest): support multiple entity_registry sections by @hsheth2 in https://github.com/datahub-project/datahub/pull/7066
- ci(ingest): add flag to skip tests but run codegen during release by @hsheth2 in https://github.com/datahub-project/datahub/pull/7067
- fix(ingest): preserve dbt column name casing by @hsheth2 in https://github.com/datahub-project/datahub/pull/7063
- fix(ingest/tableau): fix node limit exceeded error for workbooks query by @mayurinehate in https://github.com/datahub-project/datahub/pull/7068
- fix(build/airflow): Fixing gradlew path by @treff7es in https://github.com/datahub-project/datahub/pull/7069
- feat(ingest): support snapshots in dbt and dbt-cloud by @hsheth2 in https://github.com/datahub-project/datahub/pull/7062
- fix(ui) Fix duplicate schema field rendering with siblings by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7057
- refactor(ingest/athena): Replace s3_staging_dir parameter in Athena source with query_result_location by @bossenti in https://github.com/datahub-project/datahub/pull/7044
- feat(ingest): fix handling of unions with aliases in post restli conversion by @hsheth2 in https://github.com/datahub-project/datahub/pull/7058
- fix(ui) Make checkboxes in ingestion forms easier to see by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7061
- fix(ingest): support git clone of non-github repos by @hsheth2 in https://github.com/datahub-project/datahub/pull/7065
- feat(ingest): reporting revamp, part 1 by @hsheth2 in https://github.com/datahub-project/datahub/pull/7031
- fix(secret-service): fix default encrypt key by @david-leifker in https://github.com/datahub-project/datahub/pull/7074
- feat(datahub-lite): introduces a new experimental lightweight impleme… by @shirshanka in https://github.com/datahub-project/datahub/pull/7052
- feat(datahub-lite): adding tab completion, small serialization fixes by @shirshanka in https://github.com/datahub-project/datahub/pull/7079
- docs: add docs for managed DataHub v0.1.72 by @anshbansal in https://github.com/datahub-project/datahub/pull/7070
- docs(readme): add inovex as adopter by @DSchmidtDev in https://github.com/datahub-project/datahub/pull/7077
- docs: add warning about clearing cookies for login by @anshbansal in https://github.com/datahub-project/datahub/pull/7084
- feat(cache): add hazelcast distributed cache option by @RyanHolstien in https://github.com/datahub-project/datahub/pull/6645
- docs(datahub-lite): small improvement for zsh tab completion by @shirshanka in https://github.com/datahub-project/datahub/pull/7085
- fix(ingest/bigquery): clear stateful ingestion correctly by @hsheth2 in https://github.com/datahub-project/datahub/pull/7075
- fix(graphql): Return with appropriate status code instead of stacktrace by @szalai1 in https://github.com/datahub-project/datahub/pull/7086
- fix(sso): Clear cookies on SSO redirect error by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7088
- fix(docs): add missing mutation literal by @ruedigerblock in https://github.com/datahub-project/datahub/pull/7082
- fix(ui): display the correct access token expiry in AccessTokenModal by @ngamanda in https://github.com/datahub-project/datahub/pull/7078
- fix(cli/lite): fix datahub lite serve command by @hsheth2 in https://github.com/datahub-project/datahub/pull/7089
- fix(profiling): Fix syntax for APPROX_COUNT_DISTINCT on bigquery and snowflake by @feljen in https://github.com/datahub-project/datahub/pull/7087
- fix(ingest): fix logic error of google protobuf wrapper type. by @wngus606 in https://github.com/datahub-project/datahub/pull/7076
- feat(ui): Documentation Editor Improvements by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7072
- fix(uri): marks uri field as deprecated, removes problem code, and adds coercer for usages of URI typeref by @RyanHolstien in https://github.com/datahub-project/datahub/pull/7093
- fix(build): postgres docker secret by @david-leifker in https://github.com/datahub-project/datahub/pull/7092
- fix(ingest/snowflake): handle corrupted snowflake OCSP cache file by @hsheth2 in https://github.com/datahub-project/datahub/pull/7095
- refactor(ingest): Refactoring container creation to common place by @treff7es in https://github.com/datahub-project/datahub/pull/6877
- feat(ingest): move datahub-lite to optional dep and add shim when missing by @hsheth2 in https://github.com/datahub-project/datahub/pull/7097
- fix(docker): support non amd64 dockerize in setup containers by @tonycsoka in https://github.com/datahub-project/datahub/pull/7091
- test(ingest): fix kafka admin client mocking by @hsheth2 in https://github.com/datahub-project/datahub/pull/7098
- fix(build): Fix postgres setup gha by @david-leifker in https://github.com/datahub-project/datahub/pull/7104
- fix(ingest/profile): properly quoting approx_count_distinct by @treff7es in https://github.com/datahub-project/datahub/pull/7101
- style(models): Replaces non-ASCII charactes in pdl files with ASCII c… by @nmbryant in https://github.com/datahub-project/datahub/pull/7105
- feat(ingest): hide cartesian product warnings in GE profiler by @hsheth2 in https://github.com/datahub-project/datahub/pull/7096
- feat(ingest): add removing partition pattern in spark lineage by @ssilb4 in https://github.com/datahub-project/datahub/pull/6605
- feat(redshift): Fetch lineage from unload queries by @mmmeeedddsss in https://github.com/datahub-project/datahub/pull/7041
- fix(ci): do not confirm on force for deletion by @anshbansal in https://github.com/datahub-project/datahub/pull/7106
- fix(analytics): add missing usage events causing warning in logs by @anshbansal in https://github.com/datahub-project/datahub/pull/7109
- feat(quickstart): Remove kafka-setup as a hard deployment requirement by @pedro93 in https://github.com/datahub-project/datahub/pull/7073
- fix(tests): Fixing add_users smoke test by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7116
- chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /docs-website by @dependabot in https://github.com/datahub-project/datahub/pull/7122
- docs(gms): clarify behavior of soft deletion in UI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7117
- fix(kafka-setup): Make topic name consistent with other images by @pedro93 in https://github.com/datahub-project/datahub/pull/7103
- chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /datahub-web-react by @dependabot in https://github.com/datahub-project/datahub/pull/7123
- feat(ingest): powerbi # add powerbi workspaces to containers by @looppi in https://github.com/datahub-project/datahub/pull/6532
- fix(diffMode): prevent misconfiguration of diff mode by @RyanHolstien in https://github.com/datahub-project/datahub/pull/7127
- fix(ui) Display glossary term name in analytics page properly by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7128
- fix(ui): only use visible and enabled tabs for selected tab and routing in entity profiles by @Masterchen09 in https://github.com/datahub-project/datahub/pull/6629
- fix(htrace): remove htrace jar by @szalai1 in https://github.com/datahub-project/datahub/pull/7126
- feat(datahub-lite): simplify get response by @shirshanka in https://github.com/datahub-project/datahub/pull/7131
- fix(doc/biquery): Updating bigquery capability doc by @treff7es in https://github.com/datahub-project/datahub/pull/7136
- fix(ci): do not fail fast for matrix runs by @anshbansal in https://github.com/datahub-project/datahub/pull/7132
- refactor(ui): refactor capitalization of platform name and sub types by @Masterchen09 in https://github.com/datahub-project/datahub/pull/7099
- refactor(cli): extract method, change wording by @anshbansal in https://github.com/datahub-project/datahub/pull/7134
- docs(lineage): Updating Lineage feature guide by @maggiehays in https://github.com/datahub-project/datahub/pull/6257
- removing WIP by @laulpogan in https://github.com/datahub-project/datahub/pull/7140
- docs(oidc): Updating + improving docs around OIDC configuration by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7141
- fix(ingest): add message proto check by @tinolyu in https://github.com/datahub-project/datahub/pull/7130
- fix(ingest): use snowflake median function in profiling by @hsheth2 in https://github.com/datahub-project/datahub/pull/6987
- feat(ui): allow removing parentNodes of Glossary Nodes and Glossary Terms by @ngamanda in https://github.com/datahub-project/datahub/pull/7135
- feat(ui) Add new embedded profile to be displayed in extension by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7113
- feat(ingest): add --log-file option and show CLI logs in UI report by @hsheth2 in https://github.com/datahub-project/datahub/pull/7118
- fix(misc): NPE and GraphQL case fixes by @david-leifker in https://github.com/datahub-project/datahub/pull/7149
- fix(ingest/snowflake): fix regression in approx count distinct by @hsheth2 in https://github.com/datahub-project/datahub/pull/7146
- [docs] fix typo / add missing line for docker compose / attach overwriting system action config for confluent. by @kdongho in https://github.com/datahub-project/datahub/pull/7142
- reordering sidebar and adding homepage to apis by @laulpogan in https://github.com/datahub-project/datahub/pull/7139
- fix(ingestion): powerbi # Not all arguments converted to string by @mohdsiddique in https://github.com/datahub-project/datahub/pull/7157
- fix(ui): Sort top users by their query count in datasets stats tab by @jaykadambi in https://github.com/datahub-project/datahub/pull/7148
- refactor(ui): Updates to Manual Lineage search by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7151
- feat(ui) Build entity doesn't exist page for entity profiles by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7150
- ci(ingest): fix broken CI workflow for metadata-ingestion by @hsheth2 in https://github.com/datahub-project/datahub/pull/7161
- fix(ingest): azuread group mapping do not stop ingestion by @anshbansal in https://github.com/datahub-project/datahub/pull/7169
- fix(docs): Fixes links to docs templates by @viniciusdsmello in https://github.com/datahub-project/datahub/pull/7171
- refactor(ui ingest): Allow enabling / disabling ingestion schedule easily by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7162
- fix(ingest): switch various sources to auto_stale_entity_removal helper by @hsheth2 in https://github.com/datahub-project/datahub/pull/7158
- docs(townhall) Update Townhall History doc by @maggiehays in https://github.com/datahub-project/datahub/pull/7180
- test(ingest/delta-lake): fix spurious directory creation by @hsheth2 in https://github.com/datahub-project/datahub/pull/7179
- feat: add a linter for github actions workflows by @hsheth2 in https://github.com/datahub-project/datahub/pull/7178
- fix(quickstart): adding back kafka-setup by @szalai1 in https://github.com/datahub-project/datahub/pull/7181
- fix(docs) Fix broken links in ingestion docs by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7183
- fix(ingest/GX): fix snowflake urn generated from connection string by @mayurinehate in https://github.com/datahub-project/datahub/pull/7173
- feat(ingest): switch dbt to use auto_stale_entity_removal by @hsheth2 in https://github.com/datahub-project/datahub/pull/7160
- fix(ingest): fix issue in glue tests by @hsheth2 in https://github.com/datahub-project/datahub/pull/7185
- fix(log): logging timestamp in ISO8601 format instead of time by @anshbansal in https://github.com/datahub-project/datahub/pull/7188
- feat(ingest): bigquery - extracts lineage metadata from catalog api by @PatrickfBraz in https://github.com/datahub-project/datahub/pull/7137
- fix(ingest/tableau): show warning about token expiry for PATs by @hsheth2 in https://github.com/datahub-project/datahub/pull/7187
- fix(ingest/vertica): Fixing missing container properties by @treff7es in https://github.com/datahub-project/datahub/pull/7197
- chore(deps): bump Netty from 4.1.85.Final to 4.1.86.Final by @janhicken in https://github.com/datahub-project/datahub/pull/7191
- docs(ingestion): powerbi # Add permission for DAX and mashup expressions by @mohdsiddique in https://github.com/datahub-project/datahub/pull/7195
- feat(elasticsearch): Elasticsearch improvements by @david-leifker in https://github.com/datahub-project/datahub/pull/6894
- fix(test): spark-lineage # build task as dependency of integrationTest by @mohdsiddique in https://github.com/datahub-project/datahub/pull/7189
- chore(sample): add status removed aspect for sample data by @anshbansal in https://github.com/datahub-project/datahub/pull/7203
- docs(managed datahub): release notes for v0.1.73 by @anshbansal in https://github.com/datahub-project/datahub/pull/7194
- fix(bootstrapdata): update timestamp to be in the last 1 year by @szalai1 in https://github.com/datahub-project/datahub/pull/7206
- fix(ingest/bigquery): quoting for APPROX_COUNT_DISTINCT in BigQuery by @mryorik in https://github.com/datahub-project/datahub/pull/7207
- fix(versioning): Ensure that CLI version is always dot-delimited even in minor release versions by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7200
- fix(test): missing variables in test causing error in logs by @anshbansal in https://github.com/datahub-project/datahub/pull/7210
- feat(mlModel): mark downstream jobs as ml model downstreams lineage by @mayurinehate in https://github.com/datahub-project/datahub/pull/7205
- ci(): fix datahub-upgrade quickstart regression by @hsheth2 in https://github.com/datahub-project/datahub/pull/7217
- feat(ingest): Add custom properties to the ldap ingestion by @bda618 in https://github.com/datahub-project/datahub/pull/7125
- fix(ingest): upgrade feast to avoid build issues by @hsheth2 in https://github.com/datahub-project/datahub/pull/7218
- fix(ui) Increase the number of assertions that we query for in tab by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7215
- fix(ci): trivy code scanning fix by @anshbansal in https://github.com/datahub-project/datahub/pull/7232
- feat(glue): Use table name as human-readable name for Glue ingestion by @danielcmessias in https://github.com/datahub-project/datahub/pull/7213
- feat(ui): Supporting display of columns and storage count in previews by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7198
- fix(gms): Fixes delete references for single relationship aspects by @pedro93 in https://github.com/datahub-project/datahub/pull/7211
- docs(ingest/lineage): clarify name field in entity config for file based lineage by @mayurinehate in https://github.com/datahub-project/datahub/pull/7225
- fix(ui): typo 'Documenataion' by @vojtechneradatos in https://github.com/datahub-project/datahub/pull/7227
- fix(cli/delete): skip references prompt if deleting an aspect by @hsheth2 in https://github.com/datahub-project/datahub/pull/7220
- fix(ingest/tableau): implement workbook_page_size parameter by @hsheth2 in https://github.com/datahub-project/datahub/pull/7216
- fix(gms): Corrects MCP generation in async mode by @pedro93 in https://github.com/datahub-project/datahub/pull/7214
- fix(ingest): redshift # build late binding view lineage when sql written in upper case by @looppi in https://github.com/datahub-project/datahub/pull/7223
- fix(siblings) Fix editing of schema fields for siblings with unequal schemas by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/7199
- fix(ingest-idp): emit empty GroupMembership when there are no groups by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7196
- feat(lineage): add time filtering for lineage edges by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7159
- chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /docs-website by @dependabot in https://github.com/datahub-project/datahub/pull/7230
- refactor(docs): Minor language updates for kafka source doc header by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7237
- docs(website): fix feature availability dark mode styles by @jeffmerrick in https://github.com/datahub-project/datahub/pull/7233
- chore(log/docs): improve error log, docs by @anshbansal in https://github.com/datahub-project/datahub/pull/7239
- fix(dev.sh): Add context to kafka-setup build by @szalai1 in https://github.com/datahub-project/datahub/pull/7234
- feat(cli): improve docker quickstart by @hsheth2 in https://github.com/datahub-project/datahub/pull/7184
- fix(elasticsearch): fix orphan index clean up pattern, consistent top… by @david-leifker in https://github.com/datahub-project/datahub/pull/7242
- chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /datahub-web-react by @dependabot in https://github.com/datahub-project/datahub/pull/7231
- Update data_platforms.json by @RainerGa in https://github.com/datahub-project/datahub/pull/7244
- fix(autocomplete): Use normal properties name instead of urn name in autocomplete by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7236
- fix(frontend logs): Silencing harmless log messages (and adding path for future) by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/7254
- fix(docker): fix ability to use non-default reg by @david-leifker in https://github.com/datahub-project/datahub/pull/7250
- logging(elasticsearch): improve messaging in orphan index detection by @david-leifker in https://github.com/datahub-project/datahub/pull/7246
- chore(ci): update base image dependencies by @anshbansal in https://github.com/datahub-project/datahub/pull/7248
- docs(graphql): remove reference of non-existent gms.graphql by @mayurinehate in https://github.com/datahub-project/datahub/pull/7240
- Add graphql error and call metrics at startuptime by @szalai1 in https://github.com/datahub-project/datahub/pull/7226
- docs(ingest): update kafka connect doc, simplify starter recipe by @mayurinehate in https://github.com/datahub-project/datahub/pull/7243
- fix(cli): update message when pulling docker images by @mayurinehate in https://github.com/datahub-project/datahub/pull/7241
- fix(ingest/tableau): handle missing query in tableau views by @hsheth2 in https://github.com/datahub-project/datahub/pull/7186
- feat(ingest/s3): use latest file to infer schema metadata by @mayurinehate in https://github.com/datahub-project/datahub/pull/7202
- fix(schema-blame): check if list of ChangeTransactions is empty before processing by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7263
- fix(change-events): guard against NPE's by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7264
- fix(docker): add env variable to control mysql setup image, sort dock… by @shirshanka in https://github.com/datahub-project/datahub/pull/7266
- chore(logs): clean logs scanning location by @anshbansal in https://github.com/datahub-project/datahub/pull/7261
- fix(profile): use department name if available by @anshbansal in https://github.com/datahub-project/datahub/pull/7257
- fix(async ingest): Fix async ingest path by @pedro93 in https://github.com/datahub-project/datahub/pull/7269
- fix(compose): fix override file missing container by @david-leifker in https://github.com/datahub-project/datahub/pull/7270
- fix(ui): fix spacing on share buttons by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/7272
New Contributors
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.9.6...v0.10.0