The Metadata Platform for your Data Stack
Fixes MCL message deserialization bug when using internal schema registry and running specific upgrade jobs.
policyFields (enabled by default):
BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED:true
dataJobNodeCLL (disabled by default):
BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED:false
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 13 out of bounds for length 2
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
If you are currently affected, delete the topic before upgrading to v0.13.2 to remove the corrupted message. The default topic name is MetadataChangeLog_Versioned_v1; if you have customized the topic name, be sure to delete that topic instead.
If you run Kafka per the example Helm chart for prerequisites, the following command deletes the topic.
kubectl exec -it prerequisites-kafka-broker-0 -c kafka -- kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic MetadataChangeLog_Versioned_v1
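For deployments with a customized topic name, the same command applies with the topic swapped in. The following is a minimal Python sketch that only assembles the command shown above and does not run it. The function name and its parameters are hypothetical, and the pod and container names are those of the example Helm chart, so they may differ in your deployment.

```python
import shlex

# Default topic name stated in the release notes.
DEFAULT_TOPIC = "MetadataChangeLog_Versioned_v1"


def build_delete_topic_command(
    topic=DEFAULT_TOPIC,
    pod="prerequisites-kafka-broker-0",  # pod name from the example Helm chart
    container="kafka",
    bootstrap_server="localhost:9092",
):
    """Return the kubectl argv that deletes `topic` (hypothetical helper)."""
    return [
        "kubectl", "exec", "-it", pod, "-c", container, "--",
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--delete",
        "--topic", topic,
    ]


if __name__ == "__main__":
    # Print the command for review before running it by hand.
    print(shlex.join(build_delete_topic_command()))
```

Printing the command rather than executing it keeps the deletion a deliberate manual step.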
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.1...v0.13.2
ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable to use this feature!
fspath
lint error management enhance the code reliability and quality. 9960, 9976
This release introduces default settings for stateful ingestion and updates in handling dbt ingestion. For details on all breaking changes, view the full documentation here.
MASSIVE shoutout to our contributors!
akarsh991, alexs-101, AvaniSiddhapuraAPT, diegmonti, dushayntAW, filipe-caetano-ovo, HuanjieGuo, jayacryl, k7ragav, kopax-polyconseil, LePuppy, Nelvin73, pinakipb2, poorvi767, rae89, trialiya, valeral.
ANich, shubhamjagtap639, sgomezvillamor, siladitya2, skrydal, sumitappt, Masterchen09, mayurinehate, ngamanda, gaurav2733, githendrik, jayasimhankv.
anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, pedro93, RyanHolstien, treff7es, yoonhyejin.
external_base_url for explore url generation by @k7ragav in https://github.com/datahub-project/datahub/pull/10093
convert_column_urns_to_lowercase in mapping CLL by @hsheth2 in https://github.com/datahub-project/datahub/pull/10132
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.0...v0.13.1
SQLAlchemy Source Enhancements: Support for view lineage across all SQLAlchemy sources (PR #9039).
Airflow Integration: Retry callback and support for ExternalTaskSensor subclasses added (PR #8514).
Kafka Enhancements: Increased Kafka message size and enabled compression (PR #9038).
JSONSchema Ingestion: Enabled schema-aware JsonSchemaTranslator (PR #8971).
Search Bar Improvements: Added a flag to hide/display the autocomplete query (PR #9104).
SQL Parser Performance: Enhancements and asyncio fixes (PR #9119).
MongoDB Ingestion: Support for stateful ingestion and improved schema inference for lists (PR #9118, PR #9145).
Policy Engine Updates: Refactoring and enhancements, including support for 10k+ policies (PR #9163, PR #9177).
UI Enhancements: Numerous improvements including command-k icons in the search bar, updated Apollo cache, and auto-complete debounce in the search bar (PR #9194, PR #9193, PR #9205).
Fivetran Integration: Connector integration for Fivetran (PR #9018).
Neo4j Database Support: Connection to specific Neo4j databases now supported (PR #9179).
Chart Subtypes in UI: Support for chart subtypes (PR #9186).
BigQuery Fixes: Resolved issues with lineage filter query, and fixed extracting comments from complex types (PR #9114, PR #8950).
MongoDB Refactoring: Platform instance addition to MongoDB (PR #8663).
Kafka Setup: Adjusted truststore settings for PEM files (PR #8656).
REST API Authorization: Fixed rollback failure when authorization is enabled (PR #9092).
Java Exception Handling: Addressed java.util.ConcurrentModificationException (PR #9090).
UI and Documentation: Fixed filtering logic in UI, corrected documentation errors, and added feature guides (PR #9116, PR #9125, PR #9124, PR #9126, PR #9134, PR #9137, PR #9122, PR #9068).
SQL Server and Snowflake Ingestion: Updated queries and fixed missing view downstream call (PR #9127, PR #8966).
ClickHouse and DB2 Ingestion: Addressed column reflection regression and table properties handling (PR #9143, PR #9128).
Ingestion Improvements: Numerous fixes and enhancements across various ingestion sources (PR #9153, PR #9155, PR #9141, PR #9157, PR #9123).
CI and Build Process: Tweaked workflows, increased gradle retries, and addressed CI errors (PR #9052, PR #9091, PR #9160).
Security Updates: Addressed a zookeeper CVE and other security concerns (PR #9190).
UI Refactoring: Improved entity page loading indicators and renamed button texts (PR #9195, PR #9196).
Policy and Auth Enhancements: Refactored policy locking and added roles to policy engine validation logic (PR #9178).
API Testing: Added tests for managing secrets, access token privilege, and flaky tests fix (PR #9121, PR #9167, PR #9132, PR #9175).
Cypress Test Fixes: Addressed glossary navigation and download_lineage_results tests (PR #9175, PR #9132).
Cleanup and Refactoring:
Ingestion Cleanup: Removed legacy memory_leak_detector and refactored ingestion sources (PR #9158, PR #9135, PR #9120, PR #9105).
PDL Refactoring: Refactored Assertion model enums (PR #9191).
Build and Deployment:
Release Preparation: Updated files for the 0.12.0 release (PR #9130).
JsonSchemaTranslator by @KulykDmytro in https://github.com/datahub-project/datahub/pull/8971
entity_supports_aspect helper by @hsheth2 in https://github.com/datahub-project/datahub/pull/9120
LineageGraphOnboardingConfig.tsx by @walter9388 in https://github.com/datahub-project/datahub/pull/9162
@cliMajorVersion@ correctly by @hsheth2 in https://github.com/datahub-project/datahub/pull/9228
datahub put by @hsheth2 in https://github.com/datahub-project/datahub/pull/9359
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.12.0...v0.12.1
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.12.1...v0.12.1rc2
Nested Domains are here! This provides flexibility in organizing your entities within Domains to match the unique organizational structure of your company.
The Acryl DataHub Chrome extension now supports PowerBI! This is a super powerful way for your business users to gain DataHub-specific insights directly in the BI tools they use most. Additionally, we now support making edits back to DataHub Entities directly from the Chrome extension.
Shoutout to @Ramendra761 from the PayPal Team for contributing a new Access Management tab in Dataset Entity pages! The aim of this feature is to enable users to view the required roles for accessing the Dataset, as defined by Roles and/or Policies in the organization’s Access Management System. It also introduces the ability to request access directly from the page.
We are incubating CLL support for the following:
ownershipTypeUrn referencing a custom ownership type or a (deprecated) type. Where before adding an ownership without a concrete type was allowed, this is no longer the case. For simplicity you can use the type parameter which will get translated to a custom ownership type internally if one exists for the type being added.
incremental_lineage is set to off by default.
urn:li:corpuser:datahub owner for the Measure, Dimension and Temporal tags emitted by Looker and LookML source connectors.
pip install 'acryl-datahub-airflow-plugin[plugin-v2]'. To continue using the v1 plugin, set the DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN environment variable to true.
include_metastore, which will cause all urns to be changed when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting include_metastore: false will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: datahub delete --platform databricks --soft and then reingesting with include_metastore: false.
RESOURCE_TYPE became TYPE and RESOURCE_URN became URN.
Any existing policies using these filters (i.e. defined for particular urns or types such as dataset) need to be upgraded manually, for example by retrieving their respective dataHubPolicyInfo aspect and changing the part that uses the filter, i.e.
"resources": {
  "filter": {
    "criteria": [
      {
        "field": "RESOURCE_TYPE",
        "condition": "EQUALS",
        "values": [
          "dataset"
        ]
      }
    ]
  }
}
into
"resources": {
  "filter": {
    "criteria": [
      {
        "field": "TYPE",
        "condition": "EQUALS",
        "values": [
          "dataset"
        ]
      }
    ]
  }
}
for example, using the datahub put command. Policies can also be removed and re-created via the UI.
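The field rename described above can be sketched in Python. This is a minimal illustration, not DataHub's own migration code: it assumes the dataHubPolicyInfo aspect has already been retrieved as a plain dict with the "resources" / "filter" / "criteria" layout shown in the JSON above, and the function name is hypothetical. Retrieving the aspect and writing it back (for example with datahub put) is left out.

```python
import copy

# Enum values renamed in resource filters, per the release notes.
RENAMED_FIELDS = {"RESOURCE_TYPE": "TYPE", "RESOURCE_URN": "URN"}


def upgrade_policy_info(policy_info: dict) -> dict:
    """Return a copy of a policy aspect with old filter field names replaced.

    Hypothetical helper: assumes criteria live under
    policy_info["resources"]["filter"]["criteria"].
    """
    upgraded = copy.deepcopy(policy_info)  # leave the caller's dict untouched
    resources = upgraded.get("resources") or {}
    resource_filter = resources.get("filter") or {}
    for criterion in resource_filter.get("criteria") or []:
        old_name = criterion.get("field")
        if old_name in RENAMED_FIELDS:
            criterion["field"] = RENAMED_FIELDS[old_name]
    return upgraded


if __name__ == "__main__":
    old = {
        "resources": {
            "filter": {
                "criteria": [
                    {"field": "RESOURCE_TYPE", "condition": "EQUALS", "values": ["dataset"]}
                ]
            }
        }
    }
    print(upgrade_policy_info(old))
```

Working on a deep copy makes it easy to diff the old and new aspect before writing anything back.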
match_fully_qualified_names: true. This means that any dataset_pattern or schema_pattern specified will be matched on the fully qualified dataset name, i.e. <project_name>.<dataset_name>. We attempt to support the old pattern format by prepending .*\\. to dataset patterns lacking a period, so in most cases this should not cause any issues. However, if you have a complex dataset pattern, we recommend you manually convert it to the fully qualified format to avoid any potential issues.
-base tag, full image to head by @david-leifker in https://github.com/datahub-project/datahub/pull/8919
use_compiled_code and test_warnings_are_errors by @hsheth2 in https://github.com/datahub-project/datahub/pull/8956
updates by @hsheth2 in https://github.com/datahub-project/datahub/pull/9078
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.11.0...v0.12.0
This release introduces substantial improvements to search ranking which require reindexing indices.
During the reindexing:
This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect about 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
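The rough estimate above works out to simple arithmetic. The sketch below is only an illustration of that rule of thumb: the function name is hypothetical, and treating the quoted 5 minutes as a lower bound is an assumption. Actual reindex time depends on your cluster.

```python
# Rule of thumb from the release notes: about 1 hour per 2.3 million entities.
ENTITIES_PER_HOUR = 2_300_000
# Assumption: use the quoted "5 minutes" as a floor for small deployments.
MIN_MINUTES = 5


def estimate_reindex_minutes(entity_count: int) -> float:
    """Return a rough reindex duration in minutes (hypothetical helper)."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    return max(MIN_MINUTES, entity_count / ENTITIES_PER_HOUR * 60)


if __name__ == "__main__":
    for count in (100_000, 2_300_000, 10_000_000):
        print(f"{count:>12,} entities: ~{estimate_reindex_minutes(count):.0f} min")
```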
We have some really exciting improvements to the DataHub user experience in this release! The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.
In addition to the ranking changes mentioned above, this release includes changes to the highlighting of search entities to understand why they match your query. You can also sort your results alphabetically or by last updated times, in addition to relevance. In this release, we suggest a correction if your query has a typo in it.
In this release we now enable you to create and delete pinned announcements on your DataHub homepage! If you have the “Manage Home Page Posts” platform privilege you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.
The OpenAPI entity and aspect endpoints have been expanded to improve the developer experience when using this API, with additional aspects to be added in the near future.
Added support for the Confluent S3 Sink Connector, for extracting stored procedures and jobs from MSSQL, and for Snowflake shares. Additionally, the SQL parsing source now converts query logs into CLL and usage.
The CLI now supports recursive deletes.
Starting from this release, we support versioned documentation on the datahub docs site! Select the version you’re on and browse docs specifically at that version.
entity_type_counts and aspect_counts by @hsheth2 in https://github.com/datahub-project/datahub/pull/8586
capture_executions to docs by @hsheth2 in https://github.com/datahub-project/datahub/pull/8662
needs_artifact_download output for ingestion image by @hsheth2 in https://github.com/datahub-project/datahub/pull/8695
wrap_aspect_as_workunit method by @hsheth2 in https://github.com/datahub-project/datahub/pull/8766
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.5...v0.11.0
It’s here, it’s here! We are incredibly excited to roll out our re-designed, streamlined Search and Browse experience. End-users now have a one-stop-shop to search for specific data entities and browse across systems, making it easier than ever to find the most relevant and meaningful resources within DataHub.
Check out the screenshot below and get a full walk-through in this video!
Ingestion Enhancements:
platform_instance
using project_idplatform_instance
Lineage Improvements:
platform_instance in Dataset by @dungdm93 in https://github.com/datahub-project/datahub/pull/8313
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.4...v0.10.5
You can now create and assign Custom Ownership types within DataHub; plus, we now display the owner type on an Entity Page
Various bug fixes to Column Level Lineage visualization
_entityType filter in the application layer + frontend by @gabe-lyons in https://github.com/datahub-project/datahub/pull/8102
original_table_name logic in sql source by @hsheth2 in https://github.com/datahub-project/datahub/pull/8130
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.3...v0.10.4
get_urns_by_filter
modified_since, extract_dataset_schema, and more
last_updated
get_urns_by_filter by @hsheth2 in https://github.com/datahub-project/datahub/pull/7902
DataHubGraph.get_entity_semityped method by @hsheth2 in https://github.com/datahub-project/datahub/pull/7905
env to container properties by @hsheth2 in https://github.com/datahub-project/datahub/pull/8027
datahub ingest mcps by @hsheth2 in https://github.com/datahub-project/datahub/pull/7871
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.10.2...v0.10.3