CDAP Versions

An open source framework for building data analytic applications.

v6.10.0

4 months ago

Improvements

CDAP-15361: Wrangler is now schema-aware.
CDAP-20799: CDAP supports pulling and pushing multiple pipelines as part of source control management with GitHub.
CDAP-20831: If a task is stuck, task workers are forcefully restarted.
CDAP-20868: Added capability to run concurrent tasks in task workers.
PLUGIN-1694: Added validation for incorrect credentials in the Amazon S3 source.

Changes

CDAP-20904 and CDAP-20581: In Source Control Management, the GitHub PAT was removed from the CDAP web interface for repository configurations.
CDAP-20846: Improved latency when BigQuery pushdown is enabled by fetching artifacts from a local cache.
PLUGIN-1718: The BigQuery sink supports flexible table names and column names.
PLUGIN-1692: BigQuery sinks support ingesting data to JSON data type fields.
PLUGIN-1705: In BigQuery sink jobs, you can add labels in the form of key-value pairs.
PLUGIN-1729: In BigQuery execute jobs, you can add labels in the form of key-value pairs.
PLUGIN-1293: The Cloud Storage Java Client was upgraded to version 2.3 or later.

Fixes

CDAP-20521: Fixed an issue causing columns that have all null values to be dropped in Wrangler.
CDAP-20587: Fixed an issue causing slowness in the API that fetches the runs of all applications in a namespace.
CDAP-20815: Fixed an issue where upgraded pipelines did not retain the intended description.
CDAP-20839: Made the following fixes to Wrangler grammar:

  • The NUMERIC token type supports negative numbers.
  • The PROPERTIES token type supports one or more properties.

PLUGIN-1681: Fixed an issue in the Postgres DB plugin that prevented macros from being used for database configuration.

Deprecated

The Spark compute engine running on Scala 2.11 is no longer supported.

v6.9.2

8 months ago

Improvements

CDAP-19428: CDAP supports setting custom scopes when creating a Dataproc cluster.

CDAP-20698: You can set common metadata labels for Dataproc clusters and jobs using the Common Labels property in the Ephemeral Dataproc compute profile.

You can set labels for the Dataproc jobs using the Common Labels property in the Existing Dataproc compute profile.

You can set a pipeline runtime argument with the key system.profile.properties.labels and a value representing the labels in the following format: key1|value1;key2|value2. This setting overrides the common labels set in the compute profile for pipeline runs.
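For illustration, here is a minimal sketch in Python (the label names and values are hypothetical) of building that pipe-and-semicolon string and the corresponding runtime argument:

  # Sketch: encode labels in the key1|value1;key2|value2 format expected by
  # the system.profile.properties.labels runtime argument.
  # The label names and values below are hypothetical.
  labels = {"team": "data-eng", "env": "staging"}

  value = ";".join(f"{k}|{v}" for k, v in labels.items())
  runtime_args = {"system.profile.properties.labels": value}

  print(runtime_args)
  # {'system.profile.properties.labels': 'team|data-eng;env|staging'}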

CDAP-20712: CDAP supports using Dataproc temp buckets in compute profiles.

Fixes

PLUGIN-1660: Added retries for Pub/Sub snapshot creation and deletion in real-time pipelines with a Pub/Sub source when a retryable internal error is thrown.

CDAP-20674: Fixed a bug causing the Dynamic Spark plugins to fail when running on Dataproc 1.5.

CDAP-20680: Fixed a discrepancy in warning and error counts reported between the pipeline summary tab and system logs.

CDAP-20759: Fixed a problem when, in rare cases, a cluster couldn't be found with Cluster Reuse.

CDAP-20778: Fixed a bug causing the JavaScript transform to fail on Dataproc 2.1.

v6.9.1

11 months ago

Improvements

CDAP-20436: Added the ability to aggregate pipeline metrics in the RuntimeClientService by setting app.program.runtime.monitor.metrics.aggregation.enabled to true in cdap-site.xml. This slightly increases the resource usage of the RuntimeClientService but decreases the load on the CDAP metrics service. The benefit to metrics service scalability grows with the number of Spark executors per pipeline.
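As a rough sketch of what that setting looks like, the snippet below appends the property to cdap-site.xml, assuming the standard Hadoop-style <configuration>/<property> layout; the file path is an assumption:

  # Sketch only: add the aggregation flag to cdap-site.xml.
  # Assumes the usual <configuration><property><name/><value/></property>
  # layout and that the file is in the current directory.
  import xml.etree.ElementTree as ET

  tree = ET.parse("cdap-site.xml")
  root = tree.getroot()  # the <configuration> element

  prop = ET.SubElement(root, "property")
  ET.SubElement(prop, "name").text = "app.program.runtime.monitor.metrics.aggregation.enabled"
  ET.SubElement(prop, "value").text = "true"

  tree.write("cdap-site.xml")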

CDAP-20228: CDAP supports source control management with GitHub. Cloud Data Fusion supports using Source Control Management to manage pipeline versions through GitHub repositories. Source Control Management is available in Preview.

CDAP-20543: CDAP version 6.9.1 supports the Dataproc image 2.1 compute engine, which runs on Java 11. If you change the Dataproc image to 2.1, the JDBC drivers that the database plugins use in those instances must be compatible with Java 11.

CDAP-20455: Streaming pipelines that use Spark checkpointing can use macros if the cdap.streaming.allow.source.macros runtime argument is set to true. Note that macro evaluation is only performed for the first run in this case; the result is then stored in the checkpoint and isn't reevaluated in later runs.

CDAP-20466: Added Lifecycle microservices endpoint to delete a streaming application state for Kafka Consumer Streaming and Google Cloud Pub/Sub Streaming sources.

CDAP-20488: Improved performance of replication pipelines by caching schema objects for data events.

CDAP-20500: Added a launch mode setting to the Dataproc provisioners. When set to Client mode, the program launcher runs in the Dataproc job itself, instead of as a separate YARN application. This reduces the start-up time and cluster resources required, but may cause failures if the launcher needs more memory, such as if there is an action plugin that loads data into memory.

CDAP-20504: Removed duplicate backend calls when a program reads from the secure store.

CDAP-20567: Added support to upgrade Pipeline Post-run Action (Pipeline Alerts) plugins during the pipeline upgrade process.

PLUGIN-1537: CDAP supports the following improvements and changes for real-time pipelines with a single Pub/Sub streaming source and no Windower plugins:

  • The Pub/Sub streaming source has built-in support for at-least-once processing, so enabling Spark checkpointing isn't required.
  • The Pub/Sub streaming source creates a Pub/Sub snapshot at the beginning of each batch and removes it at the end of each batch. Pub/Sub snapshot creation has an associated cost. For more information, see Pub/Sub pricing. Snapshot creation can be monitored using Cloud Audit Logs.

Fixes

CDAP-18394: Fixed an issue where the namespace creation flow checked GET permission on a namespace that didn't exist yet.

CDAP-20216: Fixed an issue where Dataproc continued running a job when it couldn't communicate with the CDAP instance, if the replication job or pipeline was deleted in CDAP.

CDAP-20568: Fixed an issue that caused pipelines with triggers with runtime arguments to fail after the instance was upgraded to CDAP 6.8+ and 6.9.0.

CDAP-20597: Fixed an issue where arguments set by actions didn't overwrite runtime arguments. To apply the fix, add the following runtime argument: system.skip.normal.macro.evaluation=true.

PLUGIN-1594: Fixed an issue where initial offset was not considered in the Kafka batch source.

CDAP-20655: Fixed an issue that caused the Pipeline Studio page to show an incorrect count of triggers.

CDAP-20660: Fixed an issue that caused the Trigger's Payload Config to be missing in the UI for an upgraded instance.

Deprecated

CDAP-20667: All datasets except FileSet and ExternalDataset are deprecated and will be removed in a future release. All the deprecated datasets use the Table dataset in some form, which only works for programs running with the native provisioner on very old Hadoop releases.

v6.8.3

11 months ago

Feature

CDAP-20381: Added the ability to configure java options for a pipeline run by setting the system.program.jvm.opts runtime argument.
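As a hedged sketch of passing such a runtime argument, the snippet below starts a batch pipeline run through the CDAP Lifecycle microservices; the instance URL, namespace, pipeline name, and JVM option values are hypothetical, and the workflow name assumes the DataPipelineWorkflow used by batch pipelines:

  # Sketch: start a pipeline run with custom Java options.
  # URL, namespace, app name, and the option values are hypothetical.
  import requests

  cdap = "http://localhost:11015"  # assumption: CDAP router on its default port
  url = f"{cdap}/v3/namespaces/default/apps/myPipeline/workflows/DataPipelineWorkflow/start"

  resp = requests.post(url, json={"system.program.jvm.opts": "-XX:+UseG1GC"})
  resp.raise_for_status()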

Improvement

CDAP-20567: CDAP supports upgrades in the Pipeline Post-run Action (Pipeline Alerts) plugins during the pipeline upgrade process.

Fixes

CDAP-20549: Fixed an issue where executor resource settings are not honored when app.pipeline.overwriteConfig is set.

CDAP-20568: Fixed an issue that caused pipelines with triggers with runtime arguments to fail after the instance was upgraded to CDAP 6.8+ and 6.9.0.

CDAP-20597: Fixed an issue where arguments set by actions didn't overwrite runtime arguments. To apply the fix, add the following runtime argument: system.skip.normal.macro.evaluation=true.

CDAP-20643: Fixed security vulnerabilities by ensuring that software updates are applied regularly to the CDAP operator images.

CDAP-20655: Fixed an issue that caused the Pipeline Studio page to show an incorrect count of triggers.

CDAP-20660: Fixed an issue that caused the Trigger's Payload Config to be missing in the UI for an upgraded instance.

PLUGIN-1582: Fixed an issue in the BigQuery Sink where the absence of an ordering key caused an exception.

PLUGIN-1594: Fixed an issue where initial offset was not considered in the Kafka Batch Source.

v6.8.2

1 year ago

Bug Fixes

CDAP-20431: Fixed an issue that sometimes caused pipelines to fail when running pipelines on Dataproc with the following error: Unsupported program type: Spark. The first time a pipeline that only contained actions ran on a newly created or upgraded instance, it succeeded. However, the next pipeline runs, which included sources or sinks, might have failed with this error.

v6.9.0

1 year ago

Features

CDAP-20454: In the Wrangler transformation, added support for specifying preconditions and filters in SQL, and added support for pushing down SQL preconditions and filters to BigQuery as part of Transformation Pushdown.

CDAP-20288: Added support for Dataproc driver node groups. To use Dataproc driver node groups, when you create the Dataproc cluster, configure the following properties:

  • yarn:yarn.nodemanager.resource.memory.enforced=false
  • yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME

Note: The single quotation marks around the property are important when using the gcloud CLI to create the cluster ('yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME'), so that the shell doesn't try to resolve the $ locally before submitting.
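One way to sidestep the quoting pitfall entirely is to invoke gcloud without a shell, as in this Python sketch (cluster name and region are hypothetical); because the arguments are passed as a list, no shell ever sees the property string and $SPARK_HOME reaches Dataproc literally:

  # Sketch: create the cluster via subprocess with an argument list, so
  # $SPARK_HOME is passed through literally instead of being expanded by a shell.
  # Cluster name and region are hypothetical.
  import subprocess

  subprocess.run(
      [
          "gcloud", "dataproc", "clusters", "create", "my-cluster",
          "--region", "us-central1",
          "--properties",
          "yarn:yarn.nodemanager.resource.memory.enforced=false,"
          "yarn:yarn.nodemanager.admin-env.SPARK_HOME=$SPARK_HOME",
      ],
      check=True,
  )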

CDAP-19628: Added support for Window Aggregation operations in Transformation Pushdown to reduce the pipeline execution time by performing SQL operations in BigQuery instead of Spark.

CDAP-19425: Added support for editing deployed pipelines.

CDAP-20228: Added support for pipeline version control with GitHub.

Improvements

CDAP-20381: Added the ability to configure Java options for a pipeline run by setting the system.program.jvm.opts runtime argument.

CDAP-20140: Replication pipelines generate logs for stats of events processed by source and target plugins at a fixed interval.

Changes

CDAP-20430: Fixed the pipeline stage validation API to return unevaluated macro values to prevent secure macros from being returned.

CDAP-20373: When you duplicate a pipeline, CDAP appends _copy to the pipeline name when it opens in the Pipeline Studio. In previous releases, CDAP appended _<v1, v2, v3> to the name.

Bug Fixes

CDAP-20458: Fixed an issue where the flow control running count metric (system.flowcontrol.running.count) might be stale if no new pipelines or replication jobs were started.

CDAP-20431: Fixed an issue that sometimes caused pipelines to fail when running pipelines on Dataproc with the following error: Unsupported program type: Spark. The first time a pipeline that only contained actions ran on a newly created or upgraded instance, it succeeded. However, the next pipeline runs, which included sources or sinks, might have failed with this error.

CDAP-20301: Fixed an issue where a replication job got stuck in an infinite retry when it failed to process a DDL operation.

CDAP-20276: For replication jobs, fixed an issue where retries for transient errors from BigQuery might have resulted in data inconsistency.

CDAP-19389: For SQL Server replication sources, fixed an issue on the Review assessment page, where SQL Server DATETIME and DATETIME2 columns were shown as mapped to TIMESTAMP columns in BigQuery. This was a UI bug. The replication job mapped the data types to the BigQuery DATETIME type.

PLUGIN-1516: Updated the Window Aggregation Analytics plugin to support Spark 3 and remove the dependency on Scala 2.11.

PLUGIN-1514: For the Database sink, fixed an issue where the pipeline didn’t fail if there was an error writing data to the database. Now, if there is an error writing data to the database, the pipeline fails and no data is written to the database.

PLUGIN-1513: For BigQuery Pushdown, fixed an issue where, when BigQuery Pushdown was enabled for an existing dataset, the Location where the BigQuery Sink executed jobs was the location specified in the Pushdown configuration, not the BigQuery dataset's location. The configured Location should have only been used when creating resources. Now, if the dataset already exists, the Location of the existing dataset is used.

PLUGIN-1512: Fixed an issue where pipelines failed when the output schema was overridden in certain source plugins. This was because the output schema didn’t match the order of the fields from the query. This happened when the pipeline included any of the following batch sources:

  • Database
  • Oracle
  • MySQL
  • SQL Server
  • PostgreSQL
  • DB2
  • MariaDB
  • Netezza
  • CloudSQL PostgreSQL
  • CloudSQL MySQL
  • Teradata

Pipelines no longer fail when you override the output schema in these source plugins. CDAP uses the name of the field to match the schema of the field in the result set and the field in the output schema.

PLUGIN-1503: Fixed an issue where pipelines that had a Database batch source and an Oracle sink that used a connection object (using SYSDBA) to connect to an Oracle database failed to establish a connection to the Oracle database. This was due to a package conflict between the Database batch source and the Oracle sink plugins.

PLUGIN-1494: For Oracle batch sources, fixed an issue that caused the pipeline to fail when there was a TIMESTAMP WITH LOCAL TIME ZONE column set to NULLABLE and the source had values that were NULL.

PLUGIN-1481: In the Oracle batch source, the Oracle NUMBER data type defined without precision and scale was, by default, mapped to the CDAP string data type. If these fields were used by an Oracle Sink to insert into a NUMBER data type field in the Oracle table, the pipeline failed due to the incompatibility between the string and NUMBER types. Now, the Oracle Sink inserts these string types into NUMBER fields in the Oracle table.

v6.7.3

1 year ago

Bug Fixes

CDAP-19599: Fixed an issue in the BigQuery Replication Target plugin that caused replication jobs to fail when the BigQuery target table already existed. The new version of the plugin will automatically be used in new replication jobs.

CDAP-19622: Fixed upgrade for MySQL and SQL Server replication jobs. You can now upgrade MySQL and SQL Server replication jobs from CDAP 6.7.1 and 6.7.2 to CDAP 6.7.3.

CDAP-20013: Fixed upgrade for Oracle by Datastream replication jobs. You can now upgrade Oracle by Datastream replication jobs from CDAP 6.6.0 and 6.7.x to CDAP 6.7.3 or higher.

CDAP-20235: For Database plugins, fixed a security issue where the database username and password were exposed in App Fabric logs.

CDAP-20271: Fixed an issue that caused pipelines to fail when they used a connection that included a secure macro and the secure macro had JSON as the value (for example, the Service Account property).

CDAP-20392: Fixed an issue that occurred in certain upgrade scenarios, where, for pipelines that didn't have the Use Connection property set, the plugin connection properties (such as Project ID and Service account information) were not displayed in the plugin UI.

CDAP-20394: Fixed an issue where the replication source plugin's event reader was not stopped by the Delta worker in case of errors, leading to leakage of the plugin's resources.

CDAP-20146: Fixed an issue in security-enabled instances that caused pipeline launches to fail and return a token expired error when evaluating secure macros in provisioner properties.

PLUGIN-1433: In the Oracle Batch Source, when the source data included fields with the Numeric data type (undefined precision and scale), CDAP set the precision to 38 and the scale to 0. If any values in the field had scale other than 0, CDAP truncated these values, which could have resulted in data loss. If the scale for a field was overridden in the plugin output schema, the pipeline failed.

Now, if an Oracle source has Numeric data type fields with undefined precision and scale, you must manually set the scale for these fields in the plugin output schema. When you run the pipeline, the pipeline will not fail and the new scale will be used for the field instead. However, there might be truncation if there are any Numbers present in the fields with the scale greater than the scale defined in the plugin. CDAP writes warning messages in the pipeline log indicating the presence of Numbers with undefined precision and scale in the pipeline. For more information about setting precision and scale in a plugin, see Changing the precision and scale for decimal fields in the output schema.
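Purely as an illustration of the truncation behavior described above (the values and scale are hypothetical), applying a fixed output-schema scale to values whose actual scale is larger drops digits:

  # Illustration only: a fixed scale truncates values with a larger actual scale.
  # The values and the scale are hypothetical.
  from decimal import Decimal, ROUND_DOWN

  defined_scale = 2  # scale set manually in the plugin output schema
  values = [Decimal("12.3456"), Decimal("7.1")]

  for v in values:
      truncated = v.quantize(Decimal(1).scaleb(-defined_scale), rounding=ROUND_DOWN)
      if truncated != v:
          print(f"warning: {v} truncated to {truncated}")  # analogous to the pipeline log warning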

PLUGIN-1374: Improved performance for batch pipelines with MySQL sinks.

v6.8.1

1 year ago

Features

CDAP-19729: Added support to upgrade real-time pipelines created in CDAP 6.8.0 with a Kafka Consumer Streaming source to CDAP 6.8.1. After the CDAP platform is upgraded to 6.8.1, you can use the Lifecycle microservices to upgrade these pipelines.

Changes

CDAP-20110: When running CDAP on Kubernetes, Spark program types are now run as Kubernetes jobs instead of deployments.

CDAP-20201: CDAP now sets Spark Kubernetes connect/read timeouts based on the CDAP Kubernetes timeout settings. Previously, CDAP did not set Spark Kubernetes connection/read timeouts. Spark used its default timeout setting.

Bug Fixes

CDAP-20394: Fixed an issue where the replication source plugin's event reader was not stopped by the Delta worker in case of errors, leading to leakage of the plugin's resources.

CDAP-20392: Fixed an issue that occurred in certain upgrade scenarios, where, for pipelines that didn't have the Use Connection property set, the plugin connection properties (such as Project ID and Service account information) were not displayed in the plugin UI.

CDAP-20271: Fixed an issue that caused pipelines to fail when they used a connection that included a secure macro that had JSON as the value (for example, the Service Account property).

CDAP-20257: For Oracle Datastream replication sources, fixed an issue where the Review Assessment page would freeze for a long time when the selected or manually entered table did not exist in the source database.

CDAP-20199: For Oracle Datastream replication sources, fixed an issue where the Select tables and transformations page timed out, failed to load the list of tables, and displayed the error deadline exceeded when the source database contained a large number of tables.

CDAP-20146: Fixed an error in security-enabled instances that caused pipeline launches to fail and return a token expired error when evaluating secure macros in provisioner properties.

CDAP-20121: For MySQL Replication sources, fixed an issue that caused replication jobs to fail during initial snapshotting when the job included a runtime argument with the Debezium property binary-handling-mode.

CDAP-20028: For Replication jobs, increased retry duration for API calls to update state/offsets in Replication jobs.

CDAP-20013: Fixed upgrade for Oracle by Datastream replication jobs. You can now upgrade Oracle by Datastream replication jobs from CDAP 6.6.0 and 6.7.x to CDAP 6.8.1.

CDAP-19622: Fixed upgrade for MySQL and SQL Server replication jobs. You can now upgrade MySQL and SQL Server replication jobs from CDAP 6.7.x to CDAP 6.8.1.

v6.8.0

1 year ago

New Features

The Dataplex Batch Source and Dataplex Sink plugins are generally available (GA).

CDAP-19592: For Oracle (by Datastream) replication sources, added a purge policy for a GCS (Google Cloud Storage) bucket created by the plugin that Datastream will write its output to.

CDAP-19584: Added support for monitoring CDAP pipelines using an external tool.

CDAP-18450: Added support for AND triggers. Now, you can create OR and AND triggers. Previously, all triggers were OR triggers.

PLUGIN-871: Added support for BigQuery batch source pushdown.

Enhancements

CDAP-19678: Added the ability to specify Kubernetes affinity for CDAP services in the CDAP custom resource.

CDAP-19605: Logs from the Twill application master now appear in pipeline logs.

CDAP-19591: In the Datastream replication source, added the property GCS Bucket Location, which Datastream will write its output to.

CDAP-19590: In the Datastream replication source, added the list of Datastream regions to the Region property. You no longer need to manually enter the Datastream region.

CDAP-19589: For replication jobs with an Oracle (by Datastream) source, ensured data consistency when multiple CDC events are generated at the same timestamp, by ordering events reliably.

CDAP-19568: Significantly improved the time it takes to start a pipeline (after provisioning).

CDAP-19555, CDAP-19554: Made the following improvements and changes for streaming pipelines with a single Kafka Consumer Streaming source and no Windower plugins:

  • The Kafka Consumer Streaming source has native support, so data is guaranteed to be processed at least once.

CDAP-19501: For Replication jobs, improved performance for Review Assessment.

CDAP-19475: Modified /app endpoints (GET and POST) in AppLifecycleHttpHandler to include the following information in the response:

  "change": {
      "author": "joe",
      "creationTimeMillis": 1668540944833,
      "latest": true
}

The new information is included in the responses of the affected endpoints.

CDAP-19365: Changed the Datastream replication source to identify each row by the Primary key of the table. Previously, the plugin identified each row by the ROWID.

CDAP-19328: Splitter Transformation based plugins now have access to the prepareRun and onRunFinish methods.

CDAP-18430: The Lineage page has a new look-and-feel.

Bug Fixes

CDAP-20002: Removed the CDAP Tour from the Welcome page.

CDAP-19939: Fixed an issue in the BigQuery target replication plugin that caused replication jobs to fail when replicating datetime columns from sources with precision greater than microseconds, for example, the datetime2 data type in SQL Server.

CDAP-19970: Google Cloud Data Loss Prevention plugins (version 1.4.0) are available in the CDAP Hub version 6.8.0 with the following changes:

  • For the Google Cloud Data Loss Prevention (DLP) PII Filter Transformation, fixed an issue where pipelines failed because the DLP client was not initialized.
  • For all of the Google Cloud Data Loss Prevention (DLP) transformations, added relevant exception details when validation of DLP Inspection template fails, rather than throwing a generic IllegalArgumentException.

CDAP-19630: For custom Dataproc compute profiles, fixed an issue where the wrong GCS bucket was used to stage data. Now, CDAP uses the GCS bucket specified in the custom compute profile.

CDAP-19599: Fixed an issue in the BigQuery Replication Target plugin that caused replication jobs to fail when the BigQuery target table already existed. The new version of the plugin will automatically be used in new replication jobs. Due to CDAP-19622, if you want to use the new plugin version in existing jobs, recreate each replication job.

CDAP-19486: In the Wrangler transformation, fixed an issue where the pipeline didn’t fail when the Error Handling property was set to Fail Pipeline. This happened when an error was returned, but no exception was thrown and there were 0 records in the output. For example, this happened when one of the directives (such as parse-as-simple-date) failed because the input data was not in the correct format. This fix is under a feature flag and not available by default. If this feature flag is enabled, existing pipelines might fail if there are data issues, because the default error handling property is set to Fail Pipeline.

CDAP-19481: Fixed an issue that caused Replication Assessment to hang when the Oracle (by Datastream) GCS Bucket property was empty or had an invalid bucket name. Now, CDAP returns a 400 error code during assessment when the property is empty or has an invalid bucket name.

CDAP-19455: Added user error tags to Dataproc errors returned during cluster creation and job submission. Added the ability to set a troubleshooting docs URL in cdap-site.xml for Dataproc API errors.

CDAP-19442: Fixed an issue that caused Replication jobs to fail when the source column name didn’t comply with BigQuery naming conventions. Now, if a source column name doesn’t comply with BigQuery naming conventions, CDAP replaces invalid characters with an underscore, prepends an underscore if the first character is a number, and truncates the name if it exceeds the maximum length.
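A rough sketch of that normalization rule follows; the 300-character cap is an assumption based on BigQuery's documented column-name limit, and the exact characters CDAP considers invalid may differ:

  # Sketch of the normalization described above: replace invalid characters
  # with underscores, prepend an underscore if the name starts with a digit,
  # and truncate to a maximum length (300 is an assumption).
  import re

  MAX_LEN = 300

  def normalize_column_name(name: str) -> str:
      name = re.sub(r"[^a-zA-Z0-9_]", "_", name)
      if name and name[0].isdigit():
          name = "_" + name
      return name[:MAX_LEN]

  print(normalize_column_name("2nd-quarter sales"))  # _2nd_quarter_sales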

CDAP-19266: In the File batch source, fixed an issue where Get Schema appeared only when Format was set to delimited. Now, Get Schema appears for all formats.

CDAP-18846: Fixed an issue with the output schema when connecting a Splitter transformation with a Joiner transformation.

CDAP-18302: Fixed an issue where Compute Profile creation failed without showing an error message in the CDAP UI. Now, CDAP shows an error message when a Compute Profile is missing required properties.

CDAP-17619: Fixed an issue that caused imports in the CDAP UI to fail for pipelines exported through the Pipeline Microservices.

CDAP-13130: Fixed an issue where you couldn’t keep an earlier version of a plugin when you exported a pipeline and then imported it into the same version of CDAP, even though the earlier version of the plugin is deployed in CDAP. Now, if you export a pipeline with an earlier version of a plugin, when you import the pipeline, you can choose to keep the earlier version or upgrade it to the current version. For example, if you export a pipeline with a BigQuery source (version 0.20.0) and then import it into the same CDAP instance, you can choose to keep version 0.20.0 or upgrade to version 0.21.0.

PLUGIN-1433: In the Oracle Batch Source, when the source data included fields with the Numeric data type (undefined precision and scale), CDAP set the precision to 38 and the scale to 0. If any values in the field had scale other than 0, CDAP truncated these values, which could have resulted in data loss. If the scale for a field was overridden in the plugin output schema, the pipeline failed.

Now, if an Oracle source has Numeric data type fields with undefined precision and scale, you must manually set the scale for these fields in the plugin output schema. When you run the pipeline, the pipeline will not fail and the new scale will be used for the field instead. However, there might be truncation if there are any Numbers present in the fields with the scale greater than the scale defined in the plugin. CDAP writes warning messages in the pipeline log indicating the presence of Numbers with undefined precision and scale in the pipeline. For more information about setting precision and scale in a plugin, see Changing the precision and scale for decimal fields in the output schema.

PLUGIN-1325: In Wrangler, fixed an issue that caused the Wrangler UI to hang when a BigQuery table name contained characters besides alphanumeric characters and underscores (such as a dash). Now, Wrangler successfully imports BigQuery tables that comply with BigQuery naming conventions.

PLUGIN-826: In the HTTP batch source plugin, fixed an issue where validation failed when the URL property contained a macro and Pagination Type was set to Increment an index.

PLUGIN-1378: In the Dataplex Sink plugin, added a new property, Update Dataplex Metadata, which adds support for updating metadata in Dataplex for newly generated data.

PLUGIN-1374: Improved performance for batch pipelines with MySQL sinks.

PLUGIN-1333: Improved Kafka Producer Sink performance.

PLUGIN-664: In the Google Cloud Storage Delete Action plugin, added support for bulk deletion of files and folders. You can now use the asterisk (*) wildcard character to match any characters.

PLUGIN-641: In Wrangler, added the Average arithmetic function, which calculates the average of the selected columns.

In Wrangler, numeric functions now support three or more columns.

Security Fixes

The following vulnerabilities were found in open source libraries:

  • Arbitrary Code Execution
  • Deserialization of Untrusted Data
  • SQL Injection
  • Information Exposure
  • Hash Collision
  • Remote Code Execution (RCE)

To address these vulnerabilities, the following libraries have security fixes:

  • commons-collections:commons-collections (Deserialization of Untrusted Data): upgraded to apply security fixes.
  • commons-fileupload:commons-fileupload (Arbitrary Code Execution): upgraded to apply security fixes.
  • ch.qos.logback:logback-core (Arbitrary Code Execution): upgraded to apply security fixes.
  • org.apache.hive:hive-jdbc (SQL Injection): excluded the org.apache.hive:hive-jdbc dependency.
  • org.bouncycastle:bcprov-jdk16 (Hash Collision)
  • com.fasterxml.jackson.core:jackson-databind (Deserialization of Untrusted Data): upgraded to apply security fixes.

Deprecations

CDAP-19559: For streaming pipelines, the Pipeline configuration properties Checkpointing and Checkpoint directory are deprecated. Setting these properties will no longer have any effect.

CDAP automatically decides whether checkpointing or CDAP internal state tracking is enabled. To disable at-least-once processing in streaming pipelines, set the runtime argument cdap.streaming.atleastonce.enabled to false; this disables both Spark checkpointing and state tracking.

v6.7.2

1 year ago

Enhancements

CDAP-19601: For new Dataproc compute profiles, changed the default value of Master Machine Type and Worker Machine Type from n2 to e2.

Bug Fixes

CDAP-19532: Fixed an issue in the Database Batch Source plugin that caused pipelines to fail at runtime when JDBC returned a source column with a precision of 0. Now, if a column has a precision of 0, the pipeline no longer fails. This affected CDAP 6.7.1 only. Note: In the Database Batch Source, if a column has a precision of 0, you must change the data type to Double in the Output Schema to ensure the pipeline runs successfully.

PLUGIN-1373: In the BigQuery Sink plugin (version 0.20.3), fixed an issue that sometimes caused a NullPointerException error when trying to update table metrics.

PLUGIN-1367: In the BigQuery Sink plugin (version 0.20.3), fixed an issue that caused a NullPointerException error when the output schema was not defined.

PLUGIN-1361: In the Send Email batch pipeline alert, fixed an issue where emails failed to send when the Protocol was set to TLS.