Data Prepper Versions

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

2.8.0

3 weeks ago

2024-05-16 Version 2.8.0


Features

Enhancements

  • HTTP data chunking support for kafka buffer (#4475)
  • ENH: automatic credential refresh in kafka source (#4258)
  • Add creation and aggregation of dynamic S3 groups based on events (#4346)
  • Truncate Processor: Add support to truncate all fields in an event (#4317)
  • Provide validations of AWS accountIds (#4398)
  • Better metrics on OpenSearch document errors (#4344)
  • Better metrics for OpenSearch duplicate documents (#4343)
  • Address route and subpipeline for pipeline transformation (#4528)
  • Add support for BigDecimal in ConvertType processor (#4316)
  • Checkpoint records at an interval for TPS case when AckSet is enabled (#4526)
  • Write stream events that timeout to write to internal buffer in separate thread (#4524)
  • Key value processor enhancements (#4521)
  • Add bucket owner support to s3 sink (#4504)
  • Initial work to support core data types in Data Prepper (#4496)
  • Changing logging level for config transformation and fixing rule (#4466)
  • Add folder-based partitioning for s3 scan source (#4455)
  • Pipeline Configuration Transformation (#4446)
  • Added support for multiple workers in S3 Scan Source (#4439)
  • Bootstrap the RuleEngine package (#4442)
  • Make s3 partition size configurable and add unit test for S3 partition creator classes (#4437)
  • Remove creating S3 prefix path partition upfront (#4432)
  • Change s3 sink client to async client (#4425)
  • Create new codec for each s3 group in s3 sink (#4410)
  • Validate the AWS account Id in the S3 source using a new annotation (#4400)
  • Add server connections metric to http and otel sources (#4393)
  • Log the User-Agent when Data Prepper shuts down from POST /shutdown (#4390)
  • Add aggregate_threshold with maximum_size to s3 sink (#4385)
  • Refactor PipelinesDataFlowModelParser to take in an InputStream instead of a file path (#4289)
  • Add support to use old ddb stream image for REMOVE events (#4275)
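The account ID validation mentioned above (#4398, #4400) can be illustrated with a minimal sketch. It assumes only that an AWS account ID is a string of exactly 12 digits. The function name is hypothetical, and this is not Data Prepper's implementation, which is written in Java and uses an annotation.

```python
import re

# Assumption: an AWS account ID is exactly 12 ASCII digits.
_ACCOUNT_ID = re.compile(r"[0-9]{12}")


def is_valid_aws_account_id(value: str) -> bool:
    """Return True if value looks like a 12-digit AWS account ID (hypothetical helper)."""
    return _ACCOUNT_ID.fullmatch(value) is not None
```

Rejecting malformed IDs at configuration time surfaces a typo before any request is made to AWS.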

Bug Fixes

  • Fix count aggregation exemplar data (#4341)
  • Revert HTTP data chunking changes for kafka buffer done in PR 4266 (#4329)
  • Fix Router performance issue (#4327)
  • Do not require field_split_characters to not be empty for key_value processor (#4358)
  • Do not write empty lists of DlqObject to the DLQ (#4403)
  • Fix transient test failure for subpipelines (#4479)
  • Fix JacksonEvent to propagate ExternalOriginalTime if it's set at the time of construction (#4489)
  • FIX: null certificate value should be valid in opensearch connection (#4494)
  • [BUG] Incorrect behavior of Obfuscate Processor with predefined pattern "%{CREDIT_CARD_NUMBER}" (#4340)
  • [BUG] Empty DLQ entries when version conflicts occur (#4301)
  • [BUG] otel sources should show a more clear exception when receiving data that cannot be processed based on the configured compression type (#4022)
  • [BUG] Unable to set field_delimiter_regex (#2946)
  • Fix aggregate processor local mode (#4529)
  • Add long as a target type for convert_entry_type processor (#4120)
  • Fix write json basic test (#4527)
  • Fix depth field in template (#4509)
  • Fix for S3PartitionCreatorScheduler ConcurrentModification Exception (#4473)
  • Fix acknowledgements in DynamoDB (#4419)
  • Fix DocumentDB source S3PathPrefix null or empty (#4472)
  • Fix an issue that exception messages are masked (#4416)
  • Fix bug where using upsert or update without routing parameter caused… (#4397)
  • Fix bug in s3 sink dynamic bucket and catch invalid bucket message (#4413)
  • Fix flaky PipelineConfigurationFileReaderTest (#4386)
  • Aggregate Processor: local mode should work when there is no when condition (#4380)

Security

  • CVE-2024-22201 on http2-common 9.4.51 version - autoclosed (#4452)
  • CVE-2023-22102 (High) detected in mysql-connector-j-8.0.33.jar - autoclosed (#3920)

Maintenance

  • Gradle 8.7 (#4417)
  • Adds a Gradle convention plugin for Maven publication (#4421)
  • MAINT: allow latest schema version if not specified in confluent schema (#4453)
  • Publish expression and logstash-configuration to Maven (#4474)
  • Create unit test report as html (#4384)
  • Update Stream Ack Manager unit test and code refactor (#4383)
  • Grpc exception handler: Modified to return BADREQUEST for some internal errors (#4387)
  • Remove unexpected event handle message (#4388)
  • Bump parquet version to 1.14.0. (#4520)
  • Clear system property to disable s3 scan when stream worker exits, set s3 sink threshold to 15 seconds for docdb streams (#4522)
  • ExportPartitionWorkerTest testProcessPartitionSuccess(String) failure (#4298)
  • MAINT: inject external origination timestamp (#4507)
  • Updates Armeria to 1.28.2 (#4440)
  • MAINT: use authentication block in opensearch sink (#4438)
  • MAINT: use authentication for basic credentials in opensearch source (#4435)
  • MAINT: deprecate certificate_content with certificate_key (#4434)
  • MAINT: deprecate plaintext with plain under sasl in kafka (#4433)
  • MAINT: deprecate pipeline_configurations with extension (#4428)
  • Maint/renaming kafka source plugin setting (#4429)

2.7.0

2 months ago

2024-03-27 Version 2.7.0


Features

  • Add a GeoIP processor. (#253, #3941, #3942)
  • Flatten json processor (#4128)
  • Add select_entries processor (#4147)
  • Decompress processor (#4016)
  • Support parsing of XML fields in Events (#4165, #4024)
  • Processor for parsing Amazon Ion documents (#3730)
  • Append values to lists in an event (#4129)
  • MapToList processor (#3935)
  • Date processor to convert from epoch_second, epoch_milli, or epoch_nano (#2929, #4076)
  • Support reading of old image for delete events on DynamoDB source (#4261)
  • Add string truncate processor to the family of mutate string processor (#3925)
  • Add join function (#4075)
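As a rough illustration of the epoch conversions named above (#2929, #4076), the sketch below turns an integer epoch value into a UTC timestamp. The unit names are taken from the release note. The function name and the truncation of nanoseconds to microseconds are assumptions made for this example, not a description of how the date processor works.

```python
from datetime import datetime, timezone

# Number of units per second for each epoch format named in the release note.
_UNITS_PER_SECOND = {
    "epoch_second": 1,
    "epoch_milli": 1_000,
    "epoch_nano": 1_000_000_000,
}


def epoch_to_utc(value: int, unit: str) -> datetime:
    """Convert an integer epoch value to a timezone-aware UTC datetime (hypothetical helper)."""
    per_second = _UNITS_PER_SECOND[unit]
    # Split into whole seconds and the sub-second remainder using integer math.
    seconds, remainder = divmod(value, per_second)
    # datetime stores microseconds, so finer precision is truncated here.
    micros = remainder * 1_000_000 // per_second
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)
```

Integer arithmetic is used instead of float division so that large nanosecond values do not lose precision before the conversion.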

Enhancements

  • Support format expressions for routing in the opensearch sink (#3833)
  • Allow . and @ characters to be part of json pointer in expressions (#4130)
  • Support maximum request length configurations in the HTTP and OTel sources (#3931)
  • Provide a config option to do node local aggregation (#4306)
  • Allow peer forwarder to skip sending events to remote peer (#3996)
  • Include encrypted data key in Kafka buffer message. (#3655)
  • Support larger message sizes in Kafka Buffer (#3916)
  • Modify S3 Source to allow multiple SQS workers (#4239)
  • Add support for tracking performance of individual Events in the grok processor (#4196)
  • Support codec on the file source to help with testing (#4018)
  • Provide a delay processor to put a delay in the processor for debugging and testing (#3938)
  • Support ByteCount in plugin parser (#3191)
  • Add Buffer Latency Metric (#4237)
  • Adds an append mode to the file sink (#3687)

Bug Fixes

  • Attempting to evaluate if a key is null throws an Exception if the value is a List<String> for conditional expressions (#4109)
  • Data Prepper process threads stop when processors throw exceptions (#4103)
  • Upsert action requires existing document in OpenSearch (#4036)
  • Many Grok failures do not tag events (#4031)
  • Using update, upsert, or delete actions without specifying document_id crashes the pipeline with NPE (#3988)
  • OpenSearch Sink upsert action fails to create new document if it doesn't exist already (#3934)
  • DynamoDb source global state not found for export (#3579)
  • Missing Configuration details in Kafka documentation (#3157)
  • File Source fails to process large files. (#707)
  • Add key_value_when conditional to key_value processor (#4246)
  • Adds Kafka producer metrics for buffer usage (#4139)
  • Throw a more useful error when the S3 source is unable to determine bucket ownership (#4021)
  • Add sts_header_overrides to s3 dlq configuration (#3845)
  • Delay reading from the Kafka buffer as long as the circuit breaker is open (#4135)
  • Use timer for sink latency metrics (#4174)
  • Fix bug where process worker would shut down if a processor drops all events (#4262)
  • Send acknowledgements to source when events are forwarded to remote peer (#4305)
  • Injecting timestamp in index name that is not a suffix throws IllegalArgumentException (#3957)

Security

  • Fixes CVE-2024-29133 (#4314)
  • Fixes CVE-2024-29131 (#4313)
  • Fixes CVE-2023-52428 (#4296)
  • Fixes CVE-2024-23944 (#4290)
  • Fixes CVE-2023-51775 (#4282)
  • Fixes CVE-2024-22201 (#4186)
  • Fixes CVE-2024-25710 (#4164)
  • Fixes CVE-2024-26308 (#4163)
  • Fixes CVE-2024-21634 (#3926)
  • Fixes CVE-2023-50570 (#3870)
  • Fixes CVE-2023-3635 (#3068)

Maintenance

  • Create Kafka buffer integration tests for KMS (#3980, #4040)
  • Fixes Dependabot updates are not configured for all projects (#3301)

2.6.2

3 months ago

2024-02-19 Version 2.6.2


Enhancements

  • Add 4xx aggregate metric and shard progress metric for dynamodb source (#3913)

Bug Fixes

  • S3 Scan has potential to filter out objects with the same timestamp (#4123)
  • Kafka buffer attempts to create a topic when disabled (#4111)
  • Grok processor match requests continue after timeout (#4026)
  • Serialization error during peer-forwarding (#3981)
  • BlockingBuffer.bufferUsage metric does not include records in-flight (#3936)
  • Null Pointer Exception in Key Value Processor (#3928)
  • Incomplete route set leads to duplicates when E2E ack is enabled. (#3866)
  • Data Prepper is losing connections from S3 pool (#3809)
  • Key value processor will throw NPE if source key does not exist in the Event (#3496)
  • Exception in substitute string processor shuts down processor work but not pipeline (#2956)
  • Add 4xx aggregate metric and shard progress metric for dynamodb source (#3921)

Security

  • Fix GHSA-6g3j-p5g6-992f from OpenSearch jar (#3837)
  • Fix CVE-2023-41329 (Medium) detected in wiremock-3.0.1.jar (#3954)
  • Fix CVE-2023-51074 (Medium) detected in json-path-2.8.0.jar (#3919)
  • Fix CVE-2023-50572 (Medium) detected in jline-3.9.0.jar, jline-3.22.0.jar (#3871)
  • Require Mozilla Rhino 1.7.12 to fix SNYK-JAVA-ORGMOZILLA-1314295. (#3839)

2.6.1

6 months ago

2023-12-07 Version 2.6.1


Enhancements

  • Add aggregate metrics for ddb source export and stream (#3728)

Bug Fixes

  • Update and upsert bulk actions do not include changes from document_root_key, exclude_keys, etc. (#3745)
  • S3 source processes SQS notification when S3 folder is created (#3727)

Security

  • Fix CVE-2023-6378 and CVE-2023-6481 by updating logback to 1.4.14 (#3729, #3817)
  • Require nimbus-jose-jwt 9.37.1 to fix CVE-2021-31684 and CVE-2023-1370 (#3731)
  • Updates example analytics-service to Spring Boot 3.1.6 fixing CVE-2023-34055 (#3732)

2.6.0

6 months ago

2023-11-28 Version 2.6.0


Features

  • Support DynamoDB as a source. (#2932)
  • Use Kafka as a buffer (#3322)
  • Support dynamically changing the visibility timeout for S3 Source with SQS queue (#2485)
  • Create or update Amazon OpenSearch Serverless network policy (#3577)
  • Sink level metric for end to end latency (#3494)

Enhancements

  • Use Amazon Linux as base Docker image (#3505)
  • Allow the Kafka buffer (and others that do not require the heap) to bypass the heap circuit breaker (#3616)
  • Improve gRPC request exception logging (#3621)
  • Configure the delay in the random string source (#3601)
  • Add distribution_version flag to opensearch source (#3636)

Bug Fixes

  • Data Prepper is writing empty DLQ objects (#3644)
  • Bulk Operation Retry Strategy should print cause of error (#3504)
  • ISM index rollover actions fail because of missing setting for otel-v1-apm-span-* indices (#3506)
  • AWS opensearch source error: ElasticsearchVersionInfo.buildFlavor (#3640)
  • No permissions for writing to Amazon OpenSearch Serverless collection only shows errors after max_retries limit is reached (#3508)
  • NullPointer exception in DefaultKafkaClusterConfigSupplier get API (#3528)
  • Fix bug so global read-only items do not expire from TTL in DynamoDB source coordination store (#3703)
  • Check if failedDeleteCount is positive before logging an SQS error (#3686)
  • Docker image jre-jammy contains Berkeley DB (#3543)
  • Race condition in DefaultEventHandle (#3617)

Security

  • CVE-2023-44981 (Critical) detected in multiple libraries (#3491)
  • CVE-2023-36478 (High) detected in http2-hpack-11.0.12.jar, jetty-http-11.0.12.jar (#3490)
  • CVE-2023-4586 (High) detected in netty-handler-4.1.100.Final.jar (#3443)
  • CVE-2023-5072 (High) detected in json-20230618.jar (#3522)
  • CVE-2023-39410 (High) detected in avro-1.11.0.jar (#3430)
  • CVE-2023-4043 (High) detected in parsson-1.1.2.jar (#3588)
  • CVE-2023-46122 (High) detected in io_2.13-1.9.1.jar (#3547)
  • CVE-2023-46136 (High) detected in Werkzeug-2.2.3-py3-none-any.whl (#3552)
  • CVE-2023-26048 (Medium) detected in jetty-server-11.0.12.jar (#2533)
  • CVE-2023-26049 (Medium) detected in jetty-http-11.0.12.jar, jetty-server-11.0.12.jar (#2532)
  • CVE-2023-40167 (Medium) detected in jetty-http-11.0.12.jar (#3359)
  • CVE-2023-36479 (Medium) detected in jetty-servlets-11.0.12.jar (#3367)
  • WS-2023-0236 (Low) detected in jetty-xml-11.0.12.jar (#3072)

Maintenance

  • Update to the Gradle 8.x version that supports Java 21. Gradle 8.3 supports up to Java 20. (#3330)
  • Start building Data Prepper on Java 21 (#3329)
  • Integration tests to validate data going to OpenSearch (#3678)
  • Unit tests fail on Windows machine (#3459)
  • Fix disabled E2E ack integration tests in PipelinesWithAcksIT.java (#3472)
  • Remove the @Deprecated from Record (#3536)
  • Remove all unnecessary projects in the 2.6 branch (#3605)
  • Update end-to-end tests to run from the released Docker image (#3566)

2.5.0

8 months ago

2023-10-09 Version 2.5.0


Features

  • Support OpenSearch as source. (#1985)
  • Support translate processor. (#1914)
  • Support dissect processor. (#3362)
  • Support AWS secrets in pipeline and Data Prepper config as an experimental feature. (#2780)

Enhancements

  • Support update, upsert, delete bulk actions in OpenSearch sink. (#3109)
  • Support inline index templates in OpenSearch sink. (#3365)
  • Add retry to Kafka consumer in source. (#3399)
  • Support OpenTelemetry SeverityText for logs. (#3280)
  • Merging PipelineDataflowModel instead of pipeline YAML files. (#3289)
  • Support recursive feature in KeyValue processor. (#888)

Bug Fixes

  • Fix NullPointerException in S3 scan when bucket key has null value. (#3316)
  • Fix a bug where S3 source does not stop on pipeline shutdown. (#3341)
  • Fix exemplar list in Histogram and Count aggregations. (#3364)

Security

  • Fix CVE-2023-44487, HTTP/2 reset floods. (#3474)
  • Fix CVE-2023-4586. (#3443)
  • Fix CVE-2023-39410. (#3430)

Maintenance

  • Build with Gradle 8. (#3287)
  • Remove sleep from Kafka source timeout test. (#3263)
  • Enable Gatling HTTPS support and path configuration. (#3308)
  • Support Gatling tests using AWS sigV4 signing. (#3311)
  • Support local ARM image build. (#3352)

2.4.1

8 months ago

2023-10-02 Version 2.4.1


Enhancements

  • Add support for fully async acknowledgments in source coordination. (#3391)

Bug Fixes

  • Fix NullPointerException in S3 scan partition supplier. (#3323)
  • Fix issue caused by InterruptedException in S3 source where source is polling after pipeline shutdown. (#3345)
  • Update trace analytics sample app to run again. (#3353)

Maintenance

  • Improve logging for failed documents in the OpenSearch sink. (#3389)
  • Update commons-codec to 1.16.0. (#3370)
  • Update hibernate-validator to 8.0.1.Final. (#3369)
  • Update trace analytics sample app to run using the latest Spring Boot 3.1.3. (#3346)
  • Update Gradle to 8.3. (#3300)
  • Update grpcio from 1.50.0 to 1.53.0. (#3315)
  • Update certifi from 2022.12.7 to 2023.7.22. (#3314)
  • Update Bouncy Castle to 1.76. (#3307)
  • Reduce sleep times in BlockingBufferTests to speed up unit tests. (#3287)
  • Update checkstyle dependency to 10.12.3. (#3286)
  • Remove Maxmind license keys from test URLs. (#3285)
  • Remove unnecessary dependencies in the S3 sink and Parquet codecs. (#3283)

Security

  • Update armeria to 1.25.2 to fix CVE-2023-32732, CVE-2023-38493. (#3366)
  • Fix CVE-2022-36944, WS-2023-0116, CVE-2021-39194, CVE-2023-3635, CVE-2023-36479, CVE-2023-40167. (#3392)
  • Update commons-compress to 1.24.0 to fix CVE-2023-42503. (#3388)
  • Fix CVE-2022-45688, CVE-2023-43642. (#3409)

2.4.0

9 months ago

2023-08-28 Version 2.4.0


Features

  • Support Kafka as source. (#254)
  • Support source coordination. (#2412)
  • Support S3 scan capability in S3 source. (#1970)
  • Support ElasticSearch 6.8 in OpenSearch sink. (#3003)
  • Support custom index template in OpenSearch sink with ElasticSearch 6.8. (#3060)
  • Support filtering data in sink using include_keys and exclude_keys. (#2975)

Enhancements

  • Support generic sink codec structure for sinks. (#2403)
  • Support expressions in OpenSearch index and document ID. (#2864)
  • Support defining bucket ownership. (#2012)
  • Add exemplars to metrics generated in aggregate processor. (#3164)
  • Add cardinality key support in Anomaly detector processor using identification_keys. (#3073)
  • Support allow_duplicate_values in key-value processor. (#889)
  • Support remove_brackets in key-value processor. (#892)
  • Support exclude_keys in key-value processor. (#890)
  • Support default_keys in key-value processor. (#891)
  • Add GZip compression in S3 sink. (#3130)
  • Add Snappy compression in S3 sink. (#3154)
  • Update metric for RCF instance from counter to gauge and fix flaky tests caused by RCF variance. (#3145)
  • Support s3:// prefix in pipeline where s3 bucket names are used. (#3143)
  • Update circuit breaker configuration log message. (#3175)
  • Deprecate document_id_field in support of document_id. (#3074)

Bug Fixes

  • Fix race condition in sources using End-to-End Acknowledgments. (#3038)
  • Fix DLQ deserialization with create action. (#3040)
  • Fix IllegalArgumentException in csv processor when key does not exist. (#3053)
  • Handle RequestTimeoutException in push based sources correctly. (#3063)
  • Fix S3 sink writing to closed stream. (#3160)
  • Fix timestamp used in S3 sink to 24-hour format. (#3171)
  • Fix stale buffer data not being written to S3 sink. (#3187)
  • Fix IllegalArgumentException in convert entry processor. (#3135)
  • Fix ClassCastException in parse_json processor with OTel logs source. (#3184)
  • Fix UnexpectedTypeException in S3 select using NotBlank annotation. (#3208)

Maintenance

  • Update existing release workflow to trigger Data Prepper release jenkins job. (#2122)
  • Reduce test time which reduces build time. (#3019, #3020, #3021)
  • Updated GitHub actions to use Data Prepper in job titles. (#3104)
  • Add Apache commons-lang3 to gradle catalog. (#3120)
  • Add integration test coverage for ODFE 0.10.0 OpenSearch sink. (#3131)
  • Updated documentation process in developer guide. (#2772)
  • Fix flaky conditional routing test. (#3139)
  • Fix flaky unit tests. (#3150)
  • Add integration test for S3 sink. (#3179)
  • Update Data Prepper tar.gz to include JDK 17.0.8+7. (#3136)
  • Update S3 sink to speed up unit test time. (#3203)

2.3.2

10 months ago

2023-07-12 Version 2.3.2


Bug Fixes

  • Updated the release date (#2912)
  • Fix addTags API in EventMetadata (#2996)
  • Fix DLQ writer writing empty list (#2998)
  • Fix SqsWorker error messages (#3002)
  • Fix S3 errors around end of file behavior. (#3006)
  • Retry s3 reads on socket exceptions. (#3008)
  • Fix race condition in SqsWorker when acknowledgements are enabled (#3010)
  • Remove validation that made keys starting or ending with . - or _ invalid (#3007)

Security

  • Fix CVE-2023-35165, CVE-2023-34455, CVE-2023-34453, CVE-2023-34454, CVE-2023-2976 (#2952)
  • Fix bucket ownership validation. This was a regression introduced in Data Prepper 2.3. (#3011)

2.3.1

11 months ago

2023-06-20 Version 2.3.1


Enhancements

  • Add support for external ID when making STS AssumeRole call (#2862)
  • Remove sensitive data from the error log for index name format failure in OpenSearch sink (#2894)

Bug Fixes

  • Fix errors when SQS notifications are from AWS EventBridge (#2861, #2789, #2788)
  • Fix concurrentModification in CredentialsCache (#2876)
  • Fix suppressed exception and add logs when incorrect compression is configured in S3 source (#2879, #2896)
  • Fix S3 sink metrics names as they are conflicting with S3 source (#2887)
  • Fix silent dropping data when index has null keys (#2885)

Security

  • Fix CVE-2017-1000487, CVE-2022-4244, CVE-2022-4245 (#2848)