Snowplow Versions

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

r102-afontova-gora

6 years ago

EmrEtlRunner stability improvements

EmrEtlRunner

  • Bump to 0.31.0 (#3679)
  • Add ability to skip load_manifest_check (#3680)
  • Add CI/CD to update AMI bootstrap scripts (#3683)
  • Add stream_config.yml.sample (#3685)
  • Add support for shredding from Kinesis S3 Loader's enriched event output (#3606)
  • Add bootstrap action to prepare AMI 5.x for Snowplow (#3601)
  • Recover from RestClient::ServiceUnavailable when making status checks (#3539)
  • Recover from RestClient::RequestTimeout when making status checks (#3468)
  • Launch bootstrap action for AMI 5.x (#3609)
  • Pass processing manifest config to RDB Shredder (#3619)
  • Fail fast in build script (#3684)
  • Fail fast on duplicated storage target id (#3652)
  • Do not rescue on Exception (#3577)
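
The recovery and "do not rescue on Exception" items above share one idea: retry status checks only on known transient errors and let everything else propagate. EmrEtlRunner is written in Ruby, so the following Python is only a minimal sketch of that pattern. The exception classes, function name, and parameters are hypothetical stand-ins, not EmrEtlRunner's actual code.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a transient 503-style error."""


class RequestTimeout(Exception):
    """Hypothetical stand-in for a transient timeout error."""


# Only these errors are treated as recoverable; anything else propagates.
TRANSIENT_ERRORS = (ServiceUnavailable, RequestTimeout)


def check_status_with_retry(check, max_attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call `check()` and retry only when it raises a transient error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return check()
        except TRANSIENT_ERRORS:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            sleep(delay_seconds)
```

Catching a narrow tuple of errors rather than a blanket exception type means a genuine bug (for example a `ValueError`) surfaces immediately instead of being retried.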

Redshift

  • Remove duplicate create events table comment (#3643)

r101-neapolis

6 years ago

Initial support for Google Cloud Platform

Scala Stream Collector

  • Bump to 0.13.0 (#3682)
  • Add Google Cloud PubSub sink (#3047)
  • Split into multiple artifacts according to targeted platform (#3621)
  • Expose number of requests over JMX (#3637)
  • Move cross domain configuration to enabled-style (#3556)
  • Truncate events exceeding the configured maximum size into a BadRow (#3587)
  • Remove string interpolation false positive warnings (#3623)
  • Update config.hocon.sample to support Google Cloud PubSub (#3049)
  • Customize useragent for GCP API calls (#3658)
  • Bump kafka-clients to 1.0.1 (#3660)
  • Bump aws-java-sdk to 1.11.290 (#3665)
  • Bump scala-common-enrich to 0.31.0 (#3666)
  • Bump SBT to 1.1.1 (#3629)
  • Bump sbt-assembly to 0.14.6 (#3667)
  • Use sbt-buildinfo (#3626)
  • Extend copyright notice to 2018 (#3687)
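
As a rough illustration of the "truncate events exceeding the configured maximum size" item above, the sketch below turns an oversized payload into an error record that keeps only a prefix of the original bytes. The function name, field names, and record shape are assumptions made for illustration; they are not the collector's actual BadRow format.

```python
import json


def to_bad_row(payload: bytes, max_bytes: int, keep_bytes: int = 16):
    """Return None if the payload fits, else a JSON error record with a truncated payload."""
    if len(payload) <= max_bytes:
        return None  # small enough to be sent to the sink unchanged
    return json.dumps({
        # Hypothetical field names, chosen only for this sketch.
        "error": f"payload of {len(payload)} bytes exceeds the maximum of {max_bytes} bytes",
        "truncated_payload": payload[:keep_bytes].decode("utf-8", errors="replace"),
        "original_size_bytes": len(payload),
    })
```

Truncating before emitting keeps the error record itself safely under the sink's size limit.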

Stream Enrich

  • Bump to 0.15.0 (#3681)
  • Add Google Cloud PubSub source (#3150)
  • Add Google Cloud PubSub sink (#3149)
  • Split into multiple artifacts according to targeted platform (#3645)
  • Rename etl version from kinesis to stream-enrich (#3642)
  • Make source / sink configuration a coproduct (#3555)
  • Add ability to retrieve resolver and enrichments from Google Cloud Datastore (#3152)
  • Update config.hocon.sample to support Google Cloud PubSub (#3151)
  • Customize useragent for GCP API calls (#3193)
  • Bump kafka-clients to 1.0.1 (#3661)
  • Bump amazon-kinesis-client to 1.9.0 (#3663)
  • Bump aws-java-sdk to 1.11.290 (#3662)
  • Bump SBT to 1.1.1 (#3657)
  • Bump sbt-assembly to 0.14.6 (#3664)
  • Use sbt-buildinfo (#3627)
  • Extend copyright notice to 2018 (#3686)

Common

  • Install Ruby 2.4.3 before deploy (#3689)
  • Fix CHANGELOG entry for R97 (#3630)

r100-epidaurus

6 years ago

Adding first phase of our PII Enrichment

Scala Common Enrich

  • Add PII Enrichment (#3472)
  • Apply automated code formatting (#3532)
  • Bump commons-codec to 1.11 (#3638)
  • Bump to 0.31.0 (#3598)
  • Remove unused version member in Enrichment trait (#3541)
  • Use automated code formatting (#3496)

Stream Enrich

  • Bump scala-common-enrich to 0.31.0 (#3597)
  • Bump to 0.14.0 (#3596)
  • Use generated Settings for version in test (#3604)

Redshift

  • Widen se_label to 4,096 to support URLs etc (#196)
  • Widen sensitive columns in atomic.events to support pseudonymization (#3528)

r99-carnac

6 years ago

Google Analytics integration for the batch pipeline

Scala Common Enrich

  • Bump to 0.30.0 (#3562)
  • Add adapter for Google Analytics (#3560)
  • Extend copyright notice to 2018 (#3574)

Spark Enrich

  • Bump to 1.12.0 (#3565)
  • Bump scala-common-enrich to 0.30.0 (#3563)
  • Add tests for the Google Analytics adapter (#3561)
  • Extend copyright notice to 2018 (#3573)
  • Change Twitter repository url to https (#3593)

EmrEtlRunner

  • Update spark_enrich version in config.yml.sample to 1.12.0 (#3566)

Common

  • Extend copyright notice to 2018 in READMEs (#3575)

r98-argentomagus

6 years ago

Data quality and security improvements for the realtime pipeline as well as new features for the Scala Stream Collector

Scala Stream Collector

  • Bump to 0.12.0 (#3548)
  • Make Flash access domains and secure configurable (#2915)
  • Add URL redirect replacement macro (#3491)
  • Allow use of the originating scheme during cookie bounce (#3512)
  • Replace Location header with RawHeader to preserve double encoding (#3546)
  • Bump nsq-java-client to 1.2.0 (#3519)
  • Document the stdout sink better (#3515)
  • Fix stdout sink configuration (#3550)
  • Fix scaladoc for 'ipAndPartitionKey' (#3513)

Stream Enrich

  • Bump to 0.13.0 (#3549)
  • Bump scala-common-enrich to 0.29.0 (#3553)
  • Bump nsq-java-client to 1.2.0 (#3520)

Scala Common Enrich

  • Bump to 0.29.0 (#3552)
  • Add validation of tracker-sent timestamps (#336)
  • Add validation of collector_tstamp (#3416)

Redshift

  • Update version of atomic.events to 0.9.0 (#3517)

Common

  • Trigger the publishing of Stream Enrich when it is under test (#3557)

r97-knossos

6 years ago

Four new webhooks: Olark, StatusGator, Unbounce, Mailgun

EmrEtlRunner

  • Add ability to skip RDB Loader consistency check (#3529)
  • Bump to 0.30.0 (#3526)
  • Uncompress gzipped raw files when copying to HDFS (#3525)
  • Update spark_enrich version in config.yml.sample to 1.11.0 (#3002)

Scala Common Enrich

  • Add adapter to pre-process Olark events (#1014)
  • Add adapter to pre-process Mailgun webhooks (#2734)
  • Add adapter to pre-process StatusGator webhooks (#2169)
  • Add adapter to pre-process Unbounce webhooks (#2615)
  • Add function to camelCase all JSON fields in Adapter (#3538)
  • Bump user-agent-utils to 1.20 (#2930)
  • Default port to 443 if scheme is https (#3483)
  • Make enrichments.ExtractEventTypeSpec timezone-safe (#3481)
  • Remove toSecond parameter in Adapter (#3534)
  • Tolerate content type for GET requests sent to Clojure Collector (#2743)
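
The "default port to 443 if scheme is https" item above can be sketched as follows. This is a minimal Python illustration of the rule, not the Scala Common Enrich implementation; `effective_port` and `DEFAULT_PORTS` are hypothetical names.

```python
from urllib.parse import urlsplit

# Well-known default ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str):
    """Return the explicit port if present, else the scheme's default (None if unknown)."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)
```

An explicit port in the URL always wins; the scheme is consulted only when the port is absent.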

Spark Enrich

  • Bump to 1.11.0 (#3533)
  • Add test for Mailgun Adapter (#2763)
  • Add test for Olark Adapter (#2792)
  • Add test for StatusGator Adapter (#2722)
  • Add test for Unbounce Adapter (#2745)
  • Fix tests that fail when running on an alternative iglu service (#3503)
  • Fix tests that fail with error when running on a platform that doesn't have native-lzo (#3508)
  • Improve error message in test to show index line (#3494)

Common

  • Reenable publishLocal in travis for spark enrich tests to pass (#3516)
  • Rename AWS deployment credentials in .travis.yml (#3115)

r96-zeugma

6 years ago

Adding NSQ support to the real-time pipeline

Scala Stream Collector

  • Bump to 0.11.0 (#3433)
  • Update config.hocon.sample to support NSQ (#3294)
  • Add NSQ sink (#2093)
  • Make Kinesis, Kafka and NSQ config a coproduct (#3449)
  • Keep sending records when the Kinesis stream is resharding (#3453)

Stream Enrich

  • Bump to 0.12.0 (#3432)
  • Update config.hocon.sample to support NSQ (#3339)
  • Add NSQ sink (#3337)
  • Add NSQ source (#3336)

Common

  • Decorrelate CI/CD for Scala Stream Collector and Stream Enrich (#3441)

r95-ellora

6 years ago

ZSTD support

Redshift

  • Add migration script for 0.8.0 to 0.9.0 (#3440)
  • Widen domain_sessionidx column in atomic.events from smallint to integer (#1788)
  • Update atomic.events to use ZSTD compression (#3435)

EmrEtlRunner

  • Bump to 0.29.0 (#3469)
  • Reintroduce processing directory not empty no-op (#3458)
  • Retrieve the correct latest run ID during archive_shredded step (#3436)
  • Fix pagination issue when retrieving latest run id (#3434)
  • Update rdb_loader version in config.yml.sample to 0.14.0 (#3418)
  • Update rdb_shredder version in config.yml.sample to 0.13.0 (#3460)
  • Update spark_enrich version in config.yml.sample to 1.10.0 (#3461)
  • Bump AMI version in example config to 5.9.0 (#3465)
  • Force bundler 1.15.4 during CI/CD (#3493)

Spark Enrich

  • Overwrite output datasets (#3443)
  • Bump to 1.10.0 (#3428)
  • Add test for Cloudfront Sep 2016 (#3000)
  • Bump scala-common-enrich to 0.27.0 (#3427)
  • Bump Spark to 2.2.0 (#3466)

Scala Common Enrich

  • Bump to 0.27.0 (#3429)
  • Add support for new field in CloudFront access logs (#2933)

Web model

Misc

  • Add GCP mirror into config/iglu_resolver.json (#3430)
  • Replace example Postgres storage target configuration with 1-1-0 (#3463)
  • Replace example Redshift storage target configuration with 2-1-0 (#3462)

r94-hill-of-tara

6 years ago

Fixing the Stream Enrich data loss issue

Stream Enrich

  • Bump to 0.11.1 (#3454)
  • Keep sending records when the Kinesis stream is resharding (#3452)

r93-virunum

6 years ago

Realtime pipeline refresh

Scala Stream Collector

  • Bump to 0.10.0 (#3424)
  • Replace spray by akka-http (#3299)
  • Replace argot by scopt (#3298)
  • Add support for cookie bounce (#2697)
  • Allow raw query params (#3273)
  • Add support for the Chinese Kinesis endpoint (#3335)
  • Use the DefaultAWSCredentialsProviderChain for Kinesis Sink (#3245)
  • Use Kafka callback based API to detect failures to send messages (#3317)
  • Make Kafka sink more fault tolerant by allowing retries (#3367)
  • Fix incorrect property used for kafkaProducer.batch.size (#3173)
  • Configuration decoding with pureconfig (#3318)
  • Stop making the assembly jar executable (#3410)
  • Add config dependency (#3326)
  • Upgrade to Java 8 (#3328)
  • Bump Scala version to 2.11 (#3311)
  • Bump SBT to 0.13.16 (#3312)
  • Bump sbt-assembly to 0.14.5 (#3329)
  • Bump aws-java-sdk-kinesis to 1.11 (#3310)
  • Bump kafka-clients to 0.10.2.1 (#3325)
  • Bump scala-common-enrich to 0.26.0 (#3305)
  • Bump iglu-scala-client to 0.5.0 (#3309)
  • Bump specs2-core to 3.9.4 (#3308)
  • Bump scalaz-core to 7.0.9 (#3307)
  • Bump joda-time to 2.9 (#3323)
  • Remove commons-codec dependency (#3324)
  • Remove snowplow-thrift-raw-event dependency (#3306)
  • Remove joda-convert dependency (#3304)
  • Remove mimepull dependency (#3302)
  • Remove scalazon dependency (#3300)
  • Run the unit tests systematically in Travis (#3409)

Stream Enrich

  • Bump to 0.11.0 (#3425)
  • Support AT_TIMESTAMP as initial position (#3360)
  • Add ability to force re-download IP lookup databases on reboot (#3159)
  • Add support for the Chinese Kinesis and DynamoDB endpoints (#3344)
  • Replace argot by scopt (#3345)
  • Use Kafka callback based API to detect failures to send messages (#2974)
  • Make Kafka sink more fault tolerant by allowing retries (#2973)
  • Make partition key for enriched event stream user-configurable (#1924)
  • Fix incorrect property used for kafkaProducer.batch.size (#3380)
  • Flush Kafka producer (#3342)
  • Configuration decoding with pureconfig (#3394)
  • Stop catching fatal errors (#1455)
  • Stop making the assembly jar executable (#3411)
  • Change package name (#3340)
  • Add commons-codec dependency (#3349)
  • Add json4s dependency (#3348)
  • Upgrade to Java 8 (#3392)
  • Bump Scala version to 2.11 (#3388)
  • Bump SBT to 0.13.16 (#3382)
  • Bump sbt-assembly to 0.14.5 (#3391)
  • Bump kafka-clients to 0.10.2.1 (#3413)
  • Bump config to 1.3.1 (#3412)
  • Bump iglu-scala-client to 0.5.0 (#3387)
  • Bump scalacheck to 1.11.3 (#3386)
  • Bump scala-common-enrich to 0.26.0 (#3385)
  • Bump specs2 to 2.3.13 (#3383)
  • Bump scalaz-core to 7.0.9 (#3381)
  • Bump amazon-kinesis-client to 1.8.1 (#3379)
  • Bump aws-java-sdk to 1.11 (#3377)
  • Remove scalaz-specs2 dependency (#3347)
  • Remove scalazon dependency (#3341)
  • Remove unused dependencies (#3346)
  • Run the unit tests systematically in Travis (#3408)

Scala Common Enrich

  • Bump to 0.26.0 (#3333)
  • Drop Scala 2.10 (#3285)
  • Replace akka-http with scalaj (#3330)
  • Bump scala-uri to 0.5.0 (#2893)
  • Bump scala-weather to 0.3.0 (#3334)

Kinesis ElasticSearch Sink