Divolte Collector
This release contains the following changes relative to 0.8.0:
Mappings can now hash data in various ways. Basic and keyed hashing are supported using the builtin JDK algorithms.
Events published to Kafka now have their timestamps set to the time the event arrived at the Divolte server.
Events published to Google Pub/Sub have additional metadata to help with downstream processing:
timestamp: the time the event arrived at the server.
eventIdentifier: the event ID (as generated by the client).
A bug was fixed that affected mapping custom parameter values to Avro enumerations.
A bug has been fixed where tildes (~) weren't properly handled inside custom event parameters.
A bug has been fixed where the version of avro-tools that we shipped didn't work properly due to a missing file.
The usual dependency updates, of which the most notable is upgrading from Hadoop 2.9 to 3.1.
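The basic and keyed hashing mentioned above can be illustrated with the builtin JDK primitives. This is a generic sketch of the JDK APIs (MessageDigest and Mac), not Divolte's mapping DSL; the key and input values are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HashSketch {
    // Basic (unkeyed) hashing using a builtin JDK algorithm.
    static String sha256Hex(String value) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        return toHex(digest.digest(value.getBytes(StandardCharsets.UTF_8)));
    }

    // Keyed hashing (HMAC) using a builtin JDK algorithm.
    static String hmacSha256Hex(String key, String value) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        return toHex(mac.doFinal(value.getBytes(StandardCharsets.UTF_8)));
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha256Hex("user@example.com"));
        System.out.println(hmacSha256Hex("secret-key", "user@example.com"));
    }
}
```

Keyed hashing is useful when hashed values must be stable across events but hard to reverse by brute force without the key.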
This release contains the following changes relative to 0.7.0:
Improved shutdown/load-balancer integration. A new setting (divolte.global.server.shutdown_delay) allows for a grace period when shutting down. During this period the health-check will fail but the server continues processing requests normally. This should prompt load balancers to remove the endpoint before requests start failing.
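As a sketch, the setting could be configured like this (the setting name comes from the notes above; the 30-second value and the duration syntax are illustrative assumptions):

```hocon
divolte {
  global {
    server {
      // Keep serving requests for a while after the health check
      // starts failing, so load balancers can drain this instance.
      shutdown_delay = 30 seconds
    }
  }
}
```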
A bug fix for the processing pool configuration for Google Cloud Storage. (Previously the configured buffer size and thread-count were ignored and the values for HDFS used instead.)
Improvements to the way values from headers can be extracted during mapping. In particular, these should make it easier to map the client's IP address when multiple load-balancer layers are in place. Improvements include:
Alongside the existing .first() and .last() methods, a new .get(x) method can be used to obtain the value at a specific index. A negative index can be used to retrieve a value relative to the end of the list.
The main changes in this release relative to 0.6.0 are:
The main changes in this release relative to 0.3.0 are:
This release also introduces a new configuration format:
There are now four main sections:
global: settings that affect the entire server instance. This includes server binding settings, ip2geo configuration, HDFS and Kafka configuration, and thread settings for the various phases of event processing.
sources: the browser and JSON endpoints that events can be received on.
sinks: which HDFS directories and Kafka topics Avro data should be written to.
mappings: which sources should be connected to which sinks, and how received events should be converted to Avro records.
Sources are now more configurable. The endpoint paths can now be customised.
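In outline, a configuration file now has this shape. The four section names come from the notes above; the contents of each section are elided here, and the comments only summarize the descriptions given:

```hocon
divolte {
  global {
    // Server-wide settings: binding, ip2geo, HDFS, Kafka,
    // and per-phase thread configuration.
  }
  sources {
    // Browser and JSON endpoints that receive events.
  }
  sinks {
    // HDFS directories and Kafka topics that Avro data is written to.
  }
  mappings {
    // Connects sources to sinks; controls how events become Avro records.
  }
}
```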
Kafka now requires different settings because we're using the new producer instead of the old one. The biggest change is that bootstrap.servers should be used instead of metadata.broker.list; see the Kafka documentation for more details.
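A minimal sketch, assuming producer properties are passed through a kafka producer section of the configuration (the exact configuration path is an assumption, and the broker addresses are placeholders):

```hocon
divolte {
  global {
    kafka {
      producer = {
        // New-producer style; replaces the old metadata.broker.list.
        // Quoted so the dotted key is treated as a single property name.
        "bootstrap.servers" = "broker-1:9092,broker-2:9092"
      }
    }
  }
}
```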
The HDFS session-binning strategy for writing files has been removed.
The maximum 'pause' time for an internal thread to wait when queuing an event for the next stage of processing has been removed. (This used to be the max_enqueue_delay setting.) Now we drop messages immediately. In practice queues are either full or empty, and a full queue means there's a problem that delaying isn't going to help with. In fact, it turned out that waiting on a full queue leads to cascading failures and problems such as thread starvation in the HTTP server.
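The drop-instead-of-wait behaviour can be sketched with a bounded queue. This is a hypothetical illustration of the idea, not Divolte's actual internals:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DropOnFull {
    // offer() returns false immediately when the queue is full,
    // instead of blocking the producing thread; the event is dropped.
    static <T> boolean enqueueOrDrop(BlockingQueue<T> queue, T event) {
        return queue.offer(event);
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        System.out.println(enqueueOrDrop(queue, "event-1")); // true: enqueued
        System.out.println(enqueueOrDrop(queue, "event-2")); // false: queue full, dropped
    }
}
```

The non-blocking offer() keeps the HTTP-serving threads responsive even when a downstream stage stalls, at the cost of losing events during overload.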
This is a bug-fix release that includes the following fixes relative to 0.4.0:
The javascript.name property for browser sources was implemented incorrectly; only the default value worked.
This release includes the following changes relative to 0.4.1:
A new whenCommitted() call can be used to register a callback that will be invoked when it is safe to leave the page without dropping pending events that have been signalled but not yet delivered to the server. This is intended to make it easier to signal events for click-throughs and click-outs.
The way pageView events are implicitly signalled when navigating forwards and backwards through browser history is now consistent across all the browsers that we test against. Previously the events did not always fire, and page-view identifiers were sometimes reused.
Behind the scenes, a lot of work went into improving the stability of our automated browser testing. In addition to making the tests more stable, we expanded the set of browsers that we test against.