Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning
The only change from 2.7.2 is to update and build the project for Spark 2.4, Hadoop 3, and CDH 6.
This release fixes one important long-standing issue:
ALS app: java.lang.ClassCastException: java.lang.Object cannot be cast to java.lang.String https://github.com/OryxProject/oryx/issues/304
It was resolved by replacing Koloboke with Eclipse Collections. It's a non-trivial change but addresses a correctness issue, at the unfortunate cost of about 5% performance in the ALS app.
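For readers unfamiliar with this class of error, here is a minimal Java sketch of one general way a ClassCastException from Object to String can surface in generic collection code. It is not the Oryx or Koloboke code and makes no claim about the actual root cause in issue 304. The names ErasureCastDemo, pollutedMap, and describeFailure are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Oryx, Koloboke, or Eclipse Collections.
final class ErasureCastDemo {

    // Returns a map declared with String keys that actually holds a plain Object key.
    // The raw-type put compiles (with an unchecked warning) because generics are erased.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static Map<String, Integer> pollutedMap() {
        Map<String, Integer> map = new HashMap<>();
        Map raw = map;
        raw.put(new Object(), 1);
        return map;
    }

    // Iterating with a String loop variable makes the compiler insert a cast to String,
    // which fails at runtime, far from where the bad key was stored.
    static String describeFailure() {
        try {
            for (String key : pollutedMap().keySet()) {
                return "no failure, key length " + key.length();
            }
            return "empty";
        } catch (ClassCastException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

The point of the sketch is only that the exception appears where the value is read, not where it was written, which is why such issues can be hard to trace.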
This release contains a single build change to resolve an important error from 2.7.0: https://github.com/OryxProject/oryx/issues/347
This release exists almost entirely to support Spark 2.3 and to ensure that Kafka 2.x works as well. There are no other notable changes.
Another quite minor release, just for compatibility with Kafka. The 2.5 release remains available for Kafka 0.10.x.
This release is not notably different from 2.4.x, except that it's built for Spark 2.2.x and CDH 5.12.x, as well as the Kafka 0.10 parcel from Cloudera.
This release updates the Kafka client to 0.10.2.x, and contains a key fix for topics that are distributed across brokers.
The primary purpose of this release, relative to the 2.3.x branch, is to add support for Kafka 0.10 and use its new APIs. It also requires Spark 2.1. Otherwise, there is little change from 2.3.x.
A minor bug fix and small improvement release.
See https://github.com/OryxProject/oryx/milestone/20?closed=1
The significant change from 2.2.x is that 2.3.x requires and supports Spark 2.x. Otherwise, it generally includes minor bug fixes and small enhancements.
Note that from version 2.3.0, artifacts appear in the standard Maven repository, not the Cloudera repo. Custom repository declarations are no longer needed to access them. See https://repo1.maven.org/maven2/com/cloudera/oryx
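As a rough illustration of how the repository URL above relates to Maven coordinates, the sketch below maps a group ID to its directory under the standard repository. It uses the usual Maven layout, in which dots in the group ID become path separators. The group ID com.cloudera.oryx is inferred from the URL, and the class and method names are hypothetical.

```java
// Hypothetical helper; only illustrates the standard Maven repository layout.
final class MavenGroupPath {

    // Base URL of the standard Maven repository, as referenced in the note above.
    static final String CENTRAL = "https://repo1.maven.org/maven2";

    // Converts a group ID such as "com.cloudera.oryx" into its repository directory URL.
    static String groupUrl(String groupId) {
        return CENTRAL + "/" + groupId.replace('.', '/');
    }

    public static void main(String[] args) {
        System.out.println(groupUrl("com.cloudera.oryx"));
    }
}
```

Individual artifacts and versions live in subdirectories of this group directory, so browsing it is a quick way to check which releases are published there.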
Closed issues: https://github.com/OryxProject/oryx/milestone/17?closed=1