Isarn Sketches Spark Versions

Routines and data structures for using isarn-sketches idiomatically in Apache Spark

v0.6.0

1 year ago
  • rebuild against Spark 3.2

v0.5.2

3 years ago
  • adds an additional fix (#25) for issue #22. Thanks to @JonathanTaws!

v0.5.1

3 years ago
  • fix #21
  • fix #22. Thanks to @JonathanTaws for issues and pull requests!

v0.5.0

3 years ago

This release is a substantial reorganization relative to previous releases, including two major changes:

  • Spark aggregators are now based on a faster Java implementation of t-digest
  • The aggregators make use of the improved user-defined aggregation API available starting in Spark 3.0

Some of the object paths have changed to adapt to the Spark API. Refer to the examples on the main README page.

If you want to use the TDigest aggregations on DataFrame schemas with Array types, you will need Spark 3.0.1 or higher.
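
The Spark 3.0.1 requirement above can be checked at runtime before attempting the Array-typed aggregations. The following is a minimal sketch, assuming the version string is obtained from something like `spark.version` on a PySpark session; the helper names `parse_version` and `supports_array_tdigest` are hypothetical and are not part of isarn-sketches-spark.

```python
import re

# Minimum Spark version stated in the release note above.
MIN_SPARK_FOR_ARRAYS = (3, 0, 1)


def parse_version(version: str) -> tuple:
    """Extract the leading numeric major.minor[.patch] from a version string.

    Suffixes such as "-SNAPSHOT" are ignored; a missing patch counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_array_tdigest(spark_version: str) -> bool:
    """Hypothetical helper: True if the version meets the 3.0.1 minimum."""
    return parse_version(spark_version) >= MIN_SPARK_FOR_ARRAYS
```

With a live session this might be called as `supports_array_tdigest(spark.version)`, falling back to scalar columns when it returns `False`.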

v0.4.0

3 years ago
  • removed the Python pre-compile step: Python is now packaged as uncompiled ".py" files
    • Python 2 and 3 both currently work; however, Python 2 reached EOL in January 2020
  • updated sbt and sbt packages
  • removed the Python version from the package names: now only the Spark version is part of the package version
  • updated README to reflect the latest changes

v0.3.1

5 years ago
  • Fixes #12 and #13
  • Builds against Scala 2.11 (Scala 2.12 support anticipated for Apache Spark 2.4)
  • PySpark components built with Python 2.7 and 3.6
  • sbt and sbt package deps upgraded to latest

v0.3.0

6 years ago

Feature Importance Pipelines

v0.2.0

6 years ago
  • PySpark support
  • publishing to Sonatype / Maven Central
  • cross-publishing for Spark and Python versions
  • UDAFs for reducing t-digest columns and/or groups via groupBy

v0.1.0

6 years ago

Scala UDAFs for TDigest