SparkTDA is a scalable package for Apache Spark that provides topological data analysis functionality.
This project aims to implement the following features:
If you would like to know how to use the features mentioned above, or to learn more about their implementation details, please follow the links.
WIP and EXPERIMENTAL. This package is still a proof of concept of scalable topological data analysis support for Apache Spark, so it should not be considered ready for production use.
2-skeletons of Reeb Diagram of MNIST (40 intervals on the 1st principal component with 50% overlap) | 2-skeletons of Reeb Diagram of MNIST (20 intervals on the 1st principal component with 50% overlap) |
---|---|
60k images clustered in 784 dimensions without any projection loss | 60k images clustered in 784 dimensions without any projection loss |
This library requires Spark 2.0+
To compile this project, run `sbt package` from the project home directory. This will also run the Scala unit tests.
To run the unit tests, run `sbt test` from the project home directory. This project uses the
sbt-spark-package plugin, which provides the `spPublish` and
`spPublishLocal` tasks. We recommend using this library with Apache Spark by
supplying a comma-delimited list of Maven coordinates with `--packages`,
so that the package and its dependencies are downloaded from the local
repository or the official Spark Packages repository.
$ sbt spPublishLocal
$ sbt spPublish
This package can be added to Spark using the `--packages`
command line option. For example, to include it when starting
the Spark shell:
$ spark-shell --packages ognis1205:spark-tda:0.0.1-SNAPSHOT-spark2.2-s_2.11
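
The value passed to `--packages` is a Maven coordinate of the form `groupId:artifactId:version`. As a minimal sketch, assuming a POSIX shell, the snippet below assembles the coordinate from its three parts and prints the resulting command instead of running it, so it works without a Spark installation. The variable names (`GROUP`, `ARTIFACT`, `VERSION`, `COORD`, `CMD`) are illustrative only and are not part of this project.

```shell
#!/bin/sh
# Illustrative variables (not defined by the project); the values are
# copied from the spark-shell example above.
GROUP="ognis1205"
ARTIFACT="spark-tda"
VERSION="0.0.1-SNAPSHOT-spark2.2-s_2.11"

# A Maven coordinate has the form groupId:artifactId:version.
COORD="${GROUP}:${ARTIFACT}:${VERSION}"

# Build the command as a string and print it rather than executing it.
CMD="spark-shell --packages ${COORD}"
echo "${CMD}"
```

The same `--packages` option is also accepted by `spark-submit` and `pyspark`, and several coordinates can be passed at once by separating them with commas.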