giotto-ph versions

High performance implementation of Vietoris-Rips persistence.

v0.2.2

1 year ago

Minor release bringing bug fixes, performance improvements, wheels for Apple Silicon, and end of support for Python 3.6.

Major Features and Improvements

  • The dense matrix C++ backend has been extended to allow for nonzero vertex weights. This can lead to large speedups when computing weighted Rips filtrations (#61).
  • The binary search routine to find the largest-indexed vertex in a simplex (get_max_vertex in the C++ backend, as in Ripser) has been replaced with a faster floating-point routine in the case of 1-simplices (edges). This still gives exact results for all cases of interest, and can be substantially faster (#38).
  • Wheels for Apple Silicon are now available for Python versions 3.8, 3.9 and 3.10 (#62).
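The vertex weights used in weighted Rips filtrations often come from the empirical distance-to-measure (DTM) function. The following is a minimal numpy/scipy sketch of one common DTM convention, DTM(x) = ((1/n) Σ d(x, x_k)^r)^(1/r) over the n nearest neighbors; it is an illustration of the idea, not the library's own implementation, and details such as whether the point itself is counted may differ.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def empirical_dtm(X, n_neighbors=3, r=2):
    """Sketch of empirical distance-to-measure (DTM) vertex weights.

    For each point x, DTM(x) = ((1/n) * sum of d(x, x_k)**r over the
    n nearest neighbors x_k)**(1/r).  Illustration only; conventions
    (e.g. inclusion of the point itself) may differ from the library's.
    """
    D = squareform(pdist(X))                       # dense distance matrix
    # Sort each row; column 0 is the zero distance to the point itself,
    # so the n nearest *other* points sit in columns 1..n_neighbors.
    nearest = np.sort(D, axis=1)[:, 1:n_neighbors + 1]
    return np.mean(nearest ** r, axis=1) ** (1.0 / r)

rng = np.random.default_rng(0)
X = rng.random((50, 3))
w = empirical_dtm(X)                               # one weight per point
```

Such a weight vector could then be supplied to the weighted-filtration machinery via the weights parameter described in the API documentation.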

Bug Fixes

  • Bars in the barcode with death at numpy.inf are now explicitly treated as essential bars instead of finite bars (#53).

Backwards-Incompatible Changes

  • Python 3.6 is no longer supported, and the manylinux standard has been bumped from manylinux2010 to manylinux2014 (#62).

Thanks to our Contributors

This release contains contributions from many people:

Umberto Lupo and Julian Burella Pérez.

We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.

v0.2.1

2 years ago

Minor release bringing bug fixes and performance improvements.

Major Features and Improvements

  • Miscellaneous improvements including verification that coeff is a prime, improvements to the Python bindings (see "Backwards-Incompatible Changes" below), updated C++ preprocessor directives and code style fixes (#50).
  • An unnecessary binary search for 0-dimensional simplices is now avoided in the C++ backend, leading to faster runtimes (#51).
  • scikit-learn's NearestNeighbors is now used for computing sparse thresholded distance matrices, leading to large benefits on memory consumption and runtime in many cases of interest (#54).
  • General improvements to the documentation (#54 and #58).
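The memory benefit of sparse thresholded distance matrices can be sketched with scipy alone: only pairwise distances at or below the threshold are stored. This illustrates the data structure only; giotto-ph itself now delegates the construction to scikit-learn's NearestNeighbors rather than taking a dense detour as below.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.spatial.distance import pdist, squareform

def thresholded_sparse_dm(X, thresh):
    """Keep only pairwise distances <= thresh in a sparse COO matrix.

    Illustration of the thresholded-sparse-matrix idea; the library
    builds this via scikit-learn's NearestNeighbors instead of
    materializing the full dense matrix first.
    """
    D = squareform(pdist(X))
    rows, cols = np.nonzero((D <= thresh) & (D > 0))
    keep = rows < cols                       # store each edge only once
    rows, cols = rows[keep], cols[keep]
    return coo_matrix((D[rows, cols], (rows, cols)), shape=D.shape)

rng = np.random.default_rng(1)
X = rng.random((30, 2))
sparse_dm = thresholded_sparse_dm(X, thresh=0.2)
```

A matrix like this can be passed to ripser_parallel with metric='precomputed', with absent entries treated as edges above the threshold.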

Bug Fixes

  • A bug in computing sparse thresholded distance matrices has been fixed (#54).

Backwards-Incompatible Changes

  • Barcodes are now returned as (lists of) arrays of dtype numpy.float32 instead of numpy.float64, since single-precision floats are used internally by the C++ backend (#50).

Thanks to our Contributors

This release contains contributions from many people:

Julian Burella Pérez and Umberto Lupo.

We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.

v0.2.0

2 years ago

Major Features and Improvements

  • ripser_parallel can now return "flag persistence generators", i.e. vertices and edges creating/destroying bars in the barcode (#29). A new Jupyter notebook illustrates usage of this new feature.
  • Wheels for Python 3.10 have been added (#47).
  • The computation of the enclosing radius has been sped up (#28).
  • ripser_parallel can now be imported directly from gph (#36).
  • A garbage collection step which was found to negatively impact runtimes with little memory benefit has been removed (#33).
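The enclosing radius mentioned above is the smallest radius at which some point lies within that distance of every other point; beyond it the complex is a cone, so no further bars can appear, making it a safe automatic filtration cutoff. A minimal numpy sketch of the quantity itself:

```python
import numpy as np

def enclosing_radius(D):
    """Enclosing radius of a dense distance matrix D:
    the minimum over points i of the maximum distance from i to any
    other point.  Filtration values above this threshold cannot
    change the barcode, so it serves as a safe automatic cutoff."""
    return np.min(np.max(D, axis=1))

# Three collinear points at 0, 1 and 2: the middle point reaches every
# other point within distance 1, so the enclosing radius is 1.
pts = np.array([0.0, 1.0, 2.0])
D = np.abs(pts[:, None] - pts[None, :])
assert enclosing_radius(D) == 1.0
```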

Bug Fixes

  • Sparse input with (signed) 64-bit row and column indices is now correctly dealt with by the Python interface (#34).
  • A bug causing segfaults to occur when maxdim=0 was passed to the C++ backend has been fixed (#40).
  • An algorithmic error in dealing with edges with zero weight in the 0-dimensional computation has been fixed (#43).

Backwards-Incompatible Changes

None.

Thanks to our Contributors

This release contains contributions from many people:

Umberto Lupo and Julian Burella Pérez.

We are also grateful to all who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.

v0.1.0

2 years ago

Major Features and Improvements

Introduction

We introduce giotto-ph, a high-performance, open-source software package for the computation of Vietoris-Rips barcodes. giotto-ph is based on Morozov and Nigmetov's lock-free (multicore) implementation of Ulrich Bauer's Ripser package. It also contains a re-implementation of Boissonnat and Pritam's "Edge Collapser", implemented so far only in the GUDHI library. Our contribution is twofold: on the one hand, we integrate existing state-of-the-art ideas coherently in a single library and provide Python bindings to the C++ code. On the other hand, we increase parallelization opportunities and improve overall performance by adopting higher-performance data structures. The final implementation of our persistent homology backend establishes a new state of the art, surpassing even GPU-accelerated implementations such as Ripser++ when using as few as 5-10 CPU cores. Furthermore, our implementation of the edge collapser algorithm has reduced dependencies and significantly improved runtimes.

Python API

There is one unique main function that can be called: ripser_parallel

To call the main function in Python:

from gph.python import ripser_parallel
import numpy as np
# generate your data
data = np.random.rand(100, 3)
# compute the persistence diagram
dgm = ripser_parallel(data)

Here is a description of the parameters of this function:

Compute persistence diagrams for X data array using a high performance, parallel version of Ripser. If X is a point cloud, it will be converted to a distance matrix using the chosen metric.

  • X ndarray of shape (n_samples, n_features) A numpy array containing either data points or a distance matrix. Can also be a sparse distance matrix of type scipy.sparse.

  • maxdim int, optional, default: 1 Maximum homology dimension computed. Will compute all dimensions lower than or equal to this value. For 1, both H_0 and H_1 will be computed.

  • thresh float, optional, default: numpy.inf Maximum distances considered when constructing filtration. If numpy.inf, compute the entire filtration.

  • coeff int prime, optional, default: 2 Compute homology with coefficients in the prime field Z/pZ for p=coeff.

  • metric string or callable, optional, default: 'euclidean' The metric to use when calculating distance between instances in a feature array. If set to 'precomputed', input data is interpreted as a distance matrix or as the adjacency matrix of a weighted undirected graph. If a string, it must be one of the options allowed by scipy.spatial.distance.pdist() for its metric parameter, or a metric listed in sklearn.pairwise.PAIRWISE_DISTANCE_FUNCTIONS, including 'euclidean', 'manhattan' or 'cosine'. If a callable, it should take pairs of vectors (1D arrays) as input and, for each two vectors in a pair, it should return a scalar indicating the distance/dissimilarity between them.

  • metric_params dict, optional, default: {} Additional parameters to be passed to the distance function.

  • weights "DTM", ndarray or None, optional, default: None If not None, the persistence of a weighted Vietoris-Rips filtration is computed as described in 3, and this parameter determines the vertex weights in the modified adjacency matrix. "DTM" denotes the empirical distance-to-measure function.

  • weight_params dict or None, optional, default: None Parameters to be used in the case of weighted filtrations, see weights. In this case, the key "p" determines the power to be used in computing edge weights from vertex weights. It can be one of 1, 2 or np.inf and defaults to 1. If weights is "DTM", the additional keys "r" (default: 2) and "n_neighbors" (default: 3) are available (see weights, where the latter corresponds to n).

  • collapse_edges bool, optional, default: False Whether to use the edge collapse algorithm as described in 2 prior to calling ripser_parallel.

  • n_threads int, optional, default: 1 Maximum number of threads to use during the persistent homology computation. When passing -1, the maximum number of threads available on the host machine is used.
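As a usage note for metric='precomputed', a suitable input can be built with scipy and then passed in place of the point cloud. The sketch below only constructs and sanity-checks the distance matrix; per the parameter description above, ripser_parallel(dm, metric='precomputed') would then give the same barcode as ripser_parallel(data).

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Point cloud -> dense distance matrix, suitable for metric='precomputed'.
rng = np.random.default_rng(2)
data = rng.random((100, 3))
dm = squareform(pdist(data, metric="euclidean"))

# A precomputed distance matrix should be square and symmetric with a
# zero diagonal; ripser_parallel(dm, metric='precomputed') accepts it
# in place of the raw point cloud.
```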

For more details about performance and benchmarks, see the project documentation.

Bug Fixes

None.

Backwards-Incompatible Changes

None.

Thanks to our Contributors

This release contains contributions from many people:

Julián Burella Pérez, Sydney Hauke, Umberto Lupo, Matteo Caorsi.

We are also grateful to all of those who filed issues or helped resolve them, asked and answered questions, and were part of inspiring discussions.