Metric learning algorithms in Python
score_pairs refactor by @mvargas33 in https://github.com/scikit-learn-contrib/metric-learn/pull/333
Full Changelog: https://github.com/scikit-learn-contrib/metric-learn/compare/v0.6.2...v0.7.0
This release uniformizes the version numbers, which were mistaken in the previous release.
This release explicitly requires python>=3.6 and scikit-learn>=0.20.3 for installation.
This release features various fixes and improvements, as well as a new triplet-based algorithm, SCML (see http://researchers.lille.inria.fr/abellet/papers/aaai14.pdf), and an associated Triplets API. Triplet-based metric learning algorithms are used in settings where we have an "anchor" sample that we want to be closer to a "positive" sample than to a "negative" sample. Consistent with related packages like scikit-learn, we have also dropped support for Python 2 and Python 3.5.
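The triplet setting described above can be sketched with plain NumPy (a conceptual sketch with illustrative names, not the metric-learn API itself): a triplet (anchor, positive, negative) is satisfied when the anchor is closer to the positive sample than to the negative one.

```python
import numpy as np

# Illustrative sketch of the triplet constraint: for each triplet
# (anchor, positive, negative), we want
#   d(anchor, positive) < d(anchor, negative).
# Here d is the plain Euclidean distance; a triplet learner such as
# SCML instead learns a metric under which these constraints hold.
triplets = np.array([
    [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0]],   # anchor close to positive
    [[1.0, 1.0], [3.0, 1.0], [1.2, 1.0]],   # anchor closer to negative
])

anchors, positives, negatives = triplets[:, 0], triplets[:, 1], triplets[:, 2]
d_pos = np.linalg.norm(anchors - positives, axis=1)
d_neg = np.linalg.norm(anchors - negatives, axis=1)
satisfied = d_pos < d_neg
print(satisfied)  # [ True False]
```

A fitted triplet learner would judge such triplets under its learned metric rather than the Euclidean one used here.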
New algorithms
General updates on the package
Improvements to existing algorithms
This is a major release in which the API (in particular for weakly-supervised algorithms) was largely refurbished in order to make it more unified and largely compatible with scikit-learn. Note that for this reason, you might encounter a significant amount of DeprecationWarning and ChangedBehaviourWarning. These warnings will disappear in version 0.6.0. The changes are summarized below:
All algorithms:

num_dims has been renamed to n_components for algorithms that have such a parameter (#193).
The metric() method has been renamed to get_mahalanobis_matrix (#152).
score_pairs can be used to score a bunch of pairs of points (returning the distance between them), and get_metric to get a metric function that can be plugged into scikit-learn estimators like any scipy distance.

Weakly supervised algorithms:
To fit weakly supervised algorithms, users now have to provide 3d arrays of tuples (and possibly an array of labels y). For pairs learners, instead of X and [a, b, c, d] as before, one should provide an array pairs such that pairs[i] = X[a[k], b[k]] if y[i] == 1 or X[c[k], d[k]] if y[i] != 1, where k is some integer (you can obtain such a representation by stacking a and b horizontally, then c and d, stacking these vertically, and taking X[this array of indices]). For quadruplets learners, the input has the same form, except that there is no need for y, and the 3d array is an array of 4-tuples instead of 2-tuples. The first two elements of each quadruplet are the ones that we want to be more similar to each other than the last two.

It is now possible to predict on a given pair or quadruplet, i.e. to predict whether the pair is similar or not, or, in the case of quadruplets, whether a given new quadruplet is in the right ordering or not. The threshold used by pairs learners can be set manually with set_threshold and calibrated on some data with calibrate_threshold.

For pairs, a default score is defined, which is the AUC (Area Under the ROC Curve). For quadruplets, the default score is the accuracy (the proportion of quadruplets given in the right order).

Supervised algorithms:
The num_labeled parameter has been deprecated (#119).
use_pca has been deprecated in LMNN (#231).

Improved documentation:
Bug fixes:
When a value is provided for num_dims (renamed to n_components, see above), it is now checked to be between 1 and n_features, with n_features the number of dimensions of the input space.
Covariance now uses the pseudo-inverse instead of the plain inverse, which allows Covariance to work even when the covariance matrix is not invertible (e.g. if the data lies in a space of smaller dimension) (#206).
A bug where L was missing has been fixed in #201.
Constraints are now managed with a unified interface (metric_learn.Constraints), which makes it easy to generate various input formats from (possibly) partial label information.
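The idea of generating constraints from (possibly) partial label information can be sketched in plain NumPy (a conceptual sketch with illustrative names, not the metric_learn.Constraints API): points whose label is unknown are marked with -1 and skipped, and same-label / different-label index pairs are drawn from the rest.

```python
import numpy as np

def pairs_from_partial_labels(partial_labels):
    """Conceptual sketch (not the metric_learn.Constraints API):
    build all positive (same-label) and negative (different-label)
    index pairs, ignoring points whose label is unknown (-1)."""
    labels = np.asarray(partial_labels)
    known = np.flatnonzero(labels != -1)
    positive, negative = [], []
    for pos_idx, i in enumerate(known):
        for j in known[pos_idx + 1:]:
            (positive if labels[i] == labels[j] else negative).append((i, j))
    return positive, negative

# -1 marks an unknown label; only indices 0, 1, 3 and 4 are used.
pos, neg = pairs_from_partial_labels([0, 0, -1, 1, 0])
print(pos)  # [(0, 1), (0, 4), (1, 4)]
print(neg)  # [(0, 3), (1, 3), (3, 4)]
```

A real constraints generator would typically sample a requested number of pairs rather than enumerating all of them.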
All classes inheriting from BaseMetricLearner now support sklearn-style get_params and set_params.
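The sklearn-style get_params / set_params convention mentioned above can be sketched as follows (a minimal illustrative class, not metric-learn's actual BaseMetricLearner):

```python
# Minimal sketch of the sklearn get_params / set_params convention
# (illustrative class, not metric-learn's BaseMetricLearner).
class ToyMetricLearner:
    def __init__(self, n_components=2, max_iter=100):
        self.n_components = n_components
        self.max_iter = max_iter

    def get_params(self, deep=True):
        # Return constructor parameters as a dict, as sklearn expects.
        return {"n_components": self.n_components, "max_iter": self.max_iter}

    def set_params(self, **params):
        # Set known parameters and return self; this is what lets
        # scikit-learn utilities (grid search, cloning) drive an estimator.
        for name, value in params.items():
            if name not in self.get_params():
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

learner = ToyMetricLearner().set_params(n_components=3)
print(learner.get_params())  # {'n_components': 3, 'max_iter': 100}
```

Supporting this pair of methods is what makes estimators usable inside scikit-learn pipelines and model-selection tools.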
We now support Python 3 alongside Python 2 in the same codebase.
This minor release adds two new methods:
The performance of the non-Shogun LMNN implementation has also been improved, and it should now consume less memory.
This release also includes the new Sphinx documentation and improved docstrings for many of the classes and methods.