A powerful and modular toolkit for record linkage and duplicate detection in Python
(recordlinkage.Pairs(...)) with all the functionality in it. The new API
is based on TensorFlow and FEBRL. With the new structure, it is easier to
parallelise the record linkage process. In future releases, this will be
implemented natively. See the reference page for more information and details on migrating: <http://recordlinkage.readthedocs.io/en/latest/ref-index.html>
_binary_comparisons is renamed. The new name of the function is binary_vectors. Documentation added to RTD.
autodoc_mock_imports in the sphinx conf.py file. Maybe a bug in sphinx.
missing_values is used to fill missing values. Default: nothing is done.
The argument shuffle is used to shuffle the records. Default is True.
Compare class and its algorithms. Making use of nose-parameterized module.
max_number_of_pairs to get the maximum number of pairs.
low_memory for compare class.
binary_comparisons in the recordlinkage.datasets.random module.
tox.ini to test packaging and installation of package.
Compare module. Especially label handling is improved.
This version includes the following updates:
__sub__ is no longer used for computing the difference of Index objects. It is now replaced by ``INDEX.difference(OTHER_INDEX)``.
clean function.
This version contains a lot of changes to the API. Hopefully, there are no large API changes needed for now.
numerical is now named numeric and fuzzy is now named string.
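The change from ``__sub__`` to ``INDEX.difference(OTHER_INDEX)`` mentioned above can be illustrated with a minimal sketch. It assumes that the Index objects in question are pandas ``MultiIndex`` objects holding record pairs; the variable names and the sample pairs are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical candidate record pairs (assumption: pairs are stored
# in a pandas MultiIndex, one level per data set).
all_pairs = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1)], names=["rec_a", "rec_b"]
)

# Hypothetical subset of pairs that were classified as matches.
matched_pairs = pd.MultiIndex.from_tuples(
    [(0, 0), (1, 1)], names=["rec_a", "rec_b"]
)

# Set difference via the explicit method instead of the minus operator.
remaining_pairs = all_pairs.difference(matched_pairs)

print(list(remaining_pairs))
```

The explicit method call makes clear that a set operation on the pairs is meant, not an element-wise subtraction of index values.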