Recordlinkage Versions

A powerful and modular toolkit for record linkage and duplicate detection in Python

v0.16

9 months ago

A new release of recordlinkage after a long time (too long, I'm sorry). This release bumps the minor version to 0.16. This version supports pandas 2 and pandas 1. It doesn't contain any structural changes or improvements to the API.

What's Changed

Fix typo by @havardox in https://github.com/J535D165/recordlinkage/pull/184
Fix usage examples by @martinhohoff in https://github.com/J535D165/recordlinkage/pull/190
Fix links by @andyjessen in https://github.com/J535D165/recordlinkage/pull/186
add threshold None and label docstrings for String by @davidggphy in https://github.com/J535D165/recordlinkage/pull/189
Add support for pandas==2 by @J535D165 in https://github.com/J535D165/recordlinkage/pull/192
Replace setup.py by pyproject.toml by @J535D165 in https://github.com/J535D165/recordlinkage/pull/195
Lint with Ruff and format with Black by @J535D165 in https://github.com/J535D165/recordlinkage/pull/196
Update CI docs generation and CI pipeline by @J535D165 in https://github.com/J535D165/recordlinkage/pull/197
Update the docs CI pipeline by @J535D165 in https://github.com/J535D165/recordlinkage/pull/198
Add pre-commit hooks by @J535D165 in https://github.com/J535D165/recordlinkage/pull/199

New Contributors

@havardox made their first contribution in https://github.com/J535D165/recordlinkage/pull/184
@martinhohoff made their first contribution in https://github.com/J535D165/recordlinkage/pull/190
@andyjessen made their first contribution in https://github.com/J535D165/recordlinkage/pull/186
@davidggphy made their first contribution in https://github.com/J535D165/recordlinkage/pull/189

Full Changelog: https://github.com/J535D165/recordlinkage/compare/v0.15...v0.16

v0.15

2 years ago

Remove deprecated recordlinkage classes (#173)
Bump min Python version to 3.6, ideally 3.8+ (#171)
Bump min pandas version to >=1
Resolve deprecation warnings for numpy and pandas
Happy lint, sort imports, format code with yapf
Remove unnecessary np.sort in SNI algorithm (#141)
Fix bug for cosine and qgram string comparisons with threshold (#135)
Fix several typos in docs (#151)(#152)(#153)(#154)(#163)(#164)
Fix random indexer (#158)
Fix various deprecation warnings and broken docs build (#170)
Fix broken docs build due to pandas depr warnings (#169)
Fix broken build and removed warning messages (#168)
Update narrative
Replace Travis by GitHub Actions (#132)
Fix broken test NotFittedError
Fix bug in low memory random sampling and add more tests (#130)
Add extras_require to setup.py for deps management
Add banner to README and update title
Add Binder and Colab buttons at tutorials (#174)

Special thanks to Tomasz Waleń @twalen and other contributors for their work on this release.

v0.14

4 years ago

Drop Python 2.7 and Python 3.4 support. (#91)
Upgrade minimum pandas version to 0.23.
Simplify the use of all cpus in parallel mode. (#102)
Store large example datasets in user home folder or use environment variable. Before, example datasets were stored in the package. (see issue #42) (#92)
Add support to write and read annotation files for recordlinkage ANNOTATOR. See the docs and https://github.com/J535D165/recordlinkage-annotator for more information.
Replace .labels by .codes for pandas.MultiIndex objects for newer versions of pandas (>0.24). (#103)
Fix totals for pandas.MultiIndex input on confusion matrix and accuracy metrics. (see issue #84) (#109)
Initialize Compare with (a list of) features (Bug). (#124)
Various updates in relation to deprecation warnings in third-party libraries such as sklearn, pandas and networkx.

v0.13.2

5 years ago

Fix distribution problem.

v0.13

5 years ago

v0.11.2

6 years ago

Minor installation improvement. Exclude unwanted files.

v0.11.1

6 years ago

Fix installation issue. Submodule 'preprocessing' was not added to the source distribution.

v0.11.0

6 years ago

The submodule 'standardise' is renamed to 'preprocessing'. The old name 'standardise' will be deprecated in a future version.
Deprecation errors were not visible to many users. In this version, they are more visible.
Improved and new logs for indexing, comparing and classification.
Faster comparing of string variables. Thanks Joel Becker.
Changes make it possible to pickle Compare and Index objects. This makes it easier to run code in parallel. Tests were added to ensure that pickling remains possible.
Important change. MultiIndex objects with many record pairs were split into pieces to lower memory usage. In this version, this automatic splitting is removed. Please split the data yourself.
Integer indexing. Blog post will follow on this.
The metrics submodule has changed heavily. This breaks compatibility with the previous version.
repr() and str() now return informative descriptions of index and compare objects.
It is possible to use abbreviations for string similarity methods. For example 'jw' for the Jaro-Winkler method.
The FEBRL dataset loaders can now return the true links as a pandas.MultiIndex for each FEBRL dataset. This option is disabled by default. See the FEBRL datasets for details.
Fix issue with automatic recognition of license on GitHub.
Various small improvements.
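
Because automatic splitting of large MultiIndex objects is removed in this version, the candidate pairs have to be split manually. A minimal sketch using plain pandas slicing; the helper `split_pairs` and the chunk size are hypothetical and not part of recordlinkage:

```python
import pandas as pd


def split_pairs(pairs, chunk_size):
    """Yield consecutive slices of a pandas.MultiIndex of candidate pairs.

    Hypothetical helper for illustration; not part of recordlinkage.
    """
    for start in range(0, len(pairs), chunk_size):
        yield pairs[start:start + chunk_size]


# Toy candidate pairs: every combination of 4 left and 3 right records.
pairs = pd.MultiIndex.from_product(
    [range(4), range(3)], names=["left", "right"]
)

# Split the 12 pairs into pieces of at most 5 pairs each.
chunks = list(split_pairs(pairs, chunk_size=5))
```

Each chunk is itself a pandas.MultiIndex, so it can be compared separately and the results concatenated afterwards, which keeps peak memory usage lower.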

Note: In the next release, the Pairs class will get removed. Migrate now.

v0.10.1

6 years ago

Remove a print statement from the geo compare algorithm.
String, numeric and geo compare functions now raise directly when an incorrect algorithm name is passed.
Fix unit test that failed on Python 2.7.

v0.10.0

6 years ago

A new compare API. The new Compare class no longer takes the datasets and pairs as arguments. The actual computation is now performed when calling .compute(PAIRS, DF1, DF2). The documentation is updated as well, but still needs improvement.
Two new string similarity measures are added: Smith-Waterman (smith_waterman) and Longest Common Substring (lcs). Thanks to Joel Becker and Jillian Anderson from the Networks Lab of the University of Waterloo.
Added and/or updated a large number of unit tests.
Various small improvements.