Recordlinkage Versions

A powerful and modular toolkit for record linkage and duplicate detection in Python

v0.2

8 years ago
  • Full Python 3 support.
  • The parameters of the Logistic Regression classifier can now be updated manually. In the literature, this is often referred to as deterministic record linkage.
  • Expectation/Conditional Maximisation (ECM) algorithm completely rewritten, with much better performance. The algorithm is still experimental.
  • New string comparison metrics: Q-gram string comparison and cosine string comparison.
  • New indexing algorithm: Q-gram indexing.
  • Several internal tests.
  • Updated documentation.
  • BernoulliNBClassifier is now named NaiveBayesClassifier. No changes to the algorithm.
  • Argument order in compare functions corrected.
  • New function to clean phone numbers.
  • The result of the classifier can be returned as an index, a NumPy array or a pandas Series.
  • Many bug fixes.
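The Q-gram comparison, cosine comparison and Q-gram indexing entries only name the new features. As a rough illustration of the underlying ideas, here is a minimal, self-contained sketch in plain Python. It is not the recordlinkage implementation or API: the helper names (`qgrams`, `qgram_similarity`, `cosine_similarity`, `qgram_index`), the padding character `#`, the choice of the Dice coefficient for Q-gram similarity and the "share at least one Q-gram" blocking rule are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def qgrams(s: str, q: int = 2, pad: str = "#") -> Counter:
    """Multiset of q-grams of s, padded so the first/last characters form grams too."""
    padded = pad * (q - 1) + s + pad * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over q-gram multisets: 2|A & B| / (|A| + |B|)."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0  # no q-grams to compare (only possible for q=1 and empty strings)
    common = sum((ga & gb).values())  # Counter '&' keeps the minimum count per gram
    return 2 * common / total


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine of the angle between the q-gram count vectors of a and b."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    dot = sum(ga[g] * gb[g] for g in ga.keys() & gb.keys())
    norm = math.sqrt(sum(v * v for v in ga.values())) * math.sqrt(sum(v * v for v in gb.values()))
    return dot / norm if norm else 0.0


def qgram_index(left, right, q: int = 2) -> set:
    """Simplified q-gram blocking: candidate pairs (i, j) whose values share a q-gram."""
    inverted = defaultdict(set)  # q-gram -> positions in `right` containing it
    for j, value in enumerate(right):
        for g in qgrams(value, q):
            inverted[g].add(j)
    pairs = set()
    for i, value in enumerate(left):
        for g in qgrams(value, q):
            for j in inverted.get(g, ()):
                pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    print(qgram_similarity("jones", "johnson"))
    print(cosine_similarity("jones", "johnson"))
    print(sorted(qgram_index(["smith", "jones"], ["smyth", "johns"])))
```

Both metrics return 1.0 for identical strings and 0.0 when no Q-grams are shared. Blocking on shared Q-grams keeps spelling variants such as "smith"/"smyth" together while avoiding the full cross product; published Q-gram indexing schemes are usually stricter (for example, requiring a minimum fraction of shared Q-grams).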

v0.1.2

8 years ago

This version adds or changes the following:

  • Arguments in compare functions renamed.
  • Removed exact comparison of DataFrames and added efficiency improvements for exact comparison.
  • Updated documentation about comparing, classification and evaluation.

v0.1.1

8 years ago

This update includes:

  • Updated documentation about indexing, comparing and classification
  • Improved performance for some indexing methods
  • Random indexing now returns the exact number of record pairs
  • Arguments renamed in compare functions

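The v0.1.1 note says random indexing now returns an exact number of record pairs. One common way to guarantee that is to sample distinct pair identifiers without replacement and decode each into a (left, right) position. The sketch below shows that idea only; `random_pairs` and its parameters are hypothetical and are not claimed to be how recordlinkage implements random indexing.

```python
import random


def random_pairs(n_left: int, n_right: int, n: int, seed=None) -> list:
    """Sample exactly n distinct (i, j) pairs from the n_left x n_right grid.

    Hypothetical helper: each pair is encoded as k = i * n_right + j, so sampling
    n distinct k values without replacement yields n distinct pairs.
    """
    total = n_left * n_right
    if n > total:
        raise ValueError(f"cannot draw {n} distinct pairs from {total} possible pairs")
    rng = random.Random(seed)
    # random.sample on a range does not materialise the full grid of pairs
    return [divmod(k, n_right) for k in rng.sample(range(total), n)]


if __name__ == "__main__":
    print(random_pairs(1000, 500, 5, seed=42))
```

Because sampling is done without replacement, no duplicates have to be filtered out afterwards, so the result always has exactly `n` pairs (or an error if `n` exceeds the number of possible pairs).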
v0.1.0

8 years ago

This is the first major release of the recordlinkage package. See the documentation for information about the available functions. The framework still needs to be extended with more functions, but it provides a stable, easily extensible foundation for doing so. More information on how to extend it is coming.