refinr Versions

Cluster and merge similar string values: an R implementation of OpenRefine clustering algorithms

0.3.3

2 months ago

UPDATES

  • Docs patch, at the request of CRAN (#15)
  • Removed the requirement on C++11.

0.3.2

2 months ago

BUG FIXES

  • Patched a bug in an if statement in which an array of length > 1 could be passed in for logical evaluation.

0.3.1

5 years ago

IMPROVEMENTS

  • The package now links to the stringdist C API and calls its C functions in place of stringdist::stringdistmatrix(). This speeds up n_gram_merge() and requires stringdist v0.9.5.1 or later to be installed.

0.3.0

6 years ago

NEW FEATURES

  • Rewrote some of the C++ functions to use std::unordered_map, resulting in a substantial speedup when passing large character vectors (length 100,000+) to either of the exported functions (#8).
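Grouping strings by a hash key is what lets this kind of clustering scale: each string lands in its cluster in a single pass instead of being compared with every other string. The Python sketch below illustrates the idea with a dict standing in for std::unordered_map. It is not refinr's C++ code; the helper merge_by_key and the str.lower key in the usage line are hypothetical. It also picks the most frequent original string in each cluster as the merged value.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List


def merge_by_key(values: List[str], key: Callable[[str], str]) -> List[str]:
    """Replace each string with the most frequent member of its key cluster.

    Hypothetical helper for illustration only: a dict (hash map) groups
    strings by key in one pass, so no pairwise comparisons are needed.
    """
    # First pass: key -> counts of the original strings sharing that key.
    clusters: Dict[str, Counter] = defaultdict(Counter)
    for v in values:
        clusters[key(v)][v] += 1

    # Most frequent original string per cluster; on ties, Counter keeps
    # the string that was encountered first.
    winners = {k: c.most_common(1)[0][0] for k, c in clusters.items()}

    # Second pass: map every input string to its cluster's winner.
    return [winners[key(v)] for v in values]


if __name__ == "__main__":
    data = ["Acme Inc", "acme inc", "Acme Inc", "Globex", "GLOBEX"]
    print(merge_by_key(data, key=str.lower))
    # ['Acme Inc', 'Acme Inc', 'Acme Inc', 'Globex', 'Globex']
```

Both passes are linear in the number of input strings (plus the cost of computing each key), which is why a hash map helps most on very long vectors.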

PKG API CHANGES

  • In n_gram_merge(), renamed the argument edit_dist_weights to weight. This argument is only passed along to stringdistmatrix() from the stringdist package, which calls the same parameter weight; the rename simply matches that name.
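For stringdist's default optimal string alignment ("osa") method, the weight vector gives the penalties for deletion, insertion, substitution and transposition, in that order. The Python sketch below shows what such a weighted distance computes, assuming the osa method is in use. It illustrates the technique and is not stringdist's C implementation; the function name weighted_osa and the sample weights are hypothetical.

```python
from typing import Tuple


def weighted_osa(a: str, b: str,
                 weights: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Optimal string alignment distance with per-operation costs.

    weights = (d, i, s, t): deletion, insertion, substitution, transposition.
    Hypothetical helper for illustration only.
    """
    d_cost, i_cost, s_cost, t_cost = weights
    n, m = len(a), len(b)
    # dist[x][y] = cheapest cost of turning a[:x] into b[:y]
    dist = [[0.0] * (m + 1) for _ in range(n + 1)]
    for x in range(1, n + 1):
        dist[x][0] = x * d_cost  # delete every character of a[:x]
    for y in range(1, m + 1):
        dist[0][y] = y * i_cost  # insert every character of b[:y]
    for x in range(1, n + 1):
        for y in range(1, m + 1):
            sub = 0.0 if a[x - 1] == b[y - 1] else s_cost
            best = min(dist[x - 1][y] + d_cost,   # delete a[x-1]
                       dist[x][y - 1] + i_cost,   # insert b[y-1]
                       dist[x - 1][y - 1] + sub)  # match or substitute
            # Swap of two adjacent characters (OSA: no further edits on them).
            if x > 1 and y > 1 and a[x - 1] == b[y - 2] and a[x - 2] == b[y - 1]:
                best = min(best, dist[x - 2][y - 2] + t_cost)
            dist[x][y] = best
    return dist[n][m]


if __name__ == "__main__":
    # Hypothetical weights: cheap deletions/insertions, half-cost transpositions.
    w = (0.33, 0.33, 1.0, 0.5)
    print(weighted_osa("acme inc", "acme in", w))   # 0.33 (one deletion)
    print(weighted_osa("acme inc", "acme icn", w))  # 0.5 (one transposition)
```

Lowering the deletion and insertion costs relative to substitution makes strings that differ only by dropped or added characters count as closer, which changes which values fall under an edit-distance threshold.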

BUG FIXES

  • Fixed an issue in which input strings containing accent marks were not handled or clustered properly (#9). The fix adds stringi to Imports and uses stringi::stri_trans_general().

  • Fixed an issue in n_gram_merge() in which incorrect values were returned when the argument ignore_strings was not NULL and bus_suffix = FALSE (#7).

  • Fixed an issue in which input strings containing punctuation that was NOT surrounded by spaces returned incorrect values (#6).

  • Fixed an issue in which the edit value assigned to a cluster was sometimes not the most frequent string in that cluster (#5).
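The accent and punctuation fixes both concern how a string is normalized into a key before clustering. The Python sketch below computes a key-collision fingerprint following the steps OpenRefine documents for its fingerprint method: strip accents, lowercase, remove punctuation, split into tokens, then sort and de-duplicate them. Python's unicodedata NFKD decomposition stands in for stringi::stri_trans_general() here and only approximates a Latin-to-ASCII transliteration (a character such as ø has no decomposition and passes through unchanged). The name fingerprint and the exact steps are assumptions, not refinr's code.

```python
import re
import unicodedata


def fingerprint(s: str) -> str:
    """Key-collision fingerprint sketch (hypothetical helper).

    Strings that share a fingerprint fall into the same cluster.
    """
    # Decompose accented characters (e.g. "é" -> "e" + combining acute)
    # and drop the combining marks, leaving the base letters.
    decomposed = unicodedata.normalize("NFKD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then delete punctuation wherever it occurs, including
    # punctuation attached directly to a word, as in "Inc.".
    cleaned = re.sub(r"[^\w\s]", "", stripped.lower())
    # Split on whitespace, de-duplicate, sort, and rejoin the tokens.
    return " ".join(sorted(set(cleaned.split())))


if __name__ == "__main__":
    for name in ["Café Olé, Inc.", "cafe ole inc", "Inc. Café Olé"]:
        print(repr(fingerprint(name)))  # each line prints 'cafe inc ole'
```

Because punctuation is deleted rather than replaced with a space, "a,b" and "a, b" still produce different keys in this sketch; whether to delete or replace is a design choice that affects how such strings cluster.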

0.2.0

6 years ago

0.1.0

6 years ago