Csvmatch Versions

🔎 Finds fuzzy matches between CSV files

v2.0.1

3 months ago
  • Updates Textmatch dependency, including a fix for case plus regex ignores not combining correctly

v2.0

3 months ago
  • Ignore options are now all behind a single flag, --ignore. For example, to ignore case and nonalpha characters you would previously have written --ignore-case --ignore-nonalpha, or -ia for short. Now you should write --ignore case nonalpha or -i c na. Some of the ignores have been renamed too -- see --help for details.
  • The option to ignore letter order has been removed because it wasn't useful.
  • Various deprecated aliases for ignore options have now been removed: --filter (use ignore regex), --filter-titles (use ignore titles), --as-latin (use ignore nonlatin), and --sort-words (use ignore words order).
  • The fuzzy method flag has been renamed from --fuzzy to --method. It now defaults to an exact match.
  • The short option for --threshold was previously -r, but it is now -t.
  • Input is no longer accepted from STDIN.
  • Output now defaults to including all fields from both input files, instead of only the fields used for the match.
  • Added option to ignore leading and tailing words.
  • Added support for blocking.
  • Many operations will be faster.
  • Improved remaining time display.
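
As a rough illustration of the new --ignore syntax, the sketch below assembles a 2.0-style command line in Python. The helper name build_csvmatch_args and the file names are hypothetical, and passing the two CSV paths as positional arguments is an assumption rather than something stated in these notes; only --ignore and -t are taken from the notes above.

```python
from typing import List, Optional, Sequence


def build_csvmatch_args(
    file1: str,
    file2: str,
    ignores: Sequence[str] = (),
    threshold: Optional[float] = None,
) -> List[str]:
    """Build an argument list in the 2.0 style (hypothetical helper)."""
    # Assumption: the two input files are given as positional arguments.
    args = ["csvmatch", file1, file2]
    if ignores:
        # 2.0 style: one --ignore flag followed by every ignore name,
        # instead of separate --ignore-case, --ignore-nonalpha flags.
        args += ["--ignore", *ignores]
    if threshold is not None:
        # The short option for --threshold is -t as of 2.0 (previously -r).
        args += ["-t", str(threshold)]
    return args


if __name__ == "__main__":
    print(build_csvmatch_args("data1.csv", "data2.csv", ["case", "nonalpha"]))
```

The resulting list could be handed to subprocess.run, assuming csvmatch is installed and on the PATH.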

v1.24

1 year ago
  • Fixes a bug where the wrong version number was reported

v1.23

1 year ago

v1.22

2 years ago
  • Adds some extra titles
  • Metaphone now works word by word (splitting on spaces) rather than trying to encode the whole string at once
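
The by-word behaviour can be sketched roughly as follows. This is not csvmatch's code: toy_encode is a deliberately simplistic stand-in for a real Metaphone encoder, and encode_by_word is a hypothetical name used only for illustration.

```python
from typing import Callable


def toy_encode(word: str) -> str:
    """Stand-in phonetic encoder (NOT real Metaphone): uppercase the word,
    keep the first letter, and drop vowels from the rest."""
    upper = word.upper()
    return upper[:1] + "".join(ch for ch in upper[1:] if ch not in "AEIOU")


def encode_by_word(text: str, encode: Callable[[str], str]) -> str:
    """Encode each whitespace-separated word separately, then rejoin the
    codes, rather than passing the whole string to the encoder at once."""
    return " ".join(encode(word) for word in text.split())


if __name__ == "__main__":
    print(encode_by_word("john smith", toy_encode))
```

Encoding per word keeps word boundaries visible in the result, so each word's code can be compared on its own.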

v1.21

2 years ago
  • Updates dependencies

v1.20

4 years ago
  • Updates dependencies, fixing Python 3.8 compatibility
  • Other minor fixes

v1.19

4 years ago
  • Ignore parameter names are now more consistent (old names still work for now)
  • Adds a parameter to ignore the order of letters
  • Ignoring non-alphanumeric also ignores spaces now
  • Levenshtein should be faster on Unicode text now
  • Hopefully faster naive comparisons (ignores)
  • Upgraded dependencies
  • More tests
  • Removed Python 2 support!
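
A minimal sketch of what ignoring non-alphanumeric characters, spaces included, might look like. The function name is hypothetical and this is an assumption about the behaviour described above, not the tool's actual implementation.

```python
def strip_non_alphanumeric(text: str) -> str:
    """Keep only letters and digits. Spaces are not alphanumeric,
    so they are dropped along with punctuation."""
    return "".join(ch for ch in text if ch.isalnum())


if __name__ == "__main__":
    # Both spellings reduce to the same string once punctuation
    # and spaces are ignored.
    print(strip_non_alphanumeric("O'Neil & Sons"))
    print(strip_non_alphanumeric("ONeil&Sons"))
```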

v1.18

5 years ago
  • Fixes multi-field comparisons -- it's now only a match if all fields pass the threshold, rather than if any did
  • Allows multiple algorithms and multiple thresholds to be specified, which are applied respectively to the specified fields
  • Prevents error messages when piping output
  • Updated dependencies
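
The "all fields must pass" rule with per-field thresholds can be sketched as below. This is an illustration under stated assumptions: difflib.SequenceMatcher from the standard library is swapped in as the similarity measure in place of the tool's own algorithms, and the names similarity and is_match are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Mapping, Sequence


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0 and 1 (not csvmatch's algorithm)."""
    return SequenceMatcher(None, a, b).ratio()


def is_match(
    row1: Mapping[str, str],
    row2: Mapping[str, str],
    fields: Sequence[str],
    thresholds: Sequence[float],
) -> bool:
    """Two rows match only if every field meets its own threshold.
    Thresholds are paired with fields by position."""
    return all(
        similarity(row1[field], row2[field]) >= threshold
        for field, threshold in zip(fields, thresholds)
    )


if __name__ == "__main__":
    a = {"name": "Jon Smith", "city": "London"}
    b = {"name": "John Smith", "city": "Paris"}
    # The names are close but the cities are not, so this is not a match.
    print(is_match(a, b, ["name", "city"], [0.6, 0.6]))
```

Replacing all with any would reproduce the earlier behaviour, where a single passing field was enough to count as a match.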

v1.17

5 years ago
  • Updated degree help text
  • Better encoding detection
  • Better encoding errors
  • Add common installation problems to readme