Spaczz Versions

Fuzzy matching and more functionality for spaCy.

v0.6.1

3 months ago

What’s Changed

:beetle: Fixes

Updating readthedocs config (#86) @gandersen101
Partial regex matcher doesn't work if the found token has index 0 (#82) @adinowi

:rotating_light: Testing

Adding Test for Partial Regex Search at 0 Index (#85) @gandersen101
Updating Dependencies to Test Against (#83) @gandersen101

:construction_worker: Continuous Integration

Updating GH action versions (#84) @gandersen101
Updating Dependencies to Test Against (#83) @gandersen101

:books: Documentation

Updating readthedocs config (#86) @gandersen101

v0.6.0

1 year ago

All matchers now return the matching pattern. This is a breaking change: matches are now tuples of length 5 instead of 4.
Regex and token matches now return match ratios.
Support for Python >=3.7,<=3.11, along with rapidfuzz>=1.0.0.
Dropped support for spaCy v2. Sorry to do this without a deprecation cycle, but I stepped away from this project for a long time.
Removed support for "spaczz_" prepended optional SpaczzRuler init arguments. Again, sorry to do this without a deprecation cycle.
Matcher.pipe methods, which were deprecated, are now removed.
spaczz_span custom attribute, which was deprecated, is now removed.
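Because matches are now 5-tuples, code that unpacked four values needs updating. The sketch below assumes the tuple order (match_id, start, end, ratio, pattern) and uses hard-coded, hypothetical tuples in place of real matcher output so it runs without spaCy or spaczz installed. best_match_per_label is an illustrative helper, not part of spaczz.

```python
from typing import Dict, List, Tuple

# Assumed v0.6.0 match layout: (match_id, start, end, ratio, pattern).
# The pattern is shown as a plain string here for illustration only.
Match = Tuple[str, int, int, int, str]


def best_match_per_label(matches: List[Match]) -> Dict[str, Tuple[int, int, int, str]]:
    """Keep the highest-ratio match for each match_id (hypothetical helper)."""
    best: Dict[str, Tuple[int, int, int, str]] = {}
    for match_id, start, end, ratio, pattern in matches:  # five names, not four
        current = best.get(match_id)
        # current[2] is the stored ratio in (start, end, ratio, pattern).
        if current is None or ratio > current[2]:
            best[match_id] = (start, end, ratio, pattern)
    return best


if __name__ == "__main__":
    # Hypothetical matches standing in for spaczz matcher output.
    matches = [
        ("NAME", 0, 2, 86, "Jane Doe"),
        ("NAME", 5, 7, 100, "Jane Doe"),
        ("GPE", 3, 4, 90, "Paris"),
    ]
    for label, (start, end, ratio, pattern) in best_match_per_label(matches).items():
        print(label, start, end, ratio, pattern)
```

Unpacking these tuples with only four names, as in for match_id, start, end, ratio in matches, now raises ValueError (too many values to unpack).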

v0.5.4

2 years ago

What’s Changed

BugFix for German combination words for RegexSearcher (#66) @JonasHablitzel
Including flake8 plugins in pre-commit (#63) @gandersen101

:books: Documentation

Updating available fuzzyfuncs in docs (#62) @gandersen101

v0.5.3

3 years ago

Fixed a "bug" in the TokenMatcher. Spaczz expects token matches to be returned in order of ascending match start, then descending match length. However, spaCy's Matcher does not return matches in this order by default, so a sort was added to the TokenMatcher to ensure it.
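The ordering described above (ascending start, then descending length) can be expressed as a single sort key. This is a minimal sketch assuming simplified (match_id, start, end) tuples; sort_token_matches is a hypothetical name, not the TokenMatcher's actual code.

```python
from typing import List, Tuple

# Simplified, hypothetical match layout: (match_id, start, end).
Match = Tuple[str, int, int]


def sort_token_matches(matches: List[Match]) -> List[Match]:
    """Sort by ascending start, then by descending length (end - start)."""
    # Negating the length makes longer spans sort first among equal starts.
    return sorted(matches, key=lambda m: (m[1], -(m[2] - m[1])))


if __name__ == "__main__":
    unsorted = [("A", 2, 3), ("B", 0, 1), ("C", 0, 3), ("D", 2, 5)]
    print(sort_token_matches(unsorted))
    # → [('C', 0, 3), ('B', 0, 1), ('D', 2, 5), ('A', 2, 3)]
```

Because Python's sorted is stable, matches with the same start and length keep their original relative order.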

v0.5.2

3 years ago

Minor updates to pre-commits and noxfile.

v0.5.1

3 years ago

Minor updates to allowed dependency versions and CI.
Switched back to using types from the typing module instead of built-in generic types, because spaCy v3 uses Pydantic, and Pydantic does not support built-in generic types in Python < 3.9. I don't know whether this would actually cause any issues, but I am playing it safe. More changes to help spaczz play nicely with Pydantic may follow.
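To illustrate the distinction: subscripting the typing aliases works on Python 3.7 and 3.8, while built-in generics such as list[tuple[str, int, int]] (PEP 585) only became subscriptable at runtime in Python 3.9. Tools that evaluate annotations at runtime, as Pydantic does, therefore fail on them under older interpreters. The function below is a hypothetical example, not spaczz code.

```python
from typing import Dict, List, Tuple

# typing aliases can be subscripted on Python 3.7 and 3.8 as well as newer
# versions. Writing list[Span] or dict[str, list[tuple[int, int]]] instead
# would raise TypeError on Python < 3.9 wherever the annotation is evaluated.
Span = Tuple[str, int, int]  # hypothetical (label, start, end) layout


def spans_by_label(spans: List[Span]) -> Dict[str, List[Tuple[int, int]]]:
    """Group (start, end) offsets by label."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for label, start, end in spans:
        grouped.setdefault(label, []).append((start, end))
    return grouped


if __name__ == "__main__":
    print(spans_by_label([("ORG", 0, 1), ("GPE", 2, 3), ("ORG", 4, 6)]))
```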

v0.5.0

3 years ago

What’s Changed

:rocket: Features

Enhancement spacy3 support (#52) @gandersen101
- Support for spaCy v3.
- If using spaCy v3, the SpaczzRuler optional arguments no longer need to be prepended with "spaczz_". The prepended versions will still work in most cases, offering some backwards compatibility. However, optional arguments prepended with "spaczz_" will not work with spaCy v3's new config-driven spacy.load and nlp.add_pipe APIs. It is therefore recommended that users move away from the prepended versions when using spaCy v3. Note, however, that the prepended arguments are still necessary when using spaczz with spaCy v2.
- Matcher.pipe methods are now deprecated in accordance with spaCy v3.
- The spaczz_span custom attribute is deprecated in favor of spaczz_ent. Both have the same functionality, but the spaczz_ent name makes more sense.
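For users moving off the prepended arguments, a small helper can rename old-style keyword arguments before they are passed as a config to spaCy v3's nlp.add_pipe. This is a hypothetical migration sketch, not part of spaczz; only spaczz_token_defaults comes from these notes, and the other key is illustrative.

```python
from typing import Any, Dict

PREFIX = "spaczz_"


def strip_spaczz_prefix(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs with a leading "spaczz_" removed from each key.

    Hypothetical migration helper: keys without the prefix are kept as-is.
    """
    return {
        (key[len(PREFIX):] if key.startswith(PREFIX) else key): value
        for key, value in kwargs.items()
    }


if __name__ == "__main__":
    old_style = {"spaczz_token_defaults": {}, "overwrite_ents": True}
    print(strip_spaczz_prefix(old_style))
    # → {'token_defaults': {}, 'overwrite_ents': True}
```

If both a prefixed and an unprefixed version of the same key are present, whichever appears later in the dict wins, so it may be worth checking for such collisions first.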

0.4.2

3 years ago

Fixed a bug where TokenMatcher callbacks did nothing.
Fixed a bug where spaczz_token_defaults in the SpaczzRuler did nothing.
Fixed a bug where defaults would not be added to their respective matchers when loading from bytes/disk in the SpaczzRuler.
Fixed some inconsistencies in the SpaczzRuler which will be particularly noticeable with ent_ids. See the "Known Issues" section below for more details.
Small tweaks to spaczz custom attributes.
The available fuzzy matching functions have changed in RapidFuzz, and spaczz has changed accordingly.
Preparing for spaCy v3 updates.

0.4.1

3 years ago

Spaczz's phrase searching algorithm has been further optimized so both the FuzzyMatcher and SimilarityMatcher should run considerably faster.
The FuzzyMatcher and SimilarityMatcher now include a thresh parameter that defaults to 100. When matching, if flex > 0 and the match ratio is >= thresh during the initial scan of the document, no optimization will be attempted. As a result, by default, perfect matches are not run through match optimization.
flex now defaults to len(pattern) // 2. This creates a more meaningful difference between "default" and "max" with longer patterns.
PEP585 code updates.
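The thresh and flex rules in this release can be summarized in a few lines. This is a simplified sketch of the described behavior, not spaczz's implementation: default_flex and should_optimize are hypothetical names, the pattern length is assumed to be counted in tokens, and a flex of 0 is assumed to mean no boundary adjustment at all.

```python
from typing import List


def default_flex(pattern_tokens: List[str]) -> int:
    """Default flex as described in v0.4.1: half the pattern length, rounded down."""
    return len(pattern_tokens) // 2


def should_optimize(initial_ratio: int, flex: int, thresh: int = 100) -> bool:
    """Decide whether a match found in the initial scan gets optimized.

    Optimization is skipped when the initial ratio already meets thresh,
    and (by assumption) when flex is 0, since the boundaries cannot move.
    """
    return flex > 0 and initial_ratio < thresh


if __name__ == "__main__":
    pattern = ["Rio", "de", "Janeiro"]
    flex = default_flex(pattern)  # 3 // 2 == 1
    print(flex, should_optimize(100, flex), should_optimize(85, flex))
```

With the default thresh of 100, only perfect matches skip optimization; lowering thresh would let near-perfect matches skip it as well.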

v0.4.0

3 years ago

Adds the TokenMatcher to spaczz and integrates it with the SpaczzRuler. Also overhauls spaczz's custom attributes and includes some quality of life improvements and bug fixes.