Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
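To make the terms concrete, the following is a minimal plain-Python sketch of what n-grams and fixed-size-gap skipgrams are. It does not use the Colibri Core API; the function names and the ``GAP`` marker are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker, used only in this sketch


def ngrams(tokens, n):
    """Return all contiguous n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(ngram):
    """Return every variant of an n-gram in which one or more inner
    tokens are replaced by a gap. The first and last token are kept,
    so each gap here has a fixed size of one token."""
    inner = range(1, len(ngram) - 1)
    result = []
    for k in range(1, len(ngram) - 1):
        for positions in combinations(inner, k):
            result.append(tuple(
                GAP if i in positions else tok
                for i, tok in enumerate(ngram)
            ))
    return result


tokens = "to be or not to be".split()
print(ngrams(tokens, 3))
print(skipgrams(("to", "be", "or", "not")))
```

Enumerating patterns this way is simple but memory-hungry on real corpora, which is the problem a dedicated pattern model is meant to address.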
[Ko van der Sloot]
[Maarten van Gompel]
This release does not provide a shared library; use static linking instead.
[Maarten van Gompel]
[Ko van der Sloot]
[Maarten van Gompel]
[Ko van der Sloot]
Thanks to @kosloot, various warnings on clang were fixed in this minor release.
Implemented the ability to prune subsumed n-grams (retaining only the longer non-subsumed versions). Introduces a new PRUNESUBSUMED variable for PatternModelOptions.
Note: This is an aggressive form of pruning that should also work for unordered models. Matching is based on types rather than individual tokens (all subsumed types are pruned).
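The idea of type-based pruning of subsumed n-grams can be sketched in plain Python as follows. This is an illustration under assumptions, not Colibri Core's implementation: the model is assumed to be a simple dict from token tuples to counts, and ``prune_subsumed`` and ``contiguous_subpatterns`` are hypothetical names.

```python
def contiguous_subpatterns(pattern):
    """Yield every proper contiguous sub-n-gram of a pattern (a tuple of tokens)."""
    n = len(pattern)
    for length in range(1, n):
        for start in range(n - length + 1):
            yield pattern[start:start + length]


def prune_subsumed(model):
    """Return a copy of the model without n-gram types that occur inside
    a longer n-gram type in the same model.

    model: dict mapping a tuple of tokens to its occurrence count
    (an assumed representation for this sketch).
    Matching is done on types: a subsumed type is dropped entirely,
    regardless of how many of its tokens occur outside the longer pattern.
    """
    subsumed = set()
    for pattern in model:
        subsumed.update(contiguous_subpatterns(pattern))
    return {p: c for p, c in model.items() if p not in subsumed}


model = {
    ("to", "be"): 2,
    ("to",): 2,
    ("be",): 2,
    ("or",): 1,
}
print(prune_subsumed(model))
```

In this example ``("to",)`` and ``("be",)`` are removed because both occur inside ``("to", "be")``, while ``("or",)`` is kept because no longer pattern in the model contains it.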
Bugfix release: Certain options from PatternModelOptions were not yet available in the Python binding.
Bugfix release: Pattern size and category constraints were not working for several methods: getcooc, getleftcooc, getrightcooc, getleftneighbours and getrightneighbours (#44).
Very minor update release:
Better handling of large patterns: the PatternPointer size descriptor is now 64 bits (fixes #42), at the cost of a small increase in memory consumption in various computations.
(The experimental and relatively unused PatternPointerModels are not backwards compatible; contact me if this is a problem.)