Colibri Core Versions

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
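To make the terms concrete, here is a minimal pure-Python sketch of n-grams and fixed-size skipgrams. It does not use the Colibri Core API. The function names and the "{*}" gap marker are chosen for illustration only, and Colibri Core itself works on class-encoded data rather than on Python lists of strings.

```python
from itertools import combinations

GAP = "{*}"  # illustrative marker for a gap of exactly one token


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(ngram):
    """Return the skipgrams obtained by replacing inner tokens with gaps.

    The first and last token are always kept, so an n-gram needs at
    least three tokens to yield a skipgram.
    """
    inner = range(1, len(ngram) - 1)
    result = []
    for k in range(1, len(inner) + 1):
        for positions in combinations(inner, k):
            result.append(tuple(GAP if i in positions else tok
                                for i, tok in enumerate(ngram)))
    return result


tokens = "to be or not to be".split()
trigrams = ngrams(tokens, 3)
trigram_skipgrams = skipgrams(trigrams[0])
```

Here the first trigram is "to be or", and the only skipgram derived from it keeps the outer tokens and replaces "be" with a gap.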

v2.4.10

5 years ago

Important bugfix release:

Fixes data-clipping bug on loading large corpora in memory (used by indexed patternmodels) #41

(All users are urged to upgrade!)

v2.4.9

6 years ago

Added metadata
macOS fix

v2.4.8

6 years ago

Minor update: made setup.py more robust for manual installation mode (without compiling C++ lib) (v2.4.7 was skipped)

v2.4.6

6 years ago

fix: colibri-classencode -t (threshold) behaviour was wrong (was interpreted as +1)

v2.4.5

7 years ago

Refactored alignment model
Added BasicPatternAlignmentModel
Major cleanup of warnings and possible issues (thanks to @kosloot)

v2.4.4

7 years ago

Bugfix: fixes covered token count per category/n (issue #26)
New feature: colibri-patternmodeller has a --simplereport (-r) option that generates a report without coverage information (more limited but a lot faster)

v2.4.3

7 years ago

v2.4.2 was released prematurely: one minor test was corrupt. This release fixes it.

v2.4.2

7 years ago

Bugfix release, fixes issue #25

v2.4.1

7 years ago

Minor fix release prior to paper publication:

Python 2.7 compatibility fix
Updated Python tutorial
Added benchmarks

v2.4.0

7 years ago

Various fixes:

Speed up in ngrams() computation (issue #21)
Performance fix for processing long lines
Pattern.instanceof() should be faster and is now available from Python too
Attempt to fix compilation issue on certain platforms (issue #22), unconfirmed
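As a rough illustration of what an "instance of" check between an n-gram and a skipgram means, the sketch below treats each gap as a wildcard for exactly one token. This is a conceptual pure-Python example under that assumption, not the Colibri Core implementation or its Python binding. The helper name is_instance and the "{*}" gap marker are hypothetical.

```python
GAP = "{*}"  # illustrative marker for a gap of exactly one token


def is_instance(ngram, skipgram):
    """Return True if the n-gram matches the skipgram token for token,
    with every gap in the skipgram matching any single token."""
    if len(ngram) != len(skipgram):
        return False
    return all(s == GAP or s == t for t, s in zip(ngram, skipgram))


skipgram = ("to", GAP, "or")
candidates = [("to", "be", "or"), ("to", "be", "and"), ("to", "or")]

# Keep only the candidates that are instances of the skipgram.
matches = [p for p in candidates if is_instance(p, skipgram)]
```

Only "to be or" passes: "to be and" differs in a fixed position and "to or" has the wrong length.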

New features:

Implemented new filtering mechanism that supports actively checking whether patterns are instances of a limited set of specified skipgrams, or a superset of specified ngrams.
Implemented ignorenewlines option in class encoding. Useful if your source text is split into, for instance, sentences (one per line), but you want a model that crosses sentence boundaries.
Implemented vocabulary import for the class encoding stage (issue #2)
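The effect of ignoring newlines can be sketched in plain Python: when each line is treated as a separate unit, no n-gram spans two lines, whereas treating the text as one token stream also yields n-grams that cross the line boundary. This only illustrates the idea. It does not call Colibri Core, and the ngrams helper is defined here for the example.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


lines = ["the cat sat", "on the mat"]  # one sentence per line

# Newlines respected: bigrams are extracted within each line only.
per_line = [g for line in lines for g in ngrams(line.split(), 2)]

# Newlines ignored: all lines form one continuous token stream.
all_tokens = [tok for line in lines for tok in line.split()]
crossing = ngrams(all_tokens, 2)

# Bigrams that only exist when sentence boundaries are crossed.
only_when_ignored = [g for g in crossing if g not in per_line]
```

In this example the single additional bigram is "sat on", which joins the end of the first line to the start of the second.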