Colibri Core Versions

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
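To make the difference between n-grams and skipgrams concrete, here is a minimal pure-Python sketch. It does not use the Colibri Core API. It treats a pattern as a tuple of tokens and marks a single-token gap with a placeholder string `{*}`, chosen here for illustration. Only fixed-size gaps are modelled. Dynamic-size gaps and Colibri's compact class-encoded representation are not covered, and the function names `ngrams` and `skipgrams` are hypothetical.

```python
from itertools import combinations

# Placeholder token marking a single-token gap (illustrative choice only).
GAP = "{*}"

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n):
    """Return fixed-size skipgrams of total length n.

    Each skipgram is an n-gram with one or more inner positions replaced by GAP.
    The first and last tokens are never gaps, so every skipgram is anchored
    at both ends by real tokens.
    """
    inner = range(1, n - 1)
    result = []
    for gram in ngrams(tokens, n):
        for k in range(1, len(inner) + 1):
            for gap_positions in combinations(inner, k):
                result.append(tuple(GAP if i in gap_positions else tok
                                    for i, tok in enumerate(gram)))
    return result

tokens = "to be or not to be".split()
print(ngrams(tokens, 3)[0])     # ('to', 'be', 'or')
print(skipgrams(tokens, 3)[0])  # ('to', '{*}', 'or')
```

Anchoring skipgrams at both ends keeps a gap from "dangling" at the edge of a pattern. In that case the pattern would just be a shorter n-gram.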

v2.4.10

5 years ago

Important bugfix release:

  • Fixes a data-clipping bug when loading large corpora into memory (used by indexed pattern models) #41

(All users are urged to upgrade!)

v2.4.9

6 years ago
  • Added metadata
  • macOS fix

v2.4.8

6 years ago
  • Minor update: made setup.py more robust for manual installation mode (without compiling C++ lib) (v2.4.7 was skipped)

v2.4.6

6 years ago
  • Fix: the -t (threshold) option of colibri-classencode behaved incorrectly (the threshold was interpreted as +1, i.e. one higher than specified)
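The kind of off-by-one described above can be illustrated with a small pure-Python sketch. It **assumes** that -t sets a minimum occurrence count (inclusive) for a word to be included in the encoding; this is an assumption, so check `colibri-classencode`'s own help output for the exact semantics. Both function names are hypothetical.

```python
from collections import Counter

def vocabulary(tokens, threshold):
    """Intended behaviour: words occurring at least `threshold` times (inclusive)."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c >= threshold}

def vocabulary_off_by_one(tokens, threshold):
    """Buggy variant: behaves as if the threshold were threshold + 1."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c >= threshold + 1}

tokens = "a a b c c c".split()
print(sorted(vocabulary(tokens, 2)))             # ['a', 'c']
print(sorted(vocabulary_off_by_one(tokens, 2)))  # ['c']
```

With the off-by-one, words occurring exactly `threshold` times (here `a`) are silently dropped.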

v2.4.5

7 years ago
  • Refactored alignment model
  • Added BasicPatternAlignmentModel
  • Major cleanup of warnings and possible issues (thanks to @kosloot)

v2.4.4

7 years ago
  • Bugfix: fixes covered token count per category/n (issue #26)
  • New feature: colibri-patternmodeller has a --simplereport (-r) option that generates a report without coverage information (more limited, but a lot faster)

v2.4.3

7 years ago

v2.4.2 was released prematurely: one minor test was corrupt. This release fixes it.

v2.4.2

7 years ago

Bugfix release, fixes issue #25

v2.4.1

7 years ago

Minor fix release prior to paper publication:

  • Python 2.7 compatibility fix
  • Updated Python tutorial
  • Added benchmarks

v2.4.0

7 years ago

Various fixes:

  • Speed-up in ngrams() computation (issue #21)
  • Performance fix for processing long lines
  • Pattern.instanceof() should be faster and is now also available from Python
  • Attempted fix for a compilation issue on certain platforms (issue #22; unconfirmed)
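Conceptually, checking whether a pattern is an instance of a skipgram means that every non-gap token must match and every gap may match anything. The pure-Python sketch below illustrates that idea for fixed-size gaps only. It does not reflect the C++ implementation or the signature of the Python binding. The function name and the `{*}` gap placeholder are assumptions made for illustration.

```python
GAP = "{*}"  # illustrative placeholder for a single-token gap

def instanceof(ngram, skipgram):
    """True if `ngram` is an instance of `skipgram`: same length, and every
    non-gap position of the skipgram matches the n-gram token exactly."""
    if len(ngram) != len(skipgram):
        return False
    return all(s == GAP or s == t for t, s in zip(ngram, skipgram))

print(instanceof(("to", "be", "or"), ("to", GAP, "or")))   # True
print(instanceof(("to", "be", "and"), ("to", GAP, "or")))  # False
```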

New features:

  • Implemented a new filtering mechanism that supports actively checking whether patterns are instances of a limited set of specified skipgrams, or supersets of specified n-grams.
  • Implemented the ignorenewlines option in class encoding. This is useful if your source text is split into, for instance, sentences (one per line), but you want a model that crosses sentence boundaries.
  • Implemented vocabulary import for the class encoding stage (issue #2)
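The two class-encoding features above can be sketched in pure Python: importing an existing vocabulary, and ignoring newlines so that patterns may cross line boundaries. This is a hypothetical illustration, not Colibri's encoder. The `encode` function and its parameters are invented here. Classes are assigned in order of first appearance for simplicity, which does not reproduce Colibri's actual class assignment scheme.

```python
def encode(lines, vocabulary=None, ignorenewlines=False):
    """Map each word to an integer class (hypothetical sketch).

    vocabulary: optional existing word -> class mapping to start from
                (mimics importing a vocabulary); unseen words get fresh
                classes numbered above the highest imported one.
    ignorenewlines: if True, all lines are merged into one sequence so
                    patterns can span line (e.g. sentence) boundaries;
                    otherwise each line becomes a separate sequence.
    """
    classes = dict(vocabulary) if vocabulary else {}
    next_class = max(classes.values(), default=0) + 1
    sequences = []
    for line in lines:
        encoded = []
        for word in line.split():
            if word not in classes:
                classes[word] = next_class
                next_class += 1
            encoded.append(classes[word])
        sequences.append(encoded)
    if ignorenewlines:
        # Concatenate all per-line sequences into a single sequence.
        sequences = [[c for seq in sequences for c in seq]]
    return classes, sequences

lines = ["the cat", "the dog"]
print(encode(lines)[1])                       # [[1, 2], [1, 3]]
print(encode(lines, ignorenewlines=True)[1])  # [[1, 2, 1, 3]]
```

Without ignorenewlines, an n-gram such as "cat the" can never be extracted, because "cat" and "the" end up in different sequences.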