LanguageMachines Frog Versions

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

v0.33

3 weeks ago

--KANON was not always honored
after a change in ucto, the tokenizer has to be reset more often
in the lemmatizer, fuzzy matching of CGN tags is implemented

v0.32

5 months ago

[Ko van der Sloot]

ignore but warn on empty derivations, a lame fix for https://github.com/LanguageMachines/frog/issues/103

v0.31

6 months ago

[Ko van der Sloot]

use ticcutils > 0.34. NFC normalization is standard now
use Tokenizer::config_prefix() instead of magic string 'tokconfig-'
code cleanup and quality improvement (cppcheck is very useful)
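
As a rough illustration of what NFC normalization means for input text (a generic Python sketch using the standard library, not Frog's or ticcutils' actual C++ code): a decomposed sequence such as 'e' plus a combining diaeresis is folded into the single precomposed character, so both spellings of the same Dutch word compare equal. The helper name to_nfc is made up for this example.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)

# "reëel" with a decomposed ë: 'e' followed by U+0308 (combining diaeresis)
decomposed = "ree\u0308el"
# the same word with the precomposed character U+00EB
composed = "re\u00ebel"

assert decomposed != composed          # different code point sequences
assert to_nfc(decomposed) == composed  # identical after NFC normalization
```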

[Maarten van Gompel]

added frog demo gif

v0.30

1 year ago

finally fixed a major memory-leak in MBMA which bothered me for months
also some minor leaks are plugged

v0.29

1 year ago

[Ko van der Sloot]

added a fix for https://github.com/LanguageMachines/frog/issues/100 (where Frog created invalid FoLiA in a corner-case)
improved api_test
small code refactoring
require libfolia >= 2.15, for correct working of word correction
improved MWU code: it now uses Unicode strings and detects MWUs that start with a capital letter.

[Maarten van Gompel]

.gitignore: added build dir

v0.28

1 year ago

Minor bugfix release:

[Ko van der Sloot]

We no longer accept FoLiA paragraphs with both Words and Sentences.

[Maarten van Gompel]

Software metadata fix

v0.27.2

1 year ago

[Maarten van Gompel]

Software metadata fix only, no functional changes

v0.27.1

1 year ago

[Maarten van Gompel]

Software metadata update only, no functional changes

v0.27

1 year ago

[Ko van der Sloot] Major Release. Internally we always perform a 'deep' morphological analysis. This information is used for XML and JSON output. For the 'classic' Tabbed output, we maintain backward compatibility: you need to specify '--deep-morph' to get the deep analysis in the output. You may also specify '--compounds' to get an extra column with compound information.

Other changes:

C++ code quality
adapted to more recent Timbl implementations (Unicode awareness)
Tokenizer:
- Better handling of --languages option.
- 'und' is now also acceptable as a "language"
- Better debugging possibility
Mbma: Too many alternatives with Inverted Verbs were generated. As the Tagger doesn't help us directly, we filter on the person of the next word, and only return V/te2I when the next word is 2nd person.
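
The inverted-verb filter described above can be sketched roughly as follows. This is a hypothetical Python illustration, not MBMA's implementation: the function name, the representation of candidates as plain tag strings, and the tiny pronoun set used to decide "2nd person" are all assumptions made for the example. Only the tag V/te2I comes from the notes above.

```python
# ASSUMPTION: a tiny stand-in for "the next word is 2nd person".
# How Frog actually decides this is not described in these notes.
SECOND_PERSON = {"je", "jij"}

INVERTED_TAG = "V/te2I"  # tag named in the release notes

def filter_inverted(candidates, next_word):
    """Keep V/te2I analyses only when the next word is 2nd person.

    candidates: list of analysis tags for the current word (hypothetical format)
    next_word:  the following token, or None at the end of a sentence
    """
    next_is_2nd = next_word is not None and next_word.lower() in SECOND_PERSON
    if next_is_2nd:
        return list(candidates)
    # Otherwise drop the inverted-verb alternative and keep everything else.
    return [c for c in candidates if c != INVERTED_TAG]
```

Under these assumptions, a verb followed by "jij" keeps its inverted reading, while the same verb followed by "hij" loses it.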

v0.26

1 year ago

[Ko van der Sloot]

fix for https://github.com/LanguageMachines/frog/issues/96
code improvements, readability and fixing CppCheck warnings
needs recent ticcutils (>=0.30)
needs newest Timbl (6.8) for more Unicode awareness
updated GitHub action

[Maarten van Gompel]

added MAINTAINERS file
updated codemeta.json