LanguageMachines Frog Versions

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

v0.33

3 weeks ago
  • --KANON was not always honored
  • after a change in ucto, the tokenizer needs to be reset more often
  • in the lemmatizer, fuzzy matching of CGN tags is implemented

v0.32

5 months ago

[Ko van der Sloot]

v0.31

6 months ago

[Ko van der Sloot]

  • use ticcutils > 0.34. NFC normalization is standard now
  • use Tokenizer::config_prefix() instead of magic string 'tokconfig-'
  • code cleanup and quality improvement (cppcheck is very useful)
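For readers unfamiliar with the NFC normalization mentioned above, here is a minimal sketch of what it does in general. It uses Python's standard unicodedata module and is not the ticcutils API; the helper name to_nfc is made up for this illustration.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# "café" written two ways: with a precomposed "é" (U+00E9), and with a
# plain "e" followed by a combining acute accent (U+0301).
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# The raw strings differ in length and compare unequal...
assert precomposed != decomposed
assert len(precomposed) == 4 and len(decomposed) == 5

# ...but after NFC normalization they are the same code point sequence.
assert to_nfc(decomposed) == to_nfc(precomposed)
```

Normalizing all input this way means that visually identical words compare equal, which matters for lookups against lexicons and trained instance bases.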

[Maarten van Gompel]

  • added frog demo gif

v0.30

1 year ago
  • finally fixed a major memory leak in MBMA which bothered me for months
  • also plugged some minor leaks

v0.29

1 year ago

[Ko van der Sloot]

  • added a fix for https://github.com/LanguageMachines/frog/issues/100 (where Frog created invalid FoLiA in a corner-case)
  • improved api_test
  • small code refactoring
  • require libfolia >= 2.15, for correct working of word correction
  • improved MWU code: it now uses Unicode strings and detects MWUs that start with a capital letter.

[Maarten van Gompel]

  • .gitignore: added build dir

v0.28

1 year ago

Minor bugfix release:

[Ko van der Sloot]

  • We no longer accept FoLiA paragraphs with both Words and Sentences.

[Maarten van Gompel]

  • Software metadata fix

v0.27.2

1 year ago

[Maarten van Gompel]

  • Software metadata fix only, no functional changes

v0.27.1

1 year ago

[Maarten van Gompel]

  • Software metadata update only, no functional changes

v0.27

1 year ago

[Ko van der Sloot]

Major release. Internally we always perform a 'deep' morphological analysis. This information is used for XML and JSON output. For the 'classic' tabbed output we maintain backward compatibility: you need to specify '--deep-morph' to get the deep analysis in the output. You may also specify '--compounds' to get an extra column with compound information.
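As a rough illustration of how the two options combine, the sketch below builds a Frog command line from Python. The '--deep-morph' and '--compounds' options come from the notes above; the '-t' option for passing an input file and the helper name frog_command are assumptions made for this example, so check 'frog --help' on your installation.

```python
def frog_command(input_path, deep_morph=False, compounds=False):
    """Build an argument list for running Frog on a plain-text file.

    Assumption: '-t <file>' selects the input file. The two optional
    flags are the ones described in the release notes.
    """
    cmd = ["frog", "-t", input_path]
    if deep_morph:
        # Include the deep morphological analysis in the tabbed output.
        cmd.append("--deep-morph")
    if compounds:
        # Add an extra column with compound information.
        cmd.append("--compounds")
    return cmd


# Classic tabbed output, unchanged from earlier releases.
classic = frog_command("input.txt")

# Tabbed output with deep morphology and the compound column.
extended = frog_command("input.txt", deep_morph=True, compounds=True)
```

The resulting list can be passed to subprocess.run(extended, check=True) on a machine where Frog is installed.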

Other changes:

  • C++ code quality
  • adapted to more recent Timbl implementations (Unicode awareness)
  • Tokenizer:
    • Better handling of --languages option.
    • 'und' is now also acceptable as a "language"
    • Better debugging possibility
  • Mbma: Too many alternatives with inverted verbs were generated. As the tagger doesn't help us directly, we filter on the person of the next word, and only return V/te2I when the next word is 2nd person
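The Mbma filter described in the last item can be sketched roughly as follows. This is not Frog's actual code: the Analysis class, the pronoun list used to decide whether the next word is 2nd person, and the V/te1 tag in the usage lines are all hypothetical stand-ins. Only the V/te2I tag and the idea of looking at the next word come from the note above.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    tag: str                                  # e.g. "V/te2I", the inverted-verb reading
    morphemes: list = field(default_factory=list)


# Hypothetical stand-in for "the next word is 2nd person".
SECOND_PERSON = {"je", "jij", "u", "gij", "ge"}


def filter_inverted(analyses, next_word):
    """Keep V/te2I readings only when the next word is 2nd person."""
    keep_inverted = next_word is not None and next_word.lower() in SECOND_PERSON
    return [a for a in analyses if a.tag != "V/te2I" or keep_inverted]


# Two candidate readings for the verb form "loop" (the second tag is made up).
candidates = [Analysis("V/te2I", ["loop"]), Analysis("V/te1", ["loop"])]

before_je = filter_inverted(candidates, "je")   # inverted reading is kept
before_ik = filter_inverted(candidates, "ik")   # inverted reading is dropped
```

Filtering on the following word keeps the inverted reading available for cases like "loop je" while removing it everywhere else, which reduces the number of alternatives without relying on the tagger.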

v0.26

1 year ago

[Ko van der Sloot]

[Maarten van Gompel]

  • added MAINTAINERS file
  • updated codemeta.json