WordTokenizers.jl Versions

High-performance tokenizers for natural language processing and other related tasks

v0.5.6

3 years ago

WordTokenizers v0.5.6

Diff since v0.5.5

Merged pull requests:

  • Adopt ColPrac? (#54) (@oxinabox)
  • Use a normal function in init to initialize the data deps (#56) (@KristofferC)

v0.5.5

3 years ago

WordTokenizers v0.5.5

Diff since v0.5.4

Merged pull requests:

  • Update paper.bib (#47) (@kthyng)
  • Update paper.md (#48) (@kthyng)
  • Install TagBot as a GitHub Action (#49) (@JuliaTagBot)
  • Adding support for unigram sentencepiece model (#51) (@tejasvaidhyadev)
  • Update to version 0.5.5. (#53) (@Ayushk4)

master

4 years ago

v0.5.4

4 years ago

v0.5.4 (2020-02-06)

Diff since v0.5.3

Merged pull requests:

  • Update paper based on JOSS review (#45) (oxinabox)
  • Add installation guide to README (#43) (Ayushk4)
  • Change example setting tokenizer to TinySegmenter.jl's tokenizer (#42) (Ayushk4)
  • Fixing a number of typos in paper and readme (#41) (leios)
  • Minor Fixes in JOSS paper (#40) (Ayushk4)
  • very minor grammar fixes in README (#39) (danielskatz)
  • Add plot comparing speeds of tokenizers to JOSS paper. (#36) (Ayushk4)
  • Support and Contribution guidelines (#35) (Ayushk4)
  • JOSS paper update (#34) (Ayushk4)
  • Write paper for JOSS (#7) (oxinabox)

v0.5.3

4 years ago

v0.5.3 (2019-06-28)

Diff since v0.5.2

Merged pull requests:

  • Handle final periods (#33) (Ayushk4)
  • Toktok fix patch (#31) (Ayushk4)
  • Update for Julia-1.1 (#30) (Ayushk4)
  • Fix sentence splitter: sentences ending with acronyms (#24) (nickto)

v0.5.2

4 years ago

v0.5.2 (2019-06-21)

Diff since v0.5.1

Closed issues:

  • Tokenize begins with full stop. (#28)

Merged pull requests:

v0.5.1

4 years ago

v0.5.1 (2019-06-06)

Diff since v0.5.0

Closed issues:

  • Make a release (#26)
  • Add a Twitter tokenizer (#3)

Merged pull requests:

  • Add Tweet Tokenizer (#13) (Ayushk4)

v0.5.0

4 years ago

v0.5.0 (2019-06-06)

Diff since v0.4.0

Closed issues:

  • Add TokTok tokenizer (#15)

Merged pull requests:

  • Fix inconsistency between tabs and spaces (#25) (Ayushk4)
  • appveyor badge fix (#23) (aquatiko)
  • Fix indentation in nltk_word.jl (#22) (Ayushk4)
  • Fix indentation. (#21) (Ayushk4)
  • Minor doc fixes in fast.jl (#20) (Ayushk4)
  • Some cleanup, including changing TokenBuffer to use replaces rather than splits (#19) (oxinabox)
  • add toktok tokenizer (#18) (aquatiko)

v0.4.0

5 years ago

  • Fix typo in name of reversable_tokenize

v0.3.1

5 years ago

  • Add reversable tokenizer