🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
R package for Tokenization, Part-of-Speech Tagging, Lemmatization and D...
Query Translator is a search query translator with an AST representation
Text tokenization and sentence segmentation (segtok v2)
Fast, Consistent Tokenization of Natural Language Text
Determine the tokens that optimally represent a dataset at any specific...
A WHATWG-compliant HTML5 tokenizer and tag soup parser
A tiny Chinese word segmentation engine with a comprehensive set of algorithms | A micro tokenizer for Chinese
Replaced by foonathan/lexy
Rust re-implementation of OpenFST, a library for constructing, combining,...
Aims to make JapaneseTokenizer as easy to use as possible
A tokenizer and sentence splitter for German and English web and social ...
Collection of developer toolkits
Simple multilingual lemmatizer for Python, especially useful for speed a...
The code and models for "An Empirical Study of Tokenization Strategies f...