Cantonese Linguistics and NLP
Added parse_text for analyzing Cantonese text data.
The characters_to_jyutping function now has the segmenter kwarg for customizing word segmentation.
Added pyproject.toml. This is related to preferring setup.cfg for specifying build metadata and options.
In the characters_to_jyutping function, when rime-cantonese and HKCanCor disagree, the rime-cantonese data (more accurate) is preferred.
Incorporated rime-cantonese data (2021.05.16 release), improving both characters-to-Jyutping conversion and word segmentation.
Updated CHATReader to use the new methods to_chat, to_strs, info, head, and tail.
Switched to setup.cfg to fully specify build metadata and options, while keeping a minimal setup.py for backward compatibility. This is related to the new pyproject.toml.
Added safety and bandit checks to CircleCI builds.
Added the methods append, append_left, extend, and extend_left to the class CHATReader through the upstream PyLangAcq package.
The method ipsyn of CHATReader now raises NotImplementedError, since the upstream method works only for English.
Note: The underlying CHAT parser, the PyLangAcq package, has been bumped to v0.13.0. All updates to PyLangAcq's CHAT reader apply to this PyCantonese release as well; the details are in PyLangAcq's changelog for v0.13.0. The changelog entries below document only updates specific to PyCantonese.
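The characters_to_jyutping entries above (the segmenter kwarg, and preferring rime-cantonese over HKCanCor when they disagree) can be pictured with a minimal Python sketch under stated assumptions. RIME and HKCANCOR are toy maps invented for illustration, not real entries from either source; greedy longest-string matching stands in for whatever segmentation PyCantonese actually uses; and characters_to_jyutping_sketch, longest_match_segment, and a segmenter that is simply any callable are hypothetical, not PyCantonese's API. The default segmenter also removes all whitespace first, in the spirit of the segment entry in this changelog.

```python
# Hypothetical sketch; not PyCantonese's implementation or API.
from functools import partial
from typing import Callable, Dict, List, Optional, Tuple

# Toy word -> Jyutping maps invented for illustration; they are NOT
# actual entries from rime-cantonese or HKCanCor.
RIME: Dict[str, str] = {"香港": "hoeng1gong2", "銀行": "ngan4hong4"}
HKCANCOR: Dict[str, str] = {"銀行": "ngan4hang4", "好": "hou2"}

# Assumption: a segmenter is any callable mapping a string to a word list.
Segmenter = Callable[[str], List[str]]


def longest_match_segment(text: str, vocab: Dict[str, str], max_len: int = 4) -> List[str]:
    """Greedy left-to-right longest-string matching against `vocab`."""
    text = "".join(text.split())  # strip all whitespace before segmenting
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i : i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


def characters_to_jyutping_sketch(
    text: str, segmenter: Optional[Segmenter] = None
) -> List[Tuple[str, Optional[str]]]:
    """Segment `text`, then look up each word; unknown words map to None."""
    # HKCanCor first, RIME second: on conflicting keys, RIME's value wins.
    lexicon = {**HKCANCOR, **RIME}
    if segmenter is None:
        segmenter = partial(longest_match_segment, vocab=lexicon)
    return [(word, lexicon.get(word)) for word in segmenter(text)]


print(characters_to_jyutping_sketch("香港 銀行好"))
# [('香港', 'hoeng1gong2'), ('銀行', 'ngan4hong4'), ('好', 'hou2')]
print(characters_to_jyutping_sketch("銀行", segmenter=list))
# [('銀', None), ('行', None)]
```

Passing segmenter=list splits the input into single characters, so neither toy lexicon matches and both lookups return None. Merging the two maps once, with the preferred source applied last, keeps the precedence rule in a single line.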
Added the Jyutping class to better represent parsed Jyutping romanization.
parse_jyutping now returns a list of Jyutping objects rather than tuples of strings.
The following methods in the CHATReader class have been deprecated: character_sents (use characters with by_utterances=True instead) and jyutping_sents (use jyutping with by_utterances=True instead).
The following arguments of the search method of CHATReader have been deprecated: sent_range (use utterance_range instead), tagged (use by_tokens instead), and sents (use by_utterances instead).
Added pos_tag, which takes a segmented sentence or phrase and returns its part-of-speech tags.
Added hkcancor_to_ud, which maps a part-of-speech tag from the original HKCanCor annotated data to one of the tags from the Universal Dependencies v2 tagset.
Switched to .rst doc files.
jyutping_to_yale and parse_jyutping now return a null value (rather than raise an error) when the input is null.
segment now strips all whitespace from the input unsegmented string before segmenting it.
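The Jyutping and parse_jyutping entries above describe parsing romanization into objects instead of tuples of strings, and returning a null value for null input. A minimal Python sketch of that idea follows. The Jyutping NamedTuple, its field names (onset, nucleus, coda, tone), the regular expression, and parse_jyutping_sketch are assumptions made for illustration rather than PyCantonese's own code, and None is taken to be the null input.

```python
# Hypothetical sketch of Jyutping parsing; not PyCantonese's implementation.
import re
from typing import List, NamedTuple, Optional


class Jyutping(NamedTuple):
    """One parsed syllable; the field names are assumed for this sketch."""

    onset: str
    nucleus: str
    coda: str
    tone: str


# One syllable: optional onset, a nucleus (including the syllabic nasals
# "m" and "ng", as in "m4" and "ng5"), an optional coda, and a tone 1-6.
_SYLLABLE = re.compile(
    r"(?P<onset>gw|kw|ng|[bpmfdtnlgkhwzcsj])?"
    r"(?P<nucleus>aa|oe|eo|yu|[aeiou]|m|ng)"
    r"(?P<coda>ng|[iumnptk])?"
    r"(?P<tone>[1-6])"
)


def parse_jyutping_sketch(text: Optional[str]) -> Optional[List[Jyutping]]:
    """Parse a Jyutping string into Jyutping objects; None in, None out."""
    if text is None:
        return None  # null input yields a null value instead of an error
    text = text.lower()
    # Every syllable ends in a tone digit, so split right after each digit.
    chunks = re.findall(r"[a-z]+[1-6]", text)
    if "".join(chunks) != text:
        raise ValueError(f"not a valid Jyutping string: {text!r}")
    parsed = []
    for chunk in chunks:
        match = _SYLLABLE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"cannot parse syllable: {chunk!r}")
        parsed.append(
            Jyutping(
                onset=match["onset"] or "",
                nucleus=match["nucleus"],
                coda=match["coda"] or "",
                tone=match["tone"],
            )
        )
    return parsed


print(parse_jyutping_sketch("m4"))
# [Jyutping(onset='', nucleus='m', coda='', tone='4')]
print(parse_jyutping_sketch(None))
# None
```

Because each field is named, callers can write syllable.tone or syllable.coda instead of remembering tuple positions, which is the kind of readability gain the move to Jyutping objects suggests.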