💙 Emoji handling and meta data for spaCy with custom extension attributes
Thanks to @buhrmann for the pull request!
Doc.retokenize API for merging.
Update spacymoji to work with spacy>=2.0.0 instead of spacy-nightly.
spaCy v2.0 extension and pipeline component for adding emoji meta data to Doc objects. It detects emoji consisting of one or more Unicode characters and can optionally merge multi-character emoji (combined pictures, emoji with skin tone modifiers) into a single token. Human-readable emoji descriptions are added as a custom attribute, and you can optionally provide a lookup table with your own descriptions.
Disclaimer: This extension only works in spaCy v2.0 (currently in alpha) and is still experimental.
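To make the idea of merging multi-character emoji and overriding descriptions more concrete, here is a minimal standard-library sketch. It is not spacymoji's implementation and does not use spaCy at all: the helper names `merge_emoji_chars` and `describe` are hypothetical, the fallback descriptions come from Python's `unicodedata` module rather than from the extension's own emoji data, and the merging rule is deliberately simplified (it attaches a modifier to whatever character precedes it).

```python
import unicodedata

# Assumption for this sketch: only these joiners/modifiers are handled.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}  # Fitzpatrick modifiers
ZWJ = "\u200d"   # zero-width joiner, used in combined pictures
VS16 = "\ufe0f"  # variation selector-16 (emoji presentation)


def merge_emoji_chars(chars):
    """Group single characters so modifier and ZWJ sequences form one unit."""
    merged = []
    join_next = False  # True right after a ZWJ: the next char joins the unit
    for ch in chars:
        if merged and (join_next or ch in SKIN_TONES or ch in (ZWJ, VS16)):
            merged[-1] += ch
        else:
            merged.append(ch)
        join_next = ch == ZWJ
    return merged


def describe(unit, lookup=None):
    """Return a custom description if one is given, else Unicode names."""
    if lookup and unit in lookup:
        return lookup[unit]
    names = [unicodedata.name(ch, "").lower()
             for ch in unit if ch not in (ZWJ, VS16)]
    return " ".join(name for name in names if name)


thumbs_dark = "\U0001F44D\U0001F3FF"  # thumbs up + dark skin tone modifier
units = merge_emoji_chars(list("ok " + thumbs_dark))
custom = {thumbs_dark: "strong approval"}  # hypothetical user lookup table

print(units)                          # the emoji and its modifier form one unit
print(describe(units[-1]))            # fallback built from Unicode names
print(describe(units[-1], custom))    # custom description takes precedence
```

In the extension itself, the same two ideas surface as merged tokens and a custom description attribute on spaCy objects; the sketch only mirrors the behavior described above, not the actual API.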