🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
- Extend support for `transformers` up to v4.26.x.
- For fast tokenizers, use the offset mapping provided by the tokenizer (#338).
  Using the offset mapping instead of the heuristic alignment from `spacy-alignments` resolves unexpected and missing alignments such as those discussed in https://github.com/explosion/spaCy/discussions/6563, https://github.com/explosion/spaCy/discussions/10794 and https://github.com/explosion/spaCy/discussions/12023.
  :warning: Slow and fast tokenizers will no longer give identical results, due to potential differences in the alignments between transformer tokens and spaCy tokens. We recommend retraining all models with fast tokenizers for use with `spacy-transformers` v1.2.
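The offset-mapping approach pairs each transformer token's character span with the spaCy tokens it overlaps, instead of heuristically re-matching token strings. A minimal sketch of that idea, using hypothetical character spans rather than a real tokenizer (the function name `align_tokens` and the offsets are illustrative, not the library's actual API):

```python
def align_tokens(wp_offsets, spacy_spans):
    """Map each wordpiece (start, end) character span to the indices of
    the spaCy token spans that overlap it. Special tokens such as [CLS]
    typically carry the empty span (0, 0) and align to no spaCy token."""
    alignment = []
    for ws, we in wp_offsets:
        # Half-open interval overlap: [ws, we) intersects [ts, te).
        overlaps = [
            i for i, (ts, te) in enumerate(spacy_spans) if ws < te and ts < we
        ]
        alignment.append(overlaps)
    return alignment

# "don't": spaCy tokenizes it as do|n't, a wordpiece tokenizer might give don|'t.
print(align_tokens([(0, 3), (3, 5)], [(0, 2), (2, 5)]))  # → [[0, 1], [1]]
```

Because the overlap test works purely on character offsets reported by the tokenizer, it cannot miss or misplace alignments the way string-matching heuristics can when tokenizers normalize or split text differently.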
- Serialize the tokenizer `use_fast` setting (#339).
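As a sketch of where this setting lives: `use_fast` is typically passed through the transformer component's `tokenizer_config` block in the training config (the exact section path depends on how your pipeline names the component):

```ini
[components.transformer.model.tokenizer_config]
use_fast = true
```

With this change the setting is saved with the pipeline, so a reloaded model uses the same tokenizer backend it was trained with.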
- Extend support for `transformers` up to v4.25.x.
- Extend support for `transformers` up to v4.20.x.
- Updates to `HFShim` and `HFWrapper` (#332).
- Extend support for `transformers` up to v4.21.x.
- Updates to `HFShim` (#328).
- Updates to `Transformer.initialize` (#341).
- Extend support for `transformers` up to v4.19.x.
- Skip initialization of the `transformer` if not available, for example if the `transformer` is frozen.
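A frozen transformer can be sketched in the spaCy training config: `frozen_components` keeps the listed components from being updated during training, and `annotating_components` lets a frozen component still set its annotations on each batch so downstream components can use them:

```ini
[training]
frozen_components = ["transformer"]
annotating_components = ["transformer"]
```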