💫 Industrial-strength Natural Language Processing (NLP) in Python
- `TextCatReduce.v1` layer for text classification (#13181).
- `TextCatParametricAttention.v1` layer for text classification (#13201).
- `build` module for creating model packages by default (#13109).
- `benchmark speed` command (#13247).
- `Language.pipe`.
- `Doc`.
- `Tokenizer.explain` for special cases with whitespace.
- `SparseLinear` layer.
- `trf_data` examples and the transformer pipeline design section.

@adrianeboyd, @danieldk, @evornov, @honnibal, @ines, @lise-brinck, @ridge-kimani, @rmitsch, @shadeMe, @svlandeg

- `__all__` fields (#13063).
- `spacy.cli.project` API.
- `Any` comparisons for `Token` and `Span`.
- `spacy-llm` including Azure OpenAI, PaLM, and Mistral support.

@adrianeboyd, @honnibal, @ines, @rmitsch, @svlandeg

This release drops support for Python 3.6 and adds support for Python 3.12.
- `spacy project` commands should run as before; they now use Weasel under the hood.
- `transformers` extra to `spacy-transformers` v1.3 (#13025).
- `--spans-key` option for CLI evaluation with `spacy benchmark accuracy` (#12981).
- `spacy.info` (#12962).
- `spacy.training.example` (#12801).
- `Language.replace_listeners`: Pass the replaced listener and the `tok2vec` pipe to the callback in order to support `spacy-curated-transformers` (#12785).
- `tqdm` with `disable=None` to disable output in non-interactive environments (#12979).

The transformer-based `trf` pipelines have been updated to use our new Curated Transformers library through the Thinc model wrappers and pipeline component from spaCy Curated Transformers.

- `ray` extra.

`spacy project` has a few backwards incompatibilities due to the transition to the standalone library Weasel, which is not as tightly coupled to spaCy. Weasel produces warnings when it detects older spaCy-specific settings in your environment or project config.

- `spacy_version` configuration key has been dropped.
- `check_requirements` configuration key has been dropped due to the deprecation of `pkg_resources`.
- `SPACY_CONFIG_OVERRIDES` environment variable is no longer checked. You can set configuration overrides using `WEASEL_CONFIG_OVERRIDES`.
- `SPACY_PROJECT_USE_GIT_VERSION` environment variable has been dropped.

@adrianeboyd, @bdura, @connorbrinton, @danieldk, @davidberenstein1957, @denizcodeyaa, @eltociear, @evornov, @honnibal, @ines, @jmyerston, @koaning, @magdaaniol, @pdhall99, @ringohoffman, @rmitsch, @senisioi, @shadeMe, @svlandeg, @vinbo8, @wjbmattingly

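Since `SPACY_CONFIG_OVERRIDES` is no longer checked, an existing setup that still exports it may need its value carried over to `WEASEL_CONFIG_OVERRIDES`. The sketch below shows one way to do that from Python before launching a project command. The helper name `migrate_overrides` and the placeholder value are hypothetical and not part of spaCy or Weasel; only the two environment variable names come from the notes above.

```python
import os
from typing import MutableMapping

OLD_VAR = "SPACY_CONFIG_OVERRIDES"   # no longer checked
NEW_VAR = "WEASEL_CONFIG_OVERRIDES"  # checked by Weasel


def migrate_overrides(env: MutableMapping[str, str]) -> bool:
    """Copy the old variable's value to the new one (hypothetical helper).

    Returns True if a value was copied. An existing value under the new
    name is never overwritten.
    """
    old_value = env.get(OLD_VAR)
    if old_value is None or NEW_VAR in env:
        return False
    env[NEW_VAR] = old_value
    return True


# Demonstration on a plain dict so the real environment stays untouched.
# The value is an opaque placeholder, not a statement about override syntax.
demo_env = {OLD_VAR: "<existing overrides>"}
migrated = migrate_overrides(demo_env)
```

To apply the same logic to the running process, call `migrate_overrides(os.environ)` before invoking the project commands.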
- `find-function` CLI for finding locations of registered functions (#12757).
- `spacy[cuda12x]` for `cupy-cuda12x` (#12890).
- `init config` and `train` CLI (#12173).
- `distutils` to `setuptools`/`sysconfig` (#12853).
- `<br>` tags in displaCy.

@adrianeboyd, @afriedman412, @arplusman, @bdura, @connorbrinton, @honnibal, @ines, @it176131, @pmbaumgartner, @rmitsch, @shadeMe, @svlandeg, @thomashacker, @victorialslocum, @x-tabdeveloping

- `span_finder` pipeline component to identify overlapping, unlabeled spans (#12507).
- `spacy evaluate --per-component`, `Language.evaluate(per_component=True)` and `Scorer.score(per_component=True)` (#12540).
- `spancat_singlelabel` in `spacy debug data` CLI (#12749).
- `PhraseMatcher` and `SpanGroup` (#12642, #12714).
- `SpanGroup` spans come from the current doc.

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

Package | UPOS | Parser LAS | NER F |
---|---|---|---|
`sl_core_news_sm` | 96.9 | 82.1 | 62.9 |
`sl_core_news_md` | 97.6 | 84.3 | 73.5 |
`sl_core_news_lg` | 97.7 | 84.3 | 79.0 |
`sl_core_news_trf` | 99.0 | 91.7 | 90.0 |

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.
The Danish pipeline `da_core_news_trf` has been updated to use `vesteinn/DanskBERT` with performance improvements across the board.
`SpanGroup` spans are now required to be from the same doc. When initializing a `SpanGroup`, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr

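Code that collects candidate spans from several docs can avoid tripping the same-doc check by filtering before it builds the group. The following is a minimal sketch, assuming a blank English pipeline (tokenizer only, no model download); the example sentences and the group name `orgs` are made up for illustration.

```python
import spacy
from spacy.tokens import SpanGroup

# Blank pipeline: provides a tokenizer but no trained components.
nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")
other_doc = nlp("Another document entirely.")

# Candidate spans, one of which belongs to a different doc.
candidates = [doc[3:6], other_doc[0:2]]

# Keep only spans whose parent doc is the doc the group is created for.
same_doc_spans = [span for span in candidates if span.doc is doc]

group = SpanGroup(doc, name="orgs", spans=same_doc_spans)
```

Filtering on `span.doc is doc` mirrors the condition described above, so the group only ever receives spans that refer to its own doc.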
This bug fix release is primarily to address Pydantic incompatibility with `typing_extensions>=4.6.0`.

- `spancat`, in particular on GPU (~10x-30x faster) (#12577).
- `typing_extensions` requirement due to Pydantic incompatibility with `typing_extensions>=4.6.0`.
- `#egg` from download URLs due to future deprecation in `pip`.

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

- `spancat`, in particular on GPU (~10x-30x faster) (#12577).
- `>+`, `>-`, `>++`, `>--`) for the dependency matcher (#12528).
- `doc.spans` for displaCy output in `spacy benchmark accuracy` / `spacy evaluate` (#12575).
- `MorphAnalysis.get(default=)` argument for user-provided default values similar to `dict` (#12545).
- `#egg` from download URLs due to future deprecation in `pip`.

@adrianeboyd, @andyjessen, @bdura, @davidberenstein1957, @diyclassics, @honnibal, @ines, @kadarakos, @KennethEnevoldsen, @ljvmiranda921, @moxley01, @royashcenazi, @svlandeg, @tanloong, @victorialslocum
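
The `default` argument of `MorphAnalysis.get` listed above can be sketched as follows. This assumes a spaCy version that includes the change (#12545); the feature names and the fallback value `"Unknown"` are illustrative only.

```python
from spacy.tokens import MorphAnalysis
from spacy.vocab import Vocab

# Build a morphological analysis directly from a feature dict.
morph = MorphAnalysis(Vocab(), {"Case": "Nom", "Number": "Sing"})

# A field that is present returns its list of values.
present = morph.get("Number")

# A missing field without a default returns an empty list.
missing = morph.get("Tense")

# A missing field with a user-provided default returns that default,
# much like dict.get(key, default).
fallback = morph.get("Tense", default=["Unknown"])
```

Passing a list as the default keeps the return type consistent with the list that is returned for fields that are present.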