💫 Industrial-strength Natural Language Processing (NLP) in Python
spacy pretrain (#12435).
model-last.bin for spacy pretrain (#12459).
Span input for displacy.parse_deps (#12477).
cupy install extras.
Span.sents.
spancat_singlelabel.
Span.sents when the final sentence is the last token in a Doc.
Span.kb_id and Span.id strings in Doc and DocBin serialization.
@adrianeboyd, @BLKSerene, @honnibal, @ines, @kadarakos, @prajakta-1527, @rmitsch, @shadeMe, @sloev, @svlandeg, @thomashacker, @willfrey
💥 We'd love to hear more about your experience with spaCy! Take our survey here.
spancat_singlelabel pipeline component for multi-class and non-overlapping span classification. The spancat_singlelabel component predicts at most one label for each suggested span and adds a new setting allow_overlap to restrict the output to non-overlapping spans (#11365).
transformer + CNN for efficient GPU textcat with spacy init config (#11900).
spacy debug data (#11419).
(>+, >-, <+, <-) (#12334).
spacy.PlainTextCorpusReader.v1 for plain text input (#12122).
alignment_mode and span_id to Span.char_span() (#12145, #12196).
top_k>1 in trainable lemmatizer.
test_cli_find_threshold() test more robust.
registry.find().
Matcher patterns with extension attributes.
grc to languages with lexeme norms in spacy-lookups-data.
KnowledgeBase instances configurable.
auto_select_port.
InMemoryLookupKB.is_empty.
Lexeme.orth and Lexeme.lower.
PretrainVectors.
pkg_resources.
@adrianeboyd, @andyjessen, @danieldk, @essenmitsosse, @honnibal, @ines, @itssimon, @kadarakos, @kwhumphreys, @ljvmiranda921, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @shadeMe, @svlandeg, @tanloong, @thomashacker, @victorialslocum
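The spancat_singlelabel component and its allow_overlap setting from this release can be tried out with a short sketch like the one below. It assumes spaCy v3.5.1 or later is installed; the blank English pipeline, the "sc" spans key, and the PERSON/ORG labels are illustrative choices, not taken from the release notes. The component is only added and configured here; it would still need training before it predicts anything useful.

```python
import spacy

# Blank pipeline used purely for illustration (no trained weights).
nlp = spacy.blank("en")

# Add the single-label span categorizer and ask it to keep only
# non-overlapping spans in its output.
spancat = nlp.add_pipe(
    "spancat_singlelabel",
    config={"spans_key": "sc", "allow_overlap": False},
)

# Hypothetical labels; each suggested span gets at most one of them.
spancat.add_label("PERSON")
spancat.add_label("ORG")

print(nlp.pipe_names)
print(nlp.get_pipe_config("spancat_singlelabel")["allow_overlap"])
```

With allow_overlap left at its default, overlapping spans may be kept in the output; setting it to False restricts the output as described above.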
apply CLI command to annotate new documents with a trained pipeline (#11376).
benchmark CLI command to benchmark pipelines. The new benchmark speed subcommand measures the speed of a pipeline, and the benchmark accuracy subcommand is a new alias for evaluate (#11902).
find-threshold CLI command to identify an optimal threshold for classification models (#11280).
FUZZY Matcher operator for fuzzy matches based on Levenshtein edit distance. In addition, the FUZZY and REGEX operators are now supported in combination with IN/NOT_IN (#11359).
typer v0.7.x (#11720), mypy 0.990 (#11801) and typing_extensions v4.4.x (#12036).
spacy.ConsoleLogger.v3 with expanded progress tracking (#11972).
textcat with spacy.textcat_scorer.v2 (#11696 and #11971) and spacy.textcat_multilabel_scorer.v2 (#11820).
InMemoryLookupKB (#11268).
before_update callback that is invoked at the start of each training step (#11739).
SpanGroup (#11380).
displacy.serve when the default port is in use (#11948).
tok2vec version (#11618).
tok2vec or transformer layer.
textcat.
Vocab.to_disk respects the exclude setting for lookups and vectors.
SpanGroup and Span objects.
The following changes may require you to update code that is using the relevant functionality:
textcat or textcat_multilabel model: ensure that values are 0.0 or 1.0 as explained in the docs.
KnowledgeBase is now an abstract class. Call the constructor of the new InMemoryLookupKB instead when you want to use spaCy's default KB implementation. If you've written a custom KB that inherits from KnowledgeBase, you'll need to implement its abstract methods, or alternatively inherit from InMemoryLookupKB instead.
The following changes may influence the output of your language pipeline or trained models:
pymorphy3 (#11345, #11811).
tok2vec defaults in all components (#11618).
textcat and textcat_multilabel components (#11698).
textcat and textcat_multilabel to fix a bug related to threshold for textcat and to make it possible to score multiple textcat/textcat_multilabel components in a single pipeline with custom scorers. If no custom scorers are used, the cat_p/r/f scores will now only reflect the final component's labels and performance (#11696, #11820).
token_acc score to report the intended measure (# correct tokens / # predicted tokens, the same as in spaCy v2). The token_acc scores for v3.5 will be lower for the same performance because they were incorrectly inflated in v3.0-v3.4. The token_p/r/f scores should remain unchanged (#12073).
The following functionality will be changed in the near future, so it's best to start updating your scripts now to make them more generic:
master branch to main.
IS_SPACE as a tok2vec feature for tagger and morphologizer components to improve tagging of non-whitespace vs. whitespace tokens.
spacy-transformers v1.2, which uses the exact alignment from tokenizers for fast tokenizers instead of the heuristic alignment from spacy-alignments. For all trained pipelines except ja_core_news_trf, the alignments between spaCy tokens and transformer tokens may be slightly different. More details about the spacy-transformers changes in the v1.2.0 release notes.
biluo_to_iob and iob_to_biluo functions.
@aaronzipp, @adrianeboyd, @albertvillanova, @ArchiDevil, @cfuerbachersparks, @damian-romero, @danieldk, @darigovresearch, @DSLituiev, @essenmitsosse, @gremur, @honnibal, @ines, @jmyerston, @JosPolfliet, @kadarakos, @koaning, @kwhumphreys, @ljvmiranda921, @MarcoGorelli, @orglce, @pmbaumgartner, @polm, @richardpaulhudson, @rmitsch, @ryndaniels, @shadeMe, @svlandeg, @thomashacker, @TrellixVulnTeam, @wannaphong, @zhiiw, @zrpxx
This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.
smart_open requirement and update deprecated options.
spacy init config --gpu for environments without spacy-transformers.
@adrianeboyd, @honnibal, @ines, @polm, @svlandeg
This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.
spancat for docs with zero suggestions.
smart_open requirement and update deprecated options.
spacy init config --gpu for environments without spacy-transformers.
@adrianeboyd, @honnibal, @ines, @polm, @svlandeg
This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.
spancat for docs with zero suggestions.
smart_open requirement and update deprecated options.
spacy init config --gpu for environments without spacy-transformers.
@adrianeboyd, @honnibal, @ines, @polm, @svlandeg
This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.
precomputable_biaffine by avoiding concatenation.
spancat for docs with zero suggestions.
smart_open requirement and update deprecated options.
spacy init config --gpu for environments without spacy-transformers.
EditTreeLemmatizer.
@adrianeboyd, @danieldk, @honnibal, @ines, @polm, @svlandeg
This bug fix release is primarily to avoid deprecation warnings and future incompatibility with NumPy v1.24+.
spancat for docs with zero suggestions.
smart_open requirement and update deprecated options.
spacy init config --gpu for environments without spacy-transformers.
EditTreeLemmatizer.
@adrianeboyd, @danieldk, @honnibal, @ines, @polm, @svlandeg
EntityLinker.
Doc.to_json() for attributes set by getters.
pipeline_package.load().
spacy project requirements checks for unsupported specifiers and requirements lines.
spacy.load(disable=) that could enable currently disabled components.
@aaronzipp, @adrianeboyd, @honnibal, @ines, @polm, @rmitsch, @ryndaniels, @svlandeg, @thomashacker