spaCy Versions

💫 Industrial-strength Natural Language Processing (NLP) in Python

v3.7.4

2 months ago

✨ New features and improvements

  • Improve NumPy 2.0 compatibility (#13103).
  • Added language extensions for Faroese and Norwegian Nynorsk (#13116).
  • Add new TextCatReduce.v1 layer for text classification (#13181).
  • Add new TextCatParametricAttention.v1 layer for text classification (#13201).
  • Use build module for creating model packages by default (#13109).
  • Add support for code loading to the benchmark speed command (#13247).
  • Extend lexical attributes for English with more numericals (#13106).
  • Warn about reloading dependencies after downloading models (#13081).

🔴 Bug fixes

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @danieldk, @evornov, @honnibal, @ines, @lise-brinck, @ridge-kimani, @rmitsch, @shadeMe, @svlandeg

v3.7.2

6 months ago

✨ New features and improvements

  • Update __all__ fields (#13063).

🔴 Bug fixes

  • #13035: Remove Pathy requirement.
  • #13053: Restore spacy.cli.project API.
  • #13057: Support Any comparisons for Token and Span.

📖 Documentation and examples

  • Many updates for spacy-llm including Azure OpenAI, PaLM, and Mistral support.
  • Various documentation corrections.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @rmitsch, @svlandeg

v3.7.1

6 months ago

🔴 Bug fixes

  • Revert lazy loading of CLI module for spacy.info to fix availability of spacy.cli following import spacy (#13040).

👥 Contributors

@adrianeboyd, @honnibal, @ines, @svlandeg

v3.7.0

6 months ago

This release drops support for Python 3.6 and adds support for Python 3.12.

✨ New features and improvements

  • Add support for Python 3.12 (#12979).
  • Use the new library Weasel for spaCy projects functionality (#12769).
    • All spacy project commands should run as before; they now use Weasel under the hood.
    • ⚠️ Remote storage is not yet supported for Python 3.12. Use Python 3.11 or earlier for remote storage.
  • Extend to Thinc v8.2 (#12897).
  • Extend transformers extra to spacy-transformers v1.3 (#13025).
  • Support registered vectors (#12492).
  • Add --spans-key option for CLI evaluation with spacy benchmark accuracy (#12981).
  • Load the CLI module lazily for spacy.info (#12962).
  • Add type stubs for spacy.training.example (#12801).
  • Warn for unsupported pattern keys in dependency matcher (#12928).
  • Language.replace_listeners: Pass the replaced listener and the tok2vec pipe to the callback in order to support spacy-curated-transformers (#12785).
  • Always use tqdm with disable=None to disable output in non-interactive environments (#12979).
  • Language updates:
    • Add left and right pointing angle brackets as punctuation to ancient Greek (#12829).
    • Update example sentences for Turkish (#12895).
  • Package setup updates:
    • Update NumPy build constraints for NumPy 1.25+ (#12839). For Python 3.9+, it is no longer necessary to set build constraints while building binary wheels.
    • Refactor Cython profiling in order to disable profiling for Python 3.12 in the package setup, since Cython does not currently support profiling for Python 3.12 (#12979).
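
To illustrate the tqdm item above: tqdm documents disable=None as "disable on non-TTY". The sketch below mimics that decision using only the standard library. It is not tqdm's or spaCy's code, and the helper name progress_disabled is hypothetical.

```python
import io
import sys


def progress_disabled(disable, stream=sys.stderr):
    """Resolve a tqdm-style `disable` flag (hypothetical helper, not tqdm's code)."""
    if disable is None:
        # None means "decide automatically": turn the bar off only when the
        # stream reports that it is not attached to a terminal.
        return hasattr(stream, "isatty") and not stream.isatty()
    # An explicit True/False is respected as given.
    return bool(disable)


# An in-memory buffer is never a TTY, so automatic mode disables output.
print(progress_disabled(None, io.StringIO()))
# An explicit False keeps the bar on even for a non-TTY stream.
print(progress_disabled(False, io.StringIO()))
```

In a log file or CI job the stream is usually not a terminal, so the automatic mode keeps progress bars out of captured output.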

📦 Trained pipelines updates

The transformer-based trf pipelines have been updated to use our new Curated Transformers library through the Thinc model wrappers and pipeline component from spaCy Curated Transformers.

⚠️ Backwards incompatibilities

  • Drop support for Python 3.6.
  • Drop mypy checks for Python 3.7.
  • Remove ray extra.
  • spacy project has a few backwards incompatibilities due to the transition to the standalone library Weasel, which is not as tightly coupled to spaCy. Weasel produces warnings when it detects older spaCy-specific settings in your environment or project config.
    • Support for the spacy_version configuration key has been dropped.
    • Support for the check_requirements configuration key has been dropped due to the deprecation of pkg_resources.
    • The SPACY_CONFIG_OVERRIDES environment variable is no longer checked. You can set configuration overrides using WEASEL_CONFIG_OVERRIDES.
    • Support for SPACY_PROJECT_USE_GIT_VERSION environment variable has been dropped.
    • Error codes are now Weasel-specific and do not follow spaCy error codes.
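
For setups that still export SPACY_CONFIG_OVERRIDES, one possible migration is to copy the value to the new variable name before invoking project commands. This is a minimal sketch: the helper name migrate_overrides and the override string are hypothetical placeholders, and it assumes the value format carries over unchanged.

```python
OLD_NAME = "SPACY_CONFIG_OVERRIDES"
NEW_NAME = "WEASEL_CONFIG_OVERRIDES"


def migrate_overrides(env):
    """Return a copy of `env` with the old override variable mirrored to the new name."""
    updated = dict(env)
    # Only copy when the new variable is not already set, so an explicit
    # WEASEL_CONFIG_OVERRIDES always takes precedence.
    if OLD_NAME in updated and NEW_NAME not in updated:
        updated[NEW_NAME] = updated[OLD_NAME]
    return updated


# Placeholder override string, purely for illustration.
example = migrate_overrides({OLD_NAME: "--vars.example 1"})
print(example[NEW_NAME])
```

To apply it to the current process, something like os.environ.update(migrate_overrides(os.environ)) would work after importing os.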

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @bdura, @connorbrinton, @danieldk, @davidberenstein1957, @denizcodeyaa, @eltociear, @evornov, @honnibal, @ines, @jmyerston, @koaning, @magdaaniol, @pdhall99, @ringohoffman, @rmitsch, @senisioi, @shadeMe, @svlandeg, @vinbo8, @wjbmattingly

v3.6.1

8 months ago

✨ New features and improvements

  • Allow Pydantic v2 using transitional v1 support (#12888).
  • Add find-function CLI for finding locations of registered functions (#12757).
  • Add extra spacy[cuda12x] for cupy-cuda12x (#12890).
  • Extend tests for init config and train CLI (#12173).
  • Switch from distutils to setuptools/sysconfig (#12853).

🔴 Bug fixes

  • #12817: Escape annotated HTML tags in displaCy span renderer.
  • #12857: Display model's full base version string in incompatibility warning.
  • #12882: Update <br> tags in displaCy.
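
The escaping fix (#12817) concerns text that itself contains HTML tags. The sketch below shows the general idea with the standard library's html.escape. It is not displaCy's renderer, and the render_span helper and its markup are hypothetical.

```python
from html import escape


def render_span(text, label):
    """Wrap a span in illustrative markup, escaping anything taken from the document."""
    # Escaping turns "<" and ">" into entities, so tags inside the text are
    # displayed literally instead of being interpreted by the browser.
    return f'<mark>{escape(text)} <span class="label">{escape(label)}</span></mark>'


print(render_span("<b>bold</b>", "TAG"))
```

Without the escape calls, the <b> tag from the input would be rendered as markup and could distort the visualization.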

📖 Documentation and examples

  • Various documentation corrections and updates.
  • New additions to spaCy Universe:

👥 Contributors

@adrianeboyd, @afriedman412, @arplusman, @bdura, @connorbrinton, @honnibal, @ines, @it176131, @pmbaumgartner, @rmitsch, @shadeMe, @svlandeg, @thomashacker, @victorialslocum, @x-tabdeveloping

v3.6.0

9 months ago

✨ New features and improvements

  • NEW: span_finder pipeline component to identify overlapping, unlabeled spans (#12507).
  • Language updates:
    • Add initial support for Malay (#12602).
    • Update Latin defaults to support noun chunks, update lexical/tokenizer defaults and add example sentences (#12538).
  • Add option to return scores separately keyed by component name with spacy evaluate --per-component, Language.evaluate(per_component=True) and Scorer.score(per_component=True) (#12540).
  • Support custom token/lexeme attribute for vectors (#12625).
  • Support spancat_singlelabel in spacy debug data CLI (#12749).
  • Typing updates for PhraseMatcher and SpanGroup (#12642, #12714).

🔴 Bug fixes

  • #12569: Require that all SpanGroup spans come from the current doc.

📦 Trained pipelines updates

We have added new pipelines for Slovenian that use the trainable lemmatizer and floret vectors.

Package           UPOS  Parser LAS  NER F
sl_core_news_sm   96.9  82.1        62.9
sl_core_news_md   97.6  84.3        73.5
sl_core_news_lg   97.7  84.3        79.0
sl_core_news_trf  99.0  91.7        90.0
  • 🙏 Special thanks to @orglce for help with the new pipelines!

The English pipelines have been updated to improve handling of contractions with various apostrophes and to lemmatize "get" as a passive auxiliary.

The Danish pipeline da_core_news_trf has been updated to use vesteinn/DanskBERT with performance improvements across the board.

⚠️ Backwards incompatibilities

  • SpanGroup spans are now required to be from the same doc. When initializing a SpanGroup, there is a new check to verify that all added spans refer to the current doc. Without this check, it was possible to run into string store or other errors.

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @bdura, @danieldk, @davidberenstein1957, @diyclassics, @essenmitsosse, @honnibal, @ines, @isabelizimm, @jmyerston, @kadarakos, @KennethEnevoldsen, @khursani8, @ljvmiranda921, @rmitsch, @shadeMe, @svlandeg, @tomaarsen, @victorialslocum, @vin-ivar, @ZiadAmerr

v3.5.4

9 months ago

✨ New features and improvements

  • Extend Typer support to v0.9 (#12631).

🔴 Bug fixes

  • #12701: Fix issues with component names and listeners for sourced components.
  • #12623: Support overrides for registered functions in configs.

👥 Contributors

@adrianeboyd, @bdura, @honnibal, @ines, @svlandeg

v3.2.6

10 months ago

This bug fix release is primarily to address Pydantic incompatibility with typing_extensions>=4.6.0.

✨ New features and improvements

  • Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).

🔴 Bug fixes

  • Add typing_extensions requirement due to Pydantic incompatibility with typing_extensions>=4.6.0.
  • Remove #egg from download URLs due to future deprecation in pip.
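
To check whether an environment falls in the typing_extensions>=4.6.0 range mentioned above, a rough version comparison can be done with the standard library alone. This is a sketch under stated assumptions: the helper names are hypothetical, and the parser handles only simple dotted versions (a real project would more likely use the packaging library).

```python
from importlib import metadata
from itertools import takewhile


def release_tuple(version, width=3):
    """Parse the leading numeric components of a dotted version string."""
    numbers = []
    for piece in version.split(".")[:width]:
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        numbers.append(int(digits))
    # Pad with zeros so that "4.6" compares equal to "4.6.0".
    return tuple(numbers + [0] * (width - len(numbers)))


def at_or_above(version, threshold=(4, 6, 0)):
    """True if `version` is in the range the release notes flag as incompatible."""
    return release_tuple(version) >= threshold


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("typing_extensions")
if found is not None:
    print(found, at_or_above(found))
```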

👥 Contributors

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

v3.3.3

10 months ago

This bug fix release is primarily to address Pydantic incompatibility with typing_extensions>=4.6.0.

✨ New features and improvements

  • Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).

🔴 Bug fixes

  • Add typing_extensions requirement due to Pydantic incompatibility with typing_extensions>=4.6.0.
  • Remove #egg from download URLs due to future deprecation in pip.

👥 Contributors

@adrianeboyd, @honnibal, @ines, @kadarakos, @svlandeg

v3.5.3

11 months ago

✨ New features and improvements

  • Huge speed improvements for spancat, in particular on GPU (~10x-30x faster) (#12577).
  • Improve speed for child operators (>+, >-, >++, >--) for the dependency matcher (#12528).
  • Improve loading speed for tokenizers with a large number of exceptions (#12553).
  • Support doc.spans for displaCy output in spacy benchmark accuracy / spacy evaluate (#12575).
  • Add MorphAnalysis.get(default=) argument for user-provided default values similar to dict (#12545).
  • Only perform vectors checks during initialization if there are sourced components (#12607).

🔴 Bug fixes

  • #12567: Remove #egg from download URLs due to future deprecation in pip.
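
The #egg part of a download URL is a URL fragment, so dropping it leaves the address of the archive unchanged. A minimal sketch with the standard library follows. The URL is a made-up placeholder and strip_fragment is a hypothetical helper, not the function spaCy uses.

```python
from urllib.parse import urldefrag


def strip_fragment(url):
    """Return `url` without its trailing "#..." fragment, if any."""
    return urldefrag(url).url


# Placeholder URL for illustration only.
old_style = "https://example.com/packages/demo_model-1.0.0.tar.gz#egg=demo_model==1.0.0"
print(strip_fragment(old_style))
```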

📖 Documentation and examples

👥 Contributors

@adrianeboyd, @andyjessen, @bdura, @davidberenstein1957, @diyclassics, @honnibal, @ines, @kadarakos, @KennethEnevoldsen, @ljvmiranda921, @moxley01, @royashcenazi, @svlandeg, @tanloong, @victorialslocum