Stanza Versions

Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages


3 weeks ago

Add an Old English pipeline, improve the handling of MWT for cases that should be easy, and improve the memory management of our usage of transformers with adapters.

Old English

MWT improvements

Peft memory management

Other bugfixes and minor upgrades

Other upgrades


2 months ago

Integrating PEFT into several different annotators

We integrate PEFT into our training pipeline for several different models. This greatly reduces the size of models with finetuned transformers, letting us make the finetuned versions of those models the default_accurate model.

The biggest gains observed are with the constituency parser and the sentiment classifier.

Previously, the default_accurate package used transformers where the head was trained but the transformer itself was not finetuned.
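As a rough illustration of why adapter-based finetuning shrinks the saved models, here is a back-of-the-envelope sketch. It assumes the adapters are LoRA-style low-rank updates; the matrix shape and rank are illustrative numbers, not Stanza's actual settings, and the function names are made up for this example. Finetuning a d x k weight matrix directly changes d*k values, whereas a rank-r adapter only stores r*(d+k).

```python
def full_finetune_params(d: int, k: int) -> int:
    """Values that change when a d x k weight matrix is finetuned directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Values stored by a rank-r LoRA adapter: B is d x r and A is r x k."""
    return r * (d + k)


# Illustrative numbers only: one 1024 x 1024 projection matrix, rank 8.
d, k, r = 1024, 1024, 8
full = full_finetune_params(d, k)
adapter = lora_params(d, k, r)
ratio = adapter / full
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {ratio:.2%}")
```

Only the adapter weights (plus the task head) need to be shipped alongside the shared base transformer, which is what would make a finetuned model practical as a default download.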

Model improvements



Additional 1.8.1 Bugfixes


5 months ago

Neural coref processor added!

Conjunction-Aware Word-Level Coreference Resolution original implementation:

Updated form of Word-Level Coreference Resolution original implementation:

If you use Stanza's coref module in your work, please be sure to cite both of the above papers.

Special thanks to @vdobrovolskii, who graciously agreed to allow integration of his work into Stanza, to @KarelDO for his support of his training enhancement, and to @Jemoka for the LoRA PEFT integration, which makes finetuning the transformer-based coref annotator much less expensive.

Currently there is one model provided, a transformer-based English model trained on OntoNotes. The provided model is currently based on Electra-Large, as that is more harmonious with the rest of our transformer architecture. When we have LoRA integration with POS, depparse, and the other processors, we will revisit the question of which transformer is most appropriate for English.

Future work includes ZH and AR models from OntoNotes, additional language support from UD-Coref, and lower-cost non-transformer models.

Interface change: English MWT

English now has an MWT model by default. Text such as won't is now marked as a single token, split into two words, will and not. Previously it was expected to be tokenized into two pieces, but the Sentence object containing that text would not have a single Token object connecting the two pieces. See and for more information.

Code that used to operate with for word in sentence.words will continue to work as before, but for token in sentence.tokens will now produce one object for MWT such as won't, cannot, Stanza's, etc.

Pipeline creation will not change, as MWT is automatically (but not silently) added at Pipeline creation time if the language and package include MWT.
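The difference between iterating over tokens and over words can be sketched as follows. The helper token_word_pairs is a made-up name for this example; it relies only on the sentence.tokens, token.text, token.words, and word.text attributes. To keep the sketch runnable without downloading models, it is exercised on a hand-built stand-in sentence rather than real Pipeline output.

```python
from types import SimpleNamespace


def token_word_pairs(sentence):
    """Return (token text, [word texts]) for each token in a sentence.

    Relies only on sentence.tokens, token.text, token.words, and word.text.
    """
    return [(token.text, [word.text for word in token.words])
            for token in sentence.tokens]


def make_token(text, word_texts):
    """Build a stand-in token (hypothetical helper, not part of Stanza)."""
    return SimpleNamespace(text=text,
                           words=[SimpleNamespace(text=w) for w in word_texts])


# Hand-built stand-in for the sentence "I won't go". With Stanza installed,
# a real sentence would come from something like:
#   stanza.Pipeline("en", processors="tokenize,mwt")(text).sentences[0]
sentence = SimpleNamespace(tokens=[
    make_token("I", ["I"]),
    make_token("won't", ["will", "not"]),  # one token, two words
    make_token("go", ["go"]),
])
sentence.words = [word for token in sentence.tokens for word in token.words]

pairs = token_word_pairs(sentence)
print(pairs)
print(len(sentence.tokens), "tokens,", len(sentence.words), "words")
```

Code that only needs the syntactic words can keep using sentence.words; code that needs to map back to the surface text should go through the tokens.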

Other updates

Updated requirements

  • Support dropped for Python 3.6 and 3.7. The peft module used for finetuning the transformer used in the coref processor does not support those versions.
  • Added peft as an optional dependency to transformer-based installations.
  • Added networkx as a dependency for reading enhanced dependencies. Added toml as a dependency for reading the coref config.


7 months ago

v1.6.1 patches a bug in the Arabic POS tagger.

We also mark Python 3.11 as supported in the classifiers. This will be the last release that supports Python 3.6.

Multiple model levels

The package parameter for building the Pipeline now has three default settings:

  • default, the same as before, where POS, depparse, and NER use the charlm, but lemma does not
  • default_fast, where POS and depparse are built without the charlm, making them substantially faster on CPU. Some languages currently have non-charlm NER as well
  • default_accurate, where the lemmatizer also uses the charlm, and other models use transformers if we have one for that language. Suggestions for more transformers to use are welcome

Furthermore, package dictionaries are now provided for each UD dataset, covering the default versions of the models for that dataset, although we do not further break that down into _fast and _accurate versions for each UD dataset.
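Selecting one of the three levels might look like the sketch below. The helper pipeline_kwargs and its short level names are made up for this example, and the underscore spelling of the package names (default_fast, default_accurate) is an assumption based on recent Stanza releases, so check it against your installed version. The actual Pipeline call is left as a comment so the sketch runs without Stanza or downloaded models.

```python
def pipeline_kwargs(lang: str, level: str = "default") -> dict:
    """Build keyword arguments for stanza.Pipeline for one of the three levels."""
    packages = {
        "default": "default",
        "fast": "default_fast",          # assumed spelling, see note above
        "accurate": "default_accurate",  # assumed spelling, see note above
    }
    if level not in packages:
        raise ValueError(
            f"unknown level {level!r}; expected one of {sorted(packages)}")
    return {"lang": lang, "package": packages[level]}


kwargs = pipeline_kwargs("en", "accurate")
print(kwargs)

# With Stanza installed and the models downloaded:
#   import stanza
#   nlp = stanza.Pipeline(**kwargs)
```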


addresses and

Multiple output heads for one NER model

The NER models now can learn multiple output layers at once.

Theoretically this could be used to save a bit of time on the encoder while tagging multiple classes at once, but the main use case was to crosstrain the OntoNotes model on the WorldWide English newswire data we collected. The effect is that the model learns to incorporate some named entities from outside the standard OntoNotes vocabulary into the main 18 class tagset, even though the WorldWide training data is only 8 classes.

Results of running the OntoNotes model, with charlm but not transformer, on the OntoNotes and WorldWide test sets:

model                              OntoNotes  WorldWide
original ontonotes on worldwide:   88.71      69.29
simplify-separate                  88.24      75.75
simplify-connected                 88.32      75.47

We also produced combined models for nocharlm and with Electra as the input encoding. The new English NER models are the packages ontonotes-combined_nocharlm, ontonotes-combined_charlm, and ontonotes-combined_electra-large.

Future plans include using multiple NER datasets for other models as well.

Other features



8 months ago


depparse can have transformer as an embedding

Lemmatizer can remember (word, POS) pairs it has seen before, enabled with a flag

Scoring scripts for Flair and spaCy NER models (requires the appropriate packages, of course)

SceneGraph connection for the CoreNLP client

Update constituency parser to reduce the learning rate on plateau. Fiddling with the learning rates significantly improves performance

Tokenize [] based on () rules if the original dataset doesn't have [] in it

Attempt to finetune the charlm when building models (have not found effective settings for this yet)

Add the charlm to the lemmatizer - this will not be the default, since it is slower, but it is more accurate
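The reduce-on-plateau schedule mentioned above for the constituency parser can be sketched in plain Python. This is a generic illustration, not the parser's actual code: the class name PlateauLR and all of its parameters are made up for this example. PyTorch ships a comparable scheduler, torch.optim.lr_scheduler.ReduceLROnPlateau.

```python
class PlateauLR:
    """Cut the learning rate when a dev score stops improving (higher is better)."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 2,
                 min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor      # multiply lr by this on each reduction
        self.patience = patience  # non-improving epochs tolerated before a cut
        self.min_lr = min_lr
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score: float) -> float:
        """Record one epoch's dev score and return the learning rate to use next."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauLR(lr=1.0, factor=0.5, patience=1)
dev_scores = [0.80, 0.85, 0.85, 0.84, 0.86, 0.86, 0.86]
history = [scheduler.step(score) for score in dev_scores]
print(history)
```

The rate is halved only after the dev score has failed to improve for more than patience consecutive epochs, so a single noisy epoch does not trigger a cut.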


Forgot to include the lemmatizer in CoreNLP 4.5.3, now in 4.5.4

prepare_ner_dataset was always creating an Armenian pipeline, even for non-Armenian languages

Fix an empty bulk_process throwing an exception

Unroll the recursion in the Tarjan part of the Chuliu-Edmonds algorithm - should remove stack overflow errors
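To show what unrolling the recursion in Tarjan's algorithm involves, here is a generic, non-recursive strongly-connected-components routine. It is a sketch of the technique only, not the code used in Stanza's Chu-Liu/Edmonds implementation; the function name and the dict-of-lists graph format are assumptions made for this example. Each recursive call is replaced by an explicit (node, successor iterator) frame on a work stack.

```python
def tarjan_scc(graph):
    """Strongly connected components of a directed graph, without recursion.

    graph maps each node to a list of its successors.
    """
    index = {}      # discovery order of each node
    lowlink = {}    # smallest index reachable from the node's DFS subtree
    on_stack = set()
    scc_stack = []
    components = []
    counter = 0

    for root in graph:
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        scc_stack.append(root)
        on_stack.add(root)
        # Each frame holds the state a recursive call would keep in locals.
        work = [(root, iter(graph.get(root, ())))]
        while work:
            node, successors = work[-1]
            descended = False
            for nxt in successors:
                if nxt not in index:
                    # "Recursive call": push a frame for the unvisited successor.
                    index[nxt] = lowlink[nxt] = counter
                    counter += 1
                    scc_stack.append(nxt)
                    on_stack.add(nxt)
                    work.append((nxt, iter(graph.get(nxt, ()))))
                    descended = True
                    break
                if nxt in on_stack:
                    lowlink[node] = min(lowlink[node], index[nxt])
            if descended:
                continue
            # All successors handled: this is where the recursive call returns.
            work.pop()
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[node])
            if lowlink[node] == index[node]:
                component = []
                while True:
                    member = scc_stack.pop()
                    on_stack.remove(member)
                    component.append(member)
                    if member == node:
                        break
                components.append(component)
    return components


graph = {0: [1], 1: [2], 2: [0, 3], 3: [4], 4: [3], 5: []}
components = tarjan_scc(graph)
print(components)

# A chain far longer than Python's default recursion limit still works.
n = 10_000
chain = {i: [i + 1] for i in range(n - 1)}
chain[n - 1] = []
chain_components = tarjan_scc(chain)
print(len(chain_components))
```

Because the DFS state lives in a heap-allocated list rather than the call stack, the depth of the graph no longer limits what can be processed.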

Minor updates

Put NER and POS scores on one line to make it easier to grep for:

Switch all pretrains to use a name which indicates their source, rather than the dataset they are used for: and many others

Pipeline uses torch.no_grad() for a slight speed boost

Generalize save names, which eventually allows for putting transformer, charlm or nocharlm in the save name - this lets us distinguish different complexities of model for constituency, and others for the other models

Add the model's flags to the --help for the run scripts, such as

Remove the dependency on six (thank you @BLKSerene )

New Models

VLSP constituency

VLSP constituency -> tagging

CTB 5.1 constituency

Add support for CTB 9.0, although those models are not distributed yet

Added an Indonesian charlm

Indonesian constituency from ICON treebank

All languages with pretrained charlms now have an option to use that charlm for dependency parsing

French combined models out of GSD, ParisStories, Rhapsodie, and Sequoia

UD 2.12 support


1 year ago

Ssurgeon interface

Headlining this release is the initial release of Ssurgeon, a rule-based dependency graph editing tool. Along with the existing Semgrex integration with CoreNLP, Ssurgeon allows for rewriting of dependencies such as in the UD datasets. More information is in the GURT 2023 paper,

Beyond Ssurgeon, this release includes two other CoreNLP integrations, a long list of bugfixes, a few other minor features, and a long list of constituency parser experiments which were somewhere between "ineffective" and "small improvements" and are available for people to experiment with.

CoreNLP integration:



New models:

Conparser experiments:


1 year ago

Stanza v1.4.2: Minor version bump to improve (python) dependencies


1 year ago

Stanza v1.4.1: Improvements to POS, conparse, and sentiment, Jupyter visualization, and wider language coverage


We improve the quality of the POS, constituency, and sentiment models, add an integration to displaCy, and add new models for a variety of languages.

New NER models

Other new models

Model improvements

Pipeline interface improvements


Improved training tools