🪐 End-to-end NLP workflows from prototype to production
Weasel, previously spaCy projects, lets you manage and share end-to-end workflows for different use cases and domains, and orchestrate training, packaging and serving your custom pipelines. You can start by cloning a pre-defined project template, then adjust it to fit your needs, load in your data, train a pipeline, export it as a Python package, upload your outputs to remote storage and share your results with your team.
⚠️ Weasel project templates require Weasel, which is also included by default with spaCy v3.7+. You can install it from pip with `pip install weasel` or from conda with `conda install weasel -c conda-forge`. Make sure to use a fresh virtual environment. See the `master` branch for the previous version of this repo.
Name | Description |
---|---|
pipelines | Templates for training NLP pipelines with different components on different corpora. |
tutorials | Templates that work through a specific NLP use case end-to-end. |
integrations | Templates showing integrations with third-party libraries and tools for managing your data and experiments, iterating on demos and prototypes and shipping your models into production. |
benchmarks | Templates to reproduce our benchmarks and produce quantifiable results that are easy to compare against other systems or versions of spaCy. |
experimental | Experimental workflows and other cutting-edge stuff to use at your own risk. |
Projects can be used via the `weasel` CLI, or through the `spacy project` alias. To find out more about a command, add `--help`. For detailed instructions, see the Weasel documentation or the spaCy projects usage guide.
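1. Clone the project template: `python -m weasel clone tutorials/ner_fashion_brands`
2. Install the project requirements: `cd ner_fashion_brands`, then `python -m pip install -r requirements.txt`
3. Fetch the assets defined in the `project.yml`: `python -m weasel assets`
4. Run a command defined in the `project.yml`: `python -m weasel run preprocess`
5. Run a workflow of multiple steps in order: `python -m weasel run all`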
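The two `run` invocations above differ in what they target: `run preprocess` names a single command, while `run all` names a workflow, which is an ordered list of commands. The following is a minimal sketch of that lookup, not Weasel's actual implementation. It assumes a `project.yml` that has already been parsed into a dictionary with `commands` and `workflows` keys; the `PROJECT` contents, the script lines and the `resolve` helper are all hypothetical and exist only for illustration. It also leaves out everything else Weasel does, such as tracking dependencies and outputs.

```python
# Hypothetical, trimmed-down stand-in for a parsed project.yml.
# The "commands" and "workflows" keys mirror the project.yml layout;
# the script lines are invented for this example.
PROJECT = {
    "commands": [
        {"name": "preprocess", "script": ["python scripts/preprocess.py"]},
        {"name": "train", "script": ["python -m spacy train configs/config.cfg"]},
        {"name": "evaluate", "script": ["python scripts/evaluate.py"]},
    ],
    "workflows": {"all": ["preprocess", "train", "evaluate"]},
}


def resolve(project, target):
    """Return the script lines that running `target` would execute, in order."""
    # Index the commands by name so workflow steps can be looked up.
    commands = {cmd["name"]: cmd for cmd in project.get("commands", [])}
    workflows = project.get("workflows", {})

    if target in workflows:
        # A workflow expands to its listed commands, in the listed order.
        names = workflows[target]
    elif target in commands:
        # A plain command runs on its own.
        names = [target]
    else:
        raise KeyError(f"{target!r} is neither a command nor a workflow")

    script_lines = []
    for name in names:
        if name not in commands:
            raise KeyError(f"workflow step {name!r} is not a defined command")
        script_lines.extend(commands[name]["script"])
    return script_lines


if __name__ == "__main__":
    print(resolve(PROJECT, "preprocess"))  # one command
    print(resolve(PROJECT, "all"))         # every step of the workflow
```

In this sketch a name is checked against the workflows first and the commands second, so a workflow called `all` always expands to its steps.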
To keep the project templates and their documentation up to date, this repo contains several scripts:
Script | Description |
---|---|
`update_docs.py` | Update all auto-generated docs in the given root. Calls into `spacy project document` and only replaces the auto-generated sections, not any custom content before or after. |
`update_category_docs.py` | Update the auto-generated `README.md` in the category directories listing the available project templates. |
`update_configs.py` | Update and auto-fill all `config.cfg` files included in the repo, similar to `spacy init fill-config`. Can be used to keep the configs up to date with changes in spaCy. |
`update_projects_jsonl.py` | Update the `projects.jsonl` file in the given root. Should be used at the root level of the repo. |