Topic Inference with Zeroshot models
.. contents:: Table of Contents :backlinks: none
Hand-labelled training sets are expensive and time consuming to create usually. Some datasets call for domain expertise (eg: medical/finance datasets etc). Given these factors around costs and inflexibility of hand-labelling it would be nice if there are tools which can help us get started quickly with minimal labelled dataset - enter weak supervision.
But what if you do not have any labelled data at all? is there a way to still label your data automatically in some way? That's where zeroshot_topics might be useful! to help you to be up and running quickly.
zeroshot_topics let's you do exactly that! it leverages the power of zeroshot-classifiers, transformers & knowledge graphs to automatically suggest labels/topics from your text data. all you need to do is point it towards your data.
The algorithm contains, 4 stages:
.. image:: assets/zstm.png
KeyBERT <https://github.com/MaartenGr/KeyBERT>
_. but really any sort of keyword extractor can be used.Note: Currently, this tends to work well on short-texts in general, in the future I intend to experiment and see how we can support long texts as well.
zeroshot_topics is distributed on PyPI <https://pypi.org>
_ as a universal
wheel and is available on Linux/macOS and Windows and supports
Python 3.7+ and PyPy.
.. code-block:: bash
$ pip install zeroshot_topics
.. code-block:: python
from zeroshot_topics import ZeroShotTopicFinder
zsmodel = ZeroShotTopicFinder()
text = """can you tell me anything else okay great tell me everything you know about George_Washington.
he was the first president he was well he I'm trying to well he fought in the Civil_War he was a general
in the Civil_War and chopped down his father's cherry tree when he was a little boy he that's it."""
zsmodel.find_topic(text, n_topic=2)
# Output - Topics: ['War', 'Head Of State']
Some things that i plan to add in the coming days, if there's some interest in this work by the community.
zeroshot_topics is distributed under the terms of
MIT License <https://choosealicense.com/licenses/mit>
_Apache License, Version 2.0 <https://choosealicense.com/licenses/apache-2.0>
_