Zemberek Python Examples Save Abandoned

Zemberek Turkish NLP examples written in Python using the JPype package. Battle-tested in hackathons and r&d projects by many students and researchers across Turkey!

Project README

Zemberek Python Examples

Zemberek Turkish NLP examples written in Python using the JPype package.

Zemberek is a Java-based natural language processing (NLP) tool created for the Turkish language. This repository contains the Python implementations of the official Zemberek examples for learning purposes.

Folder	Description
classification	fastText examples
core	histogram
morphology	stemming, lemmatization, diacritics analysis, POS tag analysis, morphological analysis, word generation, sentence disambiguation, informal word analysis, adding dictionary items
named-entitiy-recognition	on hold
normalization	document correction, noisy text normalization, spell checking
tokenization	sentence boundary detection, turkish tokenization

Requirements

Python 3.6+

Getting Started

Clone this library and cd into it.

$ git clone https://github.com/ozturkberkay/Zemberek-Python-Examples.git
$ cd Zemberek-Python-Examples

Install the required packages. Using virtualenv is highly encouraged!

$ python -m pip install --upgrade pip virtualenv
$ python -m virtualenv .env
$ # Windows: .env\Scripts\activate
$ source .env/bin/activate
$ python -m pip install -r requirements.txt

Download the required Zemberek files:

$ python -m downloader

Optionally, you can manually download all the data and version 0.17.1 of Zemberek distribution from the official Zemberek Drive folder and put the files in the corresponding folders:

 .
 +-- bin
 |   +-- zemberek-full.jar
 +-- data
 |   +-- classification
 |       +-- news-title-category-set
 |       +-- news-title-category-set.lemmas
 |       +-- news-title-category-set.tokenized
 |   +-- dictionaries
 |   +-- lm
 |       +-- lm.2gram.slm
 |   +-- ner
 |   +-- normalization
 |       +-- ascii-map
 |       +-- lookup-from-graph
 |       +-- split

Usage

Run python -m main category.example args.

$ python -m main classification.simple_classification "Fenerbahçe bu maçı galibiyet ile sonlandırdı."
...

    News classification example. Trains a new model if there is no model
    available.

    Args:
        sentence (str): Sentence to classify.

Sentence: Fenerbahçe bu maçı galibiyet ile sonlandırdı.

Item 1: __label__spor 
Score 1: -0.009194993413984776

Item 2: __label__magazin 
Score 2: -6.12613582611084

Item 3: __label__kültür_sanat 
Score 3: -6.226541996002197

Known Bugs

During the model training, fastText will print errors. It still works, just ignore them.

Changelog

2020-12-05
- Automatic downloader for Zemberek files.
- Simple CLI entry-point to run the examples with custom data.
- JPype1 v1.2.0 upgrade. This should fix some memory leak issues.
- Code quality improvements.
- Fixes for broken links.
2019-10-29
- Zemberek v0.17.1 upgrade.
- JPype1 v0.7.0 upgrade.
- Code style changes.
- Bug-fixes.
- License is now the same with Zemberek (Apache v2.0).
2018-12-01
- Classification, morphology, normalization and tokenization examples.

Open Source Agenda is not affiliated with "Zemberek Python Examples" Project. README Source: ozturkberkay/Zemberek-Python-Examples

Stars

117

Open Issues

Last Commit

2 years ago

License

Apache-2.0

Open Source Agenda Badge

<a href="https://www.opensourceagenda.com/projects/zemberek-python-examples"><img src="https://www.opensourceagenda.com/projects/zemberek-python-examples/reviews/badge.svg" alt="Open Source Agenda"></a>

Submit Review Review Your Favorite Project

Submit Resource Articles, Courses, Videos

Submit Article Submit a post to our blog