Trankit Versions

Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing

v1.1.0

2 years ago
  • Issue #17, which affected loading customized pipelines, has been fixed in this release. Please check it out here.
  • Trankit now supports converting its JSON output to CoNLL-U format. The conversion is done with the new function trankit2conllu, which works on both document-level and sentence-level outputs and can be used as follows:
from trankit import Pipeline, trankit2conllu

p = Pipeline('english')

# document level
json_doc = p('''Hello! This is Trankit.''')
conllu_doc = trankit2conllu(json_doc)
print(conllu_doc)
#1       Hello   hello   INTJ    UH      _       0       root    _       _
#2       !       !       PUNCT   .       _       1       punct   _       _
#
#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _
#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _
#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _
#4       .       .       PUNCT   .       _       3       punct   _       _

# sentence level
json_sent = p('''This is Trankit.''', is_sent=True)
conllu_sent = trankit2conllu(json_sent)
print(conllu_sent)
#1       This    this    PRON    DT      Number=Sing|PronType=Dem        3       nsubj   _       _
#2       is      be      AUX     VBZ     Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   3       cop     _       _
#3       Trankit Trankit PROPN   NNP     Number=Sing     0       root    _       _
#4       .       .       PUNCT   .       _       3       punct   _       _
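The CoNLL-U text shown above can be read back into Python structures with a few lines of code. The following is a minimal sketch, not part of Trankit: parse_conllu is a hypothetical helper, and it assumes the string returned by trankit2conllu follows the CoNLL-U convention of ten tab-separated columns per token, with a blank line between sentences. The sample string is hand-written from the output above so that the snippet runs without Trankit installed.

```python
# Column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Split CoNLL-U text into sentences; each token becomes a dict of 10 fields."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comment lines carry no token data.
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        current.append(dict(zip(CONLLU_FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample mirroring the document-level output shown above
# (assumption: real output uses tabs between columns).
sample = (
    "1\tHello\thello\tINTJ\tUH\t_\t0\troot\t_\t_\n"
    "2\t!\t!\tPUNCT\t.\t_\t1\tpunct\t_\t_\n"
    "\n"
    "1\tThis\tthis\tPRON\tDT\tNumber=Sing|PronType=Dem\t3\tnsubj\t_\t_\n"
    "2\tis\tbe\tAUX\tVBZ\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t3\tcop\t_\t_\n"
    "3\tTrankit\tTrankit\tPROPN\tNNP\tNumber=Sing\t0\troot\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
)

sentences = parse_conllu(sample)
for sent in sentences:
    print(" ".join(tok["form"] for tok in sent))
```

With Trankit installed, passing conllu_doc or conllu_sent in place of sample should work the same way, provided the tab-separated assumption holds.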

v1.0.1

3 years ago

v1.0.0

3 years ago

:boom: :boom: :boom: Trankit v1.0.0 is out:

  • 90 new pretrained transformer-based pipelines for 56 languages. The new pipelines are trained with XLM-Roberta large, which significantly improves performance across 90 treebanks of the Universal Dependencies v2.5 corpus. Check out the new performance here. This page shows you how to use the new pipelines.

  • Auto Mode for multilingual pipelines. In Auto Mode, the language of the input is detected automatically, so multilingual pipelines can process the input without its language being specified. Check out how to turn on the Auto Mode here. Thank you, loretoparisi, for suggesting this.

  • A command-line interface is now available, so users who are not familiar with the Python programming language can use Trankit easily. Check out the tutorials on this page.
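As a rough illustration of the first two items, the sketch below wraps pipeline construction in a small helper. The helper make_pipeline is hypothetical. The 'auto' language name and the embedding keyword with the value 'xlm-roberta-large' are taken from my reading of Trankit's documentation rather than from these notes, so treat them as assumptions and check them against the linked pages. Trankit is imported inside the function so the snippet can be defined without the package installed.

```python
def make_pipeline(lang="auto", embedding="xlm-roberta-large"):
    """Build a Trankit pipeline (hypothetical helper, not part of Trankit).

    Assumptions: lang="auto" turns on Auto Mode, and
    embedding="xlm-roberta-large" selects the new large pipelines.
    """
    # Imported lazily so defining this helper does not require Trankit.
    from trankit import Pipeline

    return Pipeline(lang, embedding=embedding)


# Example usage (requires Trankit; pretrained models are downloaded on first use):
# p = make_pipeline()
# output = p.tokenize("Hello! This is Trankit.")
```

For a single-language pipeline, pass the language name instead, for example make_pipeline("english").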