The official tool for transforming doccano format into common dataset formats.
Doccano Transformer helps you to transform an exported dataset into the format of your favorite machine learning library.
Doccano Transformer supports the following formats: CoNLL 2003 and spaCy.
To install doccano-transformer, simply use pip:
pip install doccano-transformer
For example, to convert a named entity recognition dataset exported as JSONL to CoNLL 2003 or spaCy format:
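from doccano_transformer.datasets import NERDataset
from doccano_transformer.utils import read_jsonl

# Load a doccano JSONL export as a named entity recognition dataset.
dataset = read_jsonl(filepath='example.jsonl', dataset=NERDataset, encoding='utf-8')
# Convert to CoNLL 2003 format, tokenizing on whitespace.
dataset.to_conll2003(tokenizer=str.split)
# Convert to spaCy format, tokenizing on whitespace.
dataset.to_spacy(tokenizer=str.split)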
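
To give a feel for what a conversion to a CoNLL-style layout involves, here is a minimal standalone sketch. It is not the library's implementation and does not use doccano-transformer at all. The function names `spans_to_iob2` and `to_conll_lines`, the `(start, end, label)` span shape, and the simplified two-column output are assumptions made for illustration (the full CoNLL 2003 layout also carries part-of-speech and chunk columns). The sketch assumes every span starts and ends on a whitespace token boundary.

```python
import re


def spans_to_iob2(text, spans):
    """Tag whitespace-separated tokens with IOB2 labels.

    text  -- the raw sentence
    spans -- hypothetical (start, end, label) character offsets, end exclusive
    """
    rows = []
    for match in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            # A token is tagged only when it lies fully inside the span.
            if match.start() >= start and match.end() <= end:
                # The token that opens the span gets B-, later ones get I-.
                prefix = "B" if match.start() == start else "I"
                tag = f"{prefix}-{label}"
                break
        rows.append((match.group(), tag))
    return rows


def to_conll_lines(rows):
    """Render (token, tag) pairs as 'token tag' lines, one token per line."""
    return "\n".join(f"{token} {tag}" for token, tag in rows)


rows = spans_to_iob2(
    "Barack Obama visited Paris",
    [(0, 12, "PER"), (21, 26, "LOC")],
)
print(to_conll_lines(rows))
```

Passing `str.split` as the tokenizer in the example above plays the same role as the whitespace regular expression here: it decides where token boundaries fall, which in turn decides how labeled spans map onto tokens.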
We encourage you to contribute to doccano transformer! Please check out the Contributing to doccano transformer guide for guidelines about how to proceed.