An Integrated Corpus Tool With Multilingual Support for the Study of Lan...
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa...
My book list
A list of Indonesian NLP resources.
A web-based engine for creating and annotating textual corpora
Crawler for linguistic corpora
The pipeline for the OSCAR corpus
Kanji usage frequency data collected from various sources
Data for the quantitative study of (Vedic) Sanskrit
An asynchronous concurrent pipeline for classifying Common Crawl based o...
An advanced, extensible web front-end for the Manatee-open corpus search...
Large silver-standard Russian corpus with NER, morphology, and syntax markup
A textual corpus database for the digital humanities.
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.g...
A set of workflows for corpus building through OCR, post-correction and ...