Python & command-line tool to gather text on the Web: web crawling/scrap...
An Integrated Corpus Tool With Multilingual Support for the Study of Lan...
Bitextor generates translation memories from multilingual websites
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainia...
Python library for handling audio datasets.
Simple multilingual lemmatizer for Python, especially useful for speed a...
OpusFilter - Parallel corpus processing toolkit
Utilities for Processing the Switchboard Dialogue Act Corpus
An open-source reimplementation of Benny Brodda's BETA in Python
An advanced, extensible web front-end for the Manatee-open corpus search...
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.g...
Multi-Language Dataset Cleaner/Creator for Mozilla's DeepSpeech Framework
A set of workflows for corpus building through OCR, post-correction and ...
Tools for filtering and cleaning parallel and monolingual corpora for ma...
Python library for extracting quantitative, reproducible metrics of mult...