A personally collected set of corpora for Korean NLP
Convert a PDF via OCR to a TXT file in UTF-8 encoding
A list of ~100,000 German nouns and their grammatical properties compile...
Colibri core is an NLP tool as well as a C++ and Python library for work...
Kanji usage frequency data collected from various sources
A roundup of text corpora that can be used for hands-on chatbot training, covering both Chinese and English
Lists of text corpora and more (mainly Japanese)
Chinese dictionaries (Simplified and Traditional): Chinese and Chinese-English dictionaries.
An Open-Source Package for Chinese Open-domain Conversational Chatbot (...
Japanese text8 corpus for word embeddings.
The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for C...
Linguistic search for large annotated text corpora, based on Apache Lucene
A general-purpose NLP library for Turkish
MEV Data Corpus