ChatGPT Chinese corpus: dialogue, fiction, and customer-service corpora for training large models
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainia...
Cornell NLVR and NLVR2 are natural language grounding datasets. Each exa...
Chinese question-answer corpus from the PTT Gossiping board
Helsinki Prosody Corpus and A System for Predicting Prosodic Prominence ...
My fuzzing corpus
Preprocessed Python functions and docstrings for automated code document...
PubMed 200k RCT dataset: a large dataset for sequential sentence classif...
A command-line toolkit to extract text content and category data from Wi...
This repository contains code and metadata for the How2 dataset
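Early Modern Chinese corpus dataset: natural language processing, corpus, ancient Chinese, Old Chinese, Classical Chinese, digital humanities...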
Corpus of Russian news articles collected from Lenta.Ru
Korean Sejong corpus download and simple analysis
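Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods and pretrained models...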
A list of ~100,000 German nouns and their grammatical properties compile...