10kGNAD

Ten Thousand German News Articles Dataset for Topic Classification

Ten Thousand German News Articles Dataset

For more information visit the detailed project page.

  1. Install the required Python packages: pip install -r requirements.txt
  2. Download the corpus.sqlite3 file into the project root from here (compressed) or directly from here.
  3. Run python code/extract_dataset_from_sqlite.py corpus.sqlite3 articles.csv to extract the articles.
  4. Run python code/split_articles_into_train_test.py to split the dataset.
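
To make the last step more concrete, here is a minimal sketch of a per-category (stratified) train/test split. It is not the repository's split_articles_into_train_test.py script: the function name stratified_split, the 10% test fraction, the fixed seed, and the toy (label, text) rows are all assumptions made for illustration, and the real script may split the data differently.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.1, seed=42):
    """Split (label, text) rows into train and test lists, label by label.

    Hypothetical helper for illustration; not part of the 10kGNAD code.
    """
    # Group the rows by their category label.
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Hold out at least one row per label for the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy rows standing in for the extracted articles (labels are illustrative).
rows = [("Sport", f"sport article {i}") for i in range(20)]
rows += [("Web", f"web article {i}") for i in range(10)]

train, test = stratified_split(rows, test_fraction=0.1)
print(len(train), len(test))  # prints "27 3"
```

Splitting within each label keeps the category proportions roughly the same in both parts, which matters for topic classification when some categories are much smaller than others.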

License

All code in this repository is licensed under the MIT License.

The dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Open Source Agenda is not affiliated with "10kGNAD" Project. README Source: tblock/10kGNAD