Clips Hades

Repository for the CLiPS HAte speech DEtection System [HADES].


HADES

This is a work-in-progress repository for the CLiPS HAte speech DEtection System (HADES).

Currently, the repository contains the supplementary materials for the paper "A Dictionary-based Approach to Racism Detection in Dutch Social Media", presented at the TA-COS workshop at LREC 2016.

license

The dictionaries in this repository are available under a CC BY-SA 4.0 License. If you use the dictionaries in your work, please cite:

@inproceedings{tulkens2016a,
  title={A Dictionary-based Approach to Racism Detection in {Dutch} Social Media},
  author={Tulkens, St\'{e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
  booktitle={Proceedings of the LREC 2016 Workshop on Text Analytics for Cybersecurity and Online Safety (TA-COS)},
  year={2016},
  organization={European Language Resources Association (ELRA)}
}

Note that we expanded the TA-COS submission into a journal paper, which was published in the Computational Linguistics in the Netherlands (CLIN) Journal.

If you use the dictionary expansion techniques from this paper, please also consider citing it:

@article{tulkens2016automated,
  title={The automated detection of racist discourse in {Dutch} social media},
  author={Tulkens, St{\'e}phan and Hilte, Lisa and Lodewyckx, Elise and Verhoeven, Ben and Daelemans, Walter},
  journal={Computational Linguistics in the Netherlands Journal},
  volume={6},
  number={1},
  pages={3--20},
  year={2016}
}

usage

The dictionaries are in .csv format. The first entry on each line is the category name; the remaining entries are the words in that category. The repository includes a Python script (compatible with 2.7 and 3.x), dictfeaturizer, which reads the dictionaries and outputs the relative frequency of each category in a tokenized text. It also works with other dictionaries in the same format, such as the LIWC dictionaries.
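To make the file format and the meaning of "relative frequency" concrete, here is a minimal standalone sketch in Python 3. It is not the bundled script: the function names `load_categories` and `relative_frequencies` are hypothetical, the comma delimiter is an assumption based on the .csv extension, and matching is exact and case-sensitive, which may differ from what the included script does.

```python
import csv
import io
from collections import Counter


def load_categories(handle):
    """Read rows of the form `category,word1,word2,...` into a dict of sets."""
    categories = {}
    for row in csv.reader(handle):
        # Drop empty cells and skip blank lines.
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            categories[cells[0]] = set(cells[1:])
    return categories


def relative_frequencies(tokens, categories):
    """Return, per category, the share of tokens that occur in that category."""
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in categories}


# Hypothetical two-category dictionary, inlined so the sketch runs on its own.
sample = io.StringIO("good,good,splendid\nbad,bad,useless\n")
categories = load_categories(sample)
scores = relative_frequencies("this stuff is splendid".split(), categories)
print(scores)  # {'good': 0.25, 'bad': 0.0}
```

One of the four tokens ("splendid") appears in the "good" category, so its relative frequency is 0.25. Dividing by the token count keeps scores comparable across texts of different lengths.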

example

from dictfeaturizer import DictFeaturizer

# Load from csv
d = DictFeaturizer.load("expanded.csv")
text = "this is an example text".split()
score = d.transform(text)

# Direct initialization
direct = {"good": ["good", "splendid"], "bad": ["bad", "useless"]}
d = DictFeaturizer(direct, relative=False)
text = "This stuff is splendid".split()
score_2 = d.transform(text)

README source: clips/hades
