This repository contains EmoBank, a large-scale text corpus manually annotated with emotion according to the psychological Valence-Arousal-Dominance scheme. It was built at JULIE Lab, Jena University, and is described in detail in our papers from EACL 2017 and LAW 2017 (see Citation). The repository contains two folders: "corpus", which contains the actual EmoBank data (described in the EACL paper), and "pilot", which contains the data from our pilot study (described in the LAW paper). See the readme files in the respective folders for more detailed information on the data format.
EmoBank/corpus/individual_writer_ratings.csv, respectively. We also included a notebook (EmoBank/corpus/aggregation.ipynb) illustrating how the individual ratings were aggregated.
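As a rough illustration of what aggregating individual ratings can look like, the sketch below averages the per-rater Valence, Arousal, and Dominance scores of each sentence. The column names (`id`, `rater`, `V`, `A`, `D`), the sample rows, and the choice of a plain mean are all assumptions made for this example; the notebook remains the authoritative description of the actual procedure.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Invented sample in an ASSUMED layout: one row per (sentence, rater) pair.
# The real file's column names and values may differ.
SAMPLE = """id,rater,V,A,D
s1,r1,3,4,3
s1,r2,4,4,2
s2,r1,2,3,3
s2,r2,2,5,3
"""


def aggregate_ratings(rows):
    """Average each VAD dimension over all ratings of the same sentence."""
    by_sentence = defaultdict(list)
    for row in rows:
        by_sentence[row["id"]].append(row)
    return {
        sid: {dim: mean(float(r[dim]) for r in ratings) for dim in ("V", "A", "D")}
        for sid, ratings in by_sentence.items()
    }


aggregated = aggregate_ratings(csv.DictReader(io.StringIO(SAMPLE)))
for sid, scores in aggregated.items():
    print(sid, scores)
```

To try this on the released ratings, the `io.StringIO(SAMPLE)` argument could be swapped for an opened file handle, after adjusting the assumed column names to match the file's header.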
EmoBank/corpus/emobank.csv. The data split is stratified with respect to text category (fiction, letters, newspaper, ...). The code for creating the split can be found in EmoBank/corpus/adding_data_split.ipynb. We recommend using this split for model evaluation to increase comparability.
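For readers who want to use the recommended split, the following minimal sketch groups rows by their split label. It assumes that the corpus file carries a `split` column with values such as `train`, `dev`, and `test`, alongside `id`, `V`, `A`, `D`, and `text` columns; these names and the sample rows are hypothetical, so check the header of the actual file first.

```python
import csv
import io
from collections import defaultdict

# Invented rows in an ASSUMED layout; the `split` column and its
# train/dev/test values are a guess, not confirmed by this README.
SAMPLE = """id,split,V,A,D,text
s1,train,3.0,3.1,3.2,First placeholder sentence.
s2,train,2.5,3.4,2.9,Second placeholder sentence.
s3,dev,3.3,2.8,3.0,Third placeholder sentence.
s4,test,2.9,3.0,3.1,Fourth placeholder sentence.
"""


def partition_by_split(rows, split_column="split"):
    """Group rows by the value found in their split column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[split_column]].append(row)
    return dict(parts)


parts = partition_by_split(csv.DictReader(io.StringIO(SAMPLE)))
for name, rows in parts.items():
    print(name, len(rows))
```

With the real data, `io.StringIO(SAMPLE)` would be replaced by something like `open("EmoBank/corpus/emobank.csv", newline="", encoding="utf-8")`. Training on one partition and reporting results on the held-out one keeps numbers comparable across papers.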
EmoBank comprises 10k sentences balanced across multiple genres. It is distinctive in offering two kinds of double annotation. First, each sentence was annotated according to both the emotion expressed by the writer and the emotion perceived by the readers. Second, a subset of the corpus has previously been annotated according to Ekman's six Basic Emotions (Strapparava and Mihalcea, 2007), so that mappings between the two representation formats become possible.
The raw data of EmoBank was gathered from MASC, the Manually Annotated Sub-Corpus of the ANC (Ide et al., 2010), and SemEval 2007 Task 14 (Strapparava and Mihalcea, 2007). The raw data of the pilot studies was taken from MASC and the Stanford Sentiment Treebank (Socher et al., 2013), originally collected by Pang and Lee (2005).
This work is licensed under CC-BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/
Please cite the following papers if you use EmoBank:
Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. In EACL 2017 - Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Valencia, Spain, April 3-7, 2017. Volume 2, Short Papers, pages 578-585. Available: http://aclweb.org/anthology/E17-2092
Sven Buechel and Udo Hahn. 2017. Readers vs. writers vs. texts: Coping with different perspectives of text understanding in emotion annotation. In LAW 2017 - Proceedings of the 11th Linguistic Annotation Workshop @ EACL 2017. Valencia, Spain, April 3, 2017, pages 1-12. Available: https://sigann.github.io/LAW-XI-2017/papers/LAW01.pdf
I am happy to answer questions and provide additional information via email: [email protected]