SER Datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

Project README

Spoken Emotion Recognition Datasets: A collection of datasets (count=43) for emotion recognition/detection in speech. The table is ordered from newest to oldest and describes the content of each dataset along with the emotions it covers. It can be browsed, sorted, and searched at https://superkogito.github.io/SER-datasets/

Dataset	Year	Content	Emotions	Format	Size	Language	Paper	Access	License
MESD	2022	864 audio files of single-word emotional utterances with Mexican cultural shaping.	6 emotions: anger, disgust, fear, happiness, neutral, and sadness.	Audio	0.097 GB	Spanish (Mexican)	The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning	Open	CC BY 4.0
SyntAct	2022	Synthesized database of three basic emotions and neutral expression, based on rule-based manipulation for a diphone synthesizer.	997 utterances including 6 emotions: angry, bored, happy, neutral, sad and scared.	Audio	941 MB	German	SyntAct: A Synthesized Database of Basic Emotions	Open	CC BY-SA 4.0
MLEnd	2021	~32700 audio recordings produced by 154 speakers. Each audio recording corresponds to one English numeral (from "zero" to "billion").	Intonations: neutral, bored, excited and question.	Audio	2.27 GB	--	--	Open	Unknown
ASVP-ESD	2021	~13285 audio files collected from movies, TV shows and YouTube, containing speech and non-speech.	12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity.	Audio	2 GB	Chinese, English, French, Russian and others	--	Open	Unknown
ESD	2021	29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers.	5 emotions: angry, happy, neutral, sad, and surprise.	Audio, Text	2.4 GB (zip)	Chinese, English	Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset	Open	Academic License
MuSe-CAR	2021	40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details).	Continuous emotion dimensions characterized using valence, arousal, and trustworthiness.	Audio, Video, Text	15 GB	English	The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements	Restricted	Academic License & Commercial License
MSP-Podcast corpus	2020	100 hours by over 100 speakers (see db link for details).	This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other).	Audio	--	--	The MSP-Conversation Corpus	Restricted	Academic License & Commercial License
emotiontts open db	2020	Recordings and their associated transcriptions by a diverse group of speakers.	4 emotions: general, joy, anger, and sadness.	Audio, Text	--	Korean	--	Partially open	CC BY-NC-SA 4.0
URDU-Dataset	2020	400 utterances by 38 speakers (27 male and 11 female).	4 emotions: angry, happy, neutral, and sad.	Audio	0.072 GB	Urdu	Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages	Open	--
BAVED	2020	1935 recordings by 61 speakers (45 male and 16 female).	3 levels of emotion.	Audio	0.195 GB	Arabic	--	Open	--
VIVAE	2020	Non-speech, 1085 audio files by 12 speakers.	Non-speech, 6 emotions: achievement, anger, fear, pain, pleasure, and surprise with 4 emotional intensities (low, moderate, strong, peak).	Audio	--	--	--	Restricted	CC BY-NC-SA 4.0
SEWA	2019	More than 2000 minutes of audio-visual data of 398 people (201 male and 197 female) coming from 6 cultures.	Emotions are characterized using valence and arousal.	Audio, Video	--	Chinese, English, German, Greek, Hungarian and Serbian	SEWA DB: A Rich Database for Audio-Visual Emotion and Sentiment Research in the Wild	Restricted	SEWA EULA
MELD	2019	1400 dialogues and 14000 utterances from Friends TV series by multiple speakers.	7 emotions: anger, disgust, sadness, joy, neutral, surprise and fear. MELD also has sentiment (positive, negative and neutral) annotation for each utterance.	Audio, Video, Text	10.1 GB	English	MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations	Open	MELD: GPL-3.0 License
ShEMO	2019	3000 semi-natural utterances, equivalent to 3 hours and 25 minutes of speech data from online radio plays by 87 native-Persian speakers.	6 emotions: anger, fear, happiness, sadness, neutral and surprise.	Audio	0.101 GB	Persian	ShEMO: a large-scale validated database for Persian speech emotion detection	Open	--
DEMoS	2019	9365 emotional and 332 neutral samples produced by 68 native speakers (23 females, 45 males).	7/6 emotions: anger, sadness, happiness, fear, surprise, disgust, and the secondary emotion guilt.	Audio	--	Italian	DEMoS: An Italian emotional speech corpus. Elicitation methods, machine learning, and perception	Restricted	EULA: End User License Agreement
AESDD	2018	Around 500 utterances by a diverse group of actors (over 5 actors) simulating various emotions.	5 emotions: anger, disgust, fear, happiness, and sadness.	Audio	0.392 GB	Greek	Speech Emotion Recognition for Performance Interaction	Open	--
Emov-DB	2018	Recordings for 4 speakers (2 males and 2 females).	The emotional styles are neutral, sleepiness, anger, disgust and amused.	Audio	5.88 GB	English	The emotional voices database: Towards controlling the emotion dimension in voice generation systems	Open	--
RAVDESS	2018	7356 recordings by 24 actors.	7 emotions: calm, happy, sad, angry, fearful, surprise, and disgust.	Audio, Video	24.8 GB	English	The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English	Open	CC BY-NC-SA 4.0
JL corpus	2018	2400 recordings of 240 sentences by 4 actors (2 males and 2 females).	5 primary emotions: angry, sad, neutral, happy, excited. 5 secondary emotions: anxious, apologetic, pensive, worried, enthusiastic.	Audio	--	English	An Open Source Emotional Speech Corpus for Human Robot Interaction Applications	Open	CC0 1.0
CaFE	2018	6 different sentences by 12 speakers (6 females + 6 males).	7 emotions: happy, sad, angry, fearful, surprise, disgust and neutral. Each emotion is acted in 2 different intensities.	Audio	2 GB	French (Canadian)	--	Open	CC BY-NC-SA 4.0
EmoFilm	2018	1115 audio instances of sentences extracted from various films.	5 emotions: anger, contempt, happiness, fear, and sadness.	Audio	--	English, Italian & Spanish	Categorical vs Dimensional Perception of Italian Emotional Speech	Restricted	EULA: End User License Agreement
ANAD	2018	1384 recordings by multiple speakers.	3 emotions: angry, happy, surprised.	Audio	2 GB	Arabic	Arabic Natural Audio Dataset	Open	CC BY-NC-SA 4.0
EmoSynth	2018	144 audio files labelled by 40 listeners.	Emotion (no speech) defined in terms of valence and arousal.	Audio	0.1034 GB	--	The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results	Open	CC BY 4.0
CMU-MOSEI	2018	65 hours of annotated video from more than 1000 speakers and 250 topics.	6 emotions (happiness, sadness, anger, fear, disgust, surprise) + Likert scale.	Audio, Video	--	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open	CMU-MOSEI License
VERBO	2018	14 different phrases by 12 speakers (6 female + 6 male) for a total of 1167 recordings.	7 emotions: happiness, disgust, fear, neutral, anger, surprise, sadness.	Audio	--	Portuguese	VERBO: Voice Emotion Recognition dataBase in Portuguese Language	Restricted	Available for research purposes only
CMU-MOSI	2017	2199 opinion utterances with annotated sentiment.	Sentiment annotated from very negative to very positive in seven Likert steps.	Audio, Video	--	English	Multi-attention Recurrent Network for Human Communication Comprehension	Open	CMU-MOSI License
MSP-IMPROV	2017	20 sentences by 12 actors.	4 emotions (angry, sad, happy, neutral), plus the labels other and without agreement.	Audio, Video	--	English	MSP-IMPROV: An Acted Corpus of Dyadic Interactions to Study Emotion Perception	Restricted	Academic License & Commercial License
CREMA-D	2017	7442 clips of 12 sentences spoken by 91 actors (48 males and 43 females).	6 emotions: angry, disgusted, fearful, happy, neutral, and sad.	Audio, Video	--	English	CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset	Open	Open Database License & Database Content License
Example emotion videos used in investigation of emotion perception in schizophrenia	2017	6 videos: two example videos from each emotion category (angry, happy and neutral) by one female speaker.	3 emotions: angry, happy and neutral.	Audio, Video	0.063 GB	English	--	Open	Permitted Non-commercial Re-use with Acknowledgment
EMOVO	2014	6 actors who played 14 sentences.	6 emotions: disgust, fear, anger, joy, surprise, sadness.	Audio	0.355 GB	Italian	EMOVO Corpus: an Italian Emotional Speech Database	Open	--
RECOLA	2013	3.8 hours of recordings by 46 participants.	Negative and positive sentiment (valence and arousal).	Audio, Video	--	--	Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions	Restricted	Academic License & Commercial License
GEMEP corpus	2012	Videos of 10 actors portraying 10 states.	12 emotions: amusement, anxiety, cold anger (irritation), despair, hot anger (rage), fear (panic), interest, joy (elation), pleasure (sensory), pride, relief, and sadness. Plus 5 additional emotions: admiration, contempt, disgust, surprise, and tenderness.	Audio, Video	--	French	Introducing the Geneva Multimodal Expression Corpus for Experimental Research on Emotion Perception	Restricted	--
OGVC	2012	9114 spontaneous utterances and 2656 acted utterances by 4 professional actors (two male and two female).	9 emotional states: fear, surprise, sadness, disgust, anger, anticipation, joy, acceptance and the neutral state.	Audio	--	Japanese	Naturalistic emotional speech collection paradigm with online game and its psychological and acoustical assessment	Restricted	--
LEGO corpus	2012	347 dialogs with 9,083 system-user exchanges.	Emotions classified as garbage, non-angry, slightly angry and very angry.	Audio	1.1 GB	--	A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let’s Go Bus Information System	Open	License available with the data. Free of charge for research purposes only.
SEMAINE	2012	95 dyadic conversations from 21 subjects. Each subject converses with another playing one of four characters with emotions.	5 FeelTrace annotations: activation, valence, dominance, power, intensity.	Audio, Video, Text	104 GB	English	The SEMAINE Database: Annotated Multimodal Records of Emotionally Colored Conversations between a Person and a Limited Agent	Restricted	Academic EULA
SAVEE	2011	480 British English utterances by 4 male actors.	7 emotions: anger, disgust, fear, happiness, sadness, surprise and neutral.	Audio, Video	--	English (British)	Multimodal Emotion Recognition	Restricted	Free of charge for research purposes only.
TESS	2010	2800 recordings by 2 actresses.	7 emotions: anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral.	Audio	--	English	BEHAVIOURAL FINDINGS FROM THE TORONTO EMOTIONAL SPEECH SET	Open	CC BY-NC-ND 4.0
EEKK	2007	26 text passages read by 10 speakers.	4 main emotions: joy, sadness, anger and neutral.	--	0.352 GB	Estonian	Estonian Emotional Speech Corpus	Open	CC-BY license
IEMOCAP	2007	12 hours of audiovisual data by 10 actors.	5 emotions: happiness, anger, sadness, frustration and neutral.	--	--	English	IEMOCAP: Interactive emotional dyadic motion capture database	Restricted	IEMOCAP license
Keio-ESD	2006	A set of human speech with vocal emotion spoken by a Japanese male speaker.	47 emotions including angry, joyful, disgusting, downgrading, funny, worried, gentle, relief, indignation, shameful, etc.	Audio	--	Japanese	EMOTIONAL SPEECH SYNTHESIS USING SUBSPACE CONSTRAINTS IN PROSODY	Restricted	Available for research purposes only.
EMO-DB	2005	800 recordings spoken by 10 actors (5 males and 5 females).	7 emotions: anger, neutral, fear, boredom, happiness, sadness, disgust.	Audio	--	German	A Database of German Emotional Speech	Open	--
eNTERFACE05	2005	Videos by 42 subjects, coming from 14 different nationalities.	6 emotions: anger, fear, surprise, happiness, sadness and disgust.	Audio, Video	0.8 GB	German	--	Open	Free of charge for research purposes only.
DES	2002	4 speakers (2 males and 2 females).	5 emotions: neutral, surprise, happiness, sadness and anger.	--	--	Danish	Documentation of the Danish Emotional Speech Database	--	--
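
For offline use, the same kind of sorting and searching can be approximated with a few lines of Python. The sketch below is only an illustration and not part of the project: it assumes the rows are available as plain tab-separated text with plain-digit years, and the helper names (`load_rows`, `select`, `oldest_first`) are hypothetical. To stay short, the inline sample keeps just four of the ten columns for five datasets from the list.

```python
import csv
import io

# Hypothetical sample: five rows from the list, reduced to four columns.
SAMPLE = (
    "Dataset\tYear\tLanguage\tAccess\n"
    "MESD\t2022\tSpanish (Mexican)\tOpen\n"
    "SEWA\t2019\tChinese, English, German, Greek, Hungarian and Serbian\tRestricted\n"
    "RAVDESS\t2018\tEnglish\tOpen\n"
    "SAVEE\t2011\tEnglish (British)\tRestricted\n"
    "EMO-DB\t2005\tGerman\tOpen\n"
)


def load_rows(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    # QUOTE_NONE keeps literal double quotes in cells (e.g. "zero") untouched.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def select(rows, access=None, language=None):
    """Keep rows whose Access equals `access` and whose Language contains `language`.

    Both comparisons ignore case; a filter left as None is skipped.
    """
    result = []
    for row in rows:
        if access is not None and row["Access"].lower() != access.lower():
            continue
        if language is not None and language.lower() not in row["Language"].lower():
            continue
        result.append(row)
    return result


def oldest_first(rows):
    """Return the rows sorted by ascending year."""
    return sorted(rows, key=lambda row: int(row["Year"]))


if __name__ == "__main__":
    rows = load_rows(SAMPLE)
    for row in select(rows, access="Open", language="english"):
        print(row["Dataset"], row["Year"])
```

Matching the language as a substring means multilingual entries such as SEWA are found when searching for any one of their languages; an exact match would miss them.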

References

Swain, Monorama & Routray, Aurobinda & Kabisatpathy, Prithviraj, Databases, features and classifiers for speech emotion recognition: a review, International Journal of Speech Technology (paper)
Dimitrios Ververidis and Constantine Kotropoulos, A State of the Art Review on Emotional Speech Databases, Artificial Intelligence & Information Analysis Laboratory, Department of Informatics, Aristotle University of Thessaloniki (paper)
A. Pramod Reddy and V. Vijayarajan, Extraction of Emotions from Speech-A Survey, VIT University, International Journal of Applied Engineering Research (paper)
Emotional Speech Databases (document)
Expressive Synthetic Speech (website)
Towards a standard set of acoustic features for the processing of emotion in speech, Technical University of Munich (document)

Contribution

All contributions are welcome! If you know of a dataset that belongs here (see criteria) but is not listed, please feel free to add it. For more information on contributing, please refer to CONTRIBUTING.md.
If you notice a typo or a mistake, please report it as an issue and help us improve the quality of this list.

Disclaimer

The maintainer and the contributors try their best to keep this list up to date and to include only working links (using automated verification with the help of the urlchecker-action). However, we cannot guarantee that all listed links are up to date. Read more in DISCLAIMER.md.

Open Source Agenda is not affiliated with "SER Datasets" Project. README Source: SuperKogito/SER-datasets

Repository

SuperKogito/SER-datasets

License

MIT

Homepage

https://superkogito.github.io/SER-datasets
