The code for our INTERSPEECH 2020 paper: Jointly Fine-Tuning "BERT-like" Self-Supervised Models to Improve Multimodal Speech Emotion Recognition
This repository contains the PyTorch code for multimodal emotion recognition with pretrained RoBERTa and Speech-BERT.
This can be a bit tricky in the beginning. First, it is important to understand that Fairseq is built so that every architecture can be accessed through terminal commands (args).
Since our architecture shares many properties with the Transformer architecture, we followed a tutorial that describes how to use RoBERTa for a custom classification task.
We built our architecture by adding new components to the following locations in the Fairseq interface.
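Fairseq exposes architectures on the command line through registry decorators that map a string name (the value passed to --arch) to the code that configures it. A minimal, fairseq-free sketch of that pattern (the registry, function names, and default values here are illustrative, not Fairseq's actual implementation):

```python
# Sketch of Fairseq's registry pattern: a decorator maps a string name
# (usable as an --arch value on the command line) to a function that
# fills in that architecture's default hyperparameters.
# Illustrative only -- not Fairseq's real internals.

ARCH_REGISTRY = {}

def register_model_architecture(name):
    def wrapper(fn):
        ARCH_REGISTRY[name] = fn
        return fn
    return wrapper

@register_model_architecture("robertEMO_large")
def robert_emo_large(args):
    # Hypothetical defaults; the real ones live in the model file.
    args.setdefault("encoder_layers", 24)
    args.setdefault("encoder_attention_heads", 2)
    return args

# Simulate `--arch robertEMO_large` being selected from the terminal:
args = ARCH_REGISTRY["robertEMO_large"]({"dropout": 0.1})
print(args)
```

This is why a new model, task, or criterion becomes usable from the terminal as soon as it is registered under a name.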
The custom dataloader that loads raw audio, face frames, and text is in fairseq/data/raw_audio_text_dataset.py
The emotion-prediction task, analogous to other tasks such as translation, is in fairseq/tasks/emotion_prediction.py
The custom architecture of our model, similar to RoBERTa and wav2vec, is in fairseq/models/mulT_emo.py
The cross-attention was implemented by modifying the self-attention scripts in the original Fairseq repository. It can be found in fairseq/modules/transformer_multi_encoder.py and fairseq/modules/transformer_layer.py
Finally, the custom loss function and evaluation scripts can be found in fairseq/criterions/emotion_prediction_cri.py
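Conceptually, the cross-attention modification above lets queries come from one modality while keys and values come from another, so each modality can attend over the other's sequence. A minimal NumPy sketch of that idea (shapes, dimensions, and function names are assumptions for illustration, not the repository's actual code):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_mod, kv_mod):
    """Scaled dot-product attention where queries come from one modality
    (e.g. text tokens) and keys/values from another (e.g. audio frames).
    Self-attention is the special case where query_mod is kv_mod."""
    d_k = query_mod.shape[-1]
    scores = query_mod @ kv_mod.T / np.sqrt(d_k)  # (T_q, T_kv)
    weights = softmax(scores, axis=-1)            # rows sum to 1
    return weights @ kv_mod                       # (T_q, d)

rng = np.random.default_rng(0)
text = rng.standard_normal((5, 8))    # 5 text tokens, dim 8
audio = rng.standard_normal((12, 8))  # 12 audio frames, dim 8
out = cross_attention(text, audio)
print(out.shape)  # (5, 8): one audio-informed vector per text token
```

The actual modules add the usual learned query/key/value projections, multiple heads, and dropout on top of this core operation.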
We used the standard Fairseq terminal commands to train and validate our models.
CUDA_VISIBLE_DEVICES=8,7 python train.py --data ./T_data/iemocap --restore-file None --task emotion_prediction --reset-optimizer --reset-dataloader --reset-meters --init-token 0 --separator-token 2 --arch robertEMO_large --criterion emotion_prediction_cri --num-classes 8 --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 --clip-norm 0.0 --lr-scheduler polynomial_decay --lr 1e-05 --total-num-update 2760 --warmup-updates 165 --max-epoch 10 --best-checkpoint-metric loss --encoder-attention-heads 2 --batch-size 1 --encoder-layers-cross 1 --no-epoch-checkpoints --update-freq 8 --find-unused-parameters --ddp-backend=no_c10d --binary-target-iemocap --a-only --t-only --pooler-dropout 0.1 --log-interval 1 --data-raw ./iemocap_data/
CUDA_VISIBLE_DEVICES=1 python validate.py --data ./T_data/iemocap --path './checkpoints/checkpoint_best.pt' --task emotion_prediction --valid-subset test --batch-size 4
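The learning-rate schedule selected in the training command (--lr-scheduler polynomial_decay with --warmup-updates 165 and --total-num-update 2760) warms up linearly and then decays towards an end value. The sketch below shows that shape with the command's values plugged in; the power of 1.0 and end learning rate of 0.0 are assumed Fairseq defaults, so treat it as an illustration rather than Fairseq's exact scheduler code:

```python
def polynomial_decay_lr(step, base_lr=1e-5, warmup_updates=165,
                        total_num_update=2760, end_lr=0.0, power=1.0):
    """Sketch of the polynomial_decay schedule: linear warmup for
    warmup_updates steps, then polynomial decay to end_lr over the
    remaining updates. Defaults mirror the training command above."""
    if step < warmup_updates:
        return base_lr * step / warmup_updates
    if step >= total_num_update:
        return end_lr
    pct_remaining = 1 - (step - warmup_updates) / (total_num_update - warmup_updates)
    return (base_lr - end_lr) * pct_remaining ** power + end_lr

print(polynomial_decay_lr(165))   # peak LR: 1e-05
print(polynomial_decay_lr(2760))  # decayed to: 0.0
```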
If you want to pre-process the data again, please refer to this repository.