Research Article
A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation
Table 4
Comparison of state-of-the-art results for experienced emotion prediction.
| Features | Arousal (loss1) MSE | Arousal (loss1) PCC | Valence (loss2) MSE | Valence (loss2) PCC |
|---|---|---|---|---|
| All features | 0.0275 | 0.6187 | 0.0632 | 0.3443 |
| −Action features | 0.0291 | 0.6038 | 0.0673 | 0.3259 |
| −Face features | 0.0277 | 0.6136 | 0.0637 | 0.3667 |
| −Person features | 0.0280 | 0.6181 | 0.0653 | 0.3726 |
| −Place features | 0.0280 | 0.5981 | 0.0663 | 0.3315 |
| −VGGish features | 0.0290 | 0.5952 | 0.0669 | 0.3444 |
| −OpenSMILE features | 0.0295 | 0.6003 | 0.0666 | 0.3345 |
| All_visual_features | 0.0316 | 0.4931 | 0.0751 | 0.2694 |
| All_audio_features | 0.0297 | 0.6141 | 0.0726 | 0.3356 |
“−” indicates that the named feature was excluded from the full feature set.
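The two metrics reported in Table 4, mean squared error (MSE) and the Pearson correlation coefficient (PCC), can be computed as in the minimal sketch below. The function names `mse` and `pcc` and the toy score lists are hypothetical and are not taken from the article; the sketch also assumes the metrics are computed over a single flat sequence of predictions, which may differ from the aggregation the authors used.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pcc(pred, target):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the article.
    predicted = [0.10, 0.25, 0.40, 0.30]
    annotated = [0.12, 0.20, 0.45, 0.35]
    print("MSE:", mse(predicted, annotated))
    print("PCC:", pcc(predicted, annotated))
```

A lower MSE and a higher PCC both indicate better agreement with the annotations, which is how the rows of the table are meant to be compared.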