Research Article
A Deep Multimodal Model for Predicting Affective Responses Evoked by Movies Based on Shot Segmentation
Table 5
Results with and without an LSTM to capture changes in the audio and visual feature sequences.
| Model (with Features6) | Experienced arousal (loss1): MSE | Experienced arousal (loss1): PCC | Experienced valence (loss2): MSE | Experienced valence (loss2): PCC |
| --- | --- | --- | --- | --- |
| Ours without LSTM | 0.0288 | 0.5826 | 0.0751 | 0.3276 |
| Ours | 0.0275 | 0.6187 | 0.0632 | 0.3443 |