Research Article
Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition
Table 3
Emotion recognition accuracy for different modalities.
| Convolution layers | Face features accuracy (%) | Scene features accuracy (%) | Image features accuracy (%) |
| --- | --- | --- | --- |
| L1 | 55.14 | 44.39 | 46.03 |
| L2 | 57.94 | 44.62 | 42.99 |
| L3 | 54.67 | 43.92 | 44.62 |