Research Article

Hierarchical Attention-Based Multimodal Fusion Network for Video Emotion Recognition

Table 3

Emotion recognition accuracy for different modalities.

| Convolution layers | Face features accuracy (%) | Scene features accuracy (%) | Image features accuracy (%) |
|---|---|---|---|
| L1 | 55.14 | 44.39 | 46.03 |
| L2 | 57.94 | 44.62 | 42.99 |
| L3 | 54.67 | 43.92 | 44.62 |