Research Article

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

Table 2

Results on MOSI. Note: (B) means the language features are based on BERT; model with represents the best reproduced results under the same conditions. ○ is from [10], and ◇ is from [11]. For the Acc-2 and F1-score indicators, the value to the left of “/” is calculated for negative versus non-negative sentiment, while the value to the right of “/” is calculated for negative versus positive sentiment.

| Models | MAE (↓) | Corr (↑) | Acc-7 (↑) | Acc-2 (↑) | F1-score (↑) | Data setting |
|---|---|---|---|---|---|---|
| TFN (B) | 0.901 | 0.698 | 34.9 | −/80.8 | −/80.7 | Unaligned |
| LMF (B) | 0.917 | 0.695 | 33.2 | −/82.5 | −/82.4 | Unaligned |
| MFM (B) | 0.877 | 0.706 | 35.4 | −/81.7 | −/81.6 | Aligned |
| MULT | 0.918 | 0.680 | 36.47 | 77.93/79.3 | 77.91/79.34 | Aligned |
| ICCN (B) | 0.860 | 0.710 | 39.0 | −/83.0 | −/83.0 | Unaligned |
| MISA (B) | 0.783 | 0.761 | 42.3 | 81.8/83.4 | 81.7/83.6 | Aligned |
| MAG-BERT (B) | 0.731 | 0.789 | | 82.54/84.3 | 82.59/84.3 | Aligned |
| Self-MM (B) | 0.713 | 0.798 | | 84.42/85.95 | 84.42/85.95 | Unaligned |
| MISA (B) | 0.759 | 0.787 | 42.57 | 81.05/82.93 | 81.03/82.97 | Aligned |
| Self-MM (B) | 0.718 | 0.796 | 45.77 | 83.09/84.09 | 83.10/84.96 | Aligned |
| MGHF (B) | 0.709 | 0.802 | 45.19 | 83.38/85.21 | 83.32/85.21 | Aligned |
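
The two values reported on either side of “/” for Acc-2 and F1-score can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that sentiment scores are thresholded at zero, that the negative/positive variant discards samples whose ground-truth score is exactly zero, and that F1 is support-weighted over the two classes; the caption does not state the averaging scheme. The function names and the sample scores are hypothetical.

```python
def weighted_f1(pred_pos, true_pos):
    """Support-weighted F1 over the two classes (assumed averaging scheme)."""
    total = len(true_pos)
    if total == 0:
        raise ValueError("no samples to evaluate")
    pairs = list(zip(pred_pos, true_pos))
    score = 0.0
    for cls in (True, False):
        tp = sum(p == cls and t == cls for p, t in pairs)
        fp = sum(p == cls and t != cls for p, t in pairs)
        fn = sum(p != cls and t == cls for p, t in pairs)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * (tp + fn) / total  # weight by class support
    return score


def binary_metrics(preds, labels, exclude_neutral):
    """Return (accuracy, weighted F1) for one side of the "/" in the table.

    exclude_neutral=False: negative vs. non-negative (a score of 0 counts as non-negative).
    exclude_neutral=True:  negative vs. positive (samples labelled exactly 0 are dropped).
    """
    pairs = list(zip(preds, labels))
    if exclude_neutral:
        pairs = [(p, y) for p, y in pairs if y != 0]
        pred_pos = [p > 0 for p, _ in pairs]
        true_pos = [y > 0 for _, y in pairs]
    else:
        pred_pos = [p >= 0 for p, _ in pairs]
        true_pos = [y >= 0 for _, y in pairs]
    if not pairs:
        raise ValueError("no samples to evaluate")
    correct = sum(p == t for p, t in zip(pred_pos, true_pos))
    return correct / len(pairs), weighted_f1(pred_pos, true_pos)


# Made-up regression outputs and labels on the [-3, 3] MOSI scale.
preds = [-1.2, 0.3, 0.0, 2.1, -0.4, 0.8]
labels = [-1.0, 0.0, 0.6, 1.8, 0.4, -0.2]

acc_left, f1_left = binary_metrics(preds, labels, exclude_neutral=False)
acc_right, f1_right = binary_metrics(preds, labels, exclude_neutral=True)
print(f"Acc-2: {acc_left:.4f}/{acc_right:.4f}  F1: {f1_left:.4f}/{f1_right:.4f}")
```

Because the two variants evaluate different sample sets and use different thresholds, the left and right values are not directly comparable with each other, only with the same side of other rows.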