Research Article

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

Table 3

Results on MOSEI. Note: (B) means the language features are based on BERT; the marked models represent the best results reproduced under the same conditions. ○ is from [10], and ◇ is from [11]. For Acc-2 and F1-score, the value on the left of “/” is computed for negative/non-negative classification, while the value on the right of “/” is computed for negative/positive classification. “−” indicates a result not reported.

| Models | MAE (↓) | Corr (↑) | Acc-7 (↑) | Acc-2 (↑) | F1-score (↑) | Data setting |
|---|---|---|---|---|---|---|
| TFN (B) | 0.593 | 0.700 | 50.2 | −/82.5 | −/82.1 | Unaligned |
| LMF (B) | 0.623 | 0.677 | 48.0 | −/82.0 | −/82.1 | Unaligned |
| MFM (B) | 0.568 | 0.717 | 51.3 | −/84.4 | −/84.3 | Aligned |
| MULT | 0.580 | 0.703 | 51.8 | −/82.5 | −/82.3 | Aligned |
| ICCN (B) | 0.565 | 0.713 | 51.6 | −/84.2 | −/84.2 | Unaligned |
| MISA (B) | 0.555 | 0.756 | 52.2 | 83.6/85.5 | 83.8/85.3 | Aligned |
| MAG-BERT (B) | 0.539 | 0.753 | − | 83.79/85.23 | 83.74/85.08 | Aligned |
| Self-MM (B) | 0.530 | 0.765 | − | 82.81/85.17 | 82.53/85.30 | Unaligned |
| MISA (B) | 0.558 | 0.748 | 51.45 | 82.14/85.09 | 82.44/84.94 | Aligned |
| Self-MM (B) | 0.534 | 0.764 | 53.32 | 84.37/85.28 | 84.42/85.06 | Aligned |
| MGHF (B) | 0.528 | 0.767 | 53.70 | 85.25/85.30 | 85.09/84.86 | Aligned |
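The two binary-evaluation conventions behind the “/” in Acc-2 can be made concrete with a small sketch. This is an illustration, not the paper's evaluation code: the function names and the toy scores below are hypothetical, and MOSEI-style continuous sentiment labels in [−3, 3] are assumed.

```python
def acc2_non_negative(preds, labels):
    """Negative vs. non-negative: every example counts; class is (score >= 0)."""
    correct = sum((p >= 0) == (l >= 0) for p, l in zip(preds, labels))
    return correct / len(labels)

def acc2_negative_positive(preds, labels):
    """Negative vs. positive: examples with a zero (neutral) label are excluded."""
    pairs = [(p, l) for p, l in zip(preds, labels) if l != 0]
    correct = sum((p > 0) == (l > 0) for p, l in pairs)
    return correct / len(pairs)

# Hypothetical toy predictions and gold labels for demonstration.
preds  = [-1.2, 0.4, 0.0, 2.1, -0.3]
labels = [-2.0, 1.0, 0.0, 1.5,  0.8]

print(acc2_non_negative(preds, labels))       # 4 of 5 examples agree -> 0.8
print(acc2_negative_positive(preds, labels))  # zero label dropped; 3 of 4 -> 0.75
```

Because the neutral (zero-label) examples are dropped in the negative/positive setting, the two numbers are computed over different example sets, which is why the left and right values in the table can differ for the same model.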