Research Article

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

Table 3

Results on MOSEI. Note: (B) means the language features are based on BERT; the marked models represent the best results reproduced under the same conditions. ○ is from [10], and ◇ is from [11]. For Acc-2 and F1-score, the value on the left of “/” is computed for negative/non-negative classification, while the value on the right of “/” is computed for negative/positive classification. “−” indicates a result not reported.

| Models | MAE (↓) | Corr (↑) | Acc-7 (↑) | Acc-2 (↑) | F1-score (↑) | Data setting |
|---|---|---|---|---|---|---|
| TFN (B) | 0.593 | 0.700 | 50.2 | −/82.5 | −/82.1 | Unaligned |
| LMF (B) | 0.623 | 0.677 | 48.0 | −/82.0 | −/82.1 | Unaligned |
| MFM (B) | 0.568 | 0.717 | 51.3 | −/84.4 | −/84.3 | Aligned |
| MULT | 0.580 | 0.703 | 51.8 | −/82.5 | −/82.3 | Aligned |
| ICCN (B) | 0.565 | 0.713 | 51.6 | −/84.2 | −/84.2 | Unaligned |
| MISA (B) | 0.555 | 0.756 | 52.2 | 83.6/85.5 | 83.8/85.3 | Aligned |
| MAG-BERT (B) | 0.539 | 0.753 | − | 83.79/85.23 | 83.74/85.08 | Aligned |
| Self-MM (B) | 0.530 | 0.765 | − | 82.81/85.17 | 82.53/85.30 | Unaligned |
| MISA (B) | 0.558 | 0.748 | 51.45 | 82.14/85.09 | 82.44/84.94 | Aligned |
| Self-MM (B) | 0.534 | 0.764 | 53.32 | 84.37/85.28 | 84.42/85.06 | Aligned |
| MGHF (B) | 0.528 | 0.767 | 53.70 | 85.25/85.30 | 85.09/84.86 | Aligned |
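The two binary-evaluation conventions behind the “/” in Acc-2 can be made concrete with a small sketch. This is an illustration, not the paper's evaluation code: the function names and the toy scores below are hypothetical, and MOSEI-style continuous sentiment labels in [−3, 3] are assumed.

```python
def acc2_non_negative(preds, labels):
    """Negative vs. non-negative: every example counts; class is (score >= 0)."""
    correct = sum((p >= 0) == (l >= 0) for p, l in zip(preds, labels))
    return correct / len(labels)

def acc2_negative_positive(preds, labels):
    """Negative vs. positive: examples with a zero (neutral) label are excluded."""
    pairs = [(p, l) for p, l in zip(preds, labels) if l != 0]
    correct = sum((p > 0) == (l > 0) for p, l in pairs)
    return correct / len(pairs)

# Hypothetical toy predictions and gold labels for demonstration.
preds  = [-1.2, 0.4, 0.0, 2.1, -0.3]
labels = [-2.0, 1.0, 0.0, 1.5,  0.8]

print(acc2_non_negative(preds, labels))       # 4 of 5 examples agree -> 0.8
print(acc2_negative_positive(preds, labels))  # zero label dropped; 3 of 4 -> 0.75
```

Because the neutral (zero-label) examples are dropped in the negative/positive setting, the two numbers are computed over different example sets, which is why the left and right values in the table can differ for the same model.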