Computational Intelligence and Neuroscience

Research Article

Multimodal Sentiment Analysis Based on Cross-Modal Attention and Gated Cyclic Hierarchical Fusion Networks

Table 2

Results on MOSI. Note: (B) means the language features are based on BERT; a model carrying the reproduction mark reports the best results reproduced under the same conditions. ○ is from [10], and ◇ is from [11]. For Acc-2 and F1-score, the value to the left of “/” is calculated for negative versus non-negative sentiment, and the value to the right of “/” is calculated for negative versus positive sentiment.


Models	MAE (↓)	Corr (↑)	Acc-7 (↑)	Acc-2 (↑)	F1-score (↑)	Data setting
TFN (B)^○	0.901	0.698	34.9	−/80.8	−/80.7	Unaligned
LMF (B)^○	0.917	0.695	33.2	−/82.5	−/82.4	Unaligned
MFM (B)^○	0.877	0.706	35.4	−/81.7	−/81.6	Aligned
MULT	0.918	0.680	36.47	77.93/79.3	77.91/79.34	Aligned
ICCN (B)^◇	0.860	0.710	39.0	−/83.0	−/83.0	Unaligned
MISA (B)^◇	0.783	0.761	42.3	81.8/83.4	81.7/83.6	Aligned
MAG-BERT (B)^◇	0.731	0.789	−	82.54/84.3	82.59/84.3	Aligned
Self-MM (B)^◇	0.713	0.798	−	84.42/85.95	84.42/85.95	Unaligned
MISA (B)	0.759	0.787	42.57	81.05/82.93	81.03/82.97	Aligned
Self-MM (B)	0.718	0.796	45.77	83.09/84.09	83.10/84.96	Aligned
MGHF (B)	0.709	0.802	45.19	83.38/85.21	83.32/85.21	Aligned
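
The two Acc-2 conventions described in the caption of Table 2 can be illustrated with a minimal sketch. It assumes that predictions and labels are continuous sentiment scores (as on MOSI, where labels range from −3 to 3) and that classes are obtained by thresholding at 0. The function name `binary_accuracy`, the thresholding rule, and the toy data are illustrative assumptions, not the evaluation code used for the results above.

```python
def binary_accuracy(preds, labels, exclude_neutral):
    """Acc-2 (in percent) from continuous sentiment scores.

    exclude_neutral=False: negative (< 0) vs. non-negative (>= 0),
                           i.e. the value left of "/" in the table.
    exclude_neutral=True:  negative (< 0) vs. positive (> 0), skipping
                           samples whose label is exactly 0,
                           i.e. the value right of "/" in the table.
    Thresholding at 0 is an assumed convention for this sketch.
    """
    pairs = list(zip(preds, labels))
    if exclude_neutral:
        # Drop neutral samples, then compare the sign of each pair.
        pairs = [(p, y) for p, y in pairs if y != 0]
        hits = sum((p > 0) == (y > 0) for p, y in pairs)
    else:
        # Neutral samples count as the non-negative class.
        hits = sum((p >= 0) == (y >= 0) for p, y in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical toy data, for illustration only.
preds = [1.2, -0.4, 0.3, -2.0, 0.1]
labels = [0.8, 0.0, -1.0, -1.5, 2.0]

acc_non_negative = binary_accuracy(preds, labels, exclude_neutral=False)
acc_positive = binary_accuracy(preds, labels, exclude_neutral=True)
print(f"Acc-2: {acc_non_negative:.2f}/{acc_positive:.2f}")
```

Because the second convention drops neutral samples, the two values are computed over different numbers of test items, so they are not directly comparable to each other. The same split applies to the F1-score column.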