Complexity

Research Article

Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks

Table 2

Accuracy of machine learning classifiers for Vietnamese sentiment analysis under each preprocessing technique.


Datasets	Classifiers	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)

Dataset 1	LR	0.822	0.823	0.822	0.823	0.828	0.823	0.843	0.852	0.829
	SVM	0.832	0.833	0.832	0.833	0.830	0.835	0.852	0.856	0.837
	OVO	0.826	0.827	0.826	0.826	0.829	0.828	0.848	0.852	0.833
	OVR	0.826	0.827	0.826	0.826	0.829	0.828	0.848	0.852	0.833

Dataset 2	LR	0.707	0.707	0.707	0.707	0.703	0.712	0.740	0.742	0.707
	SVM	0.673	0.672	0.673	0.673	0.664	0.686	0.707	0.698	0.673
	OVO	0.715	0.715	0.715	0.715	0.705	0.718	0.748	0.745	0.714
	OVR	0.697	0.699	0.697	0.698	0.696	0.708	0.735	0.738	0.698

Dataset 3	LR	0.798	0.797	0.798	0.798	0.790	0.806	0.826	0.802	0.800
	SVM	0.799	0.797	0.799	0.800	0.789	0.806	0.822	0.811	0.801
	OVO	0.800	0.799	0.800	0.801	0.792	0.807	0.822	0.813	0.803
	OVR	0.800	0.799	0.800	0.801	0.792	0.807	0.822	0.813	0.803

Dataset 4	LR	0.805	0.806	0.805	0.805	0.798	0.808	0.828	0.815	0.811
	SVM	0.808	0.810	0.808	0.810	0.804	0.813	0.830	0.822	0.812
	OVO	0.806	0.807	0.806	0.806	0.804	0.812	0.830	0.824	0.813
	OVR	0.806	0.807	0.806	0.806	0.804	0.812	0.830	0.824	0.813

(1) No preprocessing (the baseline result); (2) number removal; (3) punctuation removal; (4) elongated character removal; (5) POS tagging selection; (6) intensifier handling; (7) negation handling (replacing the lexicons); (8) negation handling (using pretrained models); (9) emoji icon substitution.
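As an illustration of the simpler techniques in the list above, the following is a minimal Python sketch of steps (2) to (4). It is not the implementation used in the article. The function names, the ASCII-only punctuation set, and the rule that a run of three or more identical characters counts as elongation are all assumptions made for this example.

```python
import re
import string


def remove_numbers(text: str) -> str:
    """Technique (2): delete runs of digits (assumed behaviour)."""
    return re.sub(r"\d+", "", text)


def remove_punctuation(text: str) -> str:
    """Technique (3): delete ASCII punctuation (assumed character set)."""
    return text.translate(str.maketrans("", "", string.punctuation))


def collapse_elongated(text: str) -> str:
    """Technique (4): collapse a run of 3+ identical characters to one.

    The threshold of three is an assumption; it leaves legitimate
    doubled letters (for example "xoong") untouched.
    """
    return re.sub(r"(.)\1{2,}", r"\1", text)


def preprocess(text: str) -> str:
    """Apply techniques (2), (3), and (4) in order, then tidy whitespace."""
    text = remove_numbers(text)
    text = remove_punctuation(text)
    text = collapse_elongated(text)
    return " ".join(text.split())
```

In the table, each of these techniques is evaluated on its own against the baseline in column (1); the chained `preprocess` helper is only a convenience for trying the steps together.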