Complexity

Research Article

Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks

Table 3

Accuracy of various data augmentation techniques for Vietnamese sentiment analysis using machine learning classifiers.


Datasets	Classifiers	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)

Dataset 1	LR	0.850	0.864	0.848	0.865	0.863	0.856	0.858	0.858
	SVM	0.859	0.852	0.812	0.860	0.865	0.856	0.861	0.831
	OVO	0.854	0.857	0.854	0.863	0.864	0.856	0.862	0.858
	OVR	0.854	0.857	0.854	0.863	0.864	0.856	0.862	0.858

Dataset 2	LR	0.742	0.743	0.752	0.754	0.752	0.751	0.749	0.748
	SVM	0.721	0.741	0.711	0.754	0.736	0.746	0.720	0.732
	OVO	0.751	0.745	0.745	0.756	0.759	0.747	0.751	0.750
	OVR	0.734	0.731	0.738	0.741	0.747	0.738	0.739	0.738

Dataset 3	LR	0.826	0.831	0.832	0.838	0.831	0.825	0.828	0.826
	SVM	0.824	0.829	0.825	0.832	0.831	0.820	0.826	0.826
	OVO	0.823	0.829	0.827	0.831	0.829	0.819	0.825	0.826
	OVR	0.823	0.829	0.827	0.831	0.829	0.819	0.825	0.826

Dataset 4	LR	0.816	0.830	0.829	0.829	0.820	0.814	0.822	0.821
	SVM	0.828	0.826	0.833	0.831	0.834	0.817	0.826	0.827
	OVO	0.826	0.826	0.833	0.829	0.834	0.816	0.826	0.827
	OVR	0.826	0.826	0.833	0.829	0.834	0.816	0.826	0.827

Columns: (1) preprocessing techniques; (2) EDA; (3) sentence shuffling; (4) back translation; (5) syntax-tree transformation; (6) contextual substitution (w2v); (7) contextual substitution ( + ); (8) masked language model (PhoBERT).
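
One way to read Table 3 is to find, for each classifier, the column with the highest accuracy and its gain over column (1). The following is a minimal sketch using the Dataset 1 rows only. It assumes column (1) serves as the no-augmentation baseline, which the table itself does not state. The names `TECHNIQUES`, `DATASET_1`, `best_technique`, and `summarize` are hypothetical helpers introduced for illustration, and the label for column (7) is left generic because its name is incomplete in the legend.

```python
# Column labels taken from the legend of Table 3.
TECHNIQUES = [
    "preprocessing techniques",         # (1) assumed baseline, no augmentation
    "EDA",                              # (2)
    "sentence shuffling",               # (3)
    "back translation",                 # (4)
    "syntax-tree transformation",       # (5)
    "contextual substitution (w2v)",    # (6)
    "contextual substitution, col. 7",  # (7) name incomplete in the source
    "masked language model (PhoBERT)",  # (8)
]

# Dataset 1 accuracies copied from Table 3, columns (1) to (8).
DATASET_1 = {
    "LR":  [0.850, 0.864, 0.848, 0.865, 0.863, 0.856, 0.858, 0.858],
    "SVM": [0.859, 0.852, 0.812, 0.860, 0.865, 0.856, 0.861, 0.831],
    "OVO": [0.854, 0.857, 0.854, 0.863, 0.864, 0.856, 0.862, 0.858],
    "OVR": [0.854, 0.857, 0.854, 0.863, 0.864, 0.856, 0.862, 0.858],
}


def best_technique(scores):
    """Return (label, accuracy, gain over column (1)) for the top column.

    Ties are resolved in favour of the leftmost column.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    gain = round(scores[best] - scores[0], 3)
    return TECHNIQUES[best], scores[best], gain


def summarize(rows):
    """Map each classifier name to its best-scoring technique."""
    return {clf: best_technique(scores) for clf, scores in rows.items()}


if __name__ == "__main__":
    for clf, (name, acc, gain) in summarize(DATASET_1).items():
        print(f"{clf}: {name} ({acc:.3f}, {gain:+.3f} vs. column 1)")
```

On these rows the sketch selects back translation for LR and syntax-tree transformation for the other three classifiers. The gains are at most 0.015, so differences of this size should be interpreted with care.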