Research Article
Vietnamese Sentiment Analysis under Limited Training Data Based on Deep Neural Networks
Table 2
Accuracy of machine learning classifiers for Vietnamese sentiment analysis under various preprocessing techniques.
| Dataset | Classifier | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 | LR | 0.822 | 0.823 | 0.822 | 0.823 | 0.828 | 0.823 | 0.843 | 0.852 | 0.829 |
| Dataset 1 | SVM | 0.832 | 0.833 | 0.832 | 0.833 | 0.830 | 0.835 | 0.852 | 0.856 | 0.837 |
| Dataset 1 | OVO | 0.826 | 0.827 | 0.826 | 0.826 | 0.829 | 0.828 | 0.848 | 0.852 | 0.833 |
| Dataset 1 | OVR | 0.826 | 0.827 | 0.826 | 0.826 | 0.829 | 0.828 | 0.848 | 0.852 | 0.833 |
| Dataset 2 | LR | 0.707 | 0.707 | 0.707 | 0.707 | 0.703 | 0.712 | 0.740 | 0.742 | 0.707 |
| Dataset 2 | SVM | 0.673 | 0.672 | 0.673 | 0.673 | 0.664 | 0.686 | 0.707 | 0.698 | 0.673 |
| Dataset 2 | OVO | 0.715 | 0.715 | 0.715 | 0.715 | 0.705 | 0.718 | 0.748 | 0.745 | 0.714 |
| Dataset 2 | OVR | 0.697 | 0.699 | 0.697 | 0.698 | 0.696 | 0.708 | 0.735 | 0.738 | 0.698 |
| Dataset 3 | LR | 0.798 | 0.797 | 0.798 | 0.798 | 0.790 | 0.806 | 0.826 | 0.802 | 0.800 |
| Dataset 3 | SVM | 0.799 | 0.797 | 0.799 | 0.800 | 0.789 | 0.806 | 0.822 | 0.811 | 0.801 |
| Dataset 3 | OVO | 0.800 | 0.799 | 0.800 | 0.801 | 0.792 | 0.807 | 0.822 | 0.813 | 0.803 |
| Dataset 3 | OVR | 0.800 | 0.799 | 0.800 | 0.801 | 0.792 | 0.807 | 0.822 | 0.813 | 0.803 |
| Dataset 4 | LR | 0.805 | 0.806 | 0.805 | 0.805 | 0.798 | 0.808 | 0.828 | 0.815 | 0.811 |
| Dataset 4 | SVM | 0.808 | 0.810 | 0.808 | 0.810 | 0.804 | 0.813 | 0.830 | 0.822 | 0.812 |
| Dataset 4 | OVO | 0.806 | 0.807 | 0.806 | 0.806 | 0.804 | 0.812 | 0.830 | 0.824 | 0.813 |
| Dataset 4 | OVR | 0.806 | 0.807 | 0.806 | 0.806 | 0.804 | 0.812 | 0.830 | 0.824 | 0.813 |
Column key: (1) no preprocessing (the baseline result); (2) number removal; (3) punctuation removal; (4) elongated character removal; (5) POS tagging selection; (6) intensifier handling; (7) negation handling (replacing the lexicons); (8) negation handling (using pretrained models); (9) emoji icon substitution.
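To make the simpler techniques in the column key concrete, the following is a minimal Python sketch of steps (2) to (4): number removal, punctuation removal, and elongated character removal. It is an illustration under stated assumptions, not the implementation used to produce the table. The function names, the choice of ASCII punctuation only, and the rule that a character repeated three or more times collapses to a single character are all hypothetical.

```python
import re
import string


def remove_numbers(text: str) -> str:
    """Step (2): drop every run of digits."""
    return re.sub(r"\d+", "", text)


def remove_punctuation(text: str) -> str:
    """Step (3): drop ASCII punctuation (assumption: no Unicode punctuation)."""
    return text.translate(str.maketrans("", "", string.punctuation))


def squeeze_elongated(text: str) -> str:
    """Step (4): collapse a character repeated 3+ times to one occurrence.

    The threshold of three is an assumption made for this sketch.
    """
    return re.sub(r"(.)\1{2,}", r"\1", text)


def preprocess(text: str) -> str:
    """Apply steps (2), (3), (4) in order, then normalize whitespace."""
    for step in (remove_numbers, remove_punctuation, squeeze_elongated):
        text = step(text)
    return " ".join(text.split())


if __name__ == "__main__":
    print(preprocess("Phim hayyyy quá!!! 10 điểm"))  # Phim hay quá điểm
```

In the table, each of these steps is evaluated on its own against the baseline in column (1); chaining them in `preprocess` is only a convenience for the example.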