Research Article
A Topic Recognition Method of News Text Based on Word Embedding Enhancement
Table 4
Classification results on the 20NewsGroup dataset (20 classes, 7532 texts) using SVM and LR.
Average 5-fold micro-F1 score at different dimensions (Dim.).

| Model | Dim. | SVM | LR | Dim. | SVM | LR | Dim. | SVM | LR | Dim. | SVM | LR | Dim. | SVM | LR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TF-IDF | 100 | 0.373 | 0.372 | 200 | 0.48 | 0.475 | 300 | 0.586 | 0.580 | 400 | 0.637 | 0.631 | 500 | 0.671 | 0.663 |
| LDA | 100 | 0.686 | 0.682 | 200 | 0.692 | 0.689 | 300 | 0.712 | 0.711 | 400 | 0.715 | 0.714 | 500 | 0.721 | 0.723 |
| Glove | 100 | 0.724 | 0.710 | 200 | 0.767 | 0.754 | 300 | 0.784 | 0.771 | 400 | 0.794 | 0.780 | 500 | 0.799 | 0.788 |
| SGL | 100 | 0.745 | 0.732 | 200 | 0.779 | 0.767 | 300 | 0.792 | 0.780 | 400 | 0.807 | 0.794 | 500 | 0.813 | 0.802 |
| CGL | 200 | 0.782 | 0.772 | 400 | 0.804 | 0.792 | 600 | 0.812 | 0.796 | 800 | 0.822 | 0.806 | 1000 | 0.826 | 0.813 |
| Word2vec | 100 | 0.740 | 0.733 | 200 | 0.765 | 0.756 | 300 | 0.766 | 0.755 | 400 | 0.769 | 0.758 | 500 | 0.765 | 0.755 |
| SWL | 100 | 0.747 | 0.743 | 200 | 0.777 | 0.769 | 300 | 0.782 | 0.771 | 400 | 0.787 | 0.777 | 500 | 0.787 | 0.777 |
| CWL | 200 | 0.780 | 0.776 | 400 | 0.795 | 0.787 | 600 | 0.793 | 0.782 | 800 | 0.793 | 0.783 | 1000 | 0.796 | 0.785 |
Bold values indicate the optimal results.
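The scores in Table 4 are micro-F1 values averaged over five folds. As a minimal sketch of how such a score can be computed, assuming each text has exactly one true and one predicted label, the Python snippet below pools true positives, false positives, and false negatives over all classes and then averages the per-fold scores. The function names `micro_f1` and `average_over_folds` and the toy labels are hypothetical illustrations, not the authors' evaluation code or data.

```python
from collections import Counter
from statistics import mean


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # the predicted class receives a false positive
            fn[t] += 1  # the true class receives a false negative
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    denom = 2 * tp_sum + fp_sum + fn_sum
    return 2 * tp_sum / denom if denom else 0.0


def average_over_folds(fold_results):
    """Mean micro-F1 over (y_true, y_pred) pairs, one pair per fold."""
    return mean(micro_f1(t, p) for t, p in fold_results)


# Hypothetical toy labels for two folds (not data from the paper).
folds = [
    (["sci", "rec", "talk", "sci"], ["sci", "rec", "sci", "sci"]),
    (["rec", "talk", "talk", "sci"], ["rec", "talk", "sci", "rec"]),
]
score = average_over_folds(folds)
print(round(score, 3))
```

In this single-label setting every misclassified text adds one false positive and one false negative, so micro-F1 coincides with plain accuracy on each fold.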