| Methods | Advantage | Disadvantage |
| --- | --- | --- |
| SVM [16–19, 25, 29, 31] | SVM avoids the complexity of high-dimensional space by working directly through the kernel function of that space | SVM is difficult to apply to large training sets, and choosing an appropriate kernel function is hard |
| RF [16, 19, 25–27, 31] | RF can handle very high-dimensional data without feature selection | RF may overfit on some noisy classification or regression problems |
| KNN [31] | The training time complexity of KNN is lower than that of the support vector machine (SVM); unlike naive Bayes (NB), it makes no assumptions about the data, achieves high accuracy, and is insensitive to outliers | Prediction is computationally expensive; when the sample is unbalanced, prediction accuracy for rare categories is low |
| NB [16, 32] | NB performs well on small-scale data, and the algorithm is relatively simple | The classification decision relies on a posterior probability estimated from the prior and the data, so it carries a certain error rate |
| LDA [16, 26] | LDA works better when the class information depends on the means rather than the variances | LDA is not suitable for dimensionality reduction of samples from non-Gaussian distributions and may overfit |
| XGBoost [31] | Regularization added to the loss function prevents overfitting; parallel computing makes the algorithm more efficient; memory usage is optimized | The split gain of many leaf nodes at the same level is low, making further splits unnecessary, which can incur needless overhead |
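To make the comparison concrete, the sketch below fits each of the listed classifiers on a synthetic dataset. It is a minimal illustration, assuming scikit-learn and xgboost are available; the dataset, hyperparameters, and resulting accuracies are stand-ins, not results from the cited studies.

```python
# Illustrative comparison of the classifiers listed in the table above.
# The synthetic data and hyperparameters are assumptions for the sketch,
# not taken from the surveyed papers.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    # SVM and KNN are distance-based, so features are standardized first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    # reg_lambda is the L2 regularization term noted in the table.
    "XGBoost": XGBClassifier(n_estimators=100, reg_lambda=1.0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```

Timing `model.fit` and `model.predict` on larger samples makes the trade-offs in the table visible, e.g. KNN's cheap training but expensive prediction versus SVM's expensive training on large sets.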