Research Article

Prediction and Screening Model for Products Based on Fusion Regression and XGBoost Classification

Table 1. The advantages and disadvantages of SVM, RF, KNN, NB, LDA, and XGBoost.

| Method | Advantage | Disadvantage |
| --- | --- | --- |
| SVM [16–19, 25, 29, 31] | SVM avoids the complexity of explicit high-dimensional mappings by working directly through a kernel function | SVM is difficult to scale to large training sets, and choosing a suitable kernel function is hard |
| RF [16, 19, 25–27, 31] | RF can handle very high-dimensional data without feature selection | RF may overfit on noisy classification or regression problems |
| KNN [31] | KNN has lower training time complexity than SVM; compared with naive Bayes (NB), it makes no assumptions about the data distribution, achieves high accuracy, and is insensitive to outliers | Prediction requires a large amount of computation; when classes are imbalanced, prediction accuracy for rare classes is low |
| NB [16, 32] | NB performs well on small-scale data, and the algorithm is relatively simple | The classification decision rests on a posterior probability determined by the prior and the data, so there is an inherent error rate |
| LDA [16, 26] | LDA works well when the class information lies in the means rather than the variances | LDA is unsuitable for dimensionality reduction of non-Gaussian samples and may overfit |
| XGBoost [31] | Regularization is added to the loss function to prevent overfitting (see the objective sketched after this table); parallel computation makes the algorithm efficient; memory use is optimized | The split gain of many leaf nodes at the same level is low, so further splits are unnecessary and may incur needless overhead |
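
For reference, the regularized objective behind the XGBoost entry in Table 1 can be written as in the original XGBoost formulation; the notation below follows Chen and Guestrin (2016) rather than this paper.

```latex
% XGBoost regularized objective (Chen & Guestrin, 2016);
% notation follows that paper, not this article.
\mathcal{L}(\phi) = \sum_{i} l\!\left(\hat{y}_i, y_i\right)
                  + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```

Here l is a differentiable convex loss, T is the number of leaves in tree f, w are its leaf weights, and the penalty terms γ and λ discourage overly complex trees, which is the overfitting control cited in the table.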
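The trade-offs in Table 1 can be observed empirically by fitting all six classifiers on a common dataset. The following is a minimal sketch, assuming scikit-learn and the xgboost package are available; the synthetic dataset and hyperparameters are illustrative and are not taken from this paper.

```python
# Compare the six classifiers from Table 1 on a synthetic dataset.
# Dataset and hyperparameters are illustrative, not the authors' setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),  # kernel choice is the difficulty noted in Table 1
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    # reg_lambda and gamma are the regularization terms added to the loss
    "XGBoost": XGBClassifier(n_estimators=200, reg_lambda=1.0, gamma=0.0,
                             eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```

On imbalanced or noisy data, per-class metrics (e.g., recall on the rare class) would expose the KNN and RF weaknesses listed in the table more clearly than overall accuracy.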