Research Article

Prediction and Screening Model for Products Based on Fusion Regression and XGBoost Classification

Table 1. The advantages and disadvantages of SVM, RF, KNN, NB, LDA, and XGBoost.

| Method | Advantage | Disadvantage |
| --- | --- | --- |
| SVM [16–19, 25, 29, 31] | SVM avoids the complexity of explicit high-dimensional mappings by working directly through a kernel function | SVM is difficult to scale to large training sets, and choosing a suitable kernel function is hard |
| RF [16, 19, 25–27, 31] | RF can handle very high-dimensional data without feature selection | RF may overfit on noisy classification or regression problems |
| KNN [31] | KNN has lower training time complexity than SVM; compared with naive Bayes (NB), it makes no assumptions about the data distribution, achieves high accuracy, and is insensitive to outliers | Prediction requires a large amount of computation; when classes are imbalanced, prediction accuracy for rare classes is low |
| NB [16, 32] | NB performs well on small-scale data, and the algorithm is relatively simple | The classification decision rests on a posterior probability determined by the prior and the data, so there is an inherent error rate |
| LDA [16, 26] | LDA works well when the class information lies in the means rather than the variances | LDA is unsuitable for dimensionality reduction of non-Gaussian samples and may overfit |
| XGBoost [31] | Regularization is added to the loss function to prevent overfitting (see the objective sketched after this table); parallel computation makes the algorithm efficient; memory use is optimized | The split gain of many leaf nodes at the same level is low, so further splits are unnecessary and may incur needless overhead |
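
For reference, the regularized objective behind the XGBoost entry in Table 1 can be written as in the original XGBoost formulation; the notation below follows Chen and Guestrin (2016) rather than this paper.

```latex
% XGBoost regularized objective (Chen & Guestrin, 2016);
% notation follows that paper, not this article.
\mathcal{L}(\phi) = \sum_{i} l\!\left(\hat{y}_i, y_i\right)
                  + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}
```

Here l is a differentiable convex loss, T is the number of leaves in tree f, w are its leaf weights, and the penalty terms γ and λ discourage overly complex trees, which is the overfitting control cited in the table.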
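The trade-offs in Table 1 can be observed empirically by fitting all six classifiers on a common dataset. The following is a minimal sketch, assuming scikit-learn and the xgboost package are available; the synthetic dataset and hyperparameters are illustrative and are not taken from this paper.

```python
# Compare the six classifiers from Table 1 on a synthetic dataset.
# Dataset and hyperparameters are illustrative, not the authors' setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "SVM": SVC(kernel="rbf"),  # kernel choice is the difficulty noted in Table 1
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    # reg_lambda and gamma are the regularization terms added to the loss
    "XGBoost": XGBClassifier(n_estimators=200, reg_lambda=1.0, gamma=0.0,
                             eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```

On imbalanced or noisy data, per-class metrics (e.g., recall on the rare class) would expose the KNN and RF weaknesses listed in the table more clearly than overall accuracy.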