|
Model | Main idea | Advantages and disadvantages |
|
Artificial neural network (ANN) | The network contains three types of processing units: input units, output units, and hidden units. Input units receive signals and data from the outside world. Output units deliver the results of the system's processing. Hidden units lie between the input and output units and cannot be observed from outside the system. ANN is a nonprogrammed, adaptive, brain-style mode of information processing whose essence is to obtain parallel, distributed information processing through network transformation and dynamic behavior. | Advantages: ① it is simple to apply; ② it yields relatively accurate classification results; ③ it can search quickly for an optimum. Disadvantage: ① it is easily trapped in a local optimum. |
|
SVM | The algorithm finds a separating hyperplane that places the two classes of data on opposite sides, thereby classifying and predicting the data. The hyperplane is determined by the support vectors. | Advantages: ① the “curse of dimensionality” can be avoided; ② an effective algorithm is known for finding the global minimum of the objective function; ③ the algorithm generalizes well. Disadvantages: ① it is difficult to train on large-scale samples; ② it has difficulty with multiclass problems; ③ it is sensitive to the choice of parameters and kernel function. |
|
Random Forest (RF) | The forest is composed of many trees, so the result of RF depends on the decisions of multiple trees. This is an ensemble learning idea. For example, when a new animal appears in the forest, the forest holds a meeting to determine what kind of animal it is. Every tree casts a vote, and the answer with the most votes becomes the final result. | Advantages: ① it can handle very high-dimensional data (many features) without feature selection; ② training is fast and easy to parallelize; ③ the implementation is relatively simple. Disadvantages: ① it is prone to overfitting; ② for data whose attributes take different numbers of values, the attribute weights produced by RF are unreliable. |
|
AdaBoost | The algorithm trains several individual learners and combines them with a certain combination strategy so that a strong learner is finally formed, following the principle of strength in numbers. | Advantages: ① within the AdaBoost framework, various classification models can be used to build the weak learners, which makes it very flexible; ② it achieves high precision and can be applied to most classifiers without parameter adjustment. Disadvantages: ① unbalanced data reduce classification accuracy; ② training is time-consuming. |
|
CNN | A network built from the following layers: input layer (data entry); convolutional layer (feature extraction); pooling layer (extracts features again); hidden layer (the layers in the middle); fully connected layer (classifies the features after the extracted feature matrix has been vectorized). | Advantage: ① it has a high classification accuracy rate. Disadvantages: ① parameters need to be adjusted; ② it needs a large amount of data; ③ it requires a large amount of calculation. |
|
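The voting step described for Random Forest in the table can be sketched as follows. This is a minimal illustration, not a full RF implementation: the function name `majority_vote` and the animal labels are hypothetical, and each tree's prediction is assumed to be already available as a string label.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label that received the most votes.

    tree_predictions: list of labels, one per tree (hypothetical input).
    Ties are resolved in favor of the label that was seen first.
    """
    if not tree_predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(tree_predictions)
    # most_common(1) returns a list holding the single (label, count) pair
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical votes from five trees about the new animal.
votes = ["cat", "dog", "cat", "cat", "fox"]
print(majority_vote(votes))  # prints "cat"
```

In a real forest each vote would come from a decision tree trained on a bootstrap sample of the data; only the aggregation step is shown here.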