Classification Prediction of Breast Cancer Based on Machine Learning
Table 2
Summary of filtering methods in feature selection.
Name | Variable type | Selection rules and variable requirements
--- | --- | ---
Missing percentage | Univariate | Eliminate features that have too many missing samples to be filled reliably
Variance | Univariate | Exclude features whose variance is close to or equal to 0; applies to categorical variables
Frequency | Univariate | Eliminate features whose values are overly concentrated in a single category
Pearson’s correlation coefficient | Multivariate | Exclude features whose correlation coefficient is close to or equal to 0; the samples must follow a normal distribution
Analysis of variance | Multivariate | Exclude features with an F value that is too low or a p value > 0.05; the population must satisfy homogeneity of variance and independence between samples
Kendall tau rank correlation coefficient | Multivariate | Exclude features whose correlation coefficient is close to or equal to 0; the categories must be ordered
Mutual information | Multivariate | Exclude features whose mutual information is close to or equal to 0
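The three univariate rules in Table 2 look at one feature at a time and can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `univariate_filter`, the thresholds `max_missing`, `min_variance` and `max_top_freq`, and the data layout (a dict of numeric columns with `None` marking a missing entry) are all hypothetical, and the column values are toy numbers.

```python
from collections import Counter
from statistics import pvariance


def univariate_filter(columns, max_missing=0.3, min_variance=1e-8, max_top_freq=0.95):
    """Return the names of features that pass all three univariate rules.

    columns: dict mapping a (hypothetical) feature name to a list of numbers,
    with None for missing entries. Thresholds are illustrative assumptions.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Rule 1 (missing percentage): drop features with too many missing samples.
        if not present or 1 - len(present) / len(values) > max_missing:
            continue
        # Rule 2 (variance): drop features whose variance is close to or equal to 0.
        if pvariance(present) <= min_variance:
            continue
        # Rule 3 (frequency): drop features dominated by a single value.
        top_count = Counter(present).most_common(1)[0][1]
        if top_count / len(present) > max_top_freq:
            continue
        kept.append(name)
    return kept


# Toy data; names and values are made up for illustration.
columns = {
    "radius_mean": [12.1, 14.3, 11.8, 20.5, 13.0, 15.2],
    "constant": [1.0] * 6,
    "mostly_missing": [None, None, None, 3.0, None, 2.0],
    "dominant": [0, 0, 0, 0, 0, 1],
}
print(univariate_filter(columns, max_top_freq=0.8))  # ['radius_mean']
```

Here `constant` fails the variance rule, `mostly_missing` fails the missing-percentage rule, and `dominant` (five of six values equal to 0) fails the frequency rule at the 0.8 threshold; with the default 0.95 it would be kept.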
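The four multivariate rules in Table 2 score each feature against the class label and drop features whose score is close to 0. The following dependency-free sketch assumes a binary label (for example 0 = benign, 1 = malignant) and numeric features. The helper names, the use of Kendall's tau-b (which handles the many ties in a binary label), the equal-width binning applied before computing mutual information, and the threshold are all assumptions, not the paper's settings. For ANOVA it reports only the F statistic, because the p value requires the F distribution; SciPy's `scipy.stats.f_oneway` returns both.

```python
import math
from collections import Counter
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def anova_f(x, y):
    """One-way ANOVA F statistic of feature x grouped by class label y."""
    groups = {}
    for a, label in zip(x, y):
        groups.setdefault(label, []).append(a)
    n, k = len(x), len(groups)
    if k < 2 or n <= k:
        return 0.0
    grand = sum(x) / n
    means = {lbl: sum(g) / len(g) for lbl, g in groups.items()}
    ssb = sum(len(g) * (means[lbl] - grand) ** 2 for lbl, g in groups.items())
    ssw = sum((a - means[lbl]) ** 2 for lbl, g in groups.items() for a in g)
    if ssw == 0:
        return math.inf if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) adjusted for ties."""
    s = tx = ty = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])
        dy = (y[i] > y[j]) - (y[i] < y[j])
        s += dx * dy
        tx += dx * dx
        ty += dy * dy
    return s / math.sqrt(tx * ty) if tx and ty else 0.0


def equal_width_bins(values, n_bins):
    """Discretize a continuous feature (assumed preprocessing for MI)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def feature_score(x, y, method, n_bins=4):
    if method == "pearson":
        return abs(pearson(x, y))
    if method == "anova":
        return anova_f(x, y)
    if method == "kendall":
        return abs(kendall_tau_b(x, y))
    if method == "mi":
        return mutual_information(equal_width_bins(x, n_bins), y)
    raise ValueError(f"unknown method: {method}")


def multivariate_filter(columns, labels, method, threshold, n_bins=4):
    """Keep features whose score against the label exceeds the threshold."""
    return [name for name, x in columns.items()
            if feature_score(x, labels, method, n_bins) > threshold]


# Toy data: x1 separates the classes, x2 is unrelated to the label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
cols = {
    "x1": [1.0, 1.2, 0.9, 1.1, 2.0, 2.3, 2.1, 1.9],
    "x2": [1, 2, 1, 2, 1, 2, 1, 2],
}
for method in ("pearson", "anova", "kendall", "mi"):
    print(method, multivariate_filter(cols, labels, method, threshold=0.1))  # ['x1']
```

The single `threshold` stands in for "close to 0"; since an F statistic and a correlation coefficient live on different scales, in practice each method would get its own cutoff.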