Research Article

Classification Prediction of Breast Cancer Based on Machine Learning

Table 2

Summary of filtering methods in feature selection.

| Name | Variable type | Selection rules and variable requirements |
| --- | --- | --- |
| Missing percentage | Univariate | Eliminate features that have too many missing samples and are difficult to fill |
| Variance | Univariate | Exclude features with variance close to or equal to 0; applies to categorical variables |
| Frequency | Univariate | Eliminate features whose values are overly concentrated in a single category |
| Pearson's correlation coefficient | Multivariate | Remove features with correlation coefficients close to or equal to 0; the sample needs to follow a normal distribution |
| Analysis of variance | Multivariate | Exclude features with an F value that is too low, or features that are not significant at the 0.05 level; the population is required to have homogeneity of variance, and the samples must be independent |
| Kendall tau rank correlation coefficient | Multivariate | Exclude features with correlation coefficients close to or equal to 0; the categories must be ordered |
| Mutual information | Multivariate | Eliminate features with mutual information close to or equal to 0 |
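As an illustration of how several of the filters in Table 2 might be chained, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `filter_features`, the toy data, and all threshold values are assumptions chosen for demonstration only. It covers the missing percentage, variance, frequency, and Pearson correlation rules, and leaves out analysis of variance, Kendall's tau, and mutual information.

```python
import numpy as np

def filter_features(X, y, max_missing=0.5, min_variance=1e-8,
                    max_dominant_share=0.8, min_abs_corr=0.1):
    """Return the column indices of X that pass four filter rules.

    All thresholds are illustrative assumptions, not values from the article.
    Missing entries in X are expected to be encoded as NaN.
    """
    kept = []
    for j in range(X.shape[1]):
        col = X[:, j]
        observed = ~np.isnan(col)

        # Missing percentage: drop features with too many missing samples.
        if 1.0 - observed.mean() > max_missing:
            continue

        values = col[observed]

        # Variance: drop features that are constant or nearly constant.
        if np.var(values) <= min_variance:
            continue

        # Frequency: drop features concentrated on a single value.
        _, counts = np.unique(values, return_counts=True)
        if counts.max() / values.size > max_dominant_share:
            continue

        # Pearson correlation with the label, using observed rows only.
        r = np.corrcoef(values, y[observed])[0, 1]
        if abs(r) < min_abs_corr:
            continue

        kept.append(j)
    return kept

# Toy data (hypothetical): six samples, five candidate features.
X = np.array([
    [1.0, 5.0, np.nan, 0.0, 1.0],
    [2.0, 5.0, np.nan, 0.0, 2.0],
    [3.0, 5.0, np.nan, 0.0, 3.0],
    [4.0, 5.0, 1.0,    0.0, 1.0],
    [5.0, 5.0, np.nan, 0.0, 2.0],
    [6.0, 5.0, 2.0,    1.0, 3.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

kept = filter_features(X, y)
print(kept)  # [0]
```

In this toy example only column 0 survives: column 1 is constant, column 2 is mostly missing, column 3 is dominated by a single value, and column 4 has zero correlation with the label. In practice the thresholds would need to be tuned to the dataset, and the distributional requirements listed in the table (for example, normality for Pearson's coefficient) should be checked before a filter is applied.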