The authors speed up the search for the best features using orthogonal least squares. They compare the approach with mutual information and other embedded methods.
The multiple correlation coefficient and the canonical correlation coefficient can be improved when feature generation and instance generation methods are used.
A set of features that shows strong convergence to a set of classes is identified. This widens the classification margin and reduces the classification error.
Noisy features can be identified and removed before the strongly convergent feature set is sought.
Most feature selection methods rely only on frequency. The authors use category information as an additional metric to select features for classification.
Semantic information can degrade classification performance.
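
The contrast drawn above between frequency-only selection and selection that also uses category information can be illustrated with a minimal sketch. It is not the method of any of the surveyed papers: mutual information between term presence and category is used here only as one plausible category-aware score, and the toy corpus `DOCS` and the helper names `document_frequency`, `mutual_information`, and `rank_terms` are hypothetical.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): each document is (set of terms, category label).
DOCS = [
    ({"the", "goal", "match"}, "sport"),
    ({"the", "goal", "team"}, "sport"),
    ({"the", "vote", "party"}, "politics"),
    ({"the", "vote", "law"}, "politics"),
]


def document_frequency(docs):
    """Count in how many documents each term occurs (frequency-only score)."""
    df = Counter()
    for terms, _ in docs:
        df.update(terms)
    return df


def mutual_information(docs, term):
    """Mutual information (in bits) between term presence and category.

    Assumption: this stands in for the 'category information' metric;
    the surveyed paper may use a different score.
    """
    n = len(docs)
    # Joint counts of (term present?, category) and the two marginals.
    joint = Counter((term in terms, label) for terms, label in docs)
    present = Counter(term in terms for terms, _ in docs)
    labels = Counter(label for _, label in docs)
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        mi += p_tc * math.log2(p_tc / ((present[t] / n) * (labels[c] / n)))
    return mi


def rank_terms(docs, k):
    """Return the k terms with the highest category-aware score."""
    vocab = set().union(*(terms for terms, _ in docs))
    # Sort by descending mutual information; break ties alphabetically.
    return sorted(vocab, key=lambda w: (-mutual_information(docs, w), w))[:k]
```

In this toy corpus the term "the" has the highest document frequency but carries no category information, since it occurs in every document of both categories. A frequency-only ranking would put it first, while the category-aware ranking prefers terms such as "goal" and "vote" that occur in only one category.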