Abstract
Attribute reduction is a core research topic in rough set theory. At present, attribute reduction in numerical information systems mostly adopts the neighborhood rough set method. In order to further improve the measurement of similarity between data objects, this paper uses the kernel function method to construct a new rough set model for numerical information systems, together with an uncertainty measurement method and an attribute reduction method. Firstly, the similarity between objects of a numerical information system is calculated by a kernel function, and a granular structure model and a rough set model based on the resulting kernel similarity relation are proposed. Then, from the perspective of kernel similarity rough approximation, two uncertainty measures, called kernel approximation precision and kernel approximation roughness, are proposed. Because these two measures do not satisfy strict monotonicity under information granulation, the concept of kernel knowledge granularity based on the kernel similarity granular structure is further introduced. By combining kernel approximation precision and kernel approximation roughness with kernel knowledge granularity, an uncertainty measure called the kernel similarity combination measurement is proposed. Finally, using the strict monotonicity of the kernel similarity combination measurement, an attribute reduction algorithm for numerical information systems is designed. Experimental analysis shows the effectiveness and superiority of the proposed method.
1. Introduction
Rough set theory was first proposed by the Polish scholar Pawlak. It has since become an important mathematical analysis tool in machine learning, pattern recognition, and knowledge discovery [1, 2]. Attribute reduction is an important application of rough set theory. Its main purpose is to eliminate redundant and inconsistent attributes from the original data set and thereby improve the performance of data classification and knowledge discovery. Many kinds of attribute reduction algorithms based on rough set theory have been proposed [3–6].
The classical rough set model partitions data through an equivalence relation, so attribute reduction can only be carried out on discrete information systems, whereas numerical information systems are ubiquitous in practice. To address this problem, scholars have established the neighborhood rough set model for numerical information systems and designed a variety of attribute reduction algorithms [7–10]. Hu et al. [11] proposed the earliest attribute reduction algorithm for numerical information systems based on neighborhood rough sets. Building on this work, several scholars improved the neighborhood rough set model: Fan et al. [12] proposed an attribute reduction algorithm based on the maximum decision neighborhood rough set, Wang et al. [13] proposed an attribute reduction algorithm based on the k-nearest-neighbor rough set, and Hu et al. [14] proposed an attribute reduction algorithm based on the weighted neighborhood rough set. On the other hand, based on the neighborhood granulation induced by neighborhood rough sets, many attribute evaluation measures for numerical information systems have been defined and used to construct attribute reduction algorithms. For example, He et al. [15] combined neighborhood granularity with neighborhood roughness and proposed an attribute reduction method based on a neighborhood combination measurement. Yao et al. [16] and Zhao and Yang [17] defined neighborhood rough mutual information entropy and neighborhood conditional entropy, respectively, and designed corresponding attribute reduction methods. Wang et al. [18, 19] proposed neighborhood discrimination and neighborhood self-information measures on neighborhood rough sets for attribute reduction of numerical information systems. In short, attribute reduction based on neighborhood rough sets has attracted increasing attention.
However, in the neighborhood rough set model, the similarity relation between data objects is measured by distance [7–19], so attribute reduction based on neighborhood rough sets relies on distance-based similarity. In the field of data analysis, the kernel method [20] is a general tool for processing, analyzing, and comparing various types of data. In the kernel method, a kernel function maps the data into a higher-dimensional feature space, in which the data may become easier to separate or better structured. Because a kernel function computes the inner product between vectors in this high-dimensional feature space, it can serve as a similarity measure in machine learning and has been shown to measure similarity well [21, 22].
For numerical information systems, in order to further improve the measurement of similarity between data objects, this paper proposes a new rough set model that uses a kernel function to measure the similarity between objects and studies the uncertainty measurement and attribute reduction of such systems. Firstly, the similarity between objects is calculated by a kernel function, the concept of a kernel similarity relation is proposed, and a granular structure model and a rough set model based on this relation are constructed. Then, based on the kernel similarity rough set model, the uncertainty measures of kernel approximation precision and kernel approximation roughness are proposed. Because these two measures do not satisfy strict monotonicity under information granulation, the kernel knowledge granularity of the granular structure model based on the kernel similarity relation is defined. By combining kernel approximation precision and kernel approximation roughness with kernel knowledge granularity, the kernel similarity combination measurement of uncertainty is proposed. Finally, using the strict monotonicity of the kernel similarity combination measurement, an attribute reduction algorithm for numerical information systems is proposed. Experimental analysis shows the effectiveness and superiority of the proposed algorithm.
2. Basic Theory
In this section, we mainly introduce the concepts of neighborhood rough set and kernel function under numerical information system.
In rough set theory, a numerical information system is an information system with numerical attribute values [11], usually expressed as $IS = (U, A, V)$. Here, $U = \{x_1, x_2, \ldots, x_n\}$ is called the universe of the numerical information system, $A$ is its attribute set, and $V$ is its attribute value domain.
Definition 1 (see [11]). Given the numerical information system $IS = (U, A, V)$ and an attribute subset $B \subseteq A$, the neighborhood relation induced by $B$ under the numerical information system is defined as
$$NR_B^{\delta} = \{(x_i, x_j) \in U \times U \mid \Delta_B(x_i, x_j) \le \delta\},$$
where $\Delta_B(x_i, x_j) = \left(\sum_{a \in B} |f(x_i, a) - f(x_j, a)|^p\right)^{1/p}$ is the distance measurement between objects $x_i$ and $x_j$ and $f(x, a)$ represents the attribute value of object $x$ under attribute $a$. Generally, $p$ takes 1, 2, or $\infty$, and $\delta \ge 0$ is called the neighborhood radius of the neighborhood relation $NR_B^{\delta}$.
Definition 2 (see [11]). Given the numerical information system $IS = (U, A, V)$, let $NR_B^{\delta}$ be the neighborhood relation determined by the attribute subset $B \subseteq A$. The neighborhood class of $x_i \in U$ is defined as $n_B^{\delta}(x_i) = \{x_j \in U \mid \Delta_B(x_i, x_j) \le \delta\}$. According to the neighborhood classes of all objects in the universe, the neighborhood granular structure of the universe under $NR_B^{\delta}$ can be obtained. If $U = \{x_1, x_2, \ldots, x_n\}$, the neighborhood granular structure is expressed as
$$NG(B) = \{n_B^{\delta}(x_1), n_B^{\delta}(x_2), \ldots, n_B^{\delta}(x_n)\}.$$
Definition 2 shows that a given neighborhood relation induces a neighborhood granular structure on the universe of the information system, that is, a neighborhood granulation of the universe from the perspective of granular computing [11]. This neighborhood granulation can be used for various rough approximation calculations in information systems [11–14].
Definition 3 (see [11]). Given the numerical information system $IS = (U, A, V)$, let $NR_B^{\delta}$ be the neighborhood relation determined by the attribute subset $B \subseteq A$, with induced granular structure $NG(B)$. Then, for an approximation object set $X \subseteq U$, the lower approximation set and the upper approximation set of $X$ with respect to $NR_B^{\delta}$ are defined as
$$\underline{NR}_B(X) = \{x_i \in U \mid n_B^{\delta}(x_i) \subseteq X\}, \qquad \overline{NR}_B(X) = \{x_i \in U \mid n_B^{\delta}(x_i) \cap X \ne \emptyset\},$$
where the pair $(\underline{NR}_B(X), \overline{NR}_B(X))$ is called the neighborhood rough set of the approximation object set $X$ under the neighborhood relation $NR_B^{\delta}$.
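As a concrete illustration of Definitions 1–3, the following Python sketch (not the paper's implementation; the toy objects, radius, and target set are invented for the example) computes neighborhood classes under the Euclidean distance ($p = 2$) and the neighborhood lower and upper approximations of a target set:

```python
import math

def neighborhood_classes(X, delta):
    """Neighborhood class of each object: all objects within Euclidean
    distance delta of it (Definitions 1 and 2)."""
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return [{j for j, y in enumerate(X) if dist(x, y) <= delta} for x in X]

def lower_upper(classes, target):
    """Neighborhood lower/upper approximations of a target set (Definition 3)."""
    lower = {i for i, c in enumerate(classes) if c <= target}
    upper = {i for i, c in enumerate(classes) if c & target}
    return lower, upper

# toy universe: four objects described by two numerical attributes
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
nb = neighborhood_classes(X, delta=0.2)
low, up = lower_upper(nb, target={0, 1})
print(nb)       # [{0, 1}, {0, 1}, {2, 3}, {2, 3}]
print(low, up)  # {0, 1} {0, 1}
```

Here the target set $\{x_1, x_2\}$ is exactly definable: its lower and upper approximations coincide, so its boundary is empty.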
A kernel function provides an effective method for calculating inner products in a high-dimensional space and is usually interpreted as a similarity measure between a pair of objects.
Definition 4 (see [20]). Given the numerical information system $IS = (U, A, V)$, a function mapping $k: U \times U \to \mathbb{R}$ ($\mathbb{R}$ is the set of real numbers) is called a kernel function when it satisfies the following two conditions at the same time:(1)symmetry: $k(x_i, x_j) = k(x_j, x_i)$ for all $x_i, x_j \in U$;(2)positive semidefiniteness: the Gram matrix $[k(x_i, x_j)]_{n \times n}$ is positive semidefinite.
Table 1 provides three common kernel functions [20], where $\|x_i - x_j\|$ represents the Euclidean distance between the objects $x_i$ and $x_j$ and the parameter $\sigma$ is a variable value. The larger the parameter value is, the closer the values of these kernel functions are to 1; the smaller the parameter value is, the closer their values are to 0. The selection of the optimal value of $\sigma$ is beyond the scope of this paper and is not discussed here. In the following, we assume that $\sigma$ has been given an appropriate value.
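As a minimal sketch of the Gaussian kernel from Table 1 (assuming the common $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ form; the paper's exact normalization may differ), the behavior described above, values approaching 1 as the parameter grows, can be checked directly:

```python
import math

def gaussian_kernel(x, y, sigma=0.8):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    Equal to 1 for identical objects and decaying toward 0 with distance,
    so the value can be read directly as a similarity in (0, 1]."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

k_small = gaussian_kernel((0.0, 0.0), (1.0, 1.0), sigma=0.8)
k_large = gaussian_kernel((0.0, 0.0), (1.0, 1.0), sigma=5.0)
print(gaussian_kernel((0.2, 0.4), (0.2, 0.4)))  # 1.0: identical objects
print(k_small < k_large)                        # True: larger sigma -> closer to 1
```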
3. Rough Set Model and Uncertainty Measurement Based on Kernel Similarity Relation
Since the kernel function provides a way to measure the similarity between objects, this section defines a binary relation on numerical information systems through the kernel function, called the kernel similarity relation. The kernel similarity relation is then used to induce an information granulation of the numerical information system, and the corresponding rough set model is proposed. At the same time, the uncertainty measurement of numerical information systems is further studied.
3.1. Granular Structure and Rough Set Model Based on Kernel Similarity Relation
Definition 5. Given the numerical information system $IS = (U, A, V)$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta \in (0, 1]$, the kernel similarity relation is defined as
$$KS_B^{\delta} = \{(x_i, x_j) \in U \times U \mid k_B(x_i, x_j) \ge \delta\},$$
where $k_B(x_i, x_j)$ is the kernel similarity measurement result between $x_i$ and $x_j$ under the attribute subset $B$. For $x_i \in U$, the kernel similarity class under the kernel similarity relation is defined as
$$S_B^{\delta}(x_i) = \{x_j \in U \mid k_B(x_i, x_j) \ge \delta\}.$$
From the definition of the kernel function in Definition 4 and the definition of the kernel similarity relation in Definition 5, it can be seen that the kernel similarity relation satisfies reflexivity and symmetry, but not necessarily transitivity.
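Definition 5 can be sketched as follows (a Python illustration with an invented toy universe; the Gaussian kernel with the $2\sigma^2$ convention is assumed). Note that each object belongs to its own class (reflexivity) and that membership is mutual (symmetry):

```python
import math

def kernel_similarity_classes(X, delta, sigma=0.8):
    """Kernel similarity class of each object x_i: all objects whose
    Gaussian-kernel similarity with x_i reaches the threshold delta."""
    def k(x, y):
        sq = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-sq / (2 * sigma ** 2))
    return [{j for j, y in enumerate(X) if k(x, y) >= delta} for x in X]

X = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)]
cls = kernel_similarity_classes(X, delta=0.9)
print(cls)  # [{0, 1}, {0, 1}, {2}]
```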
Property 1. Given the numerical information system $IS = (U, A, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $0 < \delta_1 \le \delta_2 \le 1$, for every $x \in U$ the following are satisfied:(1)If $B_1 \subseteq B_2$, then $S_{B_2}^{\delta}(x) \subseteq S_{B_1}^{\delta}(x)$.(2)If $\delta_1 \le \delta_2$, then $S_B^{\delta_2}(x) \subseteq S_B^{\delta_1}(x)$.
Proof. (1)For any $x, y \in U$, the Euclidean distance satisfies $\|x - y\|_{B_1} \le \|x - y\|_{B_2}$ when $B_1 \subseteq B_2$, so for the kernel functions in Table 1 we have $k_{B_2}(x, y) \le k_{B_1}(x, y)$. Hence, $k_{B_2}(x, y) \ge \delta$ implies $k_{B_1}(x, y) \ge \delta$, and $S_{B_2}^{\delta}(x) \subseteq S_{B_1}^{\delta}(x)$ can be obtained from Definition 5.(2)For any $x, y \in U$, $k_B(x, y) \ge \delta_2$ implies $k_B(x, y) \ge \delta_1$, but not conversely. Therefore, based on Definition 5, if $\delta_1 \le \delta_2$, then $S_B^{\delta_2}(x) \subseteq S_B^{\delta_1}(x)$.
In Property 1, (1) shows that the kernel similarity class decreases monotonically with the increase of attributes, and (2) shows that the kernel similarity class decreases monotonically with the increase of the kernel similarity threshold.
In Definition 5, the kernel similarity relation of a numerical information system is established by a kernel function, and the kernel similarity class of each object can be calculated from this relation. From the perspective of granular computing, each kernel similarity class is a subset of the universe, so it can be regarded as a basic unit of information granular computing. The collection of the kernel similarity classes of all objects forms a covering of the universe, which this paper calls a kernel similarity granular structure. Moreover, by using different attribute subsets and kernel similarity thresholds, different kernel similarity granular structures can be constructed. Therefore, a granular structure framework model of numerical information systems based on the kernel similarity relation can be established.
Definition 6. Given the numerical information system $IS = (U, A, V)$ with $U = \{x_1, x_2, \ldots, x_n\}$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta$, let $KS_B^{\delta}$ be the kernel similarity relation determined by $B$. Then, the kernel similarity granular structure induced by $KS_B^{\delta}$ on the universe is defined as
$$KG(B) = \{S_B^{\delta}(x_1), S_B^{\delta}(x_2), \ldots, S_B^{\delta}(x_n)\},$$
where $S_B^{\delta}(x_i)$ is called the kernel similarity granule of object $x_i$ under $KS_B^{\delta}$.
In a numerical information system, each object generates a kernel similarity granule. Given different attribute subsets and kernel similarity thresholds, different kernel similarity granular structures can be constructed, which makes it necessary to compare two kernel similarity granular structures. Therefore, we next give the refinement and coarsening relationships between kernel similarity granular structures.
Definition 7. Given the numerical information system $IS = (U, A, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $\delta_1, \delta_2$:(1)$KG(B_1) \preceq KG(B_2)$ if and only if $S_{B_1}^{\delta_1}(x_i) \subseteq S_{B_2}^{\delta_2}(x_i)$ for all $x_i \in U$;(2)$KG(B_1) \prec KG(B_2)$ if and only if $KG(B_1) \preceq KG(B_2)$ and $S_{B_1}^{\delta_1}(x_j) \subset S_{B_2}^{\delta_2}(x_j)$ for at least one $x_j \in U$.
In Definition 7, when $KG(B_1) \preceq KG(B_2)$ and $KG(B_2) \preceq KG(B_1)$, we call the kernel similarity granular structures equivalent, expressed as $KG(B_1) \approx KG(B_2)$. When $KG(B_1) \preceq KG(B_2)$, we say that $KG(B_1)$ is refined in $KG(B_2)$ and that $KG(B_2)$ is coarsened in $KG(B_1)$. In particular, when $KG(B_1) \prec KG(B_2)$, we say that $KG(B_1)$ is strictly refined in $KG(B_2)$ and that $KG(B_2)$ is strictly coarsened in $KG(B_1)$.
The refinement and coarsening relationships are transitive. For example, if $KG(B_1) \preceq KG(B_2)$ and $KG(B_2) \preceq KG(B_3)$, then $KG(B_1) \preceq KG(B_3)$, and the same holds for the strict refinement and strict coarsening relationships.
Property 2. Given the numerical information system $IS = (U, A, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $0 < \delta_1 \le \delta_2 \le 1$, the kernel similarity granular structures satisfy(1)If $B_1 \subseteq B_2$, then $KG(B_2) \preceq KG(B_1)$.(2)If $\delta_1 \le \delta_2$, then $KG^{\delta_2}(B) \preceq KG^{\delta_1}(B)$.
Proof. According to Property 1 and Definition 7, Property 2 can be obtained directly.
In Property 2, (1) shows the monotonic refinement of the kernel similarity granular structure with the increase of attributes, and (2) shows its monotonic refinement with the increase of the kernel similarity threshold.
A granular structure presents the spatial structure of an information system in terms of granules and takes the granule as the basic computing unit, so it supports a variety of rough approximation calculations. Next, a rough set model of numerical information systems based on the kernel similarity relation is proposed.
Definition 8. Given the numerical information system $IS = (U, A, V)$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta$, let $KS_B^{\delta}$ be the kernel similarity relation determined by $B$. For a target approximation set $X \subseteq U$, the kernel lower approximation set and the kernel upper approximation set under $KS_B^{\delta}$ are defined as
$$\underline{KS}_B^{\delta}(X) = \{x_i \in U \mid S_B^{\delta}(x_i) \subseteq X\}, \qquad \overline{KS}_B^{\delta}(X) = \{x_i \in U \mid S_B^{\delta}(x_i) \cap X \ne \emptyset\},$$
where the pair $(\underline{KS}_B^{\delta}(X), \overline{KS}_B^{\delta}(X))$ is called the kernel similarity rough set of the target approximation set $X$ under the kernel similarity relation $KS_B^{\delta}$.
At the same time, the kernel positive region set, kernel negative region set, and kernel boundary region set of the target approximation set $X$ under $KS_B^{\delta}$ are defined as
$$POS_B(X) = \underline{KS}_B^{\delta}(X), \qquad NEG_B(X) = U - \overline{KS}_B^{\delta}(X), \qquad BND_B(X) = \overline{KS}_B^{\delta}(X) - \underline{KS}_B^{\delta}(X).$$
In particular, for a numerical decision information system $IS = (U, A \cup D, V)$, let the decision attribute set $D$ partition the universe into the decision classes $U/D = \{D_1, D_2, \ldots, D_m\}$. Then, the kernel lower approximation set and kernel upper approximation set of the decision attribute set under $KS_B^{\delta}$ are defined as
$$\underline{KS}_B^{\delta}(D) = \bigcup_{j=1}^{m} \underline{KS}_B^{\delta}(D_j), \qquad \overline{KS}_B^{\delta}(D) = \bigcup_{j=1}^{m} \overline{KS}_B^{\delta}(D_j).$$
At the same time, the kernel positive region set, kernel negative region set, and kernel boundary region set of the decision attribute set under $KS_B^{\delta}$ are defined as
$$POS_B(D) = \underline{KS}_B^{\delta}(D), \qquad NEG_B(D) = U - \overline{KS}_B^{\delta}(D), \qquad BND_B(D) = \overline{KS}_B^{\delta}(D) - \underline{KS}_B^{\delta}(D).$$
In the neighborhood rough set model, the similarity relation between data objects is measured by distance. In the kernel similarity rough set model, the similarity relation between data objects is measured by a kernel function, and the universe of the information system is granulated accordingly. The kernel similarity rough set model proposed in Definition 8 provides a new approach to the rough approximation of numerical information systems and further expands the research on rough set models for such systems.
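Given a kernel similarity granular structure, the approximations and regions of Definition 8 reduce to simple set tests. A Python sketch (the granular structure below is an invented toy, not data from the paper):

```python
def kernel_approximations(classes, target):
    """Kernel lower/upper approximations of a target set, plus the
    positive, boundary, and negative regions derived from them."""
    universe = set(range(len(classes)))
    lower = {i for i, c in enumerate(classes) if c <= target}
    upper = {i for i, c in enumerate(classes) if c & target}
    return {"lower": lower, "upper": upper,
            "pos": lower, "bnd": upper - lower, "neg": universe - upper}

# toy kernel similarity classes for a 5-object universe
classes = [{0, 1}, {0, 1, 2}, {1, 2}, {3, 4}, {3, 4}]
r = kernel_approximations(classes, target={0, 1})
print(r["lower"], r["upper"])  # {0} {0, 1, 2}
print(r["bnd"], r["neg"])      # {1, 2} {3, 4}
```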
3.2. Uncertainty Measurement Based on Kernel Similarity Relation
In rough set theory, approximation precision and approximation roughness are common measures of the uncertainty of an information system from the perspective of rough approximation [22–24]. In this section, based on the kernel similarity rough set model, the kernel approximation precision and kernel approximation roughness of numerical information systems are proposed.
Definition 9. Given the numerical decision information system $IS = (U, A \cup D, V)$ with $U/D = \{D_1, D_2, \ldots, D_m\}$, let $KS_B^{\delta}$ be the kernel similarity relation determined by the attribute subset $B \subseteq A$. The kernel approximation precision and kernel approximation roughness of the decision attribute set $D$ with respect to $KS_B^{\delta}$ are defined as
$$KAP_B^{\delta}(D) = \frac{\sum_{j=1}^{m} |\underline{KS}_B^{\delta}(D_j)|}{\sum_{j=1}^{m} |\overline{KS}_B^{\delta}(D_j)|}, \qquad KAR_B^{\delta}(D) = 1 - KAP_B^{\delta}(D),$$
where $|\cdot|$ represents the cardinality of a set. Obviously, $0 \le KAP_B^{\delta}(D) \le 1$ and $0 \le KAR_B^{\delta}(D) \le 1$.
From Definition 9, it can be seen that the kernel approximation precision reflects how closely the kernel similarity relation approximates the decision attribute set, while the kernel approximation roughness is its inverse measure and reflects the uncertainty of the approximation.
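Continuing the sketch, the kernel approximation precision of Definition 9 sums the lower- and upper-approximation cardinalities over the decision classes (this aggregate form is the convention assumed here; the toy structure is invented):

```python
def kernel_precision_roughness(classes, decisions):
    """Kernel approximation precision: total lower-approximation size over
    total upper-approximation size across the decision classes; the
    kernel approximation roughness is its complement."""
    low = sum(1 for c in classes for D in decisions if c <= D)
    up = sum(1 for c in classes for D in decisions if c & D)
    precision = low / up
    return precision, 1 - precision

classes = [{0, 1}, {0, 1, 2}, {1, 2}, {3, 4}, {3, 4}]
p, r = kernel_precision_roughness(classes, [{0, 1}, {2, 3, 4}])
print(p, r)  # precision 3/7, roughness 4/7
```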
The kernel approximation precision and kernel approximation roughness satisfy the following properties.
Property 3. Given the numerical decision information system $IS = (U, A \cup D, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $0 < \delta_1 \le \delta_2 \le 1$, the following are satisfied:(1)If $B_1 \subseteq B_2$, then $KAP_{B_1}^{\delta}(D) \le KAP_{B_2}^{\delta}(D)$ and $KAR_{B_1}^{\delta}(D) \ge KAR_{B_2}^{\delta}(D)$.(2)If $\delta_1 \le \delta_2$, then $KAP_B^{\delta_1}(D) \le KAP_B^{\delta_2}(D)$ and $KAR_B^{\delta_1}(D) \ge KAR_B^{\delta_2}(D)$.
Proof. (1)According to Property 1, if $B_1 \subseteq B_2$, then $S_{B_2}^{\delta}(x) \subseteq S_{B_1}^{\delta}(x)$ for every $x \in U$. If $S_{B_1}^{\delta}(x) \subseteq D_j$, then $S_{B_2}^{\delta}(x) \subseteq D_j$, so $\underline{KS}_{B_1}^{\delta}(D_j) \subseteq \underline{KS}_{B_2}^{\delta}(D_j)$. Similarly, if $S_{B_2}^{\delta}(x) \cap D_j \ne \emptyset$, then $S_{B_1}^{\delta}(x) \cap D_j \ne \emptyset$, so $\overline{KS}_{B_2}^{\delta}(D_j) \subseteq \overline{KS}_{B_1}^{\delta}(D_j)$. According to Definition 9, $KAP_{B_1}^{\delta}(D) \le KAP_{B_2}^{\delta}(D)$ and $KAR_{B_1}^{\delta}(D) \ge KAR_{B_2}^{\delta}(D)$ can be obtained.(2)According to Property 1 and reasoning similar to the proof of (1), we can get that (2) holds.
Property 4. Given the numerical decision information system $IS = (U, A \cup D, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $\delta_1, \delta_2$, the following are satisfied:(1)If $KG(B_2) \preceq KG(B_1)$, then $KAP_{B_1}(D) \le KAP_{B_2}(D)$ and $KAR_{B_1}(D) \ge KAR_{B_2}(D)$.(2)If $KG^{\delta_2}(B) \preceq KG^{\delta_1}(B)$, then $KAP_B^{\delta_1}(D) \le KAP_B^{\delta_2}(D)$ and $KAR_B^{\delta_1}(D) \ge KAR_B^{\delta_2}(D)$.
Proof. Property 4 can be obtained by combining Property 2 and Property 3.
It can be seen from Properties 3 and 4 that the kernel approximation precision and kernel approximation roughness can evaluate the uncertainty of an information system. However, in some cases, these two measures cannot provide enough information to evaluate the uncertainty induced by the kernel similarity relation in numerical information systems, as the following example illustrates.
Example 1. Given a numerical decision information system $IS = (U, A \cup D, V)$ with universe $U = \{x_1, x_2, \ldots, x_{10}\}$, which contains the two decision classes $D_1 = \{x_1, \ldots, x_5\}$ and $D_2 = \{x_6, \ldots, x_{10}\}$. Let the two attribute subsets $B_1$ and $B_2$ of the information system satisfy $B_1 \subseteq B_2$, and let the kernel similarity granular structure $KG(B_1)$ be given by
$S_{B_1}(x_1) = \{x_1, x_2\}$; $S_{B_1}(x_2) = \{x_1, x_2, x_3\}$; $S_{B_1}(x_3) = \{x_2, x_3\}$; $S_{B_1}(x_4) = S_{B_1}(x_5) = S_{B_1}(x_6) = \{x_4, x_5, x_6\}$;
$S_{B_1}(x_7) = S_{B_1}(x_8) = \{x_7, x_8\}$; and $S_{B_1}(x_9) = S_{B_1}(x_{10}) = \{x_9, x_{10}\}$;
and let the kernel similarity granular structure $KG(B_2)$ be given by
$S_{B_2}(x_1) = \{x_1\}$; $S_{B_2}(x_2) = S_{B_2}(x_3) = \{x_2, x_3\}$; $S_{B_2}(x_4) = S_{B_2}(x_5) = S_{B_2}(x_6) = \{x_4, x_5, x_6\}$; $S_{B_2}(x_7) = \{x_7\}$; $S_{B_2}(x_8) = \{x_8\}$; and $S_{B_2}(x_9) = S_{B_2}(x_{10}) = \{x_9, x_{10}\}$.
Obviously, the relationship $KG(B_2) \prec KG(B_1)$ is satisfied. According to Definitions 8 and 9, $\underline{KS}_{B_1}(D_1) = \underline{KS}_{B_2}(D_1) = \{x_1, x_2, x_3\}$, $\overline{KS}_{B_1}(D_1) = \overline{KS}_{B_2}(D_1) = \{x_1, \ldots, x_6\}$, $\underline{KS}_{B_1}(D_2) = \underline{KS}_{B_2}(D_2) = \{x_7, x_8, x_9, x_{10}\}$, and $\overline{KS}_{B_1}(D_2) = \overline{KS}_{B_2}(D_2) = \{x_4, x_5, \ldots, x_{10}\}$. So,
$$KAP_{B_1}(D) = KAP_{B_2}(D) = \frac{3 + 4}{6 + 7} = \frac{7}{13}, \qquad KAR_{B_1}(D) = KAR_{B_2}(D) = \frac{6}{13},$$
that is, $KAP_{B_1}(D) = KAP_{B_2}(D)$ and $KAR_{B_1}(D) = KAR_{B_2}(D)$.
Example 1 shows that, with the increase of attributes, the kernel similarity classes of the objects decrease monotonically, while the kernel approximation precision and kernel approximation roughness do not change; that is, they do not satisfy strict monotonicity.
In order to overcome the fact that approximation precision and approximation roughness cannot always measure the uncertainty of an information system well, scholars have introduced knowledge granularity measures and proposed combined measurement methods that pair approximation precision (or approximation roughness) with knowledge granularity, which makes up for this defect [22–24]. Following this approach, a combined measurement of kernel approximation precision (kernel approximation roughness) and kernel knowledge granularity is proposed in this paper.
Definition 10. Given the numerical information system $IS = (U, A, V)$ with $U = \{x_1, \ldots, x_n\}$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta$, the kernel knowledge granularity of the numerical information system under the kernel similarity granular structure $KG(B)$ is defined as
$$GK_{\delta}(B) = \frac{1}{|U|^2} \sum_{i=1}^{n} |S_B^{\delta}(x_i)|.$$
The kernel knowledge granularity satisfies $\frac{1}{|U|} \le GK_{\delta}(B) \le 1$. When $S_B^{\delta}(x_i) = \{x_i\}$ for every $x_i \in U$, $GK_{\delta}(B) = \frac{1}{|U|}$; when $S_B^{\delta}(x_i) = U$ for every $x_i \in U$, $GK_{\delta}(B) = 1$.
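Definition 10 averages granule sizes over the universe. A minimal Python sketch (the toy classes are invented for illustration; shrinking a single granule strictly lowers the value):

```python
def kernel_knowledge_granularity(classes):
    """GK = (1/|U|^2) * sum of kernel similarity class sizes, i.e. the
    average relative granule size: 1/|U| for all-singleton granules,
    1 when every granule is the whole universe."""
    n = len(classes)
    return sum(len(c) for c in classes) / (n * n)

coarse = [{0, 1}, {0, 1, 2}, {1, 2}, {3, 4}, {3, 4}]
finer  = [{0, 1}, {0, 1},    {1, 2}, {3, 4}, {3, 4}]  # one granule shrank
print(kernel_knowledge_granularity(coarse))  # 0.44
print(kernel_knowledge_granularity(finer))   # 0.4
```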
The kernel knowledge granularity satisfies the following property.
Property 5. Given the numerical information system $IS = (U, A, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $0 < \delta_1 \le \delta_2 \le 1$, the following are satisfied:(1)If $B_1 \subseteq B_2$, then $GK_{\delta}(B_2) \le GK_{\delta}(B_1)$.(2)If $\delta_1 \le \delta_2$, then $GK_{\delta_2}(B) \le GK_{\delta_1}(B)$.
Proof. According to Property 1 and the definition of kernel knowledge granularity in Definition 10, Property 5 can be proved.
In Property 5, (1) shows that the kernel knowledge granularity decreases monotonically with the increase of attributes, and (2) shows that it decreases monotonically with the increase of the kernel similarity threshold. Moreover, whenever at least one kernel similarity class in the kernel similarity granular structure strictly shrinks, the kernel knowledge granularity strictly decreases, so the situation shown in Example 1 cannot occur for this measure.
Based on kernel approximation precision (kernel approximation roughness) and kernel knowledge granularity, a new uncertainty measurement method for numerical information systems will be proposed, which is called kernel similarity combination measurement.
Definition 11. Given the numerical decision information system $IS = (U, A \cup D, V)$ with $U/D = \{D_1, \ldots, D_m\}$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta$, the kernel similarity combination measurement is defined as
$$KCM_B^{\delta}(D) = KAR_B^{\delta}(D) \cdot GK_{\delta}(B).$$
The kernel similarity combination measurement satisfies $0 \le KCM_B^{\delta}(D) \le 1$.
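Combining the two sketches above, the combination measurement of Definition 11 is taken here as the product of the kernel approximation roughness and the kernel knowledge granularity (the product form follows the discussion in this section; the toy data are invented):

```python
def combination_measurement(classes, decisions):
    """Kernel similarity combination measurement, sketched as
    (kernel approximation roughness) * (kernel knowledge granularity)."""
    low = sum(1 for c in classes for D in decisions if c <= D)
    up = sum(1 for c in classes for D in decisions if c & D)
    roughness = 1 - low / up
    gk = sum(len(c) for c in classes) / len(classes) ** 2
    return roughness * gk

decisions = [{0, 1}, {2, 3, 4}]
classes = [{0, 1}, {0, 1, 2}, {1, 2}, {3, 4}, {3, 4}]
cm = combination_measurement(classes, decisions)
print(cm)  # (4/7) * 0.44
```

When, as in Example 1, a refinement leaves the roughness factor unchanged, the granularity factor still strictly decreases, so the product registers the change.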
The kernel similarity combination measurement satisfies the following properties.
Property 6. Given the numerical decision information system $IS = (U, A \cup D, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $0 < \delta_1 \le \delta_2 \le 1$, the following are satisfied:(1)If $B_1 \subseteq B_2$, then $KCM_{B_2}^{\delta}(D) \le KCM_{B_1}^{\delta}(D)$.(2)If $\delta_1 \le \delta_2$, then $KCM_B^{\delta_2}(D) \le KCM_B^{\delta_1}(D)$.
Proof. According to Properties 1, 3, and 5 and Definition 11, Property 6 can be obtained directly.
Property 7. Given the numerical decision information system $IS = (U, A \cup D, V)$, attribute subsets $B_1, B_2 \subseteq A$, and kernel similarity thresholds $\delta_1, \delta_2$, the following are satisfied:(1)If $KG(B_2) \preceq KG(B_1)$, then $KCM_{B_2}(D) \le KCM_{B_1}(D)$.(2)If $KG(B_2) \prec KG(B_1)$, then $KCM_{B_2}(D) < KCM_{B_1}(D)$.
Proof. According to Property 6, Property 7 can be obtained directly.
The kernel similarity combination measurement reflects the uncertainty of the information system from two aspects: one is to evaluate the uncertainty through the approximation degree between the kernel lower approximation set and the kernel upper approximation set, and the other is to evaluate it through the average granularity of the kernel similarity granular structure under the kernel similarity relation. As shown in Definition 10, the kernel knowledge granularity averages the size of each kernel similarity class over the universe, so it decreases whenever at least one kernel similarity class in the universe shrinks. Since the kernel similarity combination measurement is the product of the kernel approximation precision (the kernel approximation roughness) and the kernel knowledge granularity, it is strictly monotone with the increase of attributes and achieves a better uncertainty measurement effect in numerical information systems.
Example 2. For the kernel similarity granular structures of the information system in Example 1, Definition 10 gives $GK(B_2) < GK(B_1)$, because several kernel similarity classes in $KG(B_2)$ are strictly smaller than the corresponding classes in $KG(B_1)$ while none are larger. Therefore, according to Definition 11, $KCM_{B_2}(D) < KCM_{B_1}(D)$, even though the kernel approximation roughness is unchanged. So, unlike the kernel approximation precision and kernel approximation roughness, the kernel similarity combination measurement distinguishes the two attribute subsets.
4. Attribute Reduction Algorithm Based on Kernel Similarity Combination Measurement
In the previous section, the kernel similarity combination measurement was proposed for measuring the uncertainty of numerical information systems by combining kernel approximation precision (kernel approximation roughness) with kernel knowledge granularity. Due to the strict monotonicity of this measurement with the increase of attributes, it is used in this section to design attribute reduction, and an attribute reduction algorithm for numerical information systems based on the kernel similarity combination measurement is proposed.
Definition 12. Given the numerical decision information system $IS = (U, A \cup D, V)$ and a kernel similarity threshold $\delta$, an attribute subset $B \subseteq A$ is a kernel similarity combination measurement attribute reduction set of the information system if and only if(1)$KCM_B^{\delta}(D) = KCM_A^{\delta}(D)$;(2)for every $a \in B$, $KCM_{B - \{a\}}^{\delta}(D) \ne KCM_A^{\delta}(D)$.
In Definition 12, condition (1) requires that the kernel similarity combination measurement of the attribute reduction set be equal to that of the complete attribute set, and condition (2) requires that the attribute reduction set be minimal, that is, contain no redundant attributes.
Generally, heuristic search is an important way to find attribute reductions of an information system [4]. Its main strategy is to select the optimal attribute in each iteration. The strict monotonicity of the kernel similarity combination measurement makes it a suitable basis for evaluating attribute significance. We define the following measures of attribute importance for numerical information systems based on the kernel similarity combination measurement.
Definition 13. Given the numerical decision information system $IS = (U, A \cup D, V)$, an attribute subset $B \subseteq A$, and a kernel similarity threshold $\delta$, the external importance of an attribute $a \in A - B$ with respect to the attribute set $B$ is defined as
$$Sig^{ext}(a, B, D) = KCM_B^{\delta}(D) - KCM_{B \cup \{a\}}^{\delta}(D).$$
The internal importance of an attribute $a \in B$ with respect to the attribute set $B$ is defined as
$$Sig^{int}(a, B, D) = KCM_{B - \{a\}}^{\delta}(D) - KCM_B^{\delta}(D).$$
In Definition 13, the external attribute importance evaluates attributes outside the candidate subset, and the internal attribute importance evaluates attributes inside the candidate subset. Using the attribute importance in Definition 13 to evaluate the significance of attributes in an information system, an attribute reduction algorithm for numerical information systems is proposed, as shown in Algorithm 1.
In Algorithm 1, step 2 searches the complete attribute set through the external attribute importance and selects the attribute with maximal importance as a candidate attribute. Because each newly selected attribute changes the kernel similarity granular structure of the candidate reduction result, step 3 continues the heuristic search over the remaining attributes until the external importance of every remaining attribute is 0. To prevent redundant attributes in the selected reduction result, step 4 performs reverse redundancy elimination through the internal attribute importance; that is, any attribute whose internal importance equals 0 is deleted, and the final attribute reduction result is obtained after completing steps 2 to 4. In Algorithm 1, computing the importance of one attribute requires constructing a kernel similarity granular structure, which takes $O(|U|^2)$ time, and the heuristic search evaluates at most $O(|A|^2)$ candidate attributes, so the time complexity of the whole Algorithm 1 is $O(|A|^2 |U|^2)$.
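The overall procedure of Algorithm 1 can be sketched as follows (a Python illustration under the same assumptions as the earlier sketches: Gaussian kernel with the $2\sigma^2$ convention and the product-form combination measurement; the toy data set, thresholds, and helper names are all invented, and the forward step greedily minimizes the measurement, which is equivalent to maximizing the external importance):

```python
import math

def kernel_classes(X, attrs, delta, sigma=0.8):
    """Kernel similarity classes of the objects under an attribute subset."""
    def k(x, y):
        sq = sum((x[a] - y[a]) ** 2 for a in attrs)
        return math.exp(-sq / (2 * sigma ** 2))
    return [{j for j, y in enumerate(X) if k(x, y) >= delta} for x in X]

def combination_measure(X, attrs, decisions, delta, sigma=0.8):
    """Kernel similarity combination measurement of an attribute subset:
    (kernel approximation roughness) * (kernel knowledge granularity)."""
    cls = kernel_classes(X, attrs, delta, sigma)
    low = sum(1 for c in cls for D in decisions if c <= D)
    up = sum(1 for c in cls for D in decisions if c & D)
    gk = sum(len(c) for c in cls) / len(cls) ** 2
    return (1 - low / up) * gk

def reduce_attributes(X, decisions, delta=0.8, sigma=0.8):
    """Heuristic attribute reduction: forward search by external
    importance (steps 2-3), then reverse redundancy elimination by
    internal importance (step 4)."""
    full = list(range(len(X[0])))
    target = combination_measure(X, full, decisions, delta, sigma)
    B, remaining = [], set(full)
    while remaining and (not B or combination_measure(
            X, B, decisions, delta, sigma) > target + 1e-12):
        # adding the attribute that minimizes the measurement maximizes
        # the external importance Sig_ext(a, B) = KCM(B) - KCM(B + {a})
        a = min(remaining, key=lambda a: combination_measure(
            X, B + [a], decisions, delta, sigma))
        B.append(a)
        remaining.discard(a)
    for a in list(B):  # drop attributes whose internal importance is 0
        rest = [b for b in B if b != a]
        if rest and combination_measure(
                X, rest, decisions, delta, sigma) <= target + 1e-12:
            B = rest
    return B

# toy decision system: attribute 2 is constant, hence redundant
X = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.5), (1.0, 1.0, 0.5), (0.9, 1.0, 0.5)]
decisions = [{0, 1}, {2, 3}]
red = reduce_attributes(X, decisions)
print(red)  # a single informative attribute; the constant attribute 2 is never kept
```

The returned subset reaches the same combination measurement as the full attribute set while discarding the uninformative constant attribute.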
5. Experimental Analysis
In this section, we test the uncertainty measurement method and attribute reduction algorithm proposed in this paper on UCI machine learning classification data sets in order to verify their effectiveness and superiority. Table 2 shows the details of the UCI data sets. The conditional attribute values of these data sets are numerical, so they can be applied directly to the proposed algorithm. In order to prevent the dimensions of the attribute values from influencing the experiment, the conditional attribute values of these data sets are normalized to the [0,1] interval by range (min-max) normalization. The kernel similarity measurement in all experiments adopts the Gaussian kernel function, with the parameter uniformly set to 0.8. All experiments are coded in MATLAB 2015b and run on a personal PC with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz and 16 GB of memory.
5.1. Experiment on Uncertainty Measurement
In the kernel similarity combination measurement proposed in this paper, the kernel similarity threshold plays a key role in information granulation, so selecting an appropriate threshold is particularly important. In this experiment, the kernel similarity threshold is varied over its interval in steps of 0.02, and the uncertainty measurement is calculated for each data set as attributes are added in increasing order of attribute index. The experimental results for all data sets are shown in Figure 1. In each subgraph, one horizontal axis is the increasing number of attributes, the other is the kernel similarity threshold, and the vertical axis is the uncertainty measurement result. By observing the subgraphs in Figure 1, it can be found that when the kernel similarity threshold is small, for example, 0.57 to 0.71, the kernel similarity combination measurement of most data sets decreases slowly as attributes increase, and the difference between the uncertainty measurements of the first and last attributes is not particularly large. When the kernel similarity threshold is large, for example, 0.87 to 0.95, the kernel similarity combination measurement of most data sets decreases rapidly as attributes increase, and for the last several attributes the uncertainty measurement result is equal or close to 0. This shows that too large or too small a threshold makes the kernel similarity classes too fine or too coarse to measure the uncertainty of a numerical information system well. Based on the experimental results of each subgraph in Figure 1, an intermediate kernel similarity threshold is selected for the experiment, and this value is also used in the subsequent experiments.

Figure 1: Uncertainty measurement results of the nine data sets under increasing attributes and varying kernel similarity thresholds (panels (a)–(i), one per data set).
In order to verify the effectiveness of the proposed kernel similarity combination measurement, this section compares, for each data set, the uncertainty measurement results of the kernel approximation roughness, the kernel knowledge granularity, and the kernel similarity combination measurement. The experimental results for all data sets are shown in Figure 2. In each subgraph, the abscissa is the increasing number of attributes and the ordinate is the uncertainty measurement result. In Figure 2, all three uncertainty measurements gradually decrease as attributes increase, mainly because more knowledge is acquired with more attributes, so the uncertainty of the information system gradually decreases. Among these results, for the kernel approximation roughness on the data set messidor, the uncertainty measurement remains essentially unchanged as the number of attributes increases from 1 to 8; the same happens on winequality-red, winequality-white, and magic. Meanwhile, the kernel knowledge granularity decreases, indicating that the granulation of the information system is gradually refined, so the kernel approximation roughness does not satisfy strict monotonicity under data granulation. For the kernel knowledge granularity and the kernel similarity combination measurement, the uncertainty measurement results decrease strictly monotonically with the increase of attributes. At the same time, the kernel similarity combination measurement integrates the kernel approximation roughness and the kernel knowledge granularity, so it has a better uncertainty measurement effect for information systems.

Figure 2: Comparison of the three uncertainty measurements on the nine data sets (panels (a)–(i), one per data set).
5.2. Experimental Comparison of Attribute Reduction
This section carries out comparative experiments on attribute reduction. For comparison, three recent attribute reduction algorithms of the same type are selected: (1) a novel hybrid feature selection method considering feature interaction in neighborhood rough sets [25] (recorded as comparison Algorithm 1); (2) a novel approach to attribute reduction based on weighted neighborhood rough sets [14] (recorded as comparison Algorithm 2); and (3) attribute reduction with fuzzy rough self-information measures [5] (recorded as comparison Algorithm 3). In this experiment, an SVM classifier and a kNN (k = 3) classifier are used to evaluate the attribute reduction performance of all algorithms through 10-fold cross-validation. In the training stage, each algorithm reduces the attributes of the training set to obtain its attribute reduction result. In the validation stage, the obtained reduction set is used to extract the corresponding subdata from the validation set, on which the classification accuracy of SVM and kNN is calculated. The above experiment is repeated 10 times, and the average of the 10 runs is taken as the final result.
The classification accuracies of SVM and kNN on the original attribute sets and on the reduction sets of the four algorithms are shown in Tables 3 and 4, respectively, with the best results highlighted in bold. Tables 3 and 4 show that the average classification accuracy of every reduction result is better than that of the original data. Compared with the original data, the average classification accuracies of comparison Algorithms 1–3 and the proposed algorithm under SVM are improved by 4.3%, 4.5%, 2.1%, and 5.3%, respectively, and under kNN by 3.5%, 4.6%, 2.0%, and 5.8%, respectively. Among the four algorithms, the proposed algorithm achieves the highest SVM and kNN classification accuracy on most data sets. Therefore, the classification accuracy of the proposed algorithm after attribute reduction is clearly better than that of the three comparison algorithms.
Running time is an important index for evaluating the feasibility of an attribute reduction algorithm. The average running times of the four algorithms are shown in Figure 3. As Figure 3 shows, the running time of the proposed algorithm is slightly worse than that of comparison Algorithm 3. The main reason is that comparison Algorithm 3 uses fuzzy rough self-information for heuristic attribute reduction: it only needs to compute the fuzzy similarity class of each object and does not need to compute the upper and lower approximation sets of the decision classes, which saves part of the computation. The running time of the proposed algorithm is clearly better than those of comparison Algorithms 1 and 2. Comparison Algorithm 1 must compute attribute interactions during reduction, and comparison Algorithm 2 must compute attribute weights, both of which add computation; moreover, comparison Algorithm 2 performs a large number of matrix operations when calculating attribute weights, so its computation time is the largest. Considering both computational efficiency and classification accuracy, the proposed attribute reduction algorithm is clearly the best choice.
The length of the attribute reduction result is another important index for evaluating attribute reduction performance. Table 5 shows the average lengths of the attribute reduction results of all algorithms. Comparing the four algorithms, the attribute subset selected by the proposed algorithm is the smallest on most data sets, with slightly larger subsets on iono, messidor, and magic. Overall, the proposed algorithm has a good attribute reduction effect and an obvious advantage in dimension reduction.
Taken together, the experimental results show that the proposed attribute reduction algorithm based on kernel similarity combination measurement can quickly select smaller attribute subsets while achieving higher classification performance.
6. Conclusion
Deleting weakly correlated and redundant attributes from data sets before classification and regression can improve performance and computational efficiency. For numerical information systems, neighborhood rough sets use distance to evaluate the similarity between objects. To further improve the similarity measurement between data objects, this paper proposes a rough set model based on the kernel similarity relation. Then, by combining kernel approximation precision (kernel approximation roughness) with kernel knowledge granularity, an uncertainty measurement method called kernel similarity combination measurement is proposed. Finally, using the strict monotonicity of the kernel similarity combination measurement, an attribute reduction algorithm for numerical information systems is designed. Experiments show its effectiveness and superiority. Since data in real environments are dynamic, we will further study incremental attribute reduction based on the kernel similarity combination measurement.
Data Availability
All data are available from the website http://archive.ics.uci.edu/ml/index.php.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors gratefully acknowledge the support of the Natural Science Foundation of Anhui Educational Commission Project (KJ2018A0469 and KJ2021A0972) and the Natural Science Research Project of Huainan Normal University (2021XJZD026).