Abstract

Pawlak's classical rough set theory has been widely applied in analyzing ordinary information systems and decision systems. However, few studies have addressed the attribute selection problem in incomplete decision systems because of its complexity. It is therefore necessary to investigate effective algorithms for this issue. In this paper, a new rough conditional entropy-based uncertainty measure is introduced to evaluate the significance of subsets of attributes in incomplete decision systems. Furthermore, some important properties of rough conditional entropy are derived, and three attribute selection approaches for incomplete decision systems are constructed: an exhaustive search strategy approach, a heuristic search strategy approach, and a probabilistic search strategy approach. Moreover, several experiments on real-life incomplete data sets are conducted to assess the efficiency of the proposed approaches. The experimental results indicate that two of these approaches give satisfactory performance for attribute selection in incomplete decision systems.

1. Introduction

Rough set theory, proposed by Pawlak [13], is an extension of set theory for the study of intelligent systems characterized by uncertain, imprecise, incomplete, and inconsistent data. It has proven to be an innovative and efficient mathematical tool compared with traditional data processing strategies such as PCA, neural networks, and SVM [4-7]. Unlike those methods, rough set theory allows the knowledge discovery process to be driven by the data themselves, without any dependence on prior knowledge. By using the concepts of upper and lower approximations in the rough set model, knowledge hidden in information systems can be discovered and expressed in the form of decision rules. Rough set methodology thus presents a novel paradigm for dealing with uncertainty, and it has been successfully applied to feature selection [8], rule extraction [9, 10], uncertainty reasoning [11-13], decision evaluation [14], granular computing [15, 16], and so on.

It is well known that Pawlak's classical rough set model can only tackle problems in complete information systems [17]. Nevertheless, because of measurement errors, the imprecision arising from limitations of data acquisition, and other factors, empty values, which stand for information that is inaccessible for the moment, are inevitable in real databases. In other words, incomplete information systems with missing values often arise in practical knowledge acquisition. So far, two main approaches have been proposed to cope with incomplete information systems. One is the indirect approach, which transforms an incomplete information system into a complete one by eliminating objects with missing values or by filling in missing values with processed data. The other is the direct approach, which extends some basic notions of the classical rough set models [18, 19]. In the past decade, various extensions of rough set models have been proposed for different requirements, such as variable precision rough set models [20, 21], rough set models based on tolerance relations [22, 23], fuzzy rough set models [24], and weighted attribute values rough set models [25].

In many fields such as data mining, machine learning, and pattern recognition, data sets containing huge numbers of attributes are often encountered. In such cases, feature selection, also known as attribute reduction, is necessary. It is well known that irrelevant and redundant input attributes not only complicate the problem but also degrade solution accuracy [26, 27]. As a significant step in data preprocessing, the main objective of attribute selection is to determine a minimal attribute subset from a problem domain, also called a reduct, that retains a relatively high accuracy in replacing the original attributes.

In comparison with the study of attribute selection in complete information systems, scant effort has been made to develop tolerance relation-based methods of attribute selection for incomplete information systems [28-31]. Attribute selection in incomplete decision systems has received even less study. As an effective method for attribute selection, rough set theory can preserve the meaning of the attributes. The essence of the rough set approach to attribute selection is to find a subset of attributes that can predict the decision concepts as well as the original attribute set. So far, several widespread extensions of classical rough set theory have been developed: the tolerance-based rough set model [32], the covering rough set model [33, 34], and the dominance-based rough set model [35, 36]. However, they still have some inherent disadvantages and are not suitable for attribute selection in incomplete decision systems.

As we know, finding the minimal reduct of an incomplete decision system is an NP-hard problem [37]. The general method for solving this kind of problem is to adopt a heuristic search, which depends on measurements associated with the attributes [38]. However, little investigation has so far addressed the issue of measuring the uncertainty of knowledge in tolerance relation-based rough set models for incomplete decision systems. Hence, a further study on uncertainty measures applicable to evaluating the roughness and accuracy of a set in an incomplete decision system is of both theoretical and practical importance.

The main aim of this paper is to construct an effective uncertainty measure for evaluating the roughness and accuracy of knowledge and, on that basis, to develop heuristic attribute selection algorithms for incomplete decision systems. The rest of this paper is organized as follows. In Section 2, we briefly review some fundamental concepts concerning the main subject of this paper. Section 3 introduces entropy-based uncertainty measures, and Section 4 demonstrates several attribute selection algorithms for incomplete decision systems. Experimental comparisons and results are illustrated and analyzed in Section 5. Finally, some conclusions are presented in Section 6.

2. Preliminary

Classical rough set theory was originated by Pawlak to deal with imprecise or vague concepts [39]. In the last decade, many generalized rough set models have been proposed and developed. In this section, we introduce only the notions used in this paper.

Pawlak's classical rough set model is defined first. An information system in rough set theory is a pair $IS = (U, A)$, where $U$ denotes a nonempty finite set of objects, called the universe of discourse, and $A$ denotes a finite condition attribute set. With every attribute $a \in A$ we associate a set $V_a$ of its values, called the domain of $a$. Then, a data pattern can be defined by an object $x \in U$ and the attributes from $A$. A decision table, abbreviated as DT, is a special system of the form $DT = (U, A \cup \{d\})$, where $d \notin A$ denotes the decision attribute. Let $V_d$ denote the domain of the decision attribute mapping $d : U \to V_d$. For the application of pattern classification, the attribute set $A$ is just the feature set, and the universe of discourse $U$ may represent a training pattern set or a collection of training pattern sets.

Let $R \subseteq U \times U$ be an equivalence relation on $U$, which means that $R$ satisfies reflexivity, symmetry, and transitivity. Relation $R$ generates a partition $U/R = \{[x]_R : x \in U\}$ on $U$, where $[x]_R$ denotes the equivalence class, also called the indiscernibility class, generated by the equivalence relation $R$. These classes are also called the elementary sets of $R$ in rough set theory. Let $\emptyset$ denote the empty set. For any $X \subseteq U$, we can describe $X$ by the elementary sets of $R$ and the two sets

$\underline{R}X = \{x \in U : [x]_R \subseteq X\}$ and $\overline{R}X = \{x \in U : [x]_R \cap X \neq \emptyset\}$,

which are called the lower and upper approximations of $X$, respectively. Then, the concepts of positive region $POS_R(X) = \underline{R}X$, negative region $NEG_R(X) = U - \overline{R}X$, boundary region $BND_R(X) = \overline{R}X - \underline{R}X$, and approximation measure $\alpha_R(X) = |\underline{R}X| / |\overline{R}X|$ are introduced, where $X \neq \emptyset$. The lower approximation, which is equivalent to the positive region, is the complete set of the objects in the universe that can be unambiguously classified as belonging to the target set $X$. In contrast, the upper approximation is the complete set of the objects that are possibly members of the target set $X$. In other words, these objects cannot be positively classified as belonging to the complement of the target set $X$, that is, $U - X$. Furthermore, the negative region contains the set of objects that can be definitely ruled out as members of the target set $X$. Finally, the approximation measure $\alpha_R(X)$ is intended to capture the degree of completeness of our knowledge about the target set $X$.
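To make these notions concrete, the following minimal Python sketch computes the partition induced by a set of attributes and the lower and upper approximations of a target set for a toy complete information system. The data, function names, and dictionary-per-object representation are illustrative assumptions, not from the paper.

```python
# A minimal sketch of Pawlak's approximations on a complete information
# system, stored as a list of attribute-value dictionaries.
from collections import defaultdict

def partition(universe, rows, attrs):
    """Group object indices into equivalence classes under the
    indiscernibility relation induced by the chosen attributes."""
    classes = defaultdict(list)
    for x in universe:
        key = tuple(rows[x][a] for a in attrs)
        classes[key].append(x)
    return list(classes.values())

def approximations(universe, rows, attrs, target):
    """Return the (lower, upper) approximations of the target set."""
    lower, upper = set(), set()
    for eq_class in partition(universe, rows, attrs):
        if set(eq_class) <= target:     # class wholly inside the target
            lower.update(eq_class)
        if set(eq_class) & target:      # class overlaps the target
            upper.update(eq_class)
    return lower, upper

# Toy example: four objects, two attributes.
rows = [{"a": 1, "b": 0}, {"a": 1, "b": 0}, {"a": 0, "b": 1}, {"a": 0, "b": 0}]
U = range(len(rows))
low, up = approximations(U, rows, ["a"], target={0, 2})
print(low, up)  # negative region = U - up, boundary = up - low,
                # approximation measure alpha = |low| / |up|
```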

In many cases, some precise values of particular attributes in an information system are not known; that is, they are missing or only partially known. Such a system is called an incomplete information system and is still denoted, without any confusion, by a pair $IS = (U, A)$. In an incomplete information system, a missing value may be represented by the set of all possible values for the corresponding attribute; that is, $a(x) = V_a$. Moreover, if $a(x)$ is known partially, for instance, if it is known that $a(x)$ cannot take the value $v$, then the value of $a(x)$ is specified as $V_a \setminus \{v\}$.

In what follows, we only consider incomplete information systems with missing values. In such a case, the special symbol "$*$" is used to indicate that the specific value of an attribute is missing. Let $IS = (U, A)$ be an incomplete information system. Each subset of attributes $B \subseteq A$ determines a similarity relation

$SIM(B) = \{(x, y) \in U \times U : \forall a \in B,\ a(x) = a(y)\ \text{or}\ a(x) = *\ \text{or}\ a(y) = *\}$.

The incomplete information system can be described by a set-valued information system with $a(x) = V_a$ when $a(x) = *$. In such a case, the similarity relation can be equivalently defined as

$SIM(B) = \{(x, y) \in U \times U : \forall a \in B,\ a(x) \cap a(y) \neq \emptyset\}$.

With the similarity relation $SIM(B)$, two objects are considered to be possibly indiscernible in terms of the values of the attributes in $B$. Furthermore, a similarity relation satisfies reflexivity and symmetry, but may not be transitive, so it is a tolerance relation.
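The tolerance classes induced by $SIM(B)$ can be computed directly from this definition. The following small Python sketch, with an assumed dictionary-per-object representation and "*" for missing values, illustrates the pairwise similarity test and the resulting tolerance class.

```python
# Sketch of the similarity relation SIM(B) for an incomplete
# information system; '*' marks a missing value.
MISSING = "*"

def similar(x_row, y_row, attrs):
    """Two objects are similar on B if, for every attribute in B,
    the values agree or at least one of them is missing."""
    return all(
        x_row[a] == y_row[a] or x_row[a] == MISSING or y_row[a] == MISSING
        for a in attrs
    )

def tolerance_class(i, rows, attrs):
    """S_B(x_i): all objects possibly indiscernible from x_i on B."""
    return {j for j, row in enumerate(rows) if similar(rows[i], row, attrs)}

rows = [
    {"a1": 1, "a2": "*"},
    {"a1": 1, "a2": 0},
    {"a1": 0, "a2": 0},
]
print(tolerance_class(0, rows, ["a1", "a2"]))  # {0, 1}: reflexive, symmetric
```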

3. Rough Entropy-Based Uncertainty Measures

In this section, the concept of rough entropy is introduced to measure the uncertainty or roughness of knowledge in incomplete information systems. Then, some rough entropy-based uncertainty measures are presented for incomplete information systems and incomplete decision systems. Some important properties concerning these uncertainty measures are derived, and the relationships among them are discussed as well.

For a given information system, we need to assess its uncertainty or roughness for a target object or a target decision. A form of uncertainty measure, which is called rough entropy, has been mentioned in rough sets, rough relational databases, and information systems to calculate the roughness of knowledge. The following definition gives the description of the rough entropy in incomplete information systems.

3.1. Rough Entropy in IIS and IDS

Let $IS = (U, A)$ denote an incomplete information system with $U = \{x_1, x_2, \ldots, x_n\}$ and $B \subseteq A$, and let $S_B(x_i)$ denote the tolerance class of $x_i$ with respect to $B$. According to Shannon's theory of information entropy, the rough entropy of knowledge $B$ on $U$ is denoted by

$E_B(U) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{1}{|S_B(x_i)|}$,

where $1/|S_B(x_i)|$ represents the probability of an element within the tolerance class $S_B(x_i)$ [40].
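As an illustration, the following Python sketch evaluates the rough entropy in the averaged-logarithm form given above. The formula's exact shape is a reconstruction, so the sketch should be read as one plausible realization rather than the authors' implementation.

```python
# Sketch of the rough entropy under the assumed form:
# E_B(U) = -(1/|U|) * sum_i log2(1 / |S_B(x_i)|).
import math

MISSING = "*"

def tolerance_class(i, rows, attrs):
    return [
        j for j, row in enumerate(rows)
        if all(rows[i][a] == row[a] or MISSING in (rows[i][a], row[a])
               for a in attrs)
    ]

def rough_entropy(rows, attrs):
    n = len(rows)
    # log2(1) = 0 for singleton classes, so entropy is 0 exactly when
    # the knowledge discerns every object.
    return sum(math.log2(len(tolerance_class(i, rows, attrs)))
               for i in range(n)) / n

rows = [{"a1": 1, "a2": MISSING}, {"a1": 1, "a2": 0}, {"a1": 0, "a2": 0}]
print(rough_entropy(rows, ["a1", "a2"]))  # 2/3 for this toy table
```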

Let $IS' = (U', B')$ be another incomplete information system with $|U'| = |U|$. If there exists a one-to-one onto function $f : U \to U'$ such that $|S_B(x)| = |S_{B'}(f(x))|$ for any $x \in U$, then $E_B(U) = E_{B'}(U')$. Therefore, the rough entropy of knowledge is invariant with respect to different sets of tolerance classes that are size isomorphic.

Let $DS = (U, A \cup \{d\})$ denote an incomplete decision system with $d \notin A$. For any subset of condition attributes $B \subseteq A$, a tolerance relation $T(B)$, which is a generalized form of the former similarity relation, can be defined as

$T(B) = \{(x, y) \in U \times U : \forall a \in B,\ a(x) = a(y)\ \text{or}\ a(x) = *\ \text{or}\ a(y) = *\}$.

Then, the tolerance class of an object $x$ with respect to an attribute set $B$ is defined as $T_B(x) = \{y \in U : (x, y) \in T(B)\}$. Obviously, the relation $T(B)$ is reflexive and symmetric, but may not be transitive.

Definition 1. Let $DS = (U, A \cup \{d\})$ be an incomplete decision system; then the generalized decision $\partial_B$ for $B \subseteq A$ is defined as follows: $\partial_B(x) = \{d(y) : y \in T_B(x)\}$ for every $x \in U$.

Definition 2. Let $DS = (U, A \cup \{d\})$ be an incomplete decision system and let $\partial_A$ be the generalized decision function. If $|\partial_A(x)| = 1$ for any $x \in U$, then the incomplete decision system is consistent, which implies that it is deterministic and definite, where $|\cdot|$ denotes the number of elements of a set.
In what follows, we give an example, first shown by Kryszkiewicz [31], to demonstrate how the tolerance relation works in a specific incomplete decision system. Consider the incomplete decision system shown in Table 1. By definition, we can compute the tolerance class $T_A(x)$ and the generalized decision $\partial_A(x)$ of every object $x \in U$. Consequently, we get $|\partial_A(x)| = 1$ for all $x \in U$, which proves that the incomplete decision system is consistent.
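The generalized decision and the consistency test of Definition 2 translate directly into code. The sketch below (Python, on a hypothetical toy table rather than Kryszkiewicz's Table 1) computes $\partial_A(x)$ for every object and checks whether all generalized decisions are singletons.

```python
# Sketch: generalized decision and the consistency test of Definition 2.
MISSING = "*"

def tolerance_class(i, rows, attrs):
    return [
        j for j, row in enumerate(rows)
        if all(rows[i][a] == row[a] or MISSING in (rows[i][a], row[a])
               for a in attrs)
    ]

def generalized_decision(i, rows, attrs, d="d"):
    """The set of decision values seen inside T_B(x_i)."""
    return {rows[j][d] for j in tolerance_class(i, rows, attrs)}

def is_consistent(rows, attrs, d="d"):
    """Consistent iff the generalized decision of every object is a singleton."""
    return all(len(generalized_decision(i, rows, attrs, d)) == 1
               for i in range(len(rows)))

rows = [
    {"a1": 1, "a2": MISSING, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 0, "a2": 0, "d": "no"},
]
print(is_consistent(rows, ["a1", "a2"]))  # True for this toy table
```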

Definition 3. Let $DS = (U, A \cup \{d\})$ be an incomplete decision system and let $B \subseteq A$. Then, the attribute set $B$ is a relative reduct of $A$ if and only if (1) $\partial_B(x) = \partial_A(x)$ for all $x \in U$ and (2) no proper subset of $B$ satisfies condition (1), that is, both conditions are satisfied simultaneously.

3.2. Conditional Entropy Measure for Incomplete Decision System

The rough entropy of knowledge in incomplete information systems and incomplete decision systems has been discussed above. In this subsection, we introduce a new form of conditional entropy, and the related mutual information, based on the tolerance relation to measure the uncertainty of knowledge in incomplete decision systems. Then, some important properties are deduced.

Definition 4. Given a consistent incomplete decision system $DS = (U, A \cup \{d\})$ with $U = \{x_1, x_2, \ldots, x_n\}$, let $B \subseteq A$. The conditional entropy of $d$ to $B$ is defined as follows:

$H(d \mid B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|T_B(x_i) \cap D(x_i)|}{|T_B(x_i)|}$,

where $D(x_i) = \{y \in U : d(y) = d(x_i)\}$. Hence, we have $H(d \mid B) \geq 0$. It is obvious that $H(d \mid B) = 0$ when $T_B(x_i) \subseteq D(x_i)$ for all $x_i \in U$.
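A small Python sketch of this conditional entropy follows. Since the exact formula is reconstructed here, the code is an assumed realization rather than the authors' definition: for each object it compares the tolerance class $T_B(x_i)$ with the set of objects sharing the decision value of $x_i$.

```python
# Sketch of the rough conditional entropy in the assumed form:
# H(d|B) = -(1/n) * sum_i log2(|T_B(x_i) ∩ D(x_i)| / |T_B(x_i)|).
import math

MISSING = "*"

def tolerance_class(i, rows, attrs):
    return {
        j for j, row in enumerate(rows)
        if all(rows[i][a] == row[a] or MISSING in (rows[i][a], row[a])
               for a in attrs)
    }

def conditional_entropy(rows, attrs, d="d"):
    n = len(rows)
    total = 0.0
    for i in range(n):
        t = tolerance_class(i, rows, attrs)
        # T_B(x_i) ∩ D(x_i): tolerant objects with the same decision;
        # it always contains x_i itself, so the ratio is never zero.
        same = {j for j in t if rows[j][d] == rows[i][d]}
        total -= math.log2(len(same) / len(t))
    return total / n

rows = [
    {"a1": 1, "a2": MISSING, "d": "yes"},
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 0, "a2": 0, "d": "no"},
]
print(conditional_entropy(rows, ["a1"]))  # 0.0 here: a1 alone decides d
```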

Proposition 5. Let $DS = (U, A \cup \{d\})$ be a consistent incomplete decision system. Then, we have $H(d \mid A) = 0$.

Proof. Since $DS$ is consistent, we have $|\partial_A(x_i)| = 1$ for every $x_i \in U$. This means that $d(y) = d(x_i)$ for every $y \in T_A(x_i)$; that is, $T_A(x_i) \subseteq D(x_i)$ for every $x_i \in U$.
Hence, we have $|T_A(x_i) \cap D(x_i)| / |T_A(x_i)| = 1$ for every $x_i \in U$. Consequently, we know that $H(d \mid A) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 1 = 0$.

Proposition 6. Let $DS = (U, A \cup \{d\})$ be a consistent incomplete decision system; $B \subseteq A$ is a relative reduct of $A$ relative to the decision attribute $d$ if and only if (1) $H(d \mid B) = 0$; (2) $H(d \mid B') > 0$ for any proper subset $B' \subset B$.

Proof. (1) We first prove that if $H(d \mid B) = 0$ and $H(d \mid B') > 0$ for any $B' \subset B$, then $B$ is a relative reduct.
By Proposition 5, we know that $H(d \mid A) = 0$. Then $H(d \mid B) = H(d \mid A)$.
Since $H(d \mid B) = -\frac{1}{|U|} \sum_{i=1}^{|U|} \log_2 \frac{|T_B(x_i) \cap D(x_i)|}{|T_B(x_i)|} = 0$, we have $|T_B(x_i) \cap D(x_i)| = |T_B(x_i)|$ for every $x_i \in U$.
At the same time, we know that $|T_B(x_i) \cap D(x_i)| = |T_B(x_i)|$ if and only if $T_B(x_i) \subseteq D(x_i)$. This implies that there exists only one decision value $v$ such that $d(y) = v$ for every $y \in T_B(x_i)$, namely, $v = d(x_i)$. This means that $\partial_B(x_i) = \{d(x_i)\} = \partial_A(x_i)$ for every $x_i \in U$. Hence, we get condition (1) of Definition 3.
From $H(d \mid B') > 0$, $B' \subset B$, we have $T_{B'}(x_i) \not\subseteq D(x_i)$ for some $x_i \in U$. It follows that there at least exist $x_i \in U$ and $y \in T_{B'}(x_i)$ such that $d(y) \neq d(x_i)$. This means that $\partial_{B'}(x_i) \neq \partial_A(x_i)$. Hence, no proper subset of $B$ satisfies condition (1), and $B$ is a relative reduct.
(2) We then prove that if $B$ is a relative reduct, then $H(d \mid B) = 0$ and $H(d \mid B') > 0$ for any $B' \subset B$ hold.
Since $B$ is a relative reduct, we get $\partial_B(x) = \partial_A(x)$ for all $x \in U$; that is, $|\partial_B(x_i)| = 1$ for every $x_i \in U$. It follows that, for every $x_i \in U$, we have $d(y) = d(x_i)$ for every $y \in T_B(x_i)$.
It follows that $T_B(x_i) \subseteq D(x_i)$, and therefore $|T_B(x_i) \cap D(x_i)| / |T_B(x_i)| = 1$ for every $x_i \in U$. Consequently, we know $H(d \mid B) = 0$. For any $B' \subset B$, we know that $\partial_{B'}(x_i) \neq \partial_A(x_i)$ for some $x_i \in U$; that is, there at least exists $y \in T_{B'}(x_i)$ such that $d(y) \neq d(x_i)$. It follows that $T_{B'}(x_i) \not\subseteq D(x_i)$. Thus, there at least exists one term with $|T_{B'}(x_i) \cap D(x_i)| / |T_{B'}(x_i)| < 1$. Then, we get $H(d \mid B') > 0$.
From parts (1) and (2), we finally prove that the proposition holds.

4. Attribute Selection Approaches Based on Rough Conditional Entropy for IDS

Two important steps are contained in the procedure of attribute selection: evaluation of a candidate attribute subset and a search strategy through the attribute space, both of which serve to find the most significant attributes, that is, a relative reduct. Therefore, we use the conditional entropy discussed in Section 3 to evaluate candidate attribute subsets, and a measurement is defined as follows.

Definition 7. Given a consistent incomplete decision system $DS = (U, A \cup \{d\})$, let $B \subseteq A$ and $a \in A \setminus B$. Then, the significance of attribute $a$ relative to $B$ is defined as

$\mathrm{Sig}(a, B, d) = H(d \mid B) - H(d \mid B \cup \{a\})$.

This definition describes the increment of discernibility power relative to the decision $d$ caused by adding attribute $a$. It implies that the larger the difference between $H(d \mid B)$ and $H(d \mid B \cup \{a\})$ is, the more significant the attribute $a$ is for the condition attribute subset $B$. Thus, it can be used as a new measurement for attribute selection in incomplete decision systems. According to this new measurement, three attribute selection approaches based on different search strategies are proposed in the following subsections.
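The significance measure can be computed as a simple difference of two conditional entropy evaluations, as the following self-contained Python sketch shows (the helper functions repeat the conditional entropy form assumed in Section 3).

```python
# Sketch of the significance measure of Definition 7:
# Sig(a, B, d) = H(d | B) - H(d | B ∪ {a}).
import math

MISSING = "*"

def _tclass(i, rows, attrs):
    return [j for j, r in enumerate(rows)
            if all(rows[i][a] == r[a] or MISSING in (rows[i][a], r[a])
                   for a in attrs)]

def cond_entropy(rows, attrs, d="d"):
    h = 0.0
    for i in range(len(rows)):
        t = _tclass(i, rows, attrs)
        hits = sum(1 for j in t if rows[j][d] == rows[i][d])
        h -= math.log2(hits / len(t))
    return h / len(rows)

def significance(rows, subset, attr, d="d"):
    """Entropy drop gained by adding `attr` to the current subset."""
    return cond_entropy(rows, subset, d) - cond_entropy(rows, list(subset) + [attr], d)

rows = [
    {"a1": 1, "a2": 0, "d": "yes"},
    {"a1": 1, "a2": 1, "d": "no"},
    {"a1": 0, "a2": 1, "d": "no"},
]
print(significance(rows, [], "a1"), significance(rows, [], "a2"))
# a2 discerns the decision fully, so its significance is larger
```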

4.1. Breadth-First: Exhaustive Search

Breadth-first search is one of the earliest feature selection or attribute selection algorithms in the machine learning area. It begins with an empty attribute set and carries out the search process with a breadth-first strategy until it finds a minimal subset that satisfies the stop criterion. Since the Breadth-first algorithm adopts an exhaustive search strategy, it can guarantee an optimal solution [41]. We present the Breadth-first approach for attribute selection in incomplete decision systems in Algorithm 1.

Input: An incomplete decision system $DS = (U, A \cup \{d\})$.
Output: An attribute selection result SelectAttr.
  (1) For every $i = 1$ to $|A|$
  (2) For every subset SelectAttr $\subseteq A$ with $|\text{SelectAttr}| = i$
  (3) If $H(d \mid \text{SelectAttr}) \neq H(d \mid A)$, go to Step 2, otherwise return SelectAttr
  (4) End
  (5) End
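A compact Python sketch of this exhaustive strategy follows: subsets are enumerated in order of increasing size, and the first subset whose conditional entropy matches that of the full attribute set is returned. The entropy form, helper functions, and toy data are assumptions carried over from Section 3.

```python
# Sketch of Algorithm 1 (exhaustive breadth-first search).
import math
from itertools import combinations

MISSING = "*"

def _tclass(i, rows, attrs):
    return [j for j, r in enumerate(rows)
            if all(rows[i][a] == r[a] or MISSING in (rows[i][a], r[a])
                   for a in attrs)]

def cond_entropy(rows, attrs, d="d"):
    h = 0.0
    for i in range(len(rows)):
        t = _tclass(i, rows, attrs)
        hits = sum(1 for j in t if rows[j][d] == rows[i][d])
        h -= math.log2(hits / len(t))
    return h / len(rows)

def breadth_first_reduct(rows, attrs, d="d"):
    """Smallest subset whose conditional entropy matches the full set's."""
    target = cond_entropy(rows, attrs, d)
    for size in range(1, len(attrs) + 1):         # grow the subset size
        for subset in combinations(attrs, size):  # all subsets of that size
            if math.isclose(cond_entropy(rows, list(subset), d), target):
                return list(subset)               # first hit is minimal
    return list(attrs)

rows = [
    {"a1": 1, "a2": 0, "a3": MISSING, "d": "yes"},
    {"a1": 1, "a2": 1, "a3": 0, "d": "no"},
    {"a1": 0, "a2": 1, "a3": 1, "d": "no"},
]
print(breadth_first_reduct(rows, ["a1", "a2", "a3"]))  # ['a2'] here
```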

Example 8. Given the incomplete decision system shown in Table 2, we first compute the tolerance class of every object. According to the previous definition of rough conditional entropy, we can then obtain the conditional entropy value $H(d \mid \{a\})$ for each single condition attribute. Next, we calculate the rough conditional entropy values with respect to the different combinations of two condition attributes and, finally, with respect to the different combinations of three condition attributes. As soon as a subset SelectAttr with $H(d \mid \text{SelectAttr}) = H(d \mid A)$ is found, we obtain the desired set of selected attributes, which is a relative reduct of the original condition attribute set $A$. It also means that the search procedure ends at this step, and it is not necessary to calculate the entropy values of the remaining combinations of three attributes. The detailed search procedure is illustrated in Figure 1.

4.2. Depth-First: Heuristic Search

We can also use a heuristic, or greedy, search to find an attribute reduct. At the very beginning, the candidate attribute subset is empty. Then, a new attribute that maximizes the significance measure is added to the selected attribute subset each time, until the stop criterion is satisfied. The Depth-first algorithm is fast, close to optimal, and deterministic [42]. Here, we present the Depth-first approach for incomplete decision systems, as shown in Algorithm 2.

Input: An incomplete decision system $DS = (U, A \cup \{d\})$.
Output: A selected attribute subset SelectAttr.
  (1) Initialize SelectAttr $= \emptyset$
  (2) For every attribute $a \in A \setminus \text{SelectAttr}$,
    calculate the tolerance class $T_{\text{SelectAttr} \cup \{a\}}(x)$ of every object $x \in U$
  (3)  Calculate the conditional entropy $H(d \mid \text{SelectAttr} \cup \{a\})$
  (4)  Choose the attribute $a$ which minimizes $H(d \mid \text{SelectAttr} \cup \{a\})$, that is,
    choose the attribute with the maximum significance measure $\mathrm{Sig}(a, \text{SelectAttr}, d)$
  (5)   SelectAttr $= \text{SelectAttr} \cup \{a\}$
  (6)  If $H(d \mid \text{SelectAttr}) \neq H(d \mid A)$, go to Step 2, otherwise return SelectAttr
  (7) End
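The greedy strategy of Algorithm 2 can be sketched in Python as follows: starting from the empty set, the attribute that minimizes the conditional entropy (equivalently, maximizes the significance measure) is added until the stop criterion is met. As before, the entropy form is an assumed reconstruction.

```python
# Sketch of Algorithm 2 (greedy forward selection).
import math

MISSING = "*"

def _tclass(i, rows, attrs):
    return [j for j, r in enumerate(rows)
            if all(rows[i][a] == r[a] or MISSING in (rows[i][a], r[a])
                   for a in attrs)]

def cond_entropy(rows, attrs, d="d"):
    h = 0.0
    for i in range(len(rows)):
        t = _tclass(i, rows, attrs)
        hits = sum(1 for j in t if rows[j][d] == rows[i][d])
        h -= math.log2(hits / len(t))
    return h / len(rows)

def depth_first_reduct(rows, attrs, d="d"):
    target = cond_entropy(rows, attrs, d)
    selected = []
    while not math.isclose(cond_entropy(rows, selected, d), target):
        # Add the attribute that minimizes H(d | selected ∪ {a}),
        # i.e., the one with maximum significance.
        best = min((a for a in attrs if a not in selected),
                   key=lambda a: cond_entropy(rows, selected + [a], d))
        selected.append(best)
    return selected

rows = [
    {"a1": 1, "a2": 0, "a3": MISSING, "d": "yes"},
    {"a1": 1, "a2": 1, "a3": 0, "d": "no"},
    {"a1": 0, "a2": 1, "a3": 1, "d": "no"},
]
print(depth_first_reduct(rows, ["a1", "a2", "a3"]))  # ['a2'] here
```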

Example 9. Given the incomplete decision system shown in Table 2, we first calculate the conditional entropy value of every single condition attribute. The attribute with the minimum conditional entropy value is chosen as the first selected attribute. Since $H(d \mid \text{SelectAttr}) \neq H(d \mid A)$, we still need to add more attributes to SelectAttr. On the basis of the rule of heuristic search, we only need to test and compare the conditional entropy values of the subsets in which the already selected condition attribute is included, and the attribute whose addition yields the minimum conditional entropy is chosen as the next selected attribute. This procedure is repeated, each time restricting attention to supersets of the attributes that are already definitely included, until $H(d \mid \text{SelectAttr}) = H(d \mid A)$; then the stop criterion is satisfied and the algorithm terminates. The final result of attribute selection is the set SelectAttr. The detailed procedure of the Depth-first algorithm is shown in Figure 2.

4.3. LVF: Probabilistic Search

The Las Vegas algorithm is a recent approach to attribute subset selection that makes probabilistic choices of subsets in search of an optimal set. Las Vegas Filter, abbreviated as LVF, is a probabilistic algorithm in which the probabilities of generating any subset are equal [43]. In this paper, we use the investigated conditional entropy as LVF's evaluation measurement. It generates attribute subsets randomly with equal probability and records the minimal size of an attribute subset satisfying the stop criterion within a maximum number of tries (MaxTries). LVF is fast and efficient in reducing the number of candidate features in the early stages and can produce optimal solutions if the computing resources permit. The LVF approach for attribute selection in incomplete decision systems is given in Algorithm 3.

Input: An incomplete decision system $DS = (U, A \cup \{d\})$ and MaxTries.
Output: A selected attribute subset SelectAttr.
(1) Initialize SelectAttr $= A$
(2) For $i = 1$ to MaxTries
(3)  Randomly choose a subset of condition attributes $B \subseteq A$
(4)  If $|B| < |\text{SelectAttr}|$ and $H(d \mid B) = H(d \mid A)$
(5)   SelectAttr $= B$
(6) End
(7) Return SelectAttr
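A Python sketch of the LVF strategy follows: random candidate subsets are drawn for a fixed number of tries, and the smallest one matching the conditional entropy of the full attribute set is kept. MaxTries, the seed, and the sampling scheme are illustrative choices, and the entropy form is the assumed reconstruction from Section 3.

```python
# Sketch of Algorithm 3 (LVF probabilistic search).
import math
import random

MISSING = "*"

def _tclass(i, rows, attrs):
    return [j for j, r in enumerate(rows)
            if all(rows[i][a] == r[a] or MISSING in (rows[i][a], r[a])
                   for a in attrs)]

def cond_entropy(rows, attrs, d="d"):
    h = 0.0
    for i in range(len(rows)):
        t = _tclass(i, rows, attrs)
        hits = sum(1 for j in t if rows[j][d] == rows[i][d])
        h -= math.log2(hits / len(t))
    return h / len(rows)

def lvf_reduct(rows, attrs, d="d", max_tries=1000, seed=0):
    rng = random.Random(seed)
    target = cond_entropy(rows, attrs, d)
    best = list(attrs)                     # start from the full set
    for _ in range(max_tries):
        size = rng.randint(1, len(attrs))  # pick a random nonempty subset
        candidate = rng.sample(attrs, size)
        if len(candidate) < len(best) and \
           math.isclose(cond_entropy(rows, candidate, d), target):
            best = candidate               # keep the smallest subset so far
    return best

rows = [
    {"a1": 1, "a2": 0, "a3": MISSING, "d": "yes"},
    {"a1": 1, "a2": 1, "a3": 0, "d": "no"},
    {"a1": 0, "a2": 1, "a3": 1, "d": "no"},
]
print(lvf_reduct(rows, ["a1", "a2", "a3"], max_tries=200))
```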

5. Experiments

In this section, the performances of the attribute selection algorithms given in the previous section are demonstrated and compared. Several real-life incomplete data sets from the UCI Repository of Machine Learning Databases at the University of California are used in our experiments. These experiments are performed on a personal computer with Windows 7, an Intel(R) Core(TM) i3 CPU at 2.13 GHz, and 4 GB RAM. The objective of these experiments is to evaluate the effectiveness and efficiency of the previous algorithms. The summary and statistics of the experimental data sets are shown in Table 3 and Figure 3, respectively.

Since some incomplete data sets contain continuous condition attribute values, we conduct a discretization preprocess to turn these continuous values into discrete ones before carrying out attribute selection. The aim of this step is to compress the data and reduce the time consumption of the subsequent attribute selection.

For the Breadth-first algorithm, we terminate the program when it runs beyond 5000 seconds. Moreover, we set the parameter MaxTries in the LVF algorithm to different values for different incomplete data sets, according to their sizes. The running time and the size of the attribute selection results depend strongly on the choice of the parameter MaxTries: as MaxTries grows, the running time of the LVF approach increases linearly and the size of the selected attribute subset decreases. Both measures for each attribute selection approach are shown in Table 4. The running time of each approach is the average CPU time, expressed in seconds. We can easily find that the Breadth-first approach takes much more time, even more than 5000 seconds, to obtain an attribute reduction, compared with the other two approaches. Three data sets out of six are too large in scale to compute within the time limit. Furthermore, it is also easy to observe in Table 4 that the Depth-first approach tends to select fewer attributes than the LVF approach, and the Depth-first approach consumes less time than the other two approaches in most instances. The time consumptions of the Breadth-first approach, the Depth-first approach, and the LVF approach for attribute selection are $O(2^{|A|}|U|^2)$, $O(|A|^2|U|^2)$, and $O(t|A||U|^2)$, respectively, where $t$, $|A|$, and $|U|$ denote the MaxTries parameter in the LVF approach, the total number of condition attributes, and the number of instances in the incomplete data sets, respectively. The relationships between the incomplete data sets and the number of selected attributes and the running time of attribute selection are illustrated in Figures 4 and 5.

The final part of our experiments compares and evaluates the efficiency of the proposed algorithms in practical classification tasks. For each of the six data sets in Table 5, we employ the SVM-RBF classifier, which is one of the most frequently used classifiers. We also apply the 10-fold cross-validation method to estimate the classification accuracy with respect to the reducts generated by the proposed algorithms. In each fold, the redundant attributes are first removed from the current training set according to the proposed algorithms. Then, the test set is classified by using the rules generated from the training set. The final classification accuracies are shown in Table 5. It can be seen that, by using the attribute selection algorithms, the classification accuracies for the incomplete data sets are all raised to different degrees, compared with the classification accuracy obtained with the original full attribute set. It can also be noticed that the Depth-first algorithm exhibits the highest classification accuracy on each incomplete data set. Therefore, the experimental results demonstrate that the proposed algorithms are effective for attribute selection tasks in application domains.
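The evaluation protocol can be sketched with scikit-learn as follows; the paper does not specify its implementation, so the classifier pipeline, the imputation step, and the data below are illustrative assumptions. For brevity, this sketch applies one fixed selected-attribute set to all folds, whereas the paper reruns attribute selection inside each fold.

```python
# A sketch of 10-fold cross-validation with an SVM-RBF classifier,
# using scikit-learn on hypothetical discretized data; np.nan marks a
# missing value and is imputed only so the SVM can run.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[1, 0, np.nan], [1, 1, 0], [0, 1, 1], [0, 0, 1]] * 10,
             dtype=float)
y = np.array(["yes", "no", "no", "yes"] * 10)

selected = [1, 2]  # columns kept by one attribute selection run (assumed)
clf = make_pipeline(SimpleImputer(strategy="most_frequent"),
                    StandardScaler(),
                    SVC(kernel="rbf"))
scores = cross_val_score(clf, X[:, selected], y, cv=10)
print(scores.mean())  # estimated classification accuracy
```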

6. Conclusion

In this paper, a rough conditional entropy-based attribute selection approach is proposed to evaluate the significance of condition attributes and to find the minimal reduct of incomplete decision systems. Based on this measure, three types of attribute selection approaches are constructed: the exhaustive search strategy approach Breadth-first, the heuristic search strategy approach Depth-first, and the probabilistic search approach LVF. To evaluate the effectiveness of the introduced approaches, experiments on several real-life incomplete data sets are conducted. The experimental results suggest that the Depth-first and LVF approaches are practical for attribute selection for the classification of high-dimensional data with thousands of condition attributes, and they can efficiently enhance classification accuracy with the predominant attributes. However, exhaustively examining all combinations of condition attributes to find the optimal one is an NP-hard problem, so our approaches still cannot easily handle a complex incomplete decision system with hundreds of thousands of condition attributes. Therefore, to reduce the time consumption of attribute selection on large data sets, more applicable approaches such as parallel heuristic algorithms are desirable for large-scale incomplete decision systems. This issue will be investigated in the future.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to sincerely thank the anonymous reviewers for their insightful comments and constructive suggestions on this investigation. Moreover, the authors thank the UCI Repository of Machine Learning Databases at the University of California for providing the experimental data sets. This work is supported by the National Natural Science Foundation of China (no. 61074176).