Abstract
In the era of big data, more and more datasets fall beyond the scope of traditional clustering algorithms because of their large scale and high dimensionality. To overcome these limitations, incremental mechanisms and feature reduction have become two indispensable components of modern clustering algorithms. Combining feature reduction with single-pass and online incremental strategies, respectively, we propose two incremental fuzzy clustering algorithms. The first uses the Weighted Feature Reduction Fuzzy C-Means (WFRFCM) clustering algorithm to process each chunk in turn, feeding the clustering results of the previous chunk into the next chunk for joint computation. The second applies the WFRFCM algorithm to all chunks in parallel and then merges and reclusters the results of each chunk. To investigate the clustering performance of these two algorithms, six datasets were selected for comparative experiments. The results show that both algorithms select high-quality features through feature reduction and handle large-scale data through the incremental strategy. Combining the two mechanisms ensures clustering efficiency while maintaining high clustering accuracy.
1. Introduction
Clustering is an important technique in data mining research. It groups data in an unsupervised way while ensuring high intragroup similarity and intergroup dissimilarity. Since fuzzy set theory [1] was proposed and applied to cluster analysis [2], research on fuzzy clustering has grown rapidly. Today, many fuzzy clustering algorithms, represented by Fuzzy C-Means (FCM) clustering [3–5], have been proposed and widely used.
In the context of big data, datasets exhibit two main characteristics: (1) large volume and (2) high dimensionality. When the FCM algorithm processes such datasets, all features participate in clustering with equal importance weights, which easily degrades both clustering accuracy and efficiency. In 2018, Yang [6] proposed Feature Reduction Fuzzy C-Means (FRFCM) clustering. This algorithm designs an objective function for feature reduction based on weighted entropy and computes a weight for each feature: the greater a feature's influence, the higher its weight, and vice versa. Guided by these weights, the original high-dimensional features are reduced to a lower-dimensional space, which effectively improves efficiency.
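As a hedged illustration of this idea, the sketch below computes entropy-style feature weights from the fuzzy within-cluster dispersion of each feature: a feature with small dispersion separates the clusters well and receives a larger weight. The function name and the exact weighting form are assumptions for illustration; FRFCM's actual update formulas (not reproduced in this text) differ.

```python
import numpy as np

def feature_weights(X, centers, U, m=2.0):
    """Illustrative entropy-style feature weighting (an assumption, not the
    exact FRFCM update): smaller within-cluster dispersion -> larger weight."""
    N, D = X.shape
    disp = np.zeros(D)
    for c in range(centers.shape[0]):
        diff2 = (X - centers[c]) ** 2               # (N, D) squared deviations
        disp += (U[c][:, None] ** m * diff2).sum(axis=0)
    # exponential of negative normalized dispersion, then renormalize to sum 1
    w = np.exp(-disp / disp.sum())
    return w / w.sum()
```

Features whose resulting weight falls below a threshold (the paper later uses 1/(DN)^(1/2)) would be discarded.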
Although fuzzy clustering can effectively handle high-dimensional feature data through feature reduction [7–13], it still struggles to process large-scale data, especially streaming data. Previously, to realize large-scale data clustering [14–18], Hore et al. [19, 20] proposed two incremental algorithms, SPFCM (Single-Pass Fuzzy C-Means) and OFCM (Online Fuzzy C-Means), based on single-pass and online clustering strategies, respectively. The single-pass method divides the whole dataset into several chunks; the centroids obtained by clustering one chunk participate in the clustering of the next chunk, until all chunks have been processed. In the online method, by contrast, each chunk is clustered separately, and the centroids of all chunks form a new chunk, which is clustered again to generate the final result. Although SPFCM and OFCM can both handle large-scale datasets, their results are rarely satisfactory once the data are high-dimensional and sparse. Therefore, considering both scalability and high-dimensional processing capability, Mei et al. [21] extended SPFCM and OFCM to SPHFCM (Single-Pass Hyperspherical Fuzzy C-Means) [21] and OHFCM (Online Hyperspherical Fuzzy C-Means) [21], respectively. Their work adds a normalization step at each iteration to scale all centroids to unit norm and uses cosine similarity instead of Euclidean distance to measure the distance between each centroid and each object.
Based on the above analysis, to cluster large-scale and high-dimensional datasets, we propose two incremental fuzzy c-means clustering algorithms based on feature reduction, named SPFRFCM (Single-Pass Feature Reduction Fuzzy C-Means) and OFRFCM (Online Feature Reduction Fuzzy C-Means). In both algorithms, features receive different weights according to their importance, and dimension reduction is used to lower the data dimensionality and improve clustering efficiency. In addition, the incremental mechanism enables them to handle large-scale data.
The rest of the paper is organized as follows. Section 2 introduces related work, including the FRFCM algorithm and incremental clustering based on FCM. Section 3 presents our two algorithms, SPFRFCM and OFRFCM. Section 4 reports experiments demonstrating the performance of the proposed algorithms. Section 5 concludes the paper.
2. Related Work
2.1. FRFCM
In the FRFCM algorithm, each feature has its own weight, and the weight value is updated at each iteration. Let X = {x1, x2, …, xN} be a D-dimensional dataset, let U be a membership matrix whose element uci represents the fuzzy membership of the i-th object (1 ≤ i ≤ N) to the c-th cluster (1 ≤ c ≤ C), let V be a centroid matrix whose element vcj represents the value of the j-th feature of the c-th centroid, and let W be a feature weight matrix whose element ωj represents the weight of the j-th feature (1 ≤ j ≤ D), where N and C are the numbers of objects and clusters, respectively.
The objective function and constraints are expressed as follows, where δj is used to adjust the feature weight ωj and Tω depends on the values of N and C:
2.2. Incremental Clustering Based on FCM
Both the SPFCM and OFCM algorithms are based on WFCM (Weighted Fuzzy C-Means) clustering [22–25], which assigns different weights to objects according to their importance: the higher the weight, the greater the importance, and vice versa. The objective function of the WFCM algorithm weights the contribution of the i-th object by its object weight, and the iterative formulas of uci and vcj can be obtained, respectively, by the Lagrange multiplier method:
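One WFCM iteration can be sketched directly: the membership update is the standard FCM formula, and the centroid update scales each object's contribution by its object weight. The function name `wfcm_step` is chosen here for illustration, not taken from the paper.

```python
import numpy as np

def wfcm_step(X, V, obj_w, m=2.0, eps=1e-12):
    """One WFCM iteration: FCM membership update plus object-weighted
    centroid update. X: (N, D) data, V: (C, D) centroids, obj_w: (N,) weights."""
    # squared Euclidean distances between centroids and objects, shape (C, N)
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) + eps
    # u_ci proportional to d2_ci^(-1/(m-1)), columns normalized to sum to 1
    p = 1.0 / (m - 1.0)
    inv = (1.0 / d2) ** p
    U = inv / inv.sum(axis=0, keepdims=True)
    # v_cj = sum_i w_i u_ci^m x_ij / sum_i w_i u_ci^m
    um = obj_w * U ** m
    V_new = um @ X / um.sum(axis=1, keepdims=True)
    return U, V_new
```

With all object weights equal to 1, this reduces to an ordinary FCM iteration.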
After dividing dataset X into s chunks of size n, that is, X = [X1, X2, …, Xs], both SPFCM and OFCM process the data chunk by chunk. In SPFCM, let Xz denote the z-th chunk (1 ≤ z ≤ s). After clustering Xz with the WFCM algorithm, we obtain a centroid set Δz = [v1z, v2z, …, vcz]. These centroids and their weights are then merged with the objects of the (z+1)-th chunk to form a new chunk Xz+1′ = [Δz, Xz+1], on which the WFCM algorithm is carried out again. This process repeats until all chunks have been processed.
Unlike SPFCM, the OFCM algorithm clusters the s chunks separately at the same time; the centroids of all chunks then form a combined chunk, on which the WFCM algorithm is implemented again to produce the final clustering result.
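The two data flows can be contrasted in code. The sketch below substitutes a tiny weighted k-means for the WFCM step (an assumption made only to keep the example self-contained and short): `single_pass` carries weighted centroids forward into each next chunk, while `online` clusters the chunks independently and then reclusters the pooled, weighted centroids.

```python
import numpy as np

def weighted_kmeans(X, w, C, iters=10, seed=0):
    """Tiny weighted k-means used as a stand-in for WFCM, to show the
    incremental data flow only (not the paper's exact clustering step)."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), C, replace=False)]
    for _ in range(iters):
        lab = ((X[:, None] - V[None]) ** 2).sum(-1).argmin(1)
        for c in range(C):
            if (lab == c).any():
                V[c] = np.average(X[lab == c], axis=0, weights=w[lab == c])
    cw = np.array([w[lab == c].sum() for c in range(C)])  # centroid weights
    return V, cw

def single_pass(chunks, C):
    V, cw = None, None
    for Xz in chunks:
        data = Xz if V is None else np.vstack([V, Xz])
        w = np.ones(len(Xz)) if V is None else np.concatenate([cw, np.ones(len(Xz))])
        V, cw = weighted_kmeans(data, w, C)   # carry centroids into next chunk
    return V

def online(chunks, C):
    parts = [weighted_kmeans(Xz, np.ones(len(Xz)), C) for Xz in chunks]
    Vall = np.vstack([V for V, _ in parts])
    wall = np.concatenate([cw for _, cw in parts])
    V, _ = weighted_kmeans(Vall, wall, C)     # final pass over pooled centroids
    return V
```

Note the structural difference: `single_pass` is inherently sequential, whereas the per-chunk calls inside `online` are independent and could run in parallel.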
3. Incremental Fuzzy Clustering Based on Feature Reduction
Based on the single-pass and online incremental frameworks, this paper proposes two incremental fuzzy clustering algorithms based on feature reduction, named SPFRFCM and OFRFCM, respectively. To implement SPFRFCM and OFRFCM, a weighted FRFCM algorithm, named WFRFCM (Weighted Feature Reduction Fuzzy C-Means), is designed. Its objective function is
The constraints include
According to the Lagrange multiplier method, fixing U and W and setting the partial derivative with respect to V to zero yields the value of vcj:
Similarly, the iterative formulas of uci and ωj can be described as
Next, we analyze the time complexity of the WFRFCM algorithm. It consists of three parts: (1) computing the membership uci, (2) updating the centroid vcj, and (3) updating the feature weight ωj. The complexities of these parts are O(NC²D), O(NCD), and O(NCD²), respectively, so the total computational complexity is O(NC²D + NCD²).
3.1. SPFRFCM
Before the SPFRFCM algorithm is implemented, the dataset is divided into s chunks of size n, and the data weight vector is initialized as a 1 × n vector of ones. Let z be the chunk index. The algorithm then runs as follows:
(1) When z = 1, the WFRFCM algorithm clusters this chunk directly, and we obtain the centroid matrix Δ1 = [v11, v21, …, vC1] and the corresponding data weight matrix, in which every weight equals 1 (1 ≤ i ≤ nz, where nz is the size of the z-th chunk). We also obtain the feature weight matrix ω1 = [ω11, ω21, …, ωD1]. By minimizing the objective function, the formulas of the membership uci and the centroid after clustering chunk X1 are obtained.
(2) When z > 1, the centroids of the previous chunk are added to the current chunk, producing a new chunk Xz′ = [Δz−1, Xz]. Its weight vector contains the C centroid weights from the previous chunk followed by the nz object weights (each equal to 1) of the current chunk. WFRFCM clustering is then carried out on this new chunk, and the data weights and feature weights are updated accordingly.
By minimizing the objective function, we obtain the formulas of the membership uci and the centroid vcj:
The pseudocode of the SPFRFCM algorithm is as follows:
Step 1: Initialize the parameters: number of clusters C, fuzzy index m = 2, membership matrix U(0), feature weight matrix W(0) = [ωj]1×D, iteration counter t = 1, threshold ε1 = 1/(DN)^(1/2), and threshold ε2 = 1 × 10−8.
Step 2: Divide the dataset into s chunks, X = [X1, X2, …, Xs]; initialize the data weights to [1, 1, …, 1]; set z = 1.
Step 3: Calculate V(t) with formula (19) or (23).
Step 4: Calculate W(t) with formula (16) or (20).
Step 5: Calculate ωj with formula (17).
Step 6: If ωj < ε1, delete the j-th feature and update the number of features dnew = D − dr, where dr is the number of deleted features.
Step 7: Calculate U(t) with formula (18) or (22).
Step 8: If z < s: if ║U(t) − U(t−1)║ > ε2, set t ← t + 1 and go to Step 3; otherwise, add the centroids and weights to the next chunk to obtain a new chunk, set z ← z + 1, and go to Step 3.
Step 9: If z = s: if ║U(t) − U(t−1)║ > ε2, set t ← t + 1 and go to Step 3; otherwise, the algorithm ends.
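Step 6, the feature-deletion rule, can be sketched on its own. `prune_features` is a hypothetical helper name; the threshold ε1 = 1/(DN)^(1/2) is taken from Step 1 of the pseudocode.

```python
import numpy as np

def prune_features(X, w_feat, N_total):
    """Drop features whose weight falls below eps1 = 1/sqrt(D*N), as in
    Step 6; returns the reduced data, surviving weights, and the keep mask."""
    D = X.shape[1]
    eps1 = 1.0 / np.sqrt(D * N_total)
    keep = w_feat >= eps1
    return X[:, keep], w_feat[keep], keep
```

The surviving feature set (dnew = D − dr columns) then participates in all subsequent iterations, which is what shrinks the per-iteration cost.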
3.2. OFRFCM
OFRFCM is implemented on the online incremental framework. First, the s chunks are clustered separately; then the centroids and weights of all chunks are merged to form a new chunk and its weight matrix, which collects the C centroid weights produced by each of the s chunks. The specific steps are as follows:
(1) Cluster each chunk to obtain the centroid matrix Δz = [v1z, v2z, …, vCz] and the corresponding weight matrix. The feature weight matrix ωz = [ω1z, ω2z, …, ωDz] is also obtained. By minimizing the objective function, we obtain the formulas of the membership uci and the centroid vcj.
(2) The centroids from all chunks, with their different weights, form a new dataset, on which the WFRFCM algorithm is implemented again to generate the final result.
The formulas of the membership uci and the centroid vcj obtained from the new chunk are as follows, where 1 ≤ i ≤ np and np is the number of objects in the new chunk.
The clustering process of the OFRFCM algorithm can be described by the following pseudocode:
Step 1: Initialize the parameters: number of clusters C, fuzzy index m = 2, membership matrix U(0), feature weight matrix W(0) = [ωj]1×D, iteration counter t = 1, centroid matrix Xnew, data weight matrix Q, threshold ε1 = 1/(DN)^(1/2), and threshold ε2 = 1 × 10−8.
Step 2: Divide the dataset into s chunks, X = [X1, X2, …, Xs]; initialize the data weights to [1, 1, …, 1]; set z = 1.
Step 3: Calculate V(t) with formula (27) or (29).
Step 4: Calculate W(t) with formula (24).
Step 5: Calculate ωj with formula (25).
Step 6: If ωj < ε1, delete the j-th feature and update the number of features dnew = D − dr, where dr is the number of deleted features.
Step 7: Calculate U(t) with formula (26) or (28).
Step 8: If z ≤ s: if ║U(t) − U(t−1)║ > ε2, set t ← t + 1 and go to Step 3; otherwise, add the centroids and weights to the centroid matrix Xnew and the object weight matrix Q, respectively, set z ← z + 1, and go to Step 3.
Step 9: If z > s: if ║U(t) − U(t−1)║ > ε2, set t ← t + 1 and go to Step 3; otherwise, the algorithm ends.
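The stopping rule shared by Steps 8 and 9 of both pseudocodes can be sketched as a Frobenius-norm test on successive membership matrices; `converged` is a name chosen here for illustration.

```python
import numpy as np

def converged(U_t, U_prev, eps2=1e-8):
    """Stop when the membership matrix changes by no more than eps2
    in Frobenius norm, i.e. when ||U(t) - U(t-1)|| <= eps2."""
    return np.linalg.norm(U_t - U_prev) <= eps2
```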
4. Experiments
4.1. Experimental Preparation
In order to verify the effectiveness of our algorithms, six datasets and six incremental comparison clustering algorithms are selected for the experiments. The information of each dataset is listed in Table 1. The comparison algorithms include three single-pass algorithms, SPFCM, SPHFCM, and SPFCOM (Single-Pass Fuzzy C-Ordered-Means clustering) [26], and three online algorithms, OFCM, OHFCM, and OFCOM (Online Fuzzy C-Ordered-Means clustering) [26]. SPFCOM and OFCOM are incremental fuzzy c-ordered-means clustering algorithms built on the single-pass and online architectures, respectively, and both achieve better robustness. The experimental results are evaluated with two criteria, accuracy (AC) and F-Measure (FM) [27].
4.2. Experimental Results
The experiments are divided into three parts. The first part evaluates clustering accuracy by comparing our algorithms with the six comparison algorithms, the second verifies the effectiveness of dimension reduction, and the third evaluates the robustness of our algorithms.
4.2.1. Algorithm Accuracy
This group of experiments first compares our SPFRFCM algorithm with the three single-pass comparison algorithms and then compares our OFRFCM algorithm with the three online comparison algorithms. The chunk size is set to 5%, 10%, 20%, and 50% of the total number of objects in each dataset to verify the clustering performance under different chunk sizes. The experiments adopt the same parameter settings as the FRFCM algorithm, and each reported result is the average of 10 independent runs of each algorithm. The results of the four single-pass algorithms and the four online algorithms are shown in Tables 2 and 3, respectively.
It can be seen from Table 2 that when the chunk size is 5%, the average FM value of the SPFRFCM algorithm on the six datasets is 10.98%, 13.89%, and 18.55% higher than those of the SPFCOM, SPHFCM, and SPFCM algorithms, respectively, and the average AC value is 10.97%, 12.71%, and 18.22% higher, respectively. When the chunk size is 10%, the average FM value of SPFRFCM is 13.65%, 15.12%, and 17.94% higher than those of SPFCOM, SPHFCM, and SPFCM, respectively, and the AC value is 9.93%, 11.96%, and 15.34% higher, respectively. When the chunk size is 20%, the average FM improvements are 14.33%, 15.22%, and 14.28%, respectively, and the AC improvements are 11.48%, 15.22%, and 12.35%, respectively. When the chunk size is 50%, the FM improvements are 11.26%, 15.12%, and 19.02%, respectively, and the AC improvements are 7.37%, 13.06%, and 17.53%, respectively. In terms of the result distribution, SPFRFCM improves greatly on the WR, WB, and SE datasets, where the FM and AC values increase by 20.16% and 10.15% on average, respectively. The improvements on the GS, PB, and IR datasets are smaller, with average FM and AC increases of 5.76% and 5.88%, respectively.
Table 3 lists the clustering accuracy comparison of the four online algorithms. It shows that when the chunk size is 5%, the average FM value of the OFRFCM algorithm on the six datasets is 14.98%, 25.19%, and 14.65% higher than those of the OFCOM, OHFCM, and OFCM algorithms, respectively, and the AC value is 14.20%, 34.88%, and 18.66% higher, respectively. When the chunk size is 10%, the average FM value of OFRFCM is 13.13%, 22.95%, and 13.74% higher, and the AC value is 9.22%, 21.14%, and 16.37% higher, respectively. When the chunk size is 20%, the FM improvements of OFRFCM are 14.76%, 23.63%, and 13.49%, respectively, and the AC improvements are 11.45%, 31.20%, and 15.51%, respectively. When the chunk size is 50%, the FM improvements are 13.40%, 26.72%, and 16.97%, respectively, and the AC improvements are 9.77%, 33.29%, and 24.62%, respectively. In terms of the result distribution, OFRFCM improves greatly on the SE, WR, and WB datasets, where the FM and AC values increase by 27.40% and 25.68% on average, respectively. The improvements on the GS, PB, and IR datasets are smaller, with average FM and AC increases of 8.20% and 14.37%, respectively.
This group of experiments shows that both the SPFRFCM and OFRFCM algorithms achieve higher accuracy than the comparison algorithms on the experimental datasets, with obvious improvements on some of them. The reason is that features with low weights may play a negative role in the clustering process; filtering them out through feature reduction therefore helps improve clustering accuracy.
4.2.2. Feature Reduction
In the process of feature reduction, features whose weights fall below the threshold are discarded. To verify the effectiveness of feature reduction, the second part of the experiments also sets the chunk size to 5%, 10%, 20%, and 50% of the total number of objects in each dataset and records the number of remaining features. The experimental results are shown in Figures 1 and 2.
Figures 1 and 2 describe the feature reduction performance of the SPFRFCM and OFRFCM algorithms, respectively. On the IR dataset, both algorithms reduce the number of features by 50%. On the WB dataset, SPFRFCM reduces the features by up to 70%, while OFRFCM reduces them by up to 33%. On the WR dataset, SPFRFCM reduces the number of features by 73%, while OFRFCM reduces it by 9%. On the PB dataset, neither algorithm reduces the number of features. On the SE dataset, SPFRFCM outperforms OFRFCM in feature reduction when the chunk size is 5% or 10%, and the two are equivalent when the chunk size is 20% or 50%. On the GS dataset, SPFRFCM reduces the number of features to 22 on average, while OFRFCM reduces it to 120 on average. Thus, in our experiments, the feature reduction method effectively reduces the number of features.
In addition, the first part of the experiments shows that clustering accuracy improves with feature reduction, which indicates that some noisy features are filtered out during the process. Noisy features not only harm clustering accuracy but also reduce clustering efficiency; once they are filtered out, the robustness of the algorithms improves and the clustering accuracy is more easily raised.
After feature reduction, clustering time is reduced and clustering efficiency therefore improves. The average running time of each algorithm is shown in Table 4. For high-dimensional datasets, the running time of the SPFRFCM algorithm is 70.2%, 12.6%, and 38.5% lower than those of the single-pass comparison algorithms SPFCOM, SPHFCM, and SPFCM, respectively. The running time of the OFRFCM algorithm is comparable to that of the other online algorithms and is not significantly improved, which is a consequence of the incremental strategy. Under the single-pass strategy, the feature weights are updated and low-weight features are filtered out as each chunk is clustered; the reduced feature set then participates in the computation for all subsequent chunks, directly improving their clustering efficiency. Hence, over the whole dataset, SPFRFCM greatly reduces the clustering time. Under the online strategy, however, each chunk is clustered separately, and low-weight features are filtered out within each chunk independently, so the overall efficiency changes little. Therefore, the single-pass incremental algorithm outperforms the online incremental algorithm in terms of feature reduction and efficiency improvement.
4.2.3. Robustness
This part of the experiments evaluates the robustness of our algorithms. The chunk size is set to 50% of each dataset, and noise data are added to the six datasets in proportions of 5%, 10%, 20%, and 50%, respectively, before clustering. The experimental results are shown in Figures 3–8.
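The text does not specify the noise model, so the sketch below is only one plausible setup for such a robustness experiment: appending a given proportion of uniform-random points spanning each feature's observed range. The function name and noise distribution are assumptions.

```python
import numpy as np

def add_noise(X, frac, rng=None):
    """Append frac * len(X) uniform-random noise points drawn within the
    per-feature min/max range of X (an assumed noise model, for illustration)."""
    rng = np.random.default_rng(rng)
    n_noise = int(round(frac * len(X)))
    lo, hi = X.min(axis=0), X.max(axis=0)
    noise = rng.uniform(lo, hi, size=(n_noise, X.shape[1]))
    return np.vstack([X, noise])
```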
For the four single-pass algorithms, when the noise proportion is 5%, the average FM values of the SPFRFCM algorithm on the six datasets are 8.86%, 6.40%, and 1.75% higher than those of the SPFCM, SPHFCM, and SPFCOM algorithms, respectively, and the AC values are 8.18%, 10.65%, and 2.3% higher, respectively. When the noise proportion is 10%, the mean FM improvements of SPFRFCM are 8.10%, 9.81%, and 2.4%, respectively, and its mean AC values are 6.10% and 10.30% higher than those of SPFCM and SPHFCM, respectively, but 0.67% lower than that of SPFCOM. When the proportion is 20%, the mean FM values of SPFRFCM are 6.28% and 7.85% higher than those of SPFCM and SPHFCM but 1.01% lower than that of SPFCOM, and the mean AC values are 8.05% and 11.1% higher than those of SPFCM and SPHFCM but 0.4% lower than that of SPFCOM. When the noise proportion is 50%, the mean FM values are 7.60% and 6.93% higher than those of SPFCM and SPHFCM but 4.27% lower than that of SPFCOM, and the mean AC values are 7.95% and 9.56% higher than those of SPFCM and SPHFCM but 4.83% lower than that of SPFCOM. These results show that the robustness of the SPFRFCM algorithm exceeds that of both SPFCM and SPHFCM. Compared with SPFCOM, SPFRFCM is more robust when the noise ratio is below 10%; between 10% and 20%, the two algorithms differ little; above 20%, SPFCOM is the more robust of the two. The Fuzzy C-Ordered-Means (FCOM) clustering is well known for its insensitivity to noise and outliers in data, which our experiments further verify.
For the four online algorithms, when the noise proportion is 5%, the average FM values of the OFRFCM algorithm on the six datasets are 5.38%, 8.18%, and 1.53% higher than those of the OFCM, OHFCM, and OFCOM algorithms, respectively, and the AC values are 6.81%, 10.56%, and 1.85% higher, respectively. When the noise proportion is 10%, the average FM values of OFRFCM are 7.33%, 10.86%, and 3.55% higher, respectively, and the AC values are 7.88%, 11.28%, and 0.98% higher, respectively. When the proportion is 20%, the mean FM improvements of OFRFCM are 7.20%, 9.00%, and 0.26%, respectively, and the mean AC improvements are 9.98%, 11.40%, and −0.98%, respectively. When the noise proportion is 50%, the mean FM values of OFRFCM are 5.66%, 8.18%, and 1.53% higher than those of OFCM, OHFCM, and OFCOM, respectively, and the mean AC values are 8.01% and 8.20% higher than those of OFCM and OHFCM, respectively, but 3.17% lower than that of OFCOM. This part of the experiments shows that the robustness of the OFRFCM algorithm exceeds that of both OFCM and OHFCM, and it again confirms the strong robustness of FCOM.
5. Conclusions
As datasets grow larger, their dimensionality also increases, and traditional clustering algorithms struggle with large-scale, high-dimensional data. We therefore proposed two incremental fuzzy c-means clustering algorithms, named SPFRFCM and OFRFCM. The former divides the whole dataset into several chunks and clusters them successively with the WFRFCM algorithm, with the clustering results of each chunk participating in the clustering of the next. The latter clusters each chunk with the WFRFCM algorithm and then merges the results of all chunks for a final clustering pass. The two algorithms combine the advantages of the incremental framework and feature reduction: they can handle large-scale data while reducing the feature dimensionality, which helps improve the robustness and efficiency of clustering. To evaluate their effectiveness, experiments were carried out on six datasets. The results show that the SPFRFCM algorithm effectively reduces the feature dimensionality through feature reduction and achieves higher clustering accuracy and efficiency than the comparison algorithms. Although the clustering efficiency of OFRFCM is not significantly improved because of its online strategy, it also reduces the feature dimensionality and improves clustering accuracy.
Data Availability
All the datasets used in this paper are derived from the UCI (University of California Irvine) Machine Learning Repository. Please visit https://archive.ics.uci.edu/ml/datasets.php.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors would like to thank the members of the IR&DM Research Group at Henan Polytechnic University for their invaluable advice, which helped bring this paper to completion. The authors also acknowledge the support of the National Science Fund subsidized project under Grant no. 61872126.