Abstract
With the development of the Internet, the data we deal with is becoming increasingly complex, which creates various difficulties when we process and use it. Feature selection methods can filter out redundant features, which is necessary to reduce the dimensionality of complex data. In this paper, the island algorithm is used to find the optimal feature subset in the set of candidate feature subsets; however, as the number of iterations increases, the island algorithm tends to fall into local optima. To address this problem, a Gaussian mutation strategy is introduced to improve the island algorithm, and an island algorithm based on Gaussian mutation (IAGM) is proposed. The main idea of IAGM is to set a warning sign that records the global optimum produced by each iteration and to judge whether the change in the optimum over three consecutive iterations is less than a threshold value. If it is, a Gaussian mutation is applied to the current globally optimal plant position; the new position is then evaluated and exchanged with the current one when it is better than the global optimum. Otherwise, the mutated plant is discarded and a Gaussian perturbation is added to the formula that moves the next new plants towards the global optimum, which diversifies the population. Meanwhile, combining the IAGM algorithm with a support vector machine classifier, a feature selection method based on the Gaussian-mutation island algorithm (IAGMFS) is proposed. UCI datasets are selected for simulation experiments and the method is applied to image classification, which demonstrates the advantages of the feature selection in the IAGMFS proposed in this paper.
1. Introduction
In recent years, with the rapid development of technology, a large amount of high-dimensional data has emerged in various fields, and this high-dimensional data contains many irrelevant, redundant, and noisy features [1, 2]; the presence of these features creates significant problems when processing and using the data and increases the cost of computation. Many scholars have begun to research solutions to this problem, and studies have shown that feature selection is an effective one, making feature selection an important step in machine learning and data mining [3]. Feature selection is the removal of redundant, irrelevant, and noisy features from the original feature set while retaining the important features, thus reducing the cost of processing the data [4, 5].
Feature selection methods are divided mainly according to how candidate subsets are evaluated. They can be broadly divided into two types: wrapper methods and filter methods. A filter method judges the goodness of a feature subset by the intrinsic characteristics of that subset. A wrapper method is essentially built around a classifier, which is used to classify with the selected feature subset, and the classification accuracy is used to judge the merit of the subset [6, 7]. Feature selection requires us to select the optimal subset from a large number of candidate feature subsets [8, 9], which calls for global search techniques.
With the development of swarm intelligence algorithms in recent years, more and more of them have been proposed: for example, the island algorithm (IA) [10], particle swarm optimization (PSO) [11], the genetic algorithm (GA) [12], the grey wolf optimizer (GWO) [13], and the fruit fly optimization algorithm (FOA) [14]. The global optimization capability of swarm intelligence algorithms can be used precisely to find the optimal subset in the set of feature subsets. For this reason, swarm intelligence algorithms have attracted more and more attention from scholars in the field of feature selection, and various swarm intelligence optimization algorithms have been applied to the feature selection problem [15, 16]. Peng et al. [17] proposed a neighbourhood rough set attribute reduction method based on reverse learning synergy and a binary firefly swarm optimization algorithm. Lin et al. [18] combined an improved monarch butterfly optimization algorithm with the k-nearest neighbour classifier and proposed a feature selection method based on the improved monarch butterfly optimization algorithm. Mafarja and Mirjalili [19] combined incremental hill-climbing techniques with the binary antlion algorithm and proposed a hybrid binary antlion optimizer for feature selection using rough sets and approximate entropy reducts.
The island algorithm is a new swarm intelligence optimization algorithm based on the phenomenon that, as sea level rises and the size of an island shrinks, the growing positions of plants on the island become increasingly concentrated at its highest points. Its structure is simple, and its convergence is fast. However, as the number of iterations increases, the range of the island decreases and the algorithm tends to fall into a local optimum, which reduces its performance. To address this problem, this paper proposes the island algorithm based on Gaussian mutation (IAGM). If the change in the global optimum is small for three consecutive iterations, the algorithm is considered to have fallen into a local optimum and a Gaussian mutation is applied to the current globally optimal plant position; the fitness value of the mutated position is then calculated and exchanged with the current one if it is better than the global optimum. Otherwise, the Gaussian mutation strategy is introduced into the formula that moves new plants towards the global optimum in the next iteration. This not only helps the plants jump out of local optimal solutions but also increases the diversity of the population. By combining the IAGM algorithm with a support vector machine classifier and making use of the improved search ability of IAGM, a feature selection method based on the Gaussian-mutation island algorithm (IAGMFS) is proposed.
2. Basic Island Algorithm
The island algorithm is a new metaheuristic algorithm based on the laws of plant growth on islands. It is divided into three phases: an elimination phase, a sea level rise phase, and a balance phase.
2.1. Elimination Phase
The main role of the elimination phase is to generate the number of plants to be eliminated in this iteration based on the amount of change in the island's range; this is done by the elimination function. The independent variable of the elimination function is the amount of change in the extent of the island, and the value of the function is the number of plants eliminated. The range of the island is defined by a maximum and a minimum value, so the maximum number of eliminations in an iteration is the total number of plants minus two. To improve the performance of the algorithm, two issues are taken into consideration. First, if the range change is large, the algorithm tends to fall into a local optimum, and the range change in the next iteration should be reduced by reducing the number of eliminations. Second, when the range change is small, the algorithm converges too slowly, and the range change in the next iteration should be increased by increasing the number of eliminations. In general, the elimination function should satisfy the following two requirements: (1) when the amount of range change is 0, the number of eliminations takes its maximum value; (2) the elimination function is a decreasing function.
The negative exponential function satisfies both of the above requirements, and therefore, it is chosen as the elimination function:

$$E(\Delta) = (E_{\max} - E_{\min})\, e^{-\Delta} + E_{\min}, \tag{1}$$

where $\Delta$ is the amount of range change, $E_{\max}$ is the maximum number of eliminations, and $E_{\min}$ is the minimum number of eliminations.
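A minimal sketch of this elimination function in Python, under the reconstructed form of Equation (1):

```python
import numpy as np

def num_eliminations(delta, e_max, e_min):
    """Negative-exponential elimination function (Equation (1)): returns
    e_max when the range change delta is 0 and decays monotonically
    toward e_min as delta grows."""
    return int(round((e_max - e_min) * np.exp(-delta) + e_min))
```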
2.2. Sea Level Rise Phase
This stage raises the sea level, i.e., reduces the extent of the island, based on the number of eliminations produced in the previous stage. The phase generates a new island range and the corresponding amount of range change. The new range is determined by the region in which the plants that have not been eliminated grow and is represented by the maximum value $X_{\max}$ and the minimum value $X_{\min}$ of each dimension. To improve the performance of the algorithm, the maximum and minimum values of the range are extended according to the following equations:

$$X_{\max}' = X_{\max} + \lambda\,(X_{\max} - X_{\min}), \tag{2}$$

$$X_{\min}' = X_{\min} - \lambda\,(X_{\max} - X_{\min}), \tag{3}$$

where $\lambda$ is the expansion coefficient.
The amount of change in the extent of the island is represented by the following equation:

$$\Delta = \operatorname{norm}\!\left(X_{\max}^{t} - X_{\max}^{t-1}\right) + \operatorname{norm}\!\left(X_{\min}^{t} - X_{\min}^{t-1}\right), \tag{4}$$

where $X_{\max}^{t}$ and $X_{\min}^{t}$ denote the maximum and minimum values of each dimension in the current iteration, $X_{\max}^{t-1}$ and $X_{\min}^{t-1}$ represent the maximum and minimum values of each dimension in the last iteration, respectively, and $\operatorname{norm}(\cdot)$ is a function that computes the vector 2-norm.
2.3. Balance Phase
The balance stage keeps the total number of plants constant; its main task is to produce the same number of new plants as were eliminated, replacing the worst plants in the population with new ones. To speed up the search and improve its accuracy, each resulting new plant is moved towards the global optimum by Equation (5), and the locations of these new plants are then evaluated:

$$X_i' = X_i + 2\,r\,(X_{gbest} - X_i), \tag{5}$$

where $X_i$ is the position of the $i$th new plant in the new range, $X_{gbest}$ is the global optimal position, $r$ is a $1 \times D$ vector of random numbers uniformly distributed in $[0, 1]$, $D$ is the dimension, and 2 is the step parameter.
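A minimal sketch of this balance-phase move; the per-dimension uniform random coefficient follows the reconstructed Equation (5) and is an assumption:

```python
import numpy as np

def move_toward_best(x_new, x_gbest, rng=None):
    """Move a newly generated plant toward the global optimum
    (Equation (5)), with a uniform random coefficient per dimension."""
    rng = rng or np.random.default_rng()
    r = rng.random(np.shape(x_new))
    return x_new + 2.0 * r * (x_gbest - x_new)
```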
3. Improved Island Algorithm
3.1. Gaussian Mutation
The theoretical basis of Gaussian mutation is the Gaussian distribution function: when performing the mutation, the original value is replaced by a random number drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$ [20]. It is clear from the properties of the normal distribution that Gaussian mutation concentrates its search on the local region near an individual, strengthening the local search capability of the algorithm and helping it step outside the confines of a local optimum. Xiang et al. [21] used Gaussian mutation to enlarge the search space and improve the global search capability of particle swarms. Yunshui et al. [22] performed Gaussian mutation on globally optimal individuals to enhance their ability to jump out of local extrema. Gaussian mutation adds a random perturbation term that obeys a Gaussian distribution to the original individual, as shown in the following equation:

$$x' = x \cdot \big(1 + N(0, 1)\big), \tag{6}$$

where $N(0, 1)$ is the standard Gaussian distribution.
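A minimal sketch of Equation (6), assuming the multiplicative perturbation form reconstructed above:

```python
import numpy as np

def gaussian_mutation(x, rng=None):
    """Gaussian mutation (Equation (6)): perturb each component of x
    with a standard-normal term, x * (1 + N(0, 1))."""
    rng = rng or np.random.default_rng()
    return x * (1.0 + rng.standard_normal(np.shape(x)))
```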
3.2. Island Algorithm-Based Gaussian Mutation (IAGM)
During iterations of the island algorithm, the new plants themselves have no mutation mechanism. If they are influenced by a local optimal solution, it is difficult for them to escape by themselves, and in subsequent iterations, the new plant individuals keep approaching the local optimal solution, which degrades the performance of the algorithm and reduces the diversity of the population. Introducing a mutation strategy not only effectively prevents the algorithm from falling into a local optimum but also increases the diversity of the population. To address the problem that the island algorithm easily falls into local optima, a Gaussian mutation strategy is introduced and used to jump out of the constraints of local optimization. The main idea is to set up a warning sign in the algorithm that records the global optimum and the corresponding position for each iteration. If the global optimum is the same three times in a row or changes very little (by less than a threshold $\varepsilon$), the algorithm is considered to have fallen into a local optimum. At this point, Gaussian mutation is performed on the position corresponding to the current global optimal value, and a new individual is generated using the following equation:

$$X_{new} = X_{gbest} \odot \big(1 + \operatorname{Gaussian}(1, D)\big), \tag{7}$$

where $X_{gbest}$ is the global optimal position of the current iteration and $\operatorname{Gaussian}(1, D)$ generates random numbers that follow a Gaussian distribution, arranged in 1 row and $D$ columns. The fitness value of the new individual is then calculated and compared to that of the original globally optimal individual. If it is better than the global best individual, the location information of the two is exchanged and the flag is set to 1. Otherwise, the new individual is discarded and the flag is set to 0. In addition, to enhance the local search and increase the diversity of the population, a Gaussian perturbation is applied to the new plants produced in the next iteration, and Equation (8) is used to move the plants towards the global optimum:

$$X_i' = X_i + \operatorname{Gaussian}(1, D) \odot (X_{gbest} - X_i), \tag{8}$$

where $X_i$ is the position of the $i$th new plant in the new range, $X_{gbest}$ is the global optimal position, $D$ is the dimension, and $\operatorname{Gaussian}(1, D)$ generates random numbers that follow a Gaussian distribution, arranged in 1 row and $D$ columns.
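A minimal sketch of the stagnation test and the two Gaussian operators (Equations (7) and (8)), under the reconstructions above:

```python
import numpy as np

def stagnated(warning_sign, eps):
    """True if the recorded global optimum changed by less than eps over
    the last three iterations (the stagnation test described above)."""
    if len(warning_sign) < 4:
        return False
    recent = warning_sign[-4:]
    return all(abs(recent[i + 1] - recent[i]) < eps for i in range(3))

def mutate_global_best(x_gbest, rng=None):
    """Equation (7): Gaussian mutation of the current global best position."""
    rng = rng or np.random.default_rng()
    return x_gbest * (1.0 + rng.standard_normal(np.shape(x_gbest)))

def move_with_gaussian(x_new, x_gbest, rng=None):
    """Equation (8): move a new plant toward the global best with a Gaussian
    step instead of the uniform step of Equation (5)."""
    rng = rng or np.random.default_rng()
    return x_new + rng.standard_normal(np.shape(x_new)) * (x_gbest - x_new)
```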
3.3. Algorithm Steps
The steps of the island algorithm based on Gaussian mutation are as follows (a compact code sketch follows the list).

(1) Step 1: Initialize the island algorithm with a random set of solutions: population size $N$, dimension $D$, maximum number of iterations $T$, maximum number of evaluations, maximum number of eliminations $E_{\max}$, minimum number of eliminations $E_{\min}$, island range $[X_{\min}, X_{\max}]$, a warning token $W$ that stores the optimal fitness value at the end of each iteration, the amount of change $\varepsilon$, and a flag recording whether Gaussian mutation is used.
(2) Step 2: Calculate the fitness value of each plant and rank them. Select the current globally optimal fitness value and its corresponding position, and record the globally optimal fitness value in $W$.
(3) Step 3: Enter the iterative process. Elimination phase: generate the number of eliminations according to Equation (1). Sea level rise phase: the sea level rise generates a new range, which is extended with Equations (2) and (3); the amount of range change is then generated according to Equation (4). Balance phase: produce the same number of new plants as were eliminated in the new range, replacing the worst plants in the population with new plants; move them according to Equation (5) if the flag equals 1, otherwise according to Equation (8). Calculate the fitness value of each new plant and, if it is better than the optimal plant, exchange the location information of the two. Finally, rank all plants and record the global optimum in the warning sign $W$.
(4) Step 4: If the change in the fitness value of the globally optimal plant over three consecutive iterations is less than the amount of change $\varepsilon$, execute step 5; otherwise, execute step 6.
(5) Step 5: Generate a new plant by applying a Gaussian mutation to the position of the current global optimum using Equation (7) and calculate its fitness value. If it is better than the current global optimum, exchange them; otherwise, discard the new plant and set the flag to 0.
(6) Step 6: Determine whether the algorithm has reached the termination condition. If so, output the result and end the algorithm. Otherwise, return to step 3 and proceed to the next iteration.
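To make the control flow concrete, the following Python sketch ties together the helper sketches above (reconstructed Equations (1), (4), (5), (7), and (8)). The initialization range of $[-1, 1]$ and the elimination bounds are assumed example values; this is an illustration, not the authors' implementation:

```python
import numpy as np

def iagm(fitness, dim, n_plants=100, n_iter=200, eps=1e-6, rng=None):
    """Compact sketch of the IAGM main loop (Steps 1-6 above), built on
    num_eliminations, move_toward_best, move_with_gaussian,
    mutate_global_best, and stagnated from the earlier sketches."""
    rng = rng or np.random.default_rng()
    plants = rng.uniform(-1.0, 1.0, (n_plants, dim))       # Step 1
    fits = np.array([fitness(p) for p in plants])          # Step 2
    best, best_fit = plants[fits.argmin()].copy(), fits.min()
    warning_sign = [best_fit]
    flag, delta = 1, 0.0
    e_max, e_min = n_plants - 2, 1                          # example bounds
    lo, hi = plants.min(axis=0), plants.max(axis=0)
    for _ in range(n_iter):                                 # Step 3
        k = num_eliminations(delta, e_max, e_min)           # Equation (1)
        order = np.argsort(fits)
        survivors = plants[order[: n_plants - k]]
        new_lo, new_hi = survivors.min(axis=0), survivors.max(axis=0)
        delta = (np.linalg.norm(new_hi - hi)
                 + np.linalg.norm(new_lo - lo))             # Equation (4)
        lo, hi = new_lo, new_hi
        new = rng.uniform(lo, hi, (k, dim))                 # balance phase
        new = (move_toward_best(new, best, rng) if flag
               else move_with_gaussian(new, best, rng))     # Eq. (5) / (8)
        plants = np.vstack([survivors, new])
        fits = np.array([fitness(p) for p in plants])
        if fits.min() < best_fit:
            best, best_fit = plants[fits.argmin()].copy(), fits.min()
        warning_sign.append(best_fit)
        if stagnated(warning_sign, eps):                    # Steps 4-5
            cand = mutate_global_best(best, rng)            # Equation (7)
            cand_fit = fitness(cand)
            if cand_fit < best_fit:
                best, best_fit, flag = cand, cand_fit, 1
            else:
                flag = 0
    return best, best_fit                                   # Step 6
```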
4. Feature Selection Method Based on IAGM Algorithm
4.1. Plant Codes
As the island algorithm acts on a continuous space while the features of the data are discrete, certain transformations are needed to allow the IAGM algorithm to perform discrete feature selection. In this paper, the sigmoid function is used to binary-code the plant locations. The range of each individual position is bounded in the algorithm, so a plant position $X$ is composed of random numbers within that range. Each dimension $x_d$ of the plant location is then mapped into $(0, 1)$ by the sigmoid function, that is, $S(x_d) = 1/(1 + e^{-x_d})$. Each dimension of the mapped position then represents one feature. A threshold of 0.5 divides the mapped values into two categories: when $S(x_d) > 0.5$, set $b_d = 1$ to indicate that the feature is selected; otherwise, let $b_d = 0$, which means that the feature is not selected. In this way, each plant position is represented by a binary string.
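A minimal sketch of this sigmoid-based binarization:

```python
import numpy as np

def binarize_position(x):
    """Map a continuous plant position to a binary feature mask via the
    sigmoid transfer function; a feature is selected when S(x_d) > 0.5."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x)))
    return (s > 0.5).astype(int)
```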
4.2. Multiclassification Method of Support Vector Machine (SVM)
Because the support vector machine classifier itself can only perform binary classification, its scope of use is largely limited. The main idea behind the SVM multiclassification scheme used in this paper is as follows: obtain the number of classes $k$ in the dataset and train a binary classifier for every pair of classes, giving a total of $k(k-1)/2$ binary classifiers. Each unclassified sample is predicted by all of the binary classifiers, and a voting method finally decides which class it belongs to.
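A minimal sketch of this one-versus-one scheme using scikit-learn; the iris data is only a stand-in example (scikit-learn's SVC also implements one-versus-one internally):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# One-vs-one multiclass SVM: with k classes, k*(k-1)/2 binary SVMs are
# trained (one per class pair) and a vote decides the final label.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = OneVsOneClassifier(SVC(kernel="rbf")).fit(X_train, y_train)
print(clf.score(X_test, y_test))
```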
4.3. Adaptability Function
In the island algorithm, each individual plant can be viewed as a solution: the fewer features a solution uses and the higher its classification accuracy, the better the solution. In this paper, the classification accuracy obtained by the support vector machine classifier and the number of features in the solution are used to construct the fitness function, which balances the number of selected features against the importance of classification accuracy. The following fitness function is chosen:

$$\mathrm{Fitness} = \alpha \cdot \gamma + \beta \cdot \frac{|R|}{|N|}, \tag{9}$$

where $\gamma$ is the classification error rate obtained by the given classifier, $|R|$ is the number of features selected by the current solution, $|N|$ is the total number of features of the current solution, and $\alpha$ and $\beta$ are parameters: $\alpha$ weights the importance of the classification error rate and $\beta$ weights the importance of the number of selected features, with $\beta = 1 - \alpha$.
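A minimal sketch of Equation (9); alpha = 0.99 is an assumed example value, since only the relation $\beta = 1 - \alpha$ is fixed here:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Fitness of a solution (Equation (9)): a weighted sum of the
    classifier error rate and the fraction of selected features.
    Lower values are better; alpha = 0.99 is an assumed example weight."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```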
4.4. Feature Selection Algorithm
In this paper, we search for the optimal subset of features using the island algorithm with Gaussian mutation. Binary encoding of plant locations uses the sigmoid function, where 1 means a feature is selected and 0 means it is not. The SVM classifier is used to construct the fitness function and evaluate the merit of a feature subset. The main steps of the algorithm are as follows (a sketch of the per-solution evaluation follows the list).

(1) Step 1: Initialization: population size $N$, dimensionality $D$, maximum number of iterations $T$, maximum number of evaluations, maximum number of eliminations $E_{\max}$, minimum number of eliminations $E_{\min}$, amount of change $\varepsilon$, flag, amount of range change $\Delta$, and random initialization of a set of solutions.
(2) Step 2: Read the data and divide the dataset into a training set and a test set.
(3) Step 3: Binary-code each solution with the sigmoid function and count the number of features selected by that solution.
(4) Step 4: Using the features selected in step 3, extract the corresponding data from the original dataset and feed them into the classifier for prediction while calculating the classification error rate.
(5) Step 5: Substitute the number of selected features from step 3 and the classification error rate from step 4 into the fitness function. Calculate and rank the fitness value of each solution, and put the fitness value of the globally optimal solution into the warning sign $W$.
(6) Step 6: Search for the optimal feature subset with the improved island algorithm (IAGM).
(7) Step 7: If the algorithm satisfies the termination condition, output the optimal solution, i.e., the optimal feature subset; otherwise, continue the iteration.
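As an illustration, steps 3 to 5 can be combined into a single per-solution evaluation routine. The sketch below scores the selected features with scikit-learn's SVC under cross-validation; alpha = 0.99 is an assumed example weight:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def evaluate_solution(x, X, y, alpha=0.99, cv=10):
    """Evaluate one plant position: binarize it into a feature mask,
    score the selected features with an SVM under cross-validation, and
    return the fitness of Equation (9) (lower is better)."""
    mask = 1.0 / (1.0 + np.exp(-np.asarray(x))) > 0.5
    if not mask.any():                 # no feature selected: worst fitness
        return 1.0
    acc = cross_val_score(SVC(), X[:, mask], y, cv=cv).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size
```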
5. Experiments and Analysis of Experimental Results
5.1. Experimentation and Analysis of the IAGM Algorithm
5.1.1. Experimental Design for the Performance of the IAGM Algorithm
In order to verify the performance of the improved island algorithm, seven typical benchmark test functions were selected; their specific information is shown in Table 1. $f_1$ is a single-peaked function whose main purpose is to test the optimization ability of the algorithm. $f_2$ is a multimodal function whose main role is to test the global search capability of the algorithm. $f_3$ and $f_4$ are single-peaked functions whose main purpose is to test the convergence ability of the algorithm. $f_5$ and $f_6$ have many local extrema; their main purpose is to test the ability of the algorithm to jump out of local optima. $f_7$ is a discontinuous step function. The particle swarm optimization (PSO) algorithm and the original island algorithm (IA) were selected for comparison with the island algorithm based on Gaussian mutation (IAGM). The three algorithms were run 30 times independently on 30 and 50 dimensions, respectively. The best error value, the worst error value, the mean, and the standard deviation of the 30 runs are used to measure each algorithm's merit; in the tables, boldface indicates the optimal value. All three algorithms use the same parameter settings: population size 100 and 200 iterations. The algorithms were all run in Matlab R2016a on Windows 10 with an i7-7700HQ 2.8 GHz CPU and 16 GB of RAM.
5.1.2. Experimental Results and Analysis
The three algorithms were run 30 times independently on search dimensions 30 and 50, respectively. The results of the 30 and 50 dimensional runs are shown in Tables 2 and 3.
As can be seen from the data in Table 2, on one function the worst value of the PSO algorithm and the average value of the IA algorithm are the best, but the optimal value and standard deviation of the IAGM algorithm are the best. On three of the functions, the best value, worst value, mean, and standard deviation of the IAGM algorithm are all optimal. On another function, the IA algorithm is optimal in terms of worst value and standard deviation, while the optimal and average values of the IAGM algorithm are the best. On a further function, the PSO algorithm is optimal in terms of the worst value and standard deviation, but the IAGM algorithm leads in both the best value and the mean. On the remaining function, the PSO algorithm has the best worst value, but the IAGM algorithm reaches the theoretical optimum of the function at its best value, and its mean and standard deviation are also optimal. Analysing and summarizing the results of the seven benchmark functions on 30 dimensions shows that the IAGM algorithm has a clear advantage over the PSO and IA algorithms.
It can be concluded from the data in Table 3 that the results for 50 dimensions are very similar to those for 30 dimensions. On one function, the worst value, average value, and standard deviation of the PSO algorithm are the best, and the IAGM algorithm is best only in terms of the optimal value. On another, the best value of the IA algorithm is the best, while the IAGM algorithm has the best worst value, mean, and standard deviation. On a third, the IA algorithm's worst value and standard deviation are the best, while the optimal and average values of the IAGM algorithm are the best. On three of the functions, the IAGM algorithm's best value, worst value, mean, and standard deviation are ahead of the other two algorithms across the board. On the remaining function, the IA algorithm has the best optimal value, while the IAGM algorithm is ahead of the other two algorithms in terms of the worst value, the mean, and the standard deviation. Although the IAGM algorithm is inferior to the PSO algorithm on one function, it has a definite lead on the others. The experimental results in Tables 2 and 3 show that overall the IAGM algorithm has a clear advantage over the IA and PSO algorithms. This validates the optimization capability of the IAGM algorithm and shows that the improvements to the IA algorithm are effective and feasible.
5.2. Experimentation and Analysis of IAGMFS
5.2.1. Introduction to the Dataset
Six UCI datasets are used in this section to test the performance of the IAGMFS algorithm [23]. The specific information for these six datasets (name, number of samples, and number of features) is shown in Table 4.
The classification algorithms used in this paper employ tenfold cross-validation [24] to determine classification accuracy. In tenfold cross-validation, each dataset is randomly divided into 10 subsets with an equal number of samples; in turn, one subset is taken out as the test set and the other nine as the training set, so that over 10 cycles each subset serves as the test set exactly once. The algorithm parameters were kept fixed across all experiments.
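A minimal illustration of this tenfold protocol using scikit-learn; the bundled wine data here merely stands in for the UCI datasets of Table 4:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Tenfold cross-validation: each fold serves as the test set exactly
# once, and the ten accuracies are averaged.
X, y = load_wine(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(SVC(), X, y, cv=cv)
print(scores.mean())
```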
5.2.2. Experimental Design of the IAGMFS Algorithm
The value of the fitness function is the key to evaluating the goodness of a plant location, and a suitable fitness function is important for the feature selection algorithm itself. The fitness function constructed in this paper relies on the classification accuracy of each solution, obtained by the classifier, and the number of features selected in the solution. To select an appropriate classifier for constructing the fitness function, the following experiment was designed. Three different classifiers, the KNN classifier, the naive Bayes classifier, and the support vector machine classifier, were used to construct the fitness function, giving a feature selection algorithm based on the KNN classifier (KIAGM), a feature selection algorithm based on the naive Bayes classifier (NIAGM), and the feature selection algorithm proposed in this paper (IAGMFS). The three algorithms were run 10 times independently on each of the above datasets, and the average fitness value, average classification accuracy, and average number of features were used as evaluation criteria. In the table, — indicates that an item is not available (the NIAGM algorithm was unable to produce results on the zoo dataset). The experimental results are shown in Table 5.
As can be seen from Table 5, on the breast cancer and vehicle datasets, the IAGMFS algorithm was ahead of the other two algorithms in terms of average classification accuracy, average fitness value, and average number of features. On the spectfheart and wine datasets, the IAGMFS algorithm and the NIAGM algorithm have the same and better average number of features than the KIAGM algorithm, but the IAGMFS algorithm significantly outperforms the NIAGM algorithm in terms of average classification accuracy and average fitness value. On the tic-tac-toe dataset, the NIAGM algorithm outperformed the other two algorithms in terms of average fitness values, while the IAGMFS algorithm led in both average classification accuracy and average number of features. On the zoo dataset, the KIAGM algorithm was ahead of the IAGMFS algorithm in terms of average classification accuracy, but the average fitness value and average number of features were not as good as the IAGMFS algorithm. Overall, the IAGMFS algorithm outperformed the other two in terms of classification accuracy and the number of features selected and also validated that the support vector machine classifier is more suitable as a classifier in the constructor.
To verify the feasibility of the IAGMFS algorithm, the following experiment was designed: the algorithm was run independently on each of the above six datasets, the number of iterations and the corresponding fitness values were recorded, and the fitness convergence curves were plotted. The results of the experiment are shown in Figure 1.

As can be seen in Figure 1, on the vehicle dataset the algorithm starts to converge at around 60 iterations; on the spectfheart dataset, at about 40 iterations; on the breast cancer dataset, at about 20 iterations; on the tic-tac-toe dataset, at about 20 iterations; on the wine dataset, at about 75 iterations; and on the zoo dataset, at about 65 iterations. Overall, the fitness value decreases with the number of iterations and converges after a certain number of iterations on all six datasets. This verifies that the IAGMFS algorithm is valid and feasible.
5.2.3. IAGMFS Algorithm Compared to Other Feature Selection Algorithms
Based on the datasets in Table 4, experiments were conducted using the PSORSFS [25] algorithm, the FSARSR [26] algorithm, the FSRSWOA algorithm, and the IAGMFS algorithm proposed in this paper. The population size in the experiments was 30, and the number of iterations was 100. The four algorithms were run 10 times independently on each dataset, and the average and optimal values of the 10 results were used as evaluation metrics. The results are shown in Table 6; the results for the comparison algorithms are taken from the literature [23].
The following can be seen from Table 6. On the four datasets breast cancer, spectfheart, vehicle, and tic-tac-toe, the IAGMFS algorithm outperformed the other three algorithms in terms of both average classification accuracy and optimal value. On the wine dataset, FSRSWOA leads the other three algorithms in optimal value but is not as good as the IAGMFS algorithm in average classification accuracy. On the zoo dataset, FSRSWOA has the best average classification accuracy and FSARSR the best optimal value. Although the IAGMFS algorithm is slightly inferior to the FSRSWOA and FSARSR algorithms in average classification accuracy and optimal value on the zoo dataset, it is ahead of the other algorithms in average classification accuracy on all other datasets. Overall, the IAGMFS algorithm leads the other three algorithms in classification accuracy, indicating that it has a clear advantage in feature selection.
6. Applications in Image Classification
6.1. Image Classification
From the analysis of the above experimental results, it can be concluded that the IAGMFS algorithm proposed in this paper has clear advantages in feature selection. Image classification consists mainly of feature extraction, feature selection, and classification. Feature extraction can target image texture features, colour features, or shape features [27]. Feature selection can effectively remove redundant features, reducing the computational complexity while improving classification accuracy. However, feature selection generates a large collection of feature subsets, from which the optimal subset must be found. Swarm intelligence optimization algorithms have good global search capability and are therefore widely used in image classification; the particle swarm optimization (PSO) algorithm, a classical intelligent optimization algorithm, is one example. In this paper, the IAGMFS algorithm is applied to image classification: features are first extracted from each image, the IAGMFS algorithm is then used to filter the extracted features and eliminate the redundant ones, and classification is finally performed.
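As a sketch of the extraction step, the LBP texture features used in the experiments below could be computed as follows; the neighbourhood parameters and the uniform variant are assumed example choices:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, p=8, r=1.0):
    """Extract an LBP texture histogram from a grayscale image.
    The uniform LBP with p neighbours yields p + 2 distinct codes,
    which are binned into a normalized histogram feature vector."""
    lbp = local_binary_pattern(gray_image, P=p, R=r, method="uniform")
    hist, _ = np.histogram(lbp.ravel(), bins=p + 2,
                           range=(0, p + 2), density=True)
    return hist
```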
6.2. Experiments and Analysis of Experimental Results
In order to verify the superiority of the IAGMFS algorithm in image classification, the particle swarm optimization algorithm, which is widely used in image classification, serves as the comparison in this paper. The LBP algorithm was used to extract the texture features of the images. Three approaches were compared: the N-BPSO [28] algorithm, the IAGMFS algorithm, and using all features. Each method was run 10 times independently, and the classification accuracy and the number of selected features were recorded each time. The images used in the experiments come from Caltech101 and are all flat still images, as shown in Table 7. Experiments were carried out on Windows 10 with Matlab R2016a, an i7-7700HQ 2.8 GHz CPU, and 16 GB of RAM. The results of the experiment are shown in Table 8.
As can be seen from the data in Table 8, in every category, the IAGMFS algorithm is ahead of the other two approaches in average classification accuracy. In the dog category, the average classification accuracy of the IAGMFS algorithm is higher than that of the N-BPSO algorithm, although its average number of features is not as good as that of the N-BPSO algorithm. Overall, the IAGMFS algorithm improves classification accuracy while eliminating redundant features, showing that it has clear advantages in image classification.
7. Conclusion
In order to enhance the optimization performance of the island algorithm, this paper introduces a Gaussian mutation strategy into the algorithm. A warning sign is set up to record the global optimum after each iteration, and when the global optimum changes only slightly over three consecutive iterations, a Gaussian mutation is applied to the current optimal position to prevent the algorithm from falling into a local optimum. In addition, new plants that meet certain conditions undergo Gaussian perturbation to increase the diversity of the population. The improved island algorithm is combined with a support vector machine classifier to give a feature selection algorithm based on the Gaussian-mutation island algorithm. The algorithm evaluates the goodness of each plant location by the classification accuracy of the corresponding solution, obtained with the support vector machine classifier, and the number of features selected in the solution, and it finds the optimal feature subset with the improved island algorithm. The performance of the algorithm is verified by a two-part experiment. The experimental results show that the IAGMFS algorithm can find feature subsets with high classification accuracy using a small number of features. To adapt to the needs of big data, the next step is to introduce parallelization to improve the computation speed.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Conflicts of Interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Acknowledgments
This paper was supported by the Henan Provincial Science and Technology Research Project (212102210546).