Abstract
Environmental sound recognition has become an active topic in the domain of audio recognition. Selecting optimal feature subsets and thereby improving classification accuracy remains an open problem. Ensemble learning has recently proven to be an effective way to improve classification accuracy in feature selection. In this paper, experiments were performed on an environmental sound dataset using an improved method based on the constraint score (csFs) and a multimodel ensemble feature selection method (MmEnFs). The experimental results show that when enough attributes are selected, the improved method outperforms other feature selection methods, and the ensemble feature selection method, which combines the other methods, achieves the best performance in most cases.
1. Introduction
With the development of artificial intelligence, Environmental Sound Recognition (ESR) has become a focus in the domain of audio recognition in recent years.
There are many environmental sounds in our daily life, such as birdcall, wind, and thunder. By recognizing these sounds, a large amount of information can be obtained to understand our environment. Many applications can benefit from ESR, such as scene recognition, event detection, and surveillance systems [1]. By identifying the sounds occurring in a scene, scene recognition aims to judge whether the scene corresponds to a specific place such as a home, a square, or a field [2]. Event detection identifies the sounds occurring in daily life, such as talking, laughing, or a police siren [3]. Surveillance systems monitor abnormal events by detecting dangerous sounds such as gunshots, screams, and alarms [4].

Traditionally, in the field of ESR, more attention has been paid to feature extraction, for example, Mel Frequency Cepstral Coefficients (MFCC) and Code Excited Linear Prediction (CELP). Chu et al. adopted a set of features with the Matching Pursuit (MP) technique; the main disadvantage of MP was its huge computational complexity [2]. Tsau et al. used different CELP-based feature sets with a Bayesian network and achieved better performance for ESR [5]. Zhang et al. proposed the aggregation of multiple classifiers for environmental audio classification and evidently improved the prediction accuracy [6].

It is difficult to find the optimal feature subset for ESR. Most researchers concentrate on feature extraction and single feature selection methods; few focus on subspace-based feature selection with ensemble learning. The idea of ensemble feature selection was proposed by Saeys et al. [7]. Ensemble feature selection [8] has many advantages. In the past, a single feature selection method was used to select the optimal subsets and then to evaluate the performance, but for the same dataset, different methods generate different results. By combining the outputs of several methods, ensemble feature selection may yield a more accurate result [9]. In this work, in order to validate whether the ensemble method is superior to the individual feature selection methods in terms of accuracy, experiments were performed on an environmental sound dataset using several approaches in Weka [10], i.e., Correlation [11], GainRatio [12], InfoGain [13], OneR [14], ReliefF [15], and SymmetricalUncert [16], together with two methods proposed by us, namely, csFs and MmEnFs.
This paper is organized into five sections. Section 1 is the introduction and Section 2 provides the background. Section 3 briefly introduces the constraint score, csFs, and the process of MmEnFs. Subsequently, the experimental methods are described in detail and the quality of the results is assessed in Section 4. Finally, we summarize the conclusions and contributions of this work in Section 5.
2. Background
Feature selection is an important preprocessing step in many fields, such as data mining, pattern recognition, and machine learning [17, 18]. For large datasets, it is the process of selecting a small optimal subset of features to enhance the performance of a specific model. Generally, one effective way to reduce data dimensionality is to remove irrelevant and redundant features [19, 20]. Supervised feature selection can be divided into three groups: filters, wrappers, and embedded methods [17]. Filter methods evaluate the performance of features using the training data and are independent of any learning algorithm [21]. The evaluation criteria for features can be categorized into four types, namely, information, dependency, distance, and consistency [22]. Wrapper methods commonly use a specific learning algorithm to evaluate the features, and embedded methods perform feature evaluation using internal properties of the classification model [23]. With the development of feature selection techniques, many different learning algorithms have been proposed to achieve good performance in feature selection. In the process of feature selection, different algorithms select different subsets and therefore yield different results.
In this paper, six existing methods in Weka are mainly used to evaluate the performance of features. These methods are briefly introduced below. Correlation evaluates the worth of an attribute by measuring its correlation with the class, and GainRatio evaluates the worth of an attribute by measuring the gain ratio with respect to the class. InfoGain assesses the worth of an attribute by measuring the information gain with respect to the class, while OneR uses the minimum-error attribute for prediction [24]. ReliefF estimates the worth of an attribute by repeatedly sampling an instance and considering the value of the given attribute for the nearest instances of the same and of different classes [25]. SymmetricalUncert assesses the worth of an attribute by measuring the symmetrical uncertainty with respect to the class.
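To make two of these filter criteria concrete, the following sketch (our own illustration in Python, not Weka's implementation) computes the information gain and symmetrical uncertainty of a single discretized feature with respect to the class:

import numpy as np

def entropy(values):
    # Shannon entropy of a discrete variable (values as a numpy array).
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    # H(class) - H(class | feature), computed on discretized feature values.
    h_cond = 0.0
    for v in np.unique(feature):
        mask = feature == v
        h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

def symmetrical_uncertainty(feature, labels):
    # 2 * IG / (H(feature) + H(class)), normalized to [0, 1].
    return 2.0 * info_gain(feature, labels) / (entropy(feature) + entropy(labels))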
Ensemble techniques were originally introduced to improve the stability of feature selection techniques. In most cases, different feature subsets generate different optimal results for high-dimensional data with a small sample size [23]. Ensemble feature selection not only reduces the risk of selecting an unstable subset but also helps avoid the problem of local optima. Therefore, ensemble techniques are usually superior to a single model when unstable models are combined [26, 27].
There are two essential points in ensemble learning: the difference between the base learners and the diversity of their outputs. To the best of our knowledge, in the context of feature selection, ensemble learning can be categorized into two types: one changes the base learner to achieve the ensemble, and the other varies the data samples to achieve the ensemble [28, 29]. Both types of methods aim at obtaining a diverse set of feature selectors. There are two key steps in constructing a feature selection ensemble: the first creates a set of different feature selectors, each providing an output, and the second aggregates the outputs of the single selectors [30].
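As a minimal illustration of these two steps, the sketch below assumes each base selector is a callable that returns one score per feature (larger meaning better) and aggregates the individual rankings by their mean rank; this is only one possible aggregation rule.

import numpy as np

def ensemble_rank(score_functions, X, y):
    # score_functions: list of callables, each returning one score per feature.
    ranks = []
    for score in score_functions:
        s = np.asarray(score(X, y))
        # rank of each feature when sorted by decreasing score (0 = best).
        ranks.append(np.argsort(np.argsort(-s)))
    # aggregate the single rankings by averaging; a smaller mean rank is a better feature.
    return np.mean(ranks, axis=0)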
3. Related Works
Constraint score [31] was first proposed by Zhang et al. in 2007. Prior knowledge about the data can be obtained in two different ways: the first is class labels, and the second is pairwise constraints. In the past, in order to estimate the class of a data sample precisely, we needed to know the details of the class labels. Fortunately, pairwise constraints have changed this situation: they use pairs of data samples to indicate whether the samples belong to the same class (must-link) or to different classes (cannot-link) [32, 33].
The constraint score algorithm makes use of supervised information to select the most representative feature subsets. In this process, the key step is how to generate C and M. Given a dataset X = \{x_1, x_2, \dots, x_m\} with n features, two constraint subsets are built: (x_i, x_j) \in M if x_i and x_j belong to the same class, and (x_i, x_j) \in C if x_i and x_j belong to different classes. The supervised information in C and M is then used to score the n features of the initial dataset X. Equation (1) is used to evaluate the performance of the r-th feature:

$$C_r = \sum_{(x_i, x_j) \in M} (f_{ri} - f_{rj})^2 - \lambda \sum_{(x_i, x_j) \in C} (f_{ri} - f_{rj})^2 \tag{1}$$

Here, f_{ri} represents the r-th feature of the i-th sample x_i, i = 1, \dots, m; r = 1, \dots, n. In order to select features with superior constraint-preserving ability, note that the distance between two samples in the same class is usually shorter than the distance between samples in different classes, so features with smaller scores are preferred. The regularization coefficient \lambda is set to 0.1 to balance the contributions of the two constraint sets.
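A minimal sketch of Equation (1) in Python, assuming the must-link and cannot-link sets are supplied as lists of sample-index pairs, could look as follows:

import numpy as np

def constraint_score(X, must_link, cannot_link, lam=0.1):
    # X: (m, n) data matrix; returns one score per feature (smaller is better).
    scores = np.zeros(X.shape[1])
    for r in range(X.shape[1]):
        same = sum((X[i, r] - X[j, r]) ** 2 for i, j in must_link)
        diff = sum((X[i, r] - X[j, r]) ** 2 for i, j in cannot_link)
        scores[r] = same - lam * diff   # regularization coefficient lambda = 0.1
    return scores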
3.1. The Proposed Method
The proposed method, csFs, is an improvement of the constraint score for feature selection and is used to perform the experiments in this paper. At first, we select ten percent of the attribute columns as the initial subset. Then we employ a greedy method to add one attribute column at a time, computing its value in a forward sequence. Finally, from the number of selected attribute columns, we can identify the range in which the evaluation peaks.
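The procedure can be sketched as follows; this is only our reading of the description, with evaluate standing in as a hypothetical placeholder for the constraint-score-based criterion, oriented here so that larger values are better (for instance, a negated constraint score):

import random

def csfs(columns, evaluate, seed=None):
    rng = random.Random(seed)
    n_init = max(1, len(columns) // 10)            # ten percent of the attribute columns
    selected = rng.sample(list(columns), n_init)   # random initial subset
    remaining = [c for c in columns if c not in selected]
    history = [(list(selected), evaluate(selected))]
    for col in remaining:                          # forward sequence, one column per step
        selected.append(col)
        history.append((list(selected), evaluate(selected)))
    # locate the subset at which the evaluation peaks
    best_subset, best_value = max(history, key=lambda t: t[1])
    return best_subset, best_value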
3.2. Ensemble Method
Different from the methods mentioned above, in this paper we also try another approach called multimodel ensemble feature selection. Firstly, several different kinds of feature selection methods are used to evaluate the training data, and each generates a subset. Secondly, each subset acquired in this way is used to train a separate model. Finally, we aggregate the results of the different models. Figure 1 illustrates the process of multimodel ensemble feature selection, and the MmEnFs algorithm is shown below (Algorithm 1).

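Algorithm 1 is not reproduced here; the sketch below only illustrates the workflow of Figure 1 under our assumptions: each selector is a callable returning per-feature scores (larger is better), a decision tree is trained on the top k features chosen by each selector, and the predictions of the single models are combined by majority vote over integer-encoded class labels.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def mmenfs_predict(selectors, X_train, y_train, X_test, k):
    votes = []
    for score in selectors:
        top = np.argsort(-np.asarray(score(X_train, y_train)))[:k]   # k best features per selector
        model = DecisionTreeClassifier().fit(X_train[:, top], y_train)
        votes.append(model.predict(X_test[:, top]))
    votes = np.array(votes)
    # majority vote across the single models (labels assumed to be nonnegative integers).
    return np.array([np.bincount(col).argmax() for col in votes.T])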
4. Experiments
4.1. Data of Experiments
In this section, an environmental sound dataset is used as the experimental data. It is made up of five classes and was collected in the wild. It contains sounds of birds, wind, rain, frogs, and thunder, described by 23 attributes in total: 13 feature dimensions are MFCCs, while the other 10 variables are CELP features. The details of the dataset are shown in Table 1.
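The paper does not state how the MFCCs were computed; as one plausible route, the 13 MFCC dimensions per recording could be obtained with librosa as sketched below (the CELP features are not reproduced here):

import librosa
import numpy as np

def mfcc_features(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)                   # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)                          # one 13-dimensional vector per clip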
4.2. Methods of Experiments
In these experiments, seven feature selection methods tested on the data are taken from the Weka platform, an open-source data mining tool, with the default settings. The decision tree is selected as the base classifier, and the whole experimental process was run in Matlab. Using the bootstrap [34], the dataset was divided into 75% training data and 25% testing data, a ratio of approximately 3:1. The seven feature selection methods were then used to select different ratios of subsets for the experiments (Tables 2, 4, 5, and 6). If the number of selected attributes is not an integer, it is rounded down. Because of the instability of the constraint score, we repeated the experiment ten times and report the average value. MmEnFs1 is an ensemble of the six existing methods, namely, Correlation, GainRatio, InfoGain, OneR, ReliefF, and SymmetricalUncert, while MmEnFs2 is an aggregation of seven methods: Correlation, GainRatio, InfoGain, OneR, ReliefF, SymmetricalUncert, and csFs. In addition, as a baseline, we built a decision tree classification model with the original data (Table 3); its accuracy is 0.686. Here, k denotes the ratio of selected features.
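The evaluation loop can be sketched as follows under stated assumptions: a simple 75/25 random split stands in for the bootstrap procedure of [34], a decision tree is the classifier, floor(k * n) features are kept for a ratio k, and score is a hypothetical per-feature scoring callable (larger is better).

import math
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def evaluate_ratio(score, X, y, k, seed=0):
    rng = np.random.default_rng(seed)
    m = len(y)
    train = rng.choice(m, size=int(0.75 * m), replace=False)
    test = np.setdiff1d(np.arange(m), train)
    n_feat = math.floor(k * X.shape[1])                    # round down, as in the paper
    top = np.argsort(-np.asarray(score(X[train], y[train])))[:n_feat]
    model = DecisionTreeClassifier().fit(X[train][:, top], y[train])
    return accuracy_score(y[test], model.predict(X[test][:, top]))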
4.3. Result Analysis
According to Table 2 and Figure 2, much useful information can be obtained. Table 2 shows the accuracy of the methods on the environmental sound dataset at different ratios, and Figure 2 shows the variation tendencies of the nine methods on this dataset. In terms of csFs, when 1/2 of the subsets are selected, it is inferior to most of the other methods and below the decision tree baseline. However, as the selected subsets increase, it achieves good performance in the interval ranging from 2/3 to 3/4 of the features, compared to the other six existing methods in Weka. After that, csFs decreases slightly and is inferior to OneR and InfoGain, and then it reaches its peak value at the end. As for MmEnFs, the performances of MmEnFs1 and MmEnFs2 are both significantly better than those of the other seven methods. When 13 features are selected, each reaches its peak value.

Tables 3, 4, 5, and 6 show the variation of accuracy on each class at different ratios. In terms of the bird class, csFs reaches its peak value when 16 features are selected, and MmEnFs1 and MmEnFs2 achieve the best performance when k = 1/2. For the wind class, the ensemble methods are superior to the other methods when k = 1/2 and 2/3, and csFs has better accuracy when k = 2/3, while at k = 3/4, MmEnFs1, MmEnFs2, and csFs are worse than Correlation and GainRatio. As for rain, the ensemble methods are better than the other methods throughout; however, at k = 2/3, MmEnFs2 is worse than MmEnFs1, and csFs reaches its best performance at k = 3/4. For the frog class, GainRatio has the best performance at k = 1/2, MmEnFs2 and csFs reach their peak values at k = 2/3, and MmEnFs1 gradually gets worse as the ratio increases. With respect to thunder, the ensemble methods achieve better accuracy when k = 1/2 and 3/4, although at k = 3/4 they are worse than InfoGain and SymmetricalUncert. Compared to the decision tree with the original data, MmEnFs2 shows an evident improvement throughout.
Obviously, the MmEnFs methods are superior to the single methods. MmEnFs1 is not as good as MmEnFs2 in terms of accuracy, and it is noteworthy that csFs plays an important role in MmEnFs2: with csFs included, the accuracy of MmEnFs2 improves distinctly. Meanwhile, this result indicates that the choice of feature selection methods being aggregated has a great impact on the performance. MmEnFs thus achieves the aim of selecting fewer features while gaining better accuracy on the environmental sound dataset.
5. Conclusion
In this paper, we use csFs, which is based on the constraint score, to evaluate the performance of different feature subsets on an environmental sound dataset. Compared to the six existing feature selection methods, csFs achieves good quality. We then compare the results of single feature selection methods and MmEnFs by varying the ratios of feature subsets. Experiments show that, when enough subsets are selected, the different methods reach their best performance at different ratios, or at least maintain the baseline results, for both the ensemble and the other approaches. Obviously, in most cases, MmEnFs is superior to a single feature selection method in accuracy.
The advantage of csFs is that it selects the subspace randomly to achieve the assessment of feature subsets, but the result is not yet stable enough. As for stability, three indices, Pearson's correlation coefficient [35], Spearman's rank correlation coefficient [36], and the Tanimoto distance [37], have commonly been used in recent years to assess stability thoroughly. In the future, further research will focus on the stability of csFs to obtain better performance from the perspective of these three indices. By combining csFs with the ensemble method, we may make greater progress in the domain of environmental sound recognition.
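As a small illustration of how such stability could be measured, the sketch below computes the Tanimoto (Jaccard) similarity between two selected feature subsets and the Spearman correlation between two feature rankings; the Tanimoto distance is one minus the similarity.

from scipy.stats import spearmanr

def tanimoto_similarity(subset_a, subset_b):
    a, b = set(subset_a), set(subset_b)
    return len(a & b) / len(a | b)          # 1.0 means identical subsets

def ranking_stability(ranking_a, ranking_b):
    # Spearman's rank correlation between two feature rankings.
    return spearmanr(ranking_a, ranking_b).correlation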
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was financially supported by the National Natural Science Foundation of China under Grant No. 61462078.