Abstract

Emotion recognition plays a crucial role in human-robot emotional interaction applications, and the brain emotional learning (BEL) model is one of several emotion recognition methods; however, the learning rules of the original brain emotional learning model adapt poorly and do not perform well. In addition, existing facial emotion recognition methods do not achieve high accuracy and are not sufficiently practical for real-time applications. To address this problem, this paper introduces an improved model that merges an interval type-2 recurrent wavelet fuzzy system with a brain emotional learning network for emotion recognition. The proposed model takes advantage of both type-2 recurrent wavelet fuzzy theory and the brain emotional neural network. There are no rules initially; the structure and parameters of the model are then tuned online simultaneously by the gradient approach and a Lyapunov function. The system input data streams are imported directly into the neural network through a type-2 recurrent wavelet fuzzy inference system, and the results are then piped into the sensory and emotional channels, which jointly produce the final outputs of the network. The proposed model can reduce uncertainty in terms of vagueness by using type-2 recurrent wavelet fuzzy theory and removing noise samples. Finally, the superior performance of the proposed method is demonstrated by comparison with several emotion recognition methods on five emotion databases.

1. Introduction

Emotion recognition is one of the most effective ways to obtain human information and improve human-robot interaction; it includes body language emotion recognition [1], facial expression recognition [2], and speech emotion recognition [3]. It is therefore necessary for a robot to recognize human emotional states as accurately as possible.

One important part of providing effective and natural interaction between humans and computers is to enable computers to understand the emotional states expressed by human subjects. Rahul et al. [4] proposed a method that puts this theory into practical application. Cen et al. [5] introduced a new measure of authentic auditory emotion recognition and applied it to patients with schizophrenia. Bhandari and Pal [6] examined whether an explicit use of edges can help in emotion recognition from images using a convolutional neural network (CNN).

For speech emotion recognition, both speaker-dependent (SD) and speaker-independent (SI) settings have been introduced in the recognition process [7, 8], in which support vector machines [8], neural networks [9], deep convolutional neural networks, and deep belief networks have been used [10–13]. Facial expression recognition (FER) has been successfully applied in many scenarios, such as human-computer interaction systems and games: the machine interprets user emotions through expression recognition, which makes the algorithm more intelligent and humanized. Both single-label and multilabel learning paradigms have been used for facial expression recognition. Yang et al. [14] extended the conditional probability neural network to a fuzzy form to predict the emotions expressed in a picture and obtained better performance than the original conditional probability neural network. Several studies have focused on convolutional neural networks for FER [15–18]. Hu et al. [19] combined fuzzy rough set theory with a convolutional neural network for FER.

However, two problems remain. First, the neural networks mentioned above have high computational cost and complexity. To address this, the brain emotional learning (BEL) model was presented in [20, 21], which has lower computational cost and complexity than the other neural networks; however, the original BEL cannot achieve desirable performance for speech emotion recognition, so a fuzzy logic system was introduced into the BEL model to improve recognition accuracy. The second problem is that the fuzzy logic systems used in the above references are all type-1 systems, which cannot handle uncertainties flexibly. To solve this problem, we choose type-2 fuzzy logic as the better scheme, which has been applied in various applications [22, 23].

A fuzzy wavelet neural network combines neural networks and wavelet theory with fuzzy logic. Several researchers have used fuzzy wavelet neural networks to solve signal processing and control problems. In [24], a wavelet network model of a fuzzy inference system is proposed. A fuzzy wavelet neural network structure constructed on the basis of a set of fuzzy rules is proposed in [25] and used for approximating nonlinear functions. Other wavelet-based approaches include fuzzy wavelet neural network structures developed for system identification and control [26] and for the prediction of electricity consumption [27]. In [28], the combination of wavelet technology and a type-2 fuzzy system is proposed. A type-2 fuzzy wavelet neural system for the prediction of stock market prices is proposed in [29], in which the convergence and learning of the proposed system are not discussed. In [30], a novel type-2 fuzzy wavelet neural network (type-2 FWNN) structure combines the advantages of type-2 fuzzy systems and wavelet neural networks for the identification and control of nonlinear uncertain systems, using clustering and gradient techniques to update the parameters.

Furthermore, the combination of type-2 fuzzy logic sets and the brain emotional learning network is proposed in [31, 32], where the authors used this model to control robots. In [33], a wavelet fuzzy brain emotional learning network is used to control MIMO uncertain systems. In [34], a type-2 recurrent fuzzy brain emotional learning network is used as an adaptive filter for active noise cancellation. In [35], a data-driven control method is combined with an interval type-2 intuitionistic fuzzy brain emotional learning network for a multiple degree-of-freedom rehabilitation robot.

According to the above analysis, this paper adopts an interval type-2 recurrent wavelet fuzzy logic system combined with a brain emotional learning network to construct a model for emotion recognition. The main contributions of this paper are as follows: (1) a novel self-organizing type-2 recurrent wavelet fuzzy brain emotional learning network model is proposed for the first time; (2) the parameters can be tuned online by adaptive laws; (3) the structure of the interval type-2 recurrent wavelet fuzzy brain emotional learning network can be constructed automatically from an empty initial rule base; (4) numerical simulations are conducted to demonstrate the performance of the proposed method for emotion recognition.

This paper is organized as follows: Section 2 presents the structure of the novel interval type-2 recurrent wavelet fuzzy brain emotional learning network model, the parameter learning, and the self-organizing structure learning algorithms. Section 3 shows the simulation results. Finally, conclusions are given in Section 4.

2. Framework of the IT2RWFBELN

The IT2RWFBELN is composed of two parts: one is the amygdala network, which is responsible for emotional judgment, and the other is the orbitofrontal cortex network, which is responsible for emotional control. The fuzzy inference part of the IT2RWFBELN adopts an interval type-2 recurrent wavelet system; these two parts can then be described as follows, where the membership function grades and the weights for the two networks are denoted accordingly.
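As a reference point, the following is a hedged sketch of the generic form the two channels take in brain emotional learning networks, written with illustrative symbols (f_k for the firing strength of the k-th rule, w_k and v_k for the amygdala and orbitofrontal weights); the paper's exact interval type-2 expressions may differ:

```latex
a = \sum_{k=1}^{M} f_k\, w_k ,
\qquad
o = \sum_{k=1}^{M} f_k\, v_k ,
\qquad
y = a - o .
```

In this generic form, the amygdala channel a accumulates the emotional response, while the orbitofrontal channel o acts as an inhibitory correction that is subtracted from it.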

2.1. Structure of IT2RWFBELN

Figure 1 shows the structure of the IT2RWFBELN, which includes the amygdala network, the orbitofrontal cortex network, and the interval type-2 recurrent wavelet fuzzy sets. The proposed IT2RWFBELN is constructed with six layers: an input layer, a MF layer, a spatial firing layer, a weight memory layer, a defuzzification layer, and an output layer. (A runnable sketch of this forward pass follows the list.)
(1) Input space: the nodes in this space are given as the input signals, where the index denotes the number of input signals; all data from this layer are transmitted to the next space without any computation.
(2) MF space: in this layer, a Gaussian-based activation is used to perform fuzzification, with interval type-2 wavelet membership functions adopted as the basis functions. Wavelet functions have better approximation ability than triangular or Gaussian basis functions, so the learning speed can be increased. Furthermore, a recurrent term carrying previous information is inserted in this layer, so the network performance can be further improved. The wavelet membership functions can be represented as follows, where the parameters represent the center and variance of the fuzzy rules of the type-2 wavelet membership functions. The output of the ith input feature and kth rule can thus be represented by an interval MF, and the recurrent inputs are given as follows, where the corresponding parameter is the recurrent gain of the network.
(3) Spatial firing layer: each rule in this layer is composed of both the upper and lower firing strengths of the MFs as well as the non-MFs. The firing strength of an interval type-2 fuzzy rule is an interval and can be calculated as follows, where the interval denotes the firing strengths of the MFs, and m and M represent the number of input signals and fuzzy rules, respectively.
(4) Weight memory space: this layer contains two memory spaces, an amygdala memory and an orbitofrontal memory, whose values are intervals because the firing space is an interval. The weights of the kth output of the amygdala network and the orbitofrontal network are given as follows, and their updating rules are introduced in derivative form, where the corresponding terms indicate the learning rates of the updating rules and the output values.
(5) Defuzzification space: the output of this space is calculated from the outputs of the firing space and the weight space. The left- and right-most point values for the amygdala and orbitofrontal network outputs are as follows.
(6) Output space: the output of the defuzzification space is an interval value, so an averaging operation is used to obtain the amygdala network and orbitofrontal cortex network outputs.
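To make the data flow through these six layers concrete, the following is a minimal runnable sketch of one forward pass, written under several simplifying assumptions: plain interval type-2 Gaussian membership functions (with uncertain width) stand in for the wavelet MFs whose exact form is not reproduced here, the recurrent term is modeled as a gain on the previous MF output, and a simplified average-based type reduction replaces the Karnik–Mendel procedure. All names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

class IT2BELSketch:
    """Simplified forward pass of an interval type-2 fuzzy brain emotional
    learning network: MF space -> firing space -> weight memory ->
    defuzzification -> output (amygdala minus orbitofrontal)."""

    def __init__(self, n_inputs, n_rules, rng=np.random):
        self.m = rng.uniform(-1, 1, (n_inputs, n_rules))      # MF centers
        self.sigma_lo = np.full((n_inputs, n_rules), 0.5)     # narrow width -> lower grade
        self.sigma_hi = np.full((n_inputs, n_rules), 1.0)     # wide width -> upper grade
        self.r = np.full((n_inputs, n_rules), 0.1)            # recurrent gains
        self.w = rng.uniform(0, 1, (2, n_rules))              # amygdala weights [lo, hi]
        self.v = rng.uniform(0, 1, (2, n_rules))              # orbitofrontal weights [lo, hi]
        self.prev_phi = np.zeros((n_inputs, n_rules))         # previous MF outputs (memory)

    def forward(self, x):
        # (1)-(2) input + MF space: recurrent input and interval Gaussian grades
        xr = x[:, None] + self.r * self.prev_phi              # recurrent feedback
        mf_lo = np.exp(-0.5 * ((xr - self.m) / self.sigma_lo) ** 2)
        mf_hi = np.exp(-0.5 * ((xr - self.m) / self.sigma_hi) ** 2)
        self.prev_phi = 0.5 * (mf_lo + mf_hi)

        # (3) spatial firing layer: product t-norm over the inputs
        f_lo = np.prod(mf_lo, axis=0)
        f_hi = np.prod(mf_hi, axis=0)

        # (4)-(5) weight memory + simplified defuzzification (left/right points)
        def channel(weights):
            y_l = np.dot(f_lo, weights[0]) / (np.sum(f_lo) + 1e-12)
            y_r = np.dot(f_hi, weights[1]) / (np.sum(f_hi) + 1e-12)
            return 0.5 * (y_l + y_r)                          # (6) average of the interval

        a = channel(self.w)                                   # amygdala output
        o = channel(self.v)                                   # orbitofrontal output
        return a - o                                          # final network output


net = IT2BELSketch(n_inputs=4, n_rules=3)
print(net.forward(np.array([0.2, -0.1, 0.5, 0.0])))
```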

2.2. Self-Organization of IT2RWFBELN

According to kernel fuzzy rough set theory, the fuzzy upper approximation indicates the possibility that a sample belongs to a certain emotion, and the fuzzy lower approximation indicates the necessity that a sample belongs to a certain emotion. In general, the classification decision for a sample has less uncertainty when the feature space has strong discriminating ability, which means that the closer the fuzzy upper approximation is to the fuzzy lower approximation, the better.
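The following is a minimal sketch of how such fuzzy lower/upper approximations are commonly computed for a sample with respect to one emotion class, assuming a Gaussian-kernel fuzzy similarity relation; the kernel choice and the data names are illustrative, not the paper's definitions.

```python
import numpy as np

def fuzzy_similarity(x, y, sigma=1.0):
    # Gaussian-kernel fuzzy relation R(x, y) in [0, 1] (illustrative choice).
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def fuzzy_approximations(x, samples, memberships, sigma=1.0):
    """Fuzzy rough lower/upper approximation of one class at sample x.

    samples:     (n, d) training samples.
    memberships: (n,) degree to which each sample belongs to the class.
    """
    lower, upper = 1.0, 0.0
    for y, a in zip(samples, memberships):
        r = fuzzy_similarity(x, y, sigma)
        lower = min(lower, max(1.0 - r, a))   # necessity:   inf_y max(1 - R(x, y), A(y))
        upper = max(upper, min(r, a))         # possibility: sup_y min(R(x, y), A(y))
    return lower, upper


X = np.random.randn(20, 5)                    # illustrative training samples
A = np.random.rand(20)                        # fuzzy membership of each sample to one emotion
low, upp = fuzzy_approximations(X[0], X, A)
print(low, upp)                               # a small gap means low classification uncertainty
```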

Achieving a better structure for the IT2RWFBELN requires choosing an appropriate number of rules with a corresponding self-organizing algorithm. If the number of rules is large, the computational load becomes heavy; if it is small, the rules cannot cover all cases, especially for data with large ranges. Initially, there are no rules or MFs in the first space; when the first input data stream arrives, the first MF is created. The self-organizing algorithm is then used to determine whether to generate new rules and MFs or to delete inappropriate ones. In this paper, interval type-2 fuzzy c-means (IT2FCM) is used to choose the cluster centers of the membership functions for the fuzzy rules of the IT2RWFBELN. The IT2FCM [32] is an iterative optimization algorithm that minimizes an objective function in which the relevant term denotes the distance between the cluster centers and an input pattern. The main steps of the IT2FCM are as follows (a minimal sketch is given after the list):
(1) Set the fuzzifiers and the number c of cluster prototypes, and initialize the cluster centers using a GA algorithm.
(2) Calculate the distance between the cluster centers and the input pattern; the lower and upper partition functions can then be calculated by (2).
(3) Update the cluster centers; the interval type-1 fuzzy set obtained during the iterative process is processed by the improved EKM algorithm, which estimates both ends of the interval fuzzy set.
(4) Update the new cluster center by a defuzzification method; if the centers have converged, go to the next step; otherwise, return to step (2).
(5) Finally, perform type-reduction of the type-2 fuzzy partition matrix.
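The following is a minimal sketch of the IT2FCM iteration under simplifying assumptions: random (rather than GA-based) center initialization, the interval membership formed as the min/max of the memberships obtained with the two fuzzifiers m1 and m2 (the original Hwang–Rhee rule uses a slightly different condition), and a simple averaging of the interval in place of the EKM-based type reduction. All names are illustrative.

```python
import numpy as np

def it2fcm(X, c, m1=1.5, m2=2.5, n_iter=50, tol=1e-5, rng=np.random):
    """Sketch of interval type-2 fuzzy c-means on data X (n_samples, n_features)."""
    n = X.shape[0]
    centers = X[rng.choice(n, c, replace=False)]          # random init (the paper uses GA)

    def memberships(dist, m):
        # classical FCM membership: u_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1))
        ratio = (dist[:, :, None] / (dist[:, None, :] + 1e-12)) ** (2.0 / (m - 1))
        return 1.0 / np.sum(ratio, axis=2)

    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u1, u2 = memberships(dist, m1), memberships(dist, m2)
        u_lo, u_hi = np.minimum(u1, u2), np.maximum(u1, u2)   # interval membership

        # simplified type reduction: average of the interval endpoints
        # (the paper uses the improved EKM algorithm here)
        u_mid = 0.5 * (u_lo + u_hi)
        w = u_mid ** ((m1 + m2) / 2.0)
        new_centers = (w.T @ X) / np.sum(w.T, axis=1, keepdims=True)

        if np.max(np.abs(new_centers - centers)) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, u_lo, u_hi


X = np.vstack([np.random.randn(50, 2) + 3, np.random.randn(50, 2) - 3])
centers, u_lo, u_hi = it2fcm(X, c=2)
print(centers)
```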

The output of the IT2FCM algorithm is an interval type-2 fuzzy set, which cannot be transformed into a crisp set directly by a defuzzifier. Hence, a type-reduction process is needed. The aim of type-reduction is to compute the centroid of a type-2 fuzzy set. At present, the iterative Karnik–Mendel (KM) algorithm and the enhanced Karnik–Mendel (EKM) algorithm can compute the centroid of an interval type-2 fuzzy set efficiently. The improved EKM is used here; it changes the initialization conditions of the switch points and improves the search method for the switch points.
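For reference, the following is a minimal sketch of the plain Karnik–Mendel iteration for the centroid of an interval type-2 fuzzy set sampled at discrete points; the improved EKM used in the paper additionally changes the switch-point initialization and the search strategy, which is not reproduced here.

```python
import numpy as np

def km_centroid(x, mu_lower, mu_upper, max_iter=100):
    """Plain Karnik-Mendel iteration for the centroid [y_l, y_r] of an
    interval type-2 fuzzy set sampled at points x with lower/upper grades."""
    order = np.argsort(x)
    x, lo, up = x[order], mu_lower[order], mu_upper[order]

    def endpoint(left):
        theta = (lo + up) / 2.0                      # start from the average grade
        y = np.dot(x, theta) / np.sum(theta)
        for _ in range(max_iter):
            k = int(np.clip(np.searchsorted(x, y) - 1, 0, len(x) - 2))
            if left:                                  # y_l: upper grades left of the switch point
                theta = np.concatenate([up[:k + 1], lo[k + 1:]])
            else:                                     # y_r: lower grades left of the switch point
                theta = np.concatenate([lo[:k + 1], up[k + 1:]])
            y_new = np.dot(x, theta) / np.sum(theta)
            if np.isclose(y_new, y):
                return y_new
            y = y_new
        return y

    return endpoint(True), endpoint(False)


# Example: a symmetric interval type-2 set centred at 5
x = np.linspace(0.0, 10.0, 101)
mu_up = np.exp(-0.5 * ((x - 5.0) / 2.0) ** 2)
mu_lo = 0.6 * mu_up
print(km_centroid(x, mu_lo, mu_up))                   # interval roughly symmetric around 5
```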

2.3. Parameters Learning Algorithm of IT2RWFBELN

Defining a Lyapunov cost function as follows and using the gradient descent method, the online tuning laws for the parameters of the IT2RWFBELN are given below, where the corresponding terms represent the learning rates for updating the weights of the orbitofrontal cortex and amygdala networks, respectively, the learning rates for updating the means and variances of the type-2 wavelet MFs, respectively, and the learning rates for updating the IF-indices, which indicate the hesitation level, and the recurrent term. Applying the chain rule to the derivation of the above terms yields the update laws.
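To illustrate how such gradient-based update laws are obtained, the following sketch performs one update step on a simplified, type-1, non-recurrent slice of the network with cost J = 0.5 e^2; the interval type-2 case applies the same chain rule to both bounds of each interval quantity. All names, shapes, and learning rates are illustrative.

```python
import numpy as np

def update_parameters(e, f, w, v, m, sigma, x, lr):
    """One gradient-descent step on J = 0.5 * e**2 for a simplified slice
    of the network with output y = sum_k f_k * (w_k - v_k).

    e        : output error (desired - actual), scalar
    f        : firing strengths of the K rules, shape (K,)
    w, v     : amygdala / orbitofrontal weights, shape (K,)
    m, sigma : Gaussian MF centers / widths, shape (n, K)
    x        : current input, shape (n,)
    lr       : dict of learning rates
    """
    dJ_dy = -e                                     # dJ/dy for J = 0.5 * (yd - y)**2
    w_new = w - lr["w"] * dJ_dy * f                # dy/dw_k = f_k
    v_new = v - lr["v"] * dJ_dy * (-f)             # dy/dv_k = -f_k

    dy_df = w - v                                  # dy/df_k
    dm = f * dy_df * (x[:, None] - m) / sigma ** 2          # df_k/dm_ik
    dsig = f * dy_df * (x[:, None] - m) ** 2 / sigma ** 3   # df_k/dsigma_ik
    m_new = m - lr["m"] * dJ_dy * dm
    sigma_new = sigma - lr["sigma"] * dJ_dy * dsig
    return w_new, v_new, m_new, sigma_new


rng = np.random.RandomState(0)
n, K = 4, 3
lr = {"w": 0.05, "v": 0.05, "m": 0.01, "sigma": 0.01}
out = update_parameters(e=0.3, f=rng.rand(K), w=rng.rand(K), v=rng.rand(K),
                        m=rng.randn(n, K), sigma=np.ones((n, K)),
                        x=rng.randn(n), lr=lr)
```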

2.4. Convergence Analyses

Theorem 1. Let the two quantities be the learning rates for the parameters of the IT2RWFBELN. Then, stable convergence is guaranteed if they are chosen as follows.

Proof. A Lyapunov function is selected as follows. The change of the Lyapunov function is then computed, and the predicted error can be represented in terms of the parameter update, where the final term denotes the change of the corresponding parameter.
Using (11), the following is obtained. Substituting (11) and (18) into (17) then yields the change of the Lyapunov function. Thus, if the learning rate is chosen as in (13), the corresponding term in (20) is less than 0. Therefore, Lyapunov stability is guaranteed. The proof for the second learning rate can be derived in a similar manner, with the rate chosen as in (14). This completes the proof.
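For readers unfamiliar with this style of argument, the following LaTeX sketch shows the generic shape such learning-rate bounds take in gradient-based Lyapunov analyses, with e(t) the error, w a generic tunable parameter, and eta its learning rate; this illustrates the reasoning only and is not the paper's exact condition (13) or (14).

```latex
V(t) = \tfrac{1}{2} e^{2}(t), \qquad
\Delta V(t) = \tfrac{1}{2}\bigl(e^{2}(t+1) - e^{2}(t)\bigr),

e(t+1) \approx e(t) + \Bigl(\tfrac{\partial e(t)}{\partial w}\Bigr)^{\!T}\!\Delta w,
\qquad
\Delta w = -\eta\, e(t)\, \tfrac{\partial e(t)}{\partial w},

\Delta V(t)
 = -\eta\, e^{2}(t)\,\Bigl\|\tfrac{\partial e(t)}{\partial w}\Bigr\|^{2}
   \Bigl(1 - \tfrac{\eta}{2}\Bigl\|\tfrac{\partial e(t)}{\partial w}\Bigr\|^{2}\Bigr)
 < 0
\quad\text{whenever}\quad
0 < \eta < \frac{2}{\max_{t}\bigl\|\partial e(t)/\partial w\bigr\|^{2}} .
```

A negative Delta V of this kind is what gives the stable convergence claimed in Theorem 1 for an appropriately bounded learning rate.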

3. Experiments and Validation

This paper implements the IT2RWFBELN model in Python 2.7. The initialization parameters of the IT2RWFBELN are derived from the ImageNet data set, and the model computations are run on the GPU. The experiments were performed on a Windows machine with an Intel i7-8700 CPU (3.20 GHz), 8 GB of RAM, and a GTX 1070 GPU.

In this part, in order to verify the performance of the IT2RWFBELN for facial emotion recognition, two groups of experiments are conducted on five public data sets (JAFFE, BU-3DFE, CASIA, SAVEE, and FAU). The first group is tested on the Chinese corpus CASIA [36], the English corpus SAVEE [37], and the FAU emotion corpus [38], in which both speaker-dependent (SD) and speaker-independent (SI) speech emotion recognition are performed; the second group is tested on JAFFE [39] and BU-3DFE [40], in which six different metrics are used for evaluation. To verify the robustness and suitability of the proposed model, several conventional methods are used for comparison.

3.1. Emotion Databases

The five data sets are the CASIA, SAVEE, FAU, JAFFE, and BU-3DFE databases, described as follows. The CASIA Chinese emotion corpus includes 300 emotional short utterances covering six basic emotions: surprise, happy, sad, angry, fear, and neutral. The SAVEE data set was recorded from four male speakers and contains seven basic emotions: surprise, happy, sad, angry, fear, disgust, and neutral. The FAU database was recorded from 30 females and 21 males and contains five emotional states: angry, emphatic, positive, neutral, and rest. The JAFFE database contains 213 facial images of 10 different people, covering seven expressions: anger, disgust, fear, happy, sad, surprise, and neutral. The BU-3DFE multiuser facial expression database was recorded from 56 females and 44 males and includes six facial expressions: anger, disgust, fear, happiness, sadness, and surprise.

3.2. Experiments on JAFFE and BU-3DFE Databases

The first experiments focus on facial emotion recognition on the JAFFE and BU-3DFE databases. The BU-3DFE database has six emotions: anger, disgust, fear, happiness, sadness, and surprise. The JAFFE database has seven emotions: fear, happy, sadness, anger, disgust, surprise, and neutral. Furthermore, six different metrics (Chebyshev, KL divergence, cosine, Canberra, Clark, and intersection) are used to validate the effectiveness of the proposed method by comparing it with other methods, including Local Binary Patterns (LBP); "AN(FC1)" and "AN(FC2)," which are the outputs of the first two fully connected layers of a pretrained AlexNet; "AN-FT(MSE)" and "AN-FT(KL)," which represent the fuzzy classification results obtained by AlexNet using the mean square error and KL divergence as loss functions, respectively; and the fuzzy rough convolutional neural network (FRCNN).
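For clarity, the following sketch shows how these six measures are commonly computed between a ground-truth fuzzy label distribution p and a predicted distribution q (both summing to 1); the exact definitions and normalizations used in the paper may differ slightly.

```python
import numpy as np

def fuzzy_label_metrics(p, q, eps=1e-12):
    """Six common measures between fuzzy (probabilistic) label distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return {
        "chebyshev":    np.max(np.abs(p - q)),                                    # smaller is better
        "kldist":       np.sum(p * np.log(p / q)),                                # smaller is better
        "cosine":       np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)),   # larger is better
        "canberra":     np.sum(np.abs(p - q) / (p + q)),                          # smaller is better
        "clark":        np.sqrt(np.sum(((p - q) / (p + q)) ** 2)),                # smaller is better
        "intersection": np.sum(np.minimum(p, q)),                                 # larger is better
    }


print(fuzzy_label_metrics([0.6, 0.3, 0.1], [0.5, 0.4, 0.1]))
```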

Table 1 gives the experimental results for the BU-3DFE data set. The features are obtained in the type-2 fuzzy convolutional neural network training task, and the fuzzy classification results are obtained in the fuzzy expression recognition task based on Algorithm Adaptation k-Nearest-Neighbors classification. It can be concluded that the type-2 fuzzy convolutional neural network effectively learns relevant knowledge from the fuzzy multilabels; therefore, its fuzzy classification results are better than those of "AN(FC1)" and "AN(FC2)." As shown in Table 2, using Algorithm Adaptation k-Nearest-Neighbors as the fuzzy classification algorithm, the features extracted by the emotional learning networks outperform "LBP" under all six metrics. Compared with the other features, the type-2 fuzzy wavelet emotional learning neural network achieves good fuzzy classification accuracy under the various indicators. Since the performance of the Algorithm Adaptation k-Nearest-Neighbors algorithm depends on the distinguishability of the feature space, it can be concluded that, compared with the other algorithms, the type-2 fuzzy wavelet neural network model maps the original pictures to a space better suited to distinguishing facial expressions, that is, face pictures with similar expressions lie closer together in that space.

To further evaluate the performance of the proposed algorithm, Tables 3 and 4 list the accuracy of the proposed model and other algorithms on the JAFFE and BU-3DFE databases. From the results, we can see that the accuracy of our algorithm is superior to that of most of the other advanced algorithms.

3.3. Experiments on CASIA, SAVEE, and FAU Databases

To verify the performance of the IT2RWFBELN model for speech emotion recognition, the proposed model is compared with the conventional brain emotional learning (BEL), support vector machine (SVM), extreme learning machine (ELM), genetic algorithm-brain emotional learning (GA-BEL), BELFIS, and BELBLA methods on the CASIA, SAVEE, and FAU databases; the results are presented in Tables 5–7.

Table 5 shows the recognition accuracy of the BEL, SVM, ELM, GA-BEL, BELFIS, and BELBLA methods on the CASIA database. The average accuracy of the IT2RWFBELN model is improved for both SD and SI recognition compared with the BEL and GA-BEL models. Moreover, the SD recognition accuracy of the IT2RWFBELN model is higher than that of SVM and ELM, while its average SI accuracy is similar to that of ELM.

Table 6 shows the recognition accuracy of the BEL, SVM, ELM, GA-BEL, BELFIS, and BELBLA methods on the SAVEE database. The IT2RWFBELN model achieves higher accuracy on both SD and SI than the BEL and GA-BEL models and is also superior to SVM and ELM on SD, although its accuracy is lower than that of ELM on SI in some instances.

Table 7 shows the recognition accuracy of the BEL, SVM, ELM, GA-BEL, BELFIS, and BELBLA methods on the FAU database. The IT2RWFBELN model achieves higher accuracy on both SD and SI than the BEL and GA-BEL models and is also superior to SVM and ELM on SD, while its accuracy on SI is similar to that of ELM.

From the above two groups of experiments, we can conclude that the proposed model achieves better recognition performance than the other models, owing to the combination of the type-2 recurrent wavelet fuzzy system and the brain emotional learning network. As a result, the proposed method is feasible for facial and speech emotion recognition.

4. Conclusions

This paper introduces an applicable model for emotion recognition, which is a vital part of communication between humans and machines. The proposed model is based on the combination of an interval type-2 recurrent wavelet fuzzy system and a brain emotional learning network (IT2RWFBELN), taking advantage of the ability of the interval type-2 recurrent wavelet fuzzy system to handle uncertainties and the low computational cost of the brain emotional learning network. There are no rules initially; the structure and parameters of the model are tuned online simultaneously by the gradient approach and a Lyapunov function. The system input data streams are imported directly into the neural network through a type-2 recurrent wavelet fuzzy inference system, and the results are then piped into the sensory and emotional channels, which jointly produce the final outputs of the network. To demonstrate the performance of the IT2RWFBELN model, two groups of experiments were conducted, covering facial expression recognition and speech emotion recognition. The results illustrate the effectiveness of the proposed recognition model.

Data Availability

The data used in this paper have been cited in the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful for the support of the National Science Foundation of China (Nos. 61502211, 61572242, and 61702234).