Abstract
In recent years, with the continuous development of artificial intelligence and brain-computer interface technology, emotion recognition based on physiological signals, especially electroencephalogram (EEG) signals, has become a popular research topic and attracted wide attention. However, extracting effective features from EEG signals and classifying them accurately remains an increasingly important challenge. Therefore, in this paper, we propose an emotion recognition method for EEG signals based on the ensemble learning method AdaBoost. First, we consider the time domain, time-frequency domain, and nonlinear features related to emotion, extract them from the preprocessed EEG signals, and fuse the features into a feature matrix. Then, linear discriminant analysis is used to reduce the dimensionality of the features. Next, we use the optimized feature set to train an AdaBoost classifier for binary classification. Finally, the proposed method is tested on the DEAP data set on four emotional dimensions: valence, arousal, dominance, and liking. The proposed method is shown to be effective in emotion recognition, with a best average accuracy of 88.70% on the dominance dimension. Compared with other existing methods, the performance of the proposed method is significantly improved.
1. Introduction
Emotion plays an important role in people’s social activities. It is the carrier of nonverbal communication between people and makes our daily life vivid. However, as social development and the pace of life accelerate, many people tend to feel stressed and anxious. If this condition persists, it may lead to various health issues or depression and may even affect people’s daily life and self-development. Therefore, emotion recognition has gradually become a popular and practical research topic that has received extensive attention from researchers [1, 2].
Nowadays, emotion recognition is applied to many fields such as image [3], text [4, 5], speech [6], facial expressions [7], and gestures [8, 9]. However, these fields, to some extent, are not reliable enough because emotion can be easily affected by others. Just as Picard [10] said, if someone has the ability to disguise his or her emotion, the estimation may have a high error rate. As opposed to this, some researchers focus on the physiological signals [11] for emotion classification, considering that the physiological signals are not easy to forge and are expressed from the inside out. Meanwhile, with the continuous development of artificial intelligence and brain-computer interface technology [12, 13], emotion recognition based on physiological signals, especially on electroencephalogram (EEG) signals, has gradually become the mainstream of emotion recognition [14, 15].
The workflow of emotion recognition research based on EEG signals can be summarized as data preprocessing, feature extraction, classification, and evaluation of the model’s performance [15]. Among these steps, extracting emotion-related features from EEG signals and training a classifier with strong generalization ability are the main factors affecting the model performance.
Based on feature extraction and classification with machine learning, many methods have been proposed that advanced emotion recognition from EEG signals. For example, Itsara Wichakam and Peerapon Vateekul [16] extracted band power and power spectral density (PSD) from EEG signals and used a support vector machine (SVM) classifier for binary classification. The best accuracy was 64.90% for valence, 64.90% for arousal, and 66.80% for liking. Yoon and Chung [17] proposed a supervised learning algorithm based on the Bayesian weighted logarithmic posterior function and the perceptron convergence algorithm, which achieved accuracies of 70.9% in valence and 70.1% in arousal. Bagzir et al. [18] built a valence/arousal emotion recognition system from EEG signals. The discrete wavelet transform was used to decompose the EEG signals into gamma, beta, alpha, and theta bands, spectral features were extracted from each band, and an SVM, k-nearest-neighbour (KNN), and an artificial neural network (ANN) were used for emotion recognition. The best accuracy rates on valence and arousal were 91.1% and 91.3%, respectively. You and Liu [19] used the DEAP data set. EEG data from 20 randomly selected subjects were used; only the 60 seconds of data from the F3 and F4 channels were retained and divided into 5-second segments with a 2.5-second overlap. For each segment, time domain features such as the mean, variance, and second-order differential mean were extracted. Then, KNN, SVM, a multilayer neural network (MLN), and an autoencoder neural network (AEN) were used for classification. Experimental results show that AEN had the highest classification accuracy, which reached more than 80%, while the classification accuracy of the other classifiers was only approximately 60%. Alhagry et al. [20] proposed a deep learning method to identify emotions from raw EEG signals, using long short-term memory (LSTM) neural networks to learn features from the EEG signals and then classifying these features into low/high arousal, valence, and liking. The method was tested on the DEAP data set. The average accuracy of the method was 85.45%, 85.65%, and 87.99% on arousal, valence, and liking, respectively. Xing et al. [21] established a stacked autoencoder (SAE) to decompose EEG signals and classified them with an LSTM model. The observed accuracy was 81.1% for valence and 74.38% for arousal. Zhan et al. [22] extracted the PSD of four frequency bands from EEG and designed a shallow depthwise parallel convolutional neural network (CNN). This method achieved competitive accuracies of 84.07% and 82.95% on arousal and valence, respectively, on the DEAP data set. Moreover, the method shows broad application prospects for EEG-based emotion recognition on resource-limited devices. Parui et al. [23] extracted multiple features from EEG signals and used the information gain of the correlation matrix to compute the feature sets, which were optimized by recursive feature elimination. Then, an XGBoost classifier was trained for classification. Tested on the DEAP data set, the best accuracies on the four emotional dimensions were 75.97%, 74.20%, 75.23%, and 76.42%, respectively. Aggarwal et al. [24] employed two gradient boosting machines (GBMs) based on supervised learning, XGBoost and LightGBM, for emotion classification on the DEAP data set. The participant number was excluded from the features, and a participant-independent model was constructed. The performance was best in the valence dimension, with an average accuracy of 77.11%, and the training speed was very fast.
In summary, in recent years, in addition to traditional machine learning and deep learning methods, classification methods based on ensemble learning [25, 26] have gradually attracted the attention of many researchers and achieved good results in the emotion recognition of EEG signals. However, some problems remain in this kind of method: the features extracted from EEG signals are often not typical enough to reflect emotional information well, and the performance of the models needs to be improved.
Therefore, in this paper, we propose an emotion recognition method for EEG signals based on the ensemble learning method AdaBoost. First, we consider the time domain, time-frequency domain, and nonlinear features related to emotion, extract them from the preprocessed EEG signals, and fuse the features into a feature matrix. Then, linear discriminant analysis is used to reduce the dimensionality of the features. Next, we use the optimized feature set to train an AdaBoost classifier for binary classification. Finally, the proposed method is tested on the DEAP data set on four emotional dimensions: valence, arousal, dominance, and liking. The proposed method is shown to be effective in emotion recognition, with a best average accuracy of 88.70% on the dominance dimension. Compared with other existing methods, the performance of the proposed method is significantly improved.
2. Methodology
The technical route of the proposed method in this paper is shown in Figure 1. More details of each step of the route are as follows.

2.1. DEAP Data Set: Emotion Recognition Standard Data Set
In this paper, the DEAP data set (Python version) [27], a standard data set for multimodal emotion recognition, is used. The data set contains physiological signals and corresponding emotion ratings collected from 32 volunteers. Each volunteer watched 40 music videos with different emotional content while their physiological signals were recorded; the recordings are stored in the data files S01–S32.dat. The signals were recorded on 40 channels in total: the first 32 are EEG channels and the last 8 are peripheral physiological channels, as shown in Table 1, and the electrodes corresponding to the 32 EEG channels are shown in Figure 2. The sampling frequency is 512 Hz, but it is reduced to 128 Hz after a series of filtering operations.

Each data file contains the following two matrices:
(1) Data matrix (40 ∗ 40 ∗ 8064): the first 40 is the number of videos, the second 40 is the number of recorded channels, and 8064 = 63 ∗ 128 is the number of samples in 63 seconds of data for one video and one channel, comprising 3 seconds of baseline data recorded before the experiment and 60 seconds of data recorded during the experiment
(2) Labels matrix (40 ∗ 4): the four columns hold the scores, ranging from 1 to 9, on four affective dimensions: valence, arousal, dominance, and liking
The object of the proposed method is EEG signals, so from every .dat file we take seconds 4∼63 of the first 32 channels (the remaining data are beyond the scope of this study) and separate the signal data from the labels of the four emotional dimensions. All data are read, initialized, and stored in the features_raw.csv and labels 0–3.dat files, respectively. Inspired by reference [28], if the rating is ≥ 5.0, we store the label “1,” indicating that the emotion is positive on that dimension; otherwise, we store the label “0,” indicating that it is negative (the distribution of positive and negative cases on the four emotional dimensions is shown in Table 2). This prepares the data for subsequent feature extraction, feature selection, and binary classification.
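The selection and labeling steps just described can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `prepare_trials` and the constants are hypothetical, and a small synthetic array stands in for a real subject file. It assumes the array layout given above (videos ∗ channels ∗ samples, with the 3-second baseline first).

```python
import numpy as np

FS = 128          # sampling rate after downsampling (Hz), as stated in the text
BASELINE_S = 3    # seconds of baseline at the start of each trial
N_EEG = 32        # the first 32 channels are EEG

def prepare_trials(data, labels, threshold=5.0):
    """Keep the EEG channels and the 60 s after the baseline; binarize the ratings."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    eeg = data[:, :N_EEG, BASELINE_S * FS:]       # drop peripheral channels and baseline
    binary = (labels >= threshold).astype(int)    # 1 = positive, 0 = negative
    return eeg, binary

# Synthetic stand-in for one subject, reduced to two trials to keep the demo light.
rng = np.random.default_rng(0)
demo_data = rng.standard_normal((2, 40, 63 * FS))
demo_labels = np.array([[7.1, 3.0, 5.0, 4.99],
                        [1.0, 9.0, 6.2, 5.0]])
eeg, y = prepare_trials(demo_data, demo_labels)
print(eeg.shape)     # (2, 32, 7680)
print(y.tolist())    # [[1, 0, 1, 0], [0, 1, 1, 1]]
```

Note that a rating of exactly 5.0 falls into the positive class under the ≥ 5.0 rule.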
2.2. Feature Extraction
A feature represents a property or characteristic of the signal. Feature extraction derives from the raw data a set of features that contains as much relevant information as possible, which is significant for improving the performance of the model [4, 29].
EEG signals contain abundant hidden emotional information, and the key to this step is extracting the information related to emotion. Studies have indicated that EEG signal features mainly include time domain, frequency domain, time-frequency domain, and nonlinear dynamic features, and that the latter two correlate more strongly with emotion [30]. Therefore, to better capture the detailed information in EEG signals, the time domain, time-frequency domain, and nonlinear dynamic features are extracted.
2.2.1. Time Domain Features
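An EEG signal is a kind of chaotic time series. Therefore, time domain features are the most direct reflection of EEG signals [31]. Suppose that $x(t)$, $t = 1, 2, \ldots, T$, is the preprocessed EEG signal lasting 60 seconds, and let $x'(t)$ denote its first derivative. In this paper, eight typical time domain features are selected as follows:
① Mean: $$\mu = \frac{1}{T}\sum_{t=1}^{T} x(t)$$
② Std: $$\sigma = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x(t) - \mu\right)^{2}}$$
③ Range: $$\max_{t} x(t) - \min_{t} x(t)$$
④ Skewness: $$\frac{1}{T}\sum_{t=1}^{T}\left(\frac{x(t) - \mu}{\sigma}\right)^{3}$$
⑤ Kurtosis: $$\frac{1}{T}\sum_{t=1}^{T}\left(\frac{x(t) - \mu}{\sigma}\right)^{4}$$
⑥ Hjorth parameter-activity: $$\mathrm{Activity} = \operatorname{var}\left(x(t)\right)$$
⑦ Hjorth parameter-mobility: $$\mathrm{Mobility} = \sqrt{\frac{\operatorname{var}\left(x'(t)\right)}{\operatorname{var}\left(x(t)\right)}}$$
⑧ Hjorth parameter-complexity: $$\mathrm{Complexity} = \frac{\mathrm{Mobility}\left(x'(t)\right)}{\mathrm{Mobility}\left(x(t)\right)}$$
Skewness is the standardized third-order central moment of the sample, which describes the symmetry of the value distribution. Kurtosis is the standardized fourth-order central moment of the sample, which describes how steeply the value distribution peaks. Together, the two give a better description of the shape of the data distribution. The Hjorth parameters [32] provide a fast way to compute three important characteristics of a signal in the time domain, namely activity, mobility, and complexity, and are widely used in physiological signal processing.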
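As an illustration, the eight time domain features can be computed for one channel roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name `time_domain_features` is hypothetical, moments use the population (1/T) normalization, kurtosis is not reduced by 3, and the derivative in the Hjorth parameters is approximated by the first difference.

```python
import numpy as np

def time_domain_features(x):
    """Eight time domain features of a 1-D signal (assumptions noted in the text)."""
    x = np.asarray(x, dtype=float)
    mean, std = x.mean(), x.std()
    z = (x - mean) / std                      # standardized samples
    dx = np.diff(x)                           # first difference ~ first derivative
    ddx = np.diff(dx)                         # second difference
    mobility = np.sqrt(dx.var() / x.var())
    mobility_dx = np.sqrt(ddx.var() / dx.var())
    return {
        "mean": mean,
        "std": std,
        "range": x.max() - x.min(),
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),
        "activity": x.var(),
        "mobility": mobility,
        "complexity": mobility_dx / mobility,
    }

# Demo on a 10 Hz sinusoid sampled at 128 Hz for 60 s (synthetic, not EEG).
t = np.arange(60 * 128) / 128.0
demo = time_domain_features(np.sin(2 * np.pi * 10 * t))
print({k: round(float(v), 4) for k, v in demo.items()})
```

For a pure sinusoid the complexity is close to 1, which is a convenient sanity check; real EEG segments give larger values.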
2.2.2. Time-Frequency Domain Features
In addition to time domain features, time-frequency domain features are another important kind of EEG feature. Studies have found that wavelet packet decomposition offers effective multiresolution analysis of nonstationary signals and overcomes a limitation of the wavelet transform, which at each level further decomposes only the low-frequency subband and therefore provides no finer resolution of the high-frequency subbands. Moreover, the db4 wavelet basis function is smooth and can better detect changes in EEG signals. Therefore, we adopt the wavelet packet decomposition method based on the db4 basis function with 4 decomposition levels. Specific implementation details are as follows: the EEG signals are decomposed into the four detail signals D4–D1 and the approximation signal A4. The values in each signal are the wavelet coefficients at that level, and the detail signals D4–D1 represent the frequency bands Theta (4–8 Hz), Alpha (8–16 Hz), Beta (16–32 Hz), and Gamma (32–64 Hz), respectively. Take channel Fp1 of Subject 9 as an example. The original signal and the four decomposed band signals are shown in Figure 3. Then, wavelet energy and wavelet entropy are extracted from the wavelet coefficients of each frequency band [33]. ① Wavelet energy: ② Wavelet entropy:
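To make the two per-band features concrete, the sketch below computes them from a band's wavelet coefficients. The definitions are assumptions, since several variants are in use: energy is taken as the sum of squared coefficients, and entropy as the Shannon entropy of the normalized squared coefficients. The coefficients are assumed to have been produced beforehand by a wavelet library, and the function names and toy values are hypothetical.

```python
import numpy as np

def wavelet_energy(coeffs):
    """Energy of one band: sum of squared wavelet coefficients."""
    c = np.asarray(coeffs, dtype=float)
    return float(np.sum(c ** 2))

def wavelet_entropy(coeffs):
    """Shannon entropy of the normalized squared coefficients (one common definition)."""
    c2 = np.asarray(coeffs, dtype=float) ** 2
    p = c2 / c2.sum()          # relative energy of each coefficient
    p = p[p > 0]               # skip zeros, using the convention 0 * log(0) = 0
    return float(-np.sum(p * np.log(p)))

def band_features(band_coeffs):
    """Map each band name to its (energy, entropy) pair."""
    return {band: (wavelet_energy(c), wavelet_entropy(c))
            for band, c in band_coeffs.items()}

# Toy coefficients, not real EEG: evenly spread energy vs. fully concentrated energy.
demo_bands = {"theta": [1.0, 1.0, 1.0, 1.0], "alpha": [2.0, 0.0, 0.0, 0.0]}
feats = band_features(demo_bands)
print(feats)
```

Both toy bands have the same energy, but the entropy separates them: it is largest when the energy is spread evenly over the coefficients and zero when it sits in a single coefficient.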

2.2.3. Nonlinear Dynamic Features
The human brain is a typical nonlinear dynamic system, so the nonlinear dynamic features of EEG signals reflect emotional information and are also significant for feature extraction [34]. In this paper, we apply a short-time Fourier transform with nonoverlapping Hamming windows and calculate the traditional power spectral density (PSD) and the differential entropy (DE) [35] of the 60-second data in each channel. It should be noted that the differential entropy is the entropy of a continuous random variable, and it can be expressed as
$$h(X) = -\int_{-\infty}^{\infty} f(x)\log f(x)\,dx = \frac{1}{2}\log\left(2\pi e\sigma^{2}\right),$$
where $x$ is a random variable following the Gaussian distribution $N(\mu, \sigma^{2})$ and $f(x)$ is the probability density function of $x$. Research studies found that, for a fixed-length EEG signal sequence in a certain frequency band, the differential entropy is equal to the logarithm of the power spectral density [36, 37].
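Under the Gaussian assumption, the differential entropy depends only on the variance of the signal, so it can be estimated directly from the samples. The following is a minimal sketch using the closed form ½ log(2πeσ²); the function name is hypothetical, and the population variance and natural logarithm are assumptions.

```python
import math

def differential_entropy(samples):
    """DE of a signal assumed Gaussian: 0.5 * log(2 * pi * e * variance)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n   # population variance
    return 0.5 * math.log(2 * math.pi * math.e * var)

unit = differential_entropy([-1.0, 1.0])      # variance 1
doubled = differential_entropy([-2.0, 2.0])   # variance 4
print(unit, doubled - unit)
```

Doubling the amplitude raises the DE by log 2, which illustrates its logarithmic dependence on signal power.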
2.2.4. Constructing Feature Vectors and the Feature Matrix
In the above steps, 18 feature values are extracted for each channel of each video of each subject. In this way, each video of each subject yields 32 ∗ 18 = 576 feature values, and the 32 subjects ∗ 40 videos are assembled into a (32 ∗ 40) ∗ 576 = 1280 ∗ 576 feature matrix, which is stored in the train.csv file.
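The shape arithmetic can be checked with a short sketch. The four-dimensional layout (subjects, videos, channels, features) and the variable names are assumptions made for illustration; the zeros stand in for the extracted values.

```python
import numpy as np

n_subjects, n_videos, n_channels, n_feats = 32, 40, 32, 18

# Placeholder for the extracted values: one 18-feature vector per channel per trial.
per_channel = np.zeros((n_subjects, n_videos, n_channels, n_feats))

# One row per (subject, video) trial, one column per (channel, feature) pair.
feature_matrix = per_channel.reshape(n_subjects * n_videos, n_channels * n_feats)
print(feature_matrix.shape)   # (1280, 576)
```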
2.3. Feature Selection
After feature extraction, linear discriminant analysis (LDA) is applied to the extracted features in order to reduce the feature dimension and further improve the correlation between the features and emotion. LDA is a dimensionality reduction technique based on supervised learning, that is, each sample of the data set has a class label. The idea is to project the data onto a lower-dimensional space such that, after projection, the points of each class are as close together as possible and the centers of different classes are as far apart as possible [38]. For the binary classification task on EEG signals studied here, the schematic diagram of the LDA algorithm is shown in Figure 4, and the specific implementation is as follows.

The process of the LDA algorithm [39] is described as follows.
Assume the data set is $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each sample $x_i$ is an n-dimensional vector and $y_i \in \{0, 1\}$. For class $j$ ($j = 0, 1$), $N_j$ is the number of samples, $X_j$ is the set of samples, $\mu_j$ is the mean vector, and $\Sigma_j$ is the covariance matrix (strictly speaking, the covariance matrix without its normalizing denominator). Thus,
$$\mu_j = \frac{1}{N_j}\sum_{x \in X_j} x, \qquad \Sigma_j = \sum_{x \in X_j}\left(x - \mu_j\right)\left(x - \mu_j\right)^{T}.$$
In the binary classification task, we only need to project the data onto a line. Assume that the projection line is given by the vector $w$. Then, for any sample $x_i$, its projection on the line is $w^{T}x_i$, and the projections of the two class centers $\mu_0$ and $\mu_1$ are $w^{T}\mu_0$ and $w^{T}\mu_1$, respectively. Combined with the principle of LDA “data within a class should be as close as possible and data between classes should be as far as possible,” we define the following.
Within-class divergence matrix:
$$S_w = \Sigma_0 + \Sigma_1.$$
Between-class divergence matrix:
$$S_b = \left(\mu_0 - \mu_1\right)\left(\mu_0 - \mu_1\right)^{T}.$$
Optimization goal:
$$\arg\max_{w} J(w) = \frac{w^{T} S_b w}{w^{T} S_w w}.$$
Note that the direction of $S_b w$ is always parallel to $(\mu_0 - \mu_1)$, so set
$$S_b w = \lambda\left(\mu_0 - \mu_1\right).$$
Substitute it into the eigenvalue formula:
$$S_w^{-1} S_b w = \lambda w.$$
Solve it:
$$w = S_w^{-1}\left(\mu_0 - \mu_1\right).$$
In summary, the original sample set is projected into a 1-dimensional space spanned by the vector $w$, and the feature set after projection is the feature set after dimension reduction.
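The closed-form solution for the two-class case can be sketched in a few lines. This is an illustrative implementation, not the authors' code: the function names are hypothetical, and the pseudo-inverse is used in place of the plain inverse as an assumption, so that the sketch still runs when the within-class matrix is singular.

```python
import numpy as np

def lda_direction(X, y):
    """Projection vector w = S_w^{-1} (mu_0 - mu_1) for labels in {0, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class divergence: sum of the two unnormalized class scatter matrices.
    S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    return np.linalg.pinv(S_w) @ (mu0 - mu1)   # pinv: assumption for robustness

def lda_project(X, w):
    """Project every sample onto the line spanned by w."""
    return np.asarray(X, dtype=float) @ w

# Toy data: two square clusters separated along the first axis.
X_demo = np.array([[0, 0], [1, 1], [0, 1], [1, 0],
                   [4, 0], [5, 1], [4, 1], [5, 0]])
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
w = lda_direction(X_demo, y_demo)
z = lda_project(X_demo, w)
print(w, z)
```

On this toy data the direction lies along the first axis, and the projected values of the two classes do not overlap, so a single threshold on the 1-dimensional feature separates them.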
2.4. Binary Classification
On the basis of Sections 2.1–2.3, the obtained feature set can be sent to the classifier for training and testing and the performance of the model is evaluated.
2.4.1. Random and Independent Division of the Feature Set into Training and Testing Sets
When training and testing classifiers, it is very important to partition the training and testing sets randomly and independently. In this paper, the feature set is divided randomly and independently at a ratio of 4 : 1 (80% training set and 20% test set), so that the classifier is given previously unseen samples during testing and the reported accuracy is reliable.
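A random 4 : 1 partition can be sketched as follows. The function name, the fixed seed, and the placeholder data are assumptions for illustration; the paper does not state how the shuffle was implemented.

```python
import numpy as np

def split_train_test(X, y, test_ratio=0.2, seed=0):
    """Shuffle the sample indices once and hold out test_ratio of them for testing."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(test_ratio * len(X)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the row count of the feature matrix; column 0 is a row id.
X_all = np.arange(1280 * 2).reshape(1280, 2)
y_all = np.arange(1280) % 2
X_tr, X_te, y_tr, y_te = split_train_test(X_all, y_all)
print(X_tr.shape, X_te.shape)   # (1024, 2) (256, 2)
```

Because the indices come from a single permutation, no sample can appear in both sets.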
2.4.2. Training AdaBoost Classifier
After the feature set is randomly and independently divided into a training set and testing set, the training set data are sent to the AdaBoost classifier [40, 41] for training.
AdaBoost is an adaptive boosting algorithm in the field of ensemble learning and has been effectively applied to binary classification. Its basic principle is iteration: a new weak classifier is added and trained in each iteration until a predetermined, sufficiently small error rate is reached. Each training sample is assigned a weight indicating its probability of being selected for the training set of a given classifier. If a sample has been classified correctly, its weight is decreased when constructing the next training set. Conversely, if a sample has been misclassified, its weight is increased. In this way, AdaBoost focuses on the misclassified samples, which improves the generalization ability of the classifier and helps to avoid overfitting.
The working principle diagram of AdaBoost is shown in Figure 5, and its mathematical description is as follows.

(1) Initialize the weight distribution of the training data:
$$D_1 = \left(w_{11}, w_{12}, \ldots, w_{1N}\right), \qquad w_{1i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N,$$
where $D_1$ represents the weights of the samples at the first iteration, $w_{11}$ represents the weight of the first sample at the first iteration, and $N$ represents the total number of samples;
(2) Perform the m-th round of iteration (m = 1, 2, ..., n):
① Use the training samples with the weight distribution $D_m$ for learning and obtain the weak classifier $G_m(x): x \to \{-1, 1\}$. The performance of the weak classifier is measured by the value of the following error function:
$$e_m = \sum_{i=1}^{N} w_{mi}\, I\left(G_m(x_i) \neq y_i\right).$$
② Calculate the coefficient (also called “weight”) of the weak classifier $G_m$, which indicates how important $G_m$ is in the final classification:
$$\alpha_m = \frac{1}{2}\ln\frac{1 - e_m}{e_m}.$$
As $e_m$ decreases, $\alpha_m$ increases. That is, a classifier with a small error is of great importance in the final classifier.
③ Update the weight distribution of the training samples for the next iteration, so that the weight of a misclassified sample increases while the weight of a correctly classified sample decreases:
$$D_{m+1} = \left(w_{m+1,1}, w_{m+1,2}, \ldots, w_{m+1,N}\right), \qquad w_{m+1,i} = \frac{w_{mi}}{Z_m}\exp\left(-\alpha_m y_i G_m(x_i)\right),$$
where $D_{m+1}$ is the sample weight distribution used in the next iteration, $w_{m+1,i}$ is the weight of the ith sample in the next iteration, $Z_m = \sum_{i=1}^{N} w_{mi}\exp\left(-\alpha_m y_i G_m(x_i)\right)$ is the normalization factor, $y_i$ is the class (1/−1) of the ith sample $x_i$, and $G_m(x_i)$ is the classification result (1/−1) of the weak classifier on the ith sample $x_i$. If the classification is correct, the value of $y_i G_m(x_i)$ is 1; otherwise, it is −1.
(3) Combine the weak classifiers to obtain a strong classifier:
① Weighted sum of all iterated classifiers:
$$f(x) = \sum_{m=1}^{n} \alpha_m G_m(x).$$
② Apply the sign function to the sum and obtain the final strong classifier $G(x)$:
$$G(x) = \operatorname{sign}\left(f(x)\right) = \operatorname{sign}\left(\sum_{m=1}^{n} \alpha_m G_m(x)\right).$$
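The three steps above can be traced in a compact sketch. It is an illustration of the textbook algorithm, not the classifier used in the experiments: the weak learner is assumed to be a one-feature threshold stump, all function names are hypothetical, labels are taken in {−1, 1} (labels in {0, 1} can be mapped with 2y − 1), and no learning rate is applied.

```python
import numpy as np

def stump_predict(X, j, thr, pol):
    """Threshold stump: +1 where pol * (X[:, j] - thr) >= 0, otherwise -1."""
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error e_m."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, pol) != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost_fit(X, y, n_rounds=10):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))              # step (1): uniform weights
    model = []
    for _ in range(n_rounds):                      # step (2): boosting rounds
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)       # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)      # classifier coefficient
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, pol))
        w /= w.sum()                               # divide by Z_m
        model.append((alpha, j, thr, pol))
    return model

def adaboost_predict(model, X):
    X = np.asarray(X, dtype=float)
    f = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in model)
    return np.where(f >= 0, 1, -1)                 # step (3): sign of weighted sum

# Toy 1-D problem that no single stump can solve.
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])
model = adaboost_fit(X_toy, y_toy, n_rounds=3)
pred = adaboost_predict(model, X_toy)
print(pred)
```

On this toy problem the best single stump has a weighted error of 0.3, yet the weighted vote of three stumps classifies all ten training points correctly, which illustrates how reweighting steers later weak classifiers toward the earlier mistakes.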
After the implementation and parameter adjustment of the classifier, we set the number of iterations of the weak learner as 100 and learning rate as 0.05, which can achieve the best performance.
2.4.3. Testing AdaBoost Classifier and Evaluation
After training the AdaBoost classifier, the data of the test set are sent to the classifier trained in Section 2.4.2. Based on 10-fold cross-validation [42], binary classification tests are performed on the four emotional dimensions of valence, arousal, dominance, and liking, and comparison experiments are performed using random forest and XGBoost classifiers. For the main experiment and the comparison experiments, the following five performance indicators are investigated, together with the confusion matrix: accuracy, precision, recall, F1-score, and area under curve (AUC). We also plot the results in order to evaluate the performance of the model from multiple angles.
Confusion matrix: under the classification task, there are many different combinations of predicted results and real results, and the corresponding matrix of these combination results is the confusion matrix. In the binary classification task, there are 2 ∗ 2 = 4 different combinations of the above results, which are denoted as TP, FP, FN, and TN. Thus, the confusion matrix, as shown in Table 3, can be obtained.
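Performance indicators:
(1) Accuracy: $$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
(2) Precision: $$\mathrm{Precision} = \frac{TP}{TP + FP}$$
(3) Recall: $$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(4) F1‐score: $$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
(5) AUC: the area under the receiver operating characteristic curve.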
The higher these indicators are, the better the performance of the model is.
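The first four indicators follow directly from the confusion-matrix counts, as in the short sketch below. The function names and the example labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for labels in {0, 1}, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

counts = confusion_counts([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
example = binary_metrics(40, 10, 20, 30)
print(counts, example)
```

With TP = 40, FP = 10, FN = 20, and TN = 30, the accuracy is 0.70, the precision 0.80, the recall about 0.667, and the F1-score about 0.727.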
3. Results
The experimental process is described in Section 2. To enhance the reliability of the results and better evaluate the performance of the proposed method, random forest and XGBoost classifiers were also trained alongside the AdaBoost classifier as comparison experiments.
The final results are shown in Table 4 and Figures 6–11. The results show that the proposed method improves performance on all four emotional dimensions of valence, arousal, dominance, and liking. On the valence and arousal dimensions, the improvement is most obvious. On the dominance dimension, the three classifiers achieve the best performance on average. Although the overall performance on the liking dimension is lower than on the other three dimensions, the AdaBoost classifier still achieves a slight improvement. Therefore, AdaBoost, which is based on ensemble learning, is a method worth adopting more widely.






The results of the proposed method have been compared with those of several existing methods that used the DEAP data set, as shown in Table 5. The proposed method represents a significant improvement over the existing methods on all four emotional dimensions. The main reasons can be summarized in two points. First, the feature types are chosen comprehensively: the proposed method jointly considers time domain, time-frequency domain, and nonlinear dynamic features, and the latter two in particular help the model to find better emotional information in EEG signals. Second, the trained AdaBoost classifier focuses on the misclassified samples, which improves the generalization ability of the classifier and helps to avoid overfitting, and this also improves the performance of emotion recognition.
4. Conclusion
To sum up, in this paper, we propose a method of emotion recognition from EEG signals based on the ensemble learning method AdaBoost. After data preprocessing, feature extraction, feature selection, binary classification, and performance evaluation on the DEAP data set on the four emotional dimensions of valence, arousal, dominance, and liking, we find that the performance of the proposed method is significantly improved compared with the existing methods. The best average accuracy is 88.70% on the dominance dimension. This indicates that ensemble learning is worth adopting more widely and has high practical and research value. In the future, we will continue to tune the parameters and optimize the model structure of the proposed method, and we hope that this method will help in exploring new methods for emotion recognition that can help to diagnose a person’s mental state from their EEG signals.
Data Availability
All data used in this study are available upon request from the corresponding authors.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publishing of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61300098), Natural Science Foundation of Heilongjiang Province (F201347), and Fundamental Research Funds for the Central Universities (2572015DY07).