Abstract

Microexpression recognition has attracted wide research interest owing to its many potential applications, such as business negotiation and lie detection. Cross-database microexpression recognition is more challenging and more attractive than conventional microexpression recognition because the training and testing samples come from different databases, so their feature distributions differ substantially. As a result, methods that perform well in the within-database setting often fail to achieve the desired effect. In this paper, we address this problem with Subspace Learning and Joint Distribution Adaptation (SLJDA): the source and target domains are projected into their respective subspaces, the distance between the subspaces is reduced, and the discrepancies between the marginal and conditional probability distributions of the source- and target-domain data are minimized. To evaluate its performance, extensive cross-database experiments are performed on the SMIC and CASMEII databases. The experimental results show the superiority of the method over existing microexpression recognition methods.

1. Introduction

Microexpression is a special, subtle facial expression that usually lasts only 1/25 to 1/5 of a second [1]. Different from ordinary facial expressions, microexpressions are spontaneous expressions that occur when people try to hide their inner emotions [2]. In other words, microexpressions can reveal a person's true feelings. Therefore, automatic microexpression recognition technology can be applied in many practical scenarios such as marital relationship prediction [3], clinical diagnosis, and teaching assessment [4, 5], and it may in the future be extended to communication security [6–9], intelligent devices, and other areas.

In recent years, with the development of intelligent technology, researchers have gradually turned their attention to microexpression recognition, and the field has made considerable progress. However, in most existing work both the training and testing samples come from the same microexpression database. This is not the case in practical applications, where the training and testing samples come from different databases. Databases differ in recording environment, camera equipment, and subject gender and age, among other factors. These differences destroy the consistency of feature distributions that traditional microexpression recognition methods assume. Therefore, methods with good within-database performance often fail to achieve the ideal effect in many practical applications. To solve this problem, it is necessary to study cross-database microexpression recognition, in which the training set (source domain) and the testing set (target domain) come from different databases.

Yan et al. [10] classified cross-database facial expression recognition into two cases, semisupervised and unsupervised. Similarly, according to the presence or absence of label information in the target domain, cross-database microexpression recognition (CDMER) can also be divided into two categories: unsupervised and semisupervised. In the unsupervised setting, only the sample label information of the source domain is available for training, while the semisupervised setting can additionally exploit a small amount of label information from the target domain. This paper focuses on unsupervised cross-database microexpression recognition.

Domain adaptation is a branch of transfer learning [11] that aims to reduce the distribution difference between source- and target-domain data under the assumption that the two domains share the same feature space and category space but have different feature distributions. In the past two years, Zong et al. [12] were the first to apply a domain adaptation method to cross-database microexpression recognition and achieved good results. Current transfer learning methods can be roughly divided into three categories: data distribution adaptation [13–17], feature selection [18, 19], and subspace learning [20–23]. Classical work in the first category includes Transfer Component Analysis (TCA) [13], which finds a feature mapping that preserves the original information of the source and target domains while bringing the distributions of the two domains closer after the mapping; however, such a feature mapping may not exist. A representative feature-selection-based approach, Structural Correspondence Learning (SCL) [19], assumes the existence of features shared by the source and target domains and uses these shared features for modeling; however, when the characteristics of the two domains are very different, it can fail. Among subspace learning methods, Subspace Alignment (SA) [20] directly seeks a linear transformation: the source and target domains are projected into subspaces, and the transformation aligns the two subspaces. However, such methods tend to ignore the distribution shift between the projected data in the two subspaces. Facing this challenge, we integrate the subspace learning method with the data distribution adaptation method, yielding Subspace Learning and Joint Distribution Adaptation (SLJDA).
First, the source and target domains are mapped into their respective subspaces. We then maximize the interclass distance and minimize the intraclass distance to preserve the discriminative features of the source-domain data, maximize the variance of the target-domain samples, minimize the difference between the distributions of the projected source- and target-domain data, and minimize the distance between the two subspaces. Together, these steps reduce the difference between the source and target domains and improve unsupervised cross-database microexpression recognition.
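As a concrete point of reference, the SA baseline discussed above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (PCA bases obtained via SVD, rows as samples, our own function names), not the implementation used in the cited work.

```python
import numpy as np

def pca_basis(X, k):
    """Top-k principal directions (as columns) of the row-per-sample matrix X."""
    Xc = X - X.mean(axis=0)
    # right singular vectors of the centered data are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T  # shape (d, k)

def subspace_alignment(Xs, Xt, k):
    """Subspace Alignment: align the source PCA basis to the target
    PCA basis with the k x k matrix M = Ps^T Pt, then project."""
    Ps, Pt = pca_basis(Xs, k), pca_basis(Xt, k)
    M = Ps.T @ Pt                 # alignment matrix between the two bases
    Zs = Xs @ Ps @ M              # source data in the aligned subspace
    Zt = Xt @ Pt                  # target data in its own subspace
    return Zs, Zt
```

Note that SA only aligns the bases; it leaves any residual distribution shift inside the aligned subspace untouched, which is exactly the gap SLJDA targets.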

In general, this paper makes the following main contributions:

(1) Taking the practical situation into account, the subspace-learning-based method and the data-distribution-based adaptation method are combined to reduce the difference between the source and target domains. The original information is preserved, there is no need to search for a feature mapping that may not exist, the distribution shift of the data in the projected subspaces is reduced, and the geometric distance between the subspaces is shortened.

(2) We conduct extensive CDMER experiments to assess the performance of the SLJDA approach.

The rest of this paper is arranged as follows. Section 2 reviews recent research on unsupervised cross-database microexpression recognition. In Section 3, we describe SLJDA for CDMER in detail. To evaluate the method, extensive experiments and analyses on SMIC and CASMEII are presented in Section 4. Finally, Section 5 concludes the paper and discusses future work.

2. Related Work

As mentioned above, most existing cross-database emotion recognition problems (including cross-database speech emotion recognition, facial expression recognition, and microexpression recognition) address the mismatch of feature distributions between the source and target domains with domain adaptation (DA) methods. In the past two years, Zong et al. [12] were the first to propose a domain adaptation approach for this task: the Target Sample Re-Generator (TSRG) learns a sample regenerator for the source- and target-domain samples so that both share the same feature distribution, while the regenerated source-domain microexpression samples remain unchanged in the feature space. Li et al. [24] proposed the Target-Adapted Least-Squares Regression (TALSR) method, which learns a regression coefficient matrix from the source-domain data and makes the matrix fit the target database.

Before this, many researchers focused on cross-database facial expression recognition and cross-database speech recognition, which are closely related to cross-database microexpression recognition, and put forward many effective methods. For example, Zhu et al. [25] first proposed using domain adaptation to solve the facial expression recognition problem and achieved good results. In the work of [26], Transductive Transfer Regularized Least-Squares Regression (TTRLSR) was proposed for cross-database facial expression recognition. Its core idea is to learn a discriminant subspace and use the fully labeled source domain together with a dataset selected from the target-domain samples to predict the facial expression categories in the target domain. Chu et al. [27, 28] proposed the Selective Transfer Machine (STM), which uses target samples to learn a set of weights for the source samples so that the weighted source samples and the target samples have the same or similar feature distributions; the learned classifier is then used to predict the labels of the target-domain samples. In addition, Yan et al. [10] proposed Unsupervised Domain-Adaptive Dictionary Learning (UDADL), which learns a dictionary from the source- and target-domain samples and solves cross-database facial expression recognition using the idea of a common subspace. In the Domain-Adaptive Subspace Learning (DoSL) model [29], the source- and target-domain samples are mapped into a common subspace by a learned mapping matrix, and the regenerated samples of both domains follow the same or similar feature distributions.

In early research on cross-database speech emotion recognition, a series of effective methods also appeared. For example, Hassan et al. [30] proposed the importance-weighted support vector machine (IW-SVM) to handle cross-database speech emotion recognition. The method combines three domain adaptation techniques, namely, kernel mean matching (KMM) [31], the Kullback–Leibler importance estimation procedure (KLIEP) [32], and unconstrained least-squares importance fitting (uLSIF) [33], to alleviate the mismatch between the feature distributions of the source and target domains. Zong et al. [34] introduced the domain-adaptive least-squares regression (DALSR) method, which learns a regression coefficient matrix using the source-domain samples and an auxiliary set selected from the target database. At the same time, the means and variances of the regressed speech samples in the source and target domains are constrained so that the samples of the two domains have similar feature distributions.

3. Subspace Learning and Joint Distribution Adaptation

3.1. Problem Definition

Assume that $n_s$ and $n_t$ denote the numbers of samples in the source and target domains, respectively, $d$ represents the dimensionality of the microexpression feature vector, and $X_s \in \mathbb{R}^{d \times n_s}$ and $X_t \in \mathbb{R}^{d \times n_t}$ are the microexpression samples in the source and target domains. Given a labeled source domain $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and an unlabeled target domain $\mathcal{D}_t = \{x_j^t\}_{j=1}^{n_t}$, assume that the feature space and the category space are the same, that is, $\mathcal{X}_s = \mathcal{X}_t$ and $\mathcal{Y}_s = \mathcal{Y}_t$, while the marginal distributions are different, $P_s(x^s) \neq P_t(x^t)$. Moreover, the conditional mapping relationships also differ between the source and target domains, $P_s(y^s \mid x^s) \neq P_t(y^t \mid x^t)$.

For better cross-database MER, we fuse data distribution and subspace learning methods to reduce domain differences. By mapping the source and target domains to the corresponding subspaces, we then maximize the target domain sample variance and preserve the source domain discriminative features while minimizing the two subspace differences and the subspace data distribution differences. The details are described next.

3.2. Subspace Learning and Joint Distribution Adaptation

Since the source domain contains not only sample information but also label information, we should make full use of the discriminative features of the source domain so that different classes are separated as well as possible. By minimizing the intraclass scatter and maximizing the interclass scatter, similar samples in the source domain are drawn close to each other, while samples of different classes are pushed as far apart as possible, which facilitates learning a well-performing classifier. The intraclass scatter is minimized as

$$\min_{P_s} \operatorname{tr}\big(P_s^{\top} S_w P_s\big), \quad (1)$$

$$S_w = \sum_{c=1}^{C} X_s^{(c)} H_c X_s^{(c)\top}, \quad (2)$$

where $S_w$ is the within-class scatter matrix, $X_s^{(c)}$ and $H_c = I_{n_c} - \frac{1}{n_c}\mathbf{1}_{n_c}\mathbf{1}_{n_c}^{\top}$ are the sample matrix and the centering matrix of class $c$ in the source domain, $n_c$ is the number of samples of class $c$, $\mathbf{1}_{n_c}$ is a vector of all ones, and $I_{n_c}$ is the identity matrix. The interclass scatter is maximized as

$$\max_{P_s} \operatorname{tr}\big(P_s^{\top} S_b P_s\big), \quad (3)$$

$$S_b = \sum_{c=1}^{C} n_c \big(m_c - \bar{m}\big)\big(m_c - \bar{m}\big)^{\top}, \quad (4)$$

where $S_b$ is the interclass scatter matrix, $m_c$ is the sample mean of class $c$ in the source domain, and $\bar{m}$ is the sample mean of the whole source domain.
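The within-class and interclass scatter matrices can be computed directly from their definitions. The sketch below is our own NumPy illustration (samples stored one per column, integer class labels; the function name is ours, not from the paper):

```python
import numpy as np

def scatter_matrices(Xs, ys):
    """Within-class (S_w) and between-class (S_b) scatter of source
    samples; Xs holds one sample per column (shape d x n_s)."""
    d, n = Xs.shape
    mean_all = Xs.mean(axis=1, keepdims=True)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(ys):
        Xc = Xs[:, ys == c]                     # samples of class c
        nc = Xc.shape[1]
        mc = Xc.mean(axis=1, keepdims=True)     # class mean
        Sw += (Xc - mc) @ (Xc - mc).T           # intraclass spread
        Sb += nc * (mc - mean_all) @ (mc - mean_all).T  # interclass spread
    return Sw, Sb
```

A useful sanity check is that $S_w + S_b$ equals the total scatter of the (mean-centered) source data.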

In order to preserve the data properties relevant to classification in the target domain as much as possible, it is necessary to project the features onto the relevant dimensions and therefore to maximize the target-domain variance in the learned subspace [35].

Maximize the target variance as follows:

$$\max_{P_t} \operatorname{tr}\big(P_t^{\top} X_t H_t X_t^{\top} P_t\big), \quad (5)$$

where $H_t = I_{n_t} - \frac{1}{n_t}\mathbf{1}_{n_t}\mathbf{1}_{n_t}^{\top}$ is the centering matrix and $\mathbf{1}_{n_t}$ is a vector of all ones.
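For illustration, the target-variance term $X_t H_t X_t^{\top}$ and its centering matrix can be written as a short NumPy function (our own sketch; one sample per column):

```python
import numpy as np

def target_scatter(Xt):
    """Target scatter S_t = X_t H_t X_t^T, with the centering matrix
    H_t = I - (1/n) 1 1^T; Xt has shape (d, n_t)."""
    n = Xt.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n   # idempotent centering matrix
    return Xt @ H @ Xt.T
```

Because $H_t$ is idempotent, this equals the scatter of the mean-centered target samples, which makes a convenient correctness check.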

Similar in principle to data-centric domain adaptation methods, we measure the difference between the two domains by the maximum mean discrepancy (MMD), minimizing the distance between the domain means in the $k$-dimensional embedding so as to reduce the marginal distribution difference:

$$\min_{P_s, P_t} \left\lVert \frac{1}{n_s} \sum_{i=1}^{n_s} P_s^{\top} x_i^{s} - \frac{1}{n_t} \sum_{j=1}^{n_t} P_t^{\top} x_j^{t} \right\rVert^2. \quad (6)$$

Because the labels of the target domain are unknown, its conditional distribution cannot be estimated directly, so the method proposed in [16] is usually adopted: pseudo labels are obtained by applying a classifier trained on the source domain directly to the target domain. Since some pseudo labels may be incorrect and thus degrade recognition, an iterative procedure is used to continuously improve the classification accuracy and thereby reduce the difference between the two conditional distributions. The conditional distribution difference is

$$\min_{P_s, P_t} \sum_{c=1}^{C} \left\lVert \frac{1}{n_s^{(c)}} \sum_{x_i^{s} \in X_s^{(c)}} P_s^{\top} x_i^{s} - \frac{1}{n_t^{(c)}} \sum_{x_j^{t} \in X_t^{(c)}} P_t^{\top} x_j^{t} \right\rVert^2, \quad (7)$$

where $X_t^{(c)}$ is the set of target-domain samples whose pseudo label is $c$ and $n_t^{(c)}$ is its size.
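A minimal sketch of the quantities involved is given below, assuming linear-kernel MMD estimates on already-projected, row-per-sample data and a nearest-centroid stand-in for the source classifier (the full method would re-solve the projections after each relabeling step); all function names are our own:

```python
import numpy as np

def mmd_marginal(Zs, Zt):
    """Squared distance between the projected domain means."""
    return float(np.sum((Zs.mean(axis=0) - Zt.mean(axis=0)) ** 2))

def mmd_conditional(Zs, ys, Zt, yt_pseudo):
    """Sum of per-class mean discrepancies, using target pseudo labels."""
    total = 0.0
    for c in np.unique(ys):
        if np.any(yt_pseudo == c):
            diff = Zs[ys == c].mean(axis=0) - Zt[yt_pseudo == c].mean(axis=0)
            total += float(np.sum(diff ** 2))
    return total

def assign_pseudo_labels(Zs, ys, Zt):
    """One relabeling step: assign each target sample the class of the
    nearest source-class centroid. In the iterative scheme, the
    projections would be re-learned after each such step."""
    classes = np.unique(ys)
    centroids = np.stack([Zs[ys == c].mean(axis=0) for c in classes])
    dists = ((Zt[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]
```

Both discrepancies shrink to zero when the two domains coincide class by class, which is the fixed point the iteration drives toward.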

Therefore, by combining equations (6) and (7), the overall distribution difference between the source and target domains is obtained:

$$\min_{P_s, P_t} \left\lVert \frac{1}{n_s} \sum_{i=1}^{n_s} P_s^{\top} x_i^{s} - \frac{1}{n_t} \sum_{j=1}^{n_t} P_t^{\top} x_j^{t} \right\rVert^2 + \sum_{c=1}^{C} \left\lVert \frac{1}{n_s^{(c)}} \sum_{x_i^{s} \in X_s^{(c)}} P_s^{\top} x_i^{s} - \frac{1}{n_t^{(c)}} \sum_{x_j^{t} \in X_t^{(c)}} P_t^{\top} x_j^{t} \right\rVert^2. \quad (8)$$

To simplify equation (8), we rewrite it in matrix form as

$$\min_{W} \operatorname{tr}\big(W^{\top} M W\big), \quad W = \begin{bmatrix} P_s \\ P_t \end{bmatrix}, \quad (9)$$

where

$$M = \begin{bmatrix} M_{ss} & M_{st} \\ M_{ts} & M_{tt} \end{bmatrix} \quad (10)$$

and the blocks $M_{ss}$, $M_{st}$, $M_{ts}$, and $M_{tt}$ are the MMD coefficient matrices accumulated from the marginal term in equation (6) and the per-class conditional terms in equation (7).

Unlike the subspace learning method SA, which finds an alignment matrix that aligns the source and target subspaces in order to reduce the distance between the source and target domains, our method maps each domain into its own subspace and optimizes $P_s$ and $P_t$ jointly while drawing the two subspaces closer together:

$$\min_{P_s, P_t} \lVert P_s - P_t \rVert_F^2. \quad (11)$$

3.3. Optimization Problem

By integrating the five equations (1), (3), (5), (9), and (11), we obtain the final SLJDA objective:

$$\min_{P_s, P_t} \frac{\operatorname{tr}\big(W^{\top} M W\big) + \lambda \lVert P_s - P_t \rVert_F^2 + \beta \operatorname{tr}\big(P_s^{\top} S_w P_s\big)}{\beta \operatorname{tr}\big(P_s^{\top} S_b P_s\big) + \mu \operatorname{tr}\big(P_t^{\top} X_t H_t X_t^{\top} P_t\big)}, \quad (12)$$

where $\beta$, $\mu$, and $\lambda$ are the hyperparameters that balance the terms. Furthermore, the two coupled projections $P_s$ and $P_t$ are obtained by solving the following optimization function:

$$\min_{W} \frac{\operatorname{tr}\left(W^{\top} \begin{bmatrix} M_{ss} + \lambda I + \beta S_w & M_{st} - \lambda I \\ M_{ts} - \lambda I & M_{tt} + \lambda I \end{bmatrix} W\right)}{\operatorname{tr}\left(W^{\top} \begin{bmatrix} \beta S_b & 0 \\ 0 & \mu X_t H_t X_t^{\top} \end{bmatrix} W\right)}. \quad (13)$$

Since $W$ can be rescaled without changing the value of equation (13), we can simply hold the denominator constant and minimize only the numerator. The final optimization problem thus becomes

$$\min_{W} \operatorname{tr}\big(W^{\top} B W\big) \quad \text{s.t.} \quad W^{\top} A W = I, \quad (14)$$

where $B$ and $A$ denote the numerator and denominator matrices of equation (13), respectively.

Denote by $\Phi$ the diagonal matrix of Lagrange multipliers; the Lagrange function for equation (14) is

$$L(W, \Phi) = \operatorname{tr}\big(W^{\top} B W\big) + \operatorname{tr}\big(\big(I - W^{\top} A W\big)\Phi\big). \quad (15)$$

Setting the derivative $\partial L / \partial W = 0$, we obtain

$$B W = A W \Phi, \quad (16)$$

where $\Phi$ collects the generalized eigenvalues. The matrix composed of the eigenvectors corresponding to the $k$ smallest eigenvalues of the generalized eigendecomposition in equation (16) gives $W$, and the subspaces $P_s$ and $P_t$ are then obtained as its top and bottom $d$ rows, respectively.
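The generalized eigenproblem in equation (16) reduces to a standard symmetric eigenproblem when the constraint matrix is symmetric positive definite. The following is an illustrative NumPy sketch (the function name and the Cholesky-based reduction are our own choices; a library routine such as SciPy's generalized `eigh` would serve the same purpose):

```python
import numpy as np

def solve_coupled_projections(A, B, d, k):
    """Solve B w = lambda A w for symmetric B and symmetric positive
    definite A, keep the k smallest eigenpairs, and split the stacked
    solution W (2d x k) into the two coupled projections."""
    L = np.linalg.cholesky(A)           # A = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ B @ Linv.T               # standard symmetric problem C v = lambda v
    evals, evecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    W = Linv.T @ evecs[:, :k]           # map back: w = L^{-T} v
    return W[:d], W[d:]                 # P_s (top d rows), P_t (bottom d rows)
```

In practice $A$ is often regularized (e.g., adding a small multiple of the identity) to guarantee the Cholesky factorization succeeds.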

4. Experiments

4.1. Microexpression Database

In this section, we conduct wide-ranging experiments on two well-known databases, CASMEII and SMIC, to evaluate the performance of the method comprehensively. CASMEII [36], established by the Institute of Psychology of the Chinese Academy of Sciences, requires subjects not only to watch videos that induce large mood swings but, more importantly, to simultaneously attempt to mask their emotions. The subjects' faces were recorded at a frame rate of 200 fps while they watched the videos. With this elicitation mechanism, 247 video sequences from 26 individuals were acquired, covering seven microexpression categories (Happy, Surprised, Disgusted, Depressed, Sad, Scared, Other). The SMIC database [37, 38] was established by the University of Oulu, Finland, using a similar induction mechanism to ensure the reliability of the data. It contains 164 video sequences of 16 individuals (10 males and 6 females) covering 3 microexpression categories (Positive, Surprised, Negative). SMIC comprises three datasets: SMIC (HS), SMIC (VIS), and SMIC (NIR). SMIC (VIS) and SMIC (NIR) were recorded with a visible-light (VIS) camera and a near-infrared (NIR) camera, respectively, at a frame rate of 25 fps, while SMIC (HS) was recorded with a 100 fps high-speed (HS) camera; the three are used as separate datasets in this experiment. As noted above, the SMIC and CASMEII databases have inconsistent microexpression categories. To address this problem, we selected and reannotated the categories in CASMEII: the Happy samples are relabeled Positive, the Surprised samples are kept unchanged, the Other category is deleted, and the Disgusted and Depressed samples are mapped to Negative. The details are shown in Table 1.
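The relabeling protocol just described amounts to a small lookup table. The sketch below is illustrative (the label strings and names are our own); categories without a mapping, such as Other, are simply dropped:

```python
# Mapping of CASMEII categories onto SMIC's three classes,
# following the protocol described in the text.
CASME2_TO_SMIC = {
    "Happy": "Positive",
    "Surprised": "Surprised",
    "Disgusted": "Negative",
    "Depressed": "Negative",
}

def remap_labels(labels):
    """Map CASMEII labels to SMIC classes; drop unmapped categories."""
    return [CASME2_TO_SMIC[lab] for lab in labels if lab in CASME2_TO_SMIC]
```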

4.2. Experimental Setup

For SMIC (VIS, HS, and NIR) and CASMEII, we designed 24 sets of cross-database microexpression experiments. Among them, no. 1–no. 6 and no. 13–no. 18 are treated as the first type of experiment, labeled type-1, and no. 7–no. 12 and no. 19–no. 24 are treated as the second type, labeled type-2. We use the LBP-TOP [39] and HIGO-TOP [40] spatiotemporal descriptors for feature extraction. For the former, the neighborhood radius R and the number of neighborhood points P are set to 1 and 8, respectively. HIGO-TOP has only one important parameter, P, which is set to 8. For both LBP-TOP and HIGO-TOP, the feature vectors of the three orthogonal planes are sequentially concatenated to form a supervector, that is, the microexpression facial feature vector.

As can be seen from Table 1, the CASMEII and SMIC databases have extremely unbalanced samples across categories. Therefore, in addition to the weighted average recall (WAR), we also use the unweighted average recall (UAR) [41] as a measurement criterion to compare the performance of the methods in the CDMER experiments. WAR is the overall recognition accuracy, whereas UAR is the average of the recognition rates computed per category, regardless of the number of samples in each category. Together, WAR and UAR provide a fairer measure of the true performance of a method when the categories are unbalanced.
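The two metrics as described can be computed in a few lines; here is a minimal NumPy sketch with our own function name:

```python
import numpy as np

def war_uar(y_true, y_pred):
    """WAR: overall accuracy; UAR: mean of per-class recalls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    war = float(np.mean(y_true == y_pred))
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    uar = float(np.mean(recalls))
    return war, uar
```

For example, with true labels [0, 0, 0, 1], a classifier that always predicts 0 attains WAR 0.75 but UAR only 0.5, which is exactly why UAR is the fairer score under class imbalance.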

To demonstrate the performance of SLJDA, we choose other domain adaptation methods that perform well in cross-database recognition for comparative experiments. These methods include the Support Vector Machine (SVM) [42], Domain Regeneration in the original Feature Space with unchanged Target domain (DRFS-T), Domain Regeneration in the original Label Space (DRLS), Target Sample Re-Generator (TSRG), SA, Joint Distribution Adaptation (JDA), and Transfer Joint Matching (TJM). The parameter settings of these methods in the CDMER experiments are as follows:

(1) For SVM, we set C = 1 (linear) and use it as a traditional method; the results of experiments conducted directly with SVM serve as the baseline for comparison.

(2) DRFS-T, DRLS, and TSRG involve two important parameters, whose best values are searched for following the original works.

(3) The parameters are set according to the authors' suggestion in [18], and JDA follows its original design, with the dimensionality reduction dimension chosen optimally. For SA, we traverse all possible subspace dimensions.

(4) Finally, for the SLJDA method in this paper, the hyperparameters are chosen by searching over candidate value sets.

4.3. Analysis of Experimental Results

Tables 2–5 show the results compared with currently better-performing cross-database methods; the bolded entries are the best-performing methods in each group of experiments. Observing the tables, we find that the SLJDA method performs best in 8 out of 12 groups of experiments. In addition, we reran feature extraction with HIGO-TOP and compared the results with the currently popular domain adaptation methods. As shown in Tables 6 and 7, more than half of the experimental results show that our method has a clear advantage over the other methods.

Observing the tables, we find that the experimental results of no. 7–no. 12 (type-2) and no. 19–no. 24 (type-2) are better than those of no. 1–no. 6 (type-1) and no. 13–no. 18 (type-1). The likely reason is that the differences between CASMEII and SMIC samples are greater than the differences among the three SMIC datasets: SMIC (HS), SMIC (VIS), and SMIC (NIR) differ only in the cameras used to record them, whereas CASMEII and SMIC differ in filming equipment, recording environment, ethnicity of the subjects, and so on. It is worth noting that, under the same conditions, SMIC (VIS) yields better results than SMIC (NIR), both as the source and as the target dataset, for example, in no. 3 and no. 5. Combined with Section 4.1, we attribute this to the difference in image quality.

In addition, an interesting phenomenon appears across all the methods covered in this paper: when CASMEII and SMIC (HS) are used as target domains, the difference between WAR and UAR is greater for the former than for the latter. For example, in Tables 2–5, the difference between WAR and UAR in no. 5 and no. 6 is significantly larger than that in no. 8 and no. 10. From Table 1, it can be observed that both the CASMEII and SMIC (HS) datasets are unbalanced across microexpression categories, with the former dominated by negative emotions (91/148), while the proportion of negative emotions in SMIC (HS) (70/164) is smaller than in CASMEII. This suggests that the discrepancy is due to category imbalance in the datasets. Further observation shows that the SMIC (NIR) and SMIC (VIS) databases have consistent proportions of microexpression categories; when SMIC (NIR) or SMIC (VIS) is the target domain (no. 7 or no. 11), the difference between WAR and UAR is smaller than when SMIC (HS) or CASMEII is the target domain. This further demonstrates that category imbalance in microexpression databases has a certain influence on MER.

5. Conclusion and Discussion

In this paper, motivated by practical situations, we integrate data distribution methods with subspace learning methods to reduce the differences between the source and target domains; that is, the SLJDA method is used to perform unsupervised cross-database microexpression recognition. Extensive experiments are conducted on the CASMEII and SMIC databases, and the results show that this method significantly outperforms other state-of-the-art domain adaptation methods on unsupervised CDMER tasks. However, some issues remain to be investigated. As shown above, category imbalance in microexpression databases is an important factor affecting CDMER results. Therefore, to improve recognition, future work should focus on reducing the impact of database category imbalance.

Data Availability

The data used to support this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Y. Zhang and Y. Liu jointly designed the study. Y. Liu collected and analyzed the data. G. Li and H. Peng reviewed and edited the manuscript.

Acknowledgments

This work was supported by the Key Research Project of Henan Higher Education Institution (Project no. 17B440001), Henan University of Science and Technology Enterprise Commissioning Project (JG017), and Henan Provincial Science and Technology Tackling Program Grant Project 212102210504.