Abstract

Many architectures of face recognition modules have been developed to tackle the challenges posed by varying environmental constraints such as illumination, occlusions, pose, and expressions. These recognition systems have mainly focused on a single constraint at a time and have achieved remarkable successes. However, the presence of multiple constraints may deteriorate the performance of these face recognition systems. In this study, we assessed the performance of the Principal Component Analysis and Singular Value Decomposition algorithm with Discrete Wavelet Transform preprocessing (DWT-PCA/SVD) for face recognition under multiple constraints (partially occluded face images acquired with varying expressions). Numerical evaluation of the study algorithm gave reasonable average recognition rates of 77.31% and 76.85% for left and right reconstructed face images with varying expressions, respectively. A statistically significant difference was established between the average recognition distances of the left and right reconstructed face images acquired with varying expressions using a pairwise comparison test. The post hoc analysis using the Bonferroni simultaneous confidence interval revealed that the significant difference established through the pairwise comparison test was mainly due to the sad expressions. Although the performance of the DWT-PCA/SVD algorithm declined as compared to its performance on single constraints, the algorithm attained an appreciable performance level under multiple constraints. The DWT-PCA/SVD recognition algorithm performs reasonably well when partial occlusion with varying expressions is the underlying constraint.

1. Introduction

With the emergence of an era of unprecedented technological advancements and greater electronic access to confidential information, individuals, organizations, and nations at large are in search of more secure ways of protecting their data and assets. Among the safest and most proven are biometric methods, which include fingerprint detection and face recognition.

According to Dharavath et al. [1], face recognition has stood out in recent times as a unique method of biometric recognition and is greatly sought after and implemented by industries and governments. This is largely because face recognition systems require minimal or no cooperation of subjects as compared to other identification methods based on biometrics such as fingerprint, iris, voice, and hand geometry.

A typical architecture of a face recognition system involves image acquisition, preprocessing, feature extraction, and recognition. In well-controlled environments such as laboratories, there is little concern with acquisition of degraded gallery images and one has prior knowledge of the existing illumination, pose, expressions, and occlusion conditions. However, test images are usually obtained from uncontrolled environments with the presence of one or a combination of the aforementioned conditions and therefore subject to the presence of noise and image degradation. These constraints inhibit the performance of face recognition algorithms.

Turk and Pentland [2] proposed a PCA-based approach (the eigenface method) for face recognition and showed that the approach could achieve reliable real-time recognition in minimally constrained environments.

According to Ekenel and Stiefelhagen [3], the performance of face recognition algorithms drops significantly when run against occluded face images (upper and lower face occlusions). They discovered that registration errors account for a significant part of this decline, with the loss of discriminative features contributing least to the drop in the algorithms' performance.

In recent years, sparse-based representation of faces has emerged as a successful approach to building face recognition systems robust to variations in facial expressions and occlusions. Wang et al. [4] applied sparse-based classification to study facial expression recognition (FER) and demonstrated that incorporating appearance-based features could improve the performance of FER. However, according to Lee et al. [5], their approach did not consider intraclass variation in appearance; hence, such sparse representations are likely to contain coefficients corresponding to both similar expression and similar appearance, which is not desirable in FER. Moreover, they asserted that the use of appearance-based features may not be straightforward.

Yang et al. [6] proposed a regularized robust coding (RRC) model for recognizing faces by regressing a given signal with regularized regression coefficients. They showed through a comparative study that their method is more effective and robust to occlusions, pixel corruption, disguises, and large expression variations.

Although many architectures of face recognition systems have evolved to tackle the challenges posed by illumination variations, occlusions, poses, and expressions [7], most of them [8–11] have focused on single constraints at a time. According to Drira et al. [12], multiple constraints such as occlusions and facial expressions deteriorate the performance of face recognition algorithms significantly. Specifically, human facial expressions convey information such as one's state of health and emotional disposition, and they are usually a nonverbal means of communication in several cultures. The dynamic nature of facial expressions, coupled with their inherent intrasubject variability, makes the recognition of faces under varying expressions a challenging task [13].

Also, facial occlusions can take the form of partial or near-total blockade, corruption of pixels, or the presence of random missing values. Approaches used in the case of partial occlusions in face images include occlusion-insensitive, local matching, and reconstruction methods [14].

Asiedu et al. [15] assessed the performance of the Principal Component Analysis and Singular Value Decomposition algorithm with Discrete Wavelet Transform preprocessing (DWT-PCA/SVD) on a reconstructed face image database. The reconstruction of the half-face images was done by leveraging the property of bilateral symmetry of frontal faces. They found the DWT-PCA/SVD algorithm to be suitable for recognizing face images under partial occlusion.

Despite the remarkable results attained by most of these methods, it is obvious that there is a trade-off between the complexity of the architecture of such recognition systems and their efficiency. Also, since most of these algorithms are assessed on single constraints, their performances cannot be guaranteed under multiple constraints.

In this study, we explore the performance of the DWT-PCA/SVD algorithm under multiple constraints (partial occlusion and varying facial expressions).

The rest of the paper is organized as follows: Section 2 (material and methods) discusses the data acquisition process, the adopted statistical or mathematical methods, the research design, and implementation. In Section 3 (results and discussion), we present the results of the experimental runs and evaluation of the algorithm. We examine the findings of the study in comparison with existing works in literature and finally conclude by summarizing the overall achievements of the study with some recommendation and directions for future developments in Section 4 (conclusion and recommendation).

2. Materials and Methods

2.1. Source of Data

The Cohn-Kanade AU-Coded Facial Expression (CKFE) and Japanese Female Facial Expression (JAFFE) databases were adopted for experimental runs of the study algorithm.

The CKFE and JAFFE databases contain frontal face images of twenty-six (26) and ten (10) subjects, respectively, captured across the seven universally accepted principal emotions (neutral, angry, disgust, fear, sad, surprise, and happy). The neutral expressions of the thirty-six (36) individuals are captured into the train-image database for training of the study algorithm. Figure 1 shows the face images of subjects in the train-image database from CKFE and JAFFE.

Half-face (partially occluded) images acquired across the six universal principal emotions (angry, disgust, fear, sad, surprise, and happy) for each of the 36 individuals are captured into the test-image database. It is worth noting that these images are captured under multiple constraints (partial occlusion and varying expressions).

For the purpose of this study, half-face occluded images (total blockage of the left or right portion of the face) are referred to as partially occluded face images. Therefore, the constraints under study are as follows:
(i) Half-face (partially occluded) images (total blockage of the left or right portion of the face)
(ii) Varying expressions

By the term multiple constraints, we mean half-face images (partially occluded face images) acquired under varying expressions.

The form of occlusion used in the study is synthetic and was created from the frontal face image databases used to benchmark the face recognition system. Figures 2 and 3 contain samples of subjects from CKFE and JAFFE captured into the test-image databases.

Specifically, Figure 2 shows a sample of left half-face images acquired under varying expressions from CKFE and JAFFE databases (test-image database 1). Figure 3 contains right half-face images acquired under varying expressions from CKFE and JAFFE databases (test-image database 2).

All the images in the train-image database and the test-image databases were resized to fixed, uniform dimensions. The images were digitized into gray-scale precision for preprocessing. This makes the matrices conformable and eases computation.

2.2. Image Reconstruction

In this section, we adopt the reconstruction method introduced by Asiedu et al. [8], which leverages the bilateral symmetry of frontal faces. For a left half-face image $I_L$, the reconstruction is done as follows:
(i) Rotate the left half-face image $I_L$ through $90^\circ$ and denote it as $I_L^{r}$
(ii) Transpose $I_L^{r}$ and denote it as $I_L^{m} = (I_L^{r})^{T}$; this is the mirror image (horizontal flip) of $I_L$
(iii) Concatenate $I_L$ and $I_L^{m}$ along the vertical midline as

$$ I_F = \left[\, I_L \;\; I_L^{m} \,\right]. \tag{1} $$

Similarly, for a right half-face image $I_R$, the reconstruction is performed using the following steps:
(i) Rotate the right half-face image $I_R$ through $90^\circ$ and denote it as $I_R^{r}$
(ii) Transpose $I_R^{r}$ and denote it as $I_R^{m} = (I_R^{r})^{T}$, the mirror image (horizontal flip) of $I_R$
(iii) Concatenate $I_R^{m}$ and $I_R$ as

$$ I_F = \left[\, I_R^{m} \;\; I_R \,\right]. \tag{2} $$

Figure 4 shows the left and right reconstructed images acquired with varying expressions, respectively.
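As an illustration, the following is a minimal NumPy sketch of this reconstruction. The function names are ours; the rotate-then-transpose step is written out explicitly to show that it is equivalent to a horizontal flip of the half-face.

```python
import numpy as np

def reconstruct_from_left(left_half: np.ndarray) -> np.ndarray:
    """Reconstruct a full face from a left half-face image.

    The mirror image is obtained by rotating the half-face through
    90 degrees and transposing the result, which is equivalent to a
    horizontal flip (np.fliplr). The half-face and its mirror are then
    concatenated along the vertical midline (equation (1)).
    """
    mirror = np.rot90(left_half).T        # same as np.fliplr(left_half)
    return np.concatenate([left_half, mirror], axis=1)

def reconstruct_from_right(right_half: np.ndarray) -> np.ndarray:
    """Reconstruct a full face from a right half-face image (equation (2))."""
    mirror = np.rot90(right_half).T       # horizontal flip of the right half
    return np.concatenate([mirror, right_half], axis=1)

# Example: a 4x2 "half-face" becomes a 4x4 bilaterally symmetric image.
half = np.arange(8).reshape(4, 2)
full = reconstruct_from_left(half)
assert np.allclose(full[:, :2], np.fliplr(full[:, 2:]))
```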

2.3. Research Design

The recognition system shown in Figure 5 depicts the stages in the recognition of face images under multiple constraints (partial occlusions and varying expressions). Generally, a recognition module consists of preprocessing, feature extraction, and recognition phases. When subjects in the train-image database are sent to the recognition module, they are first preprocessed using Discrete Wavelet Transform (DWT) and mean centering mechanisms. The preprocessed images are passed to the feature extraction compartment. During feature extraction, unique features are extracted using the Principal Component Analysis and Singular Value Decomposition (PCA/SVD) algorithm. The extracted features are stored in memory to be used for recognition.

As stated earlier, there are two test-image databases under study: left half-face images acquired under varying expressions (test-image database 1 shown in Figure 2) and right half-face images acquired under varying expressions (test-image database 2 shown in Figure 3). When an unknown face image from either of the test-image databases is presented for recognition, the image is first reconstructed, then preprocessed (using DWT and mean centering mechanisms), and passed to the feature extraction unit. Unique features of the test image are extracted and passed to a classifier to be matched with the stored knowledge from the train-image database. The recognition phase begins with matching the extracted unique features from the test image to the knowledge created from the train images. During classification, the minimum recognition distance is preferred as it signifies a closer match.

2.4. Preprocessing

According to Dharavath et al. [1], image processing techniques enhance the quality of a captured image and improve the recognition rate. They further stated that, irrespective of an image's resolution and illumination and of the particular preprocessing technique used, image preprocessing has an unequivocally significant impact on face recognition.

Various processing techniques have been introduced to enhance captured images before they are taken through a feature extraction process for identification. These include face detection and cropping, image resizing, image normalization, and image denoising and filtering. Among the existing image enhancement procedures, filtering techniques have become very popular over the years for addressing the problem of noise removal and edge enhancement [16, 17].

In this study, mean centering and Discrete Wavelet Transform (DWT) mechanisms were adopted as preprocessing mechanisms to remove acquired noise and suppress unwanted distortion of image features. The DWT and mean centering mechanisms are discussed in Subsection 2.4.1 and Subsection 2.4.2, respectively.

2.4.1. Discrete Wavelet Transform (DWT)

Wavelet-based transforms have been successfully employed to convert signals from the spatial domain into the frequency domain in order to determine the essential components or a subset of features required for recognition, particularly in the areas of data compression and image denoising.

According to Wadkar et al. [18], wavelet-based transforms derive their utility from their complete theoretical framework, the flexibility in the choice of bases, and computational efficiency. Discrete Wavelet Transforms perform a multiresolution analysis of a face image within both time and frequency domains, and as such, both spatial and frequency characteristics of the image are preserved.

Given an input face image, DWT decomposes the image into four subbands: LL, LH, HL, and HH representing the approximation coefficients, horizontal, vertical, and diagonal components, respectively. The LL subband represents the lower resolution estimate of the original image and contains the global information of the face whilst the LH, HL, and HH subbands represent local feature information such as the eye, mouth, and edge information [9, 19].

Among the four subbands, the LL subband is least susceptible to noise or unwanted signals, making it the most stable subband, and can therefore be further decomposed to obtain finer details, whilst the HH subband is most susceptible to noise, rendering it the most unstable subband. The HL subband is particularly useful in facial recognition problems involving expressions (as it contains facial expression features), whilst the LH subband contains information on facial poses.

In DWT, one can choose among different wavelets depending on the characteristics of the image that are of interest. Notable among these are the Haar, Morlet, Morse, Daubechies, Coiflet, and Symlet. Jyotsna et al. [20] conducted experiments for DWT wavelet selection on the Symlet family and on four other wavelet families (Daubechies, Coiflets, Discrete Meyer, and Biorthogonal). They found the Symlet-6 wavelet to be the best, recording the highest accuracy of 96.67% among the Symlet family members. The mean computational time of the Symlet-6 wavelet was found to be 3.69 seconds, which was slightly higher than the mean computational time for the Haar wavelet (3.26 seconds).

In this study, we adopted the Haar wavelet due to its orthogonal property which aids in the preservation of distances after transformation and also because of its computational simplicity.

Now, consider a vectorized image $f = (f_1, f_2, \ldots, f_N)$ of dimension $1 \times N$, with $N$ even. Then, the single-level Haar transform decomposes $f$ into two signals, $a$ (mean coefficient row vector) and $d$ (detail coefficient row vector), each of length $N/2$. The components of $a$ (mean coefficient row vector) and $d$ (detail coefficient row vector) are $a_k$ and $d_k$; $k = 1, 2, \ldots, N/2$, respectively, and are expressed in matrix form as

$$ \begin{bmatrix} a_k \\ d_k \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} f_{2k-1} \\ f_{2k} \end{bmatrix}. \tag{3} $$

From equation (3), the components of the mean coefficient vector $a$ are given as

$$ a_k = \frac{f_{2k-1} + f_{2k}}{\sqrt{2}}, \quad k = 1, 2, \ldots, \frac{N}{2}, \tag{4} $$

and the components of the detail coefficient vector $d$ are

$$ d_k = \frac{f_{2k-1} - f_{2k}}{\sqrt{2}}, \quad k = 1, 2, \ldots, \frac{N}{2}. \tag{5} $$

Next is to concatenate the vectors $a$ and $d$ into another $N$-vector, $w = (a \mid d)$, which is a linear matrix transformation of $f$ given as

$$ w = \left( a_1, \ldots, a_{N/2}, \, d_1, \ldots, d_{N/2} \right) = f\,T, \tag{6} $$

where $T$ is the orthogonal $N \times N$ single-level Haar transformation matrix.

The transformation exposes the hidden noise in the image for filtering. In this study, we adopted the Gaussian filter, because the Gaussian mixture is isotropic and can represent data distributions by a mean vector and a covariance matrix [21]. Most importantly, Gaussian noise is the default noise acquired due to illumination variations.

After filtering, the vector $\tilde{w}$ is inverted back to the denoised image vector $\tilde{f}$. The inverse transformation of the vector can be represented in matrix form as

$$ \tilde{f} = \tilde{w}\,T^{-1} = \tilde{w}\,T^{T}, \tag{7} $$

since $T$ is orthogonal. The components $f_{2k-1}$ and $f_{2k}$ can be explicitly written as shown in equations (8) and (9), respectively:

$$ f_{2k-1} = \frac{a_k + d_k}{\sqrt{2}}, \quad k = 1, 2, \ldots, \frac{N}{2}, \tag{8} $$

$$ f_{2k} = \frac{a_k - d_k}{\sqrt{2}}, \quad k = 1, 2, \ldots, \frac{N}{2}. \tag{9} $$

Figure 6 shows the DWT cycle using the Haar wavelet.
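To make equations (3)–(9) concrete, the following is a minimal Python sketch of the single-level Haar decomposition, a Gaussian smoothing of the noise-prone detail coefficients, and the inverse transform. Applying the Gaussian filter to the detail coefficients and the choice of sigma are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def haar_decompose(f: np.ndarray):
    """Single-level Haar transform of a vectorized image f (even length).

    Returns the mean coefficients a_k = (f_{2k-1} + f_{2k}) / sqrt(2)
    and the detail coefficients d_k = (f_{2k-1} - f_{2k}) / sqrt(2).
    """
    f = f.astype(float)
    a = (f[0::2] + f[1::2]) / np.sqrt(2.0)
    d = (f[0::2] - f[1::2]) / np.sqrt(2.0)
    return a, d

def haar_reconstruct(a: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Inverse single-level Haar transform (equations (8) and (9))."""
    f = np.empty(2 * a.size)
    f[0::2] = (a + d) / np.sqrt(2.0)
    f[1::2] = (a - d) / np.sqrt(2.0)
    return f

def dwt_denoise(f: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Decompose, smooth the detail coefficients with a Gaussian filter
    (sigma is an illustrative choice), and invert the transform."""
    a, d = haar_decompose(f)
    d_filtered = gaussian_filter1d(d, sigma=sigma)
    return haar_reconstruct(a, d_filtered)

# Round-trip check: without filtering, the transform is perfectly invertible.
f = np.random.rand(16)
a, d = haar_decompose(f)
assert np.allclose(haar_reconstruct(a, d), f)
```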

2.4.2. Mean Centering

Consider an image space $X = [x_1, x_2, \ldots, x_N]$, whose columns $x_i$ are the vectorised forms of the image matrices of the subjects in the study database. Now, define $H$ as the centering matrix given as

$$ H = I_N - \frac{1}{N} J_N, $$

where $I_N$ is the $N \times N$ identity matrix and $J_N$ is the $N \times N$ matrix with all entries equal to 1.

Mean centering of an image is simply done by subtracting the mean image from the original image under consideration. The $i$th mean centered image, $\phi_i$, is given by

$$ \phi_i = x_i - \bar{x}, $$

where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$ is the mean image and $\Phi = [\phi_1, \phi_2, \ldots, \phi_N] = XH$ is the mean centered matrix of the face space.
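A minimal NumPy sketch of the mean centering step follows; it also verifies the equivalence to right-multiplication by the centering matrix $H$ (the array sizes used are illustrative).

```python
import numpy as np

def mean_center(X: np.ndarray):
    """Mean-center an image space X whose columns are vectorized face images.

    Equivalent to multiplying X on the right by the centering matrix
    H = I_N - (1/N) J_N, where N is the number of images.
    """
    x_bar = X.mean(axis=1, keepdims=True)    # mean image
    Phi = X - x_bar                          # mean-centered face space
    return Phi, x_bar

# The explicit centering-matrix form gives the same result.
N = 5
X = np.random.rand(64, N)                    # five vectorized 8x8 images
H = np.eye(N) - np.ones((N, N)) / N
Phi, x_bar = mean_center(X)
assert np.allclose(Phi, X @ H)
```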

2.5. Feature Extraction

Face images belong to high-dimensional spaces. However, not all facial features are discriminatory. Therefore, including such features in a recognition module creates redundancy and increases the computational cost of recognition modules. According to Iwendi et al. [22], the main focus of feature optimization is not only to decrease the computational cost but also to find such feature subsets that can work with different classifiers to produce better results. Methods aimed at selecting relevant features from the feature space of face images therefore play a vital role in the efficiency and accuracy of recognition algorithms.

Principal Component Analysis (PCA) is one such dimensionality reduction technique that has been used to obtain compact representations of data [2]. PCA, also called Karhunen-Loeve expansion, obtains basis vectors for a subspace of the facial feature space from its covariance matrix such that each face can be approximated as a linear combination of the basis set, where the weights in the linear combinations represent global facial features [23].

According to Martinez and Kak [24] and Reddy et al. [25], PCA is less sensitive to different data sets and it has outperformed LDA in most instances. In this study, we adopted PCA to extract relevant facial features and for dimensionality reduction based on the above merits. As indicated earlier, the DWT-PCA/SVD algorithm was used to train the image database to extract unique face features for recognition.

In the feature extraction unit, the primary objective according to Asiedu et al. [9] is to find a set of orthonormal vectors, $u_k$, which best describe the distribution of the image data. We choose the vector $u_k$ such that

$$ \lambda_k = \frac{1}{N} \sum_{n=1}^{N} \left( u_k^T \phi_n \right)^2 $$

is a maximum subject to the orthonormality constraints $u_l^T u_k = \delta_{lk}$, where $\lambda_k$, $u_k$ are the eigenvalue and eigenvector pair extracted through Singular Value Decomposition (SVD) of the dispersion matrix, $\Omega$.

The SVD is related to the theory of diagonalizing a square matrix. In a specific case where the matrix under consideration is a symmetric matrix, the decomposition is called the eigenvalue decomposition (EVD) [9].

The dispersion matrix is given by

$$ \Omega = \frac{1}{N} \sum_{n=1}^{N} \phi_n \phi_n^T = \frac{1}{N} \Phi \Phi^T. $$

Through SVD decomposition of $\Omega$, we obtain two orthogonal matrices $U$ and $V$ and a diagonal matrix $\Lambda$ such that $\Omega = U \Lambda V^T$. The eigenfaces are then given by $u_i$, where $u_i$ is the $i$th column vector of $U$.

As stated earlier, unique face features are extracted and stored in memory for classification. Specifically, the extracted features (principal components) for the $i$th image in the train-image database are given as

$$ \omega_i = U^T \phi_i. $$

By extension, the extracted features (principal components) of all subjects in the train-image database are represented by $W = U^T \Phi = [\omega_1, \omega_2, \ldots, \omega_N]$. This created knowledge is stored in memory for matching purposes.
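The feature extraction step can be sketched as follows. The sketch computes the eigenfaces via an SVD of the mean-centered face space (whose left singular vectors coincide with the eigenvectors of the dispersion matrix $\Phi\Phi^T/N$); the number of retained components is an illustrative assumption.

```python
import numpy as np

def extract_features(Phi: np.ndarray, n_components: int = 20):
    """Extract principal-component features from the mean-centered face
    space Phi (columns are mean-centered vectorized images).

    The eigenfaces are the left singular vectors of Phi, equivalently the
    eigenvectors of the dispersion matrix Phi @ Phi.T / N obtained by SVD.
    """
    U, S, Vt = np.linalg.svd(Phi, full_matrices=False)
    U = U[:, :n_components]                  # eigenfaces
    W = U.T @ Phi                            # features of the train images
    return U, W

# 'Phi' comes from the mean-centering step; U and W are stored in memory.
Phi = np.random.rand(64, 36)                 # e.g., 36 mean-centered images
U, W = extract_features(Phi, n_components=10)
```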

2.6. Recognition Process

The recognition phase is the final stage in the recognition module. In the recognition unit, the extracted features of an unknown face (from either of the test-image databases shown in Figures 2 and 3) are matched with the stored features from the train-image database. This is done in the classification unit. For an unknown face image $x_{\text{test}}$ selected from the test-image database, the extracted principal components are computed as

$$ \omega_{\text{test}} = U^T \phi_{\text{test}}, $$

with $\phi_{\text{test}} = x_{\text{test}} - \bar{x}$. The recognition distances (Euclidean distances) between the principal components of the train images and the principal components of the unknown image are computed as

$$ d_i = \left\lVert \omega_{\text{test}} - \omega_i \right\rVert, \quad i = 1, 2, \ldots, N. $$

The train image that corresponds to the minimum Euclidean distance, $d^{*} = \min_i d_i$, is chosen as the closest match to the unknown test image.
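A minimal sketch of the matching step under these definitions (the function and variable names are ours):

```python
import numpy as np

def recognize(x_test: np.ndarray, U: np.ndarray, W: np.ndarray,
              x_bar: np.ndarray):
    """Match an unknown (reconstructed, preprocessed) test image against
    the stored train features W using the minimum Euclidean distance.

    Returns the index of the closest train image and all distances.
    """
    phi_test = x_test - x_bar.ravel()        # mean-center the test image
    w_test = U.T @ phi_test                  # project onto the eigenfaces
    distances = np.linalg.norm(W - w_test[:, None], axis=0)
    best = int(np.argmin(distances))         # smaller distance = closer match
    return best, distances
```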

3. Results and Discussion

In this section, we present a sample of the matching results and the numerical and statistical evaluations of the study algorithm.

Figure 7 shows the results (recognition distance and decision) of matching some selected test images (left reconstructed with varying expressions) to the train image database.

It can be seen from Figure 7 that there were five mismatches (wrong matches) for four subjects acquired from left reconstruction of the face images with varying expressions. These represent the recognition results of four individuals out of a total of 36 individuals in both the CKFE and JAFFE databases. Figure 8 shows the recognition results of some selected test images (right reconstructed with varying expressions) from both CKFE and JAFFE databases.

From Figure 8, there were three mismatches for the same sample of four individuals acquired from right reconstructed face images with varying expressions. These represent the results of four subjects out of the 36 individuals captured into the study databases.

3.1. Numerical Evaluations

After 36 experimental runs of the DWT-PCA/SVD algorithm using left reconstructed face images with 6 varying expressions as test images, there were a total of 22 mismatches on the CKFE database and 27 mismatches on the JAFFE database. This constitutes a total of 49 wrong matches out of 216 test images passed through the recognition module. In effect, the total number of correct matches was 167 out of 216 test images. The average recognition rate of the study algorithm on left reconstructed test images with varying expressions (multiple constraints) is 77.31%.

Also, after 36 experimental runs of the DWT-PCA/SVD algorithm using right reconstructed face images with 6 varying expressions as test images, there were a total of 15 wrong matches on the CKFE database and 35 wrong matches on the JAFFE database. This gives a total of 50 mismatches out of 216 test images sent for recognition. The total number of correct recognitions was therefore 166 out of 216 test images. The average recognition rate of the study algorithm on right reconstructed face images with varying expressions (multiple constraints) is 76.85%.
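Both rates follow directly from the counts of correct matches reported above:

$$ \text{recognition rate} = \frac{\text{number of correct matches}}{\text{number of test images}} \times 100\%, \qquad \frac{167}{216} \times 100\% \approx 77.31\%, \quad \frac{166}{216} \times 100\% \approx 76.85\%. $$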

The average computational time for the recognition of all 216 images was approximately 14 seconds for each of test-image databases 1 and 2.

3.2. Statistical Evaluations

According to Johnson et al. [26], a rational approach to comparing two treatments, or the presence and absence of a single treatment, is to assign both treatments to the same or identical units. Specifically for this study, the treatments are left reconstruction of partially occluded face images (captured in Figure 4(a)) and right reconstruction of partially occluded face images (captured in Figure 4(b)). Also, the face images were acquired under varying expressions.

The task here is to assess the performance of the study algorithm on left and right reconstructed face image databases acquired with varying expressions.

The responses are considered paired because they are captured for the same individuals under either left or right reconstruction procedures. The multivariate pairwise comparison method is adopted to assess the performance of the study algorithm on left and right reconstructed face-image databases acquired with varying expressions. The paired responses may then be analysed by computing their differences, thereby eliminating much of the influence of extraneous unit-to-unit variation.

Now, let $X_{ij}$ and $Y_{ij}$, $i = 1, 2, \ldots, 36$, $j = 1, 2, \ldots, 6$, be the recognition distances for the $i$th individual with the $j$th expression in the left and right reconstructed databases (Figures 4(a) and 4(b)), respectively; then, the observed paired difference is given by

$$ D_{ij} = X_{ij} - Y_{ij} $$

for $i = 1, 2, \ldots, 36$ and $j = 1, 2, \ldots, 6$.

The underlying assumption of the pairwise comparison test is that the paired differences $D_i = (D_{i1}, D_{i2}, \ldots, D_{i6})^T$ should be multivariate normal with mean $\delta = (\delta_1, \ldots, \delta_6)^T$ and covariance $\Sigma_D$, $i = 1, 2, \ldots, 36$.

For a population of $n$ paired observations on $p$ variables, reject the null hypothesis, $H_0\!: \delta = 0$, if the following is observed:

$$ T^2 = n\,\bar{D}^T S_D^{-1} \bar{D} > \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha), $$

where $\bar{D}$ is the mean of the observed differences, $S_D$ is their sample covariance matrix, and $F_{p,\,n-p}(\alpha)$ is the upper $(100\alpha)$th percentile of an $F$-distribution with $p$ and $n-p$ degrees of freedom. The Bonferroni simultaneous confidence intervals for the individual mean differences are

$$ \bar{D}_j \pm t_{n-1}\!\left(\frac{\alpha}{2p}\right) \sqrt{\frac{s_{D_j}^2}{n}}, \quad j = 1, 2, \ldots, p, $$

where $s_{D_j}^2$ is the $j$th diagonal element of $S_D$. The distribution of the observed differences between the recognition distances of the left and right reconstructed face images with varying expressions was not multivariate normal. However, the Doornik-Hansen test on the logarithmic transformation of the observed differences was not significant (p value above the 5% level). This means the logarithmic transform of the observed differences is multivariate normal. Figure 9 shows a chi-square Q-Q plot of the transformed observed differences.

It is evident from Figure 9 that the transformed observed differences are multivariate normal since there is no consistent deviation from the line of unit slope.
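For concreteness, the following is a minimal SciPy sketch of the paired Hotelling's $T^2$ test and the Bonferroni simultaneous confidence intervals described above, applied to $n \times p$ matrices of (log-transformed) recognition distances. It is a generic implementation under the stated assumptions, not the study's exact code.

```python
import numpy as np
from scipy import stats

def paired_hotelling_t2(X: np.ndarray, Y: np.ndarray, alpha: float = 0.05):
    """Multivariate pairwise comparison (paired Hotelling's T^2) of two
    n x p matrices of recognition distances (rows: subjects, columns:
    expressions), with Bonferroni simultaneous confidence intervals.
    """
    D = X - Y                                  # observed paired differences
    n, p = D.shape
    d_bar = D.mean(axis=0)                     # mean difference vector
    S = np.cov(D, rowvar=False)                # sample covariance of differences
    t2 = n * d_bar @ np.linalg.solve(S, d_bar)
    # Critical value: ((n - 1) p / (n - p)) * F_{p, n-p}(alpha)
    crit = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    # Bonferroni simultaneous CIs for the individual mean differences
    t_crit = stats.t.ppf(1 - alpha / (2 * p), n - 1)
    half_width = t_crit * np.sqrt(np.diag(S) / n)
    ci = np.column_stack([d_bar - half_width, d_bar + half_width])
    return t2, crit, ci
```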

Upon satisfying the assumption of multivariate normality, the pairwise comparison test was performed on the transformed observed difference. Table 1 shows the results of the pairwise comparison test.

From Table 1, the test statistic of the pairwise comparison test is 17.20518, with a corresponding p value below the 5% significance threshold. This indicates that the null hypothesis of zero mean difference is not tenable at the 5% level of significance. We can therefore conclude that there exists a statistically significant difference between the average recognition distances of the left and right reconstructed face images with varying expressions.

We subsequently conducted a post hoc test to specifically assess the achieved significance. Table 2 shows the results of a 95% Bonferroni simultaneous confidence interval (CI) constructed for this assessment.

It can be seen from Table 2 that there exists a statistically significant difference between the average recognition distances of the left and right reconstructed face images with sad expressions. Specifically, the average recognition distance of the left reconstructed images with sad expressions (1368.74) was lower than that of the right reconstructed images with sad expressions (1737.26). It is worth noting that a relatively lower recognition distance is preferred as it signifies a closer match.

There were no statistically significant differences between the average recognition distances for left and right reconstructed images with angry, disgust, fear, happy, and surprise expressions.

4. Conclusion and Recommendation

The study sought to evaluate the performance of the DWT-PCA/SVD recognition algorithm under multiple constraints (partial occlusion and varying expressions). We leveraged the property of bilateral symmetry of the face image to reconstruct half-face images for recognition. Numerical evaluation of the study algorithm gave average recognition rates of 77.31% and 76.85% for the recognition of left and right reconstructed face images with varying expressions, respectively. From the numerical results, it can be concluded that, when varying expressions are the underlying constraint, left reconstructed face images have a relatively higher average recognition rate than right reconstructed face images under the DWT-PCA/SVD recognition algorithm. This finding is consistent with those of Singh and Nandi [11] and Asiedu et al. [8, 15], with the exception that their studies focused on single constraints. The relatively lower average recognition rates of the DWT-PCA/SVD algorithm found in this study reflect the use of multiple constraints (partial occlusion and varying expressions).

The multivariate statistical evaluation method introduced revealed that there exists a statistically significant difference between the average recognition distances of the left and right reconstructed face images with varying expressions. This is a notable result given that the numerical evaluation gave almost equal recognition rates for the left and right reconstructed face images with varying expressions. Specifically, the post hoc analysis using the Bonferroni simultaneous confidence interval revealed that the significant difference established through the pairwise comparison test was mainly due to the sad expressions. That is, there exists a statistically significant difference between the average recognition distances of left and right reconstructed face images with sad expressions. There were no significant differences between the average recognition distances for left and right reconstructed face images with angry, disgust, fear, happy, and surprise expressions.

The study recommends the DWT-PCA/SVD recognition algorithm as suitable for recognition when partial occlusion with varying expressions is the underlying constraint.

Although the DWT-PCA/SVD algorithm performed creditably well on multiple constraints, it is evident from the results of the study that the algorithm’s performance is partially hindered by the existence of multiple constraints. Future study will focus on enhancing the algorithm’s performance in the presence of multiple constraints.

Data Availability

The image data supporting this study are from previously reported studies and datasets, which have been cited. The processed data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest.