A Smoothed Matrix Multivariate Elliptical Distribution-Based Projection Method for Feature Extraction

Qiu, Hong; Wang, Renfang; Sun, Dechao; Liu, Xinwei; Zhang, Liang; Liu, Yunpeng

doi:https://doi.org/10.1155/2022/2551137

Computational Intelligence and Neuroscience

On this page

Abstract Introduction Conclusions Data Availability Conflicts of Interest Acknowledgments References Copyright Related Articles

Research Article | Open Access

Volume 2022 | Article ID 2551137 | https://doi.org/10.1155/2022/2551137

A Smoothed Matrix Multivariate Elliptical Distribution-Based Projection Method for Feature Extraction

Hong Qiu,¹Renfang Wang,¹Dechao Sun,²Xinwei Liu,¹Liang Zhang,¹and Yunpeng Liu³

Academic Editor: G. Thippa Reddy

Received25 Apr 2022

Revised17 Jul 2022

Accepted18 Aug 2022

Published30 Sept 2022

Abstract

Big data has the traits such as “the curse of dimensionality,” high storage cost, and heavy computation burden. Self-representation-based feature extraction methods cannot effectively deal with the image-level structural noise in the data, so how to character a better relationship of reconstruction representation is very important. Recently, sparse representation with smoothed matrix multivariate elliptical distribution (SMED) using structural information to handle low-rank error images caused by illumination or occlusion has been proposed. Based on SMED, we present a new method named SMEDP for feature extraction. SMEDP firstly utilizes SMED to automatically construct an adjacency graph and then obtains an optimal projection matrix by maximizing the ratio of the local scatter matrix and the total scatter matrix in the PCA subspace. Experiments on the COIL-20 object database, ORL face database, and CMU PIE face database prove that SMEDP works well and can achieve considerable visual and recognition performance than the relevant methods.

1. Introduction

Big data analytics as the premise and the foundation of artificial intelligence can easily cause the problem of “curse of dimensionality” [1], high storage cost, and heavy computation burden [2]. How to deal with such high-dimensional data is always a vital issue in many applications, including machine intelligence, data mining, pattern recognition, and image processing [3–5]. Feature extraction can effectively solve the above problems by reducing the size of a data set while preserving relevant information. Thus, it has become a research hotspot [6–8].

Principle component analysis (PCA) [9]and linear discriminant analysis (LDA) [10] are two commonly used linear feature extraction methods. The former projects the high-dimensional data by maximizing the variance while the latter maximizing the ratio of the between-class scatter matrix and the within-class scatter matrix. Although they are simple and effective for some practical applications, they may fail to discover the underlying manifold structures of the observed data that are nonlinear.

To discover the intrinsic low-dimensional embedding of the observed data, researchers developed many manifold learning methods. Isometric feature mapping (ISOMAP) [11], Laplacian eigenmaps (LE) [12], and locally linear embedding (LLE) [13] are the most representative methods. They try to preserve the intrinsic geometrical distribution of the high-dimensional data. ISOMAP takes the pairwise geodesic distance between samples as the feature of manifold structure, and LE builds the neighborhood information graph to maintain the neighborhood relationships among the samples, while LLE believes that local relations characterize the manifold structure of data. Although they are excellent in some respects, there are still some problems, such as sensitivity to noise and “out-of-sample” problem. Thus, as improvement methods, neighborhood preserving embedding (NPE) [14], locality preserving projections (LPP) [15], and so on have been proposed. NPE and LPP are the linearized versions of LE and LLE, and they focus on the local information of the data and linearize the nonlinear structure to handle the “out-of-sample” problem. But apparently, the k-nearest graph or the ε-ball graph they use is very sensitive to parameter settings and noise interference [16].

Numerous researchers study it constantly to solve these problems. They found that the methods mentioned above can be unified with a graph embedding framework [17]. It is critical to construct adjacency graphs in this framework, which can effectively reveal the essential structure of the data. More recently, sparse representation (SR)-based methods are adopted to automatically construct adjacency graphs, since it has the advantages of sparsity, robustness to noise, and datum-adaptive neighborhood. For instance, Qiao et al. [18] developed a sparsity preserving projections (SPP) by using a modified SR-based classification (SRC) [19] to automatically build the graph. And then, the projections are obtained such that the sparse reconstructive relationship of the data can be best preserved. But it is a time-consuming process since SPP aims to solve the L₁-norm optimization problem. Therefore, Yang et al. [20] solved the L₂-norm optimization problem instead of the L₁-norm optimization to reduce the running time, and proposed a collaborative representation-based projection (CRP). It benefits from the collaborative representation classifier (CRC) [21] and can obtain a comparable result with SPP. Note that both SPP and CRP choose their neighborhood automatically and contain natural discriminating information, hence can be more practical than some manifold learning methods.

The limitations that still exist though SPP and CRP work well in some applications. First, they are all vector-based methods. That means they need to stretch the image matrix into a vector in advance, which will ignore the spatial structural information of the data. Second, for many error images caused by illumination variations or occlusions, the assumption of independence of error variables may not be appropriate in some practical situations. Many SR works attempt to handle the above problems. Yang et al. [22] and Xu et al. [23] studied the illumination variations and occlusions caused error images and observed that the structure of error variables is low rank or approximately low rank [24]. They pointed out that low-rank constraints can be used to describe the structure noise, whereas it is NP-hard in solving the rank minimization problem [25]. A general way to address such an issue is to substitute the rank function with some tractable surrogate.

To this end, Yang et al. utilized nuclear norm to characterize the illumination variations and occlusions caused error images and developed a nuclear norm-based matrix regression model (NMR) [22]. Based on NMR, Yang et al. [26] explored NMR to construct the graph and gave a NMR-based projection (NMRP). NMRP has shown its promise in feature extraction. Qian et al. [27] integrated error detection and error support into one model and provided a robust nuclear norm regularized regression (RNR). Xie et al. [28] described the structural noise and the mixed noise by the robust nuclear norm and proposed a novel robust matrix regression (RMR). Significantly, these methods all assumed that each pixel of noise follows Gaussian or Laplacian distribution independently, which is not flexible for practical application. Therefore, some researchers are devoted to the study of low-rank representation (LRR) [29]. Fang et al. [30] improved the latent low-rank representation (LatLRR) [31], which is an excellent LRR-based feature extraction method. They proposed an approximate low-rank projection learning (ALP) and an extended ALPL (EALPL) to learn an optimal projection that can extract more discriminant features than LatLRR. Then, combining with rigid regression, they extended LatLRR to the supervised scenario and proposed a supervised ALPL (SALPL). Zhang et al. [32] introduced orthogonal matrix, row sparsity constraint on the projection matrix, and weighted truncated Schatten p-norm (WTSN) [32] to improve the robustness of the LRR model. However, these methods cannot handle the “out-of-sample” problem.

Recently, Qiu et al. [33] used a matrix multivariate elliptical distribution to characterize the error image and developed a sparse representation with smoothed matrix multivariate elliptical distribution (SMED). Compared with Gaussian or Laplacian distribution, a matrix multivariate elliptical distribution is more suitable for describing the structure noise. Besides, it introduces regularization terms to smooth the objective function, which makes the model easily to get a globally optimal solution. In short, SMED considers both of the global structure and the local structure based on sparse and low-rank representation. Thus, in this paper, we plan to explore SMED to automatically construct an adjacency graph and then learn an optimal projection matrix in the PCA subspace. The proposed method is named a smoothed matrix multivariate elliptical distribution-based projections (SMEDP). Since the projection matrix can be got directly, SMEDP can solve the “out-of-sample” problem well.

Figure 1 is the flowchart of our method, and the major contributions of this paper include: (1) in order to avoid the adverse effects produced by neglecting structural information, we propose a method named SMEDP. SMEDP can learn an adaptive graph with a modified SMED. (2) A method of iteratively reweighted least squares is designed to obtain the globally optimal weight matrix, which can be used to calculate the final projection matrix. (3) Comprehensive experiments have been carried out to indicate the superiority of SMEDP when compared with several relevant methods.

The rest of the paper is organized as follows. Section 2 gives a brief review of SMED. Section 3 introduces the SMEDP for feature extraction in detail. In Section 4, extensive experiments on three databases are presented to demonstrate the effectiveness of SMEDP. Section 5 draws the conclusions and gives the prospect of the future work.

2. SMED

The idea of SR-based methods is to derive a series of appropriate representation coefficients. In the following, we give an overview of SMED [33].

Given a dataset of n training image matrices {X₁¹, X₂¹, …, X_n1¹, X₁², X₂², …, X_n2², …, X₁^C, X₂^C, …, X_nC^C}∈R^p^×^q and an image matrix Y ∈ R^p×q, we can represent Y linearly using training image matrices, that is,where X_i^c represents the i-th training sample from the cth class, α = [α₁, α₂,…, α_n]∈Rⁿ is a vector of representation coefficients, and E is the structural noise.

To simplify the formula, linear mapping from Rⁿ to R^p ^× ^q can be defined as follows:

Then, (1) becomes

(3) is a matrix variate-based linear model. Qiu et al. [33] have studied the error image and found that the correlations between pixels in E can be effectively alleviated by the proposed matrix multivariate elliptical distribution. Subsequently, a sparse representation with smoothed matrix multivariate elliptical distribution (SMED) model is developed to preserve the spatial information about the structural noise.

SMED assumes that E follows the matrix multivariate elliptical distribution and the set of representation coefficients follows the independent Gaussian distribution (l = 2) or Laplacian distribution (l = 1). Then, we have

Based on the definitions of nuclear norm and l_p-norm, (4) is equivalent to:where λ = 2/β, k > 0, and 0 < l ≤ 2. Since both two terms in (5) are nonsmooth, SMED introduces regularization terms to smooth them:where μ is a penalty parameter, I ∈ R^q ^× ^q is an identity matrix, and 1∈Rⁿ is an all ones vector.

We obtain the optimal solution α by solving the (6), and the corresponding reconstruction error of the i-th class is defined as follows:where represents the reconstructed image of Y, means the reconstruction of Y in class i, and δ_i(α) is a vector with only nonzero entries associated with the i-th class.

3. Algorithm Derivation of SMEDP

It is essential to construct adjacency graphs in graph-based feature extraction methods. Since SMED can better handle with variations of structural noise than several relevant methods, such as SRC and CRC, we explore SMED to automatically construct an adjacency graph and then seek to find a rule to get an optimal projection matrix. Subsequently, a method called smoothed matrix multivariate elliptical distribution-based projection method (SMEDP) is developed.

3.1. Constructing an Adjacency Graph

In this section, we will introduce how to construct an adjacency graph. Firstly, X = {X_i^c}∈R^m ^× ⁿ, where m = p × q is adopted as the vertices of the graph. Secondly, smoothed matrix regression model based on the modified SMED is presented, and the method of iteratively reweighted least squares is used to solve the model to obtain the optimal weight vector α_i for each X_i(i = 1, 2, …, n). The edges of the graph are composed of a weight matrix α whose elements are α_i. Subsequently, Algorithm 1 describes each step of the procedure in detail.

The smoothed matrix regression model based on the modified SMED is defined as follows:where the parameters k and l in (6) are set to 1, α_i = [α_i1, …, α_i,i-1, 0, α_i,i+1, …, α_in]^T, and the element α_ij (i≠j) in α_i represents the contribution of each X_i to reconstructX_j.

Motivated by SMED, we provide details of using the method of iteratively reweighted least squares to solve Equation (8) [33], and it can be reformulated as follows:

Let γ(α_i) = tr((X_i−X(α_i))^T(X_i−X(α_i)) + μ²I)^1/2 and δ(α_i)= (∑_j ₌ ₁ⁿ |α_ij|² + μ²)^1/2. Then, ξ(α_i, μ) = γ(α_i) + (λ/2)δ(α_i). The derivative of γ(α_i) iswhere R = ((X_i−X(α_i))^T(X_i−X(α_i)) +μ²I) ^−1/2. The derivative of δ (α_i) is as follows:where T=(Σ_j ₌ ₁ⁿ|α_ij|² + μ²)^-1/2.

With the derived γ(α_i) and δ (α_i), we get the derivative of ξ(α_i, μ). By setting it to 0, we obtain that:

Or equivalently,

As a well-known Sylvester equation, the computational complexity of (13) is O(n³) [34–36]. We can use MATLAB command lyap to get the weight matrix α = [α₁, α₂,…, α_n]∈R^n×n.

Algorithm 1 shows the detailed steps for solving α.

	Input: a dataset of n training image matrices{X_i^c}∈R^m×n, m = p×q, and an image matrix Y ∈ R^p×q
	Initialize t = 1, R = I ∈ R^n×n, T = e ∈ Rⁿ, λ > 0, μ > 0 and ε = 1×e⁻³
	while not converged do
(1)	Update α_i(t+1) as follows:

(2)	Update R_t+1 and T_t+1 as follows:


(3)	t = t+1.
(4)	If \|\|α_t+1−α_t \|\|_∞ ≤ ε, break.
	end while
	Output: the adjacent weight matrix α

3.2. Calculating the Projection Matrix

In this section, we will introduce how to calculate the projection matrix to improve classification performance. Firstly, we utilize PCA to reduce the dimensions of the raw data. Secondly, we seek to find an optimal projection matrix by maximizing the ratio of the local scatter matrix and the total scatter matrix in the PCA subspace. Subsequently, the final projection matrix is obtained based on the optimal projection matrix and the PCA-transformation matrix.

According to the ideas of CRP [20] and NMRP [26], the local scatter matrix is defined as follows:where x_i = Vec(X_i) means the vector form of X_i.

The total scatter matrix is expressed as follows:where is the mean vector of the training samples and the matrix is augmented by .

So, the objective function of SMEDP is expressed as follows:

Or equivalently,where α_β = α+α^T-α^Tα. Then, generalized eigenvector decomposition is adopted to solve Equation (17). The optimal projection matrix P can be calculated bywhere P = [p₁, p₂, …, p_d] is the first d largest eigenvalues of Equation (18).

3.3. SMEDP

In this subsection, we summarize the details of SMEDP by Algorithm 2.

	Input: a dataset of n training image matrices{X_i^c}∈R^m×n, m = p×q, and an image matrix Y ∈ R^p×q
(1)	Use Algorithm 1 to calculate the weight vector α_i for each X_i.
(2)	Apply PCA to reduce the dimensions of X = [Vec(X₁¹), Vec(X₂¹), …, Vec(X_n1¹), Vec(X₁²), Vec(X₂²), …, Vec(X_n2²), …, Vec(X₁^C), Vec(X₂^C), …, Vec(X_nC^C)] and denote W_PCA as the PCA-transformation matrix.
(3)	Compute the local scatter matrix Xα_βX^T and the total scatter matrix (X−) (X−)^T in the PCA subspace.
(4)	Calculate the P = [p₁, p₂, …, p_d] of corresponding to the first d largest eigenvalues.
	Output: the optimal projection matrix P and the final projection matrix W=P^TW^T_PCA

4. Experiments

We conduct experiments on COIL-20 object database, ORL face databases, and CMU PIE face databases to evaluate the performance of SMEDP. All the experiments are implemented in the same environment (CPU: Intel i7+ Lenovo, RAM: 8 GB, MATLAB: 2020b).

4.1. Experiments on COIL-20 Object Database

COIL-20 object database [37] contains 20 objects. For each object, 72 images are taken from different view directions. We use a subset of the COIL-20 for our experiments. This subset includes 50 images of 10 objects (each object has 5 images). All images were in grayscale and resized to 40 × 40 pixels. Figures 2 and 3 show some images.

We test the visual performance of SMEDP and compare it with the classical methods, that is, PCA; some manifold learning methods, that is, LLE, ISOMAP, and LPP; and SR-based methods, that is, SPP, CRP, and NMRP. The number of nearest neighbor point k in LEE, ISOMAP, and LPP is, respectively, set to 15, 20, and 20. We select Justin Romberg’s l1qc_logbarrier to calculate L1-norm in SPP. Besides, the parameters in SPP, CRP, NMRP, and SMEDP are determined by grid search. In experiments, we start by using PCA to reduce the dimensions of the data to 70, and then adopt the above methods to convert the 70-dimensional representation to a 2-dimensional map. Figure 4 shows the resulting map, where the color of a feature point in the scatterplot represents the class information of data.

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

From Figure 4, we can find that: (1) PCA confuses the projected data since it has a large variance. (2) LEE and ISOMAP fail to effectively separate data points, which are of different classes. Although they take into account the local structure of the data, the Euclidean distance measure or the geodesic distance measure cannot determine the real local structure. (3) The sparsity properties could enhance the visual performance, so the separation degree of SPP, CRP, and NMRP is better than LEE and ISOMAP. About SPP and CRP, CRP generates a smaller intraclass scatter in our experiment. This result demonstrates that it is the collaborative representation that plays an important role in the effectiveness of SRC. (4) The matrix-based methods usually surpass the vector-based methods (e.g., NMRP and our SMEDP method, which make use of the structural information outperform SPP). Notably, they enhance the performance of intraclass clustering. (5) NMRP can properly embed most classes, but some classes are still mixed. Its performance is inferior to CRP since the nuclear norm used in the matrix regression model is still a loose approximation to the natural rank constraint. (6) Due to the introduced regularization term in matrix regression model, our SMEDP method can achieve better results than others. Since SMEDP ignores the class label information, it also exists some overlapping phenomena among classes. We can try to absorb supervised information into the method to overcome the limitation further.

4.2. Experiments on ORL Face Database

ORL face database [38] consists 40 people. For each people, 400 images are taken from various illumination and facial expressions. All images were in grayscale and resized to 32 × 32 pixels for experiments in this subsection. Figure 5 shows sample images of one person.

4.2.1. Parameter Selection

Feature extraction methods need to select insensitive parameters [39]. We need to tune three parameters, that is, λ, μc, and ρ, for SMEDP. So, an experiment on parameter sensitivity is conducted in this subsection. Since it is difficult to tune the three parameters simultaneously, the strategy of grid search [39] is employed to select the parameter. The core idea of grid search is fixing one parameter and tuning the others. Specifically, 8 images per class are randomly selected in the ORL database for training and the remaining for testing. The number of principal components is set to 290 during the PCA step. Then, we perform SMEDP to convert the 290-dimensional representation to a 110-dimensional map. Finally, we use the NN classifier with cosine distance for classification. The experiment is run 50 times, and the results are reported in Figure 6.

(a)

(b)

(c)

Two cases are used to evaluate the effects of different parameter settings: (1) Fix ρ = 1.1, tune λ and μc from {0.0001, 0.001, 0.01, 0.1, 1}. Figure 6(a) shows the results, and it is obvious that the parameter λ and μc do affect the performance of SMEDP. When λ = 0.01, 0.1, 1 and μc = 0.01, 0.001, 0.0001, the performance of SMEDP changes a little and the recognition rate is more than 90%. When λ = 0.1 and μc = 0.01, SMEDP achieves the best effect; (2) fix λ = 0.1 and μc = 0.01, vary ρ from {0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.05, 1.1, 1.15, 1.2, 1.3, 1.35, 1.4, 1.45, 1.5, 1.55}. Figure 6(b) gives the average recognition rate curves of SMEDP with different values of ρ. From Figure 6(b), we can find that SMEDP is robust with different ρ, which means parameter ρ has a certain degree of influence on the recognition rate, but the influence is insignificancy. To facilitate the analysis of data, the part of Figure 6(b), where the recognition rate ranges from 87% to 93%, is shown in Figure 6(c). Besides, from the results, we can find that ρ with a value of 1.1 produces better performance. Overall, the above analysis provides a guidance of parameters selection for SMEDP in follow-up experiments.

4.2.2. Classification Results with Different Expressions and Illuminations

We construct classification experiments to evaluate the robustness of SMEDP with different expressions and illuminations. We also compare SMEDP with PCA, LPP, SPP, CRP, and NMRP. The parameters of each method are set as follows: (1) nearest neighbor point k in LPP is set to 20; (2) Justin Romberg’s l1qc_logbarrier is used to calculate L1-norm and λ is set to 1e-3 in SPP; (3) in CRP and NMRP, λ is set to 0.1 and 1, respectively; (4) for SMEDP, the parameters λ, μc, and ρ are set as 0.1, 0.01, and 1.1, respectively. We run the experiments 50 times. In each run, l (= 5, 6, 7, and 8) images are randomly selected from each subject as a training set and the rest 10-l images are treated as a test set. Besides, SPP, CRP, NMRP, and SMEDP all apply a PCA preprocess step to reduce the dimensions of the raw data. For l (=5, 6, 7, and 8), the numbers of principal components are set to 170, 210, 250, and 290, respectively. Finally, the numbers of subspace dimensions by all the methods except SPP are varied from 50 to the numbers of principal components at intervals of 30, 40, 50, and 60. Since SPP is sensitive to large variations in pose, the numbers of subspace dimensions for ORL database are varied from 30 to 50 to maintain a relatively high level of recognition rate. Figure 7 shows the average recognition results of different methods, and Table 1 (the best performing method has been highlighted) gives the average maximal recognition rate of each method.

(a)

(b)

(c)

(d)

From the results, we can see that SMEDP performs the best, which verifies that it can preserve the primary discriminant information for classification. Specifically, SMEDP can outperform the others by about 2% improvement on l (= 5, 6) and about 1% improvement on l (= 7, 8). Since NMRP and SMEDP get an adjacent weight matrix by matrix regression, their results are closer. Moreover, CRP, which is based on vector regression, achieves a performance comparable to NMRP in most cases. This is mainly because collaborative representation can improve the discriminating capability to some extent. Since the limitations of PCA, LPP, and SPP mentioned above, the classification performance of them is poor.

4.3. Experiments on CMU PIE Face Database

CMU PIE face database [40] consists 41,368 images of 68 people. All images are taken from different poses, illumination, and facial expressions for each individual. We choose C09 (one of five subsets of near frontal poses, which contains 1632 images) for the test. All images were in grayscale and resized to 64 × 64 pixels. Figure 8 shows sample images of one person.

4.3.1. Classification Results with Different Illuminations

For evaluating the robustness of SMEDP to the data with different illuminations, we conduct experiments and systematically compare it with the methods mentioned in Section 4.2.2. Specifically, we choose the first l (=7, 9, 11, 13) images of one person for training and the rest 24-l images for testing. It is worth noting that LPP, SPP, CRP, NMRP, and SMEDP all involve the selection of parameters, and they are adjusted based on grid search. In this subsection, they are set as follows: (1) nearest neighbor point k in LPP is set to 20; (2) in SPP, Justin Romberg's l1qc_logbarrier is used to calculate L1-norm and λ is set to 1e−4; (3) in CRP and NMRP, λ is set to 0.01 and 0.1, respectively; (4) for SMEDP, λ, μc, and ρ are set as 0.01, 0.01, and 1.1, respectively. PCA is first used to preprocess the images. We set the dimensions of the PCA subspace to 300, 250, 200, and 150, which correspond to l (=7, 9, 11, 13), respectively. Figure 9, Tables 2 and 3 show the results. Note that numbers in bold mean the best results in Tables 2 and 3.

(a)

(b)

(c)

(d)

From the experimental results, it can be seen that the recognition rate of these methods generally increases with the growth of the feature dimensions. Our proposed SMEDP always obtains the best results, which demonstrates its robustness to illumination variations. One can also note that SMEDP can outperform the others by about 2% improvements on 150 dimensions when l = 7 and 11. Besides, NMRP and CRP are comparable to SMEDP in some cases since the former takes advantage of matrix structure and the latter uses collaborative representation to preserve the primary discriminant information. SPP is superior to PCA and LPP, but inferior to SMEDP and NMRP. The superiority is distinctive because the sparsity properties are considered in the construction of graph. PCA and LPP perform worse than others. The reason is that they only take global structure or local structure of data into account.

4.3.2. Classification Results with Block Occlusions

In this subsection, we verify the robustness of SMEDP. We use the similar experiment setting as in Section 4.3.1. First 13 samples are selected as a training set and the rest samples with random occlusions of black blocks or white blocks are treated as a test set. The block size determines the occlusion level of an image. We use PCA to reduce the dimension of data to 150, and then, classification tests were carried out on the test samples with varying degrees of random occlusions. Figure 10 shows the results.

(a)

(b)

From Figure 10, it is evident that the recognition rates of PCA and LPP are low whether the samples under the occlusions of black blocks or white blocks. It means that they cannot handle the structural noise very well. The proposed SMEDP is superior to some SR-based feature extract methods (SPP and CRP). But the performance of NMRP is very close to SMEDP when the occlusion level is no more than 30%, that is, because they are all based on matrix regression, which can get more discriminative information from corrupted data. As the level of occlusion rises, NMRP no longer has an advantage, and the smoothed matrix multivariate elliptical distribution plays a role in SMEDP. Overall, this experiment confirms SMEDP is more robust than the others with block occlusions.

5. Conclusions

A method called smoothed matrix multivariate elliptical distribution-based projections (SMEDP) is proposed in this paper. Specifically, we use a modified SMED to automatically construct an adjacency graph and then learn an optimal projection matrix in the PCA subspace. Since SMED considers both the global structure and the local structure based on sparse and low-rank representation, SMEDP can get more discriminative information from corrupted data. Experimental results on three databases show that SMEDP achieves considerable visual and recognition performance. Specifically, its robustness to illumination variations and block occlusions.

However, each method has its own strengths and weaknesses. SMEDP separates graph embedding and projection learning into two independent processes, which makes it difficult to learn the optimal projection matrix once the graph embedding is invalid. Furthermore, although SMEDP contains the natural discriminating information, it ignores the label information of the data, which will still affect the discriminative ability of the model. In the future, we will consider to simultaneously perform graph embedding and projection learning in a unified objective function and introduce the class label information in graph construction to extend SMEDP to the supervised scenario. We believe that the improvement of SMEDP will have better performance for pattern recognition task.

Data Availability

The dataset can be accessed upon request (https://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html).

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61906170 and 62106228), Project of the Science and Technology Plan for Zhejiang Province (LGF21F020023, LQ22F020003, and LGF21F020022), Plan Project of Ningbo Municipal Science and Technology (2021Z050 and 2022S002), and Ningbo Natural Science Foundation (202003N4324 and 2021J175).

References

Z. Liu, J. Wang, G. Liu, and L. Zhang, “Discriminative low-rank preserving projection for dimensionality reduction,” Applied Soft Computing, vol. 85, pp. 105768–105811, 2019.
View at: Publisher Site | Google Scholar
A. Abbasi, A. R. Javed, C. Chakraborty, J. Nebhen, W. Zehra, and Z. Jalil, “ElStream: an ensemble learning approach for concept drift detection in dynamic social big data stream learning,” IEEE Access, vol. 9, pp. 66408–66419, 2021.
View at: Publisher Site | Google Scholar
H. Cai, V. W. Zheng, and K. C. C. Chang, “A comprehensive survey of graph embedding: problems, techniques, and applications,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1616–1637, 2018.
View at: Publisher Site | Google Scholar
W. Zhou, J. Yong, and F. He, “A collaborative feature learning method for accurate and robust tracking,” in Proceedings of the 2019 IEEE 23rd International Conference on Computer Supported Cooperative Work in Design (CSCWD), pp. 221–226, Porto, Portugal, May2019.
View at: Google Scholar
Z. Liu, Xi. Wang, J. Pu, L. Wang, and L. Zhang, “Nonnegative low-rank representation based manifold embedding for semi-supervised learning,” Knowledge-Based Systems, vol. 136, pp. 121–129, 2017.
View at: Publisher Site | Google Scholar
H. Guo, H. Zou, and J. Tan, “Semi-supervised dimensionality reduction via sparse locality preserving projection,” Applied Intelligence, vol. 50, no. 4, pp. 1222–1232, 2020.
View at: Publisher Site | Google Scholar
Y. Liu, R. Zhang, F. Nie, X. Li, and C. Ding, “Supervised dimensionality reduction methods via recursive regression,” IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, pp. 3269–3279, 2020.
View at: Publisher Site | Google Scholar
L. Zhu, Q. Hu, J. Yang, J. Zhang, P. Xu, and N. Ying, “EEG signal classification using manifold learning and matrix-variate Gaussian model,” Computational Intelligence and Neuroscience, vol. 2021, pp. 1–12, 2021.
View at: Publisher Site | Google Scholar
M. Turk and A. Pentland, “Eigenfaces for recognition,” Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71–86, 1991.
View at: Publisher Site | Google Scholar
Q. L. Ye, C. X. Zhao, H. F. Zhang, and X. B. Chen, “Recursive concave-convex Fisher linear discriminant with applications to face, handwritten digit and terrain recognition,” Pattern Recognition, vol. 45, no. 1, pp. 54–65, 2012.
View at: Publisher Site | Google Scholar
J. B. Tenenbaum, V. D. Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
View at: Google Scholar
M. Belkin and P. Niyogi, “Laplacian Eigenmaps for dimensionality reduction and data representation,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
View at: Google Scholar
S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by locally linear embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, 2000.
View at: Publisher Site | Google Scholar
X. He, D. Cai, S. Yan, and H. Zhang, “Neighborhood preserving embedding,” Neural Computation, vol. 15, no. 6, pp. 1373–1396, 2003.
View at: Google Scholar
X. He, S. Yan, Y. Hu, P. Niyogi, and H. J. Zhang, “Face recognition using Laplacianfaces” 2005,” International Conference on Computer Vision(ICCV), vol. 2, no. 23, pp. 1208–1213, 2005.
View at: Google Scholar
W. Zhang, P. Kang, X. Fang, L. Teng, and N. Han, “Joint sparse representation and locality preserving projection for feature extraction,” International journal of machine learning and cybernetics, vol. 10, no. 7, pp. 1731–1745, 2019.
View at: Publisher Site | Google Scholar
S. Yan, D. Xu, B. Zhang, H. j. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: a general framework for dimensionality reduction,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, pp. 40–51, 2007.
View at: Publisher Site | Google Scholar
L. Qiao, S. Chen, and X. Tan, “Sparsity preserving projections with applications to face recognition,” Pattern Recognition, vol. 43, no. 1, pp. 331–341, 2010.
View at: Publisher Site | Google Scholar
J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
View at: Publisher Site | Google Scholar
W. Yang, Z. Wang, and C. Sun, “A collaborative representation based projections method for feature extraction,” Pattern Recognition, vol. 48, no. 1, pp. 20–27, 2015.
View at: Publisher Site | Google Scholar
L. Zhang, M. Yang, and X. Feng, “Sparse representation or collaborative representation: which helps face recognition?” in Proceedings of the 2011 International Conference on Computer Vision, pp. 471–478, Barcelona, Spain, November2011.
View at: Google Scholar
J. Yang, L. Luo, J. J. Qian, Y. Tai, F. Zhang, and Y. Xu, “Nuclear norm based matrix regression with applications to face recognition with occlusion and illumination changes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 1, pp. 156–171, 2017.
View at: Publisher Site | Google Scholar
Y. Xu, X. Fang, J. Wu, X. Li, and D. Zhang, “Discriminative transfer subspace learning via low-rank and sparse representation,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 850–863, 2016.
View at: Publisher Site | Google Scholar
J. W. Liu, C. L. Peng, and L. X. Lin, “Structured sparse models”,” Chinese Journal of Computers, vol. 40, no. 6, pp. 1309–1337, 2017.
View at: Google Scholar
Z. Zha, X. Yuan, B. Wen, J. Zhou, J. Zhang, and C. Zhu, “A benchmark for sparse coding: when group sparsity meets rank minimization,” IEEE Transactions on Image Processing, vol. 29, pp. 5094–5109, 2020.
View at: Publisher Site | Google Scholar
W. Yang, J. Li, H. Zheng, and R. Y. D. Xu, “A nuclear norm based matrix regression based projections method for feature extraction,” IEEE Access, vol. 6, pp. 7445–7451, 2018.
View at: Publisher Site | Google Scholar
J. J. Qian, L. Luo, J. Yang, F. L. Zhang, and Z. C. Lin, “Robust nuclear norm regularized regression for face recognition with occlusion,” Pattern Recognition, vol. 48, no. 10, pp. 3145–3159, 2015.
View at: Publisher Site | Google Scholar
J. Xie, J. Yang, J. J. Qian, Y. Tai, and H. M. Zhang, “Robust nuclear norm-based matrix regression with applications to robust face recognition,” IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2286–2295, 2017.
View at: Publisher Site | Google Scholar
G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
View at: Publisher Site | Google Scholar
X. Fang, N. Han, J. Wu et al., “Approximate low-rank projection learning for feature extraction,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 11, pp. 5228–5241, 2018.
View at: Publisher Site | Google Scholar
G. Liu and S. C. Yan, “Latent low-rank representation for sub- space segmentation and feature extraction,” in Proceedings of the 2011 International Conference on Computer Vision, pp. 1615–1622, Barcelona, Spain, November2011.
View at: Google Scholar
X. Zhang, Z. Tan, H. Sun, Z. Wang, and M. Qin, “Orthogonal low-rank projection learning for robust image feature extraction,” IEEE Transactions on Multimedia, pp. 1–13, 2021.
View at: Google Scholar
H. Qiu, W. Wang, and J. Zheng, “Sparse representation with smoothed matrix multivariate elliptical distribution,” Acta Automatica Sinica, vol. 45, no. 8, pp. 1548–1563, 2019.
View at: Google Scholar
P. Benner, R. C. Li, and N. Truhar, “On the ADI method for Sylvester equations,” Journal of Computational and Applied Mathematics, vol. 233, no. 4, pp. 1035–1045, 2009.
View at: Publisher Site | Google Scholar
H. Chen, L. Wang, and T. Li, “A note on the solvability for generalized Sylvester equations,” Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales. Serie A, Matemáticas, vol. 115, no. 2, pp. 64–67, 2021.
View at: Publisher Site | Google Scholar
B. Li, C. Y. Lu, C. C. Leng, and L. B. Jin, “Robust low rank subspace clustering based on local graph laplace constraint,” Acta Automatica Sinica, vol. 41, no. 11, pp. 1971–1980, 2015.
View at: Google Scholar
A. Hjouji, J. El-Mekkaoui, M. Jourhmane, and B. Bouikhalene, “New set of non-separable orthogonal invariant moments for image recognition,” Journal of Mathematical Imaging and Vision, vol. 62, no. 4, pp. 606–624, 2020.
View at: Publisher Site | Google Scholar
S. A. Angadi and V. C. Kagawade, “A robust face recognition approach through symbolic modeling of polar fft features,” Pattern Recognition, vol. 71, pp. 235–248, 2017.
View at: Publisher Site | Google Scholar
H. Qu, L. Li, Z. Li, and J. Zheng, “Supervised discriminant isomap with maximum margin graph regularization for dimensionality reduction,” Expert Systems with Applications, vol. 180, pp. 115055–115114, 2021.
View at: Publisher Site | Google Scholar
T. N. Le, D. B. Giap, J. W. Wang, and C. C. Wang, “Tensor-compensated color face recognition,” IEEE Transactions on Information Forensics and Security, vol. 16, pp. 3339–3354, 2021.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2022 Hong Qiu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies