Abstract

In this paper, we propose a linear discriminative learning model called adaptive locality-based weighted collaborative representation (ALWCR) that formulates the image classification task as an optimization problem reducing the reconstruction error between the query sample and its computed linear representation. The optimal linear representation of a query image is obtained with a weighted regularized linear regression approach that incorporates the intrinsic locality structure and the feature variance of the data into the representation. The resulting representation increases the discriminative ability for correct classification. The proposed ALWCR method can be considered an extension of the collaborative representation- (CR-) based classification approach, which is an alternative to the sparse representation- (SR-) based classification method. ALWCR improves the discriminative ability for classification compared with the original CR formulation and overcomes the limitations that arise from a small training sample size and low feature dimensionality. Experimental results obtained with various feature dimensions on well-known publicly available face and digit datasets verify the competitiveness of the proposed method against competing image classification methods.

1. Introduction

Variations in face images such as illumination changes, random noise, orientation, and different facial expressions make face recognition (FR) difficult and challenging. Recognition in the original image space degrades accuracy and incurs high computational cost due to the curse of dimensionality. Different approaches such as eigenfaces [1], fisherfaces, and nearest subspace [2] have been devised to reduce the dimensionality and perform classification in a reduced feature space. These methods assume that faces of the same class can be linearly mapped to a low-dimensional subspace [3]. However, illumination changes, noise, shadows, orientation, and depth of the face images often violate this subspace assumption. Conventional methods like principal component analysis (PCA) fail to achieve the desired results because of gross errors of arbitrary magnitude exhibited in real-world face images [4]. In recent years, pattern classification based on sparse representation (SR) has attracted interest due to the superior results of sparse signal coding in the field of image reconstruction [3, 5]. Generally, the sparse representation is computed by representing the query sample as a linear combination of bases (involving a sparse subset of samples) over the training dictionary, subject to a sparsity constraint enforced by $\ell_1$-minimization. It has been claimed that the robustness and accuracy of classification are considerably improved by the sparsity constraint imposed on the computed representation [3, 5].

An image-based general classification algorithm called sparse representation for classification (SRC) was proposed by Wright et al. [3]. The sparsest linear representation is recovered via $\ell_1$-minimization. The resulting representation automatically discriminates among various classes by using the minimum reconstruction error. The robustness of SRC to changes in illumination, occlusion, and noise has shifted research interest towards image classification in the sparse domain. However, several recent studies [6, 7] have questioned the role of SR in robust classification, and its working mechanism is not fully explained yet. Most sparse representation techniques focus on the sparsity constraints of the sample representation computed via $\ell_1$-minimization, and their aim is to improve the sparsest representation of the query sample. Weighted sparse representation-based classification (WSRC), an extension of SRC, was proposed by Lu et al. [8] to improve the discriminative ability for classification. The method overcomes the limitations that arise from the linearity structure in low-dimensional subspaces in the traditional SRC formulation by incorporating data locality into the sparse representation. Gao et al. [9] used locality preserving projection for the linear approximation of high-dimensional data residing on a manifold and then used sparse representation for hyperspectral image classification, to address the imbalance between high dimensionality and limited training samples. A feature-based structural measure (FSM) was proposed by Shnain et al. [10]. The measure estimates a reliable similarity between similar human face images and discrimination among dissimilar images by incorporating statistical structure and edges as structural features. Iqbal et al. [11] proposed a collaborative representation-based classification method for image classification. The method utilizes virtual samples and computes representations on local image patches to address the limited-sample face recognition problem. Jiang et al. [12] introduced a low-dimensional data representation model integrating LLE and PCA with a closed-form solution. The resulting representation incorporates both the local neighborhood structure and the global structure and is robust to outliers. An improved principal component analysis (IPCA) method for face recognition was proposed by Zhu et al. [13]. The approach extracts useful features from the original feature space (original face images) through dimensionality reduction and then employs a least squares regression approach for classification based on the minimum reconstruction error. The experiments show promising results compared with some state-of-the-art methods.

Many recent studies have questioned the role of the $\ell_1$-minimization-based sparsity constraint in classification and have shown that the data generally used in various SR-based classification methods do not satisfy the presumed sparsity assumption [7, 14]. Zhang et al. [7] have argued that what is crucial is representing the query sample as a linear combination of all samples in the training dictionary, hence the name collaborative representation (CR). A simple $\ell_2$-minimization-based least squares CR achieves competitive classification accuracy with far lower computational complexity than sparse representation approaches. Solving the SR problem for large dictionaries is computationally expensive and has no closed-form algebraic solution [6].

In dimensionality reduction, classification, and clustering, the proximity relationship (intrinsic locality structure) among data elements is important [3, 15]. Existing CR methods encode a new sample as a linear combination of previously learned features irrespective of the proximity relationships among data points, which leads to poor classification [16].

In this work, a collaborative representation method is proposed that formulates the image classification problem as computing an optimal linear representation (obtained via a weighted variant of the $\ell_2$-minimization problem, which has a closed-form algebraic solution) for the query image. The resulting representation scheme incorporates the proximity relationship and distinctive data features into the linear representation.

2. Background

Linear regression-based classification techniques have gained considerable interest in recent years. The classification problem is formulated as finding the coefficients that minimize the reconstruction error between a probe image and its linear reconstruction from an over-complete basis.

For a classification problem, suppose there are $n$ training samples drawn from $C$ known classes. For the $k$th class, the matrix $A_k = [a_{k,1}, a_{k,2}, \ldots, a_{k,n_k}] \in \mathbb{R}^{m \times n_k}$ is formed from its training samples, where $a_{k,j} \in \mathbb{R}^m$ represents the column vector formed from the $j$th sample (image) of dimension $m$. The training matrix $A = [A_1, A_2, \ldots, A_C] \in \mathbb{R}^{m \times n}$ is formed by concatenating the samples of all $C$ known classes, having $m$ rows and $n$ columns, where $n = \sum_{k=1}^{C} n_k$ is the number of samples in the training dictionary.

The nearest subspace (NS) [17] classifier looks for the minimal distance between the sample $Y$ and its projection onto each class subspace. The sample is assigned to the class with the minimal reconstruction error between $Y$ and the subspace spanned by the data points of class $k$, that is,
$$\text{identity}(Y) = \arg\min_{k} \left\|Y - A_k \hat{x}_k\right\|_2, \tag{1}$$
where $\hat{x}_k$ is a coefficient vector estimated using the least squares regression method, that is, $\hat{x}_k = \left(A_k^{T} A_k\right)^{-1} A_k^{T} Y$.
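To make the NS rule concrete, the following is a minimal NumPy sketch of the per-class least squares residual described above; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def nearest_subspace_classify(Y, class_dicts):
    """Assign query Y (m,) to the class whose training subspace reconstructs
    it with minimum least-squares residual, as in equation (1).
    class_dicts: list of (m, n_k) arrays, one per class."""
    residuals = []
    for A_k in class_dicts:
        # least-squares coefficients: x_k = argmin_x ||Y - A_k x||_2
        x_k, *_ = np.linalg.lstsq(A_k, Y, rcond=None)
        residuals.append(np.linalg.norm(Y - A_k @ x_k))
    return int(np.argmin(residuals))
```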

In the sparse representation-based classification (SRC) [3] technique, which is a generalization of the NS method, each probe image is represented as a linear combination of all the training samples from an over-complete dictionary, resulting in an ill-conditioned inverse problem. The solution is obtained by enforcing sparsity on the coefficient vector; that is, only a small number of solution vector elements have nonzero values. The resulting sparse representation (SR) scheme has shown robust face recognition performance.

The concept of collaborative representation (CR) has been introduced more recently in various studies [7, 17–19]. The role of sparse representation in robust face recognition and pattern classification has raised many questions, and there is still no clear explanation of why sparsity constraints on the coefficient vector produce robust face recognition results. However, an over-complete dictionary leads to infinitely many representations of the query sample. Since the query sample representation is formed from the collaboration of all samples in the dictionary, the representation is referred to as a collaborative representation (CR). Furthermore, it has been shown that the adoption of CR is more crucial for robust classification than the sparsity constraint imposed by $\ell_1$-regularization on the coefficients. A CR computed via regularized least squares ($\ell_2$-regularization) achieves similar classification performance with much lower computational complexity.

In the collaborative representation- (CR-) based classification [7, 14, 17] framework, the representation of a query sample $Y$ is obtained by coding it collaboratively over the dictionary formed from all training samples. Classification is treated as an optimization problem that aims to reduce the reconstruction cost between a test sample and the corresponding representation. The sparse representation-based classification (SRC) [3] approach is one example of a CR-based classifier. SRC assumes that the sparsest representation of the query image $Y$ over a fixed-size training dictionary $A$ will correctly classify $Y$. The sparse representation of $Y$ over $A$ is obtained by $\ell_1$-minimization:
$$\hat{x} = \arg\min_{x} \|Y - Ax\|_2^2 + \lambda \|x\|_1, \tag{2}$$
where $\|x\|_1$ enforces the sparsity constraint on the coefficient vector $x$ and $\lambda$ is a small positive constant used to control the sparsity level. The resulting optimization problem is solved using linear programming and has no closed-form algebraic solution. After computing the sparse representation $\hat{x}$ using equation (2), SRC assigns the query image to the class having the minimum reconstruction error among all classes:
$$e_k = \left\|Y - A\,\delta_k(\hat{x})\right\|_2, \tag{3}$$
where $\delta_k(\hat{x})$ is an $n$-dimensional vector whose nonzero coefficients are those of $\hat{x}$ corresponding to the $k$th training class. The classification is performed subject to the following condition:
$$\text{identity}(Y) = \arg\min_{k} e_k. \tag{4}$$
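The following is a minimal sketch of the SRC pipeline in equations (2)–(4), using scikit-learn's Lasso as a stand-in for the $\ell_1$ solver (the experiments later use CVX, and Lasso's internal scaling of the data term means `lam` is not directly comparable to $\lambda$ above); names and defaults are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso  # stand-in l1 solver, not the paper's CVX setup

def src_classify(Y, A, labels, lam=0.01):
    """Illustrative SRC: sparse-code query Y (m,) over the dictionary A (m, n),
    then pick the class with the smallest class-wise reconstruction error."""
    labels = np.asarray(labels)
    # sklearn's Lasso scales the data term by 1/(2m); treat lam as a rough knob only
    x = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(A, Y).coef_
    classes = np.unique(labels)
    errs = [np.linalg.norm(Y - A @ np.where(labels == c, x, 0.0)) for c in classes]
    return classes[int(np.argmin(errs))]
```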

Lu et al. [8] have pointed out that the performance degradation of SRC in low-dimensional feature subspaces is caused by the limited discriminative information carried by the linearity structure of the data. To overcome this limitation, they proposed the weighted sparse representation-based classification (WSRC) method, which integrates data locality and sparsity into the formulation, that is,
$$\hat{x} = \arg\min_{x} \|Y - Ax\|_2^2 + \lambda \|Wx\|_1, \tag{5}$$
where $W$ is a block diagonal matrix used to characterize the similarity structure between the test sample and the training data and is calculated as follows:
$$w_{ii} = \exp\!\left(\frac{\|Y - a_i\|_2}{s}\right), \tag{6}$$
where $a_i$ is the $i$th training sample and $s$ is the locality parameter. After obtaining the more discriminative sparse code of the test sample representation, recognition is performed using the minimum reconstruction error.
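For illustration, here is a small sketch of the locality weights in equation (6); the exponential form and the parameter name `s` follow the reconstruction above and should be treated as assumptions rather than the exact WSRC implementation.

```python
import numpy as np

def wsrc_locality_weights(Y, A, s=1.0):
    """One weight per dictionary column, growing with the Euclidean distance to
    the query, so far-away training samples are penalized more in the l1 term."""
    d = np.linalg.norm(A - Y[:, None], axis=0)   # distances ||Y - a_i||_2
    return np.exp(d / s)                         # larger distance -> larger penalty
```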

Another approach to CR is collaborative representation-based classification with regularized least squares (CRC-RLS) [7]. A query sample is encoded over all the training samples of dictionary $A$ using $\ell_2$-minimization. CRC handles classification as an optimization problem and reduces the reconstruction cost between the query image $Y$ and its estimated representation. The coding model of CRC, irrespective of outlier robustness, is given by
$$\hat{x} = \arg\min_{x} \|Y - Ax\|_2^2 + \lambda \|x\|_2^2. \tag{7}$$

Equation (7) has the following closed-form solution:
$$\hat{x} = \left(A^{T}A + \lambda I\right)^{-1} A^{T} Y, \tag{8}$$
where $\lambda$ is the regularization parameter and $I$ represents the identity matrix. The classification decision is based on the minimum regularized reconstruction error, computed as follows:
$$e_k = \frac{\left\|Y - A\,\delta_k(\hat{x})\right\|_2}{\left\|\delta_k(\hat{x})\right\|_2}. \tag{9}$$

The classification decision is taken as in equation (4).
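A compact sketch of CRC-RLS as summarized in equations (7)–(9) and (4): one closed-form ridge solve over the whole dictionary, followed by the regularized class-wise residual; names are illustrative.

```python
import numpy as np

def crc_rls_classify(Y, A, labels, lam=0.001):
    """CRC-RLS sketch: collaborative code via (A^T A + lam I)^{-1} A^T Y, then
    classification by the coefficient-normalized class residual of equation (9)."""
    labels = np.asarray(labels)
    n = A.shape[1]
    P = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)   # projection matrix
    x = P @ Y                                              # collaborative code
    classes = np.unique(labels)
    errs = []
    for c in classes:
        x_c = np.where(labels == c, x, 0.0)                # keep class-c coefficients
        errs.append(np.linalg.norm(Y - A @ x_c) / (np.linalg.norm(x_c) + 1e-12))
    return classes[int(np.argmin(errs))]
```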

For classification, CRC performs best in terms of computational efficiency compared with SR-based approaches; see [14] for details. It is the collaborative representation (CR) that makes CRC robust for classification and enhances its discriminative power. In the classification decision of CR, the minimum distance between the collaborative components and the within-class projection is considered:
$$e_k = \left\|\hat{Y} - A_k \hat{x}_k\right\|_2, \tag{10}$$
where $\hat{Y}$ represents the perpendicular projection of the query sample onto the subspace spanned by the training dictionary. In the query sample classification decision, CR first checks the angle between the perpendicular projection and the principal signal (i.e., the representation using coefficients from the correct class); this angle should be the smallest among all classes. Secondly, CR also checks that the angle between the principal signal and the representations of the remaining classes (other than the correct class) is large. Equation (10) is the actual quantity that plays the decisive role in classification. This double-checking mechanism makes CR very effective and robust for classification problems.

3. Proposed Methodology

It has been observed in various research works [15, 20, 21] that the locality structure among data elements plays an important role in classification. In a low-dimensional subspace, a data point reconstructed from its nearest neighbors on a nonlinear manifold preserves the local geometry. Hence, robust classification of observations (data points) can be achieved by modeling the nonlinear geometry of the manifolds. Furthermore, the discriminative power of the representation can be enhanced by exploiting the distinctiveness of various features among data elements for accurate classification [15].

Motivated by the aforementioned observations, this paper integrates the locality structure and feature distinctiveness among data points to compute the linear representation. The aim is to learn a discriminant representation by exploiting the distinctiveness of various features to reduce variations among data elements of the same class and by using locality information to distinguish data elements of dissimilar classes. We propose an image classification method called adaptive locality-based weighted collaborative representation (ALWCR) with the ability to reconstruct the query image from its optimal neighbors to further improve discrimination in classification. The proposed model is formulated as a unified weighted least squares regularization:
$$\hat{x} = \arg\min_{x}\; (Y - Ax)^{T} W_f\, (Y - Ax) + \lambda \|Qx\|_2^2, \tag{11}$$
where $a_i$ ($i = 1, \ldots, n$) are the samples of the training dictionary $A$; $Q = \operatorname{diag}(d_1, \ldots, d_n)$, with $d_i = \|Y - a_i\|_2$, encodes the locality structure similarities based on the Euclidean distance between the query $Y$ and the $i$th sample in the dictionary $A$; and $W_f$ is a diagonal matrix whose entries weight each feature to modulate its importance in the representation. Intuitively, the solution coefficients of the computed representation are emphasized for distinctive features and suppressed for features that vary little across classes; $\lambda$ is a small positive constant that acts as the regularization parameter.

The proposed problem formulation, i.e., equation (11), has an algebraic solution for fixed values of $W_f$ and $Q$ [22]. If sufficient training samples are available, the matrix $W_f$ is precalculated using the variance among the training samples. The coefficients of the solution vector represent the contribution of the samples chosen from the training dictionary to the reconstruction of the query sample. Higher coefficient values correspond to neighboring samples that span the same linear subspace as the query sample $Y$, with the importance of each feature dimension modulated accordingly. Intuitively, the first term is used to reduce the variance among data elements of similar classes, and the weight matrix $W_f$ is introduced for variance estimation. These weights are prelearned from the training data by using the weighted least squares (WLS) regression approach. For correct noise estimation, each training class should have enough samples to model variations such as pose, illumination, and outliers in the query samples. Therefore, if the training dictionary covers enough of the variations between testing and training samples, the noise is correctly estimated through the weights, which improves discrimination for classification. The second term penalizes the training samples that lie on subspaces different from that of the test sample. Therefore, the contribution of such training samples to the linear representation of the test sample is reduced, which decreases the overall reconstruction error and improves classification results. In terms of the training samples, the test sample $Y$ can be linearly represented as $Y = Ax + \xi$, where $\xi$ represents the noise term. According to weighted least squares (WLS) regression [23] for noise estimation,
$$\hat{x} = \arg\min_{x}\; (Y - Ax)^{T} W_f\, (Y - Ax), \tag{12}$$
where $W_f$ is a diagonal matrix whose inverse is proportional to the covariance of the noise term $\xi$. The algebraic solution of equation (12) is given as
$$\hat{x} = \left(A^{T} W_f A\right)^{-1} A^{T} W_f\, Y. \tag{13}$$

An iterative procedure [24] can be used to estimate $W_f$. First, a least squares optimization is used to compute the residual $r^{(1)}$, with $W_f$ in equation (13) initialized to the identity matrix, $W_f^{(1)} = I$,
$$r^{(1)} = Y - A\left(A^{T}A\right)^{-1} A^{T} Y. \tag{14}$$

In the second iteration, the coefficients $\hat{x}^{(2)}$ are estimated using equation (13), then the weighted least squares residual $r^{(2)}$ is computed, and the weight matrix is updated to $W_f^{(2)}$:
$$r^{(2)} = Y - A\hat{x}^{(2)}, \qquad W_f^{(2)} = \operatorname{diag}\!\left(\frac{1}{\left(r_i^{(2)}\right)^2 + \epsilon}\right), \tag{15}$$
where $\epsilon$ is a small positive constant that prevents division by zero.

Similarly, the next iteration leads to
$$r^{(3)} = Y - A\hat{x}^{(3)}, \qquad W_f^{(3)} = \operatorname{diag}\!\left(\frac{1}{\left(r_i^{(3)}\right)^2 + \epsilon}\right), \tag{16}$$
where $r^{(3)}$ represents the weighted least squares residual at the third iteration of the procedure and $\hat{x}^{(3)}$ is the coefficient vector computed using equation (13) by substituting $W_f^{(2)}$. Finally, $W_f^{(3)}$ represents the updated weight matrix at the end of the current iteration. Note that $W_f$ converges after a few iterations.
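As a concrete illustration, the sketch below implements one plausible reading of the reweighting loop in equations (13)–(16) in NumPy; the inverse-squared-residual update, the small ridge term in the solve, and the `eps` stabilizer are assumptions rather than the paper's exact choices.

```python
import numpy as np

def estimate_feature_weights(A, y, lam=0.001, n_iter=3, eps=1e-6):
    """Hedged sketch of the iterative weighted-least-squares estimation of the
    diagonal feature-weight matrix W_f for a dictionary A (m, n) and sample y (m,).
    Features with larger residuals receive smaller weights."""
    m, n = A.shape
    w = np.ones(m)                                      # W_f^(1) = I
    for _ in range(n_iter):
        W = np.diag(w)
        # weighted solve, with a small ridge added only for numerical stability
        x = np.linalg.solve(A.T @ W @ A + lam * np.eye(n), A.T @ W @ y)
        r = y - A @ x                                   # per-feature residual
        w = 1.0 / (r**2 + eps)                          # assumed reweighting rule
    return w
```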

3.1. Optimization Algorithm

First, the feature weights $W_f$ are learned, with respect to the query sample $Y$, using equations (13)–(16). The algebraic solution of equation (11) can be determined by taking its partial derivative w.r.t. $x$ and equating it to zero:
$$\hat{x} = P\,Y, \tag{17}$$
where
$$P = \left(A^{T} W_f A + \lambda\, Q^{T}Q\right)^{-1} A^{T} W_f. \tag{18}$$

The locality term in equations (11) and (17) involves a coefficient vector with positive entries, namely, the query-to-sample distances $d_i = \|Y - a_i\|_2$. A reasonable solution of equation (17) is therefore estimated by placing these coefficients on the diagonal of the similarity matrix,
$$Q = \operatorname{diag}\big(d_1, d_2, \ldots, d_n\big), \tag{19}$$
where $Q$ is a diagonal matrix which represents the similarities between the test sample and the corresponding training samples drawn from dictionary $A$. To maintain an effective similarity between the query sample and the training samples, the matrix $Q$ can be normalized. Thus, the matrix $Q$ modulates the neighborhood contributions in the solution space, while the diagonal matrix $W_f$ modulates the feature distinctiveness in the solution. Substituting the learned $W_f$ and the normalized $Q$ into equation (18), the coefficients are computed as
$$\hat{x} = \left(A^{T} W_f A + \lambda\, Q^{T}Q\right)^{-1} A^{T} W_f\, Y. \tag{20}$$
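Under the same assumptions, the following is a sketch of the ALWCR coefficients in equation (20); the sum-normalization of Q and the parameter `feat_w` (e.g., the output of the hypothetical `estimate_feature_weights` above) are illustrative choices, not details fixed by the paper.

```python
import numpy as np

def alwcr_code(Y, A, feat_w, lam=0.001):
    """Hedged sketch of equation (20): a ridge-type closed form combining a
    diagonal feature-weight matrix W_f (from feat_w) and a diagonal locality
    matrix Q of query-to-sample distances."""
    W = np.diag(feat_w)                              # feature distinctiveness
    d = np.linalg.norm(A - Y[:, None], axis=0)       # d_i = ||Y - a_i||_2
    q = d / (d.sum() + 1e-12)                        # assumed normalization of Q
    Q = np.diag(q)
    # x = (A^T W A + lam Q^T Q)^{-1} A^T W Y
    return np.linalg.solve(A.T @ W @ A + lam * (Q.T @ Q), A.T @ W @ Y)
```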

The resultant representation vector $\hat{x}$ expresses a test sample $Y$ as a linear combination of the optimal neighbors that span the subspace of $Y$. The computed representation encodes the distinctive features and improves the final image classification. The proposed method performs classification on the basis of the regularized residual error of each class. The coding error for the $k$th class and query image $Y$ is computed as
$$e_k = \frac{\left\|Y - A\,\delta_k(\hat{x})\right\|_2}{\left\|\delta_k(\hat{x})\right\|_2}, \tag{21}$$
where $\delta_k(\hat{x})$ is the coefficient vector whose nonzero coefficients are those of $\hat{x}$ corresponding to the $k$th class. The final assignment of a query sample to a class is given via
$$\text{identity}(Y) = \arg\min_{k} e_k. \tag{22}$$

Figure 1 shows the working of the proposed solution for classification of a query sample, and Algorithm 1 summarizes the proposed method.

Input: $A$, unit $\ell_2$-normalized data matrix; $Y$, unit $\ell_2$-normalized query samples matrix; $\lambda$, regularization parameter
Output: return labels of the query samples
 for $i = 1$ to $n$ do
  Use equation (19) to determine the similarity matrix $Q$; estimate the weight matrix $W_f$
  Compute the coefficients using equation (20) to estimate the residual and update $W_f$
  Update $Q$ and solve the objective function in equation (11) to obtain the linear representation $\hat{x}$; compute the coding error against each class using equation (21); assign the class label to the query sample according to equation (22)
 End.
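The following usage sketch ties Algorithm 1 together with the hypothetical helpers introduced above; it performs a single pass rather than the iterative refinement of $W_f$ and $Q$ described in the algorithm, so it should be read as an approximation under those assumptions.

```python
import numpy as np

def alwcr_classify(Y, A, labels, lam=0.001, n_iter=3):
    """Usage sketch of Algorithm 1: learn feature weights, code the query with
    alwcr_code, then assign the class with the minimum regularized residual."""
    labels = np.asarray(labels)
    feat_w = estimate_feature_weights(A, Y, lam=lam, n_iter=n_iter)  # sketched above
    x = alwcr_code(Y, A, feat_w, lam=lam)                            # sketched above
    classes = np.unique(labels)
    errs = []
    for c in classes:
        x_c = np.where(labels == c, x, 0.0)          # keep class-c coefficients only
        errs.append(np.linalg.norm(Y - A @ x_c) / (np.linalg.norm(x_c) + 1e-12))
    return classes[int(np.argmin(errs))]
```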

4. Experimental Results

Two real-world face databases, AR [25] and ORL [24], are used in the experiments to assess the proposed ALWCR method for face recognition. Both databases contain images with different poses, expressions, and illuminations. Further experiments are conducted on the MNIST handwritten digit database [26] to test the effectiveness of the proposed method for image classification in general.

The conventional dimensionality reduction techniques eigenfaces [1] and randomfaces [7] are used to map the original image space into various low-dimensional feature spaces. The performance of the proposed method is compared against several well-known classification algorithms, i.e., nearest neighbor (NN) [27] (as the baseline method), nearest subspace (NS) [2], collaborative representation-based classification (CRC) [7], sparse representation-based classification (SRC) [3], and the recently proposed weighted sparse representation-based classification (WSRC) [8].

The CVX [28] package is used to implement SRC and WSRC, i.e., to solve the $\ell_1$-minimization problem. In the case of the NN classifier [27], the neighborhood size is set to 1. For WSRC [8], the regularization parameter $\lambda$ and the locality parameter are set to fixed values. The regularization parameter $\lambda$ for CRC and the proposed ALWCR method is kept fixed for all experiments unless otherwise specified.
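As a small illustration of the feature pipeline used in these experiments, the sketch below builds a "randomfaces"-style embedding with unit-$\ell_2$ column normalization; the Gaussian projection and re-normalization are standard choices and not necessarily the exact ones used here.

```python
import numpy as np

def random_faces(X, d, seed=0):
    """Project image columns X (m, n) to d dimensions with a random Gaussian
    matrix, then re-normalize each column to unit l2 length."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, X.shape[0]))
    Z = R @ X
    return Z / (np.linalg.norm(Z, axis=0, keepdims=True) + 1e-12)
```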

4.1. Face Representation and Recognition
4.1.1. ORL Database

The ORL database contains 400 images of 40 different subjects, with 10 images per subject. The variations include pose and expression changes, such as smiling, open and closed eyes, and facial details such as wearing glasses or not. All images are cropped and resized to a common resolution for this experiment, and the intensity values of each image are scaled to unit length, i.e., unit $\ell_2$-norm. The well-known two-set experimental protocol is followed. A subset of images per class (4 or 5, as specified below) is randomly chosen as the training set, while the rest of the database is treated as the test set. Figure 2 shows a few training and test samples from the ORL database. Figure 3 demonstrates the recognition results of the proposed method and other state-of-the-art methods for different feature dimensions generated by eigen and random projections.

The recognition rates on the ORL 5-train set versus feature dimensions are shown in Figures 3(a) and 3(b) using eigen and random projections, respectively. In the case of eigen projections (Figure 3(a)), the proposed ALWCR outperforms the NN, NS, and original CRC methods, while it performs better than WSRC for many feature dimensions. For random feature dimensions (Figure 3(b)), the performance of the proposed ALWCR is far better than that of WSRC, CRC, and NS. Figures 3(c) and 3(d) demonstrate the recognition results for the ORL 4-train dataset. It can be observed that ALWCR beats the other state-of-the-art methods in the eigen feature space (Figure 3(c)), while for random feature dimensions, its performance is comparable to that of the NS method.

The performance of linear representation techniques such as CRC and SRC does not depend on the choice of an "optimal" feature selection for dimension reduction. A simple random feature transformation should lead to performance comparable to that of carefully engineered features such as eigen or Fisher transformations, provided a sufficiently large number of linear measurements is available. However, in low-dimensional feature space, the discriminative information carried by the linearity structure of the data is not enough to correctly classify the data, which severely affects the performance of linear representation approaches. The results in Figure 4 show that the ALWCR method outperforms competing methods, including the original CRC and WSRC approaches, across different feature dimensions using both eigen and random features. The performance improvement becomes more significant in low feature dimensions because the linearity structure alone is not enough to correctly separate the data among different classes, and encoding data locality and feature variance into the representation helps improve the discriminative power for classification. Furthermore, these results demonstrate the consistent performance of the proposed method across various feature dimensions compared to that of competing methods, especially WSRC and CRC.

4.1.2. AR Database

The AR database contains 4000 color images of 126 individual subjects. These frontal view face images have wide variations in terms of occlusion, facial expression, and illumination. We use a subset of the AR database with settings similar to those reported in [3, 7]. The subset contains 100 individual subjects (50 male and 50 female) with 14 images per subject taken in two sessions separated by two weeks, each session providing 7 images with different illuminations and expressions; all images are cropped and resized to a common resolution.

The seven images from session 1 are selected for each subject as the training set, while the seven images from session 2 are used as the testing set. Figure 4 shows some sample images taken from the training and testing sets. Tables 1 and 2 show the recognition results on the AR dataset for different feature dimensions obtained by eigen and random projections, respectively.

It is clear from the results (Tables 1 and 2) that the proposed ALWCR method outperforms the other methods, especially WSRC in the random feature space. In a nutshell, the proposed ALWCR is stable across different dimensions and handles the limited training samples well compared with the other methods considered.

The proposed method is able to handle the limitations of the original CRC, whose recognition accuracy degrades with small sample sizes. Furthermore, the proposed method is more computationally efficient than the SR-based methods due to its closed-form solution.

4.2. Regularization Parameters and Data Dimensionality

To investigate how the regularization parameter and the data dimensionality affect the performance of the various methods, we used the AR database with the same settings as mentioned above. The images were cropped to a fixed pixel size. Training and testing data are $\ell_2$-normalized prior to applying all considered methods. We use eigen and random projections to obtain low- and high-dimensional embeddings and compare CRC, SRC, and the proposed ALWCR method across the resulting feature spaces. For the proposed ALWCR method, the weight matrix is estimated from the training data using equation (14) for the first iteration and then updated via equation (20); three iterations are used in all experiments. To address the singularity issue in the training sample matrix $A$, the singular value decomposition (SVD) is used to compute the matrix inverse in equation (14) (for estimating $W_f$) without specifying the parameter $\lambda$.
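A minimal sketch of the SVD-based inversion mentioned above, using NumPy's pseudo-inverse as a stand-in; the `rcond` threshold is an assumed detail.

```python
import numpy as np

def pinv_solve(A, Y, rcond=1e-10):
    """Solve the (possibly singular) normal equations A^T A x = A^T Y with an
    SVD-based pseudo-inverse instead of an explicit ridge regularizer."""
    return np.linalg.pinv(A.T @ A, rcond=rcond) @ A.T @ Y
```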

From Figure 5, it can be seen that the performance of the CRC method ($\ell_2$-regularization) in low-dimensional feature space is inferior to that of SRC ($\ell_1$-regularization), which is much more effective in low-dimensional feature space because the sparsity constraint on the coefficients improves discrimination in classification [1]. CRC performance improves as the feature dimension increases, and in high-dimensional feature space, CRC outperforms SRC. Including the data variance and the data locality structure in the $\ell_2$-regularization formulation, i.e., the proposed ALWCR method, improves the results over those of the flat formulation (CRC) in low-dimensional feature space.

The proposed approach outperforms the CRC and SRC methods in terms of classification accuracy in low-dimensional feature space and, in high-dimensional feature space, is competitive with CRC and better than SRC. The choice of the regularization parameter has less influence on the recognition performance of ALWCR than on that of the other methods. Figure 6 provides an overview of classification accuracy versus feature dimension on the AR dataset for the different classifiers. The regularization parameter is fixed for all methods except NS. It can be seen that ALWCR outperforms all other considered methods in low-dimensional feature space and has competitive classification accuracy in high-dimensional feature space as well.

4.3. Digit Classification

To evaluate the proposed approach for general pattern classification against several well-known methods, i.e., NS, CRC, and SRC, we use the MNIST handwritten digit dataset [26]. The training set consists of 60,000 samples of handwritten digits (0 to 9), while the testing set consists of 10,000 samples, where each sample is an 8-bit grayscale image of size 28 × 28 pixels. We have chosen the same experimental setting as described in detail previously [14].

The training dictionary is composed by randomly selecting 50 samples from each class, giving 500 samples in total. The testing set is formed by randomly selecting 700 samples from the original test set. Eigen and random projections are used to obtain various feature space dimensions. Figure 7 shows the classification results on the MNIST dataset for various feature dimensions computed using eigen and random embeddings.

It can be seen from Figure 7 that the proposed ALWCR method outperforms all other considered methods across the various feature space dimensions (see Table 3 for a detailed comparison of classification results at various feature dimensions). The ALWCR method generally outperforms the CRC method but has higher time complexity than CRC because it estimates the variance among samples and incorporates the locality structure. However, ALWCR is significantly faster than the SRC method and outperforms it in terms of both speed and accuracy; see Table 4 for a running time comparison of SRC and ALWCR.

5. Conclusions

In this paper, an adaptive locality-based weighted collaborative representation (ALWCR) is presented. The proposed ALWCR method effectively encodes the local geometry and feature variances among data elements and performs robust classification. It is an extension of the original CRC formulation that significantly improves discriminative power for classification. The performance of the proposed method is evaluated against standard benchmarks and state-of-the-art methods. Experimental results confirm that the proposed ALWCR is stable across different feature dimensions on different datasets and outperforms competing methods in terms of recognition accuracy and computational cost.

Data Availability

The data used to support this study will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

We are thankful to COMSATS University, Islamabad (Abbottabad Campus), and the School of Information Science and Technology, USTC Hefei, China, which fully supported us by providing all key resources during the implementation and subsequent phases of this project. We would also like to personally thank Prof. WuYang Zhou and Dr. Waqas Jadoon for their continuous encouragement and extensive support, both academically and socially, during this project. This work was sponsored by the key program of the National Natural Science Foundation of China (Grant no. 61631018).