Abstract
We propose a new collaborative neighbor representation algorithm for face recognition based on a revised regularized reconstruction error (RRRE), called the two-phase collaborative neighbor representation algorithm (TCNR). Specifically, the RRRE is the division of the l2-norm of the reconstruction error of each class by a linear combination involving the l2-norm of the reconstruction coefficients of each class, which can be used to increase the discrimination information for classification. The algorithm is as follows: in the first phase, the test sample is represented as a linear combination of all the training samples by incorporating the neighbor information into the objective function; in the second phase, we use the classes with the smallest RRRE values to represent the test sample and calculate the collaborative neighbor representation coefficients. TCNR not only preserves the locality and similarity information of sparse coding but also eliminates the side effect on the classification decision of the classes that are far from the test sample. Moreover, the rationale and an alternative scheme of TCNR are given. The experimental results show that the TCNR algorithm achieves better performance than seven previous algorithms.
1. Introduction
As one of the most challenging problems in computer vision and pattern recognition, face recognition has attracted much attention. A number of face recognition methods, such as principal component analysis (PCA) [1], linear discriminant analysis (LDA) [2], Eigenfaces [3], Fisherfaces [4], Laplacianfaces [5], locality preserving projection (LPP) [6], and spectral clustering [7], have been extensively studied in recent years.
Recently, the sparse representation based classification (SRC) has been proposed for robust face recognition. The basic idea is that the test sample can be represented as a linear combination of all the training samples under a sparsity constraint and then classified by exploiting the reconstruction errors. To extend sparse representation to classification problems, Huang and Aviyente [8] present a theoretical framework for signal classification with sparse representation, which sparsely codes a signal over a set of redundant bases and classifies the signal based on its coding vector. Because minimizing the l0-norm is an NP-hard problem, many algorithms formulate the sparse coding problem as the minimization of the l1-norm of the reconstruction coefficients. For example, Wright et al. [9] use sparse representation for robust face recognition. A test image is first sparsely coded over the template images, and then the classification is performed by checking which class yields the least coding error. Moreover, there are many variations of SRC. Hui et al. [10] exploit a K-nearest neighbor (KNN) method to classify a test sample using sparse representation, which can reduce the computational complexity. Gao et al. [11] propose a histogram intersection based KNN method to construct a Laplacian matrix and incorporate the Laplacian matrix into the objective function of sparse coding to preserve the consistency of the sparse representations of similar local features. Kang et al. [12] present a kernel sparse representation classification framework and utilize the local binary pattern descriptor in the framework for robust face recognition. Mairal et al. [13] propose a joint dictionary learning and classifier construction framework. Deng et al. [14] propose an extended sparse representation based classifier (ESRC) algorithm and apply an auxiliary intraclass variant dictionary to represent the possible variation between the training and testing images. Gabor features [15] and Markov random fields [16] are also used to further improve the accuracy of SRC. In addition, Ji et al. [17] propose an improved sparse representation classification algorithm based on a nonnegative constraint on the sparse coefficients. Other nonnegative sparse representation algorithms can be found in [18–20]. Although SRC and its variations significantly improve the robustness of face recognition, they still need to solve an l1-minimization problem on the whole dataset, which makes the computation expensive for large-scale datasets. Yang et al. [21] present a review of iterative shrinkage-thresholding based sparse representation methods for robust face recognition. More sparse representations for computer vision and pattern recognition applications can be found in [22].
Zhang et al. [23] analyze the working mechanism of sparse representation based classification and indicate that it is the collaborative representation, rather than the l1-norm sparsity, that makes SRC powerful for face classification; they propose the collaborative representation based classification (CRC) algorithm. Thus, a number of face recognition algorithms based on collaborative representation have been proposed. For example, Lee et al. [24] present an efficient sparse coding algorithm based on iteratively solving an l1-regularized least squares problem and an l2-constrained least squares problem, which can significantly enhance the speed of sparse coding. Huang et al. [25] propose a face recognition algorithm based on collaborative image similarity assessment. Moreover, several variants of collaborative representation have been proposed in recent years by adding additional regularization and/or constraints. Jadoon et al. [26] propose a collaborative neighbor representation algorithm for multiclass classification based on an l2-minimization approach with the assumption of locally linear embedding. Naseem et al. [27] present a linear regression classification (LRC) algorithm by formulating the pattern recognition problem as a linear regression problem. Yang et al. [28] propose a regularized robust coding (RRC) model, which can robustly regress a given signal with regularized regression coefficients.
The recently proposed two-phase test sample representation (TPTSR) method [29, 30] uses a novel representation based classification algorithm to perform face recognition. In this method, the first phase represents a test sample as a linear combination of all the training samples and exploits the representation ability of each training sample to determine the nearest neighbors of the test sample. The second phase represents the test sample as a linear combination of the determined nearest neighbors and uses the representation result to perform classification. Moreover, He et al. [31] also propose a two-stage sparse representation (TSR) method for robust face recognition on a large-scale database. They first learn a robust metric to remove the side effect of the noise and outliers in the images and use the KNN method based on the learnt metric to select a subset of the training samples. In the second stage, they use nonnegative sparse representation to compute the representation coefficients of the test sample on the subset and use the representation results to perform classification. In addition, a general regularization framework is proposed in [32], which gives a unified view for understanding previous sparse methods.
In this paper, a two-phase improved collaborative neighbor representation algorithm is proposed. In the first phase, the test sample is represented as a linear combination of all the training samples. By incorporating the neighbor information into the objective function of sparse coding, our algorithm can preserve the locality and similarity information of sparse coding. Then, we calculate the revised regularized reconstruction error (RRRE) of each class and use these values to determine the M classes that have the smallest RRRE values among all the classes. In the second phase, we use the M selected classes to represent the test sample and recompute the collaborative neighbor representation coefficients. We then calculate the RRRE value of each class in the second phase and use these values to perform face recognition. The experimental results show that our algorithm is very competitive on the FERET, ORL, and AR databases.
This paper is organized as follows. Section 2 describes the structure of the TCNR algorithm. Section 3 describes the rationale and an alternative scheme of TCNR. The experimental results are reported in Section 4. Finally, the conclusions are presented in Section 5.
2. The Proposed Algorithm
In this section we describe the proposed algorithm. We assume that there are C classes of training samples x_1, x_2, …, x_N, where each x_j is a column vector and n is the number of training samples in each class; N = Cn is the total number of training samples. If a sample is from the ith class (i = 1, 2, …, C), we take i as the class label of the sample.
2.1. Two-Phase Collaborative Neighbor Representation Algorithm (TCNR)
In the first phase, if a test sample y belongs to one of the labeled classes in the training sample set, then we use all the training samples to represent the test sample y. We assume that the following equation is approximately satisfied:

\[ y = X\alpha, \tag{1} \]

where α is the coefficient vector and X is the training sample set. There are usually two methods for solving (1). One is an iterative method such as LSMR [33] or LSQR [34]; the other is a direct method, using \( \alpha = (X^{T}X)^{-1}X^{T}y \) or \( \alpha = (X^{T}X + \mu I)^{-1}X^{T}y \) (μ is a small positive constant and I is the identity matrix). In general, iterative methods can obtain better representation results than direct methods, but they have much higher computational complexity and are time consuming. Direct methods reduce the computational complexity, but they may not give equally good representation results. Therefore, the optimization function is given as follows:
\[ \hat{\alpha} = \arg\min_{\alpha}\; \|y - X\alpha\|_2^2 + \lambda\|\alpha\|_2^2 + \gamma\,\alpha^{T}D\alpha. \tag{2} \]

The optimal solution of (2) can be derived (see [27]) in the sense of the smallest reconstruction error by

\[ \hat{\alpha} = (X^{T}X + \lambda I + \gamma D)^{-1}X^{T}y, \tag{3} \]

where X is the training sample set and λ and γ are the regularization parameters. \( \hat{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)^{T} \) is the coefficient vector and α_j is the representation coefficient of the jth training sample x_j. I is the identity matrix. Moreover, D is the diagonal matrix whose only nonzero diagonal entries represent the distances between the test sample and each training sample.
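To make the first-phase computation concrete, the following Python/NumPy sketch implements one assumed reading of (2)–(3): the diagonal matrix D holds the Euclidean distances between the test sample and each training sample, and the coefficients come from the regularized normal equations. The exact distance weighting used in [26] may differ, and the function name cnr_coefficients and the default parameter values are ours, so treat this as an illustration rather than the paper's implementation.

```python
import numpy as np

def cnr_coefficients(X, y, lam=0.01, gamma=0.01):
    """Collaborative neighbor representation coefficients (assumed form of (3)).

    X : (d, N) matrix whose columns are training samples.
    y : (d,) test sample.
    """
    d, N = X.shape
    # Diagonal "neighbor" matrix: distance from the test sample to each training sample.
    dist = np.linalg.norm(X - y[:, None], axis=0)
    D = np.diag(dist)
    # Assumed closed form: (X^T X + lam*I + gamma*D)^{-1} X^T y.
    A = X.T @ X + lam * np.eye(N) + gamma * D
    return np.linalg.solve(A, X.T @ y)
```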
The CNRC method has lower computational complexity than iterative methods and provides reasonable representation results for the test sample. Therefore, we use (3) to obtain the representation coefficients. After obtaining the collaborative neighbor representation coefficient vector \( \hat{\alpha} \), we then compute the reconstruction error (RE) of each class as

\[ e_i = \|y - X_i\alpha_i\|_2. \tag{4} \]
In order to increase the discrimination information, Jadoon et al. [26] used the regularized reconstruction error (RRE) for classification. Consider

\[ \mathrm{RRE}_i = \frac{\|y - X_i\alpha_i\|_2}{\|\alpha_i\|_2}. \tag{5} \]
Inspired by a stopping rule in [33, 34], we propose a revised regularized reconstruction error (RRRE) by using

\[ \mathrm{RC}_i = a\,\|X_i\|_F\,\|\alpha_i\|_2 + b\,\|y\|_2, \tag{6} \]

\[ \mathrm{RRRE}_i = \frac{e_i}{\mathrm{RC}_i} = \frac{\|y - X_i\alpha_i\|_2}{a\,\|X_i\|_F\,\|\alpha_i\|_2 + b\,\|y\|_2}, \tag{7} \]

where α_i is a coefficient vector whose only nonzero entries are the coefficients in \( \hat{\alpha} \) corresponding to the ith class, y is the test sample, and X_i contains the training samples of the ith class. e_i is the reconstruction error of the ith class, RRE_i is the regularized reconstruction error of the ith class, and a and b are the regularization parameters. RRRE_i is the revised regularized reconstruction error of the ith class. The RRRE can be considered a regularized reconstruction error that increases the discriminative information for face recognition. Section 3 gives a rational interpretation of (7). Furthermore, we exploit the RRRE value of each class to identify the M classes that have the smallest values among all the classes and denote them as the candidate classes s_1, s_2, …, s_M.
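The per-class scores can then be computed from the first-phase coefficients. The sketch below computes the RE, the RRE, and an assumed concrete form of the RRRE whose denominator combines a·||X_i||_F·||α_i||_2 with b·||y||_2, mirroring the LSQR-style stopping quantity discussed in Section 3; the paper's exact RC may differ, and the small constant added to the denominators is only there to avoid division by zero.

```python
import numpy as np

def class_errors(X, y, alpha, labels, a=0.01, b=0.01):
    """Per-class RE, RRE, and an assumed form of the RRRE.

    labels : (N,) array giving the class index of each column of X.
    Returns {class: (RE, RRE, RRRE)}.
    """
    eps = 1e-12  # guards against division by zero only
    scores = {}
    for c in np.unique(labels):
        idx = labels == c
        Xc, ac = X[:, idx], alpha[idx]
        re = np.linalg.norm(y - Xc @ ac)                 # reconstruction error, (4)
        rre = re / (np.linalg.norm(ac) + eps)            # regularized reconstruction error, (5)
        rc = a * np.linalg.norm(Xc) * np.linalg.norm(ac) + b * np.linalg.norm(y)  # assumed RC, (6)
        scores[c] = (re, rre, re / (rc + eps))           # RRRE, (7)
    return scores
```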
In the second phase, we represent the test sample as a linear combination of the training samples of the M candidate classes. Accordingly, we assume that the following equation is approximately satisfied:

\[ y = X_S\beta, \tag{8} \]

where \( X_S = [X_{s_1}, X_{s_2}, \ldots, X_{s_M}] \) contains the training samples of the determined classes and \( \beta = [\beta_{s_1}; \beta_{s_2}; \ldots; \beta_{s_M}] \) is the representation coefficient vector, with \( \beta_{s_i} \) the representation coefficient vector of the s_i-th class and n the number of training samples in each class. Therefore, (8) also shows that the training samples of the determined classes make their own contributions to representing the test sample. Then we can solve (8) by using (3). After obtaining the collaborative neighbor representation coefficient vector β, we then calculate the RRRE by using (7) in the second phase.
In general, a lower value of RRRE_i means that the ith class makes a more important contribution to representing the test sample. Therefore, we use the following rule to perform face recognition:

\[ \mathrm{identity}(y) = \arg\min_{i \in \{s_1, \ldots, s_M\}} \mathrm{RRRE}_i. \tag{9} \]
If the ith class has the lowest RRRE value among all of the candidate classes, then the test sample is classified into the ith class.
The main steps of TCNR algorithm can be summarized as follows.
Step 1. Use all the training samples to represent the test sample, and solve (1) by using (3).
Step 2. Calculate the RRRE value of each class using (7), and determine the M classes that have the smallest RRRE values among all of the classes.
Step 3. Use the training samples of the determined classes to represent the test sample again, and solve (8) by using (3).
Step 4. Calculate the RRRE using (7) in the second phase.
Step 5. Identify the class label of the test sample using (9).
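Under the same assumptions as the earlier sketches, the five steps can be chained into a single classification routine. The code below reuses cnr_coefficients and class_errors defined above; M and the regularization values are illustrative defaults, not the settings used in the experiments.

```python
import numpy as np

def tcnr_classify(X, labels, y, M=5, lam=0.01, gamma=0.01, a=0.01, b=0.01):
    """Two-phase collaborative neighbor representation (TCNR) sketch."""
    # Phase 1: represent y over all training samples and keep the M classes
    # with the smallest RRRE values (Steps 1 and 2).
    alpha = cnr_coefficients(X, y, lam, gamma)
    rrre1 = {c: v[2] for c, v in class_errors(X, y, alpha, labels, a, b).items()}
    candidates = sorted(rrre1, key=rrre1.get)[:M]

    # Phase 2: represent y over the candidate classes only and re-rank them
    # (Steps 3 and 4); decision rule (9) picks the smallest RRRE (Step 5).
    keep = np.isin(labels, candidates)
    beta = cnr_coefficients(X[:, keep], y, lam, gamma)
    rrre2 = {c: v[2] for c, v in class_errors(X[:, keep], y, beta, labels[keep], a, b).items()}
    return min(rrre2, key=rrre2.get)
```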
2.2. Alternative Scheme of TCNR (ATCNR)
In this section, we present an alternative scheme of TCNR (ATCNR). ATCNR is also composed of two phases, and its second phase is identical to that of TCNR. The main difference is that, in the first phase, the TCNR algorithm determines the M classes that have the smallest RRRE values among all of the classes, while the ATCNR algorithm determines the K training samples that have the smallest RRRE values among all of the training samples.
In the first phase of ATCNR, we first use (3) to obtain the collaborative neighbor representation coefficient vector \( \hat{\alpha} \). Then we calculate the RRRE values between the test sample and each training sample. Let x_1, x_2, …, x_N denote the training samples. If x_j is the jth training sample, then the RRRE between the test sample and the jth training sample is

\[ \mathrm{RRRE}_j = \frac{\|y - \alpha_j x_j\|_2}{a\,|\alpha_j|\,\|x_j\|_2 + b\,\|y\|_2}, \tag{10} \]

where x_j is the jth training sample, α_j is the representation coefficient of the training sample x_j, y is the test sample, and a and b are the regularization parameters. RRRE_j is the revised regularized reconstruction error of the jth training sample. Then, we select the K training samples that have the smallest RRRE values among all of the training samples.
In the second phase, we represent the test sample as a linear combination of the K determined training samples. Consider

\[ y = \sum_{j=1}^{K} \beta_j \tilde{x}_j, \tag{11} \]

where \( \tilde{x}_j \) is the jth selected training sample, β_j is the representation coefficient of the training sample \( \tilde{x}_j \), and y is the test sample. Then we can solve (11) by using (3) and calculate the RRRE value of each class by using (7) in the second phase. As with TCNR, we use (9) to perform face recognition.
The main steps of ATCNR algorithm can be summarized as follows.
Step 1. Use all the training samples to represent the test sample, and solve (1) by using (3).
Step 2. Calculate the RRRE value of each training sample using (10), and determine the K training samples that have the smallest RRRE values among all of the training samples.
Step 3. Use the determined training samples to represent the test sample again, and solve (11) by using (3).
Step 4. Calculate the RRRE value of each class using (7) in the second phase.
Step 5. Identify the class label of the test sample using (9).
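A corresponding sketch of ATCNR is given below; it ranks individual training samples by a per-sample RRRE (our assumed analogue of (10)) in the first phase and then applies the same class-wise decision as TCNR in the second phase, again reusing the helper functions defined earlier.

```python
import numpy as np

def atcnr_classify(X, labels, y, K=30, lam=0.01, gamma=0.01, a=0.01, b=0.01):
    """Alternative scheme (ATCNR) sketch: phase 1 keeps K individual samples."""
    # Phase 1: per-sample RRRE (assumed analogue of (10)).
    alpha = cnr_coefficients(X, y, lam, gamma)
    res = np.linalg.norm(X * alpha - y[:, None], axis=0)          # ||y - alpha_j * x_j|| per sample
    rc = a * np.abs(alpha) * np.linalg.norm(X, axis=0) + b * np.linalg.norm(y)
    keep = np.argsort(res / (rc + 1e-12))[:K]                     # K smallest per-sample RRRE

    # Phase 2: identical to TCNR, but restricted to the selected samples.
    beta = cnr_coefficients(X[:, keep], y, lam, gamma)
    scores = {c: v[2] for c, v in class_errors(X[:, keep], y, beta, labels[keep], a, b).items()}
    return min(scores, key=scores.get)
```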
3. The Rationale of TCNR
The rationale of our algorithm is as follows.
Firstly, we use all the training samples to represent a test sample and calculate the distances between the test sample and all the training samples. If the test sample belongs to the same class as a training sample, then the distance will be small. It is reasonable to assume that a class close to the test sample has small distances and makes its own contribution to minimizing the objective function of sparse coding. So, we incorporate the distance information into the objective function of sparse coding, which preserves the locality and similarity information of sparse coding. Moreover, as shown in [33, 34], in least squares algorithms, comparing the residual norm against a combination of the norms of the dictionary, the coefficient vector, and the observation is a good stopping rule for obtaining an optimal representation result. Because all of these norms are positive, the corresponding ratio can be used as the stopping rule. Inspired by this idea, we calculate the RRRE value of each class and use the RRRE to determine the M classes that have the smallest values among all of the classes in the first phase. It is assumed that the l2-norm of the reconstruction coefficients can increase the discrimination information for classification [23]. As a result, the RC is a linear combination of the l2-norm of the reconstruction coefficients with the l2-norms of the test sample and the training samples, which is better than the l2-norm of the reconstruction coefficients alone. If the ith class is close to the test sample, then the reconstruction error e_i will be small and the coefficient norm ||α_i||_2 will be large. Therefore, it is reasonable to assume that the RRRE can be used to measure the representation performance and enhance the discrimination information for classification. So, in the first phase, we preserve the locality and similarity information of the classes that are close to the test sample and eliminate the side effect on the classification decision of the classes that are far from the test sample. As a result, we may obtain higher accuracy.
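For reference, the stopping test that motivates the RRRE can be written out explicitly. The LaTeX fragment below states the LSQR/LSMR convergence criterion and the per-class ratio it suggests; the exact weighting used in (7) is our assumption.

```latex
% The LSQR/LSMR iterations for  A x = b  stop when the residual is small relative
% to the problem data (one of the stopping criteria in [34]):
%     || b - A x_k ||_2  <=  btol * || b ||_2  +  atol * || A ||_F * || x_k ||_2 .
% Reading  A = X_i,  x = alpha_i,  b = y  per class suggests the ratio (assumed form of (7)):
\[
  \mathrm{RRRE}_i \;=\;
  \frac{\lVert y - X_i \alpha_i \rVert_2}
       {a \,\lVert X_i \rVert_F \,\lVert \alpha_i \rVert_2 \;+\; b \,\lVert y \rVert_2},
\]
% which is small precisely when the class-i residual has "converged" relative to the
% sizes of the class dictionary, its coefficients, and the test sample.
```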
Secondly, the second phase of TCNR needs to assign one of the M candidate class labels to the test sample, and its representation procedure is identical to that of the first phase. However, in the first phase of TCNR, we have converted the original C-class problem into an M-class problem (M < C). In general, for a classification problem, the more classes there are, the lower the maximum possible classification accuracy is [30].
The following gives two examples of the TCNR algorithm.
Figure 1(a) shows a distribution of the RRRE in the first phase. The test sample, which comes from the first class of the ORL face database, is displayed in Figure 1(b). Figure 2 shows the distribution of the RRRE over the classes selected by taking the five smallest RRRE values in Figure 1(a). Figure 3 shows the distribution of the RRRE in the second phase, computed over the classes of Figure 2. Figures 1(a) and 3 show that the first class has the smallest RRRE value among all of the classes. According to (9), we classify the test sample into the first class, which is a correct classification result.

Figure 4(a) shows another distribution of the RRRE in the first phase. Figure 4(b) shows the test sample, which comes from the 36th class of the ORL face database. Figure 5 shows the distribution of the RRRE over the classes selected by taking the five smallest RRRE values in Figure 4(a). Figure 6 shows the distribution of the RRRE in the second phase, computed over the classes of Figure 5. Figure 4(a) shows that the 5th class has the smallest RRRE value among all of the classes. However, the test sample comes from the 36th class, so the first phase alone gives a wrong classification result. Figure 6 shows that the 36th class has the smallest RRRE value among all of the candidate classes. Therefore, we classify the test sample into the 36th class. The second phase thus corrects the wrong classification result of the first phase.

4. Experimental Results
In order to test the effectiveness of the algorithms presented above, we perform a series of experiments using the FERET [35], ORL [36], and AR [37] face databases. Sample images from the three face databases are displayed in Figures 7, 8, and 9.



If n of the t samples per class are used for training and the remaining samples are used for testing, there are C(t, n) possible combinations. The FERET face database contains 1400 images of 200 persons (7 images per person). The resolution of the FERET images is 40 × 40. We use four images of each person as training samples, so there are 35 training and test sample sets. The ORL face database contains 400 images of 40 persons (10 images per person). The resolution of the ORL images is 46 × 56. We use six images of each person as training samples, so there are 210 training and test sample sets. The AR face database contains 3120 images of 120 persons (26 images per person). The resolution of the AR images is 40 × 50. As the AR face database contains too many samples per person, we select the first 13 images of each person as the training sample set and the remaining images as the test sample set.
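For reproducibility, the splits can be enumerated exhaustively. The short snippet below lists all C(t, n) train/test partitions of the images of one person; with the FERET setting (7 images, 4 for training) it yields the 35 sets mentioned above, and with the ORL setting (10 images, 6 for training) it yields 210. The function name and structure are ours.

```python
from itertools import combinations

def enumerate_splits(t, n):
    """Yield every way of picking n training images out of t images of one person."""
    images = list(range(t))
    for train in combinations(images, n):
        yield list(train), [i for i in images if i not in train]

print(sum(1 for _ in enumerate_splits(7, 4)))   # FERET setting: C(7, 4) = 35 splits
print(sum(1 for _ in enumerate_splits(10, 6)))  # ORL setting:   C(10, 6) = 210 splits
```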
Our algorithms include the first-phase improved collaborative neighbor representation classification algorithm (FCNR), the two-phase improved collaborative neighbor representation classification algorithm (TCNR), and the alternative scheme of TCNR (ATCNR). Moreover, we extend CNRC into a two-phase face recognition algorithm, namely, the two-phase collaborative neighbor representation classification (ECNRC), together with the alternative scheme of ECNRC (AECNRC).
4.1. Parameter Discussion
In the TCNR algorithm, there are four parameters. We set λ and γ to the same values as in [26, 29]. In order to set the parameters a and b, we run the TCNR algorithm on the ORL face database. We use five images of each person as training samples, so there are 252 training and test sample sets. The values of a and b are each chosen as 0.0001, 0.001, 0.01, 0.1, 0, 1, 10, and 100, giving 64 parameter sets. We perform the experiment with the various sets to determine the values of a and b for the TCNR algorithm. Table 1 shows the average classification error rates (%) of the TCNR algorithm versus different values of a and b on the ORL face database.
Table 1 shows that the performance of the TCNR algorithm is not stable and has ups and downs, but with an appropriate setting of a and b the TCNR algorithm achieves the best performance. Therefore, we use this setting of a and b in all the experiments on our algorithms. Moreover, when implementing the ECNRC and TCNR algorithms, M is the number of determined classes in the first phase. As to the AECNRC and ATCNR algorithms, if we use n images of each person as training samples, then the number of determined training samples K equals M multiplied by n.
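A grid search of this kind can be written down directly. The sketch below evaluates the 8 × 8 candidate pairs of a and b with the tcnr_classify routine from Section 2 on a held-out split; the data handling and the evaluation loop are placeholders, not the exact protocol used for Table 1.

```python
import numpy as np

GRID = [0.0001, 0.001, 0.01, 0.1, 0, 1, 10, 100]   # candidate values for a and b

def tune_ab(X_train, labels_train, X_val, labels_val, M=5):
    """Pick the (a, b) pair with the lowest error on a validation split."""
    best = (None, None, float("inf"))
    for a in GRID:
        for b in GRID:                              # 8 x 8 = 64 parameter sets
            preds = np.array([tcnr_classify(X_train, labels_train, y, M=M, a=a, b=b)
                              for y in X_val.T])
            err = np.mean(preds != labels_val)
            if err < best[2]:
                best = (a, b, err)
    return best
```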
4.2. Performance Comparison
To show the superior performance of our algorithm, we use seven state-of-the-art algorithms for comparison. They are the CNRC [26], TPTSR [29], SRC [9], LRC [27], CRC [23], RRC [28], and TSR [31].
Table 2 shows the average classification error rates of the FCNR, CNRC, SRC, LRC, CRC, and RRC algorithms on the FERET, ORL, and AR face databases.
When all the training samples are selected in the first phase, the TCNR algorithm reduces to the FCNR algorithm. Table 2 shows that the average classification error rates of the FCNR algorithm are lower than those of the CNRC, SRC, CRC, and RRC algorithms on the FERET, ORL, and AR face databases, except that the FCNR error rate is 0.13% higher than that of the CNRC algorithm on the ORL face database. The average classification error rates of the FCNR algorithm are higher than those of the LRC algorithm on the ORL and FERET face databases. However, the classification error rate of the FCNR algorithm is 10.77% lower than that of the LRC algorithm on the AR face database.
For the FERET face database, when M varies from 10 to 200 (the number of selected training samples varies from 40 to 800), different average classification error rates are obtained. Figure 10 shows the average classification error rates of the TCNR, ATCNR, ECNRC, AECNRC, TPTSR, and TSR algorithms on the FERET face database.

For the ORL face database, when M varies from 5 to 40 (the number of selected training samples varies from 30 to 240), different average classification error rates are obtained. Figure 11 shows the average classification error rates of the TCNR, ATCNR, ECNRC, AECNRC, TPTSR, and TSR algorithms on the ORL face database.

For the AR face database, when M varies from 10 to 120 (the number of selected training samples varies from 130 to 1560), different average classification error rates are obtained. Figure 12 shows the average classification error rates of the TCNR, ATCNR, ECNRC, AECNRC, TPTSR, and TSR algorithms on the AR face database.

Figures 10, 11, and 12 show that the average classification error rates of our algorithms, including TCNR, ATCNR, ECNRC, and AECNRC, are lower than those of TPTSR, SRC (l1-ls), CNRC, LRC, CRC, and RRC-L2 in most cases on the FERET, ORL, and AR face databases. Moreover, the average classification error rates of the TCNR algorithm are lower than those of TSR in most cases on the three face databases. Furthermore, it can be observed that there is a sharp increase in the average classification error rates of the TPTSR algorithm as M increases. In contrast, the experimental results show only a slight increase in the average classification error rates of TCNR and TSR as M increases, which demonstrates that TCNR and TSR are less sensitive to the choice of M.
Moreover, the average classification error rates of the TCNR algorithm are generally lower than those of the ATCNR algorithm on the three face databases. Although both the TCNR and ATCNR algorithms adopt the two-phase face recognition method and use the RRRE information, the difference is that the TCNR algorithm uses the RRRE to determine the M candidate classes of training samples, whereas the ATCNR algorithm uses the RRRE information to determine the K candidate training samples. This observation suggests that selecting the most important contributing classes is the better strategy for the face recognition algorithm.
Furthermore, we observe that the average classification error rates of the TCNR algorithm are always lower than those of the ECNRC algorithm on the three face databases. Although both the TCNR and ECNRC algorithms adopt the two-phase face recognition method and select the classes of training samples, the experimental results show that the RRRE has better discrimination ability for classification than the RRE.
Most importantly, the experimental results show that if we select a suitable number of classes to represent the test sample, the average classification error rates of the TCNR algorithm reach 24.16%, 2.94%, and 21.73% on the FERET, ORL, and AR face databases, respectively.
There are three main reasons for the above results. Firstly, we incorporate the distance information into the objective function of sparse coding, which preserves the locality and similarity information of sparse coding. This improves the quality of the sparse coding and increases the classification performance. Secondly, the RRRE is the division of the RE by the RC. Moreover, the l2-norm of the reconstruction coefficients can increase the discrimination information for classification [23]. As a result, the RC, which is a linear combination of the l2-norm of the reconstruction coefficients with the l2-norms of the test sample and the training samples, increases the discrimination information much more than the l2-norm of the reconstruction coefficients alone. If the ith class is close to the test sample, then the reconstruction error e_i will be small and the coefficient norm ||α_i||_2 will be large. Thus, the RRRE can be used to measure the representation performance and enhance the discrimination information for classification. Thirdly, our algorithm utilizes the two-phase representation method, which eliminates the side effect on the classification decision of the classes that are far from the test sample. Moreover, we have converted the C-class problem into an M-class problem; for a classification problem, the more classes there are, the lower the maximum possible classification accuracy is.
In addition, TCNR is an extension of TPTSR, which is a supervised sparse representation method. Therefore, TCNR has the advantages of the SRC algorithms and achieves better performance than the CNRC, TPTSR, SRC, LRC, CRC, RRC, and TSR algorithms in most cases on the FERET, ORL, and AR face databases.
5. Conclusion
This paper proposes an improved collaborative neighbor representation based two-phase face recognition algorithm. A revised regularized reconstruction error (RRRE) is proposed to increase the discrimination information. Our algorithm can preserve the locality and similarity information of sparse coding. Three standard face databases, the FERET, ORL, and AR databases, are used to evaluate our algorithm. Experimental results demonstrate that our algorithm is very competitive. We also show that TCNR is less sensitive to the choice of M. Furthermore, our algorithm is based on the l2-norm, and it is computationally more efficient than the naïve l1-norm based sparse representation method.
Acknowledgment
This work was partially supported by the Shenzhen Municipal Science and Technology Innovation Council (nos. JC201005260122A, JCYJ20120613153352732, and CXZZ20120613141657279).