Abstract

Matrix-based features can provide valid and interpretable information for matrix-based data such as images. Matrix-based kernel principal component analysis (MKPCA) is a method for extracting such matrix-based features, which are useful for both dimensionality reduction and spatial statistics analysis of an image. However, the efficiency of MKPCA is severely restricted by the dimension of the given matrix data and the size of the training set. In this paper, an incremental method for extracting features of a matrix-based dataset is proposed. The method is methodologically consistent with MKPCA and improves efficiency by incrementally selecting a proper projection matrix of MKPCA through rotation of the current subspace. The performance of the proposed method is evaluated by several experiments on both point and image datasets.

1. Introduction

Subspace analysis is helpful for networks in computer vision [1, 2], data modeling problems [3], and social network analysis [4]. AlexNet [5] is a pioneer in using principal component analysis (PCA), a basic subspace analysis method, to help complex networks improve their performance: it performs PCA on the set of RGB pixel values throughout the ImageNet training set to reduce overfitting on image data by data augmentation. PCANet [6] employs PCA to learn multistage filter banks in its architecture, which makes the architecture extremely easy and efficient to design and learn. PCA-based analysis can be flexibly applied to designing the architecture of neural networks [7–9]. In complex networks, the training image, the convolution-induced feature maps, and some channel-splicing features can all be considered matrix-based data. Improving the matrix-based subspace method and studying the relationship between matrix-based PCA and the basic vector-based one may therefore provide a useful reference for such networks.

Regular PCA is a linear method in the sense that it only utilizes first-order and second-order statistics. Therefore, its modeling capability is limited when confronted with highly nonlinear data structures. To enhance this capability, the vector-based kernel PCA (KPCA) [10, 11] was proposed. It improves the modeling performance of PCA by nonlinearly mapping the data from the original space to a very high-dimensional feature space, the so-called reproducing kernel Hilbert space (RKHS). The nonlinear mapping enables an implicit characterization of high-order statistics and can be flexibly combined with other learning methods [12–14]. The key idea of kernel methods is to avoid explicit knowledge of the mapping function by evaluating the dot product in the feature space with a kernel function. The vector-based KPCA takes vectors as input, so matrix data, such as a two-dimensional image, must be vectorized before being fed into kernel methods. Such vectorization ruins the spatial structure of the pixels that define the two-dimensional image. To bring back the matrix structure, matrix-based KPCA [15] was proposed by combining two-dimensional PCA (2DPCA) [16] with the kernel approach. Matrix-based KPCA generalizes the vector-based KPCA and can provide richer representations.

The advantage of matrix-based KPCA methods is that they enable us to study the spatial statistics in the matrix. However, the Gram matrix of matrix-based KPCA is much bigger than that of vector-based KPCA, which means that the batch problem of matrix-based KPCA is much more serious. For matrix-based KPCA, the Gram matrix has to be fully available before the eigendecomposition can be conducted. When extra data are added, new rows and columns must be appended to form a new Gram matrix, and the eigendecomposition has to be repeated on the enlarged matrix. In addition, the principal component vectors must be supported by all the matrices in the input dataset. This induces a high cost in storage and computation when applications involve large datasets and large input matrices.

In the last decade, several strategies and approaches have been proposed to mitigate the batch nature of vector-based KPCA, such as methods that kernelize the generalized Hebbian algorithm [17, 18], methods that compute the KPC incrementally [19–21], the greedy KPCA method [22, 23], and the adaptive KPCA method [24]. In recent years, KPCA-based methods have retained great vitality [25–27], whereas methods that counteract the huge batch nature of matrix-based KPCA remain rare. Matrix-based KPCA can be sped up using the idea proposed in improved KPCA [28]. However, this does not mean that the matrix-based KPCA computation is solved, since the size of the eigendecomposition still depends on the size of the dataset. Consequently, an approach that handles the batch nature of matrix data with efficient computations is required.

This paper proposes an incremental matrix-based KPCA (IMKPCA) method for approximating the traditional one with less computation time and memory usage when extracting kernel principal vectors under a given accuracy. The contributions of this paper are threefold:
(1) The proposed method is implemented through the total scatter matrix to avoid operating directly on the Gram matrix of matrix-based KPCA; this observation is what inspires the idea of the proposed IMKPCA.
(2) The basis of our solution is incrementally adjusting the current eigenvector matrix so as to preserve the total scatter of the matrix dataset. This is achieved by decomposing the added matrix data into a component parallel to the previous eigenvector matrix and a component orthogonal to it, whereas the standard method computes the scatter matrix directly.
(3) The proposed matrix-based feature can be used to study the spatial statistics in the matrix under an acceptable computational complexity and memory usage, which the vector-based one cannot.

The rest of this paper is organized as follows. The preliminaries of matrix-based KPCA are briefly introduced in Section 2. An incremental matrix-based kernel subspace method is then presented in Section 3, followed by the experimental results in Section 4. Finally, the conclusion is given in Section 5.

2. Preliminaries

Two concatenation operators, denoting horizontal and vertical concatenation, respectively, are used as in [15]; they simplify the expressions for a matrix and for a matrix dataset.

For a matrix $A = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{m \times n}$, where $m$ and $n$ are the numbers of rows and columns, respectively, we define its kernelized matrix as $\Phi(A) = [\phi(a_1), \phi(a_2), \ldots, \phi(a_n)]$, where $a_i$ is the $i$-th column of $A$ and $\phi$ is a nonlinear mapping from $\mathbb{R}^{m}$ to a feature space $\mathcal{F}$.

The dot product matrix of two kernelized matrices $\Phi(A)$ and $\Phi(B)$ is defined as
$$\Phi(A)^{T}\Phi(B) = \bigl[k(a_i, b_j)\bigr]_{ij},\qquad (1)$$
where $k(a_i, b_j) = \phi(a_i)^{T}\phi(b_j)$ is the kernel function.
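As an illustration, the following sketch evaluates such a dot product matrix column by column with a Gaussian kernel (the kernel later used in Section 4). The function name, the bandwidth parameter sigma, and the NumPy implementation are our own illustrative assumptions, not code from the paper.

```python
import numpy as np

def kernel_dot_product_matrix(A, B, sigma=1.0):
    """Dot product matrix of the kernelized matrices Phi(A) and Phi(B).

    A and B are m x nA and m x nB arrays whose columns play the role of the
    mapped vectors; entry (i, j) is k(a_i, b_j) for a Gaussian kernel.
    """
    # Squared Euclidean distance between every column of A and every column of B.
    sq_dists = (np.sum(A ** 2, axis=0)[:, None]
                + np.sum(B ** 2, axis=0)[None, :]
                - 2.0 * (A.T @ B))
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```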

We begin with a matrix dataset given as the block matrix $\mathcal{A} = [A_1, A_2, \ldots, A_N]$, where $A_i$ is the $i$-th matrix datum in the dataset; the idea of matrix-based KPCA is to perform two-dimensional PCA [16] in the feature space.

Assuming that the dataset is kernelized to $\Phi(\mathcal{A}) = [\Phi(A_1), \Phi(A_2), \ldots, \Phi(A_N)]$ and that it is centered in the feature space (we shall return to this point later), its total scatter matrix is estimated by
$$S_t = \frac{1}{N}\sum_{i=1}^{N} \Phi(A_i)\,\Phi(A_i)^{T}.\qquad (2)$$

Given that the mapping function $\phi$ is implicit, the KPC cannot be computed by performing an eigendecomposition on $S_t$ directly: the cardinality of the nonlinear feature space is extremely large, which usually makes the matrix rank deficient. Matrix-based KPCA circumvents this by following the standard method in [15], and an eigendecomposition problem for the Gram matrix
$$K = \Phi(\mathcal{A})^{T}\Phi(\mathcal{A}) = \bigl[\Phi(A_i)^{T}\Phi(A_j)\bigr]_{i,j=1}^{N}\qquad (3)$$
is solved in practice to calculate the leading eigenvectors of $S_t$. Suppose that $(\lambda_k, u_k)$ is an eigenpair of the matrix $K$; that is, $u_k$ is a unit eigenvector with the corresponding eigenvalue $\lambda_k$ as the $k$-th largest eigenvalue. Then the $r$ most significant matrix-based KPC in the feature space take the matrix form of
$$V = [v_1, v_2, \ldots, v_r] = \Phi(\mathcal{A})\,U\,\Lambda^{-1/2},\qquad (4)$$
where $v_k$ and $u_k$ are the $k$-th columns of $V$ and $U$, respectively, $U$ is the matrix with columns $u_k$, and $\Lambda$ is a diagonal matrix whose diagonal elements are $\lambda_k$. Then the total scatter matrix has the form $S_t \approx \frac{1}{N} V \Lambda V^{T}$. For a given matrix $B$ as test data, with kernelized form $\Phi(B)$, its $k$-th principal component vector corresponding to $v_k$ is computed as
$$y_k = v_k^{T}\Phi(B) = \lambda_k^{-1/2}\, u_k^{T}\,\Phi(\mathcal{A})^{T}\Phi(B).\qquad (5)$$

Equation (5) is a critical factor for the proposed IMKPCA because an explicit form of the mapping function $\phi$ is not required; the projection of a given matrix datum onto the matrix-based KPC can therefore be computed entirely through kernel functions. Based on equation (5), for one projection direction the IMKPCA outputs a principal component vector whose dimensionality equals the number of columns of the input matrix, whereas the vector-based KPCA outputs a single principal component value per projection direction [11]. In the next section, we show how to update the matrix-based KPC in the feature space based on the projections of new data.
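To make the batch pipeline of equations (3)–(5) concrete, here is a minimal sketch of fitting the Gram-matrix eigenproblem and projecting a test matrix, assuming the uncentered formulation reconstructed above; the function names, the normalization, and the omission of centering are assumptions of this sketch, and the incremental method of Section 3 replaces the batch fit.

```python
import numpy as np

def mkpca_fit(train_mats, kernel, r):
    """Batch matrix-based KPCA sketch: eigendecompose the Gram matrix built
    from all columns of all training matrices and keep the r leading pairs."""
    Psi = np.hstack(train_mats)                 # all training columns side by side
    K = kernel(Psi, Psi)                        # Gram matrix of size (N*n) x (N*n)
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1][:r]
    return Psi, vecs[:, order], vals[order]

def mkpca_project(B, Psi, U, lam, kernel):
    """Principal component vectors of a test matrix B (cf. equation (5)):
    row k has one entry per column of B."""
    return (U / np.sqrt(lam)).T @ kernel(Psi, B)
```

For example, with the Gaussian kernel sketch from Section 2, mkpca_fit(train_mats, kernel_dot_product_matrix, 5) followed by mkpca_project(B, Psi, U, lam, kernel_dot_product_matrix) yields a five-row feature matrix for the test matrix B.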

3. Incremental Matrix-Based Kernel Principal Component Analysis

In this section, a recursive matrix-based KPC formulation is adopted to relieve the batch nature of matrix-based KPCA. We describe how the original matrix-based KPC can be updated in an incremental way. The detailed procedures, and their advantages over the standard counterparts, are presented in the following subsections.

3.1. Recursive Formulation for Matrix-Based KPC

Assume that the current data have been given as the block matrix $\mathcal{A} = [A_1, A_2, \ldots, A_N]$ and that a new datum $A_{N+1}$ arrives for the update. We commence with the recursion of the total scatter matrix:
$$S_t^{(N+1)} = \frac{1}{N+1}\Bigl(N\,S_t^{(N)} + \Phi(A_{N+1})\,\Phi(A_{N+1})^{T}\Bigr).\qquad (6)$$

Based on equation (4), $\Phi(A_{N+1})$ can be decomposed into a component parallel and a component orthogonal to the current matrix-based KPC; i.e.,
$$\Phi(A_{N+1}) = V V^{T}\Phi(A_{N+1}) + \bigl(I - V V^{T}\bigr)\Phi(A_{N+1}),\qquad (7)$$
where the first term is the projection of $\Phi(A_{N+1})$ onto the matrix-based KPC and the second term is its projection onto the orthogonal complementary space of the matrix-based KPC.

Based on the idea of KPCA [6, 18, 29], the principal components can be expanded approximately in terms of some training samples in the feature space. This means that $\Phi(A_{N+1})$ can be reconstructed in the subspace spanned by the previous matrix-based KPC, while the orthogonal component can be modeled as noise. We thus have the approximation in equation (8), whose coefficients can be calculated by replacing $\Phi(B)$ with $\Phi(A_{N+1})$ in equation (5). The current matrix-based KPC can then be obtained by rotating the previous matrix-based KPC so as to preserve the total scatter of the augmented dataset most faithfully, as illustrated by the sketch below.
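Because the mapping $\phi$ is implicit, the decomposition in equations (7) and (8) is never formed explicitly in the kernel setting; the following sketch only illustrates the same split in an explicit, finite-dimensional feature space as a didactic stand-in.

```python
import numpy as np

def split_parallel_orthogonal(Phi_new, V):
    """Split mapped new data (features x columns) into the component inside
    the span of the current components V (features x r, orthonormal columns)
    and the residual orthogonal to that span, mirroring equations (7)-(8)."""
    coeffs = V.T @ Phi_new            # projections onto the current components
    parallel = V @ coeffs             # part reconstructed by the previous KPC
    orthogonal = Phi_new - parallel   # part treated as noise / new directions
    return parallel, orthogonal, coeffs
```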

3.2. Rotation of Matrix-Based KPC

The key observation is that the effect of the appended data on the previous matrix-based KPC can be represented by a rotation. Then, based on equations (7) and (8), we obtain equation (9).

Substituting the previous matrix-based KPCA result and equation (9) into equation (6), we obtain equation (10).

Denote the eigendecomposition of the matrix in equation (10) by an orthonormal matrix of eigenvectors and a diagonal matrix of eigenvalues; equation (10) then becomes equation (11).

Consequently, based on equation (11), the matrix-based KPCA system for the total scatter matrix in equation (6) is recursively given by equation (12).

In equation (12), the orthonormal factor represents the directional variation of the matrix-based KPC caused by the appended data, and the diagonal factor represents the change in the component ratios of the updated matrix-based KPC. According to equation (12), the matrix-based KPC is rotated so as to preserve the total scatter of the given data most faithfully when the appended data are linearly independent of the previous matrix-based KPC.
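To make the augment-and-rotate step concrete, the sketch below performs one update in an explicit feature space, again as a linear stand-in for the kernel case and without the mean update of Section 3.3. The augmentation-then-rotation structure follows the generic incremental subspace recipe; the variable names and bookkeeping are assumptions of this sketch rather than a transcription of equations (9)–(12).

```python
import numpy as np

def rotate_components(V, lam, n_seen, Phi_new):
    """One incremental update: augment the current orthonormal basis V (d x r,
    with mean-scatter eigenvalues lam for n_seen processed columns) by the
    orthogonal residual of the new columns Phi_new (d x c), rebuild the small
    scatter matrix in the augmented basis, and rotate the basis accordingly."""
    coeffs = V.T @ Phi_new
    residual = Phi_new - V @ coeffs
    Q, _ = np.linalg.qr(residual)                    # orthonormal residual directions
    V_aug = np.hstack([V, Q])
    # Old scatter is (approximately) diagonal in the current basis; the new
    # columns contribute their coordinates in the augmented basis.
    C = V_aug.T @ Phi_new
    S_small = np.zeros((V_aug.shape[1], V_aug.shape[1]))
    S_small[:len(lam), :len(lam)] = np.diag(lam) * n_seen
    S_small += C @ C.T
    n_total = n_seen + Phi_new.shape[1]
    S_small /= n_total
    w, R = np.linalg.eigh(S_small)                   # R plays the role of the rotation
    order = np.argsort(w)[::-1]
    return V_aug @ R[:, order], w[order], n_total
```

Truncating the returned basis back to r columns after each step keeps the per-update cost independent of the number of samples already processed, which is the source of the efficiency discussed in Section 4.4.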

3.3. Recursive Formulation with Mean Updating

For the sake of simplicity, the analysis above assumed that the data are centered in the feature space. However, this assumption is often invalid, because the mean matrix of the mapped data in the feature space changes whenever new data are added for updating. Discarding this assumption, we obtain equation (13), in which the two mean matrices denote the mean of the previously mapped data and the mean of the currently mapped data, respectively.

In the updated total scatter matrix, the mapped matrix data are centered with the current mean matrix rather than with the previous mean matrix used in the earlier total scatter matrix. Hence, the recursion formulation in equation (6) cannot be applied directly. Fortunately, we have equation (14).

Following the idea of the scatter matrix update in incremental learning [30], the recursive formulation with the mean update for the total scatter matrix of the matrix-based KPC is readily obtained as equation (15).
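For reference, the standard identity for combining the total scatter of two chunks with different means, stated here in the generic vector case; updates of this kind typically instantiate it, and the paper's equation (15) is presumably its matrix-based kernel counterpart:
$$S_{1\cup 2} = S_{1} + S_{2} + \frac{N_{1}N_{2}}{N_{1}+N_{2}}\,\bigl(\bar{x}_{1}-\bar{x}_{2}\bigr)\bigl(\bar{x}_{1}-\bar{x}_{2}\bigr)^{T},$$
where $S_1$ and $S_2$ are the total scatter matrices of the two chunks, $N_1$ and $N_2$ are their sizes, and $\bar{x}_1$ and $\bar{x}_2$ are their means.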

The recursive matrix-based KPC formulation in equation (15) can be computed following the approach of the previous subsection, and the update formulas can be obtained by imitating equations (7)–(12). The centered mapped matrix should be projected as in equations (7) and (8); i.e., the eigendecomposition of the matrix in equation (10) should be replaced by the eigendecomposition of the corresponding matrix formed from the centered data for updating, where the principal component vectors of the centered mapped matrix can be calculated by equation (5), as given in equation (16).

The recursive formulation in equation (12) thus becomes one with the mean update for the proposed matrix-based KPC.

In the subsequent experiment section, for a given matrix-based dataset, a portion of the matrix data is chosen as the original data, and the remaining matrix data are fed successively as new data to update the current principal components.

4. Experiments

Several experiments were conducted to examine four properties of the proposed method: (1) the effectiveness of approximating the KPC under stationary data; (2) the influence of the original data capacity on the proposed method on the MNIST and Fashion MNIST databases; (3) the superiority over several reference methods on some well-known databases, namely, Fashion MNIST [31], ORL [32], YaleA [33], Extended YaleB [34], PF01 [35], and COIL 100 [36]; and (4) the efficiency compared with the reference methods on fashion products from the Fashion MNIST database.

In order to measure the accuracy of the proposed method in approximating MKPCA and to assess the quality of the solution objectively, a distance measure based on the angles between principal kernel components is employed in this section (equation (17)), in which the ground truth of the $i$-th principal kernel component is computed by a standard MKPCA method and compared with the $i$-th kernel principal component extracted by the proposed method.
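A minimal sketch of one such angle-based measure, under the assumption that the distance is the angle between corresponding components; the exact definition used in equation (17) may differ.

```python
import numpy as np

def component_angle(v_true, v_est):
    """Angle (in radians) between a ground-truth kernel principal component
    and its incremental estimate; 0 indicates a perfect match."""
    cos = abs(v_true @ v_est) / (np.linalg.norm(v_true) * np.linalg.norm(v_est))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```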

To visualize the capacity of IMKPCA in describing the structure of given data, both a two-dimensional stationary dataset and a matrix-based dataset are necessary, because a two-dimensional datum is a special case of a matrix datum. In Sections 4.1 and 4.2, the experiments are mainly based on two generated two-dimensional datasets; in Sections 4.3 and 4.4, the experiments are based on matrix-based datasets. The Gaussian kernel function is used throughout this section. For two matrix data, their dot product matrix is computed as described in Section 2.

4.1. Stationary Data Approximation Accuracy

The experiment is carried out on 90 toy data points [10] to test the effectiveness of the proposed method in terms of approximation accuracy. The data are generated in a stationary environment in the following way: the x-values are drawn from a uniform distribution, and the y-values are generated as a function of the x-values plus normal noise with standard deviation 0.2. We calculate the principal kernel components by the standard MKPCA [15], the vector-based KPCA [6], and the proposed method, respectively. The values of the test data projected onto the extracted kernel principal components are given in Figures 1 and 2. Figure 1 contains lines of constant principal component value (contour lines); those contour lines indicate the structure of the data in the feature space. The red data are the total sample, and the constant projection values onto the first five KPC computed by the three methods are shown. Figure 2 illustrates the results of the proposed method during the computation process: the green dots are the original data, the red dots are the appended data, and the blue contours depict the constant projection values onto the first three KPC. It can be seen visually that the KPC extracted by IMKPCA converges to the ground-truth KPC, even when beginning with an unsatisfactory initialization. Figures 1 and 2 show that, compared with MKPCA and vector-based KPCA, the proposed method computes the principal kernel components of the generated stationary data with comparable accuracy while allowing a wider range of choices for the original data.
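For reproducibility, a sketch of a stationary toy set of this kind. Only the noise level (standard deviation 0.2) is stated in the text; the sampling interval and the quadratic trend follow the classic example in [10] and are assumptions here.

```python
import numpy as np

def generate_toy_data(n=90, seed=0):
    """Stationary 2D toy data: uniform x-values and a noisy functional trend
    (interval [-1, 1] and quadratic trend assumed; noise std 0.2 as stated)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    y = x ** 2 + rng.normal(0.0, 0.2, size=n)
    return np.column_stack([x, y])
```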

4.2. Influence of Parameters

In this experiment, we investigate how the amount of original data affects the effectiveness of IMKPCA. The measure in equation (17) is used as a distance between the subspace computed by IMKPCA and the ground truth computed by MKPCA. Two matrix-based stationary datasets are used: one consists of the first 200 training images of "T-shirt/top" in Fashion MNIST, and the other contains the first 120 training samples of digit "0" from the MNIST dataset. For each stationary dataset, two results are shown with different original data capacities. Equation (12) is used to update the subspace in the feature space, and the distance between the updated subspace and the standard one is recorded in radians. Figure 3 shows the result of each iteration under the different original data capacities, in which the distances between the first three kernel principal components are computed as the distance between two subspaces (represented by "MKPC-1," "MKPC-2," and "MKPC-3," respectively).

Figures 3(a) and 3(b) show the results for the image data of "T-shirt/top," and Figures 3(c) and 3(d) show the results for the matrix data of digit "0." In each subfigure, the result shows that the KPC extracted by IMKPCA converges to the ground truth as the iteration process of equation (12) continues. Figure 3 also shows that setting the original data capacity to a large value helps IMKPCA improve the accuracy of approximating the ground truth.

4.3. Study of the Spatial Statistics in the Matrix Data

This experiment concerns the advantage of IMKPCA in studying the spatial statistics in matrix data. We apply the proposed method to digits from the MNIST dataset and extract the matrix-based principal components in both a column-lifted and a row-lifted stage using the Gaussian kernel function. The column-lifted stage means that each column of the matrix data is kernelized as described in Section 2; the row-lifted stage means that the feature matrix extracted by the proposed method in the column-lifted stage is processed as input. The result of the row-lifted stage represents the final extracted spatial statistics of the given dataset. Each digit is a matrix, and only the first sixteen principal components are kept and displayed as a matrix. To compare the proposed method with the vector-based KPCA, a vector is required as input; since an image of a digit is represented as a matrix, the images are vectorized before being fed into vector-based KPCA, and the feature of each digit, a sixteen-dimensional vector, consists of the first sixteen principal component values. Figure 4 shows ten example images of digit "3" and their matrix-based principal components computed by the proposed IMKPCA. Figure 5 shows the result of vector-based KPCA for the vectorized digit "3," in which the vertical axis shows the projection values of the vectorized digit onto the principal components and the horizontal axis is the index of the ten samples in Figure 4; the projection value of digit "3" onto the $i$-th principal component is represented by "VPC." Figure 6 shows ten example images of digits "1" to "5," two samples for each digit, and their matrix-based principal components computed by the proposed method. Figures 4–6 show that the matrix-based features provide more significant information than the vector-based ones: the vector-based KPCA outputs a vector as a feature, while IMKPCA outputs a matrix. Typically, the matrix-based features of digit "3" appear as a distinctive white line in the matrix in Figure 4, and digits "1" to "5" appear as lines in Figure 6, while the vector-based features are merely numerical values in Figure 5. In particular, digit "4" appears with two lines because it has two distinct line structures, and digits "2" and "5" are similar in the column-lifted stage (the second row in Figure 6) but different in the row-lifted stage (the third row in Figure 6). The reason is that they are two different digits, but the vertical flip of digit "5" is similar to digit "2." A sketch of the two-stage pipeline is given below.
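Reading the two stages literally, the following sketch chains the column-lifted and row-lifted extractions, reusing the batch fit/project sketches from Section 2 as stand-ins for the incremental procedure; whether the intermediate feature matrix is transposed between stages is not specified in the text, and no transposition is applied here.

```python
def two_stage_features(A, col_model, row_model, kernel):
    """Column-lifted stage followed by a row-lifted stage (Section 4.3).

    col_model and row_model are (Psi, U, lam) tuples produced by mkpca_fit
    from the earlier sketch; row_model is assumed to be fitted on the
    column-stage feature matrices of the training set.
    """
    F_col = mkpca_project(A, *col_model, kernel)      # column-lifted feature matrix
    F_row = mkpca_project(F_col, *row_model, kernel)  # row-lifted (spatial) features
    return F_col, F_row
```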

To quantify the advantage of the spatial statistics for matrix data, two image tests are conducted: a digit recognition test and an image database classification test. For the digit test, the first 100 training images and the first 500 test images of each digit in MNIST serve to test the proposed method's capability in digit recognition. The experiment is repeated with different ratios of the training data used as original data for computing the matrix-based KPC in the feature space. The extracted features in the column-lifted and row-lifted stages are used for digit recognition with the nearest-distance classifier (a sketch is given below). For the Gaussian kernel function, the kernel widths in the column-lifted and row-lifted stages of the proposed method, and that of the vector-based KPCA, are chosen to retain the total scatter of the data in the feature space.
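The nearest-distance classifier is not spelled out in the text; a minimal sketch, assuming the Frobenius norm between feature matrices as the distance:

```python
import numpy as np

def nearest_distance_classify(test_feature, train_features, train_labels):
    """Assign the label of the training feature matrix closest to the test
    feature matrix (Frobenius norm assumed as the distance)."""
    dists = [np.linalg.norm(test_feature - F) for F in train_features]
    return train_labels[int(np.argmin(dists))]
```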

Table 1 presents the average digit recognition rates and the variances (the values following "±") for the chosen dataset. The listed ratios correspond to different capacities of original data. In Table 1, the result of the vector-based method is unique because it does not need to choose original data for updating. Table 1 shows that IMKPCA has a prominent advantage over vector-based KPCA. The reason is that vectorization ruins the spatial structure of the pixels that defines a digit image from the MNIST dataset, whereas IMKPCA brings back that structure. In particular, the results obtained with features from the row-lifted stage are better than those from the column-lifted stage, owing to the extracted spatial statistics of the matrix data. Additionally, using more samples as original data in the training period increases the recognition performance.

For the image database tests, the Fashion MNIST, ORL, YaleA, Extended YaleB, and COIL 100 databases are chosen to evaluate the classification. Figure 7 shows some image samples from each database. The Fashion MNIST database consists of 60,000 training images and 10,000 test images of 10 fashion products, such as "T-shirt/top," "Trouser," and "Dress." In the results with this database, the first 200 training images of each fashion product (2,000 training images in total) are chosen, and all the test images are used for testing. The ORL database contains 400 frontal images of 40 subjects, with ten images per subject taken under slight movements. The YaleA database consists of 165 frontal images of 15 subjects with different emotions. Half of each subject's ORL and YaleA images are chosen for training and the remainder for testing. The Extended YaleB face database consists of over 2,200 frontal images of 38 subjects with different illuminations and the same pose; ten images of each subject are randomly chosen for training and the others for testing. The COIL 100 database consists of 7,200 images of 100 objects; ten images of each object are chosen for training, and the remaining images are used for testing. Each image from COIL 100 is resized to a smaller matrix in the experiment owing to computing resource restrictions. Table 2 shows the average recognition rates and variances under ten different training sample sequences for the given six databases, in which "NC" indicates "Noncomputable." As shown in Table 2, the proposed IMKPCA performs better than the vector-based PCA, vector-based KPCA, and 2DPCA in recognition rate for these databases. The proposed IMKPCA and MKPCA show similar performance on the test databases, because the proposed method is methodologically consistent with MKPCA and incrementally approximates its ground truth. In comparison, the proposed method is more efficient and more computable than MKPCA; the major reason is that the size of the eigendecomposition of MKPCA still depends on the size of the dataset, as reflected by the noncomputable entries for the COIL 100 and Fashion MNIST databases in Table 2.

4.4. Computational Efficiency Comparison

This section compares the computation time and resource consumption of MKPCA, vector-based KPCA, and IMKPCA. The Fashion MNIST database is employed to compare the feature extraction efficiency of the proposed method with that of the reference methods. For an exact and fair comparison, the operating environment is kept the same, and the three methods are executed on a computer with 4 GB of RAM and a 3.2 GHz CPU. The input dataset is extracted from the training samples of "T-shirt/top." MKPCA, vector-based KPCA, and IMKPCA are executed on datasets of different capacities. Since the resource consumption of MKPCA is large, we regard cases in which the computing time exceeds 400 seconds or the 4 GB memory overflows as "Noncomputable." For the proposed method, the computing time in this example mainly depends on equation (12) and on the cost of forming and eigendecomposing the matrices involved there. The computing times of the standard MKPCA and the vector-based KPCA depend on the eigendecompositions of their Gram matrices, whose sizes grow with the number of training samples and, for MKPCA, also with the number of columns of the given matrix data. Figure 8 shows the maximum, minimum, and mean values of the angles between the extracted kernel principal components and the ground truth of MKPCA for the images of the 10 fashion products; it indicates that the proposed method approximates the true projection matrix well for Fashion MNIST, with the minimum and mean values for the ten fashion products mostly smaller than 0.05. Figure 9(a) shows the computation time for the Fashion MNIST database under different capacities; it indicates that the vector-based method and the proposed method need much less computation time and resource consumption than MKPCA on the same computer system with MATLAB 2014. Figure 9(b) shows the computation times of the proposed method under different ratios of original data to computed data; for each curve, the stated percentage of 1,000 fashion product images is used as original data for the proposed IMKPCA method. Figures 8 and 9 show that the proposed method needs more time than the vector-based KPCA but has advantages as a whole; for example, it retains the spatial structure of the data in the matrix and provides more information than the vector-based method. The computation time of the proposed IMKPCA increases nearly linearly with the number of training samples, while that of the standard MKPCA increases sharply.

5. Conclusion

In this paper, an incremental matrix-based kernel subspace method is proposed. The main innovation of the proposed method is the idea of rotating the matrix-based kernel principal components so as to preserve the total scatter of the given data most faithfully when new data are appended. The proposed method still follows the KPCA methodology, which makes it more reliable than most existing algorithms aimed at such improvements. For the proposed method, the feature extractor is still associated with an eigendecomposition problem, although its dimension differs from that of the vector-based KPCA method. The experimental results show that feature extraction based on the proposed method is more efficient than the standard MKPCA and the vector-based KPCA. On the other hand, the proposed method achieves efficient feature extraction by relying on the linear relationship between the current principal components and the newly appended data, which means that it is efficient for processing stationary datasets. Future work will concentrate on how to adapt the principal components to nonstationary matrix datasets.

Data Availability

The data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program (no. 2019YBF1600700).