Abstract
Sparse subspace clustering (SSC) is one of the most recent methods for partitioning data points drawn from a union of subspaces, and it has strong theoretical guarantees. However, its affinity matrix learning is not very effective for segmenting multibody nonrigid structure from motion. To improve the segmentation performance and efficiency of SSC on multiple nonrigid motions, we propose an algorithm that uses hierarchical clustering to discover the inner structure of the data and represents the entire sequence with a subset of trajectories (in this paper, we refer to these trajectories as the set of anchor trajectories). Only the positions corresponding to the anchor trajectories have nonzero weights. Furthermore, to strengthen the affinity coefficients and the connections between trajectories in the same subspace, we optimise the weight matrix by integrating multilayer graphs and good neighbors. The experiments show that our method is effective.
1. Introduction
Nonrigid structure from motion (NRSFM) is an active topic in computer vision. It aims to recover both the camera motion and the nonrigid structure from two-dimensional images captured by a monocular camera. Many existing methods solve this problem with satisfactory results [1–3]. In 2020, Jensen et al. introduced a benchmark and evaluated 18 different methods for sparse NRSfM [4]. However, most methods still assume that there is only one nonrigid structure in the scene. Real-world scenes tend to be much more complex; for instance, several activities, such as playing basketball and walking, may take place in the same scene simultaneously. Such scenes are instances of multibody NRSFM, so the single-object assumption is not practical, and the study of NRSFM reconstruction cannot be limited to the single-object case. Because reconstructing NRSFM for a single object is already difficult, reconstructing multibody NRSFM is harder still.
A natural starting point is the case of multiple rigid objects. There, the problem can be divided into two independent steps: first, each object is segmented from the 2D coordinates [5, 6]; then, state-of-the-art reconstruction techniques are applied [7, 8]. Alternatively, the objects in the scene can be treated as a whole: the scene is reconstructed jointly, and the 3D coordinates are segmented afterwards. These ideas also apply to scenes with multiple nonrigid motions. However, they demand high segmentation accuracy, and because of nonrigidity and overlap, poor segmentation results are easily obtained.
Therefore, to improve segmentation, we propose a multiple nonrigid motion segmentation algorithm based on sparse subspace clustering. Through subspace clustering, we learn an affinity matrix that encodes the subspace structure of the data and then, following the idea of spectral clustering [9, 10], obtain the number of deformable objects, the different activities, and the membership of each sample. To improve the segmentation of multibody NRSFM with SSC, we use randomised hierarchical clustering to select a group of trajectories as the set of anchor trajectories that represents the entire sequence. To reduce the influence of input parameters on the algorithm and make it more extensible, we also propose a method for estimating the number of anchor trajectories. Finally, to improve robustness, we optimise the resulting weight matrix using multilayer graphs [11] and the concept of good neighbors [12].
2. Related Work
In this section, we review the state of research on multibody NRSFM and SSC.
2.1. Multibody Nonrigid Structure from Motion
Multibody nonrigid structure from motion (multibody NRSFM) is a natural extension of research on nonrigid structure from motion. At present, there are two main approaches to analysing a multibody nonrigid scene. The first separates segmentation and reconstruction and tackles them one after the other; it can be regarded as extending the approach for multiple rigid scenes to multiple nonrigid scenes. The second incorporates clustering into the model constraints through a union of spatiotemporal subspaces, so that segmentation and reconstruction are solved simultaneously.
Russell et al. [13] proposed a method that simultaneously segments a complex dynamic scene containing multiple objects into its constituent objects and reconstructs a 3D model of the scene. They formulated the problem as hierarchical graph-cut-based segmentation, in which the scene is decomposed into background and foreground objects and the complex motion of nonrigid or articulated objects is modeled as a set of overlapping rigid parts. In 2014, Zhu et al. [14] modeled motion as a union of temporal subspaces and performed temporal clustering. Although their work still focuses on a single object, it suggests how research can be extended to multibody NRSFM. In 2016 and 2017, Kumar et al. [15, 16] used the SSC algorithm to build a spatiotemporal union-of-subspaces model and segmented and reconstructed multibody NRSFM simultaneously, achieving good results on scenes with two nonrigid bodies. With the development of artificial intelligence (AI) [17, 18] and deep learning, more and more fields are comparing neural networks with earlier traditional algorithms. In [19], a neural network is applied to NRSfM, showing that neural networks can also handle NRSfM problems well.
2.2. Sparse Subspace Clustering
In multibody NRSFM scenes, each object lies in a different subspace. As time goes on, different feature points move along different trajectories. Although principal component analysis (PCA) [20, 21] can find the low-dimensional structure in high-dimensional data, it does not account for datasets that contain multiple structures. Therefore, for segmenting multiple nonrigid bodies, it is better to model the data with multiple subspaces instead of a single subspace, and subspace clustering [22] is a good way to solve this problem. The most successful subspace clustering methods are based on spectral clustering: first, the similarity between all feature points is computed to form an affinity matrix; then, spectral clustering is applied to the affinity matrix; finally, a specified number of low-dimensional subspaces is separated in the data space. With the development of sparse representation, the self-expression property has attracted significant attention in spectral-clustering-based methods:
$$x_i = X c_i, \quad c_{ii} = 0, \tag{1}$$
where $X = [x_1, \dots, x_N]$ represents the input data and $c_i$ is the weight vector corresponding to $x_i$. Equation (1) can be interpreted as follows: each data point can be expressed as a sparse linear combination of other data points in the same subspace. The main representative methods of this kind are sparse subspace clustering, low-rank representation (LRR) [23–25], and least squares regression (LSR) [26]. For independent and disjoint subspaces [27–29], SSC has strong theoretical guarantees in both the noisy and the noiseless cases. In the presence of noise and disjoint subspaces, LRR cannot model the data effectively. Compared with SSC and LRR, LSR has the lowest computational cost when the subspaces are independent.
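The self-expression property in Equation (1) can be checked numerically on synthetic data. The minimal sketch below is our own illustration, not code from this paper: it draws points from two random 3-dimensional subspaces of R^30 (sizes chosen arbitrarily), then fits one point by least squares using the other points of its own subspace and, for contrast, the points of the other subspace. Least squares only shows that such a combination exists; SSC itself looks for a sparse one through ℓ1 minimisation.

```python
import numpy as np

def sample_subspace(rng, ambient_dim, sub_dim, n_points):
    """Draw n_points from a random sub_dim-dimensional linear subspace of R^ambient_dim."""
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))
    coeffs = rng.standard_normal((sub_dim, n_points))
    return basis @ coeffs  # columns are data points

def residual(dictionary, x):
    """Norm of the least-squares residual of x against the columns of dictionary."""
    c, *_ = np.linalg.lstsq(dictionary, x, rcond=None)
    return float(np.linalg.norm(x - dictionary @ c))

rng = np.random.default_rng(0)
S1 = sample_subspace(rng, ambient_dim=30, sub_dim=3, n_points=10)
S2 = sample_subspace(rng, ambient_dim=30, sub_dim=3, n_points=10)

x = S1[:, 0]                    # the point to be expressed
same = residual(S1[:, 1:], x)   # other points of the same subspace (x itself excluded)
other = residual(S2, x)         # points of a different subspace

print(f"residual, same subspace:  {same:.2e}")
print(f"residual, other subspace: {other:.2e}")
```

For generic random subspaces, the first residual is at machine precision while the second is clearly nonzero, which is the structure that SSC exploits.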
However, despite the theoretical guarantees and empirical success of SSC, for each data point it must solve a problem of the form (1) whose number of unknowns equals the total number of data points. When the data scale is large, solving these least absolute shrinkage and selection operator (LASSO) problems is therefore costly. To address the poor performance of SSC on large-scale datasets, scalable sparse subspace clustering (SSSC) was proposed in [30]. It reduces the computational cost by randomly selecting a subset of the data as a group and can be extended successfully to large datasets; however, it is not stable, because the support points are selected at random. In [31], the authors followed a greedy approach: instead of solving Equation (1) directly, they used orthogonal matching pursuit (OMP) [32] to obtain the sparse representation of each data point. Although OMP improves the effectiveness of SSC on large-scale data, it is a greedy algorithm, so its theoretical guarantees are weak and its computational cost is high. To solve the SSC problem better, [33] proposed RSSC, an iteratively reweighted ℓ1 minimisation framework that substantially improves on the traditional ℓ1 minimisation framework. In [34], the authors control the connectivity and sparseness of the representation by combining two different norms. In [35], an oracle-based subspace clustering algorithm named elastic net subspace clustering (EnSC) was proposed; however, it is sensitive to the initial parameters that balance sparsity and connectivity. Recently, k-SSC [36] and SR_SSC [37] were proposed; they select anchor points as the support set, which reduces the LASSO problem from the whole dataset to the set of anchor points.
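To illustrate the greedy alternative used in [31], the sketch below implements textbook orthogonal matching pursuit and applies it column by column, excluding each point from its own dictionary. It is a generic OMP under our own assumptions (unit-norm columns, a fixed atom budget `max_atoms`, a residual tolerance `tol`), not the SSC-OMP code of [31]; all function names are hypothetical.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: a sparse c with D @ c close to x.

    D: (m, n) dictionary with unit-norm columns; x: (m,) target vector.
    Stops after max_atoms atoms or when the residual norm drops below tol.
    """
    n = D.shape[1]
    support = []
    coef = np.zeros(0)
    residual = x.astype(float).copy()
    while len(support) < max_atoms and np.linalg.norm(residual) > tol:
        scores = np.abs(D.T @ residual)      # correlation of each atom with the residual
        if support:
            scores[support] = -np.inf        # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # "Orthogonal" step: refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    c = np.zeros(n)
    if support:
        c[support] = coef
    return c

def ssc_omp(X, max_atoms):
    """Represent every column of X by OMP over the other columns (so c_ii = 0).

    Assumes X has no zero columns, since columns are normalised first.
    """
    Xn = X / np.linalg.norm(X, axis=0)
    N = X.shape[1]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)  # exclude the point itself
        C[others, i] = omp(Xn[:, others], Xn[:, i], max_atoms)
    return C
```

Each call to `omp` touches only a handful of atoms, which is where the speed-up comes from; the price is the greedy selection, whose guarantees are weaker than those of the convex ℓ1 program.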
3. Estimation and Optimisation of Weight Matrix
Consider an image sequence with several frames, in which the scene contains multiple nonrigid motions observed through a set of point trajectories, each being the 3D motion trajectory of one point. Segmentation with the SSC algorithm can be interpreted as follows: there is one subspace per nonrigid motion, and each trajectory in the sequence can be represented as a sparse linear combination of other trajectories in the same subspace (that is, trajectories of the same nonrigid object). This is illustrated in Figure 1, where each column of the data matrix is the trajectory of a 3D point (shown in green); as the visualisation shows, a trajectory can be represented by a linear combination of several other trajectories.
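The data layout described here, one trajectory per column, can be sketched as follows. The sketch assumes the tracked 3D points are given as an array of shape (F, P, 3) (frames, points, coordinates), which is our own convention; `merge_sequences` is a hypothetical helper showing how sequences with the same number of frames could be placed side by side with a ground-truth motion label per column.

```python
import numpy as np

def trajectory_matrix(points):
    """Stack per-frame 3D points into a (3F, P) matrix with one trajectory per column.

    points: array of shape (F, P, 3) = (frames, points, xyz).
    Column p is (x_1, y_1, z_1, x_2, y_2, z_2, ...) for point p over the F frames.
    """
    F, P, _ = points.shape
    return points.transpose(0, 2, 1).reshape(3 * F, P)

def merge_sequences(sequences):
    """Hypothetical helper: place sequences with the same number of frames side by side.

    Returns the merged (3F, total points) matrix and a motion label for each column.
    """
    mats = [trajectory_matrix(s) for s in sequences]
    X = np.hstack(mats)
    labels = np.concatenate([np.full(m.shape[1], k) for k, m in enumerate(mats)])
    return X, labels
```

Under this layout, each trajectory is a column vector in R^{3F}, and trajectories of the same nonrigid object are expected to span a common low-dimensional subspace.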
Because of nonrigidity and overlap, the sparse subspace clustering algorithm alone cannot segment multiple nonrigid motions effectively. Moreover, as discussed in Section 2.2, the cost of the LASSO problems limits the scalability of standard SSC. Therefore, to reduce this cost, we follow [37] and use hierarchical clustering to select a certain number of trajectories (the anchor set) as the support set of every trajectory. Using the indices of the anchor trajectories, we form the anchor dictionary from the corresponding columns of the data, and we replace the self-expression property with an anchored version in which each trajectory is expressed as a combination of anchor trajectories only.
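For reference, the difference between full and anchored self-expression can be written as two optimisation problems. This is a sketch in our own notation ($X$ is the data matrix with $N$ columns, $\Omega = \{\omega_1, \dots, \omega_k\}$ are the anchor indices, $X_\Omega = [x_{\omega_1}, \dots, x_{\omega_k}]$, and $\lambda$ is a regularisation weight), following a common simplified noisy form of SSC; it need not match the exact form of this paper's equations.

```latex
% Full SSC: every point may use all other points, N x N unknowns
\min_{C \in \mathbb{R}^{N \times N}} \ \|C\|_1 + \frac{\lambda}{2}\,\|X - XC\|_F^2
\quad \text{s.t.} \quad \operatorname{diag}(C) = 0

% Anchored variant: only the k anchor columns X_\Omega may be used, k x N unknowns
\min_{C \in \mathbb{R}^{k \times N}} \ \|C\|_1 + \frac{\lambda}{2}\,\|X - X_{\Omega} C\|_F^2
\quad \text{s.t.} \quad C_{rj} = 0 \ \text{whenever } \omega_r = j
```

Each column of $C$ then solves a LASSO with $k$ rather than $N-1$ unknowns, and the constraint stops an anchor trajectory from representing itself.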
In this study, we use SSC to segment multiple nonrigid motions in the same scene. Accordingly, we apply the anchored SSC model to the multiple nonrigid datasets, which gives optimisation problem (3).
This reduces the scale of the LASSO problems solved by SSC. Datasets of different sizes call for different numbers of anchor trajectories. Unfortunately, in [37], the number of anchor trajectories is selected manually for each dataset rather than being computed by the algorithm, which limits its applicability to real scenes. Therefore, to make this number adapt to multiple nonrigid motion datasets and to improve robustness, we propose a method to estimate it and to optimise the weight matrix.
3.1. Estimating the Number of Anchor Trajectories
Inspired by [38], we estimate the natural grouping of the data by propagating first-neighbour relations among the data points, and we take the number of groups as the number of anchor trajectories for the dataset.
For the entire set of 3D motion trajectories, we first compute the first neighbour of each trajectory using exact distances. We then use these first-neighbour relations to find the mutual links and groups among all trajectories. From the index of the first neighbour of each trajectory, we directly define the adjacency matrix
$$A(i,j) = \begin{cases} 1, & \text{if } j = \kappa_i^1 \text{ or } \kappa_j^1 = i \text{ or } \kappa_i^1 = \kappa_j^1, \\ 0, & \text{otherwise}, \end{cases} \tag{4}$$
where $\kappa_i^1$ is the first neighbour of trajectory $i$. The adjacency matrix connects each trajectory to its first neighbour through $j = \kappa_i^1$, enforces symmetry through $\kappa_j^1 = i$, and connects trajectories that share the same first neighbour through $\kappa_i^1 = \kappa_j^1$. Equation (4) returns a symmetric sparse matrix that directly specifies a graph whose connected components are the clusters. The connected components can be obtained efficiently from the adjacency matrix using either an undirected or a directed graph $G = (V, E)$, where $V$ is the set of nodes (the trajectories to be clustered) and $E$ is the set of edges connecting them.
The condition in Equation (4) combines the 1-nearest-neighbour (1-NN) and shared-nearest-neighbour (SNN) relations. Because the adjacency matrix obtained from Equation (4) contains only hard (binary) links, no distance values are needed as edge weights and no graph-partitioning problem has to be solved.
The main flow of the algorithm in [38] is straightforward: after computing the first partition, it merges the clusters recursively. However, if we set the number of anchor trajectories to the exact number of final groups, there is little tolerance for error in the selection of the anchor set, because every selected trajectory must be highly representative. This is very difficult, because there is no criterion for determining whether a given trajectory can effectively represent the whole sequence.
Therefore, in this study, we compute the first-neighbour adjacency matrix of the sequence only once with Equation (4), obtain a coarse grouping, and take the number of groups as the number of anchor trajectories, as in Algorithm 1. This gives maximum flexibility in the selection of the anchor set. For convenience of calculation, the first adjacency matrix is defined on the transposed sequence, so that each row corresponds to one trajectory.
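The first-neighbour grouping behind Equation (4) and Algorithm 1 can be sketched in a few lines. This sketch assumes that trajectories are the rows of a NumPy array (the transposed sequence), uses exact Euclidean distances, and labels connected components with a plain breadth-first search; the function names are our own and it is not the code of [38].

```python
import numpy as np
from collections import deque

def first_neighbours(T):
    """Index of the nearest other row for every row of T (exact Euclidean distances)."""
    sq = np.sum(T**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * T @ T.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbour
    return np.argmin(d2, axis=1)

def first_neighbour_adjacency(kappa):
    """A[i, j] = True if j = kappa[i], or kappa[j] = i, or kappa[i] = kappa[j]."""
    idx = np.arange(len(kappa))
    A = kappa[:, None] == idx[None, :]               # j is the first neighbour of i
    A = A | A.T                                      # i is the first neighbour of j
    A = A | (kappa[:, None] == kappa[None, :])       # i and j share a first neighbour
    return A

def connected_components(A):
    """Label the connected components of the undirected graph with boolean adjacency A."""
    n = A.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if labels[v] < 0:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels, current

def estimate_num_anchors(T):
    """Number of first-neighbour groups, used as the number of anchor trajectories."""
    _, n_groups = connected_components(first_neighbour_adjacency(first_neighbours(T)))
    return n_groups
```

For example, six points forming three well-separated mutual-nearest pairs give three groups. No recursive merging is performed, which matches the single pass described above.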
3.2. Optimisation of Weight Matrix
After determining the number of anchor trajectories, we use the hierarchical clustering based on rank-two nonnegative matrix factorisation given in [39] to select the set of anchor trajectories. Although this method involves some randomness, [37] showed that it outperforms the uniform random selection used by SSSC [30]. However, it is very difficult to select anchor trajectories without any prior conditions. Although the authors of [37] provide a theoretical proof for this method, the randomness in the algorithm cannot be ignored. To reduce the decisive influence of the anchor selection on the results, [37] adds multilayer graphs. Our subsequent experiments show that, in multiple nonrigid scenes, multilayer graphs alone cannot compensate for a poor selection. For this reason, we follow [12] and apply a postprocessing step that modifies the weight matrix through good neighbours to optimise sparseness and connectivity, thereby minimising the dependence on the particular anchor trajectories.
For each trajectory, we select its most similar trajectories from the affinity matrix according to Equation (5).
In the resulting matrix, |·| denotes the element-wise absolute value, and all entries other than the selected ones are set to zero. To better reflect the strong connections between trajectories in the same subspace and the weak connections between trajectories in different subspaces, we introduce the conditions in Equation (7).
Equation (7) can be understood as follows. For a pair of trajectories, every trajectory that is a selected neighbour of both is a common neighbour of the pair, and the number of common neighbours serves as the judging condition: when it exceeds a threshold, the second trajectory is regarded as one of the good neighbours of the first. As the path length increases, the relationship between samples becomes weaker and the computation becomes less efficient. Therefore, to keep the computation efficient, we use a fixed setting in the rest of this study: when two trajectories share more than one common neighbour, one can be referred to as a good neighbour of the other. For each trajectory, we then extract its good neighbours from the sparsified affinity matrix to form its set of good neighbours. Finally, to reduce the error of spectral clustering, we keep only the weights of good neighbours and normalise each row of the optimised matrix to sum to 1, where the normaliser of each row is the sum of the weights between the trajectory and its good neighbours.
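The good-neighbour postprocessing can be sketched as three array operations. The exact conditions of Equations (5) to (8) are not reproduced here, so the sketch rests on stated assumptions: keep the `t` largest |W[i, j]| per row (j ≠ i); call j a good neighbour of i if j is among i's kept entries and the two kept neighbour sets share at least `min_common` indices (default 2, our literal reading of "more than one"); then normalise rows to sum to 1. The function name and parameters are hypothetical.

```python
import numpy as np

def good_neighbour_weights(W, t, min_common=2):
    """Sparsify and reweight an affinity matrix with a good-neighbour rule (sketch).

    W: (N, N) affinity or coefficient matrix; t: entries kept per row;
    min_common: required number of shared kept neighbours (assumed threshold).
    """
    A = np.abs(np.asarray(W, dtype=float))   # new array, so W is not modified
    np.fill_diagonal(A, 0.0)
    n = A.shape[0]
    # Step 1: keep the t most similar entries per row and zero the rest.
    keep = np.zeros_like(A, dtype=bool)
    top = np.argsort(-A, axis=1)[:, :t]
    keep[np.arange(n)[:, None], top] = True
    keep &= A > 0                            # never keep entries that are exactly zero
    # Step 2: common[i, j] = number of kept neighbours shared by i and j.
    K = keep.astype(int)
    common = K @ K.T
    good = keep & (common >= min_common)
    # Step 3: restrict to good neighbours and normalise each row to sum to 1.
    out = np.where(good, A, 0.0)
    sums = out.sum(axis=1, keepdims=True)
    return np.divide(out, sums, out=np.zeros_like(out), where=sums > 0)
```

On a block-structured affinity with weak cross-block entries, the cross-block weights are removed and each row spreads its mass evenly over its own block, which is the intended strengthening of within-subspace connections.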
4. Proposed Method
Section 3 analysed how we estimate the number of anchor trajectories and optimise the weight matrix. We now integrate these components into a complete multiple nonrigid segmentation framework. Following the idea of multilayer graphs [11], we choose several groups of anchor trajectories, each containing the estimated number of anchors; for each group, the anchor indices define a dictionary formed by the corresponding trajectories, and the resulting weight matrix defines one layer of the graph. Optimisation problem (3) can then be converted into problem (9).
Problem (9) can be solved with the alternating direction method of multipliers (ADMM) [37, 40]. ADMM suits our requirements because its computational cost per iteration is low (linear in the number of variables). Its slow convergence (linear at best) is not a bottleneck, because high precision is unnecessary: we only need the order of magnitude of the entries of the coefficient matrices. Moreover, because the data are usually very noisy, solving (9) with high precision would bring little benefit.
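A minimal ADMM sketch for one layer of the anchored problem is given below. It assumes the simplified objective ||C||_1 + (λ/2)||X − X_Ω C||_F^2 with a no-self-representation constraint, and uses the textbook scaled-form ADMM splitting C = Z (a ridge-type C-step and a soft-thresholding Z-step). It is not the SR_SSC code of [37]; λ, ρ, the stopping rule, and all names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(V, tau):
    """Element-wise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - tau, 0.0)

def anchored_ssc_admm(X, anchors, lam, rho=1.0, max_iter=1000, tol=1e-8):
    """Scaled-form ADMM sketch for
        min_C ||C||_1 + (lam/2) ||X - X[:, anchors] @ C||_F^2
        s.t.  C[r, j] = 0 whenever anchors[r] == j.
    X: (m, N) data matrix, one trajectory per column; anchors: k column indices.
    Returns the (k, N) coefficient matrix.
    """
    anchors = np.asarray(anchors)
    D = X[:, anchors]                                      # anchor dictionary, (m, k)
    k, N = D.shape[1], X.shape[1]
    forbidden = anchors[:, None] == np.arange(N)[None, :]  # (k, N) self-representation mask
    # C-step: (lam D^T D + rho I) C = lam D^T X + rho (Z - U); the k x k matrix is fixed.
    G = np.linalg.inv(lam * D.T @ D + rho * np.eye(k))
    DtX = lam * D.T @ X
    Z = np.zeros((k, N))
    U = np.zeros((k, N))
    for _ in range(max_iter):
        C = G @ (DtX + rho * (Z - U))
        Z_prev = Z
        Z = soft_threshold(C + U, 1.0 / rho)               # prox of the l1 term
        Z[forbidden] = 0.0                                 # project onto the constraint
        U = U + C - Z                                      # scaled dual update
        primal = np.linalg.norm(C - Z)
        dual = rho * np.linalg.norm(Z - Z_prev)
        if primal < tol and dual < tol:
            break
    return Z
```

As a sanity check, when the anchors form an identity dictionary the non-anchor columns reduce to soft-thresholding the data by 1/λ, and the anchor columns are forced to zero.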
Once the coefficient matrices are obtained, we build a multilayer graph with one layer per anchor group, in which the nodes are the trajectories to be clustered and each layer has its own affinity matrix. To simplify the later computation of the affinity matrices and to distinguish the layers easily, we redefine the weight matrix of each layer.
We use Algorithm 2 to optimise the weight matrix of each layer, making the affinity coefficients more representative, and then compute the affinity matrix of that layer from the optimised weights.
Following spectral clustering [9], we compute the Laplacian matrix of each layer from its affinity matrix. The Laplacian involves the identity matrix and the diagonal degree matrix, whose diagonal entries are the row sums of the layer's affinity matrix.
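For concreteness, assuming the symmetric normalised Laplacian commonly used in spectral clustering, the per-layer quantities can be written as follows, in our own notation ($W^{(l)}$ is the symmetric, nonnegative affinity matrix of layer $l$, $\mathbf{1}$ is the all-ones vector, and $I$ is the identity matrix):

```latex
D^{(l)} = \operatorname{diag}\big(W^{(l)} \mathbf{1}\big), \qquad
L^{(l)} = I - \big(D^{(l)}\big)^{-1/2}\, W^{(l)}\, \big(D^{(l)}\big)^{-1/2}
```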
Following the derivation in [11], we integrate the multilayer graphs. For each graph layer, we compute its subspace representation matrix, whose columns are the eigenvectors of that layer's Laplacian associated with the smallest eigenvalues. Finally, the information from all layers is integrated into the final Laplacian matrix given by Equation (14).
Applying k-means to the eigenvectors associated with the smallest eigenvalues of the final Laplacian matrix then yields the final segmentation of the trajectories.
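The integration and final clustering steps can be sketched as follows. We assume the merging rule of the subspace-analysis approach of [11], L_mod = Σ_l L_l − λ Σ_l U_l U_lᵀ, where U_l holds the K eigenvectors of layer l with the smallest eigenvalues and λ is the balance parameter discussed in Section 5.3. The symmetric normalised Laplacian is assumed, and the k-means step is a bare Lloyd iteration with deterministic farthest-point initialisation rather than a library routine; all names are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """I - D^{-1/2} W D^{-1/2} for a symmetric, nonnegative affinity matrix W."""
    d = W.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def smallest_eigvecs(L, K):
    """Eigenvectors of the symmetric matrix L for its K smallest eigenvalues."""
    _, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, :K]

def kmeans(Y, K, n_iter=100):
    """Lloyd's k-means on the rows of Y with farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def multilayer_spectral_clustering(layers, K, lam=0.5):
    """Fuse layers as L_mod = sum_l L_l - lam * sum_l U_l U_l^T, then cluster."""
    L_mod = np.zeros_like(layers[0], dtype=float)
    for W in layers:
        L = normalized_laplacian(W)
        U = smallest_eigvecs(L, K)
        L_mod += L - lam * U @ U.T
    return kmeans(smallest_eigvecs(L_mod, K), K)
```

The subtracted term rewards directions on which the layers agree, so a spurious link that appears in only one layer has limited influence on the fused embedding.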
5. Experiment
5.1. Data and Operating Environment
Most of the sparse datasets used in this study are from http://cvlab.lums.edu.pk/nrsfm; the others are from open-source projects. Following Kumar et al. [15, 16], we merge multiple datasets to simulate multiple nonrigid motions in the same scene, as shown in Figure 2(b), where different movements are represented by different color blocks. The sparse NRSFM datasets used are dance (75 feature points), shark (91 feature points), walk (55 feature points), yoga (41 feature points), stretch (41 feature points), face (37 feature points), pickup (41 feature points), and drink (41 feature points); some of them are shown in Figure 2(a). The experiments were run under macOS on a 3.1 GHz Intel Core i5 with 8 GB of memory, using MATLAB 2017b.
5.2. Performance Evaluation Index
To compare the segmentation results of the algorithms, we use two indicators: the accuracy rate and the number of incorrectly classified trajectories.
Accuracy rate: the proportion of trajectories that are correctly classified.
Number of errors: the number of incorrectly classified trajectories, that is, the segmentation errors of the algorithm.
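Because cluster labels are arbitrary, these indicators are commonly computed after matching predicted clusters to ground-truth motions with the best one-to-one assignment. The sketch below assumes that convention and uses brute force over permutations, which is adequate for the two to five motions considered here; the function names are our own.

```python
import numpy as np
from itertools import permutations

def segmentation_errors(true_labels, pred_labels):
    """Number of misclassified trajectories under the best one-to-one label matching."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids = np.unique(true_labels)
    pred_ids = np.unique(pred_labels)
    # counts[a, b] = number of trajectories with true id a and predicted id b.
    counts = np.array([[np.sum((true_labels == a) & (pred_labels == b)) for b in pred_ids]
                       for a in true_ids])
    # Pad to a square table so that unequal cluster counts are handled.
    n = max(len(true_ids), len(pred_ids))
    square = np.zeros((n, n), dtype=int)
    square[:counts.shape[0], :counts.shape[1]] = counts
    best = max(sum(square[i, p[i]] for i in range(n)) for p in permutations(range(n)))
    return len(true_labels) - best

def accuracy(true_labels, pred_labels):
    """Fraction of correctly classified trajectories, i.e. 1 - errors / total."""
    return 1.0 - segmentation_errors(true_labels, pred_labels) / len(true_labels)
```

For many clusters, the permutation search would be replaced by the Hungarian algorithm on the same contingency table.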
5.3. Experimental Parameter
The algorithm requires three parameters.
First, the method of [11] requires a parameter that balances the connectivity of each graph layer against the distance between the subspace representations, as in Equation (14). The authors of [11] propose setting it to 0.5 and state that the multilayer-graph method is not sensitive to this setting; the comparison algorithm, SR_SSC, also sets it to 0.5. To facilitate comparison between the algorithms, we set it to 0.5 as well.
Second, when optimising the weight matrix in Section 3.2, the number of most similar points selected for each data point must be chosen. To assess its influence, we varied it on three datasets of different sizes: (1) yoga+stretch+pickup, (2) shark+yoga+dance+walking, and (3) shark+yoga+stretch+pickup+dance. Each dataset was tested 20 times, and the average accuracy was computed; the results are shown in Figure 3. The curves in Figure 3 show that, beyond a small value of this parameter, the segmentation accuracy stabilises on all three datasets. Therefore, to improve the scalability of the algorithm, we set it to 10.
Third, to illustrate the influence of the remaining parameter, we again ran 20 experiments on each of the same three datasets, with the previous parameter fixed at 10, and computed the average accuracy, as shown in Figure 4. The trend of the curves indicates that setting this parameter to 7 is reasonable.
5.4. Analysis of the Anchor Trajectories Number Estimation
To analyse how reasonable the estimate of the number of anchor trajectories produced by Algorithm 1 (Section 3.1) is, we construct five composite datasets, as shown in Table 1. Because our algorithm builds on the basic idea of SR_SSC [37], Table 1 compares only our algorithm, multibody nonrigid segmentation (MNR_S), with SR_SSC under different numbers of anchor trajectories; the SR_SSC code is available from the open-source link provided in the corresponding paper.
To mitigate the effect of the random anchor selection on the reliability of the results, we ran 20 experiments on each dataset and report the average accuracy in Table 1, where bold indicates the best result on each dataset and italics the second best. For comparison, all parameters other than those concerning the anchor trajectories are held constant. Setting the number of anchor trajectories in SR_SSC to 60, 80, or 100, or estimating it with the method in Section 3.1, gives four settings. In the SR_SSC (Section 3.1) entries of Table 1, the number of anchor trajectories estimated by Algorithm 1 is given in parentheses beside the accuracy value, and the number in parentheses after each dataset name is the total number of feature points in that dataset.
Table 1 shows that the appropriate number of anchor trajectories for SR_SSC differs across data scales. With the estimation method introduced in Section 3.1, the results are relatively good across the different datasets. Moreover, after incorporating good neighbours, our multibody nonrigid segmentation (MNR_S) improves greatly, with an average accuracy above 90%.
5.5. Analysis of the Influence of Optimisation Weight Matrix on Algorithm
To observe the effect of optimising the weight matrix in addition to using multilayer graphs, Figure 5 visualises the affinity matrices obtained by our method, MNR_S, and by SR_SSC (Section 3.1). To compare the error-correction ability of the two methods given the same set of anchor trajectories, Figure 6 plots the results of SR_SSC and our algorithm over 20 experiments. Furthermore, Figure 7 shows the clustering results for one dataset, 'Dance+walking+shark+yoga+stretch', from the same group of experiments. In Figure 7, the same color indicates the same motion, small blocks of a different color indicate clustering errors, and the horizontal axis indexes the feature points. Because the motions contain different numbers of feature points, the color blocks have different widths.
Figure 5. (a) Shark-yoga-dance; (b) Shark-yoga-stretch; (c) Shark-yoga-dance-walking; (d) Shark-yoga-stretch-pickup-dance.
Figure 6. (a) Yoga+stretch+pickup; (b) Shark+dance+yoga; (c) Shark+dance+yoga+walking; (d) Shark+yoga+pickup+stretch+dance.
Figure 7. (a) SR_SSC (Section 3.1); (b) MNR_S.
Figure 5 shows that the affinity matrix computed by our algorithm effectively removes the surrounding interference terms, concentrates the more representative affinity coefficients near the main diagonal, reduces the influence of interfering coefficients on the segmentation, and makes the algorithm more stable.
The curves in Figure 6 show that, in multiple nonrigid scenes, our algorithm is clearly superior in stability and error rate. In particular, on the 'shark+dance+yoga+stretch+pickup' and 'shark+dance+yoga+walking' datasets, the error rate is high when only multilayer graphs are used, because the motions are highly similar. In contrast, our algorithm combines multilayer graphs with weight-matrix optimisation, which reduces the interference terms in the final affinity matrix and visibly reduces the error.
5.6. Comparison with Other SSC Algorithms on Multibody NRSFM
In this subsection, we compare the segmentation results of our algorithm, MNR_S, with those of the SSC, SSSC [30], EnSC [35], and k_SSC [36] algorithms on the synthesised datasets to demonstrate the stability of our algorithm in multibody NRSFM segmentation. To reduce the effect of chance, we run 20 experiments on each dataset and report the average accuracy in Table 2. To ensure fairness, the EnSC parameters are tuned separately for each dataset: we try a variety of plausible parameter sets and keep the one that performs best on the current dataset. The dictionary of SSSC has size 30β, where β is the number of nonrigid bodies, and is selected uniformly at random. (Remarks: in Table 2, bold indicates the best result on each dataset and italics the second best; '——' indicates that the segmentation accuracy remains below 50% after parameter tuning.)
As Table 2 shows, the segmentation accuracy of our algorithm is generally above 90%, whereas the accuracy of the other algorithms varies and sometimes falls below 90%.
On the datasets with three nonrigid bodies, all five algorithms perform relatively well. EnSC achieves 100% segmentation accuracy on the 'shark+dance+yoga' dataset, and SSC achieves 100% on the 'shark+walking+drink' dataset; nevertheless, our algorithm is very stable across the five synthetic datasets, with an average segmentation accuracy of 98%.
On scenes with four nonrigid bodies, EnSC and SSC achieve good results on the 'shark+dance+yoga+walking' dataset, but their accuracy on the remaining four datasets falls below 90%. The average accuracy of our algorithm over the five datasets exceeds 94%.
On scenes with five nonrigid bodies, our algorithm again exceeds 90% accuracy, whereas the other algorithms perform poorly (for k-SSC, we also tried adjusting various parameters, but the results on many synthetic datasets remained poor).
Table 3 considers scenes with two objects: SSC or MNR_S first segments the 2D observation matrix, and BMM is then used for reconstruction. Because our algorithm segments two-object scenes very well, the results are significantly better than those of SSC+BMM (the error reported in Table 3 is the 3D reconstruction error computed by BMM).
We also take one dataset, 'shark+yoga+stretch+pickup+dance', and show the segmentation results of the four comparison algorithms and our method in Figure 8, in which different colors represent different motions and small blocks of a different color indicate clustering errors. The segmentation result of our method is clearly better: because of their high similarity, the other algorithms cannot separate the middle three motions, whereas our algorithm segments them well.
Figure 8. (a) SSC; (b) SR_SSC (Section 3.1); (c) SSSC; (d) EnSC; (e) MNR_S.
MNR_S is relatively insensitive to its parameters. For example, Section 5.3 shows that the average accuracy fluctuates very little once the corresponding parameter exceeds 3; likewise, varying the other initial parameters within their normal ranges has little effect on our algorithm. At the same time, we combine good neighbours with multilayer graphs to reduce the influence of a poor selection of the anchor set. The experimental results demonstrate that our algorithm is stable and robust on our synthetic datasets.
6. Conclusion
The main purpose of this study is to advance research on '3D reconstruction and segmentation of multibody NRSFM.' To study how well current subspace clustering algorithms segment multiple nonrigid bodies in the same scene, we drew inspiration from the study of Abdolali et al. [37]. We noted that, because the number of anchor trajectories in the SR_SSC algorithm is determined manually, it cannot be adapted effectively to all datasets, and its segmentation involves some randomness. Using the first neighbour of each trajectory, we apply hierarchical clustering to mine the grouping of the data, estimate the number of anchor trajectories, and select the anchor trajectories by randomised hierarchical clustering. We then modify the weight matrix using good neighbors to improve the affinity coefficients. Our experiments show that the algorithm is effective for segmenting multibody NRSFM. Furthermore, because its initial parameters have little influence on it, the proposed algorithm is well suited to different multibody NRSFM scenes.
Data Availability
The data used to support this study are available upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the Natural Science Foundation of Zhejiang Province (LZ20F020003, LY17F020034, and LSZ19F010001), the National Natural Science Foundation of China (61272311, 61672466), and the 521 Project of Zhejiang Sci-Tech University.