Abstract
This paper presents a geometrical-information-assisted approach for matching local features. With the aid of Bayes' theorem, it is found that the posterior confidence of matched features can be improved by introducing global geometrical information given by distances between feature points. Based on this result, we work out an approach to obtain the geometrical information and apply it to assist feature matching. The pivotal techniques in this paper include exploiting the elliptic parameters of feature descriptors to estimate transformations that map feature points in the images to points in an assumed plane; projecting feature points onto the assumed plane and finding a reliable referential point in it; and computing differences of the distances between the projected points and the referential point. Our new approach employs these differences to assist matching features, reaching better performance than the nearest neighbor-based approach in terms of precision versus the number of matched features.
1. Introduction
For matching local features, the threshold-based method and the nearest neighbor-based approach (NNA) are two fundamental strategies. Compared with the threshold-based approach, the NNA matches features more precisely [1]. From earlier works (cf. [2–5]) to recent ones (cf. [6, 7]), nearest neighbor techniques have been exploited broadly. Since the positions of matched features can be seen as samples generated from two images related by a certain homography, RANSAC [8] is usually applied to exclude the impact of outliers on the estimation [9–16], thereby improving the matching of local features. RANSAC has a defect when inliers are relatively scarce among the putative matches, which causes unpredictable running time and may even fail to estimate consensus sets. For matching features more efficiently, some methods introduce prior information into the algorithm for estimating consensus sets, e.g., Guided-MLESAC [17], PROSAC [18], and SESAC [19], using distributions constructed from prior information instead of the uniform distribution to generate hypothetical homographies. Besides, some preliminary processes are employed to produce subsets containing more uncontaminated samples before running plain RANSAC. For example, Cov-RANSAC [20] employs SPRT and a covariance test; DT-RANSAC [21] refines data by topological information in the Delaunay triangulation; SVH-RANSAC [22] adopts a local feature scale constraint to group observations; SC-RANSAC [23] utilizes spatial relations between extracted corresponding points; and WD-RANSAC [24] and QML-RANSAC [25] adopt the Wigner distribution and the quasi maximum likelihood algorithm, respectively. Nevertheless, in the case of matching features between nonrigidly transformed images, it is difficult to estimate a nonparametric consensus by a homography-estimation-based approach. Some nonparametric consensus methods have also been developed to match local features. SparseVFC [26, 27] introduces sparse representations into the estimation of vector field consensus and shows powerful performance in matching features. LLT [28] exploits local geometrical constraints to estimate the consensus set comprising the inliers among matches between two rigidly or nonrigidly transformed images. Besides those consensus-estimation-based methods, approaches that do not estimate a consensus have also been studied. The method presented in [29] divides the vector of a local descriptor into subvectors and employs binary trees to compare the subvectors in order to realize the NNA. For matching features in the binocular stereo scene, the approach in [30] uses adjacent pixels near a feature point to form a block and then finds the best match by means of the block. LPM [31] applies local neighborhood structures to determine true matches; based on it, GLPM [32] introduces a set, obtained from the distance ratios of local descriptors, that is more likely to contain true matches, and thereby reaches better performance than LPM. Another local-geometrical-information-based technique is proposed in [33] for describing and matching features, which exploits topological relationships among local features. Moreover, in the case of matching some deep features, convolutional neural networks are employed [34].
Motivated by the approaches above built on local geometrical information, we discuss a new method assisted by geometrical information to improve the performance of feature matching. The extraction procedure for scale-invariant features provides information about the scale or even the affine parameters of features (cf. [10, 35]), from which the geometrical relationship between two images can be estimated. Here we identify an appropriate geometrical relation, study how to estimate it, and exploit it to further improve the matching effect. This paper is organized as follows. In Section 2, we discuss factors that influence the matching effect and point out a new way to enhance it. In Section 3, we discuss how the Euclidean distance can assist matching features and work out an algorithm for matching features assisted by geometrical information; we abbreviate the algorithm, Geometrical-Information-Assisted Matching, as GIAM in this paper. In Section 4, we test GIAM and compare the results with other methods. Finally, we conclude our work in Section 5.
2. Factors That Influence Matching Effect
In what follows, $M_{pq}$ denotes the event that a point $p$ matches another point $q$, and $\overline{M}_{pq}$ is its negation. $\mathbf{d}_p$ denotes the feature descriptor of a point $p$, and $s(\mathbf{d}_p,\mathbf{d}_q)$ is a similarity metric on the descriptors of two points. Besides, we write $f(\mathbf{x})\to c$ to represent that the function $f$ tends to a constant $c$ satisfying $0\le c\le 1$ as $\mathbf{x}\to\mathbf{0}$, where $\mathbf{x}$ is a vector in $\mathbb{R}^n$ for some positive integer $n$.
2.1. Discussion for Posterior Confidence of Matched Features
Usually it is not difficult to construct a descriptor $\mathbf{d}$ and a similarity metric $s$ satisfying
$$P\bigl(M_{pq}\mid s(\mathbf{d}_p,\mathbf{d}_q)\bigr)\to 1, \tag{1}$$
where $p$ and $q$ are the two points of an attempted match. Suppose $P(M_{pq})>0$. For a certain descriptor and a similarity metric, we have
$$P\bigl(s(\mathbf{d}_p,\mathbf{d}_q)\mid M_{pq}\bigr)>0,\qquad P\bigl(s(\mathbf{d}_p,\mathbf{d}_q)\mid\overline{M}_{pq}\bigr)>0. \tag{2}$$
Hence the posterior probability of the match between $p$ and $q$ is
$$P\bigl(M_{pq}\mid s\bigr)=\frac{P(s\mid M_{pq})\,P(M_{pq})}{P(s\mid M_{pq})\,P(M_{pq})+P(s\mid\overline{M}_{pq})\,P(\overline{M}_{pq})}=\frac{1}{1+\gamma}, \tag{3}$$
where $\gamma=\dfrac{P(s\mid\overline{M}_{pq})\,P(\overline{M}_{pq})}{P(s\mid M_{pq})\,P(M_{pq})}$.
As can be seen from (3), except in the trivial case of $P(s\mid M_{pq})=0$, to reach a higher posterior confidence of the match $M_{pq}$ it is plausible to lower the conditional probability $P(s\mid\overline{M}_{pq})$. Nevertheless, compared with designing an appropriate $\mathbf{d}$ and $s$ satisfying (1), constructing $\mathbf{d}$ and $s$ such that $P(s\mid\overline{M}_{pq})\to 0$ is a more arduous task. Thus it would be alleviating if another metric $g$ could be introduced satisfying
$$P\bigl(g(p,q)\le\varepsilon\mid\overline{M}_{pq}\bigr)\le\delta \tag{4}$$
for a given $\varepsilon>0$ and for some small $\delta\ge 0$, where the event $\{g(p,q)\le\varepsilon\}$ is conditionally independent of $s$ given $M_{pq}$ (and given $\overline{M}_{pq}$). Let us denote $\alpha=P\bigl(g(p,q)\le\varepsilon\mid M_{pq}\bigr)$ and $\beta=P\bigl(g(p,q)\le\varepsilon\mid\overline{M}_{pq}\bigr)$, so that the posterior confidence of matched features can be written as
$$P\bigl(M_{pq}\mid s,\,g(p,q)\le\varepsilon\bigr)=\frac{\alpha\,P(s\mid M_{pq})\,P(M_{pq})}{\alpha\,P(s\mid M_{pq})\,P(M_{pq})+\beta\,P(s\mid\overline{M}_{pq})\,P(\overline{M}_{pq})}=\frac{1}{1+(\beta/\alpha)\,\gamma}. \tag{5}$$
In contrast to (3), (5) shows that making $\beta$ approach $0$ (while $\alpha$ stays bounded away from $0$) improves the posterior confidence. As will be shown next, some geometrical information can be exploited to construct a metric $g$ satisfying (4).
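To make the effect of (5) concrete, the following sketch compares the posterior computed from the descriptor metric alone, as in (3), with the posterior obtained after including a second, conditionally independent metric, as in (5). All probabilities here are hypothetical values chosen for illustration only.

# A minimal numeric illustration of (3) and (5) with made-up probabilities.
def posterior(p_s_match, p_s_nonmatch, prior_match, alpha=1.0, beta=1.0):
    """Bayes' rule with the descriptor likelihoods and, optionally, a second
    conditionally independent observation with likelihoods alpha and beta."""
    num = alpha * p_s_match * prior_match
    den = num + beta * p_s_nonmatch * (1.0 - prior_match)
    return num / den

# Descriptor metric alone, cf. (3): posterior is about 0.63.
print(posterior(p_s_match=0.8, p_s_nonmatch=0.2, prior_match=0.3))
# Adding a geometric metric with alpha = 0.9 and beta = 0.05, cf. (5): posterior rises to about 0.97.
print(posterior(p_s_match=0.8, p_s_nonmatch=0.2, prior_match=0.3, alpha=0.9, beta=0.05))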
2.2. Geometrical Information for Matching Features
Suppose that two images $I_1$ and $I_2$ are transformed, respectively, from an image $I_0$ by two affine homographies
$$H_k=\begin{pmatrix}A_k & \mathbf{t}_k\\ \mathbf{0}^{\mathrm{T}} & 1\end{pmatrix},\qquad k=1,2, \tag{6}$$
where $\mathbf{t}_1$ and $\mathbf{t}_2$ contribute the translations on the homogeneous coordinates, and $A_1$ and $A_2$ are invertible linear transformations on the Cartesian coordinates, which yield the rotation, scale change, and tilt of the planar images. Let $p_1$ and $p_2$ be matched points and let $q_1$ and $q_2$ be points (respectively, in $I_1$ and $I_2$) to be matched. Here, by "a point in an image", say $q_1$ in $I_1$, we mean that $q_1$ belongs to the domain of definition of $I_1$. Since the procedure of feature extraction hardly provides information about $\mathbf{t}_1$ and $\mathbf{t}_2$, we erase them by calculating Euclidean distances (denoted by $\rho$) within the same image as follows:
$$\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1\bigr)=\rho\bigl(A_1^{-1}(p_1-\mathbf{t}_1),\,A_1^{-1}(q_1-\mathbf{t}_1)\bigr) \tag{7}$$
and
$$\rho\bigl(A_2^{-1}p_2,\,A_2^{-1}q_2\bigr)=\rho\bigl(A_2^{-1}(p_2-\mathbf{t}_2),\,A_2^{-1}(q_2-\mathbf{t}_2)\bigr). \tag{8}$$
Therefore we figure out the following expression describing geometrical information (where $p_1$ and $p_2$ are a pair of matched points):
$$g(q_1,q_2)=\bigl|\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1\bigr)-\rho\bigl(A_2^{-1}p_2,\,A_2^{-1}q_2\bigr)\bigr|. \tag{9}$$
Noting that the preimage of $p_1$ equals the preimage of $p_2$ in the image $I_0$, we will show that the geometrical information (9) can be employed as a metric satisfying (4) and (5). We call this common preimage the referential point and call $(p_1,p_2)$ the referential pair.
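As a minimal numerical check of (7)–(9), the following NumPy sketch uses made-up affine parameters: the translations cancel when distances are computed after applying $A_1^{-1}$ and $A_2^{-1}$, so the difference in (9) is (numerically) zero for a true correspondence and generally large for a false one.

import numpy as np

rng = np.random.default_rng(0)

# Made-up affine maps: a point x0 in I0 appears at A_k @ x0 + t_k in image I_k.
A1, t1 = np.array([[1.3, 0.2], [-0.1, 0.9]]), np.array([5.0, -3.0])
A2, t2 = np.array([[0.7, -0.4], [0.3, 1.1]]), np.array([-8.0, 2.0])

p0, q0 = rng.uniform(0, 100, 2), rng.uniform(0, 100, 2)   # referential point and a query point in I0
p1, q1 = A1 @ p0 + t1, A1 @ q0 + t1                       # their positions in I1
p2, q2 = A2 @ p0 + t2, A2 @ q0 + t2                       # their positions in I2

inv1, inv2 = np.linalg.inv(A1), np.linalg.inv(A2)
d1 = np.linalg.norm(inv1 @ p1 - inv1 @ q1)   # distance in the assumed plane, computed from I1
d2 = np.linalg.norm(inv2 @ p2 - inv2 @ q2)   # distance in the assumed plane, computed from I2
print(abs(d1 - d2))                          # ~0: (q1, q2) is a true correspondence

wrong = A2 @ rng.uniform(0, 100, 2) + t2     # an unrelated point in I2
d2_wrong = np.linalg.norm(inv2 @ p2 - inv2 @ wrong)
print(abs(d1 - d2_wrong))                    # generally far from 0 for a false match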
Proposition 1. Suppose that the image $I_0$ is defined on a continuous domain $\Omega\subseteq\mathbb{R}^2$. Then any point $q_1$ in $I_1$ and any point $q_2$ in $I_2$ satisfy
$$P\bigl(g(q_1,q_2)\le\varepsilon\mid\overline{M}_{q_1q_2}\bigr)\to 0\quad\text{as }\varepsilon\to 0.$$
Proof. Denote by $f$ the p.d.f. of projected feature points in $\Omega$ (from $I_1$ or $I_2$) and set $r_1=\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1\bigr)$. Since $q_1$ and $q_2$ do not match, the projected point $A_2^{-1}q_2$ differs from $A_1^{-1}q_1$ in $\Omega$. We then have
$$P\bigl(g(q_1,q_2)\le\varepsilon\mid\overline{M}_{q_1q_2}\bigr)\le\int_{S_\varepsilon}f(\mathbf{x})\,\mathrm{d}\mathbf{x},$$
where $S_\varepsilon=\bigl\{\mathbf{x}\in\Omega:\ \bigl|\rho(\mathbf{x},A_2^{-1}p_2)-r_1\bigr|\le\varepsilon\bigr\}$ is an annulus of width $2\varepsilon$ centred at the referential point. It follows that $P\bigl(g(q_1,q_2)\le\varepsilon\mid\overline{M}_{q_1q_2}\bigr)\to 0$ as $\varepsilon\to 0$.
The result of Proposition 1 indicates an optimal case in applying (5). However, we need to formulate the discrete form for pragmatic use, which is built on a probability dominated by the counting measure.
Proposition 2. Suppose that $I_0$ is an image in its discrete form, defined on the pixel grid $\Omega_N=\{1,\dots,N\}\times\{1,\dots,N\}$ for some $N\in\mathbb{N}$. Then any point $q_1$ in $I_1$ and any point $q_2$ in $I_2$ satisfy
$$P\bigl(g(q_1,q_2)\le\varepsilon\mid\overline{M}_{q_1q_2}\bigr)\le\sum_{\mathbf{x}\in S_\varepsilon}P(\mathbf{x}),$$
where $S_\varepsilon$ denotes the set of grid points in $\Omega_N$ whose distance to the referential point differs from $\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1\bigr)$ by at most $\varepsilon$.
Proof. Denote by $P(\mathbf{x})$ the probability of a feature point (from $I_1$ or $I_2$) being projected to the grid point $\mathbf{x}$ and set $r_1=\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1\bigr)$. Since $q_1$ and $q_2$ do not match, the projected point $A_2^{-1}q_2$ differs from $A_1^{-1}q_1$ in the image $I_0$. Hence we have
$$P\bigl(g(q_1,q_2)\le\varepsilon\mid\overline{M}_{q_1q_2}\bigr)\le\sum_{\mathbf{x}\in S_\varepsilon}P(\mathbf{x}),$$
where $S_\varepsilon=\bigl\{\mathbf{x}\in\Omega_N:\ \bigl|\rho(\mathbf{x},A_2^{-1}p_2)-r_1\bigr|\le\varepsilon\bigr\}$. We finish the proof by noting that, for sufficiently small $\varepsilon$'s, the sets $S_\varepsilon$ remain constant.
Since the number of pixels lying at approximately a given radius from the referential point is far less than the number of pixels in the whole image, presumably in most cases this bound should be far less than $1$. Therefore Proposition 2 offers a suboptimal result for (5) in matching features. The proofs and discussions of Propositions 1 and 2 are illustrated in Figure 1.
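The following rough sketch (with an arbitrary image size, radius, and tolerance) illustrates this remark numerically: the pixels whose distance to a referential point lies within a small tolerance of a fixed radius form a thin annulus that occupies only a tiny fraction of the image.

import numpy as np

N, r0, eps = 512, 100.0, 2.0                 # image size, radius, tolerance (arbitrary)
ref = np.array([256.0, 256.0])               # referential point
ys, xs = np.mgrid[0:N, 0:N]
dist = np.hypot(xs - ref[0], ys - ref[1])
annulus = np.abs(dist - r0) <= eps           # pixels at distance ~r0 from the referential point
print(annulus.sum(), N * N, annulus.sum() / (N * N))   # roughly 2.5e3 vs 2.6e5, i.e. about 1%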

3. Matching Features Assisted by Geometrical Information
3.1. Discussion for Algorithm
We first discuss how to obtain $A_1$ and $A_2$. Suppose that there is a linear transformation $T$ mapping the unit circle $\{\mathbf{u}\in\mathbb{R}^2:\mathbf{u}^{\mathrm{T}}\mathbf{u}=1\}$ to an ellipse $\{\mathbf{x}\in\mathbb{R}^2:\mathbf{x}^{\mathrm{T}}E\mathbf{x}=1\}$, and denote $\mathbf{x}=T\mathbf{u}$ and $E=\begin{pmatrix}a&b\\ b&c\end{pmatrix}$. Then we have
$$T^{\mathrm{T}}ET=I,\qquad\text{i.e.,}\qquad E=\bigl(TT^{\mathrm{T}}\bigr)^{-1},$$
which means that, by the Cholesky decomposition of the positive-definite matrix $E$, we can obtain the linear transformation $T$. In descriptors of Mikolajczyk's format [10, 35], the affine region of a feature point is enclosed by an ellipse with parameters $a$, $b$, and $c$ (for the details of these parameters please refer to the website http://www.robots.ox.ac.uk/~vgg/research/affine/descriptors.html#binaries), by which we can calculate $A_1$ and $A_2$.
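As an illustration (not the authors' code), and under the assumption that the parameters $(a,b,c)$ define the ellipse $ax^2+2bxy+cy^2=1$ around the feature point, this factorization can be coded as follows:

import numpy as np

def transform_from_ellipse(a, b, c):
    """Return T such that T^T E T = I, i.e., T maps the unit circle onto the
    ellipse x^T E x = 1 with E = [[a, b], [b, c]] (assumed positive definite)."""
    E = np.array([[a, b], [b, c]])
    L = np.linalg.cholesky(E)      # E = L L^T
    return np.linalg.inv(L.T)      # T = L^{-T}, hence T^T E T = L^{-1} E L^{-T} = I

# Quick check: images of unit-circle points satisfy the ellipse equation.
a, b, c = 0.02, 0.005, 0.01                          # made-up ellipse parameters
T = transform_from_ellipse(a, b, c)
theta = np.linspace(0.0, 2.0 * np.pi, 8)
circle = np.stack([np.cos(theta), np.sin(theta)])    # unit-circle points as columns
pts = T @ circle
E = np.array([[a, b], [b, c]])
print(np.einsum('ij,ik,kj->j', pts, E, pts))         # all entries ~1.0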
Next we need to find the referential pair and to estimate the invertible linear transformations $A_1$ and $A_2$. Suppose that $S$ is a sequence consisting of the putative pairs ordered by the similarity metrics of their descriptors. We assign the most similar pair as the referential pair, i.e., $(p_1,p_2)$ is the first pair in $S$. Since the unit circle in $I_0$ can be transformed by $A_1$ and $A_2$ onto the boundaries (ellipses) of the affine regions of the feature points $p_1$ and $p_2$, respectively, $A_1$ and $A_2$ should be obtained from the respective Cholesky decompositions of the matrices $\begin{pmatrix}a_1&b_1\\ b_1&c_1\end{pmatrix}$ and $\begin{pmatrix}a_2&b_2\\ b_2&c_2\end{pmatrix}$, where $a_1$, $b_1$, and $c_1$ are the parameters of the boundary ellipse of the affine region of $p_1$ and the same applies to $a_2$, $b_2$, and $c_2$ for $p_2$.
The geometrical information for each pair of feature points can be calculated henceforth. For two sets of feature points $\{q_1^{(i)}\}_{i=1}^{m}$ in $I_1$ and $\{q_2^{(j)}\}_{j=1}^{n}$ in $I_2$, we compute their geometrical information as
$$g\bigl(q_1^{(i)},q_2^{(j)}\bigr)=\Bigl|\rho\bigl(A_1^{-1}p_1,\,A_1^{-1}q_1^{(i)}\bigr)-\rho\bigl(A_2^{-1}p_2,\,A_2^{-1}q_2^{(j)}\bigr)\Bigr|.$$
Finally, new matches are chosen by ordered scores, which are calculated as a sum of the similarity metric of the features and the geometrical information of the feature points. We calculate the score of each putative pair as
$$\mathrm{score}\bigl(q_1^{(i)},q_2^{(j)}\bigr)=s\bigl(\mathbf{d}_{q_1^{(i)}},\mathbf{d}_{q_2^{(j)}}\bigr)+g\bigl(q_1^{(i)},q_2^{(j)}\bigr). \tag{16}$$
3.2. Data Objects and the Algorithm of GIAM
According to the preceding discussion, we take five matrices as the basic data objects: a descriptor difference matrix, a distance matrix for the candidates in the image $I_1$, a distance matrix for the candidates in the image $I_2$, a distance correlation matrix, and a score matrix.
The similarity metrics between each pair of candidates constitute the descriptor difference matrix $D$, where $D_{ij}=s\bigl(\mathbf{d}_{q_1^{(i)}},\mathbf{d}_{q_2^{(j)}}\bigr)$.
The distance matrices are $R^{(1)}$ and $R^{(2)}$, where the entry $R^{(1)}_{ik}$ represents the distance, induced by $A_1^{-1}$, between the corresponding points (in $I_0$) of the $i$-th and the $k$-th feature points in $I_1$, and the same applies to $R^{(2)}$ for the feature points in $I_2$.
The distance correlation matrix $C$ presents the geometrical information of the feature points; it is given by
$$C_{uv}=X_{uv}\ominus Y_{uv} \tag{17}$$
for $u,v=1,\dots,K$, where $X$ and $Y$ are matrices whose entries are selected from the distance matrices $R^{(1)}$ and $R^{(2)}$ in the following manner: if the $u$-th pair is made up of the $i$-th candidate in the image $I_1$ and the $j$-th candidate in the image $I_2$, then the $u$-th column of $X$ is the $i$-th column of $R^{(1)}$ and the $u$-th row of $Y$ is the $j$-th row of $R^{(2)}$; the operator $\ominus$ in (17) is defined by $x\ominus y=|x-y|$. Here we name $K$ the order of the distance correlation matrix.
The score matrix contains all results computed by (16).
We summarize our algorithm in Algorithm 1.
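To make the data flow concrete, the following NumPy sketch outlines the pipeline described above. The function names, the normalisation of the two score terms, and the greedy one-to-one selection of matches are illustrative assumptions, not the authors' exact implementation.

import numpy as np

def transform_from_ellipse(a, b, c):
    """Linear map sending the unit circle onto the ellipse x^T E x = 1 (cf. Section 3.1)."""
    L = np.linalg.cholesky(np.array([[a, b], [b, c]]))
    return np.linalg.inv(L.T)

def giam_match(pts1, ell1, desc1, pts2, ell2, desc2, n_matches=200):
    """Sketch of GIAM: combine descriptor differences with geometric differences."""
    # Descriptor difference matrix D (Euclidean distance between descriptors).
    D = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)

    # Referential pair: the most similar putative pair.
    i0, j0 = np.unravel_index(np.argmin(D), D.shape)

    # Estimate A1, A2 from the ellipse parameters (a, b, c) of the referential
    # features and project all feature points onto the assumed plane.
    A1 = transform_from_ellipse(*ell1[i0])
    A2 = transform_from_ellipse(*ell2[j0])
    proj1 = pts1 @ np.linalg.inv(A1).T
    proj2 = pts2 @ np.linalg.inv(A2).T

    # Distances to the referential point in the assumed plane.
    r1 = np.linalg.norm(proj1 - proj1[i0], axis=1)
    r2 = np.linalg.norm(proj2 - proj2[j0], axis=1)

    # Geometrical information for every candidate pair, cf. (9), and the score, cf. (16).
    G = np.abs(r1[:, None] - r2[None, :])
    score = D / D.max() + G / G.max()          # normalisation is an assumption

    # Choose matches greedily by ordered scores, keeping each feature at most once.
    matches, used1, used2 = [], set(), set()
    for idx in np.argsort(score, axis=None):
        i, j = np.unravel_index(idx, score.shape)
        if i not in used1 and j not in used2:
            matches.append((int(i), int(j)))
            used1.add(i)
            used2.add(j)
        if len(matches) >= n_matches:
            break
    return matches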
4. Simulations
4.1. Datasets and Methods for Simulations
To test our new algorithm for local feature matching, we employ four methods: NNA, NNA with plain RANSAC (NNA-RANSAC), NNA with LPM (NNA-LPM), and NNA with GIAM (NNA-GIAM). The RANSAC code adopted in our simulations was developed by Marco Zuliani (obtained from the website https://github.com/RANSAC/RANSAC-Toolbox), and the LPM code was developed by Dr. Jiayi Ma (obtained from the website https://github.com/jiayi-ma/VFC). We set the parameters for LPM as shown in Table 1 and the parameters for RANSAC as shown in Table 2.
We utilize an executable file implemented by Dr. Mikolajczyk (the original codes are obtained from the website http://www.robots.ox.ac.uk/~vgg/research/affine/) to produce the experimental data. First, we use the executable file to process all test images and generate the sets of descriptors for each test image. Second, we match features between those sets by NNA, NNA-RANSAC, NNA-LPM, and NNA-GIAM, respectively. We define that a point in the first image correctly matches a corresponding point in the second image if the distance between the point in the second image and the standard point (to which the point in the first image is mapped via the given homography serving as the ground truth) is less than 2 pixels. The precision with regard to a number of matched features equals the number of correct matches divided by that number of matches. We use curves of precision vs. number of matches to evaluate the performance of the four algorithms.
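For concreteness, this evaluation criterion can be sketched as follows (the variable names are illustrative; H stands for the 3x3 ground-truth homography supplied with the dataset):

import numpy as np

def precision(matches, pts1, pts2, H, tol=2.0):
    """Fraction of matches whose point in the second image lies within `tol`
    pixels of the first-image point mapped by the ground-truth homography H."""
    correct = 0
    for i, j in matches:
        u, v, w = H @ np.array([pts1[i][0], pts1[i][1], 1.0])  # map to the second image
        ground_truth = np.array([u / w, v / w])
        if np.linalg.norm(np.asarray(pts2[j]) - ground_truth) < tol:
            correct += 1
    return correct / len(matches) if matches else 0.0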
For tests under fairly complex scenes, involving changes of rotation, scale, viewpoint, blur, illumination, and JPEG compression from slight to dramatic degrees, Mikolajczyk's test data [36] are exploited, which consist of a group of image sets with homographies as the ground truth (the image sequences are from the website http://www.robots.ox.ac.uk/~vgg/research/affine/detectors.html). There are 8 test sets in the group, and each set contains 6 images. We match features in the first image to each of the remaining images in every test sequence by NNA, NNA-RANSAC, NNA-LPM, and NNA-GIAM, respectively. In these tests we employ the Harris-Affine detector [10, 35] to extract features and describe them with SIFT [3, 4].
4.2. Results of Experiments
We obtain 40 test curves divided into 8 sets, which are shown in Figures 2–9 in the appendix. In these figures, the "1 vs. k" in the captions is interpreted as "the first image versus the k-th image", indicating that the test result is obtained by matching features between the first image and the k-th image of the same test image sequence.

In the case of rotation and scale change for the textured scene (cf. Figure 2), GIAM outperforms NNA and exceeds LPM and RANSAC when the change is dramatic. Beyond the middle degree of change ((c), (d), (e) in Figure 2), GIAM outperforms LPM over most of the range of the number of matched features. GIAM shows a slight advantage over NNA for rotation and scale change in the structured scene (cf. Figure 3). In the cases of blurred images of both the structured scene and the textured scene (cf. Figures 4 and 5), GIAM improves the precision of NNA. In the structured-scene test, GIAM reaches higher scores than LPM when the number of matches is relatively small ((a), (b), (c) in Figure 4) and attains precision comparable to RANSAC. For the severest JPEG compression, GIAM performs better than NNA and LPM (cf. Figure 6). GIAM, NNA, and RANSAC show almost identical performance under slight to mild JPEG compression over more than half of the matches, while LPM finds relatively fewer correct matches in this test sequence. In the case of illumination change (cf. Figure 7), GIAM reaches higher precision than NNA as the light turns dim and in general slightly outperforms LPM over the first third of all matches at each degree of illumination. In the case of viewpoint change for the textured scene (cf. Figure 8), GIAM outperforms LPM and exceeds NNA in all tests and has performance comparable to RANSAC over the first half of all matches. Under viewpoint change for the structured scene (cf. Figure 9), GIAM exceeds NNA in all tests and shows better performance than the other methods when the number of matches and the change of viewpoint are relatively small ((a), (b) in Figure 9).
Consequently, it can be seen from these results that GIAM improves the precision of NNA in situations including scale change, blur, JPEG compression, illumination change, and slight viewpoint change, and that it reaches better performance than LPM under some changes of rotation, scale, illumination, compression, and viewpoint.
5. Conclusion
We have studied how geometrical information improves the posterior confidence of matched features and therefore enhances the matching effect. There are two essential merits in our work. The first is that we utilize Bayes' theorem to analyze the factors influencing the matching effect and show that prior information, in the form of a conditional distribution satisfying (4), can help improve the posterior confidence of matches. The second is the exploitation of the geometrical information in the descriptors, which can be used to estimate the distances between projected feature points. Consequently, the technique proposed in this paper shows its capability to improve the performance of feature matching.
Appendix
Results of Simulations
Data Availability
The image and diagram data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work is supported by the Guangdong Project of Science and Technology Development (2014B09091042) and the Guangzhou Sci & Tech Innovation Committee (201707010068). The authors thank Dr. Krystian Mikolajczyk for his test data and executable file, Dr. Marco Zuliani for his RANSAC code, and Dr. Jiayi Ma for his LPM code.