SAR Image Matching Based on Local Feature Detection and Description Using Convolutional Neural Network

Elwan, Mohammed; Amein, Ahmed S.; Mousa, Aiman; Ahmed, Abdelmoty M.; Bouallegue, Belgacem; Eltanany, Abdelhameed S.

doi:https://doi.org/10.1155/2022/5669069

Security and Communication Networks

Research Article | Open Access

Volume 2022 | Article ID 5669069 | https://doi.org/10.1155/2022/5669069

Mohammed Elwan,¹ Ahmed S. Amein,² Aiman Mousa,³ Abdelmoty M. Ahmed,⁴ Belgacem Bouallegue,⁴ and Abdelhameed S. Eltanany³

Academic Editor: Bruno Carpentieri

Received 01 Nov 2021

Revised 07 Apr 2022

Accepted 20 Apr 2022

Published 24 May 2022

Abstract

Feature detection is a vital step in the image registration process, whose target is to correct the misalignment among images to increase the level of convergence. Deep learning (DL) has attracted worldwide attention in remote sensing. Despite its huge potential, DL has not yet reached its intended target in applications involving Synthetic Aperture Radar (SAR) images. In this study, we focus on matching SAR images using a Convolutional Neural Network. The main challenge is how to modify a Visual Geometry Group model pretrained on a multispectral dataset so that it acts as a SAR image feature detector without requiring any prior knowledge about the nature of SAR features. Since SAR images differ from optical images in characteristics such as dynamic range and imaging geometry, some problems arise and should be considered during the matching process. Despite these difficulties, the results demonstrate the robustness of the registration process, which provides descriptors that preserve the localization data of features. The proposed approach also provides reasonable results compared to the state-of-the-art methods and outperforms the correlation approach and the ORB descriptor under scaling. In addition, it may be considered an end-to-end image matching tool for SAR images, although the calculations of fine matching parameters are included.

1. Introduction

The process of image registration aims to find the deformations among the images being used so that they can be corrected for further processing. These deformations result from capturing the images with different sensors, at different times, or from different viewpoints. The registration process can be defined as an optimization problem that seeks the correct values of the transformation matrix required to map one image onto another so as to achieve the desired similarity. The accuracy of this process mainly depends on the correct detection and extraction of the same features representing the same objects in the images. Generally, image registration techniques are classified into two main categories: traditional and automated techniques. Traditional techniques consume a high processing time since they depend on manual feature selection, which in turn depends strongly on the user's experience. Automated techniques were proposed to overcome the limitations of traditional techniques and are classified into three groups: Area-Based Matching (ABM) methods, Feature-Based Matching (FBM) methods, and hybrid methods. FBM methods include both feature detectors and feature descriptors, where descriptors may act as detectors and/or descriptors. Many approaches have been built from detection and/or description processes, and each approach has its advantages and limitations. Generally, two types of features can be detected: global and local features. Global features depict the image content as a whole, while detecting key points inside the image requires local features, as shown in Figure 1 [1–4].

Many registration problems arise during image capture, namely viewpoint, multimodal, temporal, and template registration problems, and using the images together requires recovery from these problems. The viewpoint registration problem concerns variations in viewing angle and acts as a shape reconstruction process. The multimodal registration problem concerns the use of different payloads and acts as an information integration process. The template registration problem, on the other hand, concerns the effect of imaging geometry and acts as a recognition process. Finally, the temporal registration problem is the best known; it concerns the effects of imaging and environmental variations, mainly including noise, and acts as a detection process [5–8].

Nowadays, many remote sensing image applications allow the usage of Artificial Intelligence (AI) for multispectral images in many tasks such as object detection and recognition, image registration, image fusion, and image segmentation. But, this is not the case for SAR image applications. AI can be described as Machine Learning (ML) which can be classified as Neural Networks (NN), Convolutional Neural Networks (CNN), Deep Learning (DL), and Spiking Neural Networks (SNN). Since the characteristics of SAR images differ from that of multispectral images, some problems should be considered while applying AI approaches to SAR images, such as SAR image dynamic range, SAR signal statistics, SAR imaging geometry, complex nature of SAR data, and shortage of SAR images library. Finally, the registration process should resist any imaging and environmental variations to maximize the similarity among captured scenes. So, the wide scope of registration approaches makes it difficult to compose these approaches since each one concerns a certain problem for a certain application [5, 9–17].

The proposed approach demonstrates the usage of a Visual Geometry Group (VGG-16) model [18] as a SAR image feature detector. Since the VGG-16 model is mainly used as a classifier for multispectral remote sensing images, modifications to its structure and operational concept are required to adapt it to the new task, SAR image registration. First, a Gaussian kernel filter is used to smooth the intensity variations, acting as a normalization process to overcome the problem of SAR image dynamic range. Using the image in intensity form overcomes the problem of SAR image signal statistics. The intensity image is then applied to the model after being resized to a multiple of 32 using the cubic spline interpolation technique, which provides a fine output with a smaller error. Detected feature locations are described by two parameters representing the x and y coordinates; then, descriptors corresponding to these features are generated. Not all of these descriptors are good: good descriptors are required, while the others should be discarded since they degrade the output accuracy. Two matching methods are used to select the most powerful descriptors. The first is a prematching step using Euclidean distance [19], and the second is a fine matching step using the Random Sample Consensus (RANSAC) algorithm [20]. Then, a transformation matrix based on the most powerful key points is generated for mapping one image to the other. Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and correlation coefficient (CC) are used to evaluate the overall performance of the registration process.

This paper is organized as follows. Some relevant works concerning this research field are presented in Section 2. CNN is briefly introduced in Section 3. Section 4 discusses the methodology, including the proposed structure, dataset, and assessment criteria. Results and discussion are presented in Section 5. Finally, the conclusion is given in Section 6.

2. Related Works

Recently, AI approaches have been considered for many applications of image registration. The proper usage of AI requires the availability of datasets where common training features exist [14–17, 21–23]. A survey of image matching techniques is introduced, starting with traditional approaches, including Area-Based Matching (ABM) and Feature-Based Matching (FBM) methods, followed by a description of matching methods from handcrafted to deep approaches. Its target is to analyze several tasks, such as matching and registration, using the introduced approaches [14, 15, 23]. The work in [12] demonstrates a VGG-16 model for registering multitemporal RGB remote sensing images. It provides a powerful multiscale feature descriptor and increases the number of inlier matched points to enhance the registration process, relying on the Thin Plate Spline (TPS) interpolation technique. It claims to outperform the Scale Invariant Feature Transform (SIFT) descriptor [24]. In [16], an effective method known as Locality Preserving Matching (LPM) is presented. Its target is to preserve the local neighborhood structure of correctly matched points. It mainly addresses mismatch removal while keeping the number of true matched points high for the matching process.

In [17], an efficient method known as Neighborhood Manifold Representation Consensus (NMRC) is developed to remove incorrect matches based on description similarity. It relies mainly on the stability of the neighboring topologies around the true correspondences, enforced through iterative filter blocks; that is, it preserves the local neighborhood structure of correctly matched points. It enhances the performance when the data are severely degraded. It can be considered an optimization formulation dependent on NMR and iterative filtering. In [18], a framework for the registration process was proposed based on learning a mapping function between patch pairs and their labels using forward and backward operations. It enhances the overall accuracy compared with the state-of-the-art methods and minimizes the training cost, but it consumes a high processing time to estimate the required mapping functions as an initial step. In [25], a combination of CNN and SIFT features was proposed. It is based on manual dataset generation for fine-tuning the VGG-16 model to detect the features. The CNN features are then combined with those of SIFT, and the output is fed to PSO-SIFT to perform the registration process, which improved the accuracy and the number of correct correspondences. Hence, [25] contradicts what was presented in [12], as it indicates the importance of the SIFT descriptor.

Although the availability of datasets is a big challenge related to SAR images, many researchers have succeeded in applying AI to SAR image applications. The earlier attempts concerning SAR images were based on the utilization of a model for optical ones with few modifications to recover the raised problems due to SAR image unique specifications and the shortage of common SAR datasets. These attempts were designed for the detection of the military vehicle based on a custom dataset [9–17, 21–23]. In [26] as a first attempt, the author used an unsupervised sparse autoencoder and exhibited good efforts acting as a target recognition process. More attempts such as [27–29] were performed based on [26] using the same dataset with different conceptual and operational concepts. More researchers tried to perform different tasks such as ship detection and building detection acting as not only SAR target recognition but also a classifier where SAR signal was used instead of the processed SAR images [2].

All these attempts were mainly dependent on a customized dataset and search for a specific object to be recognized using CNN models. Finally, these attempts addressed the main problem concerning the application of CNN to SAR images; this problem is the lack of SAR datasets for training purposes. Although these attempts were designed to perform specific tasks as SAR target recognition and classification, they did not achieve SAR image registration. So, this paper demonstrates the utilization of the VGG-16 model with little operational concept modifications to perform SAR image registration taking into account SAR image characteristics.

3. Convolutional Neural Network

Artificial Neural Network (ANN) can be defined as adaptive fully automated data driven based on mathematical assumptions. Depending on the flowing data, its structure can be modified. It depicts a group of interconnected neurons performing the information processing. Generally, NN is nonlinear statistical data modeling between the input and the output where the interconnections between different neurons have dedicated weights values. NN has two design forms: forward propagation form and backward (recurrent) propagation form. CNN model, as a subfield of AI, was developed for processing of multiple arrays data forms. It contains 3 layers: input layer, hidden (CNN) layer, and output layer. CNN model may be simple having only one CNN layer or complex having more than one CNN layer. Image registration using CNN models is an interesting research area based on two steps: feature extraction and the estimation of similarity metric among these extracted features. The most important constraint concerning the utilization of the CNN approach is the shortage of common training dataset which covers all aspects of remote sensing images. A single CNN layer contains 3 different layers: convolutional layer, pooling layer, and fully connected (FC) layer as shown in Figure 2 [9–17, 30].

A convolutional layer consists of many filters searching for a specified pattern. Each convolutional layer provides what is called a feature map. Then, an activation function (linear or nonlinear) is applied to produce an activated feature map. The most commonly used activation functions are sigmoid, tanh (hyperbolic tangent), identity, and rectified linear unit (ReLU). Pooling layers (maximum or average) aggregate the pixel values within a predefined window size. The number of input and output feature maps is equal, but their dimensions change due to the subsampling process. This means that the output dimension equals 1/x of the input activated feature map dimension, where x is the subsampling mask size or patch size:

$$F_{\text{out}} = \operatorname{down}(F_{\text{act}})$$

such that $F_{\text{act}}$ is the activated feature map, $F_{\text{out}}$ is the pooled output, and down(.) represents a pooling function [9–11, 15, 17, 25, 30].
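The activation and pooling steps described above can be illustrated with a minimal sketch. It assumes a ReLU activation, maximum pooling, and a 2 × 2 window (x = 2); the function names `relu` and `max_pool` and the sample values are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def relu(feature_map):
    """Apply the rectified linear unit element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """down(.): keep the maximum of each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


# Hypothetical 4 x 4 feature map produced by one convolution filter.
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [-1.0, 4.0, -3.0, 2.0],
    [0.0, -5.0, 6.0, -1.0],
    [2.0, 1.0, -7.0, 8.0],
]
activated = relu(feature_map)         # negative responses are set to zero
pooled = max_pool(activated, size=2)  # each side shrinks by a factor of 2
print(pooled)  # [[4.0, 3.0], [2.0, 8.0]]
```

With a 2 × 2 window, each side of the 4 × 4 map shrinks to half, matching the 1/x relation stated in the text.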

As being a fully automated model, CNN has achieved a great success where it can automatically learn the nature of features using input images. Performance of CNN models has a direct proportional relationship to both processing capabilities and dataset availability. These two factors permit CNN model to be trained with millions of parameters, and hence the performance increases. All CNN frameworks focus on the estimation of dedicated metrics provided by using large images without the need for location information [10–13, 18, 25]. On the other hand, CNN suffers from vanishing the gradients and may not work properly for severe distortions. In addition, the data overfitting is one of CNN’s limitations. So, researchers try to solve these limitations via different directions as the modifications of both activation function and training phase strategies, adding normalization layers, and updating the model architecture. Finally, the CNN structure depth is a critical parameter for getting better performance. As being one of the most commonly used tools for feature matching, SIFT descriptor may be combined with CNN’s models to improve both performance and accuracy [10–13, 18, 25]. Figure 3 demonstrates a single CNN model where a kernel filter is applied to the input, and then an activation function is performed followed by the pooling process.

4. Methodology

The Visual Geometry Group (VGG) model [18] consists of several convolutional blocks followed by FC layers. Each convolutional block has two or more convolutional layers followed by a maximum pooling layer. All convolutional layers and FC layers use the ReLU activation function to overcome the problem of vanishing gradient. Figure 4 represents the basic structure of the VGG network. The operational concept of the VGG network depends on a fixed input size (224 × 224) delivered to the convolutional layer, where the spatial resolution is kept after each convolution process. The convolution process is carried out over a window of size 3 × 3 with a stride equal to 1, whereas the maximum pooling process is carried out over a window of size 2 × 2 with a stride equal to 2.
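The effect of these window and stride settings on the spatial size can be traced with a short sketch. It assumes one-pixel padding for each 3 × 3 convolution and five convolutional blocks, as in VGG-16; the helper names are hypothetical. The number of convolutions per block does not change the result, since each convolution preserves the size.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size, window=2, stride=2):
    """Spatial size after a pooling layer without padding."""
    return (size - window) // stride + 1


def trace_vgg_sizes(input_size=224, num_blocks=5, convs_per_block=2):
    """Return the spatial size at the output of each convolutional block."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        for _ in range(convs_per_block):
            size = conv_output_size(size)  # 3 x 3, stride 1, padding 1: unchanged
        size = pool_output_size(size)      # 2 x 2, stride 2: halved
        sizes.append(size)
    return sizes


print(trace_vgg_sizes())  # [112, 56, 28, 14, 7]
```

Five halvings give an overall reduction factor of 2^5 = 32, which is consistent with resizing the input to a multiple of 32 so that every pooling stage divides evenly.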

There are several variants of VGG-Net, such as VGG-11, VGG-16, and VGG-19, which differ in the number of convolutional layers. A VGG network model is mainly used for image classification, where it has many advantages, such as being trained on very large and diverse datasets of optical images and having a high performance. The most important advantage of the VGG model is its simple architecture, which does not require any shorter alternative routes to strengthen the flow of gradient and hence can be adapted for many image processing applications [31].

One of the most important terms is the pretrained model. It is a model that has already been trained in an environment similar to the required one. The shortage of a common training dataset covering all aspects of remote sensing images is an important factor related to the model structure. So, pretrained models are very useful when datasets are in short supply. There are several pretrained models for many remote sensing applications such as feature detection, image recognition, and image classification. One of the most famous pretrained models is VGG-Net, which is trained on the ImageNet dataset such that it searches for general patches and is useful for many applications. Figure 5 shows the general workflow diagram.

The operational target of the registration process is how to transform sensed image to coincide with reference image. The general workflow of the registration process has four steps where the reference image is kept without any changes, and only the sensed image will be changed. First, a preprocessing step is performed due to the different characteristics between optical and SAR to overcome the raised problems such as SAR dynamic range, SAR signal statistics, SAR imaging geometry, and complex nature of SAR data. Second, a set of key points will be detected from sensed image, and another key point set will be detected from the reference image. Then, a description process is carried out. Third, a matching step is performed to determine which points are classified as inliers that are important and outliers which are needed to be removed. Finally, the required transformation matrix is generated using the inliers set of key points and applied to the sensed image to produce the registered image.

4.1. Preprocessing of SAR Images

As mentioned before, some problems should be considered while applying AI approaches to SAR images such as SAR dynamic range, SAR signal statistics, imaging geometry, and SAR image specifications. These problems should be recovered to get high performance. SAR images have a wide dynamic range which may reach 90 dB where pixels are within low or high values of dynamic range. Since CNN networks cannot control this range, preprocessing steps should be performed as a normalization process to recover this dynamic range safely. SAR signal statistics plays an important role where it takes a complex form (real and imaginary). Hence, the speckle effect must be recovered to recapture the features. Although attempts to recover the speckle effect, such as the usage of complex reduction filters, will reduce the dynamic range, CNN can approximately imitate the features inside SAR image. Since the SAR imaging geometry is not the same as that of multispectral, any attempts to generate a SAR dataset will provide meaningless images. In addition, SAR image specifications are very important where the phase information plays an important role. According to the above mentioned, the used CNN network should be able to perform all operations related to a distorted phase concerning different degrees [2, 5, 11].

4.2. Feature Detection and Description

To recover the raised problems due to the different characteristics between SAR and multispectral images, two considerations should be taken into account for SAR feature detection. The first one is the characteristics of the SAR image, and this is recovered by performing a preprocessing step before being applied to the model by using the Gaussian kernel filter. The second is the shortage of common SAR datasets, and this will be recovered using a pretrained model based on multispectral dataset with minor or major modifications. The VGG-16 model is utilized for key point detection and extraction in many neural network techniques where there are many filters of small sizes embedded in the convolutional layers searching for a certain pattern inside the input image. The depth of the CNN model can be considered as a measurement of how many layers are included, especially the number of convolutional blocks (convolutional and pooling layers). As the CNN depth increases, the search scaling increases, and the model will be more complex as the number of patterns becomes larger. The searching strategy of the VGG model is mainly dependent on universal features and patterns. Also, since the VGG model is a pretrained model based on optical images not SAR, the detection of features requires a slight modification for the output of the model. The detection process is strongly depending on the created patches of fixed sizes from the SAR image. Then, the identification process, whether patches are related to objects or not, is performed depending on the model’s parameters.

First, the input images are resized to a size that is a multiple of 32. A 224 × 224 single-layer (amplitude) SAR image is provided to the network, after applying a Gaussian kernel filter, and passes through a bank of convolutional blocks. Each block uses filters whose receptive field is small and equals 3 × 3. The stride for the convolution process is one pixel, and its padding is selected to preserve the spatial resolution after each convolution process; each 3 × 3 convolution requires one-pixel padding. After each convolutional block, a spatial maximum pooling process is carried out over a window of size 2 × 2 with a stride equal to 2. ReLU is used as an activation function after each convolutional block. The model output is taken from certain layers covering various sizes, as shown in Figure 6, where h and w represent the height and width of the input image. Since the feature description process mainly depends on the detected features, a proper description requires the output to be taken from certain layers covering different sizes.
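The two preparation steps mentioned here, choosing a size that is a multiple of 32 and smoothing with a Gaussian kernel, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the study: rounding to the nearest multiple, the values of `sigma` and `radius`, edge replication at the borders, and all function names are hypothetical, and the cubic spline resampling itself is not shown.

```python
import math


def nearest_multiple_of_32(length):
    """Round a side length to the nearest positive multiple of 32 (assumed rule)."""
    return max(32, int(round(length / 32.0)) * 32)


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [
        math.exp(-(k * k) / (2.0 * sigma * sigma))
        for k in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image, kernel):
    """Convolve every row with the kernel, replicating edge pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    smoothed = []
    for row in image:
        new_row = []
        for c in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(c + k - radius, 0), width - 1)  # clamp at borders
                acc += w * row[idx]
            new_row.append(acc)
        smoothed.append(new_row)
    return smoothed


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_smooth(image, sigma=1.0, radius=2):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows_done = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))


print(nearest_multiple_of_32(230), nearest_multiple_of_32(500))  # 224 512
```

In practice, library routines for Gaussian filtering and spline-based resizing would normally replace these hand-written loops.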

There are many handling data managers for the utilization of CNN models such as TensorFlow. These handling tools collect the data from various layers and feed others. These data differ in size. Starting from this point, Kronecker product (⊗) is enrolled due to the utilization of outputs from different convolutional blocks 3, 4, and 5. Kronecker product is one of the most widely used tools for matrices operations depicting the operations on two different sizes matrices to provide what is known as a block matrix or tensor product matrix. A tensor object is a special type of multilinear map where its input is a certain number of matrices of different sizes generated from specific nodes within a manifold. Its output is the tensor metric, where this metric is a scalar representing the local dot product of these matrices concerning this node. This operation is an encoding process for these matrices data such as lengths and angles among them. This operation is mainly dependent on the gradient values concerning neighborhood key points. From the conceptual point of view, each node on the manifold has only one tensor metric, and any changes in the metric tensor reflect how the data encoding varies [32–34].

Let A be an m × n matrix with ij-th element a_ij for i = 1, ..., m and j = 1, ..., n, and let B be any r × q matrix. Then, A ⊗ B is a block (partitioned) matrix I of size mr × nq formed by multiplying each element a_ij by the entire matrix B as follows:

$$I = A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}$$

where I is composed of the blocks (submatrices) a₁₁B, ..., a_1nB, ..., a_m1B, ..., a_mnB. A block matrix I can be generated depending on the required division of its rows and columns. The tensor product of vector spaces A_m×n and B_r×q is itself a vector space I_mr×nq and has a dimension equal to the product of the two vector spaces' dimensions. So, the tensor product is distinguished from the direct sum of vector spaces, whose output has a dimension equal to the sum of the two vector spaces' dimensions. So, the tensor product approach is considered to be the most common approach in case of handling fewer constraints [32, 33, 35].
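The block structure of A ⊗ B defined above can be checked with a small worked sketch. The function name `kron` and the sample matrices are hypothetical, and nested lists are used to avoid dependencies; NumPy's `numpy.kron` computes the same product.

```python
def kron(a, b):
    """Kronecker product of an m x n matrix a and an r x q matrix b.

    Entry (i, j) of the mr x nq result is a[i // r][j // q] * b[i % r][j % q],
    so block (i // r, j // q) is the scalar a_ij times the whole matrix b.
    """
    m, n = len(a), len(a[0])
    r, q = len(b), len(b[0])
    return [
        [a[i // r][j // q] * b[i % r][j % q] for j in range(n * q)]
        for i in range(m * r)
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(kron(A, B))
# [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]

# With an all-ones block, every entry of A is replicated into a 2 x 2 patch.
ones = [[1, 1], [1, 1]]
print(kron(A, ones))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The second call shows why the product is convenient for bringing feature maps of different sizes onto a common grid: an all-ones block enlarges a map without changing its values.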

Since the detection operation of the proposed approach is based on different size layers, the activated feature map from convolutional blocks 3, 4, and 5 is provided for the description process. These activated feature maps include common patterns. First, the output of convolutional block number 3 is an image of size 32 × 32. A descriptor of 16 bins will be generated for each window of size 8 × 8, considering its center as a key point inside the pooling layer number 3. The output feature map, M1, from convolutional block number 3 has a length of 256, and its key points are centers with blue color as shown in Figure 7(a).

The detection process for convolutional block number 4 is carried out in a slightly different way, based on the tensor object operational point of view and the Kronecker product. Detected key points are generated for each window of size 16 × 16 inside pooling layer number 4, taking its center as a key point, shown in green in Figure 7(b). Equation (2) represents the generation of the output feature map, M2, from pooling layer number 4 using the Kronecker product as follows:

$$M_2 = P_4 \otimes T \quad (2)$$

where P₄ is the output of pooling layer number 4 and T denotes the tensor product term with 1 stride.

Detected key points are generated for each window of size 32 × 32 inside pooling layer number 5, taking its center as a key point, shown in red in Figure 7(c). Equation (3) represents the generation of the output feature map, M3, from pooling layer number 5 using the Kronecker product as follows:

$$M_3 = P_5 \otimes T \quad (3)$$

where P₅ is the output of pooling layer number 5 and T denotes the tensor product term with 1 stride. Then, feature maps M1, M2, and M3 are normalized to unit variance as follows:

$$M_i \leftarrow \frac{M_i}{\sigma(M_i)}, \quad i = 1, 2, 3 \quad (4)$$

where σ is the standard deviation of the matrix elements [5, 12].
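The unit-variance normalization can be sketched as below. The sketch assumes that σ is the population standard deviation taken over all elements of a feature map; the function name and the sample values are hypothetical.

```python
import math


def normalize_unit_variance(feature_map):
    """Divide every element by the standard deviation of all elements."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    sigma = math.sqrt(variance)
    if sigma == 0.0:
        raise ValueError("a constant feature map cannot be scaled to unit variance")
    return [[v / sigma for v in row] for row in feature_map]


def population_std(feature_map):
    """Standard deviation over all elements, used here to check the result."""
    values = [v for row in feature_map for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


M = [[1.0, 2.0], [3.0, 4.0]]       # hypothetical feature map
M_norm = normalize_unit_variance(M)
print(population_std(M_norm))      # approximately 1.0
```

Scaling each map by its own σ puts the three maps on a comparable footing before their distances are combined in the matching step.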

4.3. Feature Matching

The feature matching process is carried out in two steps: a prematching step followed by a fine matching step. For the prematching step, the Euclidean distance metric [19] is estimated between the respective feature descriptors of the detected features. The distance between two features x and y is the weighted sum of three distances, each corresponding to the descriptor output by one of the selected layers at that point, as follows:

$$d(x, y) = w\, d_1(x, y) + d_2(x, y) + d_3(x, y) \quad (5)$$

where each $d_i(x, y)$ is a Euclidean distance value and w is a weighting value. Since the output of convolutional block 3 is 256-D and the outputs of blocks 4 and 5 are 512-D each, the value w is used to compensate for this dimensional difference. The Euclidean distances $d_1$, $d_2$, and $d_3$ correspond to the outputs of the third, fourth, and fifth convolutional blocks, respectively, and can be expressed as

$$d_i(x, y) = \lVert D_i(x) - D_i(y) \rVert_2, \quad i = 1, 2, 3 \quad (6)$$

where $D_i(x)$ denotes the descriptor of point x taken from the i-th selected block. A sensed feature is matched to its counterpart in the reference image if two conditions hold: its distance is the smallest of all the distance measurements, and that distance satisfies the criterion set by a predefined threshold value (θ).
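A minimal sketch of the prematching step is given below. Several details are assumptions rather than facts from the text: the weight on the block-3 distance is taken as √2 (the square root of 512/256), the threshold is applied as a ratio between the best and second-best distances, `theta=0.8` is an arbitrary value, and all names and sample descriptors are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def combined_distance(desc_x, desc_y, weight=math.sqrt(2.0)):
    """Weighted sum of the three per-block distances.

    Each descriptor is a tuple (D1, D2, D3) from blocks 3, 4, and 5.
    The default weight of sqrt(2) is an assumption.
    """
    d1 = euclidean(desc_x[0], desc_y[0])
    d2 = euclidean(desc_x[1], desc_y[1])
    d3 = euclidean(desc_x[2], desc_y[2])
    return weight * d1 + d2 + d3


def prematch(sensed, reference, theta=0.8):
    """Return (sensed_index, reference_index) pairs.

    A sensed feature is matched to its nearest reference feature only if
    the best distance is at most theta times the second best (assumed test).
    """
    matches = []
    for i, desc_x in enumerate(sensed):
        ranked = sorted(
            (combined_distance(desc_x, desc_y), j)
            for j, desc_y in enumerate(reference)
        )
        best, best_j = ranked[0]
        if len(ranked) == 1 or best <= theta * ranked[1][0]:
            matches.append((i, best_j))
    return matches


# Tiny 2-D descriptors stand in for the 256-D and 512-D vectors.
sensed = [
    ([0.0, 0.0], [0.0, 0.0], [0.0, 0.0]),
    ([5.0, 5.0], [5.0, 5.0], [5.0, 5.0]),
]
reference = [
    ([5.0, 5.0], [5.0, 5.0], [5.1, 5.0]),
    ([0.0, 0.1], [0.0, 0.0], [0.0, 0.0]),
    ([9.0, 9.0], [9.0, 9.0], [9.0, 9.0]),
]
matches = prematch(sensed, reference)
print(matches)  # [(0, 1), (1, 0)]
```

Ambiguous features, whose best and second-best distances are close, are rejected here rather than passed on to the fine matching step.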

A patch whose center is a matched point is generated after satisfying these two conditions. Since these generated patches may be partially or completely overlapped, a fine matching step is used to overcome this issue. RANSAC approach [20] is used to coherently remove the outliers. The remaining inliers set of matched points are then utilized to generate the required transformation matrix to align the sensed features to their correspondence in the reference set. The outlier removal process enhances the transformation matrix generation and subsequently the accuracy of the registration process.
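The outlier removal idea can be illustrated with a deliberately simplified RANSAC sketch. It assumes a translation-only model, whereas the approach described above estimates a general transformation matrix; the tolerance, iteration count, seed, function names, and sample points are all hypothetical.

```python
import random


def find_inliers(matches, tx, ty, tolerance):
    """Matches whose sensed point lands near its reference point after the shift."""
    return [
        (s, r)
        for s, r in matches
        if abs(s[0] + tx - r[0]) <= tolerance and abs(s[1] + ty - r[1]) <= tolerance
    ]


def ransac_translation(matches, tolerance=2.0, iterations=100, seed=0):
    """Estimate a translation from ((xs, ys), (xr, yr)) pairs, ignoring outliers."""
    if not matches:
        raise ValueError("at least one match is required")
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        # One match is the minimal sample that fixes a translation.
        (xs, ys), (xr, yr) = rng.choice(matches)
        candidate = find_inliers(matches, xr - xs, yr - ys, tolerance)
        if len(candidate) > len(best):
            best = candidate
    # Refit on the consensus set: the mean offset of all inliers.
    tx = sum(r[0] - s[0] for s, r in best) / len(best)
    ty = sum(r[1] - s[1] for s, r in best) / len(best)
    return (tx, ty), best


points = [(0, 0), (12, 7), (30, 41), (55, 18), (70, 70), (91, 33)]
good = [((x, y), (x + 10, y - 5)) for x, y in points]   # true shift (10, -5)
bad = [((20, 20), (70, 60)), ((40, 10), (10, 17))]      # false matches
matches = good + bad

translation, inliers = ransac_translation(matches)
print(translation, len(inliers))  # (10.0, -5.0) 6
```

A homography or affine model would follow the same loop, with a larger minimal sample and a model-fitting routine in place of the single-match offset.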

5. Dataset, Result, and Discussion

5.1. Data and Computational Cost

Two SAR image sets are used; their specifications are listed in Table 1. The first image is an intensity image captured by Space Shuttle Imaging Radar (SIR-C) for Kilauea, Hawaii, dated May 1999, as shown in Figure 8. The second image is an intensity image captured by TerraSAR-X (strip map) for Aswan High Dam, Egypt, as shown in Figure 9. Computational cost is measured on a machine with an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz and 16 GB of installed RAM.

5.2. Evaluation of the Proposed Approach

There are many metrics to evaluate the performance of the registration process. MSE, PSNR, and CC are used to measure the similarity between images before and after the registration process. Although being one of the most commonly used tools for feature matching, SIFT descriptor [24] is not good for SAR images because the speckle noise affects the image quality producing many false interest points, and this impedes Laplacian of Gaussian (LoG) from working properly. Also, multiplicative noise will affect the estimations of orientation’s histogram. So, there are many modifications based on SIFT state of the art which were introduced to enhance the performance of SIFT in general for multisensor imaging. In addition, SIFT is combined with the CNN to improve the performance accuracy.
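The three similarity metrics can be computed as in the sketch below. It assumes 8-bit images with a peak value of 255 for PSNR and the Pearson form of the correlation coefficient; the function names and the sample images are hypothetical.

```python
import math


def flatten(image):
    return [float(v) for row in image for v in row]


def mse(a, b):
    """Mean Squared Error between two images of equal size."""
    fa, fb = flatten(a), flatten(b)
    return sum((x - y) ** 2 for x, y in zip(fa, fb)) / len(fa)


def psnr(a, b, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB; infinite for identical images."""
    error = mse(a, b)
    if error == 0.0:
        return float("inf")
    return 10.0 * math.log10(peak * peak / error)


def correlation_coefficient(a, b):
    """Pearson correlation coefficient between two images."""
    fa, fb = flatten(a), flatten(b)
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    var_a = sum((x - mean_a) ** 2 for x in fa)
    var_b = sum((y - mean_b) ** 2 for y in fb)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return cov / math.sqrt(var_a * var_b)


reference = [[0, 50], [100, 150]]
registered = [[10, 60], [110, 160]]   # same pattern, uniformly brighter by 10

print(mse(reference, registered))                      # 100.0
print(round(psnr(reference, registered), 2))           # 28.13
print(round(correlation_coefficient(reference, registered), 6))  # 1.0
```

The example also shows that the metrics respond differently: a uniform brightness offset raises MSE and lowers PSNR but leaves CC at 1, so the three values are best read together.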

The proposed approach is compared to the correlation approach as an ABM method, SIFT [24] as a floating-point descriptor, Oriented FAST and Rotated BRIEF (ORB) [36] as a binary descriptor, and the CNN model proposed in [12]. Since the SIFT and ORB descriptors do not belong to the same category as the proposed approach, they are used as monitoring tools to visually evaluate the performance of the proposed approach. Subset, translation, scaling, and rotation cases are included in the evaluation process. Due to the shortage of a common training dataset covering all aspects of SAR images, each sensed image related to each case of study is generated automatically.

5.3. Experimental Results

The experimental workflow to perform an evaluation of the proposed approach performance is shown in Figure 10. There is no doubt that making a dependable comparison related to a specific topic requires the existence of all the metrics used for this area during a certain period of time. But, due to the shortage of these measurement results, these results may be regenerated again by the recreation process of its source code. Performance results of the first and second data sets are shown in Tables 2 and 3, respectively, concerning the four cases of study and including the most famous metrics measurements. Figures 11 and 12 depict the output registered image using the proposed approach compared to different approaches concerning the four cases of study.

5.4. Discussion

Figures 13 and 14 represent the performance evaluation, including MSE, PSNR, and CC related to the first and second datasets, respectively. Detection of features based on the CNN model is an advantage where there is no prior knowledge about the nature of features inside the image. Results demonstrate that the proposed method can successfully recover the deformation error related to translation and scaling cases of studies for the used image pairs. However, it cannot recover the rotation issue as the correlation approach and the approach in [12] did. Also, it performs more effectively than the approach in [12]. The proposed approach uses a spline interpolation method, while the approach in [12] uses the Thin Plate Spline (TPS) interpolation technique which is more complex as it requires ground points to calibrate the results. Concerning scaling, the proposed approach outperforms the ORB descriptor, correlation approach, and approach in [12] considering MSE, PSNR, and CC.

Although the correlation method can overcome the scaling problem, it requires the sensed image to be resized first so that it has the same dimensions as the master image. This resizing alters the pixel values, so the results it reports are not reliable, and the correlation approach should not be considered for the scaling problem. In addition, a relatively large reduction in the number of matched points is not recommended, since it strongly affects the result if the remaining points are not powerful enough. This is not the case for the proposed approach, as it is able to detect and extract the most powerful key points. Using the Random Sample Consensus (RANSAC) approach as a fine-tuning step therefore enhances the removal of outliers and hence improves the accuracy of the registration process. On the other hand, the proposed method cannot recover the deformation error related to rotation, which is a common problem for all CNN models applied to SAR data. In addition, as mentioned before, any attempt to generate a SAR dataset may produce meaningless images because the imaging geometry is not real and so does not reflect real aspects.
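The outlier-removal idea behind RANSAC can be sketched as follows. This is an illustrative example only: the similarity model (scale, rotation, and translation written as `dst = a * src + b` with complex numbers), the function name `ransac_similarity`, the tolerance, and the iteration count are all assumptions and are not the settings used in the experiments.

```python
import random


def ransac_similarity(src, dst, iters=200, tol=2.0, seed=0):
    """Estimate dst ~ a * src + b and return (a, b, inlier_indices).

    Points are (x, y) tuples treated as complex numbers, so the complex
    factor `a` encodes scale and rotation and `b` encodes translation.
    """
    rng = random.Random(seed)
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    best = (1 + 0j, 0j, [])
    for _ in range(iters):
        # Two correspondences are the minimal sample for this model.
        i, j = rng.sample(range(len(p)), 2)
        if p[i] == p[j]:
            continue  # degenerate sample, skip it
        a = (q[i] - q[j]) / (p[i] - p[j])
        b = q[i] - a * p[i]
        # Matches whose residual is below `tol` pixels support the model.
        inliers = [k for k in range(len(p)) if abs(a * p[k] + b - q[k]) < tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best


src = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 2), (3, 8), (7, 7), (1, 4)]
# The first six matches follow scale 2 and shift (5, -3); the last two are outliers.
dst = [(5, -3), (25, -3), (5, 17), (25, 17), (15, 1), (11, 13), (40, 40), (-20, 9)]
a, b, inliers = ransac_similarity(src, dst)
print(abs(a), b, inliers)
```

Only the matches in `inliers` would then be used to compute the final transformation, which is why the quality of the surviving key points matters when many matches are discarded.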

Concerning the consumed processing time, the results in Table 4 show that the proposed approach outperforms the approach presented in [12].

6. Conclusion

The SAR image registration process is performed using the Visual Geometry Group (VGG-16) model. Due to the unique characteristics of SAR images, such as dynamic range, signal statistics, imaging geometry, and the complex nature of the data, some problems arise and should be considered. Feature detection is based on a slight modification to the VGG-16 model. First, a Gaussian kernel filter is used to smooth the intensity variations; the input image is then resized to a multiple of 32 and passed to the modified model for feature detection, where the outputs of predefined layers are picked up. The locations of the detected features are extracted, followed by the description process. Two matching methods are used to generate a more accurate transformation matrix: Euclidean distance and the RANSAC algorithm. Finally, a warping process is carried out to align the sensed image to the reference image.
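Two of the steps summarized above, choosing an input size that is a multiple of 32 and matching descriptors by Euclidean distance, can be sketched as follows. The function names and the round-up rule are assumptions made for illustration; the multiple-of-32 constraint is consistent with the five 2×2 pooling stages of VGG-16, which together downsample the input by a factor of 32.

```python
import math


def next_multiple_of_32(size):
    """Round a side length up to the next multiple of 32 (assumed rule)."""
    return max(32, -(-size // 32) * 32)


def euclidean(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def match_descriptors(desc_a, desc_b):
    """For each descriptor in desc_a, find its nearest neighbour in desc_b.

    Returns a list of (index_a, index_b, distance) tuples.
    """
    matches = []
    for i, da in enumerate(desc_a):
        dists = [euclidean(da, db) for db in desc_b]
        j = min(range(len(dists)), key=dists.__getitem__)
        matches.append((i, j, dists[j]))
    return matches


print(next_multiple_of_32(225))
reference_desc = [(0.0, 0.0), (1.0, 1.0)]
sensed_desc = [(1.0, 1.1), (0.0, 0.2)]
print(match_descriptors(reference_desc, sensed_desc))
```

The nearest-neighbour matches produced this way are only putative correspondences; mismatches among them are what the subsequent RANSAC step is meant to reject.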

Results show that the proposed approach performs well, providing high-level features and reasonable outputs. In addition, it provides descriptors that preserve the localization data. Because the proposed approach is able to detect and extract the most powerful key points, using the RANSAC approach as a fine-tuning step enhances the removal of outliers and hence improves the accuracy of the registration process. Although the calculations of fine matching parameters are included, it may be considered an end-to-end SAR image matching approach. As a future direction, the proposed approach is planned to be applied to real SAR datasets covering all aspects of images and combined with other CNN models to improve performance.

Data Availability

The data are available from the authors upon request: Abdelmoty M. Ahmed (e-mail: abd2005moty@yahoo.com) and Mohammed Elwan (e-mail: msafy@eaeat.edu.eg).

Conflicts of Interest

The authors declare no potential conflicts of interest.

Acknowledgments

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Research Group Project (Grant no. RGP.1/157/42).

References

[1] S. Paul and U. C. Pati, “A comprehensive review on remote sensing image registration,” International Journal of Remote Sensing, vol. 42, no. 14, pp. 5396–5432, 2021.
[2] X. Zhu, S. Montazeri, M. Ali et al., “Deep learning meets SAR: concepts, models, pitfalls, and perspectives,” IEEE Geoscience and Remote Sensing Magazine (GRSM), vol. 9, 2021.
[3] A. S. Eltanany, A. S. Amein, and M. S. Elwan, “A modified corner detector for SAR images registration,” International Journal of Engineering Research in Africa, vol. 53, pp. 123–156, 2021.
[4] A. A. Goshtasby, Theory and Applications of Image Registration, John Wiley & Sons, Hoboken, New Jersey, 2017.
[5] M. P. S. Tondewad and M. M. P. Dale, “Remote sensing image registration methodology: review and discussion,” Procedia Computer Science, vol. 171, pp. 2390–2399, 2020.
[6] V. T. Wang, Registration of Synthetic Aperture Imagery Using Feature Matching, University of Canterbury, Christchurch, 2018.
[7] A. Amein and A. H. El-Tanany, “Advanced selective key point detection techniques,” Journal of Engineering Science and Military Technologies, vol. 3, no. 2, pp. 61–69, 2019.
[8] M. Hassaballah, H. A. Alshazly, and A. A. Ali, “Analysis and evaluation of keypoint descriptors for image matching,” Recent Advances in Computer Vision, Springer, Cham, pp. 113–140, 2019.
[9] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, and B. A. Johnson, “Deep learning in remote sensing applications: a meta-analysis and review,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 152, pp. 166–177, 2019.
[10] M. Z. Alom, T. M. Taha, C. Yakopcic et al., “A state-of-the-art survey on deep learning theory and architectures,” Electronics, vol. 8, no. 3, p. 292, 2019.
[11] M. Vakalopoulou, S. Christodoulidis, M. Sahasrabudhe, and N. Paragios, “Deep learning for image matching and co-registration,” Deep Learning for the Earth Sciences: With Applications, Wiley, Hoboken, New Jersey, pp. 120–135, 2021.
[12] Z. Yang, T. Dan, and Y. Yang, “Multi-temporal remote sensing image registration using deep convolutional features,” IEEE Access, vol. 6, pp. 38544–38555, 2018.
[13] T. Georgiou, Y. Liu, W. Chen, and M. Lew, “A survey of traditional and deep learning-based feature descriptors for high dimensional data in computer vision,” International Journal of Multimedia Information Retrieval, vol. 9, no. 3, pp. 135–170, 2019.
[14] J. Ma, X. Jiang, A. Fan, J. Jiang, and J. Yan, “Image matching from handcrafted to deep features: a survey,” International Journal of Computer Vision, vol. 129, no. 1, pp. 23–79, 2021.
[15] X. Jiang, J. Ma, G. Xiao, Z. Shao, and X. Guo, “A review of multimodal image matching: methods and applications,” Information Fusion, vol. 73, pp. 22–71, 2021.
[16] J. Ma, J. Zhao, J. Jiang, H. Zhou, and X. Guo, “Locality preserving matching,” International Journal of Computer Vision, vol. 127, no. 5, pp. 512–531, 2019.
[17] J. Ma, Z. Li, K. Zhang, Z. Shao, and G. Xiao, “Robust feature matching via neighborhood manifold representation consensus,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 183, pp. 196–209, 2022.
[18] S. Wang, D. Quan, X. Liang, M. Ning, Y. Guo, and L. Jiao, “A deep learning framework for remote sensing image registration,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 145, pp. 148–164, 2018.
[19] M. Greenacre and R. Primicerio, “Measures of distance between samples: Euclidean,” Multivar. Anal. Ecol. Data, FBBVA, pp. 47–59, 2013.
[20] J. Janicka and J. Rapinski, “Outliers detection by RANSAC algorithm in the transformation of 2D coordinate frames,” Boletim de Ciências Geodésicas, vol. 20, no. 3, pp. 610–625, 2014.
[21] W. Yuan, B. Eckart, K. Kim, V. Jampani, D. Fox, and J. Kautz, “DeepGMR: learning latent Gaussian mixture models for registration,” in Proceedings of the European Conference on Computer Vision, pp. 733–750, Springer, Cham, Switzerland, August 2020.
[22] X. Zhang, W. Jian, Y. Chen, and S. Yang, “Deform-GAN: an unsupervised learning model for deformable registration,” 2020, https://arxiv.org/abs/2002.11430.
[23] Z.-Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, “Object detection with deep learning: a review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
[24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[25] F. Ye, Y. Su, H. Xiao, X. Zhao, and W. Min, “Remote sensing image registration using convolutional neural network features,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 2, pp. 232–236, 2018.
[26] S. Chen and H. Wang, “SAR target recognition based on deep learning,” in Proceedings of the 2014 International Conference on Data Science and Advanced Analytics (DSAA), pp. 541–547, Shanghai, China, November 2014.
[27] D. A. E. Morgan, “Deep convolutional neural networks for ATR from SAR imagery,” SPIE Proceedings, vol. XXII, Article ID 94750F.
[28] J. Ding, B. Chen, H. Liu, and M. Huang, “Convolutional neural network with data augmentation for SAR target recognition,” IEEE Geoscience and Remote Sensing Letters, vol. 13, pp. 1–5, 2016.
[29] K. Du, Y. Deng, R. Wang, T. Zhao, and N. Li, “SAR ATR based on displacement- and rotation-insensitive CNN,” Remote Sensing Letters, vol. 7, no. 9, pp. 895–904, 2016.
[30] C. C. Aggarwal, “Convolutional neural networks,” Neural Networks and Deep Learning, Springer, Cham, pp. 315–371, 2018.
[31] K. Kuppala, S. Banda, and T. R. Barige, “An overview of deep learning methods for image registration with focus on feature-based approaches,” International Journal of Image and Data Fusion, vol. 11, no. 2, pp. 113–135, 2020.
[32] B. K. Moser, “Linear algebra and related introductory topics,” Linear Models: A Mean Model Approach, Elsevier, Amsterdam, Netherlands, pp. 1–22, 1996.
[33] Encyclopedia of Mathematics, “Tensor product,” 2020, https://encyclopediaofmath.org/index.php?title=Tensor_product.
[34] Wolfram MathWorld, “Vector space tensor product,” 2022, https://mathworld.wolfram.com/VectorSpaceTensorProduct.html.
[35] K. K. Wu, Y. Yam, H. Meng, and M. Mesbahi, “Kronecker product approximation with multiple factor matrices via the tensor product algorithm,” in Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 004277–004282, Budapest, Hungary, October 2016.
[36] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “ORB: an efficient alternative to SIFT or SURF,” in Proceedings of the 2011 International Conference on Computer Vision, pp. 2564–2571, Barcelona, Spain, November 2011.

Copyright

Copyright © 2022 Mohammed Elwan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
