Abstract

One of the most challenging aspects of satellite remote sensing is image fusion. Image fusion improves the visual interpretability of an image and has many applications, such as monitoring water bodies, land cover, urbanisation, agriculture, national defence, and so forth. Remote sensing applications require images with high spatial and spectral resolution to accurately process and distinguish land cover classes with fine texture and shape details. Due to technical limitations, most satellites cannot capture high-resolution multi-spectral images but can acquire high-resolution panchromatic images and low-resolution multi-spectral images separately. This article proposes a novel fusion method, geospatial data fusion (GDF), to obtain high-resolution multi-spectral images. GDF, along with three well-known fusion methods, viz., the Brovey transform (BT), wavelet transform (WT), and Fourier transform (FT), has been comparatively implemented to fuse the Cartosat-2 and Sentinel-2 imageries of the Sangam area of Prayagraj, India. The fusion has been done to extract the earth’s surface features from the fused imagery; in this research, the fused image is utilised for river water mapping. Results confirm that GDF outperforms the other fusion methods and successfully maps the river water in the study area.

1. Introduction

Remote sensing is the procedure of gathering data about an area’s physical characteristics without coming into direct contact with it. It is accomplished by sensing, collecting, and processing the radiation reflected or emitted by those physical characteristics. Satellites gather details of the Earth’s surface via remote sensing and store them in images, which are called satellite images [1]. The steps of remote sensing are represented in Figure 1.

A satellite can collect images in multiple bands. Image registration is the process of aligning different images of the same scene, and it is the primary requirement for image fusion. Image fusion is the procedure of merging the information of multiple images from one or more sources into a single image that carries more feature information than any individual input. Image fusion methods are mainly classified as pixel level, feature level, and decision level [2]. Features are particular types of information obtained from an image; e.g., in satellite images, one can detect water bodies, buildings, urban areas, agricultural areas, bridges, and road networks. Identification of this particular type of information in images is called feature extraction. There are many techniques for feature extraction, e.g., segmentation-based, object-based, region-based, and machine learning-based.

1.1. Motivation

Most satellites do not directly acquire high-resolution multispectral images, because two critical practical constraints must be overcome to fulfil this requirement. The first constraint concerns the radiation entering the satellite sensor; the second is the amount of data accumulated by the satellite sensor. The multispectral bands cover narrower spectral ranges, whereas the panchromatic (PAN) band covers a broader wavelength range. A multispectral detector must therefore be larger than a panchromatic detector in order to gather an equivalent amount of energy. As a result, the panchromatic images from the same satellite can have a higher resolution than its multispectral images.

A high-resolution multispectral image carries considerably more information than a high-resolution panchromatic image or a low-resolution multispectral image [2, 3]. Also, one can fuse images from different satellites. As fused images have more information, features can be selected more accurately. This research addresses the fusion of Cartosat-2 and Sentinel-2 images using GDF and some well-known fusion methods for comparative assessment. Further, features are extracted using the segmentation-based Otsu threshold method from both the fused images and the original (input) images, and the two results are compared.

1.2. Image Fusion Methods

The flowchart in Figure 2 gives an overview of how image fusion works. Some well-known satellite image fusion methods are discussed as follows, and some of them are also utilised in this research work.

1.2.1. Brovey Transformation (BT)

The BT is a straightforward image fusion technique. It preserves each pixel’s individual spectral distribution while replacing the overall brightness of the image with the high-resolution PAN image [4]. It follows colour-normalised spectral sharpening and works on any number of bands. The Brovey transform normalises the multispectral bands and multiplies the resulting channels by the intensity or brightness channel. The well-known spherical coordinate conversion of the RGB image is expressed in Figure 3; writing the components $(R, G, B)$ generically as $(x_1, x_2, x_3)$, the forward transformation is given by Equations (1)–(3):

$I = \sqrt{x_1^2 + x_2^2 + x_3^2}$  (1)

$\phi_1 = \arctan\left(\sqrt{x_2^2 + x_3^2}\,/\,x_1\right)$  (2)

$\phi_2 = \arctan\left(x_3 / x_2\right)$  (3)

and the reverse transformation is:

$x_1 = I \cos\phi_1$  (4)

$x_2 = I \sin\phi_1 \cos\phi_2$  (5)

$x_3 = I \sin\phi_1 \sin\phi_2$  (6)

The PAN sharpening algorithm precisely replaces the intensity $I$ with the rearranged intensity $I_{adj}$, expressed as follows:

$I_{adj} = PAN$  (7)

where $PAN$ denotes the panchromatic image. For the fusion of an m-band image, m − 1 angles are evaluated for each pixel. From Figure 3, the related angles can be described as shown in Equations (8) and (9):

$\phi_1 = \arctan\left(\sqrt{G^2 + B^2}\,/\,R\right)$  (8)

$\phi_2 = \arctan\left(B / G\right)$  (9)

If Equations (8) and (9) are applied in Equations (4)–(6), taking $I = \sqrt{R^2 + G^2 + B^2}$ and substituting $I_{adj}$ for $I$, then Equations (10)–(12) are formed as follows:

$R_{new} = \dfrac{PAN}{I}\,R$  (10)

$G_{new} = \dfrac{PAN}{I}\,G$  (11)

$B_{new} = \dfrac{PAN}{I}\,B$  (12)

This can also be written in matrix form, as shown in Equation (13):

$\begin{bmatrix} R_{new} \\ G_{new} \\ B_{new} \end{bmatrix} = \dfrac{PAN}{I} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$  (13)
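As an illustration of Equation (13), a minimal Python/NumPy sketch of this Brovey-style fusion is given below, assuming the PAN and RGB bands are already coregistered, resampled to the same size, and scaled to [0, 1]; the small epsilon guarding the division is our addition, not part of the transform:

```python
import numpy as np

def brovey_fuse(r, g, b, pan, eps=1e-12):
    """Brovey-style fusion per Equation (13): scale each band by PAN / I."""
    # Radial intensity of the spherical coordinate conversion, Equation (1).
    intensity = np.sqrt(r**2 + g**2 + b**2)
    # Ratio PAN / I; eps avoids dividing by zero where R = G = B = 0,
    # the very shortcoming that GDF (Section 4.1) is designed to remove.
    ratio = pan / (intensity + eps)
    return r * ratio, g * ratio, b * ratio
```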

1.2.2. Fourier Transformation

It is a feature-level image fusion technique. Let the low-resolution multispectral image be $m(x, y)$ and the high-resolution panchromatic image be $p(x, y)$. Performing the Fourier transform (FT) on the low-resolution and high-resolution images yields $M(u, v)$ and $P(u, v)$, respectively, in the spatial-frequency domain. Spectral features lie in the lower frequency bands, while spatial features lie in the higher frequency bands. Thus, a low-pass filter is applied to the FT of the low-resolution image and a high-pass filter to the FT of the high-resolution image; the filtered spectra are fused, and the inverse FT is performed to get the spatial-domain fused image, as expressed in Equation (14):

$F(u, v) = L(u, v)\,M(u, v) + H(u, v)\,P(u, v)$  (14)

where $F(u, v)$ is the fused image in the frequency domain, $L(u, v)$ is the low-pass filter, and $H(u, v)$ is the high-pass filter.
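A sketch of Equation (14) with NumPy’s FFT follows, assuming both inputs are coregistered single-band arrays of the same size; the Gaussian cutoff used to build the complementary low-/high-pass filters is an illustrative choice:

```python
import numpy as np

def fourier_fuse(ms_band, pan, cutoff=0.1):
    """Frequency-domain fusion per Equation (14): F = L*M + H*P, then inverse FFT."""
    rows, cols = ms_band.shape
    # Normalised frequency grid centred at zero frequency.
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(u**2 + v**2)
    low_pass = np.exp(-((radius / cutoff) ** 2))  # L(u, v): keeps spectral content
    high_pass = 1.0 - low_pass                    # H(u, v): keeps spatial detail
    M = np.fft.fft2(ms_band)   # FT of the low-resolution multispectral band
    P = np.fft.fft2(pan)       # FT of the high-resolution panchromatic image
    fused_freq = low_pass * M + high_pass * P     # Equation (14)
    return np.real(np.fft.ifft2(fused_freq))      # back to the spatial domain
```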

1.2.3. Wavelet Transformation

It is a multiresolution analysis based on the discrete wavelet transform (WT). This method performs the discrete WT on both the panchromatic and multispectral images. The output is the corresponding decomposition into the HH, HL, LH, and LL subbands for each image, containing the diagonal detail, vertical detail, horizontal detail, and approximation coefficients, respectively. After that, the average of the corresponding coefficients of the panchromatic and multispectral images is computed; for example, the HH coefficients of the panchromatic and multispectral images are averaged. This yields the averaged coefficients $HH_{avg}$, $HL_{avg}$, $LH_{avg}$, and $LL_{avg}$. The inverse WT is then computed from these averaged coefficients to get the fused image.
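A sketch of this subband-averaging scheme with PyWavelets is given below, assuming coregistered single-band arrays of the same size; the 'haar' wavelet and the single decomposition level are illustrative choices:

```python
import pywt

def wavelet_fuse(ms_band, pan, wavelet="haar"):
    """Average corresponding DWT subbands of the MS and PAN images,
    then invert the transform to obtain the fused band."""
    # dwt2 returns (LL, (LH, HL, HH)): the approximation plus the
    # horizontal, vertical, and diagonal detail coefficients.
    ll_m, (lh_m, hl_m, hh_m) = pywt.dwt2(ms_band, wavelet)
    ll_p, (lh_p, hl_p, hh_p) = pywt.dwt2(pan, wavelet)
    averaged = (
        (ll_m + ll_p) / 2.0,
        ((lh_m + lh_p) / 2.0, (hl_m + hl_p) / 2.0, (hh_m + hh_p) / 2.0),
    )
    return pywt.idwt2(averaged, wavelet)  # inverse DWT gives the fused band
```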

1.3. Evaluation Metrics

Evaluation metrics must be used to check how image fusion methods perform, and many such metrics exist. Evaluation can be done with or without a reference image [5]. These metrics are based on mathematical models. Root-mean-squared error (RMSE), correlation coefficient (CC), and peak signal-to-noise ratio (PSNR) are a few of the well-known metrics employed for assessment [5]; they use the multispectral image as the reference image, for which the Sentinel-2 multispectral images are well suited. Entropy and standard deviation (SD) are other metrics that do not require a reference image. In the following evaluation metrics, the reference image is denoted as $I_r$ and the fused image as $I_f$.

1.3.1. Root-Mean-Squared Error (RMSE)

RMSE is an excellent metric to evaluate the deviation between the fused and reference images; it measures the difference in their pixel values. RMSE approaches zero as the fused image approaches the reference image [5]. For a fused image having M rows and N columns, with respect to a reference image of the same resolution, RMSE can be calculated using Equation (15) as follows:

$RMSE = \sqrt{\dfrac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_r(i, j) - I_f(i, j)\right)^2}$  (15)

1.3.2. Correlation Coefficient (CC)

The CC indicates the commonality of spectral features between images. CC is close to +1 if the fused image is similar to the reference image [5]. CC is computed with the formula expressed in Equation (16) as follows:

$CC = \dfrac{C_{rf}}{\sqrt{C_r\,C_f}}$  (16)

where $C_{rf}$ is the covariance between the pixels of the reference and fused images, and $C_r$ and $C_f$ are the variances of the reference image pixels and the fused image pixels, respectively.

1.3.3. Standard Deviation (SD)

SD measures the contrast in a fused image; the higher the SD, the better the contrast [5]. Equation (17) is used to compute SD for an image having I as the total number of pixels:

$SD = \sqrt{\dfrac{1}{I}\sum_{i=1}^{I}\left(x_i - \mu\right)^2}$  (17)

where $x_i$ is the value of the i-th pixel and $\mu$ is the mean pixel value.

1.3.4. Entropy

Entropy measures the detail content of an image; the larger the entropy value, the more detail the image contains [5]. For an image having L grey levels, entropy is computed through the formula expressed in Equation (18):

$E = -\sum_{l=0}^{L-1} p(l)\,\log_2 p(l)$  (18)

where $p(l)$ is the probability of grey level $l$.

1.3.5. Peak Signal-to-Noise Ratio (PSNR)

PSNR is the ratio of the square of the peak grey level to the mean squared error between the fused and reference images. Good fusion needs a higher value of PSNR [5]. For a fused image having M rows and N columns, with respect to a reference image of the same resolution, PSNR can be calculated using Equation (19):

$PSNR = 10\,\log_{10}\left(\dfrac{L_{max}^2}{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(I_r(i, j) - I_f(i, j)\right)^2}\right)$  (19)

where $L_{max}$ is the maximum grey level.
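The five metrics of Sections 1.3.1–1.3.5 can be sketched in a few lines of NumPy, assuming the reference and fused images are same-sized arrays with grey levels normalised to [0, 1] (so the peak value in the PSNR is 1):

```python
import numpy as np

def rmse(ref, fused):                      # Equation (15)
    return np.sqrt(np.mean((ref - fused) ** 2))

def cc(ref, fused):                        # Equation (16)
    return np.corrcoef(ref.ravel(), fused.ravel())[0, 1]

def sd(img):                               # Equation (17)
    return np.sqrt(np.mean((img - img.mean()) ** 2))

def entropy(img, levels=256):              # Equation (18)
    hist, _ = np.histogram(img, bins=levels, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                           # skip empty grey levels
    return -np.sum(p * np.log2(p))

def psnr(ref, fused, peak=1.0):            # Equation (19)
    return 10.0 * np.log10(peak**2 / np.mean((ref - fused) ** 2))
```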

1.4. Elbow Method

Another technique of interest in this research is the elbow method, which is used with the K-means algorithm to get the optimal number of clusters (K) for a dataset. In K-means, the number of clusters K is varied from 1 to N, where N is a natural number greater than 1. For each value of K, the within-cluster sum of squares (WCSS) is computed; WCSS is the sum of the squared distances between each data point and its cluster centroid. When a line graph of the WCSS value versus the K value is plotted, it looks like an elbow, as shown in Figure 4. On analysis, it can be observed that the graph changes rapidly at one point, creating the elbow shape; this point is the optimal K value for clustering. Beyond this point, increasing the number of clusters reduces the WCSS only marginally while adding the overhead of managing extra clusters. Hence, it is ideal to choose K clusters for low overhead and good accuracy. The elbow method is discussed here because it is utilised in the proposed GDF algorithm to introduce a parameter K that redefines the fusion.
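A minimal sketch of the elbow search is given below, assuming scikit-learn’s KMeans as the clustering tool; the elbow is located here as the point farthest below the chord joining the first and last WCSS samples, one common heuristic:

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_k(data, k_max=10):
    """Return the elbow of the WCSS-versus-K curve for K-means."""
    ks = np.arange(1, k_max + 1)
    # inertia_ is scikit-learn's name for the WCSS of a fitted model.
    wcss = np.array([
        KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
        for k in ks
    ])
    # Normalise both axes, then pick the point farthest below the chord
    # joining the first and last samples: that point is the elbow.
    x = (ks - ks[0]) / (ks[-1] - ks[0])
    y = (wcss - wcss.min()) / (wcss.max() - wcss.min() + 1e-12)
    chord = y[0] + x * (y[-1] - y[0])
    return int(ks[np.argmax(chord - y)])
```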

In order to produce high-resolution multi-spectral images, this article proposes a novel fusion technique called geospatial data fusion (GDF). The Cartosat-2 and Sentinel-2 imageries of the Sangam area of Prayagraj, India, have been fused using GDF and compared against three well-known fusion methods, namely BT, WT, and FT. The imageries have been fused in order to extract the features of the earth’s surface, and the fused image is employed in this study to map river water. This article is organised into six sections. Section 2 discusses the related research in this context. Section 3 presents the study area and discusses the utilised dataset. Section 4 proposes the GDF algorithm and discusses its implementation along with the other techniques. Section 5 discusses the results along with their analysis. Finally, Section 6 concludes this article.

2. Literature Survey

Numerous image fusion methods exist, divided into three categories: pixel level, feature level, and decision level [2]. Intensity-hue-saturation (IHS), BT, and principal component analysis (PCA) are the widely used fusion methods in remote sensing; they have been independently evaluated using various mathematical metrics and compared with one another. These methods preserve the spatial resolution of PAN images but distort the spectral characteristics [6]. With increased computational power, one can use multi-resolution techniques such as the wavelet and Fourier transformations for image fusion [7]. These transformation approaches work in the frequency domain and decompose images into multiple channels based on their particular frequency content.

Feature-level techniques operate at a higher processing level than pixel-level techniques [2]: feature extraction is applied first, and fusion is applied afterwards. Some feature-level methods are multi-resolution, retina-based, and learning-based fusion. The highest level of processing is done in the decision-level techniques [2], which first classify the image data and then apply fusion to the classifications. Commonly used decision-level methods are rank-based decision, Bayesian inference, and fuzzy decision rules [2].

Wang et al. [8] worked on land cover change detection. The authors used IKONOS-2, WorldView-3, and GF-1 images, along with cross-fused images generated through Gram–Schmidt adaptive image fusion. Wu et al. [9] proposed PCA- and curvelet transform-based image fusion methods, which perform better than IHS, PCA, Brovey, and the curvelet transform alone. Luo et al. [10] proposed a GAN-based satellite image fusion method and tested it on Landsat-8 and Sentinel-2 images. Ghahremani et al. [11] proposed compressive sensing-based remote sensing image fusion; the authors used Deimos-2, GeoEye-1, and QuickBird-2 satellite images as the dataset and compared their method with the IHS and Brovey methods.

Ishii et al. [12] worked on detecting and classifying buildings. The authors used MS images from the Landsat-8 satellite and applied a CNN approach for object recognition to identify buildings in images. Byun et al. [13] worked on flood area extraction and change identification; they used a cross-fusion method based on Gram–Schmidt adaptive image fusion on KOMPSAT-2 data. Akshay et al. [14] proposed the detection of unused landscapes using a CNN, trained on 1,000 images from Euro and Landsat sources; this method shows high accuracy.

Rishikeshan and Ramesh [15] proposed shoreline identification using a mathematical morphology-based algorithm. The authors used PAN images from Cartosat-2 and Cartosat-1, LISS-IV images from Resourcesat-2, and ETM+ Landsat-8 images as the dataset [16], and the fusion results obtained for the different datasets were compared. Zhou et al. [17] worked on multi-scale water body identification in urban areas, using NDWI for thresholding and SVM for classifying water body areas in urban environments. Naik and Anuradha [18] proposed a time-series analysis of the water body of the Nagarjuna Sagar reservoir, Nalgonda, Telangana, from 2014 to 2019. The authors used Landsat data and the NDVI, NDWI, MNDWI, and AWEI water indices for water body observation.

Abraham and Sasikumar [19] proposed the detection of bridges from high-resolution satellite images, using threshold algorithms to detect bridges automatically, irrespective of their shape. Researchers have also proposed water body observation from multi-sensor images using a deep learning method [20]; the technique utilised is a dense local feature compression network for water body extraction, which proves highly accurate compared with other convolutional models [21]. Otsu [22] proposed a classical threshold selection process, known as Otsu’s method, which uses the maximum between-class variance to obtain a threshold automatically.

This research aims to map river water bodies in the study area of the Sangam region at Prayagraj, India, utilising fusion methods. The article proposes a novel fusion method, GDF, and performs its comparative assessment against three well-known fusion methods, viz., the pixel-level BT and the feature-level WT and FT. This research work utilises these fusion methods to get high-resolution multispectral images from the fusion of a high-resolution PAN image and low-resolution MS images. The aforesaid methods are used to fuse the 1-m PAN image from Cartosat-2 and the 10-m multispectral image from Sentinel-2. The challenge behind the fusion is dealing with spectral mixing, which occurs due to highly similar features in the PAN image. Further, the fused images are used for feature extraction and to resolve the misclassification due to spectral mixing.

3. Study Area and Dataset

This work is conducted in the Sangam area of Prayagraj city. Prayagraj, also known as Allahabad, is in the southern part of Uttar Pradesh, India, at the confluence of the Ganges, Yamuna, and Saraswati (mythical). The geographical coordinates of the region are bounded between 23°27′19.84″N, 81°51′28.24″E and 25°23′53.69″N, 81°55′12.25″E.

The datasets from the two sensors, viz., Cartosat-2 and Sentinel-2, are summarised in Table 1 and shown in Figure 5.

4. Proposed Methodology

Figure 6 represents the workflow of the proposed method and gives a high-level view of it. The proposed methodology can be broadly summarised into two parts: image fusion and feature extraction. The two input images from Cartosat-2 and Sentinel-2 have been considered and registered to match the resolution and one-to-one correspondence of the two images. Further, the discussed image fusion methods are implemented for the fusion of the input images; the image fusion methodology and algorithm are elaborated in Section 4.1. To ensure the quality of fusion, the evaluation metrics have been computed for the assessment of the utilised fusion methods. On the basis of the evaluation metrics, the resultant fused imagery from the outperforming method has been selected for feature extraction. The modified Otsu’s method has been implemented in such a way that it extracts terrestrial features iteratively; feature extraction is discussed in detail in Section 4.2. After feature extraction, the output is analysed to determine the type of extracted feature, such as water body, urban area, bare soil, agricultural land, forest land, and so forth.

4.1. Image Fusion

This article proposes a novel method for image fusion, termed GDF, which addresses the shortcomings of existing algorithms. GDF can control the trade-off between spatial and spectral information. In Brovey and other existing transformations, colour distortion is observed when the intensity (I) is zero: when the R, G, and B values are all zero, the transformation leads to a division-by-zero error. To overcome this issue, GDF introduces a new parameter K that decides the optimal degree of transformation employed in the image fusion. The GDF algorithm has three major steps. In the first step, the intensity is computed as the average of the red, green, and blue components. In the second step, the value of K is computed using the elbow method, as discussed in Section 1.4. In the third step, the adjusted intensity is calculated, which in turn is used to compute the fused RGB components.

Instead of the division used in other existing well-known methods, the denominator is replaced by I + K(PAN − I). GDF can thus control the degree of transformation using the parameter K, whose value lies between 0 and 1; in the current research, K = 0.1. The induced expression, I + K(PAN − I), is a linear interpolation between I and PAN for 0 ≤ K ≤ 1. Through the variation of K, one can easily control the degree of transformation as per the requirements of the fusion. Algorithm 1 shows the steps of the proposed method.

1: Start GDF with inputs: R, G, B, and PAN, where R, G, B, and PAN represent the red component, the green component, the blue component, and the matrix of the panchromatic image, respectively.
2: Compute the intensity I as the average of the RGB values, as per Equation (20):
$I = \dfrac{R + G + B}{3}$  (20)
3: Calculate the parameter K’s value using the elbow method.
4: Compute the rearranged intensity $I_{adj}$ as per the proposal in the pan-sharpening algorithm, using Equation (21):
$I_{adj} = I + K\,(PAN - I)$  (21)
5: Compute the fused components using Equations (22)–(24):
$R_{new} = \dfrac{PAN}{I_{adj}}\,R$  (22)
$G_{new} = \dfrac{PAN}{I_{adj}}\,G$  (23)
$B_{new} = \dfrac{PAN}{I_{adj}}\,B$  (24)

If Equation (21) is substituted into Equations (22)–(24), then Equations (25)–(27) are obtained for computing the fused components:

$R_{new} = \dfrac{PAN}{I + K\,(PAN - I)}\,R$  (25)

$G_{new} = \dfrac{PAN}{I + K\,(PAN - I)}\,G$  (26)

$B_{new} = \dfrac{PAN}{I + K\,(PAN - I)}\,B$  (27)

The final formula for GDF in matrix form is shown in Equation (28), which can fuse images based on their RGB components:

$\begin{bmatrix} R_{new} \\ G_{new} \\ B_{new} \end{bmatrix} = \dfrac{PAN}{I + K\,(PAN - I)} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$  (28)
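A minimal sketch of GDF per Equations (25)–(28) is given below, assuming coregistered bands of the same size scaled to [0, 1]; the function and variable names are ours:

```python
import numpy as np

def gdf_fuse(r, g, b, pan, k=0.1):
    """Geospatial data fusion (GDF) per Equations (25)-(28)."""
    intensity = (r + g + b) / 3.0                # Equation (20)
    # Adjusted denominator I + K(PAN - I): for K > 0 it stays non-zero
    # wherever PAN is non-zero, even when R = G = B = 0, which is the
    # division-by-zero case that breaks the Brovey transform.
    denom = intensity + k * (pan - intensity)    # Equation (21)
    ratio = pan / denom
    return r * ratio, g * ratio, b * ratio       # Equations (25)-(27)
```

Note that K = 0 reduces the ratio to PAN/I, a Brovey-style transformation, while K = 1 returns the original bands unchanged; this is how K controls the degree of transformation.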

The optimal transformation parameter K is used in conjunction with the PAN matrix values to obtain the new fusion parameters. The value of K is varied from 0 to 1 in increments of 0.1, i.e., 0, 0.1, 0.2, …, 0.9, 1. Applying these different K values in Equations (25)–(27), numerous fused images can be obtained. If the multispectral image is taken as the reference image and the RMSE against the fused image is calculated for each K value, then the graph of RMSE versus K takes the shape of an elbow: the curve changes rapidly at one point, after which it runs nearly parallel to the x-axis. This point is the optimal K value.
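Using the gdf_fuse sketch above, the K sweep and elbow pick can be written as follows; the reference multispectral bands are assumed to be stacked as an H × W × 3 array, and the chord heuristic mirrors the one in Section 1.4:

```python
import numpy as np

def select_k(r, g, b, pan, ref_rgb):
    """Sweep K from 0 to 1 in steps of 0.1 and return the elbow of the
    RMSE-versus-K curve, taking the MS image as the reference."""
    ks = np.linspace(0.0, 1.0, 11)
    errors = np.array([
        np.sqrt(np.mean((ref_rgb - np.dstack(gdf_fuse(r, g, b, pan, k=k))) ** 2))
        for k in ks
    ])
    # Elbow: the K whose (K, RMSE) point lies farthest from the chord
    # between the first and last samples (RMSE axis normalised first).
    y = (errors - errors.min()) / (errors.max() - errors.min() + 1e-12)
    chord = y[0] + (ks - ks[0]) / (ks[-1] - ks[0]) * (y[-1] - y[0])
    return float(ks[np.argmax(np.abs(y - chord))])
```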

The working of GDF is summarised in the block diagram shown in Figure 7. It first coregisters the PAN band and the RGB bands of the multispectral image and resamples them to the same size. It then applies the GDF algorithm to obtain the fused RGB bands and merges them to acquire a single resultant output image.

4.2. Feature Extraction

After image fusion, a little pre-processing is required for feature extraction. It can be observed from the fused image that the Yamuna’s water is much darker than the Ganges’ water in this area, and the Ganges’ water is greenish. Feature extraction is done in this research to extract water bodies; thus, it has to extract the Ganges and Yamuna rivers. Hence, as per the fused image obtained, feature extraction tries to extract the greenish part using band combinations. The research work presents a novel formulation for pre-processing through a band combination, as expressed in Equation (29).

After trying various combinations, the band combination presented through Equation (29) gives the most prominent results, as the extracted features were clearer. Further, segmentation is required after pre-processing to extract the water bodies distinctly. For this purpose, Otsu’s threshold method has been used in this work. It uses the histogram of the image to get a threshold and uses that threshold to segment the image, as shown in Figure 8. It is an adaptive threshold method, meaning it automatically obtains a threshold according to the image’s histogram. It is based on the between-class variance and the within-class variance: according to Otsu, the optimal threshold maximises the inter-class variance and minimises the intra-class variance. It gives a great outcome when the histogram has two well-defined peaks, one for the background and the second for the foreground. The threshold calculated by Otsu’s method is determined by the class with the largest between-class variance, whether it is the foreground or the background.

Otsu thresholding uses three formulas to get the threshold: the intra-class variance is computed using Equation (30), the inter-class variance is computed using Equation (31), and the class means are calculated using the formula shown in Equation (32) as follows:

$\sigma_w^2(t) = w_b(t)\,\sigma_b^2(t) + w_f(t)\,\sigma_f^2(t)$  (30)

$\sigma_{inter}^2(t) = w_b(t)\,w_f(t)\,\left(\mu_b(t) - \mu_f(t)\right)^2$  (31)

$\mu_b(t) = \dfrac{\sum_{i=0}^{t-1} i\,p(i)}{w_b(t)}, \qquad \mu_f(t) = \dfrac{\sum_{i=t}^{L-1} i\,p(i)}{w_f(t)}$  (32)

where the weight, mean, and variance are represented by $w$, $\mu$, and $\sigma^2$, respectively, and the background and foreground classes are differentiated as b and f; $p(i)$ is the probability of grey level $i$ and $t$ is the candidate threshold.
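A compact NumPy sketch of Otsu’s exhaustive search follows, maximising the between-class variance of Equation (31) over all candidate thresholds; the 256-level histogram assumes an 8-bit input image:

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Return the grey level t maximising the between-class variance, Equation (31)."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()                        # grey-level probabilities
    grey = np.arange(levels)
    best_t, best_var = 0, 0.0
    for t in range(1, levels):
        w_b, w_f = p[:t].sum(), p[t:].sum()      # class weights
        if w_b == 0.0 or w_f == 0.0:
            continue                             # one class is empty
        mu_b = (grey[:t] * p[:t]).sum() / w_b    # background mean, Equation (32)
        mu_f = (grey[t:] * p[t:]).sum() / w_f    # foreground mean, Equation (32)
        var_inter = w_b * w_f * (mu_b - mu_f) ** 2   # Equation (31)
        if var_inter > best_var:
            best_t, best_var = t, var_inter
    return best_t

# Usage: pixels above the threshold form the foreground (water) mask.
# mask = img > otsu_threshold(img)
```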

5. Results and Discussion

The proposed fusion method, GDF, followed by the extraction technique, has been implemented in MATLAB. Four fusion methods, namely GDF, BT, WT, and FT, have been applied to the dataset for comparative assessment. The fused imageries obtained are shown in Figure 9.

From the resultant fused images, one can easily observe that the Ganges and the Yamuna can be distinguished in all the fused methods because of the water colour; the Yamuna’s water is much darker than the Ganges’ water in this area. The fused images of WT and FT appear more like the multispectral image than the panchromatic image, and edge artefacts can be noticed in the WT-fused image. It can be seen that the GDF and BT fusion methods perform well, but GDF performs better than BT: there is a colour distortion in BT at the Yamuna River, which is not the case with GDF. For a better comparative analysis, evaluation metrics have been computed for all four fusion methods.

Table 2 shows the comparison on the basis of the RMSE, PSNR, and SD values. The RMSE estimated for GDF, BT, WT, and FT is 0.0184, 0.0231, 0.0231, and 0.0228, respectively. The PSNR estimated for GDF, BT, WT, and FT is 46.698, 51.5301, 49.376, and 63.6604, respectively. The SD estimated for GDF, BT, WT, and FT is 0.2622, 0.2624, 0.2625, and 0.2623, respectively. These statistics show that the proposed methodology, GDF, gives the minimal value of RMSE, the metric directly denoting error. Hence, GDF gives better results as compared to the existing methods.

Table 3 shows the comparison of FT, WT, BT, and GDF on the basis of the CC values for the red, green, and blue bands. GDF shows the highest CC value of 0.8762 for the red band. For the green band, BT shows the highest CC value of 0.8702; nonetheless, GDF has the second-best CC value of 0.8650. GDF again gives the best CC value of 0.8792 for the blue band. From the CC values, it is clear that GDF shows the maximum correlation for all the bands, even though it has only the second-best value for the green band. Hence, in terms of CC values, GDF correlates better than the compared methods for the red, green, and blue bands.

Table 4 compares the entropy for the FT, WT, BT, and GDF methods. As discussed, a higher entropy value denotes greater detail content in the image. The entropy for the red and blue bands is highest for the GDF method, with values of 7.9088 and 7.7957, respectively. However, the entropy for the green band is highest for the BT method, at 7.8761; here too, GDF shows the second highest entropy value of 7.8627, which is comparable. Overall, the entropy values for all three bands establish that GDF is more suitable for fusion than the existing methods.

After fusion, feature extraction has been done; the pre-processing and segmentation steps of feature extraction are performed on the concerned images, with Otsu’s thresholding method used for the extraction. Since the study area is rich in water bodies, extracting water features is of prime interest. For this purpose, the panchromatic imagery from Cartosat-2 and the fused imagery obtained from the GDF method have been used to extract the water features: Otsu segmentation is applied both to the original (pre-fusion) images and to the fused images. The results of feature extraction are shown in Figure 10. The water features are clearly extracted in the GDF-fused imagery, whereas the same method is unable to extract the water features accurately from the PAN imagery of Cartosat-2; the difference between the two outputs is significant.

From the resultant images, it can also be observed that water feature extraction is not accurate for the unfused image, where water gets mixed with similar features like vegetation and shadows: without fusion, Otsu’s method extracts water bodies together with dark areas like trees and shadows, which have a similar spectral response. After image fusion, however, Otsu segmentation extracts only the water bodies of the Yamuna and the Ganges, along with traces of the bridges over them. Hence, the water features have been extracted properly, and the river streams of the Ganges and the Yamuna are mapped accurately without any mixing with other similar features. From this, one can infer that image fusion can also rectify spectral mixing issues. These results underline the significance of image fusion.

6. Conclusions and Future Work

This article proposed a novel method for fusion, termed GDF. The high-resolution panchromatic imagery from Cartosat-2 and the multi-spectral (only red, green, and blue) imagery from Sentinel-2 have been used for the fusion. The proposed method has been implemented to fuse the two imageries, and a comparative assessment has been done with three existing methods, viz., BT, WT, and FT. These four methods have been compared on the basis of the RMSE, PSNR, SD, CC, and entropy values. The results obtained for GDF give the minimal RMSE, the metric denoting error. GDF correlates better than the compared methods BT, WT, and FT for the red, green, and blue bands, and its entropy values exceed those of the existing methods, which accounts for image clarity. In summary, the comparative statistics affirm that the GDF method gives better results than the existing methods. After image fusion, Otsu’s method is applied for water feature extraction from the fused imagery obtained through the GDF method and from the panchromatic imagery of Cartosat-2. From the resultant images, it is observed that after image fusion, Otsu segmentation extracts only the water bodies of the Yamuna and the Ganges; without fusion, however, Otsu’s method applied to the raw image extracts water bodies together with dark areas like trees and shadows, which have a similar spectral response. Also, in the feature extraction results, bridges get extracted over the water area, though only as traces. The results obtained from image fusion followed by Otsu’s segmentation are clearer, sharper, and specific to the extracted features. Hence, it can be deduced from the results that image fusion also addresses the spectral mixing issue. In future work, bridges can be detected more accurately over the water after better processing.

Data Availability

The data that support the findings of this study are openly available at https://earthexplorer.usgs.gov/.

All authors have transferred their rights to publish research findings.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are thankful to the “Ministry of Education,” the Indian Institute of Information Technology Allahabad, Prayagraj, Uttar Pradesh, India, and Galgotias University, Greater Noida, Uttar Pradesh, India.