Abstract
The traditional methods for multi-focus image fusion, such as the typical multi-scale geometric analysis theory-based methods, are usually restricted by the sparse representation ability and by how efficiently the captured features are transferred through the fusion rules. Aiming to integrate the partially focused images into a fully focused image of high quality, a complex shearlet features-motivated generative adversarial network is constructed for multi-focus image fusion in this paper. Different from the popularly used wavelet, contourlet, and shearlet, the complex shearlet provides more flexible multiple scales, anisotropy, and directional sub-bands with approximate shift invariance. Therefore, the features in the complex shearlet domain are more effective. With the help of the generative adversarial network, the whole procedure of multi-focus fusion is modeled as a process of adversarial learning. Finally, several experiments are implemented, and the results prove that the proposed method outperforms the popularly used fusion algorithms in terms of four typical objective metrics and visual comparison.
1. Introduction
The sharpness of the captured targets varies with their distance from the focal plane during the imaging procedure, that is, the closer an object is to the focal plane, the clearer it appears in the image. On the other hand, it is difficult to obtain a fully focused image in a single shot with only one imaging device [1]. A common way to deal with this problem is to fuse multiple images of the same scene, captured with different focal settings, into a single image, which is called multi-focus image fusion and has been widely used in military monitoring, image analysis, and transportation [2]. For example, in modern warfare, multi-focus images can be used to monitor important targets and facilities of the enemy, and in the transportation domain, they can be used to track logistics and vehicle information and even to penalize traffic violations.
Nowadays, there are mainly four kinds of strategies for the fusion of multi-focus images: the spatial domain methods, the early transform domain methods, the multi-scale geometric analysis theory-based methods, and the deep learning theory-based methods. The spatial domain methods usually operate directly on the image pixels, for example, the averaging, maximum selection, and weighted methods. The early transform domain methods include the Laplacian pyramid-based method, the wavelet-based method, etc.
In these methods, the multi-focus images are decomposed into different scales, and each scale has a limited number of sub-bands. Then, the features at different levels can be obtained for fusion. For example, in reference [3], the authors proposed a fusion method using the extremum of the wavelet coefficients in different sub-bands. Dou et al. [4] proposed a fusion method using the region energy of different high-pass sub-band coefficients by considering their distributions. Due to the limited number of high-pass sub-bands, block artifacts may appear along edges in these methods. To deal with these problems, multi-scale geometric analysis theory-based methods have been popularly reported in recent years. The curvelet transform, contourlet transform, non-subsampled contourlet transform (NSCT), and the shearlet are the typical decomposition tools of this period. For example, Li and Yang [5] proposed a fusion method combining the wavelet and the curvelet to overcome the disadvantages of the wavelet. He et al. [6] proposed a multi-focus image fusion method based on the improved contourlet packet. Qu et al. [7] proposed a spatial frequency-motivated PCNN model in the NSCT domain, where the spatial frequency is used to drive the firing mapping. Liao et al. [8] proposed a shearlet-based fusion method employing the statistical information of the shearlet coefficients. Considering the fusion procedure of the aforementioned methods, it is obvious that the fusion results are largely determined by the representation ability of the decomposition.
From the viewpoint of fusion rules, the fusion procedure of the multi-scale geometric analysis theory-based methods can be modeled as a classification problem on the multi-scale transformation coefficients. There are also three typical categories of fusion rules: the activity-level metric-based rule, the kernel learning-based rule, and the neural network-based rule. For the first, algebraic operations, such as averaging and maximum selection, are popularly used. The second includes ICA, SVM, and PCA. In literature [9], principal component analysis (PCA) is applied during dictionary training to reduce the dimension of the transform coefficients. In literature [10], the cartoon components and the texture components are combined by ICA. Artifacts are easily produced when the classification is decided by simple computations on single coefficients. The neural network-based rule has been popularly reported in recent years, and some good results have been obtained. For example, in literature [11], the PCNN model is used as the fusion rule in combination with the NSCT. Though good results have been obtained, these models are not abstract enough, which means the features are all low-level. So, more advanced neural network models should be developed.
Compared with the traditional neural networks, the deep neural network is the breakthrough in this domain and has been widely applied in image denoising, image recognition and classification, and image fusion [12]. For example, a novel fusion method for multi-focus images was proposed based on a support value-motivated deep convolutional neural network model in literature [13]; a general multi-focus image fusion framework, called IF-CNN, was developed based on a deep convolutional neural network in literature [14]. MFF-GAN and Pan-GAN, built on the mechanism of the unsupervised generative adversarial network, are proposed in literature [15] and [16], respectively, and a detail-preserving adversarial learning model is proposed in literature [17, 18]. In particular, according to some recent references, good results are consistently obtained by the GAN-based methods. The main reason lies in their unique characteristics: firstly, the GAN model has a more complex and deeper network structure than the commonly used neural networks; secondly, modeling the fusion process as adversarial learning is more in line with the general principle of how humans understand the world. However, the common characteristic of these well-known methods is that they are developed directly at the pixel level, so the important image features are not carefully exploited.
In order to overcome the shortcomings of the above methods, a multi-focus image fusion method based on the GAN in the complex shearlet domain is developed. Different from the traditional transformation methods, such as the curvelet and the contourlet, the complex shearlet divides the source images into low-pass and high-pass directional sub-bands with approximate shift invariance, providing more useful features. Besides, the computational efficiency of the complex shearlet is higher than that of the NSCT for achieving the same shift invariance. With the help of the GAN, the whole fusion process can be modeled as adversarial learning over the features in the complex shearlet domain. Therefore, better fusion results can be obtained at the feature level.
The rest of this paper is organized as follows. The details of the whole method are given in Section 2. Experimental results and some important discussions are given in Section 3. Finally, the paper is concluded in Section 4.
2. Methodology
Figure 1 shows the structure of the proposed method. Firstly, the images to be fused are input into the GAN, and at the same time, the complex shearlet transform is applied to obtain their high-pass sub-bands. Then, the features in the complex shearlet domain are computed to produce the new form of the loss function. The loss function is updated to drive the training of the GAN, and the final fusion result is obtained after the training is finished.
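A high-level sketch of this training procedure is given below. It is only illustrative: build_target_image() stands for the complex-shearlet-feature-guided target construction detailed in Section 2.3.2, the least-squares adversarial losses shown here are a simplified stand-in for the full losses of Section 2.3.2, and all names and hyper-parameter values are assumptions rather than the exact settings used in the experiments.

```python
# Illustrative training loop; build_target_image() is a hypothetical helper
# (Section 2.3.2), and the loss terms are simplified placeholders.
import torch
import torch.nn.functional as F

def train(G, D, loader, epochs=10, lr=1e-4, content_weight=100.0):
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    for _ in range(epochs):
        for img_a, img_b in loader:                       # pairs of partially focused images
            target = build_target_image(img_a, img_b)     # feature-guided target (Section 2.3.2)
            fused = G(torch.cat([img_a, img_b], dim=1))   # generator sees both sources

            # Discriminator update: target image labeled 1, generated image labeled 0.
            d_loss = ((D(target) - 1) ** 2).mean() + (D(fused.detach()) ** 2).mean()
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()

            # Generator update: fool the discriminator and stay close to the target content.
            g_loss = ((D(fused) - 1) ** 2).mean() + content_weight * F.l1_loss(fused, target)
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```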

2.1. The Complex Shearlet Transform
As one of the most famous multi-scale geometric transformation tools, the complex shearlet transform can extract directional information at different scales and deliver highly sparse approximations of 2D signals. Generally speaking, it divides the source image into low-pass and high-pass sub-band images at different levels, i.e., an approximately sparse representation of the source image and its salient feature information. Different from the discrete wavelet, contourlet, and shearlet, the complex shearlet is realized based on multi-scale pyramid filters and the Hilbert transform [19, 20]. The former gives the multiple partitions of the image, and the latter provides the directional sub-bands in the complex space. Figure 2 gives an example of the complex shearlet transform on a "Clock" image focused on the left.
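The role of the Hilbert transform can be illustrated with a rough numerical sketch. This is not the complex shearlet implementation of [19, 20]; it only assumes that a real directional high-pass sub-band is available from some real shearlet or pyramid decomposition and shows how pairing it with its Hilbert transform yields a complex-valued sub-band whose magnitude is far less sensitive to small shifts than the raw oscillating coefficients.

```python
# Rough illustration only, not the transform of [19, 20]: form the analytic
# (complex) counterpart of a real directional sub-band via the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

def to_complex_subband(real_subband: np.ndarray, axis: int = 1) -> np.ndarray:
    """Analytic (complex) counterpart of a real directional sub-band."""
    return hilbert(real_subband, axis=axis)      # real part + i * Hilbert transform

def subband_feature(real_subband: np.ndarray) -> np.ndarray:
    """Magnitude of the complex sub-band, used as a shift-robust feature map."""
    return np.abs(to_complex_subband(real_subband))
```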

2.2. The Feature in the High-Pass Sub-Bands
After the complex shearlet transform, the coefficients with large absolute values correspond to sharp brightness changes or salient features, meaning that they belong to the focused regions of the source images. Since the aim is to bring the focused regions into the fused image, these regions must first be extracted by using the complex shearlet coefficients.
On the other hand, the features in the multi-focus images can be uniformly described by activity-level measurements, such as the local energy, standard deviation, and spatial frequency [21, 22]. For the above reasons, the local energy and the spatial frequency are used to represent the important features in the high-pass coefficients. Furthermore, different from the common form used in other literature, they are computed over multiple scales and directions.
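A minimal sketch of these two activity-level measures is given below. It assumes each high-pass sub-band is a 2D array of (complex) coefficients; the 3 × 3 sliding window is an illustrative choice rather than the value used in the paper.

```python
# Activity-level measures computed per sub-band (window size is illustrative).
import numpy as np
from scipy.ndimage import uniform_filter

def local_energy(subband: np.ndarray, size: int = 3) -> np.ndarray:
    """Windowed sum of squared coefficient magnitudes around each position."""
    return uniform_filter(np.abs(subband) ** 2, size=size) * size * size

def local_spatial_frequency(subband: np.ndarray, size: int = 3) -> np.ndarray:
    """Windowed spatial frequency: RMS of row and column first differences."""
    m = np.abs(subband).astype(np.float64)
    rf2 = uniform_filter(np.diff(m, axis=1, append=0.0) ** 2, size=size)  # row frequency^2
    cf2 = uniform_filter(np.diff(m, axis=0, append=0.0) ** 2, size=size)  # column frequency^2
    return np.sqrt(rf2 + cf2)
```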
2.3. The GAN Model
2.3.1. The Structure of the GAN
Usually, the complete structure of the GAN network consists of two parts: the generator and the discriminator [23, 24]. The detailed structure of the GAN model used in this paper is shown in Figures 3 and 4.


For the generator, five convolutional layers are used to extract features. A 5 × 5 convolution kernel is used in the first convolutional layer, and a 3 × 3 convolution kernel is used in the other four layers. The input of each layer is the concatenation of the outputs of all the previous layers, with the aim of speeding up the convergence and improving the stability of the model [25]. All the activation functions are set to be "ReLU," i.e., the rectified linear unit. Furthermore, layer normalization is also employed to preserve the contrast information of the source images. It calculates the average value of all the dimensional inputs in each layer and then implements the normalization operation. The advantages are reduced sensitivity to the data initialization and effective avoidance of the gradient vanishing problem [26].
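A hedged PyTorch sketch of such a generator is shown below: five convolutional layers with dense connections, a 5 × 5 kernel in the first layer and 3 × 3 kernels afterwards, ReLU activations, and layer normalization. The channel widths and the two-channel input (the concatenated source pair) are assumed values, not the exact configuration of the paper.

```python
# Sketch of a densely connected five-layer generator (widths are assumptions).
import torch
import torch.nn as nn

class DenseGenerator(nn.Module):
    def __init__(self, in_ch: int = 2, growth: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = in_ch
        for i in range(4):                              # four feature-extraction layers
            k = 5 if i == 0 else 3
            self.blocks.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=k, padding=k // 2),
                nn.GroupNorm(1, growth),                # num_groups=1 acts like layer normalization
                nn.ReLU(inplace=True)))
            ch += growth                                # dense connection grows the input channels
        self.out = nn.Conv2d(ch, 1, kernel_size=3, padding=1)  # fifth layer outputs the fused image

    def forward(self, x):
        feats = [x]
        for block in self.blocks:
            feats.append(block(torch.cat(feats, dim=1)))  # each layer sees all previous outputs
        return self.out(torch.cat(feats, dim=1))
```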
Different from the generator, the main purpose of the discriminator is to make a decision by classification. As shown in Figure 4, the discriminator has the same structure as a convolutional neural network with two inputs, i.e., the joint Laplacian enhanced image obtained from the source images and the fused image from the generator. Four layers of 3 × 3 filters are designed to implement the convolution and capture the feature information. Meanwhile, in order to reduce the loss of important information caused by the downsampling scheme, the activation function is set to be "ReLU." Finally, the fully connected layer is used for classification, and the sigmoid function is employed to output the final result.
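A corresponding sketch of the discriminator is given below: four 3 × 3 convolutional layers with ReLU activations and stride-2 spatial reduction, followed by a fully connected layer and a sigmoid output. The channel widths and the assumed 64 × 64 input patch size are illustrative values.

```python
# Sketch of a four-layer convolutional discriminator (sizes are assumptions).
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, in_ch: int = 1):
        super().__init__()
        chs = [in_ch, 32, 64, 128, 256]
        blocks = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            blocks += [nn.Conv2d(cin, cout, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*blocks)
        self.fc = nn.Linear(256 * 4 * 4, 1)      # assumes 64 x 64 input patches

    def forward(self, x):
        f = self.features(x)
        return torch.sigmoid(self.fc(f.flatten(1)))   # probability of being "real"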
2.3.2. The Loss Function
The loss function plays the role of minimizing the training loss to obtain the ideal model, and it usually consists of the generator loss function and the discriminator loss function, as shown in the following formula:

$$L = L_{G} + L_{D},\tag{1}$$

where $L_{G}$ and $L_{D}$ are the generator loss function and the discriminator loss function, respectively.
According to the original model, $L_{G}$ is defined by formula (2). It is computed by summing the confrontation loss and the content loss from the procedure of image generation:

$$L_{G} = L_{adv} + \lambda L_{con},\quad L_{adv} = \frac{1}{N}\sum_{n=1}^{N}\left(D\left(I_{f}^{n}\right)-c\right)^{2},\quad L_{con} = L_{int}\left(I_{f},I_{t}\right) + \xi\, L_{grad}\left(I_{f},I_{t}\right),\tag{2}$$

where $\lambda$ is a balanced weight between $L_{adv}$ and $L_{con}$, $I_{t}$ is the target image to be fused, $N$ is the number of fused images, $D(I_{f}^{n})$ is the result of classification, $c$ means that the false data is recognized as true by the discriminator, $L_{int}$ is the intensity loss, $L_{grad}$ is the gradient loss, and $\xi$ is the balanced weight.
$L_{D}$ can be expressed as

$$L_{D} = \frac{1}{N}\sum_{n=1}^{N}\left(D\left(I_{LG}^{n}\right)-a\right)^{2} + \frac{1}{N}\sum_{n=1}^{N}\left(D\left(\nabla I_{f}^{n}\right)-b\right)^{2},\tag{3}$$

where $I_{LG}$ is the joint Laplace enhanced gradient map, $\nabla I_{f}$ is the gradient map of the fused image, and $a$ and $b$ are their labels, respectively.
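A hedged PyTorch sketch of formulas (1)-(3) follows. The labels a, b, c and the weights lambda_ and xi are assumed hyper-parameter values, and gradient_map() is a simple finite-difference stand-in for the gradient maps used in formula (3).

```python
# Simplified loss sketch for formulas (1)-(3); constants are assumptions.
import torch
import torch.nn.functional as F

def gradient_map(img):
    """Finite-difference gradient magnitude (padded back to the input size)."""
    gx = (img[..., :, 1:] - img[..., :, :-1]).abs()
    gy = (img[..., 1:, :] - img[..., :-1, :]).abs()
    return F.pad(gx, (0, 1, 0, 0)) + F.pad(gy, (0, 0, 0, 1))

def generator_loss(D, fused, target, c=1.0, lambda_=1.0, xi=5.0):
    adv = ((D(fused) - c) ** 2).mean()                               # confrontation loss
    intensity = F.mse_loss(fused, target)                            # L_int
    grad = F.mse_loss(gradient_map(fused), gradient_map(target))     # L_grad
    return adv + lambda_ * (intensity + xi * grad)                   # formula (2)

def discriminator_loss(D, fused, enhanced_grad, a=1.0, b=0.0):
    real = ((D(enhanced_grad) - a) ** 2).mean()                      # joint Laplace enhanced gradient map
    fake = ((D(gradient_map(fused.detach())) - b) ** 2).mean()       # gradient map of the fused image
    return real + fake                                               # formula (3)
```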
From all the formulas above, it can be seen that the target image to be fused is very important in the confrontation learning. The common way to obtain it is to average the images to be fused or to initialize it with one of the images to be fused. The drawback is that such an image is far from the final result, so much time and many resources must be spent to reach the optimal decision during the confrontation.
Therefore, a new form of the target image is proposed. Let $C_{L}^{k}(x,y)$ be the low-pass sub-band coefficient at position $(x,y)$, where $k\in\{A,B\}$ indexes the source images. The low-pass coefficient of the fused image can be obtained by

$$C_{L}^{F}(x,y)=\begin{cases}C_{L}^{A}(x,y), & E^{A}(x,y)\geq E^{B}(x,y),\\[2pt] C_{L}^{B}(x,y), & \text{otherwise},\end{cases}\tag{4}$$

where $E^{k}(x,y)$ is the local energy computed in the neighborhood of $(x,y)$.
Let $C_{l,d}^{k}(x,y)$ be the high-pass coefficient at $(x,y)$ in the direction sub-band $d$ at level $l$ after implementing the complex shearlet transformation, $k\in\{A,B\}$; then, the high-pass coefficient of the fused image can be obtained by

$$C_{l,d}^{F}(x,y)=\begin{cases}C_{l,d}^{A}(x,y), & SF^{A}(x,y)\geq SF^{B}(x,y),\\[2pt] C_{l,d}^{B}(x,y), & \text{otherwise},\end{cases}\tag{5}$$

where $SF$ is the spatial frequency, which can be computed by the following formula:

$$SF=\sqrt{RF^{2}+CF^{2}},\quad RF=\sqrt{\frac{1}{MN}\sum_{x=1}^{M}\sum_{y=2}^{N}\left[C(x,y)-C(x,y-1)\right]^{2}},\quad CF=\sqrt{\frac{1}{MN}\sum_{x=2}^{M}\sum_{y=1}^{N}\left[C(x,y)-C(x-1,y)\right]^{2}},\tag{6}$$

where $RF$ and $CF$ are the row and column frequencies computed over an $M\times N$ window of the high-pass coefficients $C$.
Then, the target image can be obtained by applying the inverse complex shearlet transform to the fused low-pass and high-pass sub-bands.
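A minimal NumPy sketch of this target-image construction is given below. It reuses the local_energy() and local_spatial_frequency() helpers sketched in Section 2.2, and complex_shearlet() / inverse_complex_shearlet() are hypothetical wrappers assumed to return and accept a (low-pass, list-of-high-pass-sub-bands) pair.

```python
# Sketch of formulas (4) and (5); complex_shearlet()/inverse_complex_shearlet()
# are hypothetical wrappers, and the activity helpers come from Section 2.2.
import numpy as np

def build_target_image(img_a, img_b):
    low_a, highs_a = complex_shearlet(img_a)
    low_b, highs_b = complex_shearlet(img_b)
    # Formula (4): keep the low-pass coefficients with the larger local energy.
    low_f = np.where(local_energy(low_a) >= local_energy(low_b), low_a, low_b)
    # Formula (5): keep the high-pass coefficients with the larger spatial frequency.
    highs_f = [np.where(local_spatial_frequency(ha) >= local_spatial_frequency(hb), ha, hb)
               for ha, hb in zip(highs_a, highs_b)]
    return inverse_complex_shearlet(low_f, highs_f)
```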
3. Experimental Results and Analysis
The experiments are implemented to show the performance of the proposed method. The platform is an Inspur big data server NF5280M4 with an Intel Xeon CPU and 256 GB RAM. 100 pairs of multi-focus images are used for training. All the data can be downloaded from the web [27–30].
Seven typical methods, i.e., the PCNN-based method (PCNN for short) [31], the contourlet-based method (contourlet for short) [32], the GAN-based method (GAN for short) [17], the DCNN-based method (DCNN for short) [33], the discrete shearlet-based method (shearlet for short) [34], the convolutional sparse representation-based method (CSR for short) [35], and the sparse representation and sum-modified-Laplacian-based method (SR-SML for short) [36], are implemented for comparison. The decomposition level of the complex shearlet is set to four.
So far, how to evaluate the quality of fusion results is still an open question. Subjective visual comparison and objective quantitative comparison are the mainstream practice in this domain. Without loss of generality, mutual information, entropy, standard deviation (MI, En, and SD for short, respectively), and QAB/F are selected as the metrics. The greater their values, the better the fused images [37–39].
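Two of these metrics are simple enough to sketch directly; the snippet below assumes 8-bit grayscale fused images. MI and QAB/F additionally require the source images and are omitted here.

```python
# Entropy and standard deviation of a fused image (8-bit grayscale assumed).
import numpy as np

def entropy(img: np.ndarray) -> float:
    """Shannon entropy of the gray-level histogram, in bits."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def standard_deviation(img: np.ndarray) -> float:
    """Global standard deviation of the gray levels."""
    return float(np.std(img.astype(np.float64)))
```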
To save space, only "Pepsi-Cola," "Plane," "Clocks," "Flower," "Cup," and "Calendar" are shown in Figure 5. All the fusion results are shown in Figures 6–11. In Figure 7, the middle part of the "Plane" is partially enlarged to compare the local detail features.


From the above results, we can see that though the focused regions are expressed better than in the source images, the fusion results differ from each other. For the PCNN-based method, blurred edges obviously occur, so the details are not clear enough. For the contourlet, shearlet, CSR, and SR-SML methods, though the results are improved, the contours are over-sharpened and ghosting occurs. This can be explained by comparing the sparse representation ability of the transforms for the important image features.
As for the GAN-based and the DCNN-based methods, the results are much clearer, but the texture information is not as good as that obtained by the proposed method. This is because these two models are learned directly from the pixels of the images to be fused; the importance of the features in the learning procedure is not fully considered. On the other hand, the texture information in the proposed method is greatly improved and the ghosting phenomenon is suppressed to the greatest extent. Furthermore, this can also be seen in the enlarged images in Figures 7 and 11. In addition, from the objective comparison in Tables 1 and 2, the best values of the four metrics are almost always obtained by the proposed method. All the above facts fully demonstrate the effectiveness and accuracy of the proposed method.
4. Conclusion
To obtain better fusion results for multi-focus images, a features-motivated generative adversarial network is constructed with the help of the complex shearlet transform. Six typical experiments have been carefully implemented to give full evidence of its effectiveness and accuracy. In the future, more complex models will be built to further improve the fusion performance.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the National Natural Science Foundation of China (61502282 and 61902222), the Natural Science Foundation of Shandong Province (ZR2015FQ005), and the Taishan Scholars Program of Shandong Province (tsqn201909109).