A Network with Composite Loss and Parameter-free Chunking Fusion Block for Super-Resolution MR Image

Han, Qi; Hou, Mingyang; Wang, Hongyi; Qiu, Zicheng; Tian, Yuan; Tian, Sheng; Wu, Chen; Zhou, Baoping

doi:https://doi.org/10.1155/2023/4959130

Journal of Healthcare Engineering

Research Article | Open Access

Volume 2023 | Article ID 4959130 | https://doi.org/10.1155/2023/4959130

Qi Han,¹ Mingyang Hou,¹ Hongyi Wang,¹ Zicheng Qiu,¹ Yuan Tian,¹ Sheng Tian,¹ Chen Wu,¹ and Baoping Zhou²

Academic Editor: Mangal Sain

Received 05 Oct 2022

Revised 05 Mar 2023

Accepted 10 Mar 2023

Published 12 Jun 2023

Abstract

Magnetic resonance imaging (MRI) is often influenced by many factors, and single image super-resolution (SISR) based on a neural network is an effective and cost-effective alternative technique for the high-resolution restoration of low-resolution images. However, deep neural networks easily overfit, which makes the test results worse, while shallow networks are difficult to fit quickly and cannot completely learn the training samples. To solve these problems, a new end-to-end super-resolution (SR) method is proposed for magnetic resonance (MR) images. Firstly, in order to better fuse features, a parameter-free chunking fusion block (PCFB) is proposed, which divides the feature map into branches by splitting channels to obtain parameter-free attention. Secondly, the proposed training strategy, which combines perceptual loss, gradient loss, and L1 loss, significantly improves the accuracy of model fitting and prediction. Finally, the proposed model and training strategy are evaluated on the super-resolution IXISR dataset (PD, T1, and T2) and compared with existing excellent methods, obtaining advanced performance. Extensive experiments show that the proposed method performs better than the advanced methods on highly reliable metrics.

1. Introduction

MRI is a noninvasive in vivo imaging technology that uses the phenomenon of magnetic resonance to obtain molecular structure and thus information about the internal structure of the human body. MRI not only provides more information than many other imaging techniques in medical imaging, but it can also directly produce cross-sectional, sagittal, coronal, and various oblique images of the body. It does not produce the artifacts seen in CT detection, does not require contrast injection, does not involve ionizing radiation, and has fewer adverse effects on the body. MRI is very effective in detecting intracerebral hematomas, extracerebral hematomas, brain tumors, and other diseases. Of course, MRI has its shortcomings [1]: it is relatively slow, has lower spatial resolution than CT, and suffers from motion artifacts. Therefore, obtaining high-resolution MRI images has become a direction of current research.

High-resolution MRI can not only clearly show the relationship between tumor and surrounding tissues but also the anatomical structure of the brain. It has high application value in the early and middle stages of diagnosis [2].

However, the generation of high-resolution MRI images is often influenced by many factors, such as hardware equipment, imaging time, the motion of the human body, and the effect of environmental noise. Therefore, to perform high-resolution restoration of the low-resolution images obtained by MRI, image super-resolution is an effective and cost-effective technique for improving the spatial resolution of MR images. This technique makes it feasible to reconstruct low-resolution MRI images with a high signal-to-noise ratio and high resolution [3].

The traditional SR algorithms include interpolation-based and reconstruction-based methods, which generally have difficulty reconstructing the high-frequency details of the image, are more complicated to compute, and take longer to reconstruct [4]. In order to solve these problems, scholars have applied deep learning to SR reconstruction in recent years and made many breakthroughs, and SR algorithms based on deep learning now occupy the mainstream position in SR algorithm research. In the field of medical images, deep learning-based SR algorithms can obtain prior knowledge from medical image training data and use neural networks to reconstruct low-resolution images into high-resolution images based on this information.

In recent years, with the continuous development of deep learning [5–8], many advanced deep learning-based SR methods have emerged in the field of image SR [9, 10], continuously enhancing the performance and efficiency of image SR. Super-resolution convolutional neural network [11] and fast super-resolution convolutional neural network [12] were pioneering works of deep learning in the field of super-resolution reconstruction. They used a convolutional neural network (CNN) for super-resolution image reconstruction for the first time. Subsequently, on the basis of this pioneering work, researchers proposed many new super-resolution image networks to further improve the model performance, such as deeply recursive convolutional network [13] and deep recursive residual network [14] based on recurrent neural networks and super-resolution using very deep convolutional networks [15]. FFTI [16] is a fine inpainting method for incomplete images based on feature fusion and two-step inpainting. However, most of these methods are aimed at natural images and are not suitable for medical images.

Recently, many studies in the field of medical images have also proposed SR methods for medical images, such as [17–21]. However, unlike ordinary images, high-quality medical image datasets are relatively scarce, most of the images are gray-scale, and the image content is relatively homogeneous. Using such a dataset to train a model with a deep network will easily lead to overfitting and make the test result worse, whereas a model with a shallow network will be difficult to fit quickly and cannot learn the training samples completely. Therefore, SR models for medical images trained with a traditional network cannot meet the requirements of SR tasks.

Considering the above problems, in order to make an SR image model more suitable for medical image tasks, in this paper, we introduce residual learning and a parameter-free chunking fusion method to address the above difficulties. In the stage of feature extraction, residual learning is designed similarly to the residual network [22] to acquire features, and it borrows LayerNorm [23] from the transformer. LayerNorm is used in residual learning to make the training smoother and avoid the impact of variance differences between different batches. Subsequently, a parameter-free chunking fusion block is used to better fuse features and perform effective feature enhancement. In this module, the feature map is chunked into branches for different information transmission, SimAM [24] is then performed on each branch to enhance the features of the different branches, and finally the semantic information of the different branches is integrated. Moreover, SimAM has no parameters to learn and can improve the model performance without parameter training. In addition, in order to further accelerate model fitting and improve prediction accuracy, this paper proposes a composite loss to optimize the training strategy by combining perceptual loss, gradient loss, and L1 loss.

In order to solve the above problems, we have proposed corresponding solutions, and this work makes three main contributions:
(1) A parameter-free chunking fusion block (PCFB) is proposed, which divides the feature map into branches for parameter-free attention and then integrates the feature information of the different branches, so as to better fuse features and perform effective feature enhancement. It improves the expression ability of the feature map without adding parameters, thereby improving the accuracy.
(2) A composite loss for our SR method is proposed, which combines perceptual loss, gradient loss, and L1 loss. The loss makes the model pay attention to the impact of loss in different dimensions, thus enhancing the model's expressiveness.
(3) A new end-to-end SR method for MR images is proposed, which contains PCFB and the composite loss and can improve SR performance more effectively and avoid overfitting.

The rest of this paper is organized as follows: Section 2 introduces related work. The proposed methods and experimental results are described in detail in Sections 3 and 4, respectively. We conclude the paper in Section 5.

2. Related Work

2.1. Super-Resolution in Deep Learning

With the development of deep convolutional neural networks (DCNN), research on super-resolution has made progress recently. For deep learning methods for SISR, fast response and reconstruction quality are important criteria for measuring super-resolution methods. Super-resolution convolutional neural network (SRCNN) [11] and fast super-resolution convolutional neural network (FSRCNN) [12] were pioneering works of deep learning in the field of super-resolution reconstruction. The two neural networks first used bicubic interpolation to reduce and enlarge low-resolution images to obtain comparable super-resolution images. Then convolutional neural network was first introduced to achieve image reconstruction. In addition, the traditional SR method based on sparse coding can also be regarded as a deep convolutional network from the perspective of the two networks, and compared with the traditional method, all sublayers in the two networks were optimized to give full play to the performance of each component. DRCN [13] has a very deep recursion layer (up to 16 recursions), and recursive supervision and skip connections were further proposed to address gradient disappearance/explosion. For deep models, the residual structure exhibits excellent performance. Therefore, the residual structure is introduced into the super-resolution method to make up for the shortcomings caused by gradient disappearance and gradient explosion. The enhanced deep super-resolution network (EDSR) [25] was inspired by the residual structure. Compared with the traditional residual structure, the residual blocks of EDSR discard unnecessary modules, thus constructing a multiscale deep super-resolution system (MDSR), which can reconstruct high-resolution images with different magnification factors in a single model. In addition, the SR robustness of images in complex scenes should also be focused on. A heterogeneous group SR CNN [9] contains multiple heterogeneous group blocks. These blocks increase the internal and external relations of different channels in a parallel way to cope with SR in complex scenarios. An enhanced super-resolution group CNN (ESRGCNN) [26] can fully fuse the correlation between wide channel features and retain the long-distance context dependence in the upsampling operation to obtain more accurate low-frequency information. Further, in order to solve the common problems in image super-resolution algorithms, such as image edge blurring caused by redundant network structure, inflexible selection of convolution kernel size, and slow convergence speed of the training process, MFFN [27] used a lightweight fusion multilevel single image super-resolution method to achieve SISR.

2.2. Super-Resolution in Medical Imaging

The problem of super-resolution has been widely discussed in medical imaging. Due to limitations such as image acquisition time, low radiation dose, or hardware limitations, the spatial resolution of medical images is insufficient [28]. To solve this problem, Zhu et al. [29] proposed a method for arbitrary scale super-resolution (MIASSR) of medical images, where the method also combined meta-learning with GAN, which can be used for super-resolution at any magnification.

To get as many useful image details as possible, Bing et al. [20] proposed a SR method in medical imaging based on an improved generative adversarial network. This method can not only avoid the interference of high-frequency false information but also integrate the low-level feature constraints to train the model. Zhang et al. [21] proposed a fast medical image super-resolution method, in which subpixel convolution layer addition and mini-network replacement in the hidden layer were crucial to improving the speed of image reconstruction. Inspired by the super-resolution convolutional neural network method based on three hidden layers, Deeba et al. [18] proposed a wavelet-based microgrid network super-resolution method for medical images, where image restoration was speeded up by adding a subpixel layer to replace the small grid network on the hidden layer.

2.3. Attention Mechanism for Vision Tasks

Attention has arguably become one of the most important concepts in the field of deep learning. It was inspired by human biological systems, which tend to focus on unique parts when processing large amounts of information [30]. Liu et al. [31] proposed a multiattention domain module to weigh and reorganize the features; the channel and spatial domain information in the super-resolution method are effectively fused, and the quality of the super-resolution image is effectively improved. Wang et al. [32] proposed two new attention mechanisms: context-weighted channel attention and persistent spatial attention. The proposed attention modulates rich features by suppressing useless features and enhancing features of interest in a channel and spatial manner. Liu and Chen [33] made the following improvements on the basis of the super-resolution generative adversarial network (SRGAN). Firstly, they added the channel attention (CA) module to the SRGAN network and increased network depth to better express high-frequency features. Secondly, the old batch normalization layer was deleted to improve network performance. Finally, the loss function was modified to reduce the influence of noise on the image.

3. Methods

3.1. Overview

In the image super-resolution task, our goal is to take the low-resolution (LR) image $I_{LR}$ as the input of the super-resolution model and generate the super-resolution (SR) image $I_{SR}$. The low-resolution image is generally obtained by downsampling the ground-truth high-resolution (HR) image $I_{HR}$. We denote the super-resolution model by $F$ and its parameters by $\theta$. The super-resolution task can be expressed as the following formula:

$$I_{SR} = F(I_{LR}; \theta). \tag{1}$$

In order to make $I_{SR}$ as similar to $I_{HR}$ as possible, it is necessary to optimize the model with the loss function $\mathcal{L}$, and finally the optimal parameter $\hat{\theta}$ is obtained. The objective formula is as follows:

$$\hat{\theta} = \arg\min_{\theta} \mathcal{L}\left(F(I_{LR}; \theta), I_{HR}\right). \tag{2}$$

The proposed architecture of super-resolution is shown in Figure 1. Then, the details are given about the feature extraction block, parameter-free chunking fusion block (PCFB), and image reconstruction block. Finally, the composite loss and the training strategy are introduced to enhance the model’s expressiveness.

Figure 1

The proposed architecture of super-resolution. The architecture mainly includes the parameter-free chunking fusion block (PCFB) model, residual block, and reconstruction block. The residual block consists of convolutions, activation functions, and skip connections. PCFB divides the feature map into branches through a split channel, and each branch is fused after passing SimAM to obtain parameter-free attention feature enhancement. The reconstruction block consists of convolution, convolution, PReLU, and the PixelShuffle layer.

3.2. Network Architecture

3.2.1. Feature Extraction

The feature extraction part is composed of convolution, activation function, and residual block.

First, if the normal ReLU activation function is used, any feature value less than 0 is suppressed to 0, and that feature information is lost. Therefore, we use PReLU [34] (parametric rectified linear unit) to replace ReLU. PReLU adds a learnable parameter on the basis of ReLU, which can adjust the activation function according to different experimental conditions. The formula is as follows:

$$\mathrm{PReLU}(x) = \max(0, x) + a \cdot \min(0, x), \tag{3}$$

where $x$ represents the feature map and $a$ is a learnable parameter.
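As a minimal sketch of the PReLU activation described above, assuming NumPy: the function name `prelu` and the slope value 0.25 are illustrative choices and are not taken from the paper, where the slope is learned during training.

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: keep positive values, scale negative values by the slope a.

    In a real network `a` is a learnable parameter; here it is a fixed,
    illustrative value (assumption).
    """
    x = np.asarray(x, dtype=np.float64)
    return np.where(x > 0, x, a * x)

features = np.array([-2.0, -0.5, 0.0, 1.5])
activated = prelu(features)  # negative values are scaled, not zeroed
```

Unlike ReLU, the negative inputs survive in scaled form, which is the property the text relies on to avoid losing feature information.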

Second, if batch normalization (BN) is used, differences in the mean and variance of the data across mini-batches may produce unstable statistics [35], and instance normalization [36] can avoid these small-batch problems. However, the work reported in [37] shows that adding instance normalization does not always bring performance improvement, and manual adjustment is required. Therefore, we introduce layer normalization (LN), which was used early on in transformer-related work [23]. Many recent SOTA methods [38–40] also use this normalization. LayerNorm is independent of the batch size, so it is not affected by the above problems, and, unlike instance normalization, it has no parameters that need manual adjustment. Therefore, LN is introduced to stabilize the training and improve the performance. The normalization formula is as follows:

$$\mathrm{LN}(x) = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta, \tag{4}$$

where $x$ represents the feature map, $\epsilon$ is a small constant, $\mu$ is the mean, $\sigma^{2}$ is the variance, and $\gamma$ and $\beta$ are the scale and shift. The normalization formula is the same as in BN; the difference is that LN computes the statistics within each single sample rather than across all samples of a mini-batch as BN does.
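The batch independence of layer normalization can be illustrated with a short NumPy sketch. It assumes that the statistics are taken over all channel and spatial positions of one sample; the paper does not state the exact axes, and the names `layer_norm`, `gamma`, and `beta` are illustrative.

```python
import numpy as np

def layer_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a single (C, H, W) sample with its own mean and variance."""
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 6, 6))  # (N, C, H, W)

# Each sample is normalized on its own, so the result for one sample
# does not change when the other samples in the batch change.
normed = np.stack([layer_norm(sample) for sample in batch])
```

Because no statistic is shared between samples, the output for a given sample is the same for any batch size.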

3.2.2. Parameter-Free Chunking Fusion Block (PCFB)

In order to improve the propagation of feature information, Zhao et al. [41] designed the CSB module to help neural networks deal with hierarchical features with different attributes. Because CSB contains a large number of parameters that need to be learned and its fitting speed is slow, we propose PCFB, which does not need to learn a large number of parameters while maintaining image quality. In PCFB, chunking and fusing are represented as channel splitting and channel merging, respectively.

The difference from CSB is that the size of the chunking is determined by the parameter $n$, where each input feature with $C$ channels is divided into $n$ chunks, and each chunk has $C/n$ channels. Subsequently, in order to carry out targeted feature enhancement for each chunk, SimAM is used to process the features of the different chunks. SimAM has no additional parameters to learn, so the number of model parameters is not increased.

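(1) Chunking and Fusing. The input feature $X \in \mathbb{R}^{C \times H \times W}$ can be divided into $n$ chunks along the channel direction, and the dimension of each chunk is $\frac{C}{n} \times H \times W$. It can be formally expressed as follows:

$$[X_{1}, X_{2}, \ldots, X_{n}] = f_{chunk}(X), \tag{5}$$

$$X = f_{fuse}(X_{1}, X_{2}, \ldots, X_{n}), \tag{6}$$

where $f_{chunk}$ is the chunking function, which splits the feature map $X$ into $n$ chunks $X_{1}, \ldots, X_{n}$. In contrast, $f_{fuse}$ is the fusing function, which merges the chunks back to the original dimension using the concat function.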
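The chunking and fusing operations can be sketched in NumPy as a channel split followed by a concatenation. The function names `chunk` and `fuse` and the toy tensor are illustrative assumptions; the sketch only shows that fusing is the inverse of chunking.

```python
import numpy as np

def chunk(x, n):
    """Split a (C, H, W) feature map into n equal chunks along the channels."""
    if x.shape[0] % n != 0:
        raise ValueError("channel count must be divisible by n")
    return np.split(x, n, axis=0)

def fuse(chunks):
    """Merge chunks back to the original channel dimension by concatenation."""
    return np.concatenate(chunks, axis=0)

x = np.arange(4 * 2 * 2, dtype=np.float64).reshape(4, 2, 2)  # C=4, H=W=2
parts = chunk(x, n=2)      # two chunks with C/n = 2 channels each
restored = fuse(parts)     # same shape and values as x
```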

(2) Parameter-Free Attention. Normally, spatial attention is used for spatial information, while channel attention is used for channel information to focus on feature information. However, in human vision, spatial attention and channel attention coexist and jointly promote information selection in visual processing. Therefore, we need a three-dimensional attention that focuses on the features in each channel and spatial position, so the parameter-free 3D attention module SimAM is used to enhance the features of different chunks in this paper. The structure of the proposed method is shown in Figure 2.

SimAM evaluates the importance of each neuron by constructing an energy function $e_{t}^{*}$. The lower the energy, the greater the difference between the neuron and surrounding neurons, and the higher the importance of features. The energy function is as follows:

$$e_{t}^{*} = \frac{4(\hat{\sigma}^{2} + \lambda)}{(t - \hat{\mu})^{2} + 2\hat{\sigma}^{2} + 2\lambda}, \tag{7}$$

where $t$ is a neuron, that is, a pixel of the feature map $X$; $\hat{\mu}$ and $\hat{\sigma}$ represent the mean and standard deviation of the feature map, respectively; and $\lambda$ is a hyperparameter.

Therefore, the importance of neurons can be obtained by $1/e_{t}^{*}$. In addition, the attention mechanism can be realized by weighting the feature map through the sigmoid function. The formula is as follows:

$$\tilde{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \odot X, \tag{8}$$

where $\odot$ means element-wise multiplication, and $E$ is the energy matrix containing all $e_{t}^{*}$. This module does not introduce any additional trainable parameters, so it improves performance without increasing the number of network parameters.
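A minimal NumPy sketch of this parameter-free attention is given below. It assumes, following the SimAM formulation as we understand it, that the mean and variance are computed per channel over the spatial positions and that the hyperparameter defaults to 1e-4; the function name `simam` and the toy input are illustrative.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM attention on a (C, H, W) feature map."""
    n = x.shape[1] * x.shape[2] - 1
    # Squared deviation of every neuron from its channel mean.
    d = (x - x.mean(axis=(1, 2), keepdims=True)) ** 2
    var = d.sum(axis=(1, 2), keepdims=True) / n
    # Inverse energy 1/e*: large for neurons that stand out in their channel.
    inv_energy = d / (4.0 * (var + lam)) + 0.5
    weights = 1.0 / (1.0 + np.exp(-inv_energy))  # sigmoid
    return x * weights

x = np.ones((1, 3, 3))
x[0, 1, 1] = 5.0            # one neuron that differs from its neighbours
out = simam(x)
weights = out / x           # recover the attention weights for inspection
```

The distinctive centre neuron receives the largest weight, and no trainable parameter is involved at any step.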

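(3) Parameter-Free Chunking Fusion Block. In order to better learn and enhance the features, we use equation (5) to obtain $n$ chunks and then let each chunk pass through equation (8) alone for 3D weighted attention. Equation (6) is then used to fuse them back to the original size, as in equation (9):

$$Y = f_{fuse}(\tilde{X}_{1}, \tilde{X}_{2}, \ldots, \tilde{X}_{n}), \tag{9}$$

where $f_{fuse}$ is the fusing function of equation (6) and $\tilde{X}_{i}$ is the $i$-th chunk after the attention of equation (8). The process is shown in Figure 1.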

3.2.3. Image Reconstruction Block

In order to change the image to the super-resolution size, the upsampling operation is required, and we build the image reconstruction part to realize it. As shown in Figure 1, image reconstruction includes convolution, convolution, PReLU, and PixelShuffle [42] layers.

The main function of PixelShuffle is to obtain high-resolution feature maps by multichannel recombination of low-resolution feature maps. As shown in Figure 3, the feature maps of $r^{2}$ channels, each of size $H \times W$, are recombined into a single-channel upsampled result of size $rH \times rW$, where $r$ is the upscaling factor. PixelShuffle thus transforms the feature map from low-resolution space to high-resolution space.
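The channel-to-space recombination can be sketched in NumPy as a reshape and transpose. The function name `pixel_shuffle` and the channel ordering (the same ordering used by common deep learning frameworks) are assumptions for illustration; the paper itself relies on the framework's built-in layer.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r)."""
    c_in, h, w = x.shape
    if c_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c = c_in // (r * r)
    x = x.reshape(c, r, r, h, w)      # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)    # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

low = np.arange(4 * 2 * 2).reshape(4, 2, 2)   # r*r = 4 channels of size 2x2
high = pixel_shuffle(low, r=2)                # one channel of size 4x4
```

Each 2 × 2 block of the output collects the four channel values of one low-resolution position, so no interpolation is involved.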

3.3. Our Composite Loss for Super-Resolution

3.3.1. Conventional Loss

Most super-resolution methods use pixel loss to optimize the network. Pixel loss measures the pixel-wise difference between the SR image and the HR image and includes L1 loss and L2 loss. Compared with L1 loss, L2 loss penalizes large errors but has a higher tolerance for small errors. In actual training, L1 loss [25, 43] shows better convergence than L2 loss and finally yields a higher peak signal-to-noise ratio (PSNR), so it is the most widely used loss function in the super-resolution field. The formula is as follows:

$$\mathcal{L}_{1} = \lVert I_{SR} - I_{HR} \rVert_{1}, \tag{10}$$

where $I_{SR}$ is the SR image and $I_{HR}$ is the HR image.

However, such pixel loss does not consider image quality attributes such as edges, textures, and high-frequency details, so the results may be too smooth to maintain sharp edges and good visual effects.

3.3.2. Perceptual Loss

In order to incorporate high-level feature loss on the basis of pixel loss, perceptual loss [44] is introduced. The perceptual loss uses the pretrained VGG [45] network to extract the high-level features of the image and constructs the perceptual loss through the Euclidean distance between the HR image features and the SR image features to restore the perceptual quality of the image. The formula of perceptual loss is as follows:

$$\mathcal{L}_{per} = \lVert \phi_{l}(I_{SR}) - \phi_{l}(I_{HR}) \rVert_{2}, \tag{11}$$

where $\phi_{l}$ denotes the $l$-th layer output of the VGG model.

3.3.3. Edge-Aware Loss

In order to combine the loss of image edge information on the basis of pixel loss, we further introduce edge-aware loss [46]. In edge-aware loss, edges of the SR image and HR image are extracted according to the edge extraction operator, and then the difference is calculated between the output and the label edge. In this paper, the Laplacian operator is used to extract edge features. The formula of edge-aware loss is as follows:

$$\mathcal{L}_{edge} = \lVert \mathrm{Lap}(I_{SR}) - \mathrm{Lap}(I_{HR}) \rVert, \tag{12}$$

where $\mathrm{Lap}(\cdot)$ denotes an edge extraction method based on the Laplacian operator.
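The idea of comparing Laplacian edge maps can be sketched as follows. This is an illustration under stated assumptions: a 4-neighbour Laplacian evaluated on the interior of the image and a mean absolute difference between the two edge maps; the paper does not specify the kernel or the norm, and the names `laplacian` and `edge_loss` are illustrative.

```python
import numpy as np

def laplacian(img):
    """4-neighbour Laplacian of a 2D image, computed on the interior pixels."""
    return (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
            - 4.0 * img[1:-1, 1:-1])

def edge_loss(sr, hr):
    """Mean absolute difference between the edge maps (assumed norm)."""
    return float(np.abs(laplacian(sr) - laplacian(hr)).mean())

hr = np.add.outer(np.arange(5.0), np.arange(5.0))  # smooth ramp, no edges
shifted = hr + 3.0                                  # brightness offset only
spiked = hr.copy()
spiked[2, 2] += 1.0                                 # one spurious detail
```

A uniform brightness offset leaves the edge loss at zero, while the single spurious pixel is penalized, which shows that this term reacts to structure rather than intensity.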

3.3.4. Our Composite Loss

Our loss function uses L1 loss as the basic loss function, adds perceptual loss to avoid the loss of high-level features, and adds edge-aware loss to further monitor the integrity of image edge information. The formula is as follows:

$$\mathcal{L} = \mathcal{L}_{1} + \alpha \mathcal{L}_{per} + \beta \mathcal{L}_{edge}, \tag{13}$$

where $\alpha$ and $\beta$ are hyper-parameters.
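The weighted combination of the three terms can be sketched as below. Several parts are assumptions and not the authors' implementation: the pretrained VGG feature extractor is replaced by a hypothetical stand-in (2 × 2 average pooling), the edge extractor is a simple horizontal difference, and the weights 0.3 and 0.1 are the values reported in Section 4.2, with their assignment to the perceptual and edge terms assumed.

```python
import numpy as np

def composite_loss(sr, hr, features, edges, alpha=0.3, beta=0.1):
    """L1 loss plus weighted perceptual and edge-aware terms."""
    l1 = np.abs(sr - hr).mean()
    # Euclidean distance between feature maps (stand-in for VGG features).
    per = np.sqrt(((features(sr) - features(hr)) ** 2).sum())
    edge = np.abs(edges(sr) - edges(hr)).mean()
    return float(l1 + alpha * per + beta * edge)

def avg_pool_features(img):
    """Hypothetical stand-in for a pretrained feature extractor."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def horizontal_edges(img):
    """Hypothetical stand-in for the Laplacian edge extractor."""
    return np.diff(img, axis=1)

hr = np.arange(16, dtype=np.float64).reshape(4, 4)
sr = hr + 1.0   # prediction with a constant error of 1
total = composite_loss(sr, hr, avg_pool_features, horizontal_edges)
```

Passing the feature and edge extractors as arguments keeps the weighting logic separate from the choice of extractor, so a real pretrained network could be substituted without changing `composite_loss`.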

We use our composite loss to optimize the proposed model, and the algorithm for training the model is shown in Algorithm 1.

4. Experiments

4.1. Dataset

The IXISR dataset was constructed by Zhao et al. through further processing of the IXI dataset [41], which contains three types of MR images: 581 T1 volumes, 578 T2 volumes, and 578 PD volumes. In this work, we take the intersection of these three types of MR images to obtain 576 3D volumes of each type of MR image. These 3D volumes are then trimmed to 240 × 240 × 96 to fit the three scaling factors. For SISR, each 3D MR volume is divided into 96 gray-scale slices. LR images are generated based on bicubic downsampling and k-space truncation. As for truncation degradation, HR images are first converted to k-space by the discrete Fourier transform (DFT) and then truncated along the height and width directions.
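The truncation degradation can be sketched with NumPy's FFT routines. The sketch assumes that the central low-frequency block of k-space is kept, that the LR slice is obtained by an inverse DFT of that block, and that the intensity is rescaled to preserve the mean; the function name `kspace_truncate` and the even image sizes are illustrative assumptions.

```python
import numpy as np

def kspace_truncate(hr, scale):
    """Generate an LR slice by keeping the central block of k-space."""
    h, w = hr.shape
    lh, lw = h // scale, w // scale
    # Forward DFT with the zero frequency shifted to the centre.
    k = np.fft.fftshift(np.fft.fft2(hr))
    top, left = (h - lh) // 2, (w - lw) // 2
    k_lr = k[top:top + lh, left:left + lw]   # truncate height and width
    lr = np.fft.ifft2(np.fft.ifftshift(k_lr))
    # Rescale so that the mean intensity of the slice is preserved.
    return np.abs(lr) / (scale * scale)

hr_slice = np.full((8, 8), 5.0)              # toy constant HR slice
lr_slice = kspace_truncate(hr_slice, scale=2)
```

For the 240 × 240 slices and the scaling factors 2, 3, and 4, the retained blocks would be 120 × 120, 80 × 80, and 60 × 60 under these assumptions.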

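Algorithm 1: Training of the SR model with the composite loss.
	Input: High-resolution image dataset, magnification times, and patch size.
	Output: Super-resolution (SR) model.
(1)	Initialize the SR model;
(2)	for each mini-batch of HR patches in the dataset do
(3)	Generate the LR input from the HR patches and compute the SR output of the model;
(4)	Evaluate the L1, perceptual, and edge-aware losses and combine them into the composite loss (equations (10) to (13));
(5)	Update the model by back propagation according to the gradient of the composite loss.
(6)	return the SR model;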

4.2. Implementation Details

Our method is implemented using the PaddlePaddle framework. Similar to the previous work on the IXISR dataset [41], we use 70% of the images as the training dataset, 10% as the validation dataset, and 20% as the test dataset. The mini-batch size is set to 16, the parameters $\alpha$ and $\beta$ in the loss function are set to 0.3 and 0.1, respectively, and the parameter $n$ is set to 2. We use fixed-size patches randomly extracted from LR slices together with the corresponding HR areas. Data enhancement is simply achieved by random horizontal flipping and 90 degree rotation [25]. Millions of training iterations are conducted on an NVIDIA GeForce RTX 3090 GPU. We use Xavier initialization [47] and the Adam optimizer for all model parameters, with an initial learning rate of 0.001 for iterative optimization. Through the optimization of Algorithm 1, a single iteration of the proposed model including all modules takes about one minute. The space complexity depends on the number of parameters involved in the calculation; the number of parameters is reported in Table 1.
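The data enhancement step described above can be sketched in NumPy. The sketch assumes that the same flip and the same number of 90 degree rotations (drawn from 0 to 3) are applied to the LR patch and its HR counterpart; the function name `augment` and the toy patches are illustrative.

```python
import numpy as np

def augment(lr, hr, rng):
    """Apply the same random flip and rotation to an LR/HR patch pair."""
    if rng.random() < 0.5:
        lr, hr = np.fliplr(lr), np.fliplr(hr)
    k = int(rng.integers(0, 4))   # number of 90 degree rotations
    return np.rot90(lr, k), np.rot90(hr, k)

rng = np.random.default_rng(0)
lr_patch = np.arange(16, dtype=np.float64).reshape(4, 4)
hr_patch = np.kron(lr_patch, np.ones((2, 2)))   # toy x2 HR counterpart
lr_aug, hr_aug = augment(lr_patch, hr_patch, rng)
```

Applying identical transforms to both patches keeps the pair spatially aligned, which the pixel-wise losses require.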

4.3. Evaluation Metrics

For quantitative comparison, highly reliable metrics are introduced, such as root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). The metric scores are calculated by comparing the result $I_{SR}$ obtained by the super-resolution method with the high-resolution image $I_{HR}$.

4.3.1. Root Mean Square Error (RMSE)

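$$\mathrm{RMSE} = \sqrt{\frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( I_{SR}(i, j) - I_{HR}(i, j) \right)^{2}}, \tag{14}$$

where $i$ and $j$ together represent the position of the pixel in $I_{SR}$ and $I_{HR}$, and $H$ and $W$ are the image height and width.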

4.3.2. Peak Signal-to-Noise Ratio (PSNR)

$$\mathrm{PSNR} = 20 \log_{10} \left( \frac{2^{k} - 1}{\mathrm{RMSE}} \right), \tag{15}$$

where $k$ is the number of bits per pixel value, which generally takes 8.
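RMSE and PSNR can be computed with a few lines of NumPy. The sketch below assumes 8-bit intensities (peak value 255) and uses illustrative function names; it is meant to show how the two metrics relate rather than to reproduce the evaluation code of the paper.

```python
import numpy as np

def rmse(sr, hr):
    """Root mean square error between two images of the same size."""
    diff = np.asarray(sr, dtype=np.float64) - np.asarray(hr, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(sr, hr, bits=8):
    """Peak signal-to-noise ratio in dB for images with `bits` bits per pixel."""
    err = rmse(sr, hr)
    if err == 0:
        return float("inf")
    peak = 2 ** bits - 1
    return float(20.0 * np.log10(peak / err))

hr_img = np.zeros((4, 4))
sr_img = hr_img + 5.0          # constant error of 5 grey levels
error = rmse(sr_img, hr_img)
quality = psnr(sr_img, hr_img)
```

A lower RMSE always gives a higher PSNR, since PSNR is a logarithmic function of the ratio between the peak value and the RMSE.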

4.3.3. Structural Similarity Index (SSIM)

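$$\mathrm{SSIM}(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})}, \tag{16}$$

where $x$ is obtained by the super-resolution method and $y$ is the high-resolution image; $\mu_{x}$ and $\mu_{y}$ are the averages; $\sigma_{x}$ and $\sigma_{y}$ are the standard deviations; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $c_{1}$ and $c_{2}$ are small constants.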
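The SSIM formula can be evaluated directly on whole images as a single-window sketch. Common implementations average the index over local windows, so this global version is a simplification; the constants $(0.01 \cdot 255)^2$ and $(0.03 \cdot 255)^2$ are widely used defaults and are an assumption here, as is the function name `ssim_global`.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Single-window SSIM computed from global image statistics."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

img = np.linspace(0.0, 255.0, 64).reshape(8, 8)
inverted = 255.0 - img          # same mean and variance, opposite structure
same = ssim_global(img, img)
opposite = ssim_global(img, inverted)
```

Identical images give an index of 1, while the inverted image, whose structure is anticorrelated with the original, gives a negative value.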

4.4. Experimental Results

In this paper, the expressiveness of different models is compared in the case of the IXISR dataset (PD, T1, and T2) of super-resolution. PSNR, SSIM, and RMSE are used to evaluate the expressiveness of the model. Subdatasets are used under two different sampling (bicubic degradation and truncation degradation) in the dataset. Bicubic downsampling is widely used by LR image generation simulation in SR images, where bicubic downsampling is used to downsample HR images and generate LR images. Truncation degradation is a process that simulates the real image acquisition process. The LR image is obtained by k-space truncation, which means that the HR image is intercepted in frequency space for sampling.

Tables 1 and 2, respectively, show the evaluation results of different models of PD, T1, and T2 datasets under the bicubic downsampling and truncation degradation methods. From Figures 4 and 5, we can see that our model has higher expression ability than other models. Compared with the two residual-based networks SRResNet and EDSR, our module adds PCFB, which helps to improve the performance of the model.

4.5. Ablation Studies

The proposed method is based on the improvement of SRResNet, so the ablation experiment is also carried out around SRResNet. In Tables 3 and 4, we compare the number of parameters and the performance in PSNR, SSIM, and RMSE for all methods. Note that all results are the average values of PSNR, SSIM, and RMSE calculated from MR images on the same dataset. The experimental results show that the proposed method improves the PSNR, SSIM, and RMSE of LR images obtained from BD and TD by and , respectively, compared with SRResNet, although the amount of parameters is only lower. This shows that PCFB is more effective.

In order to evaluate the effectiveness of the composite loss we constructed, we performed ablation experiments with different loss functions on the PD data in the dataset, as shown in Table 5. Compared with L1 and L2 loss functions, the PSNR performance of our composite loss function is and higher than that of only using L1 and L2 loss, respectively, which is a very significant increase. However, there is no significant decrease in RMSE, which only decreases by 0.0013 and 0.0014. In conclusion, the above results show that the loss function designed by us can retain more effective features and provide more reference value for medical applications.

4.6. Model Visualization

In order to understand the ability of the proposed model, the model trained in the comparative experiment is used to visually predict the test data. Our method is compared with Bicubic, ESPCNN, VRCNN, SRResNet, and EDSR on the datasets obtained by the two down sampling methods. The visual results are shown in Figures 4 and 5. It can be seen from the enlarged detail feature map that the image reconstructed by Bicubic, ESPCNN, VRCNN, SRResNet, and EDSR methods still has fuzzy distortion to varying degrees, and the visual perception effect is inferior to our method.

5. Conclusion and Future Work

High-resolution MR images have smaller voxel sizes, providing clinical physicians with more accurate structural and textural details. However, generating high-resolution MR images usually incurs enormous costs. Image super-resolution is an effective and cost-efficient alternative technique for high-resolution restoration of low-resolution images. In this work, we propose a novel end-to-end MR image super-resolution method. First, we introduced a parameter-free chunking fusion block (PCFB) that splits the feature map into n branches to fuse features better without additional parameters. Second, a training strategy combining perceptual loss, gradient loss, and L1 loss played an important role in accelerating model fitting and improving prediction accuracy. Finally, the proposed method is effective in the super-resolution task of MR images, improving model accuracy. Our future work will focus on making the model more lightweight, reducing its number of parameters while retaining the accuracy reported in this paper.

Data Availability

The IXISR dataset used to support the findings of this study is included within the article [41].

Disclosure

Mingyang Hou and Hongyi Wang should be considered as co-correspondents.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Mingyang Hou and Hongyi Wang have made equal contributions to this work.

Acknowledgments

This work was supported in part by the West Light Foundation of the Chinese Academy of Science, the Research Foundation of the Natural Foundation of Chongqing City (cstc2021jcyjmsxmX0146), the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJZDK201901504, KJQN201901537), the Bingtuan Science and Technology Program in China (Grant No. 2021AB026), the Scientific Research Foundation of Chongqing University of Science and Technology (Grant no. ckrc2020027), and the Chongqing Science and Technology Military-Civilian Integration Innovation Project (2022).

References

C. Tian, X. Zhang, J. C.-W. Lin, W. Zuo, Y. Zhang, and C.-W. Lin, “Generative adversarial networks for image super-resolution: a survey,” 2022, https://arxiv.org/abs/2204.13620.
W. Su, Y. Huang, Q. Li, F. Zuo, and L. Liu, “Infrared and visible image fusion based on adversarial feature extraction and stable image reconstruction,” vol. 71, pp. 1–14, 2022.
Q. Wu, Y. Li, Y. Sun et al., “An arbitrary scale super-resolution approach for 3d mr images via implicit neural representation,” vol. 27, pp. 1004–1015, 2023.
H. Wu, N. Ni, and L. Zhang, “Learning dynamic scale awareness and global implicit functions for continuous-scale super-resolution of remote sensing images,” vol. 61, pp. 1–15, 2023.
Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
J. Zhang, X. Zou, L.-D. Kuang, J. Wang, R. S. Sherratt, and X. Yu, Cctsdb 2021: A More Comprehensive Traffic Sign Detection Benchmark, Springer, Berlin Heidelberg, 2022.
R. Xia, Y. Chen, and B. Ren, “Improved anti-occlusion object tracking algorithm using unscented rauch-tung-striebel smoother and kernel correlation filter,” Elsevier, vol. 34, pp. 6008–6018, 2022.
J. Zhang, W. Feng, T. Yuan, J. Wang, and A. Kumar Sangaiah, “Scstcf: spatial-channel selection and temporal regularized correlation filters for visual tracking,” Elsevier, vol. 118, Article ID 108485, 2022.
C. Tian, Y. Zhang, W. Zuo, C.-W. Lin, D. Zhang, and Y. Yuan, “A heterogeneous group cnn for image super-resolution,” 2022, https://arxiv.org/abs/2209.12406.
C. Tian, Y. Xu, W. Zuo, B. Zhang, L. Fei, and C.-W. Lin, “Coarse-to-fine cnn for image super-resolution,” IEEE Transactions on Multimedia, vol. 23, pp. 1489–1502, 2020.
C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
C. Dong, C. Change Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Proceedings of the European Conference on Computer Vision, pp. 391–407, Amsterdam, The Netherlands, October 2016.
J. Kim, K. Jung, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1637–1645, Las Vegas, NV, USA, June 2016.
Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2790–2798, Honolulu, HI, USA, July 2017.
J. Kim, K. Jung, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1646–1654, Las Vegas, NV, USA, June 2016.
Y. Chen, R. Xia, K. Zou, and K. Yang, “Ffti: image inpainting algorithm via features fusion and two-steps inpainting,” Journal of Visual Communication and Image Representation, vol. 91, Article ID 103776, 2023.
H. Zheng, K. Zeng, D. Guo et al., “Multi-contrast brain mri image super-resolution with gradient-guided edge enhancement,” IEEE Access, vol. 6, pp. 57856–57867, 2018.
F. Deeba, S. Kun, F. Ali Dharejo, and Y. Zhou, “Wavelet-based enhanced medical image super resolution,” IEEE Access, vol. 8, pp. 37035–37044, 2020.
Z. Chen, X. Guo, Y. Peter, M. Woo, and Y. Yuan, “Super-resolution enhanced medical image diagnosis with sample affinity interaction,” IEEE Transactions on Medical Imaging, vol. 40, no. 5, pp. 1377–1389, 2021.
X. Bing, W. Zhang, L. Zheng, and Y. Zhang, “Medical image super resolution using improved generative adversarial networks,” IEEE Access, vol. 7, pp. 145030–145038, 2019.
S. Zhang, G. Liang, S. Pan, and L. Zheng, “A fast medical image super resolution method based on deep learning network,” IEEE Access, vol. 7, pp. 12319–12327, 2019.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
A. Dosovitskiy, L. Beyer, K. Alexander et al., “An image is worth 16x16 words: transformers for image recognition at scale,” in Proceedings of the International Conference on Learning Representations, Vienna, Austria, May 2021.
L. Yang, R. Y. Zhang, L. Li, and X. Xie, “Simam: a simple, parameter-free attention module for convolutional neural networks,” in Proceedings of the International Conference on Machine Learning, Las Vegas, NV, USA, July 2021.
B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1132–1140, Honolulu, HI, USA, July 2017.
C. Tian, Y. Yuan, S. Zhang, C. W. Lin, W. Zuo, and D. Zhang, “Image super-resolution with an enhanced group convolutional neural network,” Neural Networks, vol. 153, pp. 373–385, 2022.
Y. Chen, R. Xia, K. Yang, and K. Zou, Mffn: Image Super-Resolution Via Multi-Level Features Fusion Network, Springer, Berlin, Germany, 2023.
Y. Li, B. Sixou, and F. Peyrin, “A review of the deep learning methods for medical images super resolution problems,” IRBM, vol. 42, no. 2, pp. 120–133, 2021.
J. Zhu, C. Tan, J. Yang, G. Yang, and P. Lio’, “Arbitrary scale super-resolution for medical images,” International Journal of Neural Systems, vol. 31, no. 10, Article ID 2150037, 2021.
Z. Niu, G. Zhong, and Y. Hui, “A review on the attention mechanism of deep learning,” Neurocomputing, vol. 452, pp. 48–62, 2021.
Y. Liu, Z. Dong, K. P. Lim, and L. Nam, “A densely connected face super-resolution network based on attention mechanism,” in Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 148–152, Kristiansand, Norway, November 2020.
X. Wang, R. Chen, B. Huang, and Q. Zhou, “Enhanced context attention network for image super resolution,” IEEE Sensors Journal, vol. 21, no. 10, pp. 11665–11673, 2021.
B. Liu and J. Chen, “A super resolution algorithm based on attention mechanism and srgan network,” IEEE Access, vol. 9, pp. 139138–139145, 2021.
K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: surpassing human-level performance on imagenet classification,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034, Santiago, Chile, December 2015.
X. Zhang, W. Zhang, Y. Wei, J. Sun, J. Yan, and R. Wan, “Towards stabilizing batch statistics in backward propagation of batch normalization,” in Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, May 2020.
D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Improved texture networks: maximizing quality and diversity in feed-forward stylization and texture synthesis,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6924–6932, Honolulu, HI, USA, July 2017.
L. Chen, X. Lu, J. Zhang, X. Chu, and C. Chen, “Hinet: half instance normalization network for image restoration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–192, Nashville, TN, USA, June 2021.
Z. Liu, Y. Lin, Y. Cao et al., “Swin transformer: hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022, Montreal, QC, Canada, October 2021.
S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, and M. H. Yang, “Restormer: efficient transformer for high-resolution image restoration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5728–5739, New Orleans, LA, USA, June 2022.
X. Chu, L. Chen, and W. Yu, “Nafssr: stereo image super-resolution using nafnet,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 1239–1248, New Orleans, LA, USA, June 2022.
X. Zhao, Y. Zhang, T. Zhang, and X. Zou, “Channel splitting network for single mr image super-resolution,” IEEE Transactions on Image Processing, vol. 28, no. 99, pp. 5649–5662, 2019.
W. Shi, J. Caballero, F. Huszár, J. Totz, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, June 2016.
Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2472–2481, Salt Lake City, UT, USA, June 2018.
C. Ledig, L. Theis, F. Huszár et al., “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 105–114, Honolulu, HI, USA, July 2017.
K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, https://arxiv.org/abs/1409.1556.
W. Li, X. Tao, T. Guo, Q. Lu, J. Lu, and J. Jia, “Mucan: multi-correspondence aggregation network for video super-resolution,” in Proceedings of the European Conference on Computer Vision, pp. 335–351, New York City, NY, USA, August 2020.
X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the Thirteenth International Conference On Artificial Intelligence And Statistics, pp. 249–256, Sardinia, Italy, May 2010.
Y. Dai, L. Dong, and F. Wu, “A convolutional neural network approach for post-processing in hevc intra coding,” in Proceedings of the International Conference on Multimedia Modeling, Reykjavik, Iceland, January 2017.
H. Zhao, X. Kong, J. He, Y. Qiao, and C. Dong, “Efficient image super-resolution using pixel attention,” in Proceedings of the Computer Vision–ECCV 2020 Workshops, pp. 56–72, Glasgow, UK, August 2020.
L. Chen, X. Chu, X. Zhang, and J. Sun, “Simple baselines for image restoration,” in Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, pp. 17–33, Tel Aviv, Israel, October 2022.
C. Ledig, L. Theis, F. Huszar et al., Photo-realistic Single Image Super-resolution Using a Generative Adversarial Network, 2016, https://arxiv.org/abs/1609.04802.

Copyright

Copyright © 2023 Qi Han et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
