Abstract

To overcome the disadvantages of traditional block-matching-based image denoising methods, an image denoising method based on block matching with 4D filtering (BM4D) in the 3D shearlet transform domain and a generative adversarial network is proposed. Firstly, the contaminated images are decomposed to obtain the shearlet coefficients; then, an improved 3D block-matching algorithm is applied in the hard-thresholding and Wiener filtering stages to obtain the latent clean images; the final clean images are obtained by training a generative adversarial network (GAN) on the latent clean images. Taking the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and edge-preserving index (EPI) as the evaluation criteria, experimental results demonstrate that the proposed method can not only effectively remove image noise in highly noisy environments but also effectively improve the visual quality of the images.

1. Introduction

Typical image denoising methods can be classified into three schemes: filtering-based methods, decomposition-based methods, and dictionary-learning-based methods [1]. Classical filters include the median filter [2] and the Wiener filter [3]. The basic principle of decomposition-based methods is to decompose the contaminated images into low-pass and high-pass subbands and then separate the image noise by manipulating the obtained coefficients; the wavelet transform is the typical decomposition tool. The basic principle of dictionary-learning-based methods is to sparsely represent the noisy image by overcomplete atoms: only the large representation coefficients are used to reconstruct the original image, while the small coefficients are discarded [4–6]. Another well-known strategy for denoising is based on the self-similarity of the image, such as the block-matching and 3D filtering (BM3D) algorithm [7]. The above methods usually perform well, but they lose edge, texture, and other details and produce blurring and blocking artifacts when the noise is heavy.

Nowadays, since the adaptive space estimation strategy for the nonlocal mean [8] can effectively alleviate the high complexity and low efficiency of nonlocal mean filtering, nonlocal similarity has become an effective feature for image denoising [9]. Its main disadvantage, however, is the loss of directional information, so the geometric regularity of images cannot be effectively captured to sparsely represent the features of the original images [10]. Besides, the computation of nonlocal similarity is implemented only in the spatial domain. As reported, multiscale geometric transformations, such as the contourlet transform, can greatly help to suppress heavy noise in the frequency domain [11], but this scheme is limited by the mathematical properties of the selected transformation. On the other hand, deep learning technologies, such as the convolutional neural network [12] and the generative adversarial network [13], have achieved great success in the areas of image classification, target recognition [14], and image fusion [15].

Therefore, a novel image denoising method based on block matching with 4D filtering in the 3D shearlet transform domain and the generative adversarial network is proposed. The 3D shearlet transform provides better mathematical properties than the commonly used wavelet or contourlet for capturing the anisotropic features of images at different scales and directions. In addition, the traditional BM3D is extended into 4D space, which can effectively preserve the edge and texture details of the images. The output of the BM4D stage is used as the input to the designed generative adversarial network to make full use of its strong learning ability.

The remainder of this paper is organized as follows. The details of the proposed method are presented in Section 2. Experimental results and discussions are shown in Section 3. Finally, the whole paper is concluded in Section 4.

2. Methodology

2.1. The Architecture of Proposed Method

The overall structure of the proposed method is shown in Figure 1. Through multiscale decomposition and directional partitioning, the 3D shearlet coefficients are obtained. Then, the coefficients are fed into the hard-thresholding and Wiener filtering stages of the BM4D model. In the hard-thresholding stage, similar cubes are extracted from the observed volume and combined together: if the distance between two cubes is smaller than the preset threshold, collaborative filtering is applied to the similar cubes. In the aggregation process, the basic volume estimate is generated by an adaptive convex combination. In the Wiener filtering stage, the aggregation is followed by the inverse shearlet transform. Finally, the clean results are obtained by the designed generative adversarial network.

2.2. The 3D Shearlet Transform

In 3D space, the shearlet system is obtained by combining the function systems associated with the pyramidal regions of the 3D Fourier space.

For a scale parameter j ≥ 0 and shear parameters k = (k1, k2), the 3D shearlet system associated with a pyramidal region is defined as the set of functions obtained by applying a specific anisotropic dilation matrix A and a specific shear matrix S_k to a generating function and translating the result. More details can be found in the literature [16, 17]. In Figure 2, a shearlet in 3D space is shown.
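The action of the two matrix families can be illustrated with a minimal numpy sketch. It uses the standard parabolic-scaling choice from the shearlet literature; the paper's exact parameters may differ, and all names here are illustrative:

```python
import numpy as np

def dilation_matrix(j):
    """Anisotropic (parabolic) dilation for scale j: the first axis is
    stretched twice as fast as the other two (standard choice; assumed)."""
    return np.diag([4.0 ** j, 2.0 ** j, 2.0 ** j])

def shear_matrix(k1, k2):
    """Shear along the first frequency axis by integers k1, k2."""
    S = np.eye(3)
    S[0, 1] = k1
    S[0, 2] = k2
    return S

# A shearlet atom at scale j, shear k, translation m is obtained by
# composing these actions on a generating function psi:
#   psi_{j,k,m}(x) = |det A|^{j/2} * psi(S_k A^j x - m)
M = shear_matrix(1, -1) @ dilation_matrix(1)
print(M)
```

Composing the shear with the dilation is what tilts and elongates the frequency support, giving the directional sensitivity that the wavelet transform lacks.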

2.3. The Improved 3D Block-Matching Algorithm

Let z be the noisy observation, modeled as z(x) = y(x) + η(x), where y(x) is the original unknown volume signal, z(x) is the observed 3D signal, and η(x) is independent and identically distributed Gaussian noise whose standard deviation is denoted by σ.

The goal of the improved 3D block-matching algorithm [18] is to obtain an estimate of y from the noisy observations. The implementation of the improved algorithm is divided into two cascaded stages, hard thresholding and Wiener filtering [19], each of which includes three steps: grouping, collaborative filtering, and aggregation.

For the hard-thresholding stage, let C_z(x) represent a cube of L × L × L voxels extracted from z at coordinate x. The similarity between two cubes is measured by the photometric distance d(C_z(x_i), C_z(x_j)) = ||C_z(x_i) − C_z(x_j)||² / L³, where the numerator is the summation of the squared differences between the corresponding intensities of the two cubes and the denominator L³ is the normalization factor. No prefiltering is performed before cube matching, so the similarity of the noisy observations can be tested directly.
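The photometric distance can be sketched as follows (illustrative function names, not the authors' code):

```python
import numpy as np

def cube_distance(cube_a, cube_b):
    """Sum of squared intensity differences between two equally sized
    L x L x L cubes, normalized by the cube volume L^3."""
    assert cube_a.shape == cube_b.shape
    diff = cube_a.astype(np.float64) - cube_b.astype(np.float64)
    return np.sum(diff ** 2) / cube_a.size

a = np.zeros((4, 4, 4))
b = np.ones((4, 4, 4))
print(cube_distance(a, b))  # 1.0: every voxel differs by exactly 1
```

The normalization by the volume makes the threshold τ independent of the cube size.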

In the grouping step, for each reference cube C_z(x), cubes similar to it are extracted from z and combined to form a group. Two cubes are considered similar if the distance between them is not larger than the predefined threshold τ. We first define a set S_x that contains the indexes of the similar cubes as follows:

Then, a four-dimensional group is built by stacking the cubes indexed by the above set:where the reference cube C_z(x) is matched with the set of similar cubes located in the 3D data. In particular, the coordinates correspond to the tail and head of the arrow connecting the cubes in formula (4), respectively. Note that since the distance from any cube to itself is always 0, by the definition of formula (4), each group (5) contains at least its reference cube.
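The grouping step can be sketched as follows. This is a simplified illustration with hypothetical names; a real implementation would slide the reference cube over the whole volume and restrict candidates to a search neighborhood:

```python
import numpy as np

def extract_cube(volume, x, L):
    """Cut an L x L x L cube out of the volume at coordinate x."""
    return volume[x[0]:x[0] + L, x[1]:x[1] + L, x[2]:x[2] + L]

def group_similar_cubes(volume, ref_coord, candidates, L, tau):
    """Return the coordinate set S and the stacked 4D group. Provided
    ref_coord is among the candidates, it is always kept, because its
    distance to itself is 0."""
    ref = extract_cube(volume, ref_coord, L)
    S = []
    for c in candidates:
        cand = extract_cube(volume, c, L)
        d = np.sum((ref - cand) ** 2) / ref.size  # photometric distance
        if d <= tau:
            S.append(c)
    group = np.stack([extract_cube(volume, c, L) for c in S])
    return S, group  # group has shape (|S|, L, L, L): a 4D array

vol = np.zeros((6, 6, 6))
vol[3:, 3:, 3:] = 10.0  # a bright corner: one dissimilar cube
S, G = group_similar_cubes(vol, (0, 0, 0), [(0, 0, 0), (1, 1, 1), (4, 4, 4)],
                           L=2, tau=0.5)
print(S, G.shape)  # the bright cube at (4, 4, 4) is excluded
```

Stacking similar cubes along a fourth axis is exactly what turns BM3D's 3D groups into BM4D's 4D groups.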

In the collaborative filtering step, a separable four-dimensional transform is applied along each dimension of the group in equation (5). Then, by applying a hard-threshold operator with a fixed threshold, the obtained four-dimensional group spectrum is

By applying the inverse four-dimensional transform to the filtered group spectrum, the group estimate takes the following form:

For each cube in the group, an estimate of the original unknown volume data is extracted separately. Formula (7) is an overcomplete representation of the denoised data because cubes may overlap within the same group and across different groups.

In the aggregation step, this redundancy is used to generate a basic volume estimate by an adaptive convex combination:where each group contributes with a group-related weight and the indicator function restricts each cube estimate to its own coordinates. The weight is defined as w_x = 1 / (σ² N_x), where σ is the standard deviation of the noise and N_x is the number of nonzero coefficients retained in formula (6). Since the largest coefficient always survives the thresholding operation, N_x ≥ 1, so the denominator of equation (9) is never zero. The quantity N_x serves two purposes: on the one hand, it measures the sparsity of the thresholded spectrum in (5); on the other hand, it approximates the total residual noise variance of the group estimate in (6). As a result, highly correlated groups are given larger weights, while groups with larger residual noise are penalized with smaller weights.
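The collaborative hard-thresholding and weighting logic can be sketched as follows. A separable 4D DCT stands in for the paper's transform (which operates on shearlet coefficients); the threshold constant and all names are illustrative assumptions:

```python
import numpy as np
from scipy.fft import dctn, idctn

def hard_threshold_group(group, sigma, lam=2.7):
    """Filter one 4D group and return its estimate together with the
    aggregation weight 1 / (sigma^2 * N), N = surviving coefficients."""
    spectrum = dctn(group, norm="ortho")      # separable 4D transform
    mask = np.abs(spectrum) > lam * sigma     # hard threshold
    spectrum = spectrum * mask
    n_kept = max(int(mask.sum()), 1)          # at least the DC term survives
    estimate = idctn(spectrum, norm="ortho")  # back to the voxel domain
    weight = 1.0 / (sigma ** 2 * n_kept)      # sparser groups weigh more
    return estimate, weight

group = np.full((2, 2, 2, 2), 5.0)            # a perfectly coherent group
est, w = hard_threshold_group(group, sigma=0.1)
print(np.allclose(est, 5.0), w)
```

For the constant group above, only the DC coefficient survives, so the estimate is recovered exactly and the weight is maximal, matching the text: coherent (sparse) groups dominate the convex combination.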

For the Wiener filtering stage, cube matching is performed within the basic estimate. In fact, since the noise level in the basic estimate is much lower than that in the noisy observation z, more reliable matches are expected, making the grouped data sparser. Formally, for each reference cube extracted from the basic estimate, its set of similar-cube coordinates is constructed as follows:

The collaborative filtering here is implemented as an empirical Wiener filter. Similar to formula (6), it first uses the coordinate set (10) to extract a group of cubes from the basic estimate and then defines the empirical Wiener filter coefficients aswhere σ is the standard deviation of the noise and the transform operator is composed of four one-dimensional linear transforms, usually different from those used in the hard-thresholding stage. Subsequently, the same coordinate set (10) is used to extract a second, noisy group from the observation z. An element-wise multiplication between the spectrum of the noisy group and the Wiener filter coefficients (11) performs the coefficient shrinkage. The group estimates are
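The empirical Wiener step can be sketched in the same style, again with a 4D DCT standing in for the paper's transform and illustrative names:

```python
import numpy as np
from scipy.fft import dctn, idctn

def wiener_filter_group(noisy_group, basic_group, sigma):
    """Shrink the noisy group's spectrum using Wiener coefficients
    built from the basic-estimate group's spectrum."""
    B = dctn(basic_group, norm="ortho")
    W = B ** 2 / (B ** 2 + sigma ** 2)        # empirical Wiener coefficients
    Z = dctn(noisy_group, norm="ortho")
    estimate = idctn(W * Z, norm="ortho")     # element-wise shrink, invert
    weight = 1.0 / (sigma ** 2 * np.sum(W ** 2))  # weight from filter energy
    return estimate, weight

basic = np.full((2, 2, 2, 2), 1.0)            # clean-looking basic estimate
noisy = basic.copy()                          # noiseless for illustration
est, w = wiener_filter_group(noisy, basic, sigma=0.1)
print(np.allclose(est, 1.0, atol=1e-2))
```

Because W ≈ 1 where the basic estimate carries energy and W ≈ 0 elsewhere, the filter keeps signal coefficients and suppresses noise-only ones, which is exactly the behavior the aggregation weight (the inverse of the filter energy) rewards.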

Then, the inverse 3D shearlet transform [14] is applied to the shrunken spectrum. The final estimate is generated by a convex combination similar to formula (8), with formula (4) replaced by formula (10). The aggregation weight of each group estimate (12) is defined by the energy of the Wiener filter coefficients (11):

In this way, each weight in formula (13) provides an estimate of the total residual noise variance of the corresponding group estimate in formula (12).

2.4. The Generative Adversarial Network for Training

After obtaining the intermediate results, we can obtain the final denoised image by training the generative adversarial network (GAN) [20]. The training process is shown in Figure 3.

2.4.1. The Generator Network

The generator network generates a fake image from the noisy image, as shown in Figure 4. It consists of 11 cascaded convolutional layers, which are trained to learn the residual between the input image and the label image. Internal connections are introduced into each block to preserve information and reduce the training time. To maintain good performance while reducing computational complexity, the network adopts a bottleneck structure, in which the numbers of feature maps in the first, middle, and last layers are all 64. Following the suggestions of references [21, 22] for low-level computer vision problems, a 3 × 3 convolution kernel is used in each convolutional layer, and the rectified linear unit (ReLU) is used as the activation function.

2.4.2. The Discriminator Network

As shown in Figure 5, the discriminator network is trained to distinguish fake images from real images. It has four convolutional blocks and two fully connected layers. Each convolutional block consists of a convolutional layer, a batch normalization layer, and a ReLU activation function. The kernel size K is 3 × 3, and the number of filters N increases from 64 to 256. The stride S of each convolutional layer is 2 to reduce the resolution of the image. The probability that the input image is noise-free is produced by a fully connected layer of 1024 neurons.

2.4.3. Adversarial Training

The aim is to use the adversarial strategy to train a model to remove image noise. Adversarial training is a way to train the generator network G to generate samples that follow the real data distribution p_data. The generator takes as input a noise variable z with distribution p_z(z) and is trained to learn the mapping G(z; θ_g) to the data space, where θ_g denotes the parameters of the generator network; the resulting model distribution is denoted p_g. When training the generator, the essential objective is to maximize the probability that the generated samples match the data, that is, to make p_g approximate p_data.

To guarantee the above probability, the discriminator network D, whose input is a data sample and whose output is the probability that the sample is real, should learn to distinguish the generated samples from real samples. It must maximize the probability value assigned to the real data samples and minimize the probability value assigned to the generated samples, that is, maximize E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))].
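As a toy illustration (not the paper's networks), the two expectations in this objective can be estimated on scalar samples with a fixed logistic "discriminator":

```python
import numpy as np

# Monte Carlo estimate of V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
# on 1D toy data; the sigmoid stands in for a trained discriminator.
rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=1000)   # samples from p_data
fake = rng.normal(loc=-2.0, scale=1.0, size=1000)  # samples from the generator

def D(x):
    return 1.0 / (1.0 + np.exp(-x))  # assigns high probability to large x

V = np.mean(np.log(D(real))) + np.mean(np.log(1.0 - D(fake)))
print(V)  # both terms are small in magnitude because D separates the sets
```

When the generator later moves its samples toward the real distribution, D can no longer separate them, D(·) tends to 1/2 everywhere, and V drops toward −2·log 2, the well-known equilibrium value.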

The generator and discriminator networks are trained alternately, each trying to fool the other. The whole process converges once the generator has successfully learned to generate samples that follow the real data distribution.

In Figure 6, the experiments on four groups of color images show the necessity and effectiveness of using the GAN.

3. Experimental Results and Discussions

In this section, two groups of experiments are carried out to show the performance of the proposed method. The platform is a Dell M4800 workstation with an Intel 2.5 GHz CPU and 32 GB RAM, running Matlab and Python. PSNR [23], SSIM [24], and the edge-preserving index (EPI) [25] are used as objective evaluation measures. PSNR is computed as PSNR = 10 · log10(MAX² / MSE), where MAX is the peak value of the image and MSE is the mean squared error between the reference image and the image under test.
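The PSNR computation can be sketched directly from this definition (MAX = 255 is assumed for 8-bit images):

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) -
                   test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: infinite PSNR
    return 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
b = a + 5.0                  # constant error of 5 -> MSE = 25
print(psnr(a, b))            # 10 * log10(65025 / 25), about 34.15 dB
```

Higher values mean the denoised image is closer to the reference; differences of about 0.5 dB are already visible in denoising benchmarks.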

The SSIM is defined as the product of three comparison terms, SSIM(x, y) = l(x, y)^α · c(x, y)^β · s(x, y)^γ, where the luminance, contrast, and structure terms are computed as l(x, y) = (2 μ_x μ_y + C1)/(μ_x² + μ_y² + C1), c(x, y) = (2 σ_x σ_y + C2)/(σ_x² + σ_y² + C2), and s(x, y) = (σ_xy + C3)/(σ_x σ_y + C3), in which x and y are the reference image and the image to be tested, respectively, μ_x and μ_y are the mean values of the two images, σ_x and σ_y are the standard deviations, σ_xy is the covariance of x and y, and C1, C2, and C3 are small positive constants that avoid instability when a denominator approaches 0. When α = β = γ = 1 and C3 = C2 / 2, we obtain SSIM(x, y) = ((2 μ_x μ_y + C1)(2 σ_xy + C2)) / ((μ_x² + μ_y² + C1)(σ_x² + σ_y² + C2)).
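Under that simplification, a single-window (global) SSIM can be sketched as follows; the 0.01/0.03 constants follow a common convention and are an assumption here, and practical SSIM implementations average this quantity over local windows:

```python
import numpy as np

def ssim_global(x, y, data_range=1.0):
    """Two-term SSIM with alpha = beta = gamma = 1 and C3 = C2 / 2,
    computed over the whole signal rather than local windows."""
    C1 = (0.01 * data_range) ** 2
    C2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + C1) * (2 * cov + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))

x = np.linspace(0.0, 1.0, 64)
print(ssim_global(x, x))  # identical signals give exactly 1.0
```

Unlike PSNR, SSIM saturates at 1 and rewards preserved structure rather than raw intensity agreement.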

Based on edge contrast, the edge-preserving index (EPI) compares the edge strength of the image under test with that of the original image over the edge regions, where the images have M rows and N columns. The range of EPI is 0 to 1. When EPI equals 1, the edges of the image are completely preserved; when EPI equals 0, the image has become a plane without any variation. The larger the EPI value, the stronger the edge-preservation ability.
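EPI definitions vary across the literature; the following is a sketch of one common gradient-ratio variant with the behavior described above (1 for perfectly preserved edges, 0 for a completely flattened image), not necessarily the exact formula of reference [25]:

```python
import numpy as np

def epi(original, denoised):
    """Ratio of first-difference (edge) energy in the denoised image
    to that of the original image."""
    def edge_energy(img):
        gy = np.abs(np.diff(img, axis=0)).sum()  # vertical differences
        gx = np.abs(np.diff(img, axis=1)).sum()  # horizontal differences
        return gx + gy
    e_orig = edge_energy(original)
    return edge_energy(denoised) / e_orig if e_orig > 0 else 1.0

img = np.zeros((8, 8)); img[:, 4:] = 1.0   # a single vertical step edge
flat = np.full((8, 8), 0.5)                # a completely smoothed image
print(epi(img, img), epi(img, flat))       # 1.0 and 0.0
```

An over-smoothing denoiser drives this ratio toward 0, which is why EPI complements PSNR and SSIM in the tables.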

3.1. Experiment 1

In the first experiment, Gaussian white noise with different standard deviations is added to several widely used natural images, shown in Figure 7. Five well-known methods are used for comparison: BM3D [26], EPLL [12], TNRD [27], WNNM [28], and WSNM [29]. The results of these methods and the proposed method (proposed for short) are reported in Tables 1–3, respectively.

In addition, the visual performance of the different methods is shown in Figure 8 to provide a subjective comparison. A small region, marked by the red rectangle, is enlarged to clearly display the visual differences.

3.2. Experiment 2

The second experiment is carried out under Gaussian noise and Rician noise. The test volume is a T1 brain phantom with a size of 181 × 217 × 181 and a slice thickness of 1 mm. Without loss of generality, it is assumed that all the images are normalized to real-valued signals in the intensity range [0, 1]. The experimental results are shown in Figures 9 and 10.

In Figures 9 and 10, the first row shows one slice of the original volume. Each column shows the lateral, coronal, sagittal, and cross-sectional results when the standard deviation is 0.05, 0.11, and 0.015, respectively. The quantitative results under OB-NLM3D [30], OB-NLM3D-WM [31], ODCT3D [32], PRI-NLM3D [33], BM4D [34], and the proposed method are reported in Tables 4 and 5, respectively.

The comparison of all the above experimental results strongly demonstrates the efficiency of the proposed method, both visually and quantitatively. In particular, among the algorithms considered, the PSNR and SSIM performance shows that the proposed method obtains better results as the noise level increases. In addition, better visual effects can be observed in the smoothness of flat areas, the preservation of details along edges, and the accurate restoration of the phantom intensity.

The computational cost of the proposed method is currently its main drawback. The main reason is that our method contains more processing steps than the other methods, although it is not the slowest among them. We expect this to be alleviated in the future by improved hardware; in addition, parallel computing methods (CUDA in our plan) can be used to reduce the time cost. This is work we will address in the future.

4. Conclusion

In this paper, an image denoising method based on BM4D in the 3D shearlet transform domain and a GAN is proposed. The proposed method makes full use of the sparse representation ability of the 3D shearlet transform and the strong learning ability of the generative adversarial network. All the experimental results confirm the effectiveness and accuracy of the proposed method, in terms of both subjective comparison and objective quantification, and strongly demonstrate its superiority over traditional image denoising methods.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by a Project of Shandong Province Higher Educational Science and Technology Program (J18KA362), the National Natural Science Foundation of China (61502282, 61902222), the Natural Science Foundation of Shandong Province (ZR2015FQ005), the Taishan Scholars Program of Shandong Province (tsqn201909109), and the Open Fund Project of Shandong Key Laboratory of Information Technology of Intelligent Mine in Shandong University of Science and Technology.