Spectral Normalized CycleGAN with Application in Semisupervised Semantic Segmentation of Sonar Images

Zhang, Zhisheng; Tang, Jinsong; Zhong, Heping; Wu, Haoran; Zhang, Peng; Ning, Mingqiang

doi:https://doi.org/10.1155/2022/1274260

Computational Intelligence and Neuroscience


Research Article | Open Access

Volume 2022 | Article ID 1274260 | https://doi.org/10.1155/2022/1274260

Zhisheng Zhang,¹ Jinsong Tang,¹ Heping Zhong,¹ Haoran Wu,¹ Peng Zhang,¹ and Mingqiang Ning¹

Academic Editor: Andrea Loddo

Received 19 Feb 2022

Revised 26 Mar 2022

Accepted 15 Apr 2022

Published 28 Apr 2022

Abstract

CycleGAN has been shown to outperform recent approaches for semisupervised semantic segmentation on public segmentation benchmarks. In contrast to analog images, however, acoustic images are unbalanced and often exhibit speckle noise. As a consequence, CycleGAN is prone to mode-collapse and cannot retain target details when applied directly to a sonar image dataset. To address this problem, a spectral normalized CycleGAN network is presented, which applies spectral normalization to both generators and discriminators to stabilize GAN training. Experimental results demonstrate that this simple yet effective method achieves reasonably accurate sonar target segmentation without using a pretrained model.

1. Introduction

Compared with real aperture side-scan sonar, synthetic aperture sonar (SAS) offers higher hydrographic surveying and charting speed and can produce higher resolution images [1–3]. The accurate detection and identification of underwater targets in synthetic aperture sonar images remains a significant issue [4–6]. However, according to the principle of imaging by backprojection [2], images of the same underwater target obtained from different views differ from one another and are complex in shape and contour [7], which makes them hard to label for a supervised detection method [8]. By contrast, semantic segmentation labels mark the outline of the target on the image and exclude the background area. Thus, accurate semantic segmentation is significant for identifying underwater targets in synthetic aperture sonar images and estimating their type, location, scale, direction, and so on [9, 10].

In recent years, deep convolutional neural networks (DCNNs) have been widely used in acoustic image processing. Supervised learning is a generally accepted method for acoustic image semantic segmentation [11]. Combined with specific modules based on the characteristics of sonar images, an improved network can achieve better segmentation results than the original structure. For instance, a recurrent residual convolutional neural network has been combined with a self-guidance module to help discriminate whether the input image is the segmentation result or the ground truth label [12]. An FCN has been combined with dilated convolution, a dense module, and inception [13]; this method decreases the number of network parameters and speeds up segmentation. A receptive field block and an attention search function have been integrated into a residual convolutional neural network, which helps enhance the contrast between the underwater target and the background [14].

However, the application of supervised learning to sonar image semantic segmentation is limited mainly because it requires a large amount of pixel-level labeled data, which is time-consuming to produce. Besides, a large amount of training data is often unavailable, as actual experiments are very expensive and often limited in scale. Thus, semisupervised learning is an important area of research for overcoming the problem of limited labeled data. To our knowledge, little work has been done to investigate semisupervised learning for sonar image semantic segmentation.

Since the generative adversarial network (GAN) was proposed [15], it has been widely used in the field of semisupervised learning. For example, SGAN (semisupervised GAN) [16] is used for multiobjective classification, CC-GAN (context-conditional GAN) [17] can generate oil painting images, and BUS-GAN [18] is applied to improve the segmentation quality of breast lesions from ultrasound images. Recently, the CycleGAN model [19] has become the mainstream choice for image style conversion between domains because it relaxes the requirement for paired images in the training process. It can be applied to semisupervised semantic segmentation by learning a bidirectional mapping from unlabeled real images to available ground truth labels. Jiang et al. first exploited this capability to transfer CT to MRI for lung cancer segmentation [20]. Mondal et al. leveraged cycle-consistency loss, which preserves critical attributes between the input and the transformed image, to add an unsupervised regularization effect that boosts segmentation performance when labeled data are limited [21]. Their experiments were conducted on three public semantic segmentation benchmarks: PASCAL VOC 2012 [22], Cityscapes [23], and the Automated Cardiac Diagnosis Challenge (ACDC) [24], on which the method proved more accurate than the traditional adversarial learning method.

However, we found that CycleGAN tends to generate the same type of segmentation results (mode-collapse) and fails to preserve target details when sonar image target samples are scarce and imbalanced. Previous research has demonstrated that constraining the Lipschitz constant of the discriminator mapping function (the L-constraint) can stabilize the training of GANs. The first method to satisfy the L-constraint was proposed by WGAN, which adds a gradient penalty term to the discriminator loss function. The disadvantage of this method is that it satisfies the L-constraint only approximately, and only if the number of categories in the training sample is small. Spectral normalization was proposed to limit the Lipschitz constant of the discriminator by limiting the spectral norm of each neural network layer. It satisfies the L-constraint accurately and does not need additional hyperparameter tuning. Moreover, compared with other normalization techniques, the computational cost of spectral normalization is relatively small. Therefore, it is reasonable to apply spectral normalization to both generators and discriminators.

The main contributions of this paper are as follows:
(1) We refine Mondal’s [21] semisupervised model and validate its efficiency for two acoustic image datasets. To our knowledge, this is the first investigation applying semisupervised learning for acoustic image segmentation.
(2) The spectral normalization is applied to both generator and discriminator to improve the training stability of the CycleGAN.
(3) We make two sonar image datasets, SCTDI and SCTDII. SCTDI contains 300 images of three types of targets (shipwreck, aircraft wreckage, and victims). SCTDII contains 800 images of the tiny targets. All images have a fixed resolution of 320 × 320 and 9.6 bits per pixel.

2. Related Work

In this section, we review work related to semisupervised segmentation from three aspects: the application of CycleGAN, techniques to stabilize the training of GANs, and recent work in semisupervised semantic segmentation.

2.1. The Application of CycleGAN

One of the applications of CycleGAN is image synthesis, which is widely used in medical imaging data augmentation.

Image synthesis refers to the mapping between different domains. The image domain in the medical field includes CT images and MRI images. Hiasa et al. [25] proposed a CT to MRI synthesis method using CycleGAN. They extended the CycleGAN approach by adding the gradient consistency loss to improve the accuracy at the boundaries. Huo et al. [26] proposed a novel end-to-end synthesis and segmentation network (EssNet). It can achieve the unpaired MRI to CT image synthesis and CT splenomegaly segmentation simultaneously. Without using manual labels on CT, it can alleviate the manual efforts.

Apart from data augmentation for medical images, CycleGAN can also be employed as a semantic segmentation network and detection framework in remote sensing images.

Dong et al. [27] estimated both segmentation results and monocular depth of three-dimensional (3D) images using CycleGAN, which is meaningful for the study of augmented reality (AR) and autonomous driving applications. Mondal et al. [21] proposed a strategy that enforces cycle consistency to learn a bidirectional mapping between unlabeled real images and real labels. Experiments on three different public segmentation benchmarks (PASCAL VOC 2012, Cityscapes, and ACDC) demonstrate the effectiveness of the proposed method, which outperforms recent approaches based on adversarial learning for semisupervised segmentation.

In the field of remote sensing image detection, CycleGAN has proven to be a generally accepted method for domain adaptation. For instance, CycleGAN is used to mitigate multisensor differences in a CNN-based unsupervised multiple-change detection approach proposed by Saha et al. [28]. Soto Vega et al. [29] applied the domain adaptation ability of CycleGAN to change detection tasks. This framework can employ previously trained classifiers for new data without a significant drop in classification accuracy. Yang et al. [30] proposed a change detection framework based on selective adversarial adaptation. Adversarial learning further reduces the distribution discrepancy between the target and selected source samples. They show that positive transfer is enhanced and negative transfer is alleviated.

In summary, this review shows the wide use of CycleGAN’s domain adaptation ability, which we apply to the semisupervised sonar image semantic segmentation task in our work.

2.2. Techniques to Stabilize Training of GAN

Goodfellow et al. [15] showed that, if both the generator and discriminator are powerful enough to approximate any real-valued function, training can in principle make the generator distribution converge to the data distribution. However, GANs can be hard to train, and in practice, it is often observed that gradient descent-based GAN optimization leads to divergence and mode-collapse. A possible explanation is that the network does not satisfy the L-constraint.

Researchers have tried to address this instability and improve generators through several techniques. Energy-based GAN [31] and Wasserstein GAN [32] attempt to modify the objective function to improve the quality of gradients. Neyshabur et al. [33] proposed stabilizing GAN training with multiple random projections, namely, training a single generator simultaneously against an array of discriminators, each of which sees only a low-dimensional projection of the data. Salimans et al. [34] proposed virtual batch-normalization and semisupervised learning to provide additional supervision to the generator.

In this paper, we use spectral normalization [38] to stabilize the training of CycleGAN for semisupervised learning.

2.3. Recent Work in Semisupervised Semantic Segmentation

Compared with supervised learning, semisupervised learning can achieve satisfying performance with a small set of labeled data. Research in this area is generally associated with consistency regularization, which has yielded groundbreaking results in semisupervised classification problems.

French et al. [35] investigated the conditions that can allow consistency regularization to operate in semisupervised semantic segmentation. Lai et al. [36] presented the context-aware consistency to address the problem that semisupervised models overly rely on the contexts available in the training data. Grubisic et al. [37] presented a method with one-way consistency for practical real-time applications.

In this paper, we enforce cycle-consistency to achieve satisfying segmentation results.

3. Methodology

In this section, domain mapping is first introduced to illustrate the unpaired domain adaptation ability of CycleGAN. The loss functions are then used to describe the optimization goals of CycleGAN in the semantic segmentation task. Last, we explain why applying spectral normalization to both the generator and discriminator of CycleGAN can stabilize training.

3.1. Domain Mapping

The domain adaptation ability has been explained clearly in [19]. In our work, the source domain refers to sonar images, including real images and generated images. Correspondingly, the target domain refers to labels, including ground truth labels and generated labels. Figure 1 shows examples of real images (first column), ground truth labels (second column), generated images (third column), and generated labels (fourth column) obtained for the three targets used in our experiments. Different palettes are used to distinguish them for convenience.

The types of domain mappings can be divided into unidirectional mappings (Figure 2) and circular mappings (Figure 3). The unidirectional mapping from sonar images to generated labels corresponds to sonar image semantic segmentation. The circular mappings enforce cycle consistency as a regularization to enhance the semisupervised semantic segmentation performance.
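As a toy illustration of a circular mapping, the sketch below passes a tiny image through a stand-in image-to-label generator and back through a stand-in label-to-image generator, then measures the L1 reconstruction error that a cycle-consistency term would penalize. The functions `g_is` and `g_si`, the threshold, and the intensity values are hypothetical placeholders chosen for illustration, not the networks used in this work.

```python
# Toy sketch of a circular mapping: image -> label -> image.
# g_is and g_si are hypothetical stand-ins for the two CycleGAN generators.

def g_is(image, threshold=0.5):
    """Stand-in segmentation generator: threshold each pixel into a 0/1 label."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]

def g_si(label, fg=0.9, bg=0.1):
    """Stand-in synthesis generator: paint foreground/background intensities."""
    return [[fg if lb == 1 else bg for lb in row] for row in label]

def l1_distance(a, b):
    """Mean absolute difference between two equally sized 2D grids."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count

image = [[0.1, 0.2, 0.8],
         [0.0, 0.9, 1.0]]

label = g_is(image)                # unidirectional mapping: image -> label
reconstructed = g_si(label)        # second half of the circular mapping
cycle_error = l1_distance(image, reconstructed)
```

A small `cycle_error` means the label retained enough information to rebuild the image, which is the property the circular mapping rewards even when no ground truth label is available for that image.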

3.2. Loss Functions

The semisupervised dataset contains three types of data: labeled images, unlabeled images, and the ground truth labels corresponding to the labeled images.

In this work, the total loss function, expression (1), follows the definition in [21].

The weighting coefficients of the individual loss terms are all constants, and their values are set as suggested in [21]. Optimizing this objective function drives our network to achieve reasonably accurate sonar target segmentation from limited labeled data.

Expression (1) consists of six training loss functions defined by Mondal et al. [21], which can be classified into three types: generator loss (orange part), discriminator loss/adversarial loss (green part), and cycle-consistency loss (red part), as shown in Figure 4.
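For orientation, the general shape of such a six-term objective can be sketched as below. The notation is an assumption made here for illustration: $G_{IS}$ and $G_{SI}$ denote the image-to-label and label-to-image generators, $D_S$ and $D_I$ the label and image discriminators, and $\lambda_1, \dots, \lambda_5$ the constant weights. The exact terms and constants should be taken from [21].

```latex
\begin{aligned}
\mathcal{L}_{total} ={}& \mathcal{L}^{S}_{gen}(G_{IS}) + \lambda_1\, \mathcal{L}^{I}_{gen}(G_{SI}) \\
 &+ \lambda_2\, \mathcal{L}^{S}_{cycle}(G_{IS}, G_{SI}) + \lambda_3\, \mathcal{L}^{I}_{cycle}(G_{IS}, G_{SI}) \\
 &- \lambda_4\, \mathcal{L}^{S}_{disc}(G_{IS}, D_S) - \lambda_5\, \mathcal{L}^{I}_{disc}(G_{SI}, D_I), \\[4pt]
(G_{IS}^{*}, G_{SI}^{*}) ={}& \arg\min_{G_{IS},\, G_{SI}} \; \max_{D_S,\, D_I} \; \mathcal{L}_{total}.
\end{aligned}
```

In this sketch the two generator terms use the labeled pairs, the two cycle terms can also use unlabeled images, and the two discriminator terms enter with a negative sign because the discriminators are trained adversarially against the generators.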

3.3. Spectral Normalization

Previous research has proved that constraining the Lipschitz constant of the discriminator’s mapping function can stabilize GAN training [3237]. The reason why applying the spectral normalization to the CycleGAN network can satisfy the L-constraint is given as follows.

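For one layer of a fully connected neural network, the L-constraint is defined as follows:

$$\| f(W x_1 + b) - f(W x_2 + b) \| \le C(W, b)\, \| x_1 - x_2 \|. \tag{4}$$

Here, $f$ is the activation function, $W$ is the network parameter matrix, $b$ is the bias, $C(W, b)$ is a constant that depends on $W$ and $b$, and $x_1$ and $x_2$ are the inputs.

When $x_1$ and $x_2$ are close enough, equation (4) can be approximated by the first-order term as follows:

$$\left\| \frac{\partial f}{\partial x}\, W (x_1 - x_2) \right\| \le C(W, b)\, \| x_1 - x_2 \|. \tag{5}$$

Because we use ReLU as the activation function, $\| \partial f / \partial x \| \le 1$; equation (5) can be written as follows:

$$\| W (x_1 - x_2) \| \le C\, \| x_1 - x_2 \|. \tag{6}$$

Here, the minimum value of $C$ is equal to the spectral norm of the network parameter matrix $W$, and the definition is as follows:

$$\| W \|_2 = \max_{x \ne 0} \frac{\| W x \|}{\| x \|} = \sigma_{\max}(W). \tag{7}$$

Here, $\sigma_{\max}(W)$ is the maximum singular value of $W$.

According to [38], the relation between the output and input of the whole network can be written as follows:

$$f(x) = f_N\big(W_N\, f_{N-1}(W_{N-1} \cdots f_1(W_1 x))\big). \tag{8}$$

Here, $f_i$ is the activation function of layer $i$, $N$ is the number of layers of the network, $x$ is the input data, and $i$ is any single layer among the $N$ layers.

Taking the gradient on both sides of equation (8),

$$\| \nabla_x f \| \le \prod_{i=1}^{N} \| \nabla f_i \| \, \| W_i \|_2. \tag{9}$$

Here, $\nabla$ is the gradient operator and $\| W_i \|_2$ is the spectral norm of the network parameter matrix of layer $i$. $\| \nabla f_i \| \le 1$ because the activation function used in the CycleGAN is ReLU. Therefore, equation (9) can be written as follows:

$$\| \nabla_x f \| \le \prod_{i=1}^{N} \| W_i \|_2. \tag{10}$$

Finally, both sides of equation (10) are divided by $\prod_{i=1}^{N} \| W_i \|_2$, namely, spectral normalization, which makes the network satisfy the L-constraint:

$$\frac{\| \nabla_x f \|}{\prod_{i=1}^{N} \| W_i \|_2} \le 1. \tag{11}$$

This means that spectral normalization of each layer helps the whole network satisfy the L-constraint.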
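To make the per-layer operation concrete, the following minimal sketch estimates the spectral norm (largest singular value) of a weight matrix by power iteration and divides the matrix by it. It is an illustration in plain Python under simplifying assumptions (a small dense matrix, many iterations run to convergence); the helper names are hypothetical and this is not the training code used in this work.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def spectral_norm(w, n_iter=100, seed=0):
    """Estimate the largest singular value of w by power iteration on w^T w."""
    rng = random.Random(seed)
    wt = transpose(w)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    for _ in range(n_iter):
        u = matvec(w, v)               # u = W v
        v = matvec(wt, u)              # v = W^T W v
        n = norm(v)
        v = [x / n for x in v]         # keep v at unit length
    return norm(matvec(w, v))          # ||W v|| for unit v approximates sigma_max

def spectral_normalize(w, **kwargs):
    """Return w divided by its estimated spectral norm."""
    sigma = spectral_norm(w, **kwargs)
    return [[x / sigma for x in row] for row in w]

w = [[3.0, 0.0],
     [0.0, 1.0]]
sigma = spectral_norm(w)               # largest singular value of diag(3, 1)
w_sn = spectral_normalize(w)           # normalized layer weights
```

After normalization the largest singular value of `w_sn` is 1, so this linear layer cannot stretch any input difference. Practical implementations such as the one described in [38] typically run only one or a few power iterations per training step and reuse the vector between steps to keep the cost low.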

The network of the generator and the discriminator which apply spectral normalization is shown in Figures 5 and 6. The architecture is based on ResNet [39], which has four layers. CSN is Conv spectral normalization, which applies spectral normalization to the conv block, BN is batch-normalization, and ReLU is the activation function. Classifier changes the input features into generated labels or images.

4. Experiments and Analysis

4.1. Sonar Image Datasets

The SCTDI dataset was built on the basis of the SCTD dataset [8] by adding segmentation labels and dropping images that were too similar. It is composed of 300 images in three categories (aircraft wrecks, shipwrecks, and victims), which are randomly divided into training (270 images) and validation (30 images) subsets.

The SCTDII dataset was acquired from the Klein series 5000 side-scan sonar; the website is “https://www.kleinsonar.com”. It is composed of 800 images of tiny targets, which are randomly divided into training (720 images) and validation (80 images) subsets.

All images have a fixed resolution of 320 × 320 and 9.6 bits per pixel. To reduce memory requirements, images from both datasets are resized so that their short edge is 200 pixels before being fed into the network.

4.2. Evaluation Protocol

The mean intersection over union (mIoU) metric [40] is used to evaluate the segmentation results of all the models (supervised model, AdvSemSeg, MT-CutMix, CycleGAN, and ours). It is defined as follows:

$$\mathrm{mIoU} = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k + FN_k},$$

where $K$ is the number of classes and $TP_k$, $FP_k$, and $FN_k$ are the true positive, false positive, and false negative pixels of class $k$, respectively, determined over the whole validation set. The larger the value of mIoU, the better the result of semantic segmentation.
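The metric can be illustrated with a short sketch that counts true positives, false positives, and false negatives per class over flattened pixel labels and averages the per-class IoU. The function name and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for this example.

```python
def mean_iou(pred, truth, num_classes):
    """mIoU over flat sequences of class indices.

    Classes absent from both pred and truth are skipped (an assumption of this sketch).
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, truth) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, truth) if p != c and t == c)
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Six pixels, two classes (0 = background, 1 = target).
truth = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, truth, num_classes=2)
```

In this example each class has 2 true positives, 1 false positive, and 1 false negative, so both per-class IoU values are 2/4 and `score` is 0.5.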

4.3. Results

In this section, the training effects of our method on the sonar image datasets SCTDI and SCTDII are first described. We also compare the performance of spectral normalization applied to the ResNet with other network structures. Besides, we show the comparison between spectral normalization and the other stabilization methods.

The supervised training results, obtained using all the labeled images, serve as a benchmark. Three state-of-the-art semisupervised methods and our model are trained on the same training subsets, in which 10%, 20%, 30%, 40%, and 50% of the images are labeled. All methods are trained without using a pretrained model to ensure an unbiased comparison.
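The labeled fractions can be pictured with the sketch below, which randomly marks a given share of the training images as labeled and treats the rest as unlabeled. The function name, the fixed seed, and the use of the remaining images as the unlabeled pool are assumptions for illustration; the actual sampling procedure is not specified here.

```python
import random

def split_labeled(train_ids, labeled_fraction, seed=0):
    """Randomly mark a fraction of the training images as labeled; the rest stay unlabeled."""
    rng = random.Random(seed)
    ids = list(train_ids)
    rng.shuffle(ids)
    n_labeled = round(len(ids) * labeled_fraction)
    return ids[:n_labeled], ids[n_labeled:]

train_ids = range(270)  # size of the SCTDI training subset stated in Section 4.1
splits = {f: split_labeled(train_ids, f) for f in (0.1, 0.2, 0.3, 0.4, 0.5)}
sizes = {f: (len(lab), len(unlab)) for f, (lab, unlab) in splits.items()}
```

With 270 training images, the 10% setting leaves only 27 labeled images, which is the regime where the regularization from unlabeled data matters most.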

Tables 1 and 2 compare the semisupervised semantic segmentation accuracy (mIoU/%) of our model with other state-of-the-art methods on the SCTDI and SCTDII datasets. The results suggest that the proposed model outperforms the other methods in all cases when training with a reduced set of labeled images. Furthermore, this difference is particularly significant when pixel-level annotations are scarce (i.e., 10% and 20% of the whole training set), where the proposed model achieves an improvement of 7%–11%.

The visual comparison of segmentation results is shown in Figure 7. It shows that the proposed method predicts a segmentation closer to the ground truth than other state-of-the-art methods where labeled images are limited. In addition, our model seems to capture better details like legs of persons, wings of planes, and so on. More segmentation results of different shapes and numbers are shown in Figures 8 and 9. Therefore, our approach seems robust when applied to semisupervised segmentation of acoustic images.

Table 3 shows a comparison of the semisupervised semantic segmentation accuracy (mIoU/%) when different network structures are chosen for both the generator and the discriminator. These results suggest that the benefit of spectral normalization does not depend on the chosen network structure and that ResNet has the best performance.

Table 4 shows a comparison of the semisupervised semantic segmentation accuracy (mIoU/%) between spectral normalization and the other stabilization methods. The results indicate that spectral normalization has the best performance and could be a reasonable approach to the segmentation task when only limited labeled data are available.

4.4. Ablation Study

To further analyze the effect of the different components of the proposed model, we conduct an ablation study. The results of our ablation study are summarized in Table 5, and the visual comparisons of different methods are in Figure 10. The previous model is CycleGAN. Method 1 refers to applying transfer learning to the CycleGAN. Method 2 refers to applying spectral normalization only to the discriminator. Method 3 refers to applying spectral normalization to both the discriminator and the generator. Method 4 refers to adding a pretrained model to method 3.

The proposed model, which uses spectral normalization without transfer learning, reaches an mIoU value of 0.6437. If we remove the spectral normalization on the generator, this value is reduced to 0.4517. Removing the spectral normalization from the CycleGAN entirely leads to an even lower accuracy of 0.4138, suggesting that the spectral normalization on segmentation masks has a more substantial impact on the model. Besides, we observe that adding a pretrained model to the proposed model and to CycleGAN improves the accuracy of the segmentation results only slightly, from 0.6437 to 0.6471 and from 0.4138 to 0.4229, respectively.
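The size of each effect can be read off with simple arithmetic on the mIoU values quoted above. The dictionary keys in this sketch are descriptive names chosen here, not labels from Table 5.

```python
# mIoU values quoted in the ablation discussion.
miou = {
    "cyclegan": 0.4138,
    "cyclegan_pretrained": 0.4229,
    "sn_discriminator_only": 0.4517,
    "sn_both": 0.6437,
    "sn_both_pretrained": 0.6471,
}

# Gain from adding spectral normalization to the generator as well.
gain_sn_generator = miou["sn_both"] - miou["sn_discriminator_only"]
# Gain from spectral normalization on both networks over plain CycleGAN.
gain_sn_total = miou["sn_both"] - miou["cyclegan"]
# Gains from adding a pretrained model.
gain_pretrained_proposed = miou["sn_both_pretrained"] - miou["sn_both"]
gain_pretrained_cyclegan = miou["cyclegan_pretrained"] - miou["cyclegan"]
```

The differences are about 0.192 and 0.230 for spectral normalization, against about 0.003 and 0.009 for the pretrained model, which is why the pretrained model is described as giving only a marginal benefit.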

5. Conclusion

This paper presented an improved semisupervised semantic segmentation method for sonar images based on the CycleGAN network combined with spectral normalization. Spectral normalization is applied to both the generator and the discriminator to solve the problem that the generator tends to generate the same segmentation results when labeled data are limited. The experimental results show that this strategy can improve the performance of semisupervised segmentation, especially when labeled data are scarce. The segmentation results are robust for underwater objects with different shapes and numbers without transfer learning.

Data Availability

The link is “https://github.com/freepoet/SCTD.”

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant Nos. 42176187 and 41906162).

References

[1] P. Cervenka and C. de Moustier, “Sidescan sonar image processing techniques,” IEEE Journal of Oceanic Engineering, vol. 18, no. 2, pp. 108–122, 1993.
[2] D. Marx, M. Nelson, E. Chang, W. Gillespie, A. Putney, and K. Warman, “An introduction to synthetic aperture sonar,” in Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing (Cat. No.00TH8496), pp. 717–721, Pocono Manor, PA, USA, August 2000.
[3] A. Putney, E. Chang, R. Chatham, D. Marx, M. Nelson, and L. K. Warman, “Synthetic aperture sonar-the modern method of underwater remote sensing,” IEEE Aerospace Conference Proceedings (Cat. No.01TH8542), vol. 4, pp. 4/1749–1754/1756, 2001.
[4] Xu Jia, X. Jiang, J. Tang, L. Lu, and J. Zhang, “The research of underwater target imaging with high moving sonar based on synthetic aperture method,” in Proceedings of the MTS/IEEE Oceans 2001. An Ocean Odyssey. Conference proceedings, vol. 2, pp. 995–1000, Honolulu, HI, USA, November 2001.
[5] M. Sung, M. Lee, B. Kim, and S.-C. Yu, “Imaging-sonar-based underwater object recognition utilizing object's yaw angle estimation with deep learning,” IFAC-PapersOnLine, vol. 53, no. 2, pp. 15475–15480, 2020.
[6] J. D. R. Vera, E. Coiras, J. Groen, and B. Evans, “Automatic target recognition in synthetic aperture sonar images based on geometrical feature extraction,” EURASIP Journal on Applied Signal Processing, vol. 2009, pp. 1–9, 2009.
[7] Y. Petillot, Y. Pailhas, and J. Sawas, Eds., Target Recognition in Synthetic Aperture Sonar and High Resolution Side Scan Sonar Using AUVs, Citeseer, Pennsylvania, 2010.
[8] P. Zhang, J. Tang, H. Zhong, M. Ning, D. Liu, and K. Wu, “Self-trained target detection of radar and sonar images using automatic deep learning,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2021.
[9] I. Karoui, R. Fablet, J.-M. Boucher, and J.-M. Augustin, “Seabed segmentation using optimized statistics of sonar textures,” IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 6, pp. 1621–1631, 2009.
[10] R. E. Hansen, H. J. Callow, and T. O. Sabo, “Challenges in seafloor imaging and mapping with synthetic aperture sonar,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 10, pp. 3677–3687, 2011.
[11] H. Xu, L. Zhang, M. J. Er, and Q. Yang, “Underwater sonar image segmentation based on deep learning of receptive field block and search attention mechanism,” in Proceedings of the 2021 4th International Conference on Intelligent Autonomous Systems (ICoIAS), pp. 44–48, IEEE, Wuhan, China, May 2021.
[12] M. Rahnemoonfar and D. Dobbs, “Semantic segmentation of underwater sonar imagery with deep learning,” in Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 9455–9458, Yokohama, Japan, August 2019.
[13] M. Wu, Q. Wang, E. Rigall et al., “ECNet: efficient convolutional networks for side scan sonar image segmentation,” Sensors, vol. 19, no. 9, 2019.
[14] F. Yu, B. He, K. Li et al., “Side-scan sonar images segmentation for AUV with recurrent residual convolutional neural network module and self-guidance module,” Applied Ocean Research, vol. 113, p. 102608, 2021.
[15] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza et al., “Generative adversarial nets,” in Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, pp. 2672–2680, Montreal, Quebec, Canada, December 8-13 2014.
[16] A. Odena, Semi-supervised Learning with Generative Adversarial Networks, 2016, https://arxiv.org/abs/1606.01583.
[17] E. Denton, S. Gross, and R. Fergus, Semi-supervised Learning with Context-Conditional Generative Adversarial Networks, 2016, https://arxiv.org/abs/1611.06430.
[18] L. Han, Y. Huang, H. Dou et al., “Semi-supervised segmentation of lesion from breast ultrasound images with attentional generative adversarial network,” Computer Methods and Programs in Biomedicine, vol. 189, Article ID 105275, 2020.
[19] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, ICCV, pp. 2242–2251, Venice, Italy, October 2017.
[20] J. Jiang, Y.-C. Hu, N. Tyagi et al., “Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation,” Medical Image Computing and Computer Assisted Intervention – MICCAI 2018, Springer International Publishing, Cham, pp. 777–785, 2018.
[21] A. K. Mondal, A. Agarwal, J. Dolz et al., “Revisiting CycleGAN for semi-supervised segmentation,” CoRR, abs/1908.11569, 2019.
[22] M. Everingham, L. van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The pascal visual object classes (voc) challenge,” International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010.
[23] M. Cordts, M. Omran, S. Ramos et al., “The cityscapes dataset for semantic urban scene understanding,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223, Las Vegas, NV, USA, June 2016.
[24] O. Bernard, A. Lalande, C. Zotti et al., “Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved?” IEEE Transactions on Medical Imaging, vol. 37, no. 11, pp. 2514–2525, 2018.
[25] Y. Hiasa, Y. Otake, M. Takao et al., Cross-modality Image Synthesis from Unpaired Data Using CycleGAN: Effects of Gradient Consistency Loss and Training Data Size, 3/18/2018, http://arxiv.org/pdf/1803.06629v3.
[26] Y. Huo, Z. Xu, S. Bao et al., Splenomegaly Segmentation using Global Convolutional Kernels and Conditional Generative Adversarial Networks, 2017, http://arxiv.org/pdf/1712.00542v1.
[27] X. Dong, Y. Lei, S. Tian et al., “Synthetic MRI-aided multi-organ segmentation on male pelvic CT using cycle consistent deep attention network,” Radiotherapy & Oncology, vol. 141, pp. 192–199, 2019.
[28] S. Saha, F. Bovolo, and L. Bruzzone, in Proceedings of the IEEE International Geoscience & Remote Sensing Symposium: Proceedings, Yokohama, Japan, Piscataway, NJ, August 2019.
[29] P. J. Soto Vega and G. A. O. P. d Costa, “An unsupervised domain adaptation approach for change detection and its application to deforestation mapping in tropical biomes,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 181, pp. 113–128, 2021.
[30] M. Yang, L. Jiao, B. Hou, F. Liu, and S. Yang, “Selective adversarial adaptation-based cross-scene change detection framework in remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 3, pp. 2188–2203, 2021.
[31] J. Zhao, M. Mathieu, and Y. LeCun, Energy-based Generative Adversarial Network, 9/11/2016, http://arxiv.org/pdf/1609.03126v4.
[32] I. Gulrajani, F. Ahmed, M. Arjovsky et al., “Improved training of Wasserstein GANs,” in Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp. 5767–5777, Long Beach, CA, USA, December 2017.
[33] B. Neyshabur, S. Bhojanapalli, and A. Chakrabarti, Stabilizing GAN Training with Multiple Random Projections, 5/23/2017, http://arxiv.org/pdf/1705.07831v2.
[34] T. Salimans, I. Goodfellow, W. Zaremba et al., “Improved techniques for training gans,” Advances in Neural Information Processing Systems, vol. 29, 2016.
[35] G. French, S. Laine, T. Aila et al., Semi-supervised Semantic Segmentation Needs Strong, Varied Perturbations, 2019, http://arxiv.org/pdf/1906.01916v5.
[36] X. Lai, Z. Tian, L. Jiang et al., “Semi-supervised semantic segmentation with directional context-aware consistency,” in Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1205–1214, Nashville, TN, USA, June 2021.
[37] I. Grubisic, M. Orsic, and S. Segvic, “A baseline for semi-supervised learning of efficient semantic segmentation models,” in Proceedings of the 2021 17th International Conference on Machine Vision and Applications (MVA), pp. 1–5, IEEE, Aichi, Japan, July 2021.
[38] T. Miyato, T. Kataoka, M. Koyama et al., Spectral Normalization for Generative Adversarial Networks, 2018, https://arxiv.org/abs/1802.05957.
[39] K. He, X. Zhang, S. Ren et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
[40] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea et al., A Review on Deep Learning Techniques Applied to Semantic Segmentation, 2017, https://arxiv.org/abs/1704.06857.
[41] W.-C. Hung, Y.-H. Tsai, Y.-T. Liou et al., Adversarial Learning for Semi-Supervised Semantic Segmentation, 2018, http://arxiv.org/pdf/1802.07934v2.
[42] Y. Wang, Q. Zhou, J. Liu et al., LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation, 5/7/2019, http://arxiv.org/pdf/1905.02423v3.
[43] A. Paszke, A. Chaurasia, S. Kim et al., ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation, 6/7/2016, http://arxiv.org/pdf/1606.02147v1.
[44] O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” Lecture Notes in Computer Science, Springer International Publishing, Cham, vol. 9351, pp. 234–241, 2015.

Copyright

Copyright © 2022 Zhisheng Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
