Abstract
Under the interference of complex underwater terrain and light refraction, the feature information of small-scale targets is severely degraded. Moreover, the unbalanced distribution of underwater target samples affects the accuracy of spatial semantic feature extraction. To address these problems, this paper proposes a dynamic multiscale feature fusion method for underwater target recognition. Firstly, a multiscale information noise-contrastive estimation (MS-InfoNCE) loss is used to extract the significant features of the target at four scales. Secondly, the method learns the spatial semantic features of the target through a dynamic conditional probability matrix. Finally, different feature fusion mechanisms are designed for targets of different scales, dynamically fusing multiscale significant features and spatial semantic features to recognize weak underwater targets. The experimental results show that the recognition accuracy of the proposed algorithm is 1.38% higher than that of existing algorithms when recognizing underwater distorted targets.
1. Introduction
The underwater environment is poorly lit and the background is complex, so targets are easily submerged in the background. In the absence of data labels, AUVs have difficulty recognizing weak targets with very low signal-to-noise ratios. Unsupervised representation learning can extract significant features that distinguish targets, and downstream tasks such as recognition and classification can then be accomplished effectively from these features. However, targets at multiple scales may exist simultaneously in the underwater images acquired by an AUV, and small-scale targets tend to disappear during downsampling. Multiscale features can improve the recognition accuracy of small-scale targets, but they significantly increase the computational cost of the algorithm [1–4].
Small-scale targets are more susceptible to environmental disturbance. Underwater images are distorted by the refraction of light, so the AUV cannot extract the complete significant features of a target. A graph neural network can learn the spatial semantic features of targets from a correlation matrix, which compensates for the missing features of distorted targets. The traditional correlation matrix is usually derived from the label cooccurrence relationships of the training set [5–7]. However, different target images are not equally easy to acquire in the underwater environment, so the labels in the training set are unevenly distributed. Moreover, some rare cooccurrence relationships may be noisy. In this case, a correlation matrix constructed from label cooccurrence relationships has clear limitations.
As shown in Figure 1, to address the above problems, this paper proposes a dynamic multiscale feature fusion method for underwater target recognition. Firstly, the significant features of the target are extracted at multiple scales, maximizing the retention of feature information of small-scale targets. Secondly, the method learns the spatial semantic relations of the target, which are used to compensate for the missing features caused by factors such as image distortion. Finally, the proposed algorithm dynamically fuses multiscale significant features and spatial semantic features for target recognition.

The main contributions of this paper are as follows: (1) The proposed algorithm uses a multiscale InfoNCE loss to extract the significant features of the target at four scales. In the absence of data labels, the method can extract discriminative features of targets at different scales. (2) The label cooccurrence correlation matrix is improved. This paper constructs a new dynamic conditional probability correlation matrix using the data labels of the training set and of the current training batch. When labels are unevenly distributed in the dataset, the matrix can effectively model the spatial semantic relationships of targets. (3) This paper proposes a dynamic multiscale feature fusion mechanism. The method dynamically fuses multiscale significant features and spatial semantic features according to the target scale, which reduces the recognition time of the algorithm while improving its recognition accuracy.
2. Related Works
The refraction of light makes it difficult for the AUV to extract the complete significant features of the target. Spatial semantic features can compensate for the missing features. Yang and Zhou [8] proposed combining structured semantic relevance to solve the problem of missing labels in multilabel learning; this method captures the structured correlations between categories by constructing a semantic graph of the images. Yan et al. [9] proposed a feature attention network (FAN) containing a feature refinement network and a relevance learning network, which can address inconsistent object scales and unbalanced category labels. Li et al. [10] used a graph convolutional network and adaptive labeled graphs to learn label correlation; the method generates adaptive labeled graphs through two convolutional layers. Yun et al. [11] proposed a dual aggregated feature pyramid network for multilabel classification, which requires no region proposals and significantly reduces the computational burden. To address the difficulty of correctly classifying classes with complex features but few samples, Zhi [12] proposed an end-to-end convolutional neural network based on a multipath structure. Wang et al. [13] proposed a multilabel classification method that learns through privileged information, using similarity constraints to capture the relationship between available and privileged information and ranking constraints to capture the dependencies between multiple labels. To improve convergence speed, Cai et al. [14] designed an effective outer space acceleration algorithm (GAMP); experimental results show that it has higher computational efficiency. Gao and Zhou [15] designed a multicategory attention area module aimed at keeping the number of attention areas as small as possible while maintaining the diversity of these areas.
Multiscale significant features contain richer feature information, which facilitates the recognition of weak targets with very low signal-to-noise ratios. Ma et al. [16] proposed a multiscale spatial context-based semantic edge detection depth network (MSC-SED), which obtains rich multiscale features while enhancing high-level feature details. Guo et al. [17] proposed a radar target recognition method based on a feature pyramid fusion lightweight CNN, which can improve the accuracy and robustness of radar target recognition under low signal-to-noise ratio conditions. Ju et al. [18] proposed adaptive feature fusion with attention mechanism (AFFAM) for multiscale target detection, which can adaptively learn the importance and relevance of features at different scales. Jiang et al. [19] proposed a multiscale metric learning method (MSML) for small-sample learning that extracts multiscale features and learns multiscale relationships between samples. Wang et al. [20] proposed an unsupervised multiview representation learning method that eliminates the differences in multiview data caused by differing distributions. To address the inadequate performance of cross-entropy loss in the small-sample case, Lee and Yoo [21] enhanced the feature extraction network through contrastive learning. Chen et al. [22] extended existing contrastive learning algorithms by embedding an attention mechanism, which improves the learning efficiency and generalization ability of the algorithm.
To address underwater environmental interference and real-time requirements, Cai et al. [23] proposed a collaborative multi-AUV target recognition method based on transfer reinforcement learning. Sun and Cai [24] proposed a multi-AUV target recognition method based on GAN-metalearning; experimental results show that the method improves the generalization ability of the model. Cai et al. [25] proposed a maneuvering target recognition method based on multiview optical field reconstruction, which makes the recognition result insensitive to shooting angle. Chen et al. [26] proposed a new iterative visual inference framework that recognizes targets using both convolutional features and semantic features of images, effectively improving target recognition accuracy. To avoid redundant computation of data, Cai et al. [27] proposed a multiview optical field reconstruction method based on transfer reinforcement learning. Qin et al. [28] proposed a feature pyramid-based target detection algorithm that addresses the low accuracy of small-size target detection. Cai et al. [29] proposed an underwater distortion target recognition network that compensates for missing significant features with spatial semantic information, effectively improving the accuracy of recognizing underwater distorted targets.
This paper proposes a dynamic multiscale feature fusion method for underwater target recognition. Firstly, this paper constructs a multiscale significant feature extraction network that extracts the significant features of the target at four scales through a multiscale InfoNCE loss. Secondly, this paper establishes a dynamic conditional probability matrix between underwater targets to learn their spatial semantic features, compensating for the missing features of distorted targets. Finally, the proposed algorithm dynamically fuses multiscale significant features and spatial semantic features according to the target scale for target recognition.
3. Multiscale Significant Feature Extraction Model
The low light and complex background of the underwater environment cause weak target intensity in the images acquired by the AUV. Moreover, the target is distorted by the interference of light refraction. This paper enhances the original samples by color transformation, random cropping, and Gaussian blur. Different enhanced views from the same image are positive samples; those from different images are negative samples. This paper constructs a feature library to store all enhanced samples generated during training. For an image input to the feature extraction network, the feature library therefore contains positive samples from the same image and negative samples from different images. The multiscale significant feature extraction model is shown in Figure 2.
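To make the sampling scheme concrete, the following PyTorch-style sketch generates two augmented views of one image with the three augmentations named above; the specific transform parameters are illustrative assumptions, not values taken from the paper.

```python
import torch
from torchvision import transforms

# Two random views of the same image form a positive pair; views of
# different images are negatives. Parameter values here are assumptions.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),         # random cropping
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),                  # color transformation
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),   # Gaussian blur
    transforms.ToTensor(),
])

def two_views(pil_image):
    """Return a positive pair (query view, key view) for one image."""
    return augment(pil_image), augment(pil_image)
```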

This paper uses the ResNet network as the base network for the multiscale significant feature extraction model $f_q$. The network can be divided into 5 layers, denoted as $\{C_1, C_2, \ldots, C_5\}$, and the features extracted by each layer can be expressed as $\{F_1, F_2, \ldots, F_5\}$. This paper extracts the significant features of network layers 2-5, denoted as $\{F_2, F_3, F_4, F_5\}$. Each feature vector is projected nonlinearly through a fully connected layer to give the vector $z_i$. This paper trains the model using a multiscale InfoNCE loss function. The cosine similarity of the samples is

$$s(q, k) = \frac{g(q) \cdot g'(k)}{\|g(q)\|\,\|g'(k)\|},$$

where $g(q)$ denotes the nonlinear projection of the input image representation vector and $g'(k)$ denotes the nonlinear projection of the positive and negative sample representations in the feature library. The multiscale InfoNCE loss function of the model is

$$\mathcal{L}_{\mathrm{MS}} = -\sum_{i=2}^{5} w_i \log \frac{\exp\!\left(s(q_i, k_i^{+})/\tau\right)}{\exp\!\left(s(q_i, k_i^{+})/\tau\right) + \sum_{k^{-}} \exp\!\left(s(q_i, k^{-})/\tau\right)},$$

where $w_i$ denotes the weights of features at different scales, $q_i$ is the feature representation of the input image at each scale, $k_i^{+}$ denotes positive samples, $k^{-}$ are negative samples, and the temperature $\tau$ scales the similarity measure of the image representations.
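As an illustration of the loss above, the following sketch computes a multiscale InfoNCE loss over projected embeddings at scales 2-5 in the MoCo style; the per-scale weights, temperature, and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def ms_info_nce(queries, pos_keys, neg_keys, w=(0.1, 0.2, 0.3, 0.4), tau=0.07):
    """Multiscale InfoNCE loss over scales 2-5.

    queries[i]:  (B, D) projected query embeddings at scale i
    pos_keys[i]: (B, D) projected positive-sample embeddings at scale i
    neg_keys[i]: (K, D) projected negatives from the feature library
    w and tau are illustrative per-scale weights and temperature.
    """
    loss = 0.0
    for q, k_pos, k_neg, w_i in zip(queries, pos_keys, neg_keys, w):
        q = F.normalize(q, dim=1)            # cosine similarity via
        k_pos = F.normalize(k_pos, dim=1)    # normalized dot products
        k_neg = F.normalize(k_neg, dim=1)
        l_pos = (q * k_pos).sum(dim=1, keepdim=True)       # (B, 1)
        l_neg = q @ k_neg.t()                              # (B, K)
        logits = torch.cat([l_pos, l_neg], dim=1) / tau
        labels = torch.zeros(q.size(0), dtype=torch.long,
                             device=q.device)              # positive at index 0
        loss = loss + w_i * F.cross_entropy(logits, labels)
    return loss
```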
The feature library can make the number of negative samples larger and improve the training effect. However, this also increases the difficulty of updating the feature library encoder $f_k$. This paper dynamically updates the feature library encoder $f_k$ from the encoder $f_q$. Denoting the parameters of $f_k$ and $f_q$ as $\theta_k$ and $\theta_q$, respectively, $\theta_k$ is updated as

$$\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q,$$

where $m$ is the momentum coefficient. During training, $f_q$ updates its parameters by stochastic gradient descent; whenever $\theta_q$ is updated, $\theta_k$ is updated in the above way. After training is completed, the encoder $f_q$ can extract the multiscale significant features of an image, which can be expressed as

$$\{F_2, F_3, F_4, F_5\} = f_q(x),$$

where $x$ is the input test image and $f_q$ is the trained encoder.
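The momentum update is a simple exponential moving average of the query-encoder parameters; a minimal sketch follows, with $m = 0.999$ as an assumed value.

```python
import torch

@torch.no_grad()
def momentum_update(f_q, f_k, m=0.999):
    """theta_k <- m * theta_k + (1 - m) * theta_q (no gradient flows to f_k)."""
    for p_q, p_k in zip(f_q.parameters(), f_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```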
4. Dynamic Spatial Semantic Feature Extraction Model
Due to the uneven distribution of labels in the dataset, this paper designs a dynamic conditional probability correlation matrix to represent the semantic correlation between targets. The entry $P_{ij}$ of the dynamic conditional probability matrix represents the conditional probability that label $L_j$ appears when label $L_i$ appears. Compared with the traditional correlation matrix, the dynamic conditional probability matrix is asymmetric, i.e., $P_{ij} \neq P_{ji}$. This paper also calculates local conditional probabilities over the current batch of training data, which further increases the robustness of the semantic relation model. The cooccurrences of targets are counted separately in the training set and in the current training batch to obtain the static cooccurrence matrix $M$ and the local cooccurrence matrix $\tilde{M}$. The conditional probability matrix between targets is then

$$P_{ij} = \alpha\,\frac{M_{ij}}{N_i} + \beta\,\frac{\tilde{M}_{ij}}{\tilde{N}_i},$$

where $\alpha$ and $\beta$ are the weights, $M_{ij}$ denotes the number of simultaneous occurrences of targets $i$ and $j$ in the training set, $N_i$ denotes the number of occurrences of target $i$ in the training set, $\tilde{M}_{ij}$ indicates the number of simultaneous occurrences of $i$ and $j$ in the current batch, $\tilde{N}_i$ is the number of occurrences of $i$ in the current batch, and $P_{ij}$ denotes the probability that label $L_j$ appears when label $L_i$ appears. Since some rare cooccurrence relations may be noisy, a probability threshold $\epsilon$ is set to filter the noise. The filtered matrix is

$$A_{ij} = \begin{cases} 0, & P_{ij} < \epsilon, \\ P_{ij}, & P_{ij} \geq \epsilon. \end{cases}$$
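A minimal sketch of the dynamic conditional probability matrix as reconstructed above; the weights `alpha` and `beta` and the threshold `eps` are assumed hyperparameters.

```python
import numpy as np

def dynamic_correlation(labels_train, labels_batch, alpha=0.7, beta=0.3, eps=0.1):
    """labels_*: (N, C) binary multilabel matrices.

    Returns the thresholded matrix A, where A[i, j] approximates
    P(label j | label i), mixing global and per-batch statistics.
    """
    def cond_prob(y):
        M = y.T @ y                        # M[i, j]: cooccurrences of i and j
        N = np.maximum(y.sum(axis=0), 1)   # N[i]: occurrences of label i
        return M / N[:, None]              # row-normalized: P(j | i)

    P = alpha * cond_prob(labels_train) + beta * cond_prob(labels_batch)
    A = np.where(P < eps, 0.0, P)          # filter noisy rare cooccurrences
    return A
```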
This paper constructs a spatial semantic feature extraction network, as shown in Figure 3. The network utilizes the dynamic conditional probability matrix to represent the semantic correlation between targets and updates the feature representation through information transfer between nodes. The spatial semantic feature extraction network can be represented as

$$H^{(l+1)} = \sigma\!\left(\hat{A}\,H^{(l)}\,W^{(l)}\right),$$

where $H^{(l)}$ denotes the spatial semantic features ($H^{(0)}$ in the initial state), $H^{(l+1)}$ is the updated spatial semantic feature, $\hat{A}$ is the normalized dynamic conditional probability correlation matrix, $W^{(l)}$ is the transformation matrix to be learned, and $\sigma$ is the nonlinear LeakyReLU activation function.
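The update rule corresponds to a single graph-convolution layer; a minimal PyTorch sketch follows, with embedding dimensions and the LeakyReLU slope left as assumptions.

```python
import torch
import torch.nn as nn

class SemanticGCNLayer(nn.Module):
    """One layer of H' = LeakyReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # transformation matrix
        self.act = nn.LeakyReLU(0.2)

    def forward(self, H, A_hat):
        # A_hat: (C, C) normalized dynamic conditional probability matrix
        # H:     (C, D) node (label) features
        return self.act(A_hat @ self.W(H))
```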

5. Dynamic Multiscale Feature Fusion Mechanism
The underwater images acquired by an AUV may contain targets of different scales. Multiscale features can effectively improve the recognition accuracy of small-scale targets, but they significantly increase the computational cost of the algorithm. This paper therefore proposes a dynamic multiscale feature fusion mechanism that fuses features from different scales to recognize targets at different scales.
The significant features output by the shallow layers of the network contain more detailed information. When recognizing small-scale targets, the algorithm fuses multiscale significant features with spatial semantic features, so the fused features contain both deep semantic information and shallow detail information, which improves the recognition accuracy of the algorithm. When recognizing normal-scale targets, the algorithm fuses only the significant features output by the conv5 layer with the spatial semantic features. The fused features can be expressed as

$$F = \begin{cases} \phi\!\left(\sum_{i=2}^{5} w_i\,U_{s_i}\!\left(f_q^{(i)}(x)\right),\ H\right), & r < r_0, \\ \phi\!\left(f_q^{(5)}(x),\ H\right), & r \geq r_0, \end{cases}$$

where $\phi$ denotes the feature fusion function, $x$ denotes the input test image, $f_q$ is the trained encoder, $i$ denotes the network layer, $w_i$ denotes the weights of features at different scales, $U_{s_i}$ denotes upsampling with multiplier $s_i$, $f_q^{(5)}(x)$ denotes the significant features output by the conv5 layer, $H$ are the spatial semantic features, $r$ denotes the ratio of the target size to the original image size, and $r_0$ is the scale threshold.
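A sketch of the scale-dependent fusion rule follows, assuming (as an illustration only) that the fusion function $\phi$ is channel concatenation, that all feature maps share one channel count, and that the scale threshold is 0.1.

```python
import torch
import torch.nn.functional as F

def dynamic_fusion(feats, semantic, r, r0=0.1, weights=(0.1, 0.2, 0.3, 0.4)):
    """feats: dict {2: F2, ..., 5: F5} of (B, C, H_i, W_i) feature maps.
    semantic: (B, C, H5, W5) spatial semantic features aligned with conv5.
    r: ratio of target size to image size; r0 is an assumed scale threshold.
    """
    if r < r0:
        # Small-scale target: upsample scales 2-5 to the conv2 resolution
        # and take a weighted sum before fusing with semantic features.
        size = feats[2].shape[-2:]
        sig = sum(w * F.interpolate(feats[i], size=size, mode="bilinear",
                                    align_corners=False)
                  for i, w in zip(range(2, 6), weights))
        sem = F.interpolate(semantic, size=size, mode="bilinear",
                            align_corners=False)
    else:
        # Normal-scale target: fuse only the conv5 significant features.
        sig, sem = feats[5], semantic
    return torch.cat([sig, sem], dim=1)   # assumed fusion: channel concat
```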
The above features are input to the classification layer (Cls) and the regression layer (Reg). The Reg layer outputs the vertex coordinates of the anchor boxes, and the Cls layer outputs the labels and confidence levels of the anchor boxes. For the feature mapping, the proposed method generates target anchor boxes. This paper trains the model by minimizing a loss function consisting of two parts, a classification loss and a regression loss:

$$L\!\left(\{p_i\}, \{t_i\}\right) = \frac{1}{N_{cls}} \sum_i L_{cls}\!\left(p_i, p_i^{*}\right) + \lambda\,\frac{1}{N_{reg}} \sum_i p_i^{*}\,L_{reg}\!\left(t_i, t_i^{*}\right),$$

where $i$ denotes the index of the anchor box, $p_i$ denotes the predicted probability of the target class in anchor box $i$, $p_i^{*}$ is the real label of anchor box $i$, $t_i$ denotes the vertex coordinates of the target anchor box, $t_i^{*}$ are the vertex coordinates of the ground truth, $L_{reg}$ is the smooth L1 function, $L_{cls}$ and $L_{reg}$ are given by the classification and regression layers, $N_{cls}$ and $N_{reg}$ normalize the loss terms ($N_{cls}$ is numerically equal to the minimum training batch size and $N_{reg}$ is equal to the number of target anchor boxes), $\lambda$ denotes the balance weight, and the predicted probability $p_i$ is obtained through the sigmoid function $\sigma$.
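A sketch of the two-part loss under the reconstruction above, using sigmoid cross-entropy for classification and smooth L1 for regression; the balance weight and the positive-anchor convention are assumptions.

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, box_preds, box_targets,
                   n_cls, n_reg, lam=1.0):
    """cls_logits: (N,) raw scores; cls_targets: (N,) float 0/1 anchor labels.
    box_preds / box_targets: (N, 4) anchor vertex coordinates.
    n_cls: minibatch size; n_reg: number of target anchor boxes.
    """
    # Classification term: sigmoid cross-entropy over anchor labels.
    l_cls = F.binary_cross_entropy_with_logits(
        cls_logits, cls_targets, reduction="sum") / n_cls
    # Regression term: smooth L1, counted only for positive anchors.
    pos = cls_targets > 0
    l_reg = F.smooth_l1_loss(
        box_preds[pos], box_targets[pos], reduction="sum") / n_reg
    return l_cls + lam * l_reg
```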
6. Experimental Results and Analyses
6.1. Experimental Settings
6.1.1. Controller Hardware
In this experiment, training and testing are performed in TensorFlow. The simulations are run on a small server (RTX 3090 GPU, 64 GB of RAM, and Windows 10 64-bit operating system).
6.1.2. Experimental Dataset
Three datasets are used for training and testing: the Cognitive Autonomous Diving Buddy (CADDY) underwater dataset, the Underwater Image Enhancement Benchmark (UIEB), and the underwater target dataset (UTD). The multiscale significant feature extraction model is trained on 13,000 unlabeled images. In addition, 1278 labeled images are used to train and test the spatial semantic feature extraction model and the target recognition network. The dataset is divided into training and test sets in the ratio of 7 : 3.
6.1.3. Implementation Details
This paper uses a stochastic gradient descent (SGD) optimizer to train the model with a weight decay of 0.0005 and a momentum of 0.9. The minibatch size is 64, and the initial learning rate is 0.01. The entire training process runs for 50,000 iterations, with the learning rate decayed by a factor of 0.1 at 40,000 and 45,000 iterations.
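For reference, an equivalent optimizer and schedule in PyTorch might look like the following (the paper trains in TensorFlow, so this is illustrative only; the stand-in model is hypothetical).

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 4)  # stand-in for the recognition network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
# Decay the learning rate by a factor of 0.1 at 40,000 and 45,000 iterations.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40_000, 45_000], gamma=0.1)
```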
6.2. Experimental Results
In this paper, multiscale significant features and spatial semantic features are dynamically fused to recognize targets. For underwater weak targets under different interference, three sets of simulation experiments are designed in this section to verify the effectiveness of the proposed algorithm, comparing it with FISHnet [30], SiamFPN [31], SA-FPN [32], and the method in [33]. The evaluation criteria are mean average precision (mAP) and recognition time.
This paper conducts ablation experiments to verify the effects of the multiscale significant feature (MSF) extraction module and the spatial semantic feature (SSF) extraction module; the results are shown in Table 1. The multiscale significant feature extraction module improves mAP by 1.29%. Spatial semantic features yield a larger gain, improving mAP by 1.98%. These results show that both modules effectively improve the recognition accuracy of underwater targets.
The results of conventional underwater image recognition are shown in Table 2. For the torpedo and submarine classes, the algorithm in this paper has the highest recognition accuracy, 0.6715 and 0.9573, respectively. For the frogman class, the method in [33] has the highest recognition accuracy of 0.7614, slightly higher than that of the algorithm in this paper. For the AUV class, the recognition accuracy of SA-FPN is 0.93, higher than the 0.8732 of the algorithm in this paper. In terms of recognition speed, the method in [33] needs only 0.1 s, which is significantly faster than the algorithm in this paper; however, the algorithm in this paper achieves a higher mAP. This analysis shows that although the method in this paper does not perform best in some categories, it has a clear advantage in overall recognition accuracy. In addition, the algorithm in this paper is slightly faster than the multiscale target recognition algorithms SiamFPN and SA-FPN. The visualization results for conventional underwater images are shown in Figure 4.

The recognition accuracy and visualization results for underwater blurred images are shown in Figure 5 and Table 3. As Table 3 shows, the mAP of FISHnet is 0.6719, slightly higher than that of the algorithm in this paper. FISHnet also has the highest accuracy in recognizing torpedo targets, and SiamFPN has the highest recognition accuracy for submarine targets. The accuracy of the algorithm in this paper in recognizing frogman targets is 0.7759, better than that of the other comparison methods, while SA-FPN has the highest AUV recognition accuracy. These data show that the recognition accuracy of all algorithms decreases on blurred images; the proposed method improves the recognition accuracy of some targets through spatial semantic features. The mAP of the algorithm in this paper is comparable to that of FISHnet, but its recognition speed has a significant lead.

The recognition accuracy and visualization results for underwater distorted images are shown in Figure 6 and Table 4. The mAP of the algorithm in this paper is 0.7081, which is 1.38% higher than that of SiamFPN. For submarine and AUV targets, SiamFPN has the highest recognition accuracy, 0.8933 and 0.9421, respectively. For torpedo targets, the algorithm in this paper has the highest recognition accuracy, while for frogman targets, the method in [33] has the highest recognition accuracy and the fastest recognition speed. Overall, the algorithm in this paper achieves the highest mAP when recognizing underwater distorted targets.

7. Conclusion
This paper proposes a dynamic multiscale feature fusion method for underwater target recognition. Firstly, the significant features of the target are extracted at multiple scales, maximizing the retention of feature information of small-scale targets. Secondly, the method learns the spatial semantic relations of the target, which are used to compensate for the missing features caused by factors such as image distortion. Finally, the proposed algorithm dynamically fuses multiscale significant features and spatial semantic features for target recognition. The experimental results show that the multiscale significant feature extraction module improves mAP by 1.29%, while spatial semantic features yield a larger gain of 1.98%. The recognition accuracy of the proposed algorithm is 1.38% higher than that of existing algorithms when recognizing underwater distorted targets.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Acknowledgments
This paper is supported by the National Key Research and Development Project (2019YFB1311002).