The Aeroplane and Undercarriage Detection Based on Attention Mechanism and Multi-Scale Features Processing

Gao, Ruizhen; Zhang, Shuai; Wang, Haoqian; Zhang, Jingjun; Li, Hui; Zhang, Zhongqi

doi:https://doi.org/10.1155/2022/2582288

Mobile Information Systems

Research Article | Open Access

Volume 2022 | Article ID 2582288 | https://doi.org/10.1155/2022/2582288

Ruizhen Gao,¹ Shuai Zhang,¹ Haoqian Wang,¹ Jingjun Zhang,¹ Hui Li,² and Zhongqi Zhang³

Academic Editor: Yugen Yi

Received 05 Aug 2021

Accepted 06 Jul 2022

Published 19 Sept 2022

Abstract

The undercarriage is one of the essential parts of an aeroplane, and accurately detecting whether it is operating normally can help avoid aeroplane accidents. To address the low degree of automation and the low accuracy of small target detection in existing aeroplane undercarriage detection methods, an improved YOLO V4 algorithm for aeroplane undercarriage detection is proposed. Firstly, the convolutional network structure of Inception-ResNet is integrated into the CSPDarkNet53 framework to improve the algorithm’s ability to extract semantic information of target features; then an attention mechanism is added to the path aggregation network structure so that the importance and relevance of different features are better captured after the concatenation (concat) operations. In addition, aeroplane and undercarriage datasets were constructed, and the partitioned test set was used to evaluate the improved algorithm against the Faster R-CNN, YOLO V3, and YOLO V4 target detection algorithms. The experimental results show that, compared with the YOLO V4 algorithm, the improved algorithm significantly improves the recall rate and the mean detection accuracy for small targets in our dataset, which supports the soundness of the improvements proposed in this paper.

1. Introduction

Take-off and landing are among the most critical phases of flight, and more than 50% of aeroplane safety accidents occur in the take-off and landing stage [1]. During take-off and landing, whether the undercarriage can be unfolded and folded normally is very important to flight safety. At present, whether the undercarriage is deployed or not is only fed back to the pilot by the internal signal of the aeroplane [2]. However, electronic components are not 100% reliable: if the pilot receives a wrong signal and misjudges whether the undercarriage is deployed, the aeroplane cannot land safely. Therefore, it is necessary to observe the aeroplane undercarriage externally to confirm that it has deployed normally and that the aeroplane can land safely.

At present, machine vision and deep learning are developing rapidly and are widely used in various fields, such as face recognition [3–7], defect detection [8–10], remote sensing image detection [11–14], and driverless cars [15–18]. Machine vision is used as the image acquisition method, and a deep learning algorithm is then introduced to identify and detect objects. The YOLO V4 [19] algorithm is an upgraded and improved version of YOLO [20], YOLO V2 [21], and YOLO V3 [22], and it is also one of the recently proposed algorithms that use a regression network to realize object detection. Different from traditional network algorithms based on candidate regions, such as R-CNN [23], Fast R-CNN [24], and Faster R-CNN [25], the YOLO series algorithms are one-stage algorithms. Although the detection accuracy of one-stage algorithms is generally not as high as that of two-stage algorithms, which separate candidate region generation from detection, their detection speed is much faster. This is because the YOLO series algorithms integrate the two stages of candidate region generation and detection into one stage, directly treat the detection task as a regression problem, and can therefore realize end-to-end prediction.

However, in actual tests, the detection effect of YOLO V4 for some small targets is not good enough, and there are still quite a number of false detections and missed detections [26]. At the same time, because the convolutional network performs multiple downsampling operations, the image resolution gradually decreases, resulting in the loss of small targets. Compared with using the subsampling convolution layer, the convergence effect is reduced by about 3% if the expanded convolution layer is used instead, which shows that subsampling is indeed not conducive to CNN learning [27]. The downsampling operation has the greatest impact on small target objects because it may cause the loss of their feature values, so that they are not detected.

To solve the problem that image features are lost during CNN downsampling, which degrades the detection effect, several methods have been proposed recently. Liu et al. [28] proposed a novel IPG-Net (Image Pyramid Guidance Network) based on FPN [29] (Feature Pyramid Networks). It contains an image pyramid-guided IPG subnetwork, a fusion module, and an image pyramid-based backbone. It effectively alleviates the problem of insufficient spatial information in the deeper convolution layers and the loss of small targets. Kim et al. [30] proposed a parallel feature pyramid network based on SPP-Net [31] (Spatial Pyramid Pooling in deep convolutional networks for visual recognition), which is constructed by widening the network rather than increasing its depth, so as to improve the detection effect for large, medium, and small targets. Although these improvements can improve the small target detection ability to a certain extent, they do not combine widening the network rather than deepening it, extracting multi-scale features, and giving different weights to features according to their importance.

In order to improve the detection performance for the aeroplane and undercarriage, this paper adopts an improved YOLO V4 algorithm for undercarriage detection. Firstly, after the backbone features are extracted from the different feature layers output by the network, the Inception-ResNet [32] structure is added for multi-scale feature extraction, to improve the network’s ability to extract features of small objects. Secondly, the attention mechanism [33] is added after the PANet (Path Aggregation Network) [34] structure, to better capture the importance and relevance of different features after the concat operation. Finally, the divided test set was used to evaluate the detection performance of the algorithm under the different improvements, and the performance was compared with the commonly used Faster R-CNN, YOLO V3, and YOLO V4 algorithms.

The innovations and main contributions of this paper are described as follows:

(1) In order to effectively improve the ability of small target detection, an improved YOLO V4 algorithm based on feature multi-scale processing and attention mechanism is proposed.

(2) Based on the different detection targets, the feature multi-scale extraction module Inception-ResNet convolution network structure and attention mechanism are introduced.

(3) The influence of different number of pixels on the feature extraction ability of the proposed network is analyzed.

(4) A complete comparative experiment is designed to test the detection performance of the improved algorithm.

2. Attention Mechanism and Multi-Scale Feature Processing

2.1. Attention Mechanism

The convolutional neural network can extract relatively rich object feature information, but it also picks up miscellaneous feature information, such as a lot of irrelevant background information and pixel variations caused by different lighting. The concat operation used by YOLO V4 simply lists and connects the feature information along the channel dimension, so the semantic information and importance of different features cannot be effectively associated, and the processed feature information cannot accurately describe the object.

To retain more effective feature information in the feature map, this paper uses the squeeze-and-excitation block [33] structure unit with an attention mechanism to assign different weight values along the channel dimension. The structure unit mainly consists of three components: squeeze, excitation, and scaling (Figure 1).

The squeeze part uses global average pooling to get a feature map of size $1 \times 1 \times C$. The goal is to compress each single-channel feature map into a point. Through the excitation part, the weights learned from the overall information of the channels are then obtained. The calculation for the squeeze part is as follows:

$$z_c = F_{sq}(u_c) = \frac{1}{W \times H} \sum_{i=1}^{W} \sum_{j=1}^{H} u_c(i, j) \tag{1}$$

In formula (1), $z_c$ is the calculated value of the $c$-th feature map after global average pooling, and the output format is $1 \times 1 \times C$; $W$ and $H$ are the width and height dimensions of the feature maps; $u_c(i, j)$ represents the pixel value at position $i$ and $j$ in the feature maps.

In the excitation part, two fully connected layers are used. The first reduces the channel dimension and is activated by the ReLU activation function; the second restores the channel dimension and is activated by the Sigmoid activation function, so that each weight is normalized. Through backpropagation over the whole dataset, the fully connected layers learn the information of all channels, as well as the rules for strengthening and weakening each channel. The calculation is as follows:

$$s = F_{ex}(z, W) = \sigma(W_2 \, \delta(W_1 z)) \tag{2}$$

In formula (2), $s$ is the weight of eigenvalues in each channel, and the output format is $1 \times 1 \times C$; $z$ is the output of the squeeze part; $\delta$ is the ReLU activation function; $\sigma$ is the Sigmoid activation function; $W_1$ is the characteristic kernel of dimension $\frac{C}{r} \times C$; $W_2$ is the characteristic kernel of dimension $C \times \frac{C}{r}$, where $r$ is the scaling parameter, and the value is 16 here [33].

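In the scaling part, the feature maps are multiplied by the weights of the corresponding channels from the excitation part to get feature maps with different weights. The calculation is as follows:

$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c \tag{3}$$

In formula (3), $\tilde{x}_c$ is the updated feature map of channel $c$, $u_c$ is the input feature map of that channel, and $s_c$ is its weight.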
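The three steps of the squeeze-and-excitation unit can be illustrated with a minimal pure-Python sketch. It is not the authors' Keras implementation: the tiny 2-channel input, the weight matrices `w1` and `w2`, and the reduction to a single hidden unit are made-up values chosen only to keep the arithmetic easy to follow (the paper uses a scaling parameter of 16).

```python
import math

def squeeze(feature_maps):
    """Global average pooling: one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

def dense(weights, vector):
    """Fully connected layer without bias: weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def excitation(z, w1, w2):
    """Reduce with w1 + ReLU, restore with w2 + Sigmoid."""
    hidden = [max(0.0, h) for h in dense(w1, z)]
    return [1.0 / (1.0 + math.exp(-o)) for o in dense(w2, hidden)]

def scale(feature_maps, s):
    """Multiply every pixel of channel c by its learned weight s[c]."""
    return [[[s_c * px for px in row] for row in ch]
            for ch, s_c in zip(feature_maps, s)]

def se_block(feature_maps, w1, w2):
    z = squeeze(feature_maps)
    s = excitation(z, w1, w2)
    return scale(feature_maps, s), s

# Hypothetical input: 2 channels of size 2 x 2, reduced to 1 hidden unit.
example_maps = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
w1 = [[0.5, 0.5]]        # shape (C/r) x C, illustrative values
w2 = [[1.0], [-1.0]]     # shape C x (C/r), illustrative values

scaled, channel_weights = se_block(example_maps, w1, w2)
print(channel_weights)
```

In this toy case the first channel is emphasized and the second is suppressed, which is the re-weighting behaviour the attention unit is meant to learn from data.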

2.2. Feature Multi-Scale Processing

In the feature extraction process of a convolutional neural network, the depth of the convolutional layers is usually increased when the recognition accuracy needs to be improved. However, as the network depth increases, the intermediate parameters and the running time of the algorithm also increase, and phenomena such as gradient explosion may occur. To solve the problem that the undercarriage features of an aeroplane are easily lost or not obvious, this paper refers to the design idea of increasing network width and depth in the GoogleNet [32] algorithm and the YOLO V4 algorithm and introduces the Inception-ResNet [32] structural unit (Figure 2).

This structure unit consists of four parallel convolution branches, and the input feature maps are processed by each of the four branches. The first branch is output to the total feature fusion layer through the ResNet [35] structure. The residual network (ResNet), proposed by He et al. in 2016, preserves the depth of the model and uses residual blocks and batch normalization layers to effectively alleviate problems such as vanishing and exploding gradients. The idea expressed by the residual block is to fuse the input features with the features obtained after two or three convolution operations, thus solving the problem of model degradation. The general form of residual blocks is shown in Figure 3, where $x$ is the input of the residual block, and $F(x)$ is the output after the two-layer convolution operation, which can be expressed as

$$y = F(x) + W_s x = W_2 \, \delta(W_1 x) + W_s x \tag{4}$$

In formula (4), $W_1$ and $W_2$ are the weight parameters of the convolution layers, $\delta$ is the ReLU activation function, and $W_s$ is the weight parameter of the transformation from input to output.
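As a rough illustration of the residual idea, the sketch below assumes the standard ResNet form, in which the block output is the two-layer transformation of the input plus a shortcut (identity, or a linear projection when shapes differ). Convolutions are simplified to plain matrix-vector products, and all weights are hypothetical values, so this shows only the arithmetic of the skip connection, not the network used in the paper.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def relu(vector):
    return [max(0.0, v) for v in vector]

def residual_block(x, w1, w2, ws=None):
    """Return F(x) + shortcut(x), with F(x) = w2 * relu(w1 * x).

    ws is an optional projection for the shortcut; when it is None the
    shortcut is the identity mapping.
    """
    fx = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if ws is None else matvec(ws, x)
    return [f + s for f, s in zip(fx, shortcut)]

# Hypothetical 2-dimensional example with identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, -2.0]

y_identity = residual_block(x, identity, identity)
y_projected = residual_block(x, identity, identity,
                             ws=[[2.0, 0.0], [0.0, 2.0]])
print(y_identity, y_projected)
```

Even when the transformed path zeroes out a component (here the negative entry after ReLU), the shortcut still carries the original input forward, which is why such blocks help against degradation in deeper models.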

The second branch is output to the partial feature fusion layer after one convolution kernel operation. The third branch is output to the partial feature fusion layer after two successive convolution kernel operations. The fourth branch is output to the partial feature fusion layer after three successive convolution kernel operations; the output feature maps of branches 2, 3, and 4 are then fused with a convolution kernel at the feature layer. Finally, the feature layer output by the first branch and the output of the partial feature fusion layer are fused.

3. Improvement of YOLO V4 Network Architecture

YOLO V4 is an improved version of YOLO V3. Firstly, the backbone feature extraction network has been changed from the Darknet53 network to the CSPDarknet53 structure (Figure 4), which contains five CSP [36] (Cross Stage Partial connections) modules and uses the Mish [37] activation function instead of the LeakyReLU activation function. Secondly, the SPP-Net module is introduced to extract the most important features from the bottom without affecting the network processing speed and to effectively expand the range of the receptive field. Then, PANet is used as the neck network to replace the original FPN, so that multi-channel feature fusion is carried out while more important context features are extracted. Finally, the YOLO V3 head continues to be used as the detection head, as shown in Figure 4.

The YOLO V4 algorithm uses CSPDarknet53 as the backbone feature extraction network. After the convolution operations, it outputs the P3, P4, and P5 feature layers, whose spatial sizes are 1/8, 1/16, and 1/32 of the original input size, respectively. Among them, the output P3 and P4 feature layers each enter the PANet structure for feature fusion after one convolution operation, and the feature layer P5 enters the SPP structure for maximum pooling after three convolution operations. To a certain extent, these methods can extract the characteristics of aeroplanes and undercarriages of different sizes and types. However, because the undercarriage is small relative to the aeroplane, its features are easily lost. If the YOLO V4 algorithm is directly used for training and detection, the detection effect is relatively poor.
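To make the 1/8, 1/16, and 1/32 ratios concrete, the short sketch below computes the side length of each output feature layer for a square input. The input size of 416 pixels is an assumption for illustration only (a size commonly used with YOLO models); the size actually used in the paper is not recoverable from the text.

```python
def feature_map_sizes(input_size, strides=(8, 16, 32)):
    """Side length of the P3, P4 and P5 feature layers for a square input."""
    levels = ("P3", "P4", "P5")
    return {level: input_size // stride
            for level, stride in zip(levels, strides)}

def pixels_per_cell(stride):
    """Number of input pixels summarized by one feature-map cell."""
    return stride * stride

# Hypothetical input size; not taken from the paper.
sizes = feature_map_sizes(416)
print(sizes)
print({stride: pixels_per_cell(stride) for stride in (8, 16, 32)})
```

Under this assumption a single P5 cell summarizes a 32 × 32 patch of the input, so an object covering only a few dozen pixels occupies a small part of one cell, which is consistent with the feature loss for small undercarriages described above.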

In this paper, the three feature layers P3, P4, and P5 are output from the CSPDarkNet53 network, and the Inception-ResNet structure is added to optimize the convolutional layers of P3 and P4, whose outputs are then passed to the PANet for the next step of operation. With the improvement of these two feature layers, on the one hand, the width, depth, and complexity of the network are increased and the ability of feature extraction is improved; on the other hand, a larger receptive field is obtained, together with more global and higher-level semantic feature information, so as to improve the success rate of small object feature extraction, as shown in the yellow box in Figure 4. At the same time, the squeeze-and-excitation block structure unit is introduced into the three-scale feature maps after PANet in the YOLO V4 algorithm. While introducing few additional parameters and little extra computation, it can effectively improve the performance of the algorithm. The framework of the algorithm is shown in the orange box in Figure 4.

4. Network Training

4.1. Training Platform

The experiments in this paper were run on the Windows 10 operating system. The hardware configuration used in the experiment is as follows: CPU: Intel(R) Core(TM) i7-10700K CPU @ 3.80 GHz, 3.79 GHz; GPU: NVIDIA GeForce RTX 2070. The programming environment is PyCharm 2020, and the deep learning framework is Keras.

4.2. Training Dataset

The data used in this experiment is composed of aeroplane pictures from the PASCAL VOC 2007 dataset and aeroplane images collected in the field at the airport. The pictures include aeroplanes and undercarriages. Because the amount of data is limited, this paper used exposure adjustment to generate normal-exposure, overexposed, and underexposed versions of the images, simulating normal weather, strong light at noon, and low light at dusk, respectively, as shown in Figure 5. The dataset after image enhancement reached 1440 images, including 1296 images in the training set and 144 images in the test set. Because the images collected in the field at the airport differ in size from the aeroplane images in the PASCAL VOC 2007 dataset, all images were resized to a uniform size before training. In this paper, the labelImg software is used to label the aeroplane and undercarriage in the images, and information such as the path of the picture, the label, and the coordinates of the labeled area is stored in an XML file.
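The annotation files described above can be read back with a few lines of standard-library Python. The sketch assumes the Pascal VOC XML layout that labelImg writes (`path`, `object/name`, and `object/bndbox` with `xmin`, `ymin`, `xmax`, `ymax`); the sample file content, the file path, and the class names `aeroplane` and `undercarriage` are hypothetical placeholders rather than data from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout written by labelImg.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <path>/data/example.jpg</path>
  <size><width>418</width><height>482</height><depth>3</depth></size>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>214</xmax><ymax>99</ymax></bndbox>
  </object>
  <object>
    <name>undercarriage</name>
    <bndbox><xmin>60</xmin><ymin>90</ymin><xmax>71</xmax><ymax>95</ymax></bndbox>
  </object>
</annotation>"""

def parse_annotation(xml_text):
    """Extract the image path and the labeled boxes from one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = tuple(int(bndbox.findtext(tag))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({"label": obj.findtext("name"), "box": box})
    return {"path": root.findtext("path"), "objects": objects}

annotation = parse_annotation(SAMPLE_XML)
for item in annotation["objects"]:
    print(item["label"], item["box"])
```

For files on disk, `ET.parse(filename).getroot()` can replace `ET.fromstring`, and the rest of the function stays the same.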

4.3. Training Parameter Setting

The relevant parameters need to be set before training. In this experiment, the batch size was set to 8, and the number of training epochs was set to 100. A cosine annealing learning rate schedule is adopted, in which the learning rate increases first and then decreases: the ascending stage is a linear rise, and the descending stage follows the decline of the cosine function. The schedule is defined by an initial learning rate, a maximum learning rate, and a minimum learning rate, and the cycle is executed repeatedly. Each epoch iterates 145 steps, and the whole training process iterates 14,500 steps in total. After 100 epochs, the model tends to converge, with the loss function converging to 7.415 on the training set and 7.036 on the test set. The change of the loss function over the whole training process is shown in Figure 6.
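The shape of such a schedule can be sketched as follows. Only the totals (100 epochs of 145 steps) come from the text; the warm-up length and the three learning-rate values are placeholders chosen for illustration, since the values used in the paper are not given here, and the sketch covers a single warm-up and decay cycle rather than repeated cycles.

```python
import math

def learning_rate(step, total_steps, warmup_steps, lr_init, lr_max, lr_min):
    """Linear warm-up from lr_init to lr_max, then cosine decay to lr_min."""
    if step < warmup_steps:
        return lr_init + (lr_max - lr_init) * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 100 * 145   # 100 epochs x 145 steps per epoch, from the text
WARMUP_STEPS = 145        # assumption: one epoch of warm-up
LR_INIT, LR_MAX, LR_MIN = 1e-4, 1e-3, 1e-6   # placeholder values

schedule = [learning_rate(s, TOTAL_STEPS, WARMUP_STEPS, LR_INIT, LR_MAX, LR_MIN)
            for s in range(TOTAL_STEPS + 1)]
print(schedule[0], max(schedule), schedule[-1])
```

A repeated schedule could be obtained by evaluating the same function on `step % cycle_length`, with `cycle_length` chosen by the user.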

4.4. Evaluation Indicators of Model

In deep learning object detection, the mAP (mean Average Precision) (8) is generally calculated from the precision (5) and recall (6) of the model on the test set, and the AP (Average Precision) (7) is used to evaluate the accuracy of the model in a single detection category. The larger the mAP value is, the higher the overall detection accuracy and the better the model’s prediction effect are. The mAP was used as the comprehensive evaluation index of the detection accuracy of the algorithm. The calculation equations are as follows:

$$P = \frac{TP}{TP + FP} \tag{5}$$

$$R = \frac{TP}{TP + FN} \tag{6}$$

$$AP = \sum_{k=1}^{N} P(k) \, \Delta R(k) \tag{7}$$

$$mAP = \frac{1}{M} \sum_{m=1}^{M} AP_m \tag{8}$$

In formula (5), $P$ is the precision; $TP$ represents the number actually being the object and correctly predicted, namely the correctly predicted number; $FP$ represents the number actually being the non-object but predicted as the object, namely the mispredicted number.

In formula (6), $R$ is the recall; $FN$ represents the number actually being the object but not detected, namely the missed number.

In formula (7), $N$ is the number of images in the test set; $P(k)$ is the precision value of the $k$th image; $\Delta R(k)$ is the change of the recall value from the $(k-1)$th to the $k$th images.

In formula (8), $M$ is the number of objective classes, and $AP_m$ is the AP of the $m$th class.
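The evaluation indicators can be written out as a short sketch. It follows the definitions in this section (precision and recall from counts, AP as the sum of precision times the change in recall, and mAP as the mean over classes); the counts, the precision and recall sequences, and the per-class AP values are invented numbers used only to exercise the functions.

```python
def precision(tp, fp):
    """Share of predicted objects that are real objects."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Share of real objects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(precisions, recalls):
    """Sum of P(k) times the change in recall from step k-1 to step k."""
    ap = 0.0
    previous_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - previous_recall)
        previous_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Invented numbers for illustration only.
p_value = precision(tp=8, fp=2)
r_value = recall(tp=8, fn=8)
ap_value = average_precision([1.0, 0.5, 0.75], [0.25, 0.25, 0.75])
map_value = mean_average_precision({"aeroplane": 0.75, "undercarriage": 0.5})
print(p_value, r_value, ap_value, map_value)
```

Practical benchmarks often add refinements such as interpolating the precision curve; the sketch keeps to the plain summation described here.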

5. Results and Discussion

5.1. Results

In this paper, the improved method proposed for the YOLO V4 algorithm was compared with Faster R-CNN, YOLO V3, and YOLO V4. In addition, in order to test the performance of the two improved algorithms proposed in this paper, a set of ablation experiments were added to test the experimental effect of a single improved algorithm, respectively. The overall comparative experimental results are shown in Table 1.

Table 1 shows the overall performance of the detection effect of different algorithms in the dataset.

Firstly, it can be seen from Table 1 that the detection performance of the YOLO V4 algorithm is improved by each of the two improved methods proposed in this paper. When the two improved methods are used together, the mAP value is the highest, 6.18% higher than that of the YOLO V4 algorithm. According to the analysis of Figures 7–9, the AP of the aeroplane increased by about 2%, the precision increased by about 5%, and the recall increased by about 11%; the AP of the undercarriage increased by about 10%, the precision decreased by about 5%, and the recall increased by about 16%. These results support the rationality and advanced nature of the algorithm improvement.

Secondly, compared with the current mainstream two-stage detection network Faster R-CNN, the improved algorithm in this paper achieves a clear improvement in both detection accuracy and detection speed: mAP is increased by 4.05% and detection time is decreased by 0.3254 s. Because the anchors of Faster R-CNN are preset, they cannot be freely adjusted according to the size of the object when the object to be detected is small, resulting in frequent missed detections. In addition, Faster R-CNN is a two-stage algorithm, and forming the candidate regions first costs more time. Faster R-CNN has a higher mAP but a slower detection speed than YOLO V4, whereas our algorithm has both a high mAP and a fast detection speed.

5.2. Discussion

Figure 10 shows the detection results of our algorithm and the YOLO V4 algorithm on some images from the test set.

In the field-collected images of aeroplanes landing and taking off, the aeroplane is large and the undercarriage is relatively small. When the aeroplane and undercarriage appear against a simple background, occupy a relatively large proportion of the image, and the image size is large, the detection effect of the improved algorithm in this paper and that of the YOLO V4 algorithm are basically the same. Both can detect the position of the aeroplane and undercarriage well (Figure 10(a)).

When the aeroplane and undercarriage in the image are small (Figure 10(b)), the size of the image is 418 × 482 pixels, and the upper and lower aeroplanes occupy 204 × 79 pixels and 104 × 47 pixels, accounting for about 7% and 3% of the image, respectively. Both our algorithm and the YOLO V4 algorithm can detect the aeroplane objects well. The left undercarriage in the upper part of Figure 10(b) occupies 55 pixels, accounting for about 0.02% of the image (Figure 11), which makes it a small object. The proposed algorithm can detect this object effectively, but the YOLO V4 algorithm cannot detect it correctly. The middle undercarriage in the upper part of Figure 10(b) occupies 26 pixels, accounting for about 0.01% of the image (Figure 12). Neither the algorithm in this paper nor the YOLO V4 algorithm can detect it.
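The proportions quoted above are the ratio of object pixels to image pixels, which the following sketch computes from the sizes given in the text. The helper name `area_fraction` is introduced here for illustration; because the percentages in the text are stated as approximate, the computed values may differ slightly from them.

```python
def area_fraction(object_pixels, image_width, image_height):
    """Fraction of the image area covered by an object."""
    return object_pixels / (image_width * image_height)

IMAGE_WIDTH, IMAGE_HEIGHT = 418, 482   # image size given in the text

fractions = {
    "upper aeroplane": area_fraction(204 * 79, IMAGE_WIDTH, IMAGE_HEIGHT),
    "lower aeroplane": area_fraction(104 * 47, IMAGE_WIDTH, IMAGE_HEIGHT),
    "left undercarriage": area_fraction(55, IMAGE_WIDTH, IMAGE_HEIGHT),
    "middle undercarriage": area_fraction(26, IMAGE_WIDTH, IMAGE_HEIGHT),
}

for name, fraction in fractions.items():
    print(f"{name}: {fraction * 100:.3f}% of the image")
```

The comparison that matters is the order of magnitude: the undercarriages cover a few hundredths of a percent of the image, hundreds of times less than the aeroplanes they belong to.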

When there is a crowd of aeroplanes and undercarriages in the image, the image background is relatively complex, and the objects occupy a relatively small proportion of the image, the YOLO V4 algorithm shows relatively serious missed detections, while our algorithm detects the aeroplanes and undercarriages more accurately. Compared with the YOLO V4 algorithm, the number of aeroplanes detected by our algorithm is basically the same, but seven more undercarriages were detected (Figure 10(c)).

6. Conclusions

In this paper, an improved detection algorithm based on the YOLO V4 network is proposed, which combines an attention mechanism with multi-scale feature processing to achieve accurate automatic detection of the aeroplane and undercarriage. Firstly, in order to improve the ability of the network to extract small target features, the P3 and P4 feature layers output by CSPDarkNet53 are followed by convolutional layer optimization with the Inception-ResNet structure and multi-scale feature processing. Secondly, the attention mechanism is added after the path aggregation network structure to better capture the importance and relevance of different features after the concat operation. Finally, the datasets of aeroplanes and undercarriages are constructed, and the divided test set is used to evaluate the detection performance of the algorithm under the different improvement methods. In our dataset, the comprehensive experimental results are better than those of YOLO V3 and Faster R-CNN. Compared with the YOLO V4 algorithm, the improved algorithm improves the detection ability for the aeroplane and undercarriage: mAP improves by 6.18%; aeroplane AP improves by about 2%, precision by about 5%, and the recall rate by about 11%; undercarriage AP improves by about 10%, and the recall rate by about 16%. The disadvantage is that the small target detection capability has not improved enough and the detection speed has decreased, with an increase in detection time of 0.0016 s compared to YOLO V3 and 0.0039 s compared to YOLO V4. Our future work will focus on how to reduce the computation of the algorithm and increase the detection speed without decreasing the detection performance.

Data Availability

The partial data used to support the findings of this study can be found in the online versions at https://host.robots.ox.ac.uk/pascal/VOC/voc2007/devkit_doc_07-Jun-2007.pdf. The other data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 12002115), the Natural Science Foundation of Hebei Province (No. F2020402005), the Natural Science Foundation of Hebei Province (No. F2021402011), the Science and Technology Research Projects of Colleges and Universities in Hebei Province (No. ZD2018207), the Scientific and Technological Research Projects of Universities in Hebei Province (No. QN2020181), and the Science and Technology Research and Development Project of Handan City (No. 19422101008-8).

References

[1] M. Zhang, N. Hong, and W. Xiaohui, “Dynamic analysis of aeroplane ground handling with multi-wheel undercarriages,” Journal of Nanjing University of Aeronautics & Astronautics, vol. 53, no. 04, pp. 542–546, 2008.
[2] C. Liang, “Atypical failure analysis of A320 undercarriage system,” Aviation Maintenance and Engineering, vol. 60, no. 05, pp. 43–45, 2015.
[3] X. Jia, W.-J. Tian, and Y.-Y. Fan, “Simulation of face key point recognition and location method based on deep learning,” Computer Simulation, vol. 37, no. 6, pp. 434–438, 2020.
[4] B. Peng, X. Jin, Y. Wu, and D. Li, “Geometry guided feature aggregation in video face recognition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 2670–2677, Seoul, South Korea, October 2019.
[5] L. Yan-ping, Y. Jiang, J.-M. Hu, and L. Weiping, “Face recognition method based on Curvelet transform and cosine measure,” Computer Science, vol. 43, no. 5, pp. 294–297, 2016.
[6] H. Zhao, X. Ying, Y. Shi, X. Tong, J. Wen, and H. Zha, “RDCFace: radial distortion correction for face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7718–7727, Seattle, WA, USA, June 2020.
[7] H. Liu, X. Zhu, and Z. Lei, “AdaptiveFace: adaptive margin and sampling for face recognition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, June 2019.
[8] J. Masci, U. Meier, G. Fricout, and J. Schmidhuber, “Multi-scale pyramidal pooling network for generic steel defect classification,” in Proceedings of the International Joint Conference on Neural Networks, pp. 1–8, IEEE, Dallas, TX, USA, August 2013.
[9] D. Soukup and R. Huber-Mork, “Convolutional neural networks for steel surface defect detection from photometric stereo images,” in Proceedings of the International Symposium on Visual Computing, pp. 668–677, Las Vegas, NV, USA, December 2014.
[10] S. Zhou, Y. Chen, D. Zhang, J. Xie, and Y. Zhou, “Classification of surface defects on steel sheet using convolutional neural networks,” Materiali in tehnologije, vol. 51, no. 1, pp. 123–131, 2017.
[11] M. Yang, L. Jiao, F. Liu, B. Hou, and S. Yang, “Transferred deep learning-based change detection in remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 9, pp. 6960–6973, 2019.
[12] X. Niu, M. Gong, T. Zhan, and Y. Yang, “A conditional adversarial network for change detection in heterogeneous images,” IEEE Geoscience and Remote Sensing Letters, vol. 16, no. 1, pp. 45–49, 2019.
[13] B. Hou, Q. Liu, H. Wang, and Y. Wang, “From W-net to CDGAN: bitemporal change detection via deep learning techniques,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 3, pp. 1790–1802, 2020.
[14] Y. Liao, H. Wang, L. Cunbao, L. Yang, F. Yuqiang, and N. Shuyan, “Research progress of optical remote sensing image target detection based on deep learning,” Journal of Communications, vol. 43, no. 05, pp. 190–203, 2022.
[15] W. J. Luo, A. G. Schwing, and R. Urtasun, “Efficient deep learning for stereo matching,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 5695–5703, IEEE, Las Vegas, NV, USA, June 2016.
[16] J. Schlosser, C. K. Chow, and Z. Kira, “Fusing LIDAR and images for pedestrian detection using convolutional neural networks,” in Proceedings of the 2016 IEEE International Conference on Robotics and Automation, pp. 2198–2205, IEEE, Stockholm, Sweden, May 2016.
[17] Y. Huang, W. Luo, and H. Lan, “Adaptive pre-aim control of driverless vehicle path tracking based on a SSA-BP neural network,” World Electric Vehicle Journal, vol. 13, no. 4, 55 pages, 2022.
[18] L. Yu, X. Wang, Z. Hou, Z. Du, Y. Zeng, and Z. Mu, “Path planning optimization for driverless vehicle in parallel parking integrating radial basis function neural network,” Applied Sciences, vol. 11, no. 17, 8178 pages, 2021.
[19] A. Bochkovskiy, C.-Y. Wang, and H.-Y. Mark Liao, “YOLOv4: optimal speed and accuracy of object detection,” 2020, https://arxiv.org/abs/2004.10934.
[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, Las Vegas, NV, USA, June 2016.
[21] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6517–6525, Honolulu, HI, USA, July 2017.
[22] J. Redmon and A. Farhadi, “Yolov3: an incremental improvement,” 2018, https://arxiv.org/abs/1804.02767.
[23] C. Chen, M. Y. Liu, O. Tuzel, and J. Xiao, “R-CNN for small object detection,” Computer Vision – ACCV 2016, Springer, Berlin, 2017.
[24] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, Santiago, December 2015.
[25] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[26] C. Fengmei, H. F. Tian, and L. J. Jun, “Image semantic segmentation combined with feature map segmentation,” Ch J Image Gr, vol. 24, no. 03, pp. 464–473, 2019.
[27] S. Eivazi, T. Santini, A. Keshavarzi, and A. Mazzei, “Improving realtime CNN-based pupil detection through domain-specific data augmentation,” in Proceedings of the 11th ACM Symposium on Eye Tracking Research & Applications, Denver, CO, USA, June 2019.
[28] Z. Liu, G. Gao, L. Sun, and L. Fang, “IPG-net: image pyramid guidance network for small object detection,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1026-1027, Seattle, WA, USA, June 2020.
[29] T. Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, HI, USA, July 2017.
[30] S. W. Kim, H. K. Kook, J. Y. Sun, M. C. Kang, and S. J. Ko, “Parallel feature pyramid network for object detection,” in Proceedings of the 16th European Conference on Computer Vision, pp. 234–250, Munich, Germany, September 2018.
[31] K. He, X. Zhang, and S. Ren, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2014.
[32] C. Szegedy, S. Ioffe, and V. Vanhoucke, “Inception-v4, inception-ResNet and the impact of residual connections on learning,” CoRR, 2016.
[33] J. Hu, L. Shen, and S. Albanie, “Squeeze-and-Excitation networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, IEEE, Salt Lake City, UT, USA, June 2018.
[34] S. Liu, “Path aggregation network for instance segmentation,” CoRR, 2018.
[35] K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
[36] C. Y. Wang, H. Liao, Y. H. Wu, Y. H. Chen, P. Y. Hsieh, and I. H. Yeh, “CSPNet: a new backbone that can enhance learning capability of CNN,” in Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1571–1580, Seattle, WA, USA, June 2020.
[37] D. Misra, “Mish: a self regularized non-monotonic neural activation function,” CoRR, 2019.

Copyright

Copyright © 2022 Ruizhen Gao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
