Abstract
Mixed traffic is a common phenomenon in urban environments. In mixed traffic, the detection of traffic obstacles, including motor vehicles, non-motor vehicles, and pedestrians, is an essential task for intelligent and connected vehicles (ICVs). In this paper, an improved YOLO model is proposed for traffic obstacle detection and classification. The YOLO network is used to accurately detect traffic obstacles, while a Wasserstein distance-based loss is used to reduce misclassifications that may cause serious consequences. A newly established dataset containing four types of traffic obstacles, namely vehicles, bikes, riders, and pedestrians, was collected under different time periods and weather conditions in the urban environment of Wuhan, China. Experiments are performed on the established dataset on a Windows PC and an NVIDIA TX2, respectively. The experimental results show that the improved YOLO model has a higher mean average precision than the original YOLO model and can effectively reduce intolerable misclassifications. In addition, the improved YOLOv4-tiny model achieves a detection speed of 22.5928 fps on the NVIDIA TX2, which can basically realize real-time detection of traffic obstacles.
1. Introduction
Traffic conditions in urban areas can be highly complex, since vehicles, pedestrians, and riders may share the same road, especially in developing countries. In recent years, the rapid rise of the bike-sharing and food-delivery industries has aggravated this phenomenon to a certain extent. The coexistence of vehicles, bikes, riders, and pedestrians poses great challenges to driving safety in urban areas. The detection and classification of vehicles, bikes, riders, and pedestrians is therefore essential for ICVs [1].
Vision-based object detection and classification is an important means of achieving traffic obstacle detection and classification. In traditional object detection methods, machine learning techniques such as the scale-invariant feature transform (SIFT) [2] and the histogram of oriented gradients (HOG) [3] extract object features, which are then fed into classifiers such as the support vector machine (SVM) [4] and AdaBoost [5]. The design of these features can be very complicated; in particular, they are handcrafted, and their performance is task-dependent, which does not scale to large applications and can hardly be generalized. At this stage, traditional machine learning object detection methods can barely meet the requirements of practical applications, so new object detection methods are needed. With the development of deep learning, many deep learning techniques have been applied to object detection, among which the deep convolutional neural network (CNN) [6] is the most prominent. Unlike traditional feature extraction algorithms that rely on domain knowledge, CNNs have been shown to be robust to geometric transformation, deformation, and illumination, thus effectively overcoming the difficulties caused by the variability of non-motorized vehicle appearance. They can also adaptively capture complex feature patterns by learning from data, which gives them high flexibility and generalization ability. Many deep learning-based object detection methods have been proposed in recent years, divided into one-stage and two-stage methods, as shown in Figure 1 [7]. One-stage detection algorithms, such as YOLO [8], SSD [9], and RetinaNet [10], do not need to predict region proposals; they directly generate the labels and locations of objects, and the final detection result is obtained in an end-to-end manner after a single pass, so detection is faster. In contrast, two-stage detection algorithms divide the detection problem into two stages: first, region proposals are generated, and then the proposals are classified; in most cases, the predicted positions are also refined. A typical example of the two-stage approach is the R-CNN family of algorithms, which is based on region proposals and includes R-CNN [11], SPPNet [12], Fast R-CNN [13], Faster R-CNN [14], and FPN [15].

Among the aforementioned state-of-the-art object detection algorithms, YOLOv3 [16] and YOLOv4 [17] are arguably the most promising. Proposed by Redmon et al. in 2018 and by Bochkovskiy et al. in 2020, respectively, YOLOv3 and YOLOv4 offer both high detection speed and high accuracy and can be used for the detection and classification of traffic obstacles. Researchers have conducted many studies on traffic obstacle detection based on YOLO [18–22]. Wang et al. [18] used YOLOv3 to detect vehicles, pedestrians, and non-motor vehicles, improving the detection accuracy. Narayanan et al. [19] proposed a model combining HOG and the YOLO algorithm for pedestrian detection in thermal images. Hung et al. [20] performed real-time obstacle detection with the YOLO model on an embedded system. Wang [21] proposed a real-time vehicle detection algorithm that integrates vision and lidar point cloud information, achieving high detection accuracy and good real-time performance. Arvind et al. [22] developed a near-range obstacle sensing system based on a vision sensor, which can ensure early detection and tracking of obstacles. Zhang et al. [23] proposed a classification method for four classes of moving objects using 3D point clouds, which recognized moving objects effectively. Feng et al. [24] presented a 32-layer multibranch method for object detection in traffic scenes, which achieved state-of-the-art performance. Li et al. [25] proposed an improved multivehicle detection method considering traffic flow, which achieved good performance and robustness. Wang et al. [26] presented a vision-based crash detection framework for mixed traffic flow environments, which achieved a high detection rate with a relatively low false alarm rate. Cai et al. [27] presented an improved framework for object detection based on YOLOv4. Hnewa et al. [28] outlined state-of-the-art frameworks for object detection under rainy conditions. Liu et al. [29] proposed a radar and camera information fusion method for object recognition. Bell et al. [30] presented a real-time system for night-time vehicle detection. Satyanarayana et al. [31] proposed a vehicle detection method for heterogeneous and lane-less traffic. However, the above studies seldom performed on-vehicle real-time detection and classification of traffic obstacles based on the target characteristics of real mixed traffic scenes, and both detection accuracy and real-time performance can be further improved.
In the task of traffic obstacle detection and classification, every misclassification is usually treated as equally costly. In actual applications, however, different misclassifications can have significantly different consequences for ICVs: some may only lead to minor mistakes, while others can bring disastrous consequences. To improve the safety of ICVs and avoid disastrous consequences caused by wrong predictions, one may need to assign different weights to different mislabelled results. Recently, the application of the Wasserstein distance in object detection systems has attracted much attention from the machine learning community [32]. The Wasserstein distance [33] is a measure of distance between probability distributions; combining it with the loss function of YOLO can effectively reduce the probability of producing intolerable misclassifications in ICVs, thereby reducing the safety risk caused by misclassification.
In this paper, an improved Wasserstein distance-based loss is proposed for the YOLO model. The main contributions of this paper can be summarized as follows:
(i) A new dataset, containing traffic obstacles including vehicles, bikes, riders, and pedestrians under different time periods and weather conditions in the urban environment of Wuhan, China, is collected and established for detection.
(ii) Based on the YOLO network, an improved model is designed for traffic obstacle detection. The Wasserstein distance-based loss, which assigns different weights to a sample being classified into different classes, so that misclassified objects are assigned to similar classes with higher probability, is combined with the loss function of YOLO to enhance the performance of traffic obstacle detection.
(iii) The improved model is deployed on an NVIDIA TX2 for real-time detection and compared with the original model. Empirical experiments show that the improved model produces more accurate and robust results than the original model, and its real-time performance can basically meet the requirements of real-time detection applications.
The remainder of this paper is organized as follows. In Section 2, the dataset collected in real scenes is described. Section 3 presents the Wasserstein loss-based YOLO model, including the network architecture of the designed model and the loss function for training it. The experimental results are reported in Section 4. Finally, the conclusions are presented in Section 5.
2. Dataset
2.1. Data Acquisition
In order to achieve accurate and efficient traffic obstacle detection, image data for traffic obstacles, including vehicles, bikes, riders, and pedestrians, were collected by a camera at a resolution of 1920 × 1080 pixels in Wuhan, China. The collection was conducted during different time periods, including daytime and nightfall, and under sunny and cloudy weather conditions. 496 images were selected as the original data to establish the dataset.
2.2. Data Classification
In the urban hybrid traffic scenario, vehicle, bike, rider, and pedestrian are the main traffic obstacles that affect the driving safety of intelligent and connected vehicles. Therefore, as shown in Figure 2, the detection objects in the collected data are divided into these four categories.

2.3. Data Augmentation
As shown in Figure 3, in order to enrich the dataset and enhance robustness, data augmentation operations including rotation and brightness transformation were performed on the image data. After augmentation, the dataset contains a total of 2976 images of hybrid traffic scenes.
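As an illustration, the following is a minimal sketch of these two operations using OpenCV; the rotation angles and brightness factors shown here are illustrative assumptions, not the exact values used to build the dataset.

    import cv2
    import numpy as np

    def rotate(image, angle_deg):
        # Rotate around the image center, keeping the original size.
        h, w = image.shape[:2]
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
        return cv2.warpAffine(image, m, (w, h))

    def adjust_brightness(image, factor):
        # Scale pixel intensities and clip to the valid 8-bit range.
        return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)

    image = cv2.imread("frame_0001.jpg")  # hypothetical 1920 x 1080 source frame
    augmented = [rotate(image, 10), rotate(image, -10),
                 adjust_brightness(image, 1.3), adjust_brightness(image, 0.7)]

Note that geometric augmentations such as rotation also require the bounding-box annotations to be transformed consistently.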

[Figure 3: examples of data augmentation, panels (a)–(f).]
2.4. Data Annotation
After the above processing, the dataset was manually labelled. In the images, objects with less than 50% of their contour visible and small targets that could not be seen clearly were not labelled. The detailed sample size of each labelled category is shown in Table 1.
3. Methodology
3.1. YOLO Model
In this paper, the YOLO-based detection models, including YOLOv3, YOLOv4, and YOLOv4-tiny, are established. In the YOLOv3 model [16], the image is divided into S × S grid cells, and the grid cell containing the center of an object is responsible for predicting that object. In view of the large number of vehicles, bikes, riders, and pedestrians in the urban hybrid traffic environment and their large variation in scale, the model uses a multi-scale fusion method to make predictions. The features of the three detection scales, with sizes of 13 × 13, 26 × 26, and 52 × 52, are fused so as to be compatible with both large and small objects.
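As a concrete illustration of the grid assignment, the following minimal sketch assumes the default YOLOv3 input resolution of 416 × 416 (an assumption; the paper does not state the input size used), for which the strides of 32, 16, and 8 give exactly the 13 × 13, 26 × 26, and 52 × 52 grids mentioned above.

    def responsible_cell(cx, cy, stride):
        # (cx, cy): object-center coordinates in input-image pixels.
        # The cell whose region contains the center makes the prediction.
        return int(cx // stride), int(cy // stride)

    for stride in (32, 16, 8):
        print(416 // stride, responsible_cell(200.0, 310.0, stride))
    # 13 (6, 9)   -> coarse grid, large objects
    # 26 (12, 19) -> medium grid
    # 52 (25, 38) -> fine grid, small objects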
The network is mainly composed of a series of 1 × 1 and 3 × 3 convolutional layers (each convolutional layer is followed by a BN layer and a LeakyReLU layer). Detection is performed at three points in the network, at downsampling factors of 32 (2^5), 16 (2^4), and 8 (2^3). After the 79th layer of the convolutional network, the feature map passes through several convolutional layers to produce the first scale of detection results; relative to the input image, this feature map is downsampled 32 times. Because of the high downsampling factor, its receptive field is relatively large, so it is suitable for detecting relatively large objects. To achieve fine-grained detection, the feature map of the 79th layer is upsampled and fused (by concatenation) with the feature map of the 61st layer to obtain the fine-grained feature map of the 91st layer, which also passes through several convolutional layers to yield a feature map downsampled 16 times relative to the input image; it has a medium-scale receptive field and is suitable for detecting medium-sized objects. Finally, the 91st layer feature map is upsampled again and fused with the 36th layer feature map to obtain a feature map downsampled 8 times relative to the input image. It has the smallest receptive field and is suitable for detecting small objects.
YOLOv4 [17] makes a series of improvements on the basis of YOLOv3, mainly the following: the backbone feature extraction network is changed from DarkNet53 to CSPDarkNet53 [34], the feature pyramid is replaced with SPP [35] and PAN [36], and the classification and regression head remains the same as in YOLOv3, among other changes.
The YOLOv4-tiny network is a simplified version of YOLOv4: a lightweight model with only about 6 million parameters, roughly one-tenth of the original. As shown in Figure 4, the overall network has 38 layers and uses three residual units; the activation function is LeakyReLU; the classification and regression of targets use two feature layers; and a feature pyramid network (FPN) is used when merging the effective feature layers. It also adopts the CSPNet structure and performs channel splitting on the feature extraction network: the feature map channels output by a 3 × 3 convolution are divided into two parts, and the second part is taken forward. The detection speed of the YOLOv4-tiny model is thus greatly improved, which makes it possible to deploy the model on mobile embedded terminals such as the NVIDIA TX2 for real-time detection.
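The channel-split idea can be sketched as follows. This is a minimal PyTorch sketch written from the description above, not the reference YOLOv4-tiny implementation (the real block also re-concatenates the branches, and the layer widths here are illustrative).

    import torch
    import torch.nn as nn

    class ChannelSplitBlock(nn.Module):
        # A 3 x 3 convolution whose output channels are split in two,
        # with only the second half processed further, as in the text.
        def __init__(self, channels):
            super().__init__()
            conv = lambda c_in, c_out: nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
                nn.BatchNorm2d(c_out),
                nn.LeakyReLU(0.1),
            )
            self.conv1 = conv(channels, channels)
            self.conv2 = conv(channels // 2, channels // 2)

        def forward(self, x):
            x = self.conv1(x)
            # Channel split: keep the second half for further convolution.
            _, second = torch.chunk(x, 2, dim=1)
            return self.conv2(second)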

3.2. Wasserstein Distance-Based Loss
To alleviate the undesirable consequences caused by misclassification, we propose to incorporate the Wasserstein distance into the framework of YOLO and apply it to ICVs. The Wasserstein distance is a metric for measuring the discrepancy or dissimilarity between probability measures, and it calculates the cost of moving one distribution to another [37]. For discrete distributions $P$ and $Q$, the Wasserstein distance between $P$ and $Q$ can be formulated as

$W(P, Q) = \min_{T \in \Pi(P, Q)} \sum_{i, j} T_{ij} M_{ij},$

where $\Pi(P, Q)$ is the set of all possible transport plans between $P$ and $Q$, and $M$ is the distance matrix, whose element $M_{ij}$ measures the distance between class $i$ and class $j$. In particular, an arbitrary transport plan $T \in \Pi(P, Q)$ has to satisfy

$\sum_{j} T_{ij} = P_i, \qquad \sum_{i} T_{ij} = Q_j, \qquad T_{ij} \ge 0,$

which implies that the transport plan can also be interpreted as a joint distribution of $P$ and $Q$. In other words, the Wasserstein distance finds the optimal joint distribution of $P$ and $Q$, the one that produces the minimal cost of transporting $P$ to $Q$. Compared with other distance metrics for probability measures such as the Kullback–Leibler divergence, Hellinger distance, and Jensen–Shannon divergence, the Wasserstein distance has some favourable geometric properties. Firstly, it is a valid distance metric, i.e., it is symmetric and non-negative and satisfies the triangle inequality and the identity of indiscernibles. Secondly, it can capture the geometry of the underlying space [38].
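For a small number of classes, the transport problem above can be solved directly as a linear program. The following sketch uses scipy to make the definition concrete; the 0-1 cost matrix at the end is purely illustrative (with a one-hot target, it reduces the distance to the probability mass placed off the true class).

    import numpy as np
    from scipy.optimize import linprog

    def wasserstein(p, q, m):
        # Minimize <T, M> subject to row sums = p, column sums = q, T >= 0.
        k = len(p)
        a_eq = []
        for i in range(k):                 # row-sum constraints
            row = np.zeros((k, k)); row[i, :] = 1
            a_eq.append(row.ravel())
        for j in range(k):                 # column-sum constraints
            col = np.zeros((k, k)); col[:, j] = 1
            a_eq.append(col.ravel())
        res = linprog(m.ravel(), A_eq=np.array(a_eq),
                      b_eq=np.concatenate([p, q]), bounds=(0, None))
        return res.fun

    p = np.array([0.1, 0.6, 0.2, 0.1])     # predicted class distribution
    q = np.array([0.0, 1.0, 0.0, 0.0])     # one-hot ground truth
    m01 = np.ones((4, 4)) - np.eye(4)      # illustrative 0-1 cost matrix
    print(wasserstein(p, q, m01))          # 0.4: mass placed off the target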
In object detection, we take the source distribution $P$ to be the predicted probability distribution over object labels and $Q$ to be the ground-truth label distribution. More specifically, in this paper, the discrepancy between the predictions produced by the classifier and the ground-truth labels is measured by the Wasserstein distance. In [39], it is proved that if either the source distribution or the target distribution is a one-hot histogram, there is only one possible transport plan, and the Wasserstein distance between the source distribution and the target distribution can be calculated by

$W(P, Q) = \sum_{i=1}^{C} M_{k, i} P_i,$

where $k$ is the index of the one-hot element in $Q$, $C$ is the number of object classes, and $M_{k, \cdot}$ represents the $k$-th row of the distance matrix $M$. The distance matrix, which specifies the distance between categories, needs to be predefined. In this paper, there are four categories in the dataset, namely, vehicle, bike, rider, and pedestrian. As discussed in the Introduction, different misclassifications may result in different consequences, and if the classifier is able to distinguish misclassifications of different severity, disastrous consequences can be avoided. For example, classifying a "bike" as a "rider" may not change the decision made by the autonomous driving system, as bike and rider largely share the same behaviour pattern. However, classifying a "bike" as a "vehicle" is very likely to have a significant influence on the decision-making process of a self-driving vehicle, not only because bike and vehicle are different objects but also because they are expected to have distinct trajectories. To prevent this undesirable problem, in the proposed method, the distance matrix is defined as in Figure 5.
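The one-hot case thus reduces to a single weighted sum. Because Figure 5 is not reproduced here, the matrix in the sketch below is an illustrative assumption only, chosen to reflect the principle just described: a small cost between behaviourally similar classes (bike and rider) and a large cost between dissimilar ones (bike and vehicle).

    import numpy as np

    CLASSES = ["vehicle", "bike", "rider", "pedestrian"]

    # Illustrative distance matrix (NOT the one in Figure 5): symmetric,
    # zero diagonal, small cost between similar classes, large otherwise.
    M = np.array([
        [0.0, 1.0, 1.0, 1.0],   # vehicle
        [1.0, 0.0, 0.1, 0.5],   # bike
        [1.0, 0.1, 0.0, 0.5],   # rider
        [1.0, 0.5, 0.5, 0.0],   # pedestrian
    ])

    def wasserstein_onehot(pred, true_idx):
        # W(P, Q) = sum_i M[k, i] * P[i] when Q is one-hot at index k.
        return float(M[true_idx] @ pred)

    pred = np.array([0.05, 0.15, 0.75, 0.05])   # classifier output for a "bike"
    print(wasserstein_onehot(pred, CLASSES.index("bike")))   # 0.15
    # Confusing bike with rider (0.75 of the mass) is penalized far less
    # than placing the same mass on vehicle would be.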

Denote by $\hat{l}$ and $l$ the predicted and ground-truth locations, by $\hat{c}$ and $c$ the predicted and ground-truth confidences, and by $\hat{p}$ and $p$ the predicted and ground-truth classes. In the original YOLOv3, the loss function is composed of three parts: the location loss $L_{loc}(\hat{l}, l)$, the confidence loss $L_{conf}(\hat{c}, c)$, and the classification loss $L_{cls}(\hat{p}, p)$. In this paper, we propose to use an additional loss, the Wasserstein loss, and thus the modified YOLO loss function becomes

$L = L_{loc}(\hat{l}, l) + L_{conf}(\hat{c}, c) + L_{cls}(\hat{p}, p) + \lambda W(\hat{p}, p),$

where $\lambda$ is a hyperparameter that controls the weight of the Wasserstein distance.
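A minimal sketch of how the extra term could be attached to the outputs of an existing YOLO implementation follows; the base losses are taken as given, and the default value of $\lambda$ below is an assumption (the paper does not report the value it uses).

    import torch

    def wasserstein_term(cls_logits, target_idx, dist_matrix):
        # cls_logits: (N, C) raw class scores; target_idx: (N,) ground-truth
        # class indices; dist_matrix: (C, C) predefined distance matrix.
        probs = torch.softmax(cls_logits, dim=-1)
        rows = dist_matrix[target_idx]      # k-th row of M for each sample
        return (rows * probs).sum(dim=-1).mean()

    def total_loss(loc_loss, conf_loss, cls_loss,
                   cls_logits, target_idx, dist_matrix, lam=0.1):
        # loc/conf/cls losses come from the existing YOLO implementation;
        # only the Wasserstein term is added on top, weighted by lambda.
        return (loc_loss + conf_loss + cls_loss
                + lam * wasserstein_term(cls_logits, target_idx, dist_matrix))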
4. Experimental Results
4.1. Experimental Environment
The models were trained and tested on a Windows PC with two Intel Xeon processors running at 3.5 GHz, 128 GB of DDR4 memory, and an NVIDIA GeForce RTX 2080 with 8 GB of memory. The established dataset was divided into training and test sets at a ratio of 9 : 1. During training, all but the three output layers were first frozen until the loss stabilized; the network was then unfrozen and training continued for fine-tuning. To avoid overfitting, training was terminated when the loss did not decrease within ten epochs. In addition, the original and improved YOLO models were run on an NVIDIA TX2 for real-time detection.
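In outline, this schedule corresponds to the following PyTorch-style sketch; only the freeze/unfreeze split and the ten-epoch patience come from the text, while the attribute name model.heads, the epoch counts, and the learning rates are illustrative assumptions.

    import math
    import torch

    def fit(model, loss_fn, loader, params, epochs, lr, patience=10):
        opt = torch.optim.Adam(params, lr=lr)
        best, stale = math.inf, 0
        for _ in range(epochs):
            epoch_loss = 0.0
            for images, targets in loader:
                opt.zero_grad()
                loss = loss_fn(model(images), targets)
                loss.backward()
                opt.step()
                epoch_loss += loss.item()
            # Early stopping: terminate when the loss has not decreased
            # within `patience` consecutive epochs.
            if epoch_loss < best:
                best, stale = epoch_loss, 0
            else:
                stale += 1
                if stale >= patience:
                    break

    def train(model, loss_fn, loader):
        # Stage 1: freeze all but the output layers (assumed here to be
        # grouped under a hypothetical `model.heads` module).
        for p in model.parameters():
            p.requires_grad = False
        for p in model.heads.parameters():
            p.requires_grad = True
        fit(model, loss_fn, loader,
            [p for p in model.parameters() if p.requires_grad],
            epochs=50, lr=1e-3)

        # Stage 2: unfreeze everything and fine-tune at a lower rate.
        for p in model.parameters():
            p.requires_grad = True
        fit(model, loss_fn, loader, model.parameters(), epochs=200, lr=1e-4)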
4.2. Evaluation Metric
In this study, the precision-recall curve (P-R curve), F1 score, and mean average precision (mAP) were used to evaluate the performance of the model.
The P-R curve plots the precision (P) on the ordinate against the recall (R) on the abscissa, where P is defined as

$P = \dfrac{TP}{TP + FP}$

and R is defined as

$R = \dfrac{TP}{TP + FN},$

where the definitions of TP, FN, and FP are shown in Table 2.
The F1 score, an index that combines the values of P and R to reflect the overall performance of the detection model, is defined as

$F1 = \dfrac{2PR}{P + R}.$

The area under the P-R curve is the average precision (AP), and the mean of the AP values over the four categories of obstacle objects in the hybrid traffic scene is the mAP:

$AP = \int_{0}^{1} P(R)\,dR, \qquad mAP = \dfrac{1}{C} \sum_{c=1}^{C} AP_c,$

where $C = 4$ is the number of categories.
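These quantities are straightforward to compute from detection counts; the sketch below uses 11-point interpolation for AP, which is one common convention (the paper does not state which convention it follows).

    import numpy as np

    def precision_recall_f1(tp, fp, fn):
        p = tp / (tp + fp)
        r = tp / (tp + fn)
        return p, r, 2 * p * r / (p + r)

    def average_precision(recalls, precisions):
        # 11-point interpolated AP: average the maximum precision achieved
        # at recall >= t for t = 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for t in np.linspace(0, 1, 11):
            mask = recalls >= t
            ap += precisions[mask].max() if mask.any() else 0.0
        return ap / 11

    # mAP is simply the mean of the per-class AP values:
    # map_value = np.mean([ap_vehicle, ap_bike, ap_rider, ap_pedestrian])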
4.3. Result of Designed Models on Established Dataset
In order to verify the detection performance of the designed models, the YOLOv3, YOLOv4, and YOLOv4-tiny models were trained and tested on the four categories of obstacle objects. The loss curves of the designed models are shown in Figures 6, 7, and 8, respectively.



It can be seen from the loss curves that the loss value of the improved model is higher than that of the original model at the beginning of training, and that the two become basically the same once the loss stabilizes. This is due to the addition of the Wasserstein distance-based loss in the improved model. The final loss values of the YOLOv3, YOLOv4, and YOLOv4-tiny models are about 24.5, 10, and 11.5, respectively.
The experimental results of the designed models are shown in Table 3, and the P-R curves are shown in Figure 9. It can be seen from the experimental results that the mAP of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models is 98.57%, 98.19%, and 80.39%, respectively, slightly higher than that of each original model, while the F1 score of each improved model is basically the same as that of its original counterpart.

[Figure 9: P-R curves of the designed models on the established dataset, panels (a)–(f).]
4.4. Result of Designed Models on BDD Dataset
BDD is one of the most recently published autonomous driving datasets, featuring dense traffic scenes, and the detection performance of the designed models is also verified on it. In the BDD dataset, there are few objects in the bike and rider categories, so we selected data containing these two categories for testing in order to maintain a relative balance between the categories. The experimental results of the designed models are shown in Table 4, and the P-R curves are shown in Figure 10.

[Figure 10: P-R curves of the designed models on the BDD dataset, panels (a)–(f).]
It can be seen from the experimental results that the mAP of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models is 92.97%, 91.23%, and 77.97%, respectively, higher than that of each original model, while the F1 score of each improved model is basically the same as that of its original counterpart. The detection mAP of the designed models on the BDD dataset is slightly lower than that on the established dataset. This is because the models were trained on the training set of the established dataset, whose scenes resemble those of its test set but differ from those of the BDD dataset. Nevertheless, the detection results on both datasets meet basic application requirements.
4.5. The Application-Oriented Performance on NVIDIA TX2
The NVIDIA TX2 is a mobile terminal that can be deployed directly on a vehicle. The vehicle application scenarios on the NVIDIA TX2 are shown in Figure 11. The trained improved and original models were deployed on the NVIDIA TX2, respectively, and then tested on the established dataset. In addition, an NVIDIA TX2 with a camera was installed on a vehicle for real-time detection to verify the detection effect and real-time performance of the proposed model.

The detection speed of the different models is shown in Table 5. As can be seen from the table, the detection speed of the improved YOLOv3 and YOLOv4 models is between 3 fps and 4 fps on the NVIDIA TX2 and between 8 fps and 9 fps on the Windows PC, which is somewhat poor for real-time use. The detection speed of the improved YOLOv4-tiny model is above 22 fps on the NVIDIA TX2 and above 27 fps on the Windows PC, which can basically realize real-time detection of traffic obstacles.
The real-time detection effect of the improved YOLOv4-tiny model was verified on the NVIDIA TX2 and compared with the original YOLOv4-tiny model. As shown in Figure 12, some misclassifications detected by the original model can be effectively and correctly classified by the improved model, proving that the improved model can effectively reduce intolerable misclassifications between different categories.

[Figure 12: real-time detection results of the original and improved YOLOv4-tiny models, panels (a)–(f).]
5. Conclusions
In this paper, an improved YOLO model for traffic obstacle detection and classification in ICVs is presented. A new dataset containing traffic obstacles collected under different time periods and weather conditions in an urban environment was established. The improved models, which reduce intolerable misclassifications and enhance the performance of traffic obstacle detection by combining the Wasserstein distance-based loss with the YOLO models, were designed and implemented. The improved models were trained and then tested on the established dataset and on selected BDD data and deployed on an NVIDIA TX2 for real-time detection.
Experimental results showed that the mAP values of the improved YOLOv3, YOLOv4, and YOLOv4-tiny models are 98.57%, 98.19%, and 80.39%, respectively, higher than those of each original model. In terms of application-oriented performance on the NVIDIA TX2, the detection speed of the improved YOLOv4-tiny model is 22.5928 fps, which is much better than that of the YOLOv3 and YOLOv4 models and basically meets the real-time detection requirements for traffic obstacles. In addition, in the real-time on-vehicle verification, the improved YOLOv4-tiny model reduced intolerable misclassifications between different categories more effectively than the original model. In practical applications, the improved model could effectively improve the accuracy of decision making for ICVs, thereby improving driving safety. In future work, the dataset could be enriched and the detection model further optimised.
Data Availability
The established dataset used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This research was supported in part by the National Key R&D Program of China under grant no. 2018YFB0105205 and in part by the Hubei Province Technological Innovation Major Project under grant no. 2019AAA025.