Abstract

An obstacle detection method based on VM (a VIDAR and machine learning joint detection model) is proposed to improve the identification accuracy of monocular vision systems. When VIDAR (Vision-IMU-based Detection And Range method) detects unknown obstacles in a reflective environment, the reflections of obstacles are identified as obstacles, reducing the accuracy of obstacle identification. To avoid this situation, we propose an obstacle detection method called improved VM. The experimental results demonstrate that the improved VM can identify unknown obstacles and eliminate their reflections. Compared with more advanced detection methods, the improved VM obstacle detection method is more accurate, and it can detect unknown obstacles in reflective road environments.

1. Introduction

Obstacle detection has become a major concern in the field of driver assistance systems due to the complexity of the outdoor environment. Cameras (monocular, binocular, infrared, etc.), lidar, and millimeter-wave radar are all examples of obstacle identification equipment. While lidar and millimeter-wave radar are highly accurate at detecting obstacles, their high cost limits their use in low-end vehicles [1–3]. Vision-based obstacle identification equipment, owing to its low cost, high detection accuracy, and speed, has become suitable for a wide range of vehicles [4, 5]. The vision-based sensor used in this study is a camera. The camera, GPS, and IMU constitute an innovative sensor combination. Compared with a single sensor, multisensor information fusion improves the reliability of the whole system, enhances the reliability of the data, improves accuracy, and increases the system's information utilization rate in detection, tracking, and target recognition problems.

Machine learning here refers to training on and identifying images using deep convolutional neural networks. Compared with other image recognition technologies, machine learning achieves an extremely high recognition rate for specific images. However, while machine learning is capable of accurate classification, it can only identify known obstacles. When the vehicle is in motion, using machine learning to identify unknown obstacles may result in misidentification, posing a serious risk to the vehicle's safety (Figure 1). As a result, VIDAR, a method for detection and ranging using vision and an IMU (inertial measurement unit), has been proposed [6]. Given that VIDAR requires more time to run than machine learning, a method combining VIDAR and machine learning to detect obstacles has been proposed (called the VM method). In this method, machine learning is used to identify known obstacles, while VIDAR is used to detect unknown obstacles.

To avoid the situation in which VIDAR detects a reflection as an obstacle when used in a reflective environment (Figure 2), and to improve detection accuracy, a VIDAR-based pseudo-obstacle detection method (called improved VIDAR) is proposed. Its identification procedure is as follows. First, the obstacle rectangle is determined. The width of the obstacle rectangle is then calculated using the transformation relationship between pixel coordinates and world coordinates, and the height of the obstacle rectangle is calculated from that width. Because the actual height of the obstacle rectangle remains constant throughout the ego-vehicle's movement, a true obstacle can be distinguished from its reflection. If the obstacle is a real one, tracking is continued.

To accelerate the detection speed of improved VIDAR, we combine it with machine learning (this article uses the faster RCNN algorithm) to identify known obstacles; we refer to the combination as improved VM. The improved VM obstacle detection method can quickly and accurately detect obstacles on reflective roads. The improved VM obstacle detection procedure is as follows: first, machine learning is used to identify known obstacles; second, the identified obstacles are treated as background and removed from the image; and finally, pseudo-obstacles are eliminated through the use of improved VIDAR.

2. Related Work

As the core component of automobile-assisted driving, obstacle detection has emerged as a critical area of research in recent years. Due to its simple ranging principle, the monocular vision sensor has become the primary obstacle identification equipment. Many scholars have conducted related research to advance obstacle identification. Traditional image object classification and detection algorithms struggle to meet the requirements of image and video big data in terms of processing efficiency, performance, and intelligence. Deep learning establishes the mapping from low-level signals to high-level semantics by simulating a hierarchical structure similar to the human brain, thereby realizing hierarchical feature representations of data, and has powerful visual information processing capabilities. Therefore, in the field of machine vision, the representative deep learning model, the convolutional neural network (CNN), is widely used [7, 8]. The abbreviation CNN is also used for cellular nonlinear networks, a related but distinct class of models. Arena et al. have stressed the universal role that cellular nonlinear networks are assuming today, showing that the dynamical behavior of 3D CNN-based models allows new emerging problems to be approached and new research frontiers to be opened [9]. Shustanov and Yakimov proposed an implementation of a traffic sign recognition algorithm using a convolutional neural network, with training implemented using the TensorFlow library and the massively parallel CUDA architecture for multithreaded programming; their experiments prove the high efficiency of this method [10]. Zhu et al. proposed a novel image classification framework that combines CNNs and kernel extreme learning machines (KELMs). They extracted features using DenseNet as a feature extractor and used a radial basis function kernel ELM as a classifier to improve image classification performance [11]. Wang et al. proposed the occlusion-free road segmentation network, a fully convolutional neural network. Through foreground objects and visible road layouts, this method can predict roads in the semantic domain [12]. The accuracy of obstacle identification also continues to improve as new machine learning models such as SegNet, YOLO v5, faster RCNN, BigGAN, and mask RCNN are developed [13–20]. However, while machine learning is capable of accurate classification, it can only identify known obstacles. Unknown obstacles may be missed while the vehicle is moving, with serious consequences for the vehicle's safety.

Generally, obstacles are detected using salient information such as color and prior shape. Zhu et al. proposed a method for detecting vehicles based on their edge and symmetry characteristics [21]. They hypothesized the vehicle's location from the symmetric regions detected in the image, and the vehicle's bounding box is determined using the projected image of the enhanced vertical and horizontal edges. Zhang et al. [21–23] used color information for background removal and shadow detection to improve object segmentation and background updating; this method is capable of rapidly and precisely detecting moving objects. Zhang et al. [24] introduced Deep Local Shapes (DeepLS), high-quality 3D shape representations that can be encoded and reconstructed without requiring excessive storage. This local decomposition of the scene simplifies the prior distribution that the network must learn and makes obstacle detection faster and more accurate. However, in an environment with reflections, the reflections contain the same salient information as the obstacles themselves, reducing the accuracy of obstacle detection.

The vehicle's position at night is generally determined from highlight information, such as highlighted areas and contour features. Park and Song [25] proposed a front vehicle identification algorithm based on contrast enhancement and vehicle lamp pairing. Lin et al. [26] discovered that the characteristics of headlights were more distinctive than vehicle contours and had a greater identification effect, and thus proposed using lamps as a sign for vehicle identification at night. The Hough transform was proposed by Dai et al. [27] as a method for intelligent vehicle identification at night. This method divides the extracted lamps into connected domains, extracts the lamps' edges, and then identifies circles using the Hough transform; finally, by pairing the lamps, the vehicle's location is determined. Kavya et al. [28] proposed a method for detecting vehicles based on the color of the brake lamp during braking in captured color images. In a reflective environment, the feature information required by the above identification methods also appears in lamp reflections; the reflected lamps may be paired as well, which reduces the accuracy of vehicle identification. We use the improved VM to detect obstacles, which makes it possible to eliminate obstacles detected in reflections and thereby increases obstacle detection accuracy.

3. Methodology of Improved VIDAR’s Pseudo-Obstacle Detection

The monocular visual identification method based on machine learning is limited to identifying known obstacles. A vehicle collision may occur when unknown obstacles are present on the road. Moreover, when VIDAR is used to detect obstacles, pseudo-obstacles in a reflective environment are mistaken for real obstacles. Thus, to increase the speed and accuracy of obstacle detection, we use an improved VM.

3.1. Transformation from World Coordinates to Pixel Coordinates

By capturing an image, the camera projects objects in the three-dimensional world onto a two-dimensional image; the imaging model establishes a projection mapping between three-dimensional and two-dimensional space. A rigid-body transformation, determined by the camera's external parameters, converts coordinates in the world coordinate system to the camera coordinate system. The transformation from the camera coordinate system to the pixel coordinate system converts three-dimensional coordinates to two-dimensional plane coordinates and is determined by the camera's internal parameters. Although both the pixel and image coordinate systems lie on the imaging plane, their origins and units of measurement are distinct: the origin of the image coordinate system is the point at which the camera's optical axis intersects the imaging plane, typically the imaging plane's midpoint.

Suppose the internal parameter matrix is $M$. A point $Q = (X_w, Y_w, Z_w)^T$ in the physical world is projected to a point $q = (x, y)^T$ on the image plane. By adding a dimension to $q$, its homogeneous expansion $\tilde{q} = (x, y, 1)^T$ is obtained, so point $q$ is expressed in the form of homogeneous coordinates. Combining the rotation matrix $R$ and the offset matrix $T$ gives the external parameter matrix $K = [R \mid T]$, where the elements of $T$ are the offsets. The coordinate transformation is shown in Figure 3.

The internal parameters of the camera are obtained by Zhang Zhengyou's calibration method, which determines the transformation relationship between world coordinates and pixel coordinates:

$$ s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} $$

Among them: $f_x = f/dx$ and $f_y = f/dy$, where $f$ is the focal length and $dx$ and $dy$ are the physical pixel sizes; $(u_0, v_0)$ is the principal point, and $s$ is the scale factor.
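As a minimal illustration, the sketch below projects a world point into pixel coordinates using the matrices defined above (all parameter values are hypothetical placeholders, not the paper's calibration):

```python
import numpy as np

def world_to_pixel(P_w, M, R, T):
    """Project a 3D world point into pixel coordinates.

    P_w : (3,) world point [Xw, Yw, Zw]
    M   : (3, 3) internal parameter matrix [[fx, 0, u0], [0, fy, v0], [0, 0, 1]]
    R   : (3, 3) rotation from world to camera coordinates
    T   : (3,) offset from world to camera coordinates
    """
    P_c = R @ P_w + T        # rigid-body transform: world -> camera
    uv1 = M @ P_c            # pinhole projection onto the image plane
    return uv1[:2] / uv1[2]  # divide by depth for pixel coordinates

# Example with made-up parameters: a point 5 m in front of the camera.
M = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
print(world_to_pixel(np.array([1.0, 0.5, 0.0]), M, np.eye(3),
                     np.array([0.0, 0.0, 5.0])))  # -> [480. 320.]
```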

3.2. Obstacle Ranging Model

The ranging of obstacles is as follows (Figure 4). Let $f$ be the focal length of the camera, $h$ the installation height of the camera, $dy$ the pixel size, $\gamma$ the camera pitch angle, and $(u_0, v_0)$ the intersection of the image plane and the optical axis of the camera. Let $(u, v)$ be the pixel coordinates of the intersection point $P$ of the obstacle and the pavement plane. The horizontal distance $d$ between the object point and the camera is

$$ d = \frac{h}{\tan\left(\gamma + \arctan\dfrac{(v - v_0)\,dy}{f}\right)} $$
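A direct transcription of this ranging formula, with illustrative (assumed) parameter values:

```python
import math

def horizontal_distance(v, v0, f, dy, gamma, h):
    """Horizontal distance d from the camera to the point where an obstacle
    meets the road, given the image row v of that intersection point."""
    return h / math.tan(gamma + math.atan((v - v0) * dy / f))

# Example: camera 1.6 m high, 8 mm focal length, 4.65 um pixels, 10-deg pitch.
d = horizontal_distance(v=300, v0=240, f=8e-3, dy=4.65e-6,
                        gamma=math.radians(10), h=1.6)
print(f"{d:.2f} m")  # about 7.5 m
```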

Assume that $Y_1$ is the Y-axis in the previous image and $Y_2$ is the Y-axis in the subsequent image. When the camera moves from $O_1$ to $O_2$ along the axis of the imaging plane (Figure 5), let A be the imaging point of the obstacle's top in the previous image, B the imaging point of the same top point in the subsequent image, $P_A$ the object point of A, and $P_B$ the object point of B. $d_A$ is the horizontal distance between $P_A$ and the camera; similarly, $d_B$ is the horizontal distance between $P_B$ and the camera. Both $d_A$ and $d_B$ can be obtained from Equation (3). The camera moves a distance $\Delta d$ during the time between the previous and subsequent images; for a point lying on the road plane $d_A - d_B = \Delta d$, but for a raised point $d_A - d_B \neq \Delta d$. As a result, the object points $P_A$ and $P_B$ have height if $d_A - d_B \neq \Delta d$, and static obstacles can be identified from $d_A - d_B$ if $\Delta d$ is known.

Additionally, if the obstacle is moving (as illustrated in Figure 6), the difference $d_A - d_B$ can still be used for obstacle judgment. The verification process is shown in the paper [6].
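The judgment above can be sketched as follows; the similar-triangle height formula is our formalization of the geometry (a point at height $h$ seen by a camera at height $h_c$ ranges at $d = D\,h_c/(h_c - h)$), and the threshold eps is an assumed value:

```python
def has_height(d1, d2, delta_d, eps=0.05):
    """d1, d2: ranging distances of the same point in the previous and
    subsequent images; delta_d: camera displacement from the IMU.
    A road-plane point satisfies d1 - d2 == delta_d; a raised point does not.
    eps is a hypothetical noise tolerance in metres."""
    return abs((d1 - d2) - delta_d) > eps

def apparent_height(d1, d2, delta_d, cam_h):
    """Height of a static point implied by similar triangles:
    d1 - d2 = delta_d * cam_h / (cam_h - h)  =>  h = cam_h * (1 - delta_d / (d1 - d2))."""
    return cam_h * (1.0 - delta_d / (d1 - d2))

print(has_height(8.0, 7.0, 1.0))             # False: the point lies on the road
print(apparent_height(10.0, 8.0, 1.0, 1.6))  # 0.8: the point is 0.8 m high
```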

3.3. Static Obstacle Identification Model

There are two types of static obstacles: static real obstacles and static pseudo-obstacles. Static real obstacles are actual road obstacles. Reflections identified as real obstacles during the obstacle identification process are called static pseudo-obstacles. A static pseudo-obstacle is the reflection of a road obstacle and does not affect the vehicle's driving safety. To improve the accuracy of obstacle identification, we must identify and remove static pseudo-obstacles.

3.3.1. Static Real Obstacle Identification

First, we use VIDAR to detect stereo obstacles and determine the object points on the obstacle that are farthest apart in the horizontal and vertical directions to construct a rectangle (the obstacle rectangle, Figure 7). Let A be the first imaging point used to identify the width of the obstacle rectangle's base on the road surface, B the other such imaging point, $P_A$ the object point of A, and $P_B$ the object point of B. The horizontal distances between $P_A$, $P_B$ and the camera can be calculated by Equation (2). The width of the obstacle rectangle can then be calculated using the pinhole imaging principle and the geometric relationship between camera positions.
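One plausible form of this width computation, assuming the standard pinhole lateral-offset relation $x = (u - u_0)\,dx\,d/f$ (an assumption on our part, not the paper's exact derivation):

```python
def lateral_offset(u, u0, dx, f, d):
    """Sideways distance of an object point from the optical axis, given its
    pixel column u and its ranged horizontal distance d."""
    return (u - u0) * dx * d / f

def rectangle_width(uA, uB, u0, dx, f, dA, dB):
    """Width of the obstacle rectangle from its two base imaging points A, B."""
    return abs(lateral_offset(uB, u0, dx, f, dB)
               - lateral_offset(uA, u0, dx, f, dA))

# Example with made-up values: both base points ranged at 7.5 m.
print(rectangle_width(uA=300, uB=420, u0=320, dx=4.65e-6,
                      f=8e-3, dA=7.5, dB=7.5))  # about 0.52 m
```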

When the camera moves from $O_1$ to $O_2$ along the axis of the imaging plane (Figure 8), let C be one imaging point used to identify the width of the corresponding side of the obstacle after the movement, D the other such imaging point, $P_C$ the object point of C, and $P_D$ the object point of D. Because the actual width of the obstacle rectangle is constant, the widths in Equations (4) and (5) are equal, and the height of the obstacle rectangle can be calculated.
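One worked form of this constant-width argument, under the pinhole assumptions above (our sketch of the geometry, not the paper's exact Equations (4) and (5)):

```latex
% Pixel width w of a rectangle of true width W at horizontal distance D:
%   w = f W / (dx \, D)  =>  D = f W / (dx \, w)
% Ground-plane ranging distance d of its top edge at height h (camera height h_c):
%   d = D \, h_c / (h_c - h)
% Eliminating D gives the height from the constant width W:
\begin{equation}
  h = h_c \left( 1 - \frac{D}{d} \right)
    = h_c \left( 1 - \frac{f\,W}{dx\, w\, d} \right)
\end{equation}
```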

3.3.2. Static Pseudo-Obstacle Identification

The procedure for identifying pseudo-obstacles is similar to the procedure for identifying real obstacles. However, when obstacles are detected using VIDAR, the object points of a pseudo-obstacle differ from the actual positions of the reflected obstacle (Figure 9). We detect pseudo-obstacles using VIDAR and construct a rectangle (the pseudo-obstacle rectangle) from the object points on the pseudo-obstacle that are farthest apart in the horizontal and vertical directions. Let A be the first imaging point for pseudo-obstacle width identification, B the other imaging point for pseudo-obstacle width identification, $P_A$ the object point of A, and $P_B$ the object point of B. The horizontal distances between $P_A$, $P_B$ and the camera can be calculated using Equation (2), and the width of the pseudo-obstacle rectangle can then be calculated by the pinhole imaging principle and the geometric relationship between camera positions.

When the camera moves from $O_1$ to $O_2$ along the axis of the imaging plane (Figure 10), after the pseudo-obstacle has moved, let C be one imaging point of the pseudo-obstacle's rectangular width, D the other such imaging point, $P_C$ the object point of C, and $P_D$ the object point of D. At this point, the width spanned by the pseudo-obstacle's object points changes from $W$ to $W'$. $W'$ can likewise be solved using the pinhole imaging principle and the geometric relationship between camera positions, and the rectangular height of the pseudo-obstacle can be obtained from Equations (5) and (6) and the triangle similarity principle.

3.4. Moving Obstacle Identification Model

Moving obstacles are classified as either moving real obstacles or moving pseudo-obstacles. Moving real obstacles are actual obstacles on the road. Reflections identified as real obstacles during the obstacle identification process are referred to as moving pseudo-obstacles. A moving pseudo-obstacle is the reflection of a moving road obstacle and does not affect the vehicle's driving safety. We must identify and remove moving pseudo-obstacles to improve the accuracy of obstacle identification.

3.4.1. Moving Real Obstacle Identification

The steps for identifying moving real obstacles are identical to those for static real obstacles (Figure 11). VIDAR is used to detect stereo obstacles, construct the obstacle rectangle, and calculate its width. After the ego-vehicle and the obstacle have moved, the width of the obstacle rectangle is recalculated and the obstacle height is then solved using the triangle similarity principle.

3.4.2. Moving Pseudo-Obstacle Identification

The steps for identifying moving pseudo-obstacles are identical to those for static pseudo-obstacles (Figure 12): detecting stereo obstacles with VIDAR, determining the pseudo-obstacle rectangle, and calculating the pseudo-obstacle rectangle's width. Following the movement of the ego-vehicle and pseudo-obstacle, the height of the pseudo-obstacle is calculated using the width spanned by the pseudo-obstacle's imaging points.

3.5. Removal Model of Pseudo-Obstacles

The obstacle's authenticity is assessed through the ego-vehicle's movement. Using Equations (3) and (5), the widths of obstacles and pseudo-obstacles are calculated when the vehicle moves for the first time. When the vehicle moves again, the widths of the obstacle and pseudo-obstacle are recalculated using Equations (4) and (6), and the heights of the obstacle and pseudo-obstacle are determined from their widths. An obstacle whose two calculated heights are consistent is judged to be a real, detected obstacle.

The process of obstacle identification is as follows (a code sketch follows the list):
(1) Confirm stereo unknown obstacles. Machine learning is used to identify known obstacles and obtain images with the known obstacles removed; stereo unknown obstacles are then screened out using VIDAR's obstacle detection principle.
(2) Construct an obstacle rectangle. To construct the rectangle, locate the object points on the obstacle that are farthest apart in the horizontal and vertical directions (Figures 13 and 14).
(3) Calculate the horizontal distance. Determine the horizontal distance from each object point to the camera according to VIDAR.
(4) Identify obstacles. First, calculate the width W of the obstacle rectangle. Second, determine the relationship between the height and the width W through the triangle similarity principle.
(5) Calculate the rectangular heights of real and pseudo-obstacles using the same width as the vehicle and obstacles move. Calculate the height value twice, compare the two height values, and determine whether the identified obstacle is real.
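A compact sketch of the comparison in step (5), with a hypothetical consistency threshold (the height estimates are assumed to come from the width computations of Sections 3.3 and 3.4):

```python
def classify_obstacles(measurements, height_tol=0.05):
    """measurements: list of (obstacle_id, h1, h2), where h1 and h2 are the
    rectangle heights (in metres) estimated after the first and second
    movements; height_tol is an assumed tolerance."""
    real, pseudo = [], []
    for obstacle_id, h1, h2 in measurements:
        # A real obstacle keeps a consistent positive height; a reflection
        # (pseudo-obstacle) yields an inconsistent or non-positive height.
        if h1 > 0 and abs(h1 - h2) < height_tol:
            real.append(obstacle_id)
        else:
            pseudo.append(obstacle_id)
    return real, pseudo

print(classify_obstacles([("cap", 0.031, 0.029),             # real obstacle
                          ("cap_reflection", -0.42, 0.18)])) # pseudo-obstacle
```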

The overall flow of obstacle identification is shown in Figure 15.

4. Obstacle Identification Experiment and Effect Analysis

We analyze the identification performance of the VM and the improved VM in two environments. The experimental equipment, including the camera unit and the IMU, is installed on a movable platform (Figure 16(a)). A scale model of a vehicle is used to simulate a known obstacle, and a beverage bottle cap is used to simulate an unknown obstacle (Figure 16(b)). Polished paper is used to create a reflective road surface (Figure 16(c)). Video captured by the camera at a frame rate of 20 fps is used to generate an image sequence, and obstacle detection is then performed on the generated sequence.

4.1. Improved VIDAR and Improved VM Simulation Experiments

A beverage bottle cap is used as the unknown obstacle, and the angular velocity and acceleration of the ego-vehicle are obtained from the IMU installed in it. The quaternion method is used to solve the camera attitude and update the camera pitch angle. The image is processed by a fast image region matching method based on MSER. The acceleration is used to calculate the horizontal distance moved by the vehicle between frames. The height of the obstacle rectangle is calculated by keeping the actual width constant while the vehicle and obstacles move; the authenticity of each identified obstacle is confirmed by checking that its height remains unchanged, and the real obstacles are marked.
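A minimal sketch of the quaternion attitude update used to refresh the pitch angle, assuming body-frame gyro rates from the IMU (the sample values are illustrative):

```python
import numpy as np

def quat_multiply(q, r):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def update_attitude(q, omega, dt):
    """Integrate the body angular rate omega = [wx, wy, wz] (rad/s) over dt."""
    dq = 0.5 * quat_multiply(q, np.array([0.0, *omega]))
    q = q + dq * dt
    return q / np.linalg.norm(q)

def pitch_from_quat(q):
    w, x, y, z = q
    return np.arcsin(np.clip(2.0 * (w*y - z*x), -1.0, 1.0))

q = np.array([1.0, 0.0, 0.0, 0.0])                       # initial attitude
q = update_attitude(q, omega=[0.0, 0.02, 0.0], dt=0.05)  # one IMU sample
print(np.degrees(pitch_from_quat(q)))                    # updated pitch angle
```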

The previous and subsequent images are used to judge whether the height of the obstacle rectangle has changed (Figure 17).

In the VM and improved VM comparison experiments, the faster RCNN is used to identify known obstacles and mark them as background, while VIDAR and improved VIDAR are used to perform secondary detection on the background-removed image to identify unknown obstacles (Figures 18, 19, and 20).
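A sketch of this background-removal step, using torchvision's off-the-shelf Faster R-CNN as a stand-in (the paper does not specify its implementation, and the score threshold is assumed):

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def remove_known_obstacles(image, score_thresh=0.7):
    """image: float tensor (3, H, W) in [0, 1]. Returns the image with
    confidently detected (known) obstacles blanked out, so the subsequent
    VIDAR pass only sees unknown objects."""
    with torch.no_grad():
        detections = model([image])[0]
    masked = image.clone()
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score >= score_thresh:
            x1, y1, x2, y2 = box.int().tolist()
            masked[:, y1:y2, x1:x2] = 0.0  # mark the region as background
    return masked

masked = remove_known_obstacles(torch.rand(3, 480, 640))
```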

The detection of unknown obstacles is shown in Figures 19 and 20. While VIDAR in the VM is capable of identifying the bottle cap, the cap's reflection is also detected as an obstacle, resulting in low obstacle identification accuracy. When the improved VIDAR is used to detect unknown obstacles, the height-free obstacles in the reflection can be eliminated, avoiding the misdetection of unknown obstacles in the reflective environment. As a result, the improved VM detects obstacles more precisely than the VM.

4.2. Analysis of the Identification Result of Improved VM and Improved VIDAR

In the experimental test, a pure electric vehicle is used as the test vehicle (Figure 21). An MV-VDF300SC industrial digital camera is used as the monocular vision sensor. This camera adopts the USB 2.0 standard interface and offers high resolution, precision, and clarity; its performance parameters are listed in Table 1. The camera is installed at a height of 1.60 m and collects real-time environmental data (we only used the left camera). The HEC295 IMU is mounted on the bottom of the test vehicle and is used to locate the vehicle and read its motion status in real time. GPS is used to determine a precise location, and digital maps are used to obtain precise road data, such as distance and slope. The computing unit performs real-time data processing. In this processing, multisensor data fusion combines multisource information and is rather complicated. Fuzzy logic can deal with such complex systems [29]; it can coordinate and combine the acquired information to improve the efficiency of the system and effectively handle the knowledge acquired in the scene.

Accurate calibration of the camera parameters is a prerequisite for the whole experiment and a very important task for obstacle detection methods. In this paper, Zhang Zhengyou's camera calibration method was adopted to calibrate the DaYing camera. First, the camera was fixed and images of a checkerboard were captured at different positions and angles. Then, key points of the checkerboard were selected and used to establish the relationship equations. Finally, the internal parameter calibration was realized. The camera calibration process and results are shown in Figure 22.
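A sketch of this procedure with OpenCV, assuming a 9×6 inner-corner checkerboard and hypothetical image paths (the paper's board parameters are not given):

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # assumed inner-corner counts of the checkerboard
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):  # hypothetical image location
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# rms reprojection error, internal parameter matrix M, distortion coefficients
rms, M, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(rms, M)
```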

Camera distortion includes radial distortion, thin lens distortion, and centrifugal distortion. The superposition of the three kinds of distortion results in a nonlinear distortion, whose model can be expressed in the image coordinate system as follows:

$$ \begin{aligned} \delta_x(x, y) &= k_1 x (x^2 + y^2) + \left[ p_1 (3x^2 + y^2) + 2 p_2 x y \right] + s_1 (x^2 + y^2) \\ \delta_y(x, y) &= k_2 y (x^2 + y^2) + \left[ p_2 (x^2 + 3y^2) + 2 p_1 x y \right] + s_2 (x^2 + y^2) \end{aligned} $$

where $p_1$ and $p_2$ are the centrifugal distortion coefficients, $k_1$ and $k_2$ are the radial distortion coefficients, and $s_1$ and $s_2$ are the distortion coefficients of thin lenses.

Because the centrifugal distortion of the camera is not considered in this paper, the internal parameter matrix of the camera reduces to the form given in Section 3.1.

The camera's external parameters are calibrated using the edge object points of lane lines. The calibration results are shown in Table 2.

Due to the lack of reflective road images in public data sets, and because different camera parameters would affect ranging accuracy, we created a VIDAR-Reflection Road database (Figure 23) with a total of 2000 images. The MV-VDF300SC camera unit was used to record the experiments in a natural environment. The test roads were Xuezhai Road and Jiefang East Road in Jinan, Shandong Province, and traffic environment images were collected on rainy days from 10:00 to 11:00 and 19:00 to 20:00.

Figure 24 depicts the identification results for the two images. The accuracies of the VM and improved VM are compared by counting the numbers of TP, FP, TN, and FN obstacles in each image frame. Let $a$ be the number of obstacles correctly identified as positive examples, $b$ the number incorrectly identified as positive examples, $c$ the number correctly identified as negative examples, and $d$ the number incorrectly identified as negative examples. Then, $TP = a$, $FP = b$, $TN = c$, and $FN = d$. The comparison of the detection performance of the VM and improved VM in the reflective environment is shown in Table 3.

In the analysis of results, accuracy (A), recall (R), and precision (P) were used as evaluation indices for the two obstacle detection methods, calculated through

$$ A = \frac{TP + TN}{TP + TN + FP + FN}, \qquad R = \frac{TP}{TP + FN}, \qquad P = \frac{TP}{TP + FP} $$
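These indices follow directly from the per-frame counts:

```python
def evaluation_indices(tp, fp, tn, fn):
    """Accuracy, recall, and precision from the TP/FP/TN/FN counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision

# Example with made-up counts:
print(evaluation_indices(tp=180, fp=12, tn=40, fn=8))
```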

The accuracy, recall, and precision of the method proposed in this paper are shown in Table 4.

As demonstrated by the experimental results in Table 4, the accuracy of obstacle identification is increased when using the improved VM for obstacle identification in a reflective environment. Due to the weather and other factors, there are times when misidentification and missed identification occur during the experiment. However, the improved method proposed in this paper improves obstacle identification accuracy.

Additionally, we compared our method's detection accuracy to other commonly used target detection methods. Table 5 summarizes the detection results. It is obvious that the proposed obstacle detection method outperforms state-of-the-art methods in terms of accuracy.

The term “real time” refers to the processing of each image frame as it is collected. In terms of detection speed, 2000 images were processed using the improved VIDAR, improved VM, VIDAR, VM, and YOLO v5. Table 6 summarizes the average detection times for the five identification methods.

As shown in Table 6, the improved VM takes longer to determine the authenticity of obstacles than the VM; similarly, the improved VIDAR requires more time than VIDAR. Because machine learning removes known obstacles first, fewer feature points remain, so the improved VM detects faster than the improved VIDAR. As a result, using the improved VM for obstacle detection not only takes advantage of machine learning's speed but also improves identification accuracy.

5. Conclusion

This paper first proposes an improved VIDAR method based on VIDAR and then combines it with machine learning to propose the improved VM obstacle identification method. On the basis of machine learning detecting known obstacles, VIDAR determines whether an object has height by calculating the positions of road imaging points. For non-road objects, the obstacle rectangle is determined, and the obstacle height (for both real obstacles and pseudo-obstacles) is calculated using the obstacle imaging points of the two frames before and after the vehicle moves. The height is calculated again after the vehicle moves once more, and the two heights are compared to determine the authenticity of the obstacle, thereby realizing obstacle detection. This paper aims to show the effect of obstacle detection using the improved VM in an environment with reflections. The experimental results indicate that, compared with the VM, the improved VM method is more accurate in a reflective environment. Because the method proposed in this paper requires extensive calculation, improving its efficiency will be our next research direction. In addition, obstacle detection is a prerequisite for obstacle avoidance, and improved obstacle avoidance methods are also a future research direction.

Data Availability

Data are available on request to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 51905320, the China Postdoctoral Science Foundation under Grants 2018M632696 and 2018M642684, the Shandong Key R and D Plan Project under Grant 2019GGX104066, and SDUT and Zibo City Integration Development Project under Grant 2017ZBXC133.