Abstract

To improve the safety capabilities of expressway service stations, this study proposes a method for detecting dangerous goods vehicles based on surveillance videos. The information collection devices used in this method are the surveillance cameras that already exist in service stations, which allows for the automatic detection and position recognition of dangerous goods vehicles without changing the installation of the monitoring equipment. The process of this method is as follows. First, we draw an aerial view image of the service station to use as the background model. Then, we use inverse perspective mapping to process each surveillance video and stitch these videos with the background model to build an aerial view surveillance model of the service station. Next, we use a convolutional neural network to detect dangerous goods vehicles from the original images. Finally, we mark the detection results in the aerial view surveillance model and then use that model to monitor the service station in real time. Experiments show that our aerial view surveillance model can achieve real-time detection of dangerous goods vehicles in the main areas of the service station, thereby effectively reducing the workload of the monitoring personnel.

1. Introduction

An expressway service station is an area that is established along the expressway as a place for passengers to rest and refuel. Service stations usually include a parking area, service area, and gas station [1]. Expressway service stations were initially built for public service, but with the development of national expressways, the increase in the traffic flow has required service stations to improve their management and service capabilities, provide better services, and ensure the safety of both patrons and personnel.

The service station acts as a logistics node on the expressway by providing services for freight vehicles. As a result, many dangerous goods (DG) vehicles enter and leave the service station. According to “GB 6944-2012 Classification and Code of Dangerous Goods,” a standard proposed by the Chinese Ministry of Transport [2], “dangerous goods” refers to chemical products with hazardous properties, and the term “DG vehicles” refers to the vehicles that transport those dangerous goods by road. When a DG vehicle is involved in an accident at a service station, secondary accidents, such as toxic gas leaks and chemical explosions, can cause significant damage to surrounding facilities and personnel. Therefore, improving the risk control capabilities regarding DG vehicles can significantly improve the safety of service stations.

Service stations need to guide vehicles into the station, monitor conditions within it, respond when an accident occurs, and assess the risk of potential accidents [3]. Therefore, surveillance video systems are typically installed in service stations [4]. Usually, multiple surveillance videos are played on one screen for surveillance personnel to observe. However, these videos are acquired by cameras set up at different locations within the service station, and there is no continuity between them, which reduces the efficiency and effectiveness of monitoring management. Because surveillance personnel need to watch surveillance videos from different perspectives simultaneously, tracking relevant targets across them is fatiguing; moreover, this increases the likelihood of losing track of a target when it passes between the frames of two surveillance cameras. To improve the efficiency of safety management in the service station, a new method is required to reduce the workload of surveillance personnel.

Computer vision technology is often used to reduce workloads in the fields of traffic monitoring [5, 6] and station monitoring [7]. This technology can recognize and locate targets in an image. Owing to its high recognition success rate and positioning accuracy, it has many applications in traffic monitoring [8] and industrial measurement [9].

Compared with vehicular sensor networks [10] and in situ technologies [11], computer vision is more responsive and easier to install and operate [12]. However, computer vision has limitations in traffic monitoring. Cameras usually view the road at an oblique angle, and the installation pose differs from camera to camera, so it is difficult to use a unified method to position targets across different surveillance videos [13]. There is also a large difference between the surveillance requirements of a service station and those of conventional traffic monitoring. A service station covers a large area and therefore needs more cameras to minimize blind spots, whereas the surveillance area of traffic monitoring is usually a road, which is narrow and long and does not require many cameras. The tasks also differ. On the one hand, a service station needs to guide and adjust the positions of DG vehicles and ensure that their parking positions meet safety requirements, which imposes a positioning requirement. On the other hand, traffic monitoring only needs to record accidents and similar events, for which precise positioning is not required (identifying the road segment is sufficient). Accordingly, a method is needed to combine the information from different cameras to realize a vehicle positioning function within the service station.

With the development of image processing technology, image stitching has been widely used in video surveillance applications [14]. Image stitching is a method for combining multiple images from different perspectives into a single image [15]. By stitching together the feeds from several surveillance cameras, this technique can significantly reduce the workload of surveillance personnel. However, traditional image stitching algorithms are based on matching feature points between images, using descriptors such as SIFT [16], SURF [17], and ORB [18]. These methods are suitable for images with large overlapping areas but not for the surveillance video systems typically found in service stations. This is because traffic surveillance equipment is more concerned with maximizing the viewing area using a limited number of cameras, and therefore the installation positions of the cameras are scattered. Consequently, there is a lack of overlap between the different video images, which makes it difficult to apply traditional image stitching algorithms to service station surveillance videos.

The basic principle of camera imaging is the mapping of image information from the world coordinate system into the image coordinate system through a perspective transformation, which causes some deformation in the image. This process is called the perspective mapping (PM) of the image. A typical example of PM is that, of two objects of the same size, the one closer to the camera appears larger in the image.

The most common method for eliminating the perspective effect of an image is inverse perspective mapping (IPM) [19], which is the inverse process of PM. By calculating the transformation matrix of the camera, IPM maps the information from the image coordinate system onto the world coordinate system. In the field of road traffic, IPM is often used to obtain an aerial view image of the surveillance area from surveillance videos, as in the lane detection method proposed by Oliveira et al. [20] and the urban road marking detection method proposed by Li et al. [21]. In addition, because the aerial view image retains the target’s horizontal position information, it is often used for positioning and tracking targets, as in vehicle trajectory extraction from aerial videos [22] and UAV-based traffic monitoring [23]. Because the installation of surveillance cameras in service stations is similar to their installation in traffic areas, it may be feasible to use IPM to process surveillance videos in a service station.

Additionally, the development of convolutional neural network (CNN) technology has greatly improved the capabilities of vision-based target detection [24]. Using the deep-level image information extracted by the convolutional layers, CNN detection can quickly extract and classify complex features in an image and then use those features to recognize specific targets. CNN detection also has many applications in traffic monitoring, such as vehicle detection methods [25, 26]. Because surveillance videos contain a large amount of image information, using CNN detection to detect vehicles in these videos is one way to improve surveillance management efficiency.

In summary, this study proposes a surveillance video processing algorithm based on IPM and CNN detection. The method is designed for expressway service stations and aims to reduce the workload of surveillance personnel, thereby improving the efficiency of safety management in the service station. The system functions by stitching surveillance videos and automating the detection of dangerous goods vehicles in the service station. The organization of this paper is as follows. Section 2 describes the method of establishing the positioning model based on the surveillance videos, Section 3 describes the detection method for dangerous goods vehicles, Section 4 presents the details of the experiment and the results, and Section 5 presents the conclusion.

2. Materials and Methods

2.1. Surveillance Cameras in the Service Station

The method proposed in this paper uses existing surveillance cameras to capture image information of the relevant areas. A representation of the common installation of surveillance cameras in a service station is shown in Figure 1: several closed-circuit television (CCTV) towers are installed in the service station, each of which is equipped with surveillance cameras in different poses, as shown in Figure 2. To ensure that these cameras completely capture the main area of the service station, each area is usually covered by several cameras, as shown in Figure 3. There will inevitably be some overlap between the fields of view of the cameras, and these overlapping areas are a prerequisite for ensuring the continuity of the surveillance videos.

2.2. IPM Transformation of Surveillance Video

IPM is applied to remove the perspective effect from a surveillance video and to remap its image into an aerial view image. We used the IPM method based on the homography matrix [27], which uses four corresponding points in each image to obtain the transformation matrix for IPM. This approach is easy to operate and does not require measuring the pose parameters of the camera.

Consider a point $(u, v)$ in the image and let $(x, y)$ be the corresponding point in the world coordinate system (supposing that all points lie on the same plane in the world coordinate system). The transformation of these points is calculated as shown in equations (1), (2), and (3). The 3 × 3 matrix $H$ represents the homography matrix, and this transformation is called a homography:

$$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = H \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \tag{1}$$

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix}, \tag{2}$$

$$x = \frac{x'}{w'}, \quad y = \frac{y'}{w'}. \tag{3}$$

The calculation of the homography matrix is shown in equation (4), where $(u_i, v_i)$ is point $i$ in the image coordinate system and $(x_i, y_i)$ is the corresponding point in the world coordinate system. Because $H$ has eight degrees of freedom (fixing $h_{33} = 1$), it can be calculated from four point correspondences by solving the equation:

$$\begin{bmatrix} x_i' \\ y_i' \\ w_i' \end{bmatrix} = H \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}, \quad (x_i, y_i) = \left(\frac{x_i'}{w_i'}, \frac{y_i'}{w_i'}\right), \quad i = 1, 2, 3, 4. \tag{4}$$

The transformed image is shown in Figure 4. Because the transformed image has marked image distortions and only a part of the image of a service station (such as the road area and the parking area) needs to be monitored, we cropped the transformed image, leaving only the necessary target areas.
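As an illustration, the homography-based IPM step can be implemented with OpenCV as in the following minimal sketch. The four point correspondences, file names, and output size are hypothetical placeholders, not values from our experiment; in practice, the reference points are picked manually from landmarks visible in both the camera frame and the aerial view.

```python
import cv2
import numpy as np

# Hypothetical pixel positions of four reference points in one
# surveillance frame (e.g., corners of a parking-bay marking).
src_pts = np.float32([[412, 580], [1630, 560], [1905, 1010], [180, 1040]])

# Assumed positions of the same four points in the aerial-view
# (world) coordinate system, expressed in output-image pixels.
dst_pts = np.float32([[0, 0], [800, 0], [800, 400], [0, 400]])

# Four correspondences determine the homography H of equations (1)-(4).
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

frame = cv2.imread("surveillance_frame.jpg")
# Remap the frame into the aerial view; the output size also acts as
# the crop that keeps only the necessary target area.
aerial = cv2.warpPerspective(frame, H, (800, 400))
cv2.imwrite("aerial_view.jpg", aerial)
```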

2.3. Aerial View Image Stitching of Target Areas

To generate a service station surveillance model, it is necessary to establish a standard reference system based on the structure of the service station. Because each transformed image has a corresponding area in the service station, stitching these images onto a background image that includes the structural information of the service station (such as an aerial view image or a structure map) can provide a positioning reference for the detection targets in the image. We used an unmanned aerial vehicle (UAV) to take an aerial view image of the entire service station for use as the background image, which provided a position reference for the surveillance model.

After the IPM transformation of the target areas in each surveillance video, we stitched the aerial view images of the target areas onto the background image, as shown in Figure 5. The different colored areas in Figure 5 represent the transformed images taken by different surveillance cameras, and the background model is the service station image obtained by aerial photography. Because there is generally no uniform standard for the installation poses of the cameras, it is necessary to manually adjust the size of each target area during the stitching process to ensure that it coincides with the corresponding area in the background model.
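The stitching step itself then reduces to pasting each IPM-transformed target area onto the background image at a manually calibrated position. A minimal sketch is given below; the file names and placement offsets are hypothetical, and the masking assumes that the cropped-away pixels of each warped patch are black.

```python
import cv2

# Hypothetical placements: top-left corner (x, y) of each warped
# target area in background-image coordinates, tuned by hand so that
# each patch coincides with its corresponding area.
placements = {"cam1_aerial.jpg": (350, 120), "cam2_aerial.jpg": (1180, 95)}

model = cv2.imread("uav_background.jpg")
for path, (x, y) in placements.items():
    patch = cv2.imread(path)
    h, w = patch.shape[:2]
    # Overwrite only non-black pixels so the masked-out parts of each
    # warped patch do not erase the background model.
    mask = patch.sum(axis=2) > 0
    model[y:y + h, x:x + w][mask] = patch[mask]
cv2.imwrite("surveillance_model.jpg", model)
```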

3. Dangerous Goods Vehicle Detection Based on SSD

According to the “GB 6944-2012 Classification and Code of Dangerous Goods,” dangerous goods are generally defined as chemical products with hazardous properties, and such products should be transported by road in tank containers [28]. Therefore, our proposed method detects DG vehicles in the image by detecting their tank containers, as shown in Figure 6.

The Single Shot MultiBox Detector (SSD) [29] is a one-stage detection CNN model that uses anchor boxes of different scales and ratios to evenly sample the image and then uses the CNN layers to extract the features for classification and regression. This design improves the calculation speed of SSD compared to traditional two-stage methods, such as Fast R-CNN [30]. The loss function of the SSD combines the confidence loss and the localization loss, as shown in the following equation, where $N$ is the number of matched anchor boxes, $x$ is the matching indicator, $c$ is the class confidence, $l$ is the predicted box, $g$ is the ground-truth box, and $\alpha$ is a weighting term:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right).$$

To match different objects, SSD sets multiple anchor boxes in each feature layer, as shown in equation (6), where $s_{max}$ and $s_{min}$ represent the maximum and minimum scale ratios of the anchor boxes to the image, respectively, and $m$ is the number of feature layers:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]. \tag{6}$$

Furthermore, the SSD matches features by setting aspect ratios. For an anchor box with aspect ratio $a_r$ under the scale $s_k$, the width $w_k^a$ and height $h_k^a$ can be calculated using the following equations:

$$w_k^a = s_k \sqrt{a_r}, \qquad h_k^a = \frac{s_k}{\sqrt{a_r}}.$$

The detected images are captured by cameras installed in different positions and poses, so the features in these images have different sizes and ratios; the variable anchor design of SSD can effectively match such features. We therefore set different anchor aspect ratios for the images captured by different cameras. For the image sequence $I_n$ captured by camera $C_n$, we selected several aspect ratios to form the ratio sequence $A_n$ used to sample that image sequence, as shown in the following equation:

$$A_n = \{a_r^{(1)}, a_r^{(2)}, \ldots, a_r^{(j_n)}\}.$$
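The following sketch shows how the anchor scales of equation (6) and per-camera aspect-ratio sequences might be generated. The $s_{min}$, $s_{max}$, and ratio values are illustrative assumptions, not the configuration used in our experiment.

```python
import math

def anchor_scales(s_min, s_max, m):
    # Equation (6): scales spaced linearly between s_min and s_max
    # across the m feature layers.
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def anchor_sizes(s_k, aspect_ratios):
    # Anchor width/height at scale s_k: w = s_k * sqrt(a_r),
    # h = s_k / sqrt(a_r).
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]

# Illustrative per-camera ratio sequences A_n: elongated anchors for a
# camera seeing tank containers side-on, squarer ones for frontal views.
ratio_sequences = {"C1": [1.0, 2.0, 3.0], "C2": [0.5, 1.0, 2.0]}

scales = anchor_scales(0.2, 0.9, m=6)
for cam, ratios in ratio_sequences.items():
    anchors = [anchor_sizes(s, ratios) for s in scales]
    print(cam, anchors[0])  # anchors of the first feature layer
```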

The detection process for the DG vehicle is shown in Figure 7. Because the image after the IPM transformation has serious distortion, which reduces the success rate of DG vehicle detection, we designed the vehicle detection process to occur before the IPM transformation. However, because the image is cropped after the IPM transformation, the detection areas only need to include the cropped target areas, which correspond to approximately 30–60% of the original image. Therefore, for each camera $C_n$, a detection area $D_n$ is defined to reduce the calculation time. The division of $D_n$ is shown in Figure 8; the size of $D_n$ should be reduced as much as possible while still including the entire surveillance area.
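A minimal sketch of this cropping step, assuming each $D_n$ is stored as an axis-aligned rectangle (the coordinates below are hypothetical):

```python
# Hypothetical detection areas D_n per camera, as (x0, y0, x1, y1)
# rectangles that tightly enclose the post-IPM target area.
DETECTION_AREAS = {"C1": (100, 400, 2200, 1350), "C2": (0, 500, 1900, 1440)}

def crop_detection_area(frame, camera_id):
    # Restrict SSD input to D_n (roughly 30-60% of the frame) to reduce
    # calculation time; the offset is kept so that detection coordinates
    # can later be mapped back to full-frame coordinates.
    x0, y0, x1, y1 = DETECTION_AREAS[camera_id]
    return frame[y0:y1, x0:x1], (x0, y0)
```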

It is necessary to note that the coordinates of the SSD detection result also require the IPM transformation so that they conform to the coordinate system of the transformed images. Similar to the IPM homography applied to the image, assuming that the center point of a detection result is $(u_c, v_c)$, the calculation of the transformed point $(x_t, y_t)$ is shown in the following equation:

$$\begin{bmatrix} x_t' \\ y_t' \\ w' \end{bmatrix} = H \begin{bmatrix} u_c \\ v_c \\ 1 \end{bmatrix}, \quad (x_t, y_t) = \left(\frac{x_t'}{w'}, \frac{y_t'}{w'}\right).$$
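This coordinate transformation can be sketched as follows, reusing the homography H from Section 2.2. Note that if detection runs on the cropped area $D_n$, the crop offset must be added back before applying H, since H was computed on the full frame; the function name is illustrative.

```python
import cv2
import numpy as np

def map_detection_center(box, crop_offset, H):
    # box: (x0, y0, x1, y1) in cropped-image coordinates.
    ox, oy = crop_offset
    u_c = (box[0] + box[2]) / 2 + ox  # center point in full-frame coords
    v_c = (box[1] + box[3]) / 2 + oy
    pt = np.float32([[[u_c, v_c]]])   # shape (1, 1, 2), as OpenCV expects
    # Apply the same homography used for the IPM of the image.
    mapped = cv2.perspectiveTransform(pt, H)
    return tuple(mapped[0, 0])        # (x_t, y_t) in the aerial view
```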

4. Experiment and Discussion

4.1. Surveillance Model

To verify the feasibility of the surveillance model establishment method, we used the surveillance data from an expressway service station to build a surveillance model in Unity, as shown in Figure 9.

The surveillance model is composed of several surveillance video components and a background model. Each surveillance video component is an aerial view image transformed from a surveillance video, and the background model is captured by aerial photography. The different colored points in Figure 9 are the CCTV towers, and the color of each box indicates the CCTV tower of the same color whose surveillance video covers that area. Figure 10 shows the DG vehicle detection results of the surveillance model.

According to equations (10) and (11), we calculated the coverage ratio $R_1$ of the parking area by the surveillance model and the coverage ratio $R_2$ of the parking and road areas combined (the area of each part was delimited by the traffic index lines), where $S^{p}$ and $S^{p+r}$ denote the total areas of the two regions and $S_c^{p}$ and $S_c^{p+r}$ denote the portions of those areas covered by the surveillance model. The result is that $R_1$ reached 93% and $R_2$ reached 71%:

$$R_1 = \frac{S_c^{p}}{S^{p}}, \tag{10}$$

$$R_2 = \frac{S_c^{p+r}}{S^{p+r}}. \tag{11}$$

In this case, we combined six surveillance videos into one surveillance model. This means that the parking area, which previously required six screens to display, now needs only one, and the positions of vehicles can be observed more intuitively in the surveillance model. Even without the recognition function, this greatly improves the efficiency of surveillance operations.

However, due to the distortion of images in the surveillance model, high positioning accuracy cannot be achieved. Nevertheless, using the traffic index lines of the parking area as a reference, it is possible to locate the approximate area of each vehicle.

4.2. DG Vehicle Detection

Training the SSD required 2,000 images as training samples, which were captured by the surveillance cameras mentioned in Section 4.1. These images contained views of tank container trucks taken from different angles.

For the test process, we used the equipment configuration shown in Table 1 to test the performance of SSD detection.

For the evaluation of SSD detection, we used 200 images captured during the daytime and 200 images captured at night. The original resolution of these images was 2560 × 1440, but because the images were cropped, the test images had several smaller resolutions. We used intersection over union (IOU) as the criterion for detection success; its calculation is shown in equation (12), where $B_d$ is the detected bounding box and $B_{gt}$ is the ground-truth bounding box. Considering that the positioning operation in this method does not require high positioning precision, we determined a detection to be successful when the IOU value exceeded 50%. Average precision (AP), the main evaluation metric for object detection algorithms, is calculated as shown in equation (13), where $p(r)$ is the precision as a function of the recall $r$. The evaluation results are presented in Table 2:

$$IOU = \frac{\operatorname{area}(B_d \cap B_{gt})}{\operatorname{area}(B_d \cup B_{gt})}, \tag{12}$$

$$AP = \int_0^1 p(r)\, dr. \tag{13}$$
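For reference, the IOU criterion of equation (12) amounts to the following computation over axis-aligned boxes (a self-contained sketch, not our evaluation code):

```python
def iou(box_a, box_b):
    # Boxes as (x0, y0, x1, y1); equation (12): intersection area
    # divided by union area.
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection is counted as successful when the IOU exceeds 0.5.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333..., a failed detection
```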

Compared with the detection results for the daytime samples, the detection performance for the nighttime samples was greatly reduced. The main reason is the image degradation caused by insufficient light.

4.3. Discussion

The experimental results of the surveillance model demonstrate that, with a reasonable selection of the target areas of each surveillance video, the stitched surveillance model can include the main functional areas (e.g., the road and parking areas) of the service station. In addition, multiple surveillance videos were successfully stitched into one video, which greatly reduced the workload of surveillance personnel. However, a significant problem was observed in the surveillance model. When a vehicle passes between two parts of the target area, its appearance is severely deformed, as shown in Figure 11, or the vehicle temporarily disappears from the frame. This is caused by the differences in installation positions and shooting angles between cameras. It is not an overwhelming concern, however, because the image of the vehicle returns to normal once it completely enters the viewing range of another camera.

The SSD detection experiment showed that SSD detection can effectively detect DG vehicles in the video in most cases. However, because of limitations in the camera installation positions and poses, it was difficult to detect occluded targets at specific locations. Figure 12 shows an example in which there are four DG vehicles in the image, but only the one marked by the green frame is detected because it is not occluded; the undetected vehicles are indicated by red frames. In this case, the camera captured the image of the parking area, and when several vehicles are parked there, some vehicles are occluded by others. Such occluded vehicles are difficult for the SSD detector to detect, as shown on the left of Figure 12. Additionally, the head of a truck can occlude the features of its tank container, as shown on the right of Figure 12.

5. Conclusion

This paper proposes a method based on image stitching and SSD detection for the automated detection of DG vehicles in an expressway service station. The detection data are based on existing surveillance camera data from a service station. Our experiments demonstrated that, using surveillance cameras installed in a typical manner, the stitched surveillance model can cover the main traffic areas of the service station, thereby greatly reducing the workload of surveillance personnel. SSD detection also reduced the difficulty of DG vehicle recognition and positioning and improved the efficiency of monitoring work.

The experiment also revealed some problems with this method that need to be solved in future research. Because of limitations in the locations and poses of the surveillance cameras, this method cannot produce a complete real-time aerial view surveillance model of the service station; instead, it can only monitor a part of it. Additionally, DG vehicles are typically rather large, so they easily occlude other DG vehicles, which causes detection failures. However, because vehicles are not occluded for the entire time they are in the service station, in future work we plan to add a target tracking algorithm that links a vehicle’s information between frames to predict the positions of occluded DG vehicles.

Data Availability

The data are not publicly available because the surveillance videos are not permitted to be distributed freely.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work was partially supported by the Shanghai Municipal Science and Technology Major Project (2021SHZDZX0100) and the Fundamental Research Funds for the Central Universities.