Abstract

The number of cars on roadways around the world continues to increase year over year. However, the imbalance between traffic supply and demand has not only brought traffic congestion but also caused serious safety problems. To reduce travel risk, this study proposes a driver route planning method based on accident risk cost prediction for connected and automated vehicles. According to the entropy weight method and an improved algorithm of K shortest paths, a route planning model with accident risk as the main optimization objective was established. Firstly, an accident risk evaluation system was built based on traffic accident data, and a quantitative prediction model of accident risk cost based on driver-, vehicle-, road-, and environment-related factors was constructed. Secondly, the entropy weight method was used to calculate the weights of each indicator to determine accident risk considering the aforementioned factors. Then, the route planning model was established, and the solution algorithm based on K shortest paths was designed to solve the optimal route by comprehensively considering accident risk cost and travel time. The accident risk index of each road section in the example road network was assigned, and the risk of the road section was quantified according to the accident risk cost model. Three candidate paths were calculated by using the path planning algorithm proposed in this study; the total risk cost is 6.19, 6.26, and 6.39, respectively; and the total travel time is 29, 29, and 31, respectively. After comparison, the optimal path and two alternative paths are obtained. The results show that the accident risk cost prediction model based on historical accident data can be used to quantify driving risk. The proposed method can help drivers in the connected and automated environment choose the optimal travel route with the lowest risk and shortest travel time and improve overall traffic safety and efficiency.

1. Introduction

With the rapid development of the global road transportation system, the imbalance between traffic supply and demand has become pronounced, which has brought severe traffic congestion and accidents and has seriously threatened people’s lives and property. For example, in China, the number of traffic accidents increased year over year from 2015 to 2020, and the number of casualties was also huge. On average, about 60000 people died in traffic accidents each year [1]. Advanced driving assistance systems (ADAS) and connected and automated vehicles (CAV) can provide route planning and intelligent guidance for drivers. They are two of the most important means to improve the safety and efficiency of the traffic network, alleviate urban traffic congestion, and mitigate travel risk. However, conventional route planning takes the minimum distance or driving time as the optimization objective, and there is relatively little research based on the risk cost of road accidents. Therefore, this study quantified the accident risk and designed a route planning model based on the accident risk cost to provide drivers with a safer travel path, which can effectively reduce the risk of traffic accidents and improve overall safety.

In recent years, there have been many research achievements in traffic safety, which can be divided into two categories: (1) evaluations of traffic safety based on accident data using the Bayesian network and the accident rate method and (2) evaluations of accident risk carried out by the analytic hierarchy process, entropy weight method, and fuzzy evaluation method in accordance with the indicator evaluation system based on the characteristics of people, vehicles, roads, and the surrounding environment. In terms of studying traffic safety using accident data, Mbakwe et al. [2] established a model combining Delphi technology and the Bayesian network to predict the accident rate and evaluate national traffic safety. Chu [3] used the ordered logit model to analyze the causes of serious accidents involving buses traveling on expressways over a long driving time. The study showed that fatigued driving, drivers or passengers not wearing seat belts, drunk driving, and other behavioral factors have a significant impact on the severity of accidents. Mohan et al. [4] used the accident rate method to evaluate urban traffic safety based on fatal traffic accident data in six cities of India. The study found that the vast majority of deaths in traffic accidents occur among vulnerable road users (pedestrians, cyclists, electric vehicle drivers, and motorcycle users). Eusofe and Evdorides [5] and Gomes et al. [6] evaluated traffic safety from the traffic management perspective based on accident data. At the same time, some scholars have established models based on crash data to quantify the risk of road accidents and evaluate road traffic safety. Xie and Yan [7] used a kernel density function to fit the spatial distribution of traffic accident risk on the road network in Kentucky and Burlington, New York, and estimated its traffic accident risk status. Anderson [8] and Bil et al. [9] identified the hotspots of road accidents by K-means clustering, significance tests, and other methods according to the distribution of traffic risks on the road network. However, these studies ignore the impact of driver behavior on traffic safety risks, and several studies in the US report that approximately 90% of light-vehicle crashes involved the same types of human error, such as impaired conditions, inadvertent errors, and risky driving behavior [10–13]. Therefore, the impact of driver behavior on road accident risk cannot be ignored. Jiang et al. [14] proposed a safe route mapping (SRM) model, which uses real historical collision data and driver simulation data obtained from VISSIM simulation to score road safety and establishes safety risk heat maps for roads; drivers can use the road heat maps for situational awareness and trip planning. Arbabzadeh and Jafari [15] used elastic net regularized multinomial logistic regression to establish a safety prediction model based on driver-based data and quantified the traffic safety risk as the likelihood of adverse driving outcomes, to achieve real-time scoring of road safety risks.

In terms of establishing accident risk assessment indicators based on the characteristics of people, vehicles, roads, and environment, Li et al. [16] and Zhao et al. [17] determined the risk evaluation index system of road transportation of dangerous goods through literature research and expert consultation and used analytic hierarchy process (AHP) to evaluate the risk of transportation of dangerous goods. Fernandez et al. [18] studied 535 drivers in Manila to score and rank the factors and indicators affecting accident occurrence (such as bad driving behaviors, cognition of traffic signs, and distracted driving) by using a questionnaire and an AHP to determine the weight of each index. The results showed that bad driving behavior is the main cause of accidents. Temrungsie et al. [19] established a road safety evaluation index system according to the United Nations’ white paper on road safety. They interviewed 100 experts engaged in traffic-related industries in Thailand, scored the indexes, and analyzed the influencing factors of road traffic safety using the AHP. They found that the traffic management in Thailand was chaotic, and the implementation of traffic regulations must be strengthened. Cai et al. [20] established a traffic safety risk prediction index system based on driving behavior data. They proposed a road traffic safety entropy calculation method based on the entropy weight method and scaled the road traffic safety risk level by K-means clustering. Guo et al. [21] collected data on driver eye movement and vehicle traveling state through driving simulations. They constructed a driver behavior index system, simplified the index by principal component analysis, and finally calculated the weights of the characteristic indexes by the entropy weight method to evaluate the impact of behaviors on traffic safety.

Most of the influencing factors of route planning in the existing research focus on minimizing the driving distance and travel time. These studies rarely consider the driver’s personal characteristics, travel purpose, or the road environment. Additionally, the priority for drivers to choose a travel path is not consistent, so the meaning of “optimal” is limited and subjective. The path recommended by a given model does not necessarily meet the expectations of drivers. Therefore, many scholars have incorporated factors affecting driver choice into the route planning model: Pang et al. [22] used the fuzzy neural network method to train the driver’s historical trip data to reflect the driver’s travel preference and provide guidance for the travel route selection of on-board navigation equipment. Lee et al. [23] estimated the delay caused by bad driving behaviors through discrete selection analysis. Then they compared the travel time of different paths to recommend the route with the most reliable travel time. As mentioned above, there are many factors affecting driver travel choice, and as such, path guidance considering traffic safety has attracted increasing attention. Karim and Sayed [24] established the shortest path model integrating travel time and safety by analyzing the relationship between traffic conflict and collision. Payyanadan et al. [25] used the collision accident data of elderly drivers to quantify accident risk influencing factors, such as left turn, U-turn, and travel distance. They evaluated the safety of the path on this basis to help elderly drivers choose a safer route and reduce their accident risk. Zhang et al. [26] established a prediction model of route travel time and accident risk cost according to different parameters, such as traffic volume and capacity, and designed the route planning algorithm for drivers with different risk tendencies from the perspective of generalized travel cost.

It can be observed from the above literature review that many scholars have engaged in significant research on traffic safety and route planning. However, there are still some limitations:
(1) Research on traffic safety based on accident data relies on statistics collected after accidents have already occurred. While this can show the postaccident safety state, it lacks risk cost prediction to characterize the preaccident safety state and suggest safer routes before accidents occur, which is more critical for prevention and control.
(2) Research on travel risk according to the index system based on the physical characteristics of people, vehicles, roads, and the surrounding environment mostly combined objective analysis and subjective evaluation to evaluate risk through surveys, expert scoring, and the AHP. However, surveys are highly subjective and greatly influenced by the risk preference and experience of the scoring experts, and the evaluation is neither accurate nor objective due to the lack of data support.

Additionally, conventional route planning methods often take the minimum driving distance or travel time as the optimal goal, rather than driver preference and driving safety. Although there are route planning studies that do consider preference and safety, there are still few studies on constructing a route planning model based on the accident risk cost specifically. Therefore, based on accident data, this study constructed a risk evaluation system using accident characteristics, established an accident risk quantification model based on the entropy weight method, and designed a route planning algorithm that comprehensively considers accident risk cost and travel time to provide the safest and shortest route for drivers and improve overall safety.

2. Research Method

To realize route planning based on accident risk, it was first necessary to quantify the accident risk of the road section. This study used accident data and physical characteristic indexes of drivers, vehicles, roads, and the surrounding environment, all factors that could have influenced the severity of the accident in the data set, to construct the risk evaluation system. Then, the quantitative model of accident risk cost was established using the entropy weight method to calculate the index weights. Based on this, the real-time risks of road sections were calculated, the travel time of the road sections was loaded, and the example network was constructed. Then, the algorithm of restricted loopless K shortest paths was improved and applied to the model. Taking accident risk cost as the main goal and simultaneously considering travel time, the optimal path was found in the network. The overall workflow of this study is shown in Figure 1.

2.1. Accident Risk Quantification
2.1.1. Accident Risk Quantification Model

In the traffic system, the temporal and spatial changes of drivers, vehicles, roads, and the surrounding environment will affect the occurrence and severity of traffic accidents at any given time. There are many existing studies on traffic safety analysis through the construction of a risk index system, but there are many subjective assumptions about the selection of indicators and the determination of weights. Therefore, this study selected risk evaluation indicators from a traffic accident data set and calculated the index weights and the comprehensive scores according to the real-time data. It defined the comprehensive score as the accident risk cost, which reflects the impact of the various factors on accident risk:

$$R_i = \sum_{j=1}^{m} w_j x_{ij} \qquad (1)$$

where $R_i$ is the accident risk cost of the $i$th road section; $x_{ij}$ is the actual data of the $j$th index corresponding to the $i$th road section; $w_j$ is the weight of the $j$th index; and $m$ is the number of indexes.
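As a minimal illustration of the weighted-sum form of the risk cost, the following Python sketch computes the cost of a single road section. The function name `accident_risk_cost`, the three-indicator section, and the weight values are hypothetical and chosen only for readability; the study itself uses 22 indicators with weights obtained from the entropy weight method.

```python
def accident_risk_cost(indicator_values, weights):
    """Weighted sum of the indicator values of one road section."""
    if len(indicator_values) != len(weights):
        raise ValueError("one weight per indicator is required")
    return sum(w * x for w, x in zip(weights, indicator_values))


# Hypothetical section described by three indicators (illustrative values only).
weights = [0.5, 0.3, 0.2]   # assumed index weights, summing to 1
section = [1.0, 2.0, 0.0]   # assumed indicator values for this section
risk = accident_risk_cost(section, weights)
print(round(risk, 2))
```

Because the weights sum to 1, the risk cost stays on the same scale as the indicator values, which makes costs of different sections directly comparable.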

2.1.2. Index Weight Calculation

Several different methods can calculate index weights, including the entropy weight method, AHP, and principal component analysis. Among them, the entropy weight method is an objective weighting method, which has higher reliability and accuracy than subjective weighting, and it can reflect the distinguishing ability of indicators and determine better weights. According to the basic principles of information theory, information is a measure of the degree of order of a system, and entropy is a measure of its degree of disorder. According to the definition of information entropy, the entropy of an index can be used to judge its degree of dispersion. The smaller the information entropy, the greater the dispersion of the index, and the greater its impact on the comprehensive evaluation (i.e., its weight). This method is more suitable for describing the impact of abnormal values in driver, vehicle, road, environmental, and other indicators on the severity of accidents. For example, across several different traffic accidents, if the value of one index changes greatly while the values of the other indexes remain basically unchanged, that index accounts for the difference between the accidents and can be given a greater weight. Therefore, this study selected the entropy weight method to calculate the index weights. The specific calculation steps are as follows:
(1) For $n$ samples and $m$ indexes, $x_{ij}$ is the value of the $j$th index of the $i$th sample.
(2) Normalize the indexes for the homogenization of heterogeneous indexes, where $x'_{ij}$ is the standardized data.
Positive index: $$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$
Negative index: $$x'_{ij} = \frac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}$$
(3) Calculate the proportion of the $i$th sample value under the $j$th index: $$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}$$
(4) Calculate the entropy of the $j$th index: $$e_j = -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}$$ where $k = 1/\ln n$, meeting $e_j \ge 0$.
(5) Calculate the information entropy redundancy (difference): $$d_j = 1 - e_j$$
(6) Calculate the weight of each index: $$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j}$$

It should be noted that the entropy weight method can calculate the index weight values, but there is a problem that the entropy value of a zero index cannot be calculated in the process of practical application. Therefore, when an index value was zero, a value of 0.00001 was added to the evaluation index data of this group; adding such a small increment not only enabled the data group to be valid, but it also ensured a small impact on the difference of each index [20].
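The calculation steps and the zero-value adjustment described above can be sketched in Python as follows. This is an illustrative reading of the standard entropy weight method, not the authors' code: the function name `entropy_weights`, the choice to shift the whole normalized column by 0.00001 whenever it contains a zero, and the small data table are assumptions made for the example.

```python
import math


def entropy_weights(samples, positive, eps=1e-5):
    """Return (entropies, weights) for an n x m table of indicator values.

    samples  : list of n rows, each holding m numeric indicator values
    positive : list of m booleans, True for positive indicators
    eps      : shift added to a normalized column that contains a zero
    Assumes n >= 2 and at least one indicator that is not constant.
    """
    n, m = len(samples), len(samples[0])
    k = 1.0 / math.log(n)
    entropies = []
    for j in range(m):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # A constant indicator carries no information: maximum entropy.
            entropies.append(1.0)
            continue
        if positive[j]:
            norm = [(x - lo) / span for x in col]
        else:
            norm = [(hi - x) / span for x in col]
        if any(v == 0 for v in norm):
            # ln(0) is undefined, so shift the group by a small increment.
            norm = [v + eps for v in norm]
        total = sum(norm)
        p = [v / total for v in norm]
        entropies.append(-k * sum(pi * math.log(pi) for pi in p))
    redundancy = [1.0 - e for e in entropies]
    total_d = sum(redundancy)
    weights = [d / total_d for d in redundancy]
    return entropies, weights


# Hypothetical table: 4 accidents, 2 indicators (illustrative values only).
data = [[1, 1], [1, 2], [1, 1], [2, 2]]
entropies, weights = entropy_weights(data, positive=[True, False])
```

In this toy table the first indicator departs from its usual value in only one accident, so its normalized values are concentrated in a single sample; it therefore receives a lower entropy and a larger weight than the evenly split second indicator.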

2.2. Construction of Route Planning Model

The traditional route planning algorithm sums the weights of the edges in the network graph to find the shortest path. The K shortest paths (KSP) problem is a variant of the shortest path problem. Unlike the traditional shortest path problem, the purpose of the KSP problem is to find multiple alternative optimized paths between the start point and the end point in the network graph, forming a group of shortest paths that meets the user's selection needs to the greatest extent [27]. Building on an improvement of the K shortest paths algorithm in existing research [28], this study designed a route planning algorithm meeting multiple objectives; that is, in the K shortest path set obtained by calculating the accident risk cost, the path with the shortest travel time T and lowest risk was chosen as the optimal path.

2.2.1. KSP Problem

Suppose $G = (V, E)$ represents a network graph, where $V$ is the set of nodes and $E$ is the set of edges. Each edge in $E$ is represented by a node pair; that is, $(u, v)$, and $c(u, v)$ is the length of this edge. Suppose $s$ and $t$ are two nodes in graph $G$. A path $p$ from $s$ to $t$ in the graph is represented by a node sequence; that is, $p = (s = v_1, v_2, \ldots, v_l = t)$. $s$ and $t$ are respectively the start node and end node of $p$. The length of $p$ is the total length of all edges on $p$; that is, $c(p) = \sum_{i=1}^{l-1} c(v_i, v_{i+1})$.

The set of paths from $s$ to $t$ is represented by $P_{st}$. The shortest path problem is to find the path with the smallest length in $P_{st}$. The KSP problem is a generalization of the shortest path problem. In addition to determining the shortest path, it also needs to determine the second shortest path, the third shortest path, and so on until the $K$th shortest path is found. $p_k$ represents the $k$th shortest path from $s$ to $t$.
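The notation can be made concrete with a small Python sketch. The adjacency-dictionary layout, the node labels, and the edge lengths are hypothetical; they only illustrate how a path given as a node sequence maps to a total length, and what it means for a path to be simple (loopless).

```python
# Hypothetical directed network: graph[u][v] is the length of edge (u, v).
graph = {
    "s": {"a": 2.0, "b": 5.0},
    "a": {"b": 1.0, "t": 6.0},
    "b": {"t": 2.0},
    "t": {},
}


def path_length(graph, path):
    """Total length of a path given as a node sequence."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def is_simple(path):
    """A simple (loopless) path visits no node more than once."""
    return len(set(path)) == len(path)


length = path_length(graph, ["s", "a", "b", "t"])  # 2.0 + 1.0 + 2.0
print(length)  # 5.0
```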

According to path constraints, KSP problems are usually divided into two types: a general KSP problem and a restricted loopless KSP problem. The general KSP problem has no restrictions on the path. The restricted loopless KSP problem requires that the obtained path is simple, and it cannot contain a loop. The network graph in this study did not contain loops, so only the restricted loopless KSP problem was analyzed.

2.2.2. Restricted Loopless KSP Algorithm

This study used the deviation path algorithm to solve the restricted loopless KSP problem. The core of the deviation path algorithm is how to find $p_{k+1}$ by using the shortest deviation paths of the obtained $p_1, p_2, \ldots, p_k$. First, the Dijkstra algorithm was used to find the shortest path from the start node $s$ to the end node $t$, which was put into the path set $A$ as $p_1$. After calculating the previous $k$ paths, the calculation process of $p_{k+1}$ was as follows:
(1) Take each node $v_i$ except the end node in $p_k$ as a possible deviation point, and calculate the shortest path from $v_i$ to the node $t$. To avoid repetition with the previously found paths, the edge leaving $v_i$ could not be the same as the edge leaving $v_i$ on the previously found shortest paths $p_1, \ldots, p_k$.
(2) Splice the shortest path found from $v_i$ to $t$ with the path from $s$ to $v_i$ on the current $p_k$ to form a candidate path for $p_{k+1}$, and save it in the candidate path set $B$.
(3) Select the shortest path from the candidate path set $B$ as $p_{k+1}$, and put it into the path set $A$.

Repeat the above steps until K paths are obtained.
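The deviation path procedure closely resembles Yen's algorithm, a well-known method for loopless K shortest paths. The Python sketch below implements that textbook form; whether it matches the authors' implementation in every detail is an assumption, and the function names and the four-node example network are hypothetical.

```python
import heapq


def dijkstra(graph, source, target, banned_nodes=(), banned_edges=()):
    """Shortest path avoiding banned nodes and edges; returns a node list or None."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in graph[node].items():
            if nxt in visited or nxt in banned_nodes or (node, nxt) in banned_edges:
                continue
            heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return None


def path_cost(graph, path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def k_shortest_paths(graph, source, target, k):
    """Up to k loopless shortest paths as (cost, path) pairs, shortest first."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    found = [first]      # path set A
    candidates = []      # candidate path set B, stored as (cost, path)
    while len(found) < k:
        last = found[-1]
        for i in range(len(last) - 1):        # every node except the end node
            root = last[: i + 1]              # path from the start to the deviation node
            # Forbid edges already used to leave this deviation node.
            banned_edges = {(p[i], p[i + 1]) for p in found if p[: i + 1] == root}
            banned_nodes = set(root[:-1])     # keeps the spliced path loopless
            spur = dijkstra(graph, last[i], target, banned_nodes, banned_edges)
            if spur is None:
                continue
            path = root[:-1] + spur
            entry = (path_cost(graph, path), path)
            if path not in found and entry not in candidates:
                candidates.append(entry)
        if not candidates:
            break
        candidates.sort()
        found.append(candidates.pop(0)[1])
    return [(path_cost(graph, p), p) for p in found]


# Hypothetical network with three loopless paths from "s" to "t".
graph = {
    "s": {"a": 1, "b": 4},
    "a": {"b": 2, "t": 7},
    "b": {"t": 3},
    "t": {},
}
result = k_shortest_paths(graph, "s", "t", 3)
```

For this network the three paths are returned in order of increasing length: s-a-b-t (6), s-b-t (7), and s-a-t (8).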

2.2.3. Multiobjective Route Planning Model

The above section describes the basic concept and solution algorithm of the KSP problem. In the next phase of the study, a multiobjective path planning model was designed based on the deviation path algorithm to obtain the optimal route. Comprehensively considering the accident risk cost and route travel time, the specific steps of the improved KSP algorithm were as follows:
(1) Initialize the network to determine the start point $s$ and end point $t$ of the path as well as the number $K$ of alternative paths.
(2) Load the accident risk cost of each section, calculated by (1), and the travel time of each section into the traffic network.
(3) Aiming at the accident risk cost, obtain the shortest path from $s$ to $t$ by using the Dijkstra algorithm, which is recorded as $p_1$; put it into the path set $A$ and set $k = 1$.
(4) If $k = K$, then move to the final step. If $k < K$, then all nodes on $p_k$ except the end point are regarded as deviation nodes $v_i$; if $p_k$ contains $l$ nodes, there are $l - 1$ deviation nodes in total.
(5) Traverse all deviation points, find the shortest path from each deviation point $v_i$ to the end point $t$, splice the path from the start point $s$ to $v_i$ on $p_k$ with the shortest path from $v_i$ to the end point $t$, and save it in the set $B$ as a candidate path.
(6) If the candidate path set $B$ is empty, go to the last step. If it is not empty, calculate the travel time of each candidate path and find the path with the shortest time; that is, $p_{k+1}$. Remove this path from the set $B$, put it into the set $A$, and return to step 4.
(7) From the K candidate paths, select the path with the shortest travel time as the optimal path to obtain the results.
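One possible reading of these steps is sketched below in Python: candidate paths are generated by minimizing accident risk cost, while the next path and the final optimal path are chosen by travel time. The edge layout `(risk, time)`, the function names, the example network, and the use of risk cost to break ties in travel time are assumptions for illustration, not details confirmed by the study.

```python
import heapq


def min_risk_path(graph, source, target, banned_nodes=(), banned_edges=()):
    """Dijkstra on risk cost (first element of each edge tuple); node list or None."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        risk, node, path = heapq.heappop(queue)
        if node == target:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, (edge_risk, _time) in graph[node].items():
            if nxt in visited or nxt in banned_nodes or (node, nxt) in banned_edges:
                continue
            heapq.heappush(queue, (risk + edge_risk, nxt, path + [nxt]))
    return None


def totals(graph, path):
    """Return (total travel time, total risk cost) of a path."""
    edges = list(zip(path, path[1:]))
    return (sum(graph[u][v][1] for u, v in edges),
            sum(graph[u][v][0] for u, v in edges))


def plan_route(graph, source, target, k):
    """Return (optimal path, list of up to k candidate paths)."""
    first = min_risk_path(graph, source, target)          # step 3
    if first is None:
        return None, []
    found, candidates = [first], []                       # sets A and B
    while len(found) < k:                                 # step 4
        last = found[-1]
        for i in range(len(last) - 1):                    # step 5: deviation nodes
            root = last[: i + 1]
            banned_edges = {(p[i], p[i + 1]) for p in found if p[: i + 1] == root}
            spur = min_risk_path(graph, last[i], target, set(root[:-1]), banned_edges)
            if spur is None:
                continue
            path = root[:-1] + spur
            if path not in found and path not in candidates:
                candidates.append(path)
        if not candidates:                                # step 6
            break
        best = min(candidates, key=lambda p: totals(graph, p))
        candidates.remove(best)
        found.append(best)
    optimal = min(found, key=lambda p: totals(graph, p))  # step 7
    return optimal, found


# Hypothetical network: graph[u][v] = (accident risk cost, travel time).
graph = {
    "s": {"a": (1.0, 10), "b": (2.0, 4)},
    "a": {"b": (0.5, 3), "t": (3.0, 5)},
    "b": {"t": (1.0, 6)},
    "t": {},
}
optimal, paths = plan_route(graph, "s", "t", 3)
```

In this example the minimum-risk path s-a-b-t is found first, and s-b-t is then selected as the optimal path because it has the shortest travel time among the three candidates.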

3. Research Results

3.1. Data

To achieve objective weighting and analyze the correlations between indicators according to the data characteristics, this study used traffic accident data from the National Highway Traffic Safety Administration (NHTSA) from the year of 2019 [29]. NHTSA uses data from many sources, including the Fatality Analysis Reporting System (FARS) which began operation in 1975. FARS provides data about fatal crashes involving all types of vehicles; therefore, it was possible to obtain indicators of impact on traffic accidents from this data set. The basic data was saved in comma separated values (CSV) files. For the 2019 data collection year, there were 23 data files. This study selected the files that describe the states of people, vehicles, roads, and the surrounding environment at the time of the accident.

3.2. Data Preprocessing

After obtaining the data files, it was necessary to systematically screen the risk indicators in the data set according to the four elements of the transportation system: people, vehicles, roads, and environment. For the selection of indicators, firstly, this study defined the indicator factors contained in transportation risk through existing literature research, expert consultation, and analysis of accident investigation reports in recent years [3, 16, 17] and then selected the indicators in data files to delete the unnecessary indicators. For example, the relevant data files of the involved people describe the information of all persons involved in collisions, including the demographic information of the driver and passengers and the driving maneuvers before the accident. As most accidents are related to driver behaviors, this study only used data from drivers’ demographic information and behaviors, so the indicators related to passengers were ignored. After screening each file, 22 indicators, including time of accident, driver gender, road alignment, and weather, were obtained to construct the indicator set of accident cost impact factors (see Table 1). The data structure in the data file was complex, and the sample size was large. It was necessary to clean the data and delete abnormal data, including null values, unreported values, and reported-as-unknown values. The original data of indicators was saved in the data table in the form of numbers. The indicator meanings corresponding to different numbers can be obtained by checking the FARS Analytical User’s Manual [29] (see Table 1).

After determining the impact factors, the impact factors located in different data files needed to be integrated into one data file to facilitate the subsequent calculation by the entropy weight method. After deleting the abnormal data, different indicators included different data sample sizes. Therefore, this study used the accident number to connect the selected data files and merge them into a data table. The combined data table contained 26218 accident data points, and the data on each accident contained 22 cost impact indicators. Through simple statistical analysis of the processed data (see Table 2), it was found that drivers involved in 3113 accidents were aged between 15 and 20, accounting for 11.9%. Younger drivers generally have a more aggressive driving style and often ignore traffic regulations, which has a certain impact on accident risk. Additionally, there were 5673 accidents involving drivers older than 65, accounting for 21.6%. Through comparison, it was observed that the proportion of traffic accidents among elderly drivers was higher than that among young drivers aged between 15 and 20. This is because elderly drivers generally respond slowly, and accidents can easily occur if they do not respond to emergencies in time. Simple statistical analysis cannot accurately explain the impact of each indicator on accident risk. Therefore, in the next phase of the study, the entropy weight method was used to calculate the weight of each indicator to determine the impact of each indicator on accident risk.
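The cleaning and merging step can be sketched in plain Python as an inner join on the accident number. Everything concrete here is an assumption for illustration: the key name `ST_CASE`, the column names, the code 99 standing for an unknown value, the in-memory rows, and the simplification that each table holds one row per accident.

```python
def clean(rows, unknown_codes):
    """Drop rows with a missing value or a code listed as unknown for its column."""
    kept = []
    for row in rows:
        bad = any(v is None or v in unknown_codes.get(col, ())
                  for col, v in row.items())
        if not bad:
            kept.append(row)
    return kept


def merge_on_key(tables, key):
    """Inner-join lists of dict rows on `key`; assumes one row per key per table."""
    indexed = [{row[key]: row for row in table} for table in tables]
    common = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for case in sorted(common):
        combined = {}
        for index in indexed:
            combined.update(index[case])
        merged.append(combined)
    return merged


# Hypothetical rows standing in for two of the data files.
accidents = [
    {"ST_CASE": 1, "WEATHER": 1},
    {"ST_CASE": 2, "WEATHER": 99},   # assumed "unknown" code
    {"ST_CASE": 3, "WEATHER": 2},
]
drivers = [
    {"ST_CASE": 1, "AGE": 19},
    {"ST_CASE": 2, "AGE": 70},
    {"ST_CASE": 3, "AGE": None},     # missing value
]
unknown = {"WEATHER": {99}}
merged = merge_on_key([clean(accidents, unknown), clean(drivers, unknown)], "ST_CASE")
```

Only accidents that survive cleaning in every table remain after the join, which is why the merged table can be smaller than any of the individual files.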

3.3. Quantification of Accident Risk Cost

This study used Python to implement the entropy weight method. First, in the merged data table, the number of columns (excluding the accident number column) was defined as the number of indicators, and the number of rows (excluding the header row of index names) was defined as the number of schemes, i.e., evaluation subjects. Then, the data were standardized, during which it was necessary to judge whether each indicator was a positive indicator or a negative indicator. A positive indicator is one that is better when it gets larger; conversely, a negative indicator is one that is worse when it gets larger. In this study, for the indicator A_DIST (involving a distracted driver), the value of 1 indicated that distracted driving was involved in the accident, and the value of 2 indicated that distracted driving was not involved. A higher value was better for the evaluation result. Therefore, this was a positive indicator. Similarly, each indicator was judged and the code was entered. Finally, the entropy method function was defined, and the weights of the variables were calculated. The index entropy and weights were obtained through calculation (see Table 3) to realize objective weighting based on historical accident data. As mentioned in Section 2.1.2, the smaller the entropy of the index, the greater the weight. Table 3 shows that the indicator VTRAFCON (traffic control device) had the smallest entropy and the largest weight. Therefore, whether there are traffic control measures at the location of the accident has a great impact on accident risk. Finally, by substituting the index weights into (1), the calculation equation of the accident risk cost of vehicles driving in a certain section was determined.

3.4. Case Study of Route Planning

In the previous step, the index weights were calculated, and the calculation equation of road accident risk cost was determined. In the next step, the route planning example network was constructed to realize the route planning with accident risk cost as the optimal goal based on the improved K shortest path algorithm. The example road network constructed consisted of 10 nodes and 14 road sections. The specific road network is shown in Figure 2. By assuming the real-time driver, vehicle, road, environment, and other parameters of the road network, the corresponding attributes of each road section were given; that is, the actual values of 22 indicators included in each road section were determined. On this basis, the accident risk cost of each road section was calculated according to (1). At the same time, the travel time of each road section was assigned. The results are shown in Table 4.

Python was used to implement the route planning model algorithm. First, the Dijkstra algorithm and deviation path algorithm functions were defined. Then, the network structure and the start and end points of the network were determined. The network parameters were input, K was set to 3 (i.e., three candidate paths were calculated), and the Dijkstra algorithm function was called to calculate the shortest path $p_1$ from the start point to the end point aiming at the accident risk cost; the risk cost of this path was 6.19. Then, the deviation path algorithm was called to determine the second shortest path $p_2$. The detailed calculation steps of $p_2$ are shown in Table 5.

Table 5 shows that there are five deviation points in total for finding the second path $p_2$. It was necessary to evaluate each deviation point to determine whether a new candidate path could be generated. After traversing each deviation point, four candidate paths were obtained. The accident risk costs of these paths were 6.39, 6.58, 6.26, and 7.86, and their travel times were 31, 31, 29, and 33, respectively. At this point, the path with the shortest travel time was selected as $p_2$. Therefore, the output second path was the candidate with a risk cost of 6.26 and a travel time of 29.

Similarly, the third shortest path $p_3$ was calculated according to the improved deviation path algorithm. The third path output by the model had a risk cost of 6.39 and a travel time of 31. At this point, k = 3 was met, three candidate paths were obtained (see Table 6), and the program stopped calculation. Referring to Table 6, the path with the shortest travel time was selected as the optimal path. The travel time of both the first and second paths was 29, but the risk cost of the first path was lower (6.19 versus 6.26), so the first path was selected as the optimal path of the network (see Figure 3).
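The two selection steps of the case study can be checked with a few lines of Python using the risk costs and travel times reported in the text. The labels `p1`, `p2`, and `p3` are placeholders for the three output paths; the lexicographic tie-break (travel time first, then risk cost) mirrors the comparison described above.

```python
# (risk cost, travel time) of the four candidates generated for the second path.
second_round = [(6.39, 31), (6.58, 31), (6.26, 29), (7.86, 33)]
p2 = min(second_round, key=lambda c: c[1])   # shortest travel time wins

# The three output paths: pick the shortest time, break ties by lower risk.
final = {"p1": (6.19, 29), "p2": (6.26, 29), "p3": (6.39, 31)}
optimal = min(final, key=lambda name: (final[name][1], final[name][0]))

print(p2)       # (6.26, 29)
print(optimal)  # p1
```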

Most conventional path planning methods take the shortest travel time as the optimization goal. Therefore, this paper compares the proposed model with a model that considers only the shortest travel time. Using the Dijkstra algorithm to calculate the shortest path with travel time as the goal, the resulting shortest-time path has a travel time of 24. Compared with the route selection result of the model proposed in this study, it can be seen that considering the accident risk cost of the route does not have a great impact on the route selection. The travel time of the optimal path selected by the proposed model is 29, so it takes drivers more time to travel than on the shortest-time path. However, because the model takes minimizing travel risk as the primary goal and minimizing travel time as the second goal, choosing the proposed optimal path greatly improves the travel safety of drivers.

According to the above path selection results and comparison analysis, the path planning model proposed in this study takes the risk cost as the primary goal, comprehensively considers the travel time of the path, and realizes multiconstraint path guidance. The optimal path minimizes the travel time on the basis of the lowest risk cost. It ensures not only travel safety but also travel efficiency.

4. Conclusions and Future Research

This study established an accident risk evaluation system based on traffic accident data; constructed a quantitative model of accident risk cost based on drivers, vehicles, roads, and environmental factors; calculated the weight of each index by the entropy weight method; and determined the accident risk of a road section. Then, based on an improved K shortest path algorithm, a route planning model considering risk cost and travel time was designed, which can be applied to driving assistance systems to help drivers choose a better, safer, and faster travel path and improve overall traffic safety and efficiency. The conclusions of this study are as follows:
(1) Based on historical accident data, this study constructed a risk index evaluation system, used the entropy weight method to calculate the index weights, and established the accident risk cost calculation model, which realizes the objective quantification of driving risks and overcomes the disadvantage of subjectivity in traditional risk evaluation methods.
(2) Based on the improved K shortest path algorithm, a route planning model meeting multiple objectives was designed. The model considers the path risk cost and travel time simultaneously and selects the optimal travel path with the lowest risk and shortest time for drivers. The proposed method is theoretically and practically applicable for road transportation risk assessment and accident prevention.

The model proposed in this paper considers the impact of drivers, vehicles, roads, and the surrounding environment on traffic accidents. However, due to the lack of index data reflecting real-time traffic flow characteristics (such as traffic flow and traffic composition) in the data set, the impact of real-time traffic flow characteristics on accident risk is not calculated. Also, this study assumes that only a small proportion of vehicles are guided by the route planning method, so that the method changes neither the traffic flow states nor the driving risk. At the same time, the K shortest path algorithm used in this study outputs a large number of repeated paths, which is inefficient, and it can only be used in acyclic networks, so it is difficult to deal with path selection on complex networks. In future research, real-time traffic flow characteristics can be included in the index set. Combined with the accident risk quantification model proposed in this study, the real-time risk status of the road can then be calculated to realize real-time updates of the travel path. Future research can also apply the model to an actual road network and improve the path guidance algorithm to increase the efficiency and generality of the model.

Data Availability

The data that support the findings of this study are available in the FARS. These data were derived from the following resources available in the public domain: https://www.nhtsa.gov/file-downloads?p=nhtsa/downloads/FARS/2019/National/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research work was jointly supported by Key Research and Development Program of Shandong Province (2020CXGC010117) and Transportation Technology Plan of Shandong (2021B60).