Abstract
Road traffic safety is a social issue of widespread concern. It is important for traffic managers to understand the distribution patterns of road traffic accidents. To this end, this study examines the spatial and temporal patterns of road traffic accidents from the perspectives of both accident frequency and accident severity. Road traffic accident data from 2016 to 2018 in Harbin, China, were used for the analysis. First, the spatial localization of accidents was completed using geocoding, and the localized accident data were classified by season. Then, density analysis was performed both with and without considering road network density. The results of the density analysis showed that when road network density was considered, accidents were mainly distributed in urban centers, while accidents were more dispersed when road network density was not considered. Next, a cluster analysis considering accident severity found that low-severity accident clusters occurred mostly in urban centers, whereas high-severity accident clusters were mostly present in suburban areas. Finally, the results of these two methods were displayed by using the comap technique. Areas of the city with a high frequency and severity of crashes in each season were identified. This study will help traffic managers gain a more visual and intuitive understanding of the urban traffic safety situation and take targeted measures to improve it accordingly.
1. Introduction
The casualties and property damage caused by road traffic accidents are serious. According to the World Health Organization [1], approximately 1.3 million people die as a result of road traffic accidents each year worldwide. Road traffic injuries are the 8th leading cause of death worldwide. In 2016, 63,093 people died in traffic accidents in China, 1.7 times the number in the United States. The death rate per 10,000 vehicles reached 2.72, 2.1 times that in the United States [2]. To decrease the number of traffic accidents, it is necessary to determine where and when accidents occur frequently. Previous studies have shown that accidents occur with certain spatial and temporal patterns. Areas with a high frequency and severity of crashes vary over time and occur in different areas within cities. Therefore, in this study, the frequency and severity of accidents are combined to understand the spatial and temporal distribution patterns of accidents. This information helps traffic managers take targeted preventive measures to reduce fatalities and injuries.
Most previous analyses of traffic accident data used mathematical statistical methods [3–5]. These methods mainly use accident frequency to determine the location of accident hot spots [6]. Their operation and expression are relatively simple, but they have many limitations, such as the lack of visualization and the inability to connect space and time. In contrast, spatial statistics methods can be used to visualize the spatial distribution characteristics of traffic accidents through GIS technology. Compared with traditional mathematical statistics, spatial statistics fully utilize the advantages of GIS in spatial data processing. On the one hand, the distribution of traffic accidents can be visualized through GIS visualization technology [7–9]. On the other hand, by using a variety of spatial analysis tools in GIS, scholars can explore the spatial distribution characteristics of traffic accidents and the spatial relationship between different traffic accidents from a variety of perspectives [10–12]. The most common spatial statistical methods in GIS fall into three groups: density analysis, which accomplishes spatial visualization of accidents through kernel density and point density methods [13–16]; cluster analysis, which can identify whether the spatial distribution of traffic accidents is aggregated, dispersed, or random by using the nearest neighbor distance and Ripley’s K function methods [17–19]; and hot spot identification, which locates traffic accident hotspot areas by hotspot analysis [20–22] and spatial autocorrelation analysis [23–25].
In general, existing studies have reported a series of results in the analysis of traffic accident data, but there are still some shortcomings. First, in previous studies on accident density analysis, only the accident density was considered, and the influence of road network density on the accident density was not considered. Second, the measure of traffic safety level includes not only the frequency of traffic accidents but also the consequences of the severity of the accidents themselves. Although previous studies have examined the spatial distribution characteristics of regions with higher accident severity, few have linked accident frequency with the spatial distribution characteristics of accident severity.
Therefore, in view of the shortcomings of previous studies, accidents were first spatially located by geocoding methods in this paper. Second, accidents were divided into four categories according to the season in which they occurred. Then, using density analysis and cluster analysis, areas with a high frequency and severity of traffic accidents were identified. Finally, the results of the two analysis methods were combined to determine accident-prone areas of different severities.
This paper aims to investigate the spatial and temporal patterns of accidents from two perspectives: accident frequency and accident severity. The remainder of the article is arranged as follows. Section 2 describes the traffic accident data. Section 3 shows the main methods used in this study. The results of density analysis and cluster analysis are presented in Section 4. Finally, Section 5 presents the conclusions.
2. Data Processing
The study area is Harbin, a city in northeastern China. Harbin covers an area of approximately 53,100 km² and had a population of 9.55 million in 2017. Only the main urban area of the city was analyzed. Each record in the accident data contains basic accident information, such as time of occurrence, location of occurrence, accident casualties, road type, and weather data. In this research, some of this information (shown in Table 1) was selected for the study.
In GIS, the location of a traffic accident is generally marked by latitude and longitude coordinates. However, latitude and longitude were not included in the raw data. Therefore, the coordinates needed to be determined from the textual description of the accident location, after which the spatial location of the accident could be fixed. This process is called geocoding [26]. Tian et al. [27] evaluated the quality of four mainstream geocoding services in China (Baidu, Gaode, Sogou, and Tencent). The service quality of Tencent’s geocoding API was considered relatively high, with higher data quality and more complete address data than the other services. Therefore, the Tencent service was used to convert the accident locations to coordinates, and the converted data were then imported into GIS. When the Tencent API returns a geocoded result, it also returns the reliability of that result. Reliability values range from 1 (low reliability) to 10 (high reliability). A result is considered credible when it has a reliability score of 7 or above, so only these results were retained in this study. After the geocoding process, 5850 accidents were identified for further study.
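The reliability filter described above could be sketched as follows. This is only an illustration: the record layout (dictionaries with the hypothetical keys `address`, `lat`, `lng`, and `reliability`) and the sample records are assumptions made for the example, not the authors' data and not the actual response format of any geocoding service.

```python
RELIABILITY_THRESHOLD = 7  # results scoring 7 or above are treated as credible


def keep_credible(records, threshold=RELIABILITY_THRESHOLD):
    """Return only the geocoded records whose reliability is >= threshold."""
    return [rec for rec in records if rec.get("reliability", 0) >= threshold]


# Hypothetical geocoding results; names and coordinates are illustrative only.
sample = [
    {"address": "site A", "lat": 45.75, "lng": 126.64, "reliability": 9},
    {"address": "site B", "lat": 45.80, "lng": 126.53, "reliability": 4},
    {"address": "site C", "lat": 45.76, "lng": 126.62, "reliability": 7},
]

credible = keep_credible(sample)
print([rec["address"] for rec in credible])
```

Records without a reliability value are dropped here, which is a conservative choice made for the sketch rather than something stated in the text.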
Figure 1 illustrates the overall spatial distribution of traffic accidents from 2016 to 2018. The figure suggests that the accidents occurred mostly in the central urban area.

3. Methodology
3.1. Comap Method
The comap technique helps reveal how the locations of traffic accidents change over time. It has been widely used in temporal-spatial integration [28, 29]. In this paper, traffic accident data from 3 years were divided into four subsets by season in accordance with Harbin’s climatic conditions. Then, density analysis and cluster analysis were applied to calculate the intensity of each subset. Finally, the results were arranged sequentially in a graphic to show the spatial distribution of traffic accidents over time. Following the related literature [30], class boundaries should overlap. Accordingly, as shown in Table 2, adjacent seasonal subsets overlap to some extent to avoid abrupt temporal boundaries.
3.2. Density Analysis Method
This study used point density and line density to identify spatial patterns of traffic accidents. The former is obtained by calculating the number of accidents per unit area. The latter is obtained by calculating the length of road sections per unit area.
The calculation of the density analysis was performed in GIS with the neighborhood method. For example, when calculating the density of the accident points, the study area was divided into small square cells with side length $d$. Each cell ultimately corresponds to a pixel in the output map. The accident density in the region where cell $k$ is located is $D_k^{\text{accident}}$, and the radius of the neighborhood is set to $r$. $N_k(r)$ is the number of accidents within a neighborhood centered at the center of cell $k$ and with radius $r$. The point density is calculated as follows:
$$D_k^{\text{accident}} = \frac{N_k(r)}{\pi r^2} \tag{1}$$
The road network density in the area where cell $k$ is located is $D_k^{\text{road}}$. Similarly, $L_k(r)$ is the length of the road within the same neighborhood. The specific formula is as follows:
$$D_k^{\text{road}} = \frac{L_k(r)}{\pi r^2} \tag{2}$$
In addition, d and r need to be determined in the actual calculation. In GIS, they are usually derived from the smaller of the output image height and width: r is 1/30 of this minimum, and d is 1/250 of it. The study area lies between 126.15° and 127.15° east longitude and between 45.52° and 46.09° north latitude. The minimum output image width was obtained after longitude and latitude were converted to actual distances. A cell length of 230 m and a neighborhood radius of 1900 m were selected for the density analysis.
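The neighborhood method described in this section can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the GIS implementation used in the study: coordinates are taken to be planar and in meters, the neighborhood is a circle of area πr², and the function names `default_cell_and_radius` and `point_density` are hypothetical.

```python
import math


def default_cell_and_radius(width_m, height_m):
    """Cell size d (1/250) and radius r (1/30) of the shorter extent side."""
    shorter = min(width_m, height_m)
    return shorter / 250.0, shorter / 30.0


def point_density(points, x_min, y_min, n_cols, n_rows, d, r):
    """Accidents per unit area in a circular neighborhood of each cell center.

    points: iterable of (x, y) accident locations in planar meters.
    Returns a list of rows; grid[row][col] is the density for that cell.
    """
    area = math.pi * r ** 2
    grid = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * d  # y coordinate of the cell center
        line = []
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * d  # x coordinate of the cell center
            count = sum(
                1 for (px, py) in points
                if (px - cx) ** 2 + (py - cy) ** 2 <= r ** 2
            )
            line.append(count / area)
        grid.append(line)
    return grid


# Tiny illustrative grid: 2 x 2 cells of 10 m, radius 6 m, two made-up points.
grid = point_density([(5.0, 5.0), (6.0, 5.0)], 0.0, 0.0, 2, 2, d=10.0, r=6.0)
```

The road network density would follow the same pattern, with the count replaced by the length of road segments clipped to the circle; that clipping step is omitted here to keep the sketch short.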
3.3. Cluster Analysis Method
Cluster analysis is a more rigorous data analysis process. Spatial clustering analysis divides collections of physical or abstract objects in spatial data into similar classes. In turn, the spatial patterns of similar classes of data are obtained [31]. In this paper, outlier analysis was performed in GIS to explore the spatial pattern of accident severity. First, this approach is in contrast to traditional cluster analysis approaches, such as hierarchical or divisional cluster-based methods. These methods can determine only whether a sample belongs to a certain category [32]. However, outlier analysis identifies samples that do not belong to any category. This provides a more comprehensive analysis of the spatial pattern of accidents. Second, the calculation of the outlier analysis is based on the attributes of the individual accident sample points. Original data attributes are largely preserved. This facilitates an in-depth study of the accidents.
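Outlier analysis is performed by calculating the local Moran’s I of each accident. It measures the correlation between the attribute of each accident point and the values of neighboring points. The statistic is calculated as follows:
$$I_i = \frac{x_i - \bar{X}}{S_i^2} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\,(x_j - \bar{X}) \tag{3}$$
where $I_i$ is the local Moran’s I statistic of data point $i$, $n$ is the total number of accidents, $x_i$ and $x_j$ are the attributes of data points $i$ and $j$ (which correspond to the accident severity in this paper), $\bar{X}$ is the global mean of the attribute, $w_{i,j}$ is the spatial weight between $i$ and $j$, and $S_i^2$ is the second-order sample moment of the attributes of the data points.
Formally, $S_i^2$ can be expressed as
$$S_i^2 = \frac{\sum_{j=1,\, j \neq i}^{n} (x_j - \bar{X})^2}{n - 1} \tag{4}$$
The z score for data point $i$ is calculated as
$$z_{I_i} = \frac{I_i - E[I_i]}{\sqrt{V[I_i]}} \tag{5}$$
where $E[I_i]$ and $V[I_i]$ can be expressed as
$$E[I_i] = -\frac{\sum_{j=1,\, j \neq i}^{n} w_{i,j}}{n - 1}, \qquad V[I_i] = E[I_i^2] - E[I_i]^2 \tag{6}$$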
There were five types of statistical results of the outlier analyses: high-high clustering (H-H), high-low clustering (H-L), low-high clustering (L-H), low-low clustering (L-L), and nonsignificant. In general, a 95% confidence level was used to signify statistical significance. In other words, a result was considered statistically significant when the p value was less than 0.05, which under the normal distribution corresponds to a z-score below −1.96 or above +1.96. If a result was statistically significant and I > 0, then the data point had the same level of high or low attributes as the adjacent points. The attribute value of the point was compared with the average attribute value of all the data points to determine whether the result indicated H-H or L-L clustering. If a result was statistically significant and I < 0, then the attribute of the data point differed significantly from those of adjacent points, and the point was an outlier (H-L or L-H).
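To make the classification rule concrete, the following sketch computes a local Moran's I value for each point and assigns one of the five labels. It assumes the commonly used Anselin form of the statistic, takes the spatial weights as a plain matrix, and leaves the significance test to the caller (the `significant` flag); the function names and the toy severity values are hypothetical.

```python
def local_morans_i(values, weights):
    """Local Moran's I for each point, given attribute values and weights.

    weights[i][j] is the spatial weight between points i and j.
    Assumes the attribute is not constant, so the variance term is nonzero.
    """
    n = len(values)
    mean = sum(values) / n
    stats = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        s2 = sum((values[j] - mean) ** 2 for j in others) / (n - 1)
        lag = sum(weights[i][j] * (values[j] - mean) for j in others)
        stats.append((values[i] - mean) / s2 * lag)
    return stats


def classify(value, mean, moran_i, significant):
    """Map one point to H-H, L-L, H-L, L-H, or nonsignificant."""
    if not significant or moran_i == 0:
        return "nonsignificant"
    high = value > mean
    if moran_i > 0:
        return "H-H" if high else "L-L"  # similar to its neighbors
    return "H-L" if high else "L-H"      # differs from its neighbors


# Toy example: points 0-1 are neighbors, points 2-3 are neighbors.
severity = [3, 3, 1, 1]
w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
stats = local_morans_i(severity, w)
mean_severity = sum(severity) / len(severity)
labels = [classify(v, mean_severity, s, True) for v, s in zip(severity, stats)]
print(labels)
```

In practice the `significant` flag would come from the z-score or a permutation test; here it is simply set to `True` so that the sign logic is visible.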
4. Analysis Results
4.1. Density Analysis
4.1.1. Distribution of Accident Point Density
The frequency of accidents per unit area is an indicator. This indicator is used to measure the level of traffic safety on urban roads. In this section, point density analysis was used to calculate this value. Moreover, the temporal and spatial distribution of traffic accidents was generated. This approach helps to determine whether an accident hot spot is subject to temporal fluctuations in the accidents. In addition, the maximum value normalization method was used to normalize the density values. This method facilitates the classification and comparison of accident density intervals.
According to the relevant literature [28], the accident density results were classified into 3 levels. Density values of 0.5 and 0.8 were used as the two cutoff points. Density values between 0 and 0.5 indicate a low-density area of accidents, values between 0.5 and 0.8 indicate an intermediate-density area of accidents, and values between 0.8 and 1 indicate a high-density area of accidents. The density calculation results are shown in Figure 2.
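The normalization and three-level classification can be illustrated with a short sketch. The handling of values that fall exactly on a cutoff is not specified in the text; assigning them to the higher class is an assumption made here, and the function names and sample densities are hypothetical.

```python
def normalize_by_max(densities):
    """Scale densities to [0, 1] by dividing by the maximum value."""
    peak = max(densities)
    if peak == 0:
        return [0.0 for _ in densities]
    return [value / peak for value in densities]


def density_level(value, low_cut=0.5, high_cut=0.8):
    """Classify a normalized density as low, intermediate, or high.

    Values exactly on a cutoff go to the higher class (an assumption).
    """
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "intermediate"
    return "high"


raw = [1.0, 6.0, 9.0, 10.0]  # made-up densities for illustration
levels = [density_level(v) for v in normalize_by_max(raw)]
print(levels)
```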

Figure 2: Accident point density in each season (panels (a)–(d)).
As shown in Figure 2, the number of accidents was similar in spring and autumn. The locations of intermediate to high densities of accidents were similar in these two seasons. In addition, the number of accidents was higher in summer than in the other seasons. The areas with intermediate to high densities of accidents were also more widely distributed in summer. The high-density accident area statistics are shown in Table 3. The high-density accident areas were found in the same administrative divisions in spring, summer, and autumn. In winter, these areas were located only in the Daoli District. The high-density area was the largest in summer, at approximately 5 km². The accident rates in high-density areas were highest in autumn.
Statistics of areas with an intermediate density of accidents are shown in Table 4. The areas with an intermediate density of accidents were found in the same administrative divisions in spring, summer, and autumn, and these areas were mainly located in the Daoli, Daowai, Xiangfang, and Nangang Districts. In winter, the areas with an intermediate density of accidents also included the Pingfang District in addition to these four districts. The area with an intermediate density of accidents was largest in summer, at approximately 35.22 km². The accident rates in areas with an intermediate density of accidents were highest in winter.
In conclusion, areas with intermediate to high densities of accidents were concentrated near large shopping malls, schools, and hospitals (yellow circles), especially in the vicinity of the First Hospital and Ha Station. These areas were larger in the summer and winter and smaller in the spring and fall. In addition, the accident density in some areas (yellow circles) was found to have not changed significantly in space and time through the comap technique. However, the density in some areas fluctuated in space and time. For example, the green box in Figure 2(d) was not identified in any of the other seasons. Notably, the southern area that had a high density of accidents in the winter (highlighted by the box in Figure 2(b)) was not identified as a high-density area in any of the other seasons.
4.1.2. Distribution of Accident Point Density considering Road Network Density
The density of accident points per unit area was determined in the previous section. However, that analysis did not consider the density of the road network, so the spatial-temporal pattern of accidents was not fully reflected. Therefore, a new spatial-temporal accident pattern was calculated by dividing the point density by the road network density ($D_k^{\text{accident}}/D_k^{\text{road}}$) to obtain the accident frequency per unit road length. The resulting values were normalized and classified into the same three levels as above. The new pattern, which considers the road network density, is shown in Figure 3.
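As a brief worked step, suppose both densities are computed over the same neighborhood around cell $k$, with area $A_k(r)$ (a symbol introduced here only for illustration). The area then cancels in the ratio, which is why the result can be read as the number of accidents per unit road length:

```latex
\frac{D_k^{\text{accident}}}{D_k^{\text{road}}}
  = \frac{N_k(r) / A_k(r)}{L_k(r) / A_k(r)}
  = \frac{N_k(r)}{L_k(r)}, \qquad L_k(r) > 0 .
```

Cells whose neighborhood contains no road ($L_k(r) = 0$) leave the ratio undefined and would need to be handled separately, for example by masking them out.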

Figure 3: Accident point density considering road network density in each season (panels (a)–(d)).
Figure 3 shows a marked change. The spatial and temporal patterns of accidents differed significantly by season. In addition, new areas with intermediate to high densities of accidents were identified. The statistics of the new high-density areas of accidents are shown in Table 5. A comparison of the accident density with the administrative divisions indicated that, in spring and autumn, accidents occurred in the Daowai and Xiangfang Districts, while in summer and winter, they occurred in the Daoli District. A comparison of accident density and road networks indicated that, in spring and autumn, accidents were concentrated on Nanzhi Road, Pioneer Road, Gongbin Road, and Hongqi Street. In summer and winter, accidents were concentrated on Pioneer Road and South Straight Road. The area with a high density of accidents was largest in autumn, at approximately 14.11 km². Accident rates in the high-density areas were highest in spring.
The statistics of the updated intermediate-density areas of accidents are shown in Table 6. A comparison with administrative divisions indicated that, in spring, autumn, and winter, an intermediate density of accidents occurred in the Daoli, Daowai, and Xiangfang Districts. In summer, the area with an intermediate density of accidents also included the Nangang District. A comparison with the road network indicated that the areas with an intermediate density of accidents in each season were mostly concentrated on Gongbin Road, Xinyang Road, Dongzhizhi Road, Nanzhizhi Road, and Pioneer Road, among others. In summer and autumn, these areas also included Tianheng Street. In winter, they also included Haping Road, Heping Road, and Hexing Road. The area with an intermediate density of accidents was largest in summer, at approximately 41.39 km². The accident rate in areas with an intermediate density of accidents was similar among seasons.
In summary, the areas with a high density of accidents were smaller in spring and winter, while those in summer and autumn were larger and more widely distributed. In addition, a comap was generated to demonstrate the temporal-spatial patterns of accidents in each season. Some areas (marked by boxes 1, 2, and 3 in Figure 4) showed essentially no fluctuations over time. In other areas, there were changes in both spatial and temporal patterns. Examples of these areas include box 5 in Figures 4(b) and 4(c) and box 4 in Figure 4(d). These areas had a low density of accidents in all the other seasons.

Figure 4: Clustering results of accident severity in each season (panels (a)–(d)).
Compared with the results shown in Figure 2, the analysis considering road network density identified broadly similar areas of intermediate and high densities of accidents, although significant changes were observed in some areas. Several findings were obtained after considering the road network density. First, the color of some areas was lighter in Figure 3 than in Figure 2. For example, the areas in the yellow circle in Figure 2 and the green box in Figure 2(d) became areas with low or intermediate densities of accidents. This indicates that the high density results in these areas were due to an overly dense road network. Second, new areas with intermediate to high densities of accidents were identified in each season. The frequency of accidents per unit road length is higher in boxes 4 and 5 in Figure 3(b) and box 4 in Figure 3(b). Finally, there were no significant changes in one area (box 3 in Figure 3(b)). Whether considering the frequency of accidents per unit area or per unit road length, intermediate- to high-density areas of accidents were identified in this area.
4.2. Cluster Analysis
The previous section presents the results obtained when the frequency of accidents was analyzed without considering the severity of accidents. Traffic managers assess traffic safety by considering not only the frequency of accidents but also the property damage and casualties they cause. In fact, if an area occasionally has a particularly serious accident, it deserves more attention than areas where minor accidents occur frequently. Therefore, according to the relevant literature and the data, the accident severity was classified into 3 levels in this section (as shown in Table 7).
There are certain patterns in the distribution of traffic accidents of differing severity in time and space. In this analysis, accident severity was used as a factor to evaluate the results of the spatial clustering of accidents. The clustering results are shown in Figure 4.
The position of the box in Figure 4 corresponds to that in Figure 3. The dark red points (H-H) represent the high-severity accident class. The dark blue points (L-L) represent the low-severity accident class. The light red points (H-L) indicate a few high-severity accident points contained within the spatial extent occupied by many low-severity accident points. The light blue points (L-H) indicate a few low-severity accident points within the spatial area occupied by many high-severity accident points. The gray points indicate that the incident points had no obvious clustering features.
As shown in Figure 4, different types of clustering features were identified in each season. Some areas of Pingfang District showed a tendency for high-severity accidents. Likewise, this pattern was also observed in the southwestern part of the study area in spring. Most of these areas were concentrated in peripheral and suburban areas away from urban centers. In contrast, most accidents are dispersed within urban centers. To a certain degree, traffic accidents can cause casualties and property damage, although most accidents have a low severity. In addition, the traffic accidents in the northeastern area of Daoli District in winter were in the L-L class, indicating that this area was a hot spot for low-severity accidents.
4.3. Combined Density Analysis and Clustering Analysis
Combining the clustering and density analyses gives a clearer picture than either analysis alone. By combining these analyses, the accident severity in areas with intermediate and high densities of accidents was identified. More importantly, these areas can be ranked, which deserves attention. Traffic managers could thus target individual areas for better management and regulation. For example, in spring, there were three areas with intermediate to high densities of accidents (surrounded by boxes in Figure 4). Taking Region 1 as an example, let $M_m$ be the number of accidents in Region 1 after the cluster analysis, with m = 1, 2, 3, 4, 5, representing accidents in the H-H, H-L, L-H, and L-L classes and nonsignificant accidents, respectively. The proportion $P_m$, which indicates the likelihood that this region will eventually exhibit a certain clustering feature, is calculated as
$$P_m = \frac{M_m}{\sum_{l=1}^{5} M_l} \tag{7}$$
Similarly, the proportion of clustering results for each season was calculated using equation (7). The calculation results are shown in Table 8. The clustering results for some regions were nonsignificant and are not presented in this table. As shown in Table 8, accidents in the L-L class account for a relatively large proportion of accidents in Region 1 (Figure 4(a)), in Regions 1 and 4 (Figure 4(b)), and in Regions 1 and 2 (Figure 4(d)). This result indicated that although the frequency of accidents per unit length of roadway is greater in these areas, the severity of accidents is generally lower. Conversely, accidents in the H-H class account for a relatively large proportion of accidents in Region 5 (Figure 4(c)). This result indicated that both the frequency of accidents per unit road length and the severity of accidents are higher in this area.
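The per-region proportion described above can be sketched as follows. The label strings (with `"NS"` standing for nonsignificant), the function name, and the counts in the example are hypothetical and chosen only to show the arithmetic; they are not values from Table 8.

```python
from collections import Counter

CLASSES = ("H-H", "H-L", "L-H", "L-L", "NS")  # "NS" = nonsignificant


def cluster_proportions(labels):
    """Share of each clustering class among the accidents in one region."""
    counts = Counter(labels)
    total = sum(counts[c] for c in CLASSES)
    if total == 0:
        raise ValueError("region contains no classified accidents")
    return {c: counts[c] / total for c in CLASSES}


# Made-up region with 10 accidents: 6 L-L, 1 H-H, 3 nonsignificant.
region_labels = ["L-L"] * 6 + ["H-H"] + ["NS"] * 3
proportions = cluster_proportions(region_labels)
print(proportions)
```

A region dominated by L-L would then be read as frequent but mostly low-severity, and one dominated by H-H as both frequent and severe.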
5. Conclusion
(1) In this paper, density analysis was combined with comap technology to study the spatial and temporal patterns of traffic accidents by season from the perspective of accident frequency. The results show that the accident density distribution is more diffuse in summer and winter when the road network density is not considered. After considering road network density, the accident density distribution is more diffuse in summer and autumn.
(2) Accident severity is divided into three levels: accidents that cause property damage, accidents that lead to injury, and fatal accidents. Based on these three levels, cluster analysis was used to explore the spatial and temporal patterns of accidents. The following conclusions were drawn. Traffic accidents in urban centers mostly show characteristics of the H-L and L-L classes. Traffic accidents in the central part of the city are generally low-severity accidents. Conversely, the results show that accidents with characteristics of the H-H class were mainly found in the outer urban areas and suburban areas. This indicates that most traffic accidents in these areas are high-severity accidents.
(3) Density analysis is a regional analysis. This method reflects a coarse picture of the spatial-temporal pattern of accidents. Clustering analysis can be accurate at the level of the accident points. Urban areas prone to accidents of different severities were identified by combining two methods. The results show that the accident severity was lower in Region 1 in spring, summer, and winter, although the frequency of accidents was higher. In Region 5 in the fall, not only was the frequency of accidents greater but also the severity of accidents was generally higher.
(4) Due to data limitations, further analysis of causal factors of accidents leading to spatial patterns is needed. The next step will be to conduct an in-depth study of the causes of various types of traffic accidents, taking into account road characteristics and infrastructure, with a view to providing a more detailed basis for traffic safety management.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Authors’ Contributions
Meina Wang and Wenhui Zhang contributed to conceptualization, methodology, and original draft preparation; Meina Wang contributed to software; Jing Yi and Tiangang Qiang contributed to validation; Wenhui Zhang and Meina Wang contributed to formal analysis; Meina Wang, Xirui Chen, and Jing Yi contributed to data curation; Xirui Chen, Meina Wang, and Tiangang Qiang contributed to review and editing. All the authors have read and agreed to the published version of the manuscript.
Acknowledgments
The authors would like to thank Harbin Public Security Bureau Traffic Police Station for providing related data for the case study. The research was funded by the National Key Research and Development Projects (2017YFC0803901) and the Fundamental Research Funds for the Central Universities (2572018BG01).