Abstract
Tailored countermeasures that may significantly improve road traffic safety can be proposed and implemented if the relationship between various associated factors and aggressive driving is well understood. However, this relationship remains poorly understood, as driving behavior is complex and the interrelationships among variables are not easy to identify. Considering this situation, this paper constructs a model based on a structural equation model (SEM), a multivariate statistical analysis technique used to analyze structural relationships, combined with factor analysis (FA). The model is applied in a case study using data from the Shanghai Naturalistic Driving Study. In the case study, 16 variables were grouped into five latent factors in the SEM, and the model fits the data well. The results show that, compared with other variables, age had the most significant positive impact on aggressive driving behavior (older drivers exhibited high aggressive driving frequency). Adverse weather negatively impacted driver behavior (lower speed and high longitudinal acceleration), which in turn negatively affected aggressive driving behavior. In addition, the results show that driver factors (such as age and sex) were the main factors influencing vehicle use (such as hard acceleration), and the environment was the main factor determining risky scenarios, where safety-critical situations increase. This paper provides a reference for defining and determining aggressive driving and a model for exploring the relationship between driving-safety factors and aggressive driving, which can be used in real-world applications for improving driving safety, such as advanced driver-assistance systems (ADAS) and traffic enforcement safety control systems.
1. Introduction
The analysis of the relationship between driving-safety factors and aggressive driving has been an essential topic in driving-safety research [1]. It involves using statistical techniques to gain insight into driving-safety factors that affect or are associated with aggressive driving [2]. Meanwhile, efforts to improve driving safety through various means, such as training drivers to have proper habits, providing advanced vehicle technology, and improving road environments, may benefit from exploring the relationship between driving-safety factors and aggressive driving [3]. For example, Ahmed et al. [4] indicated that the effectiveness of driving-safety controls differs by driver group.
Previous studies have examined the relationship between aggressive driving and driving-safety factors, such as driver characteristics (age, sex, professional or nonprofessional driver, education level, etc.), vehicle driving characteristics (speed, accelerator, etc.), and the road environment (weather, traffic flow, law enforcement, etc.). For example, Bedard et al. [5] used a multivariate logistic regression and found that aggressive driving is associated with age and sex. Wang [6] analyzed aggressive driving behavior for three road types (surface road, freeway, and expressway) and found that aggressive driving behavior was significantly affected by the road environment. Pantangi et al. [7] indicated that drivers' awareness of law enforcement has the potential to decrease aggressive driving behavior patterns. However, driving behavior is multifaceted, and a single factor can hardly yield robust results regarding the relationship between driving-safety factors and aggressive driving [8]. For example, driver behavior among same-sex drivers is also affected by changes in age, and different driving performances occur within a given driver group with changes in the road environment. Indeed, some researchers have realized this; for example, Zhao [9] analyzed the influences of multiple factors on aggressive driving and found that driver characteristics and road weather influence aggressive driving. However, robust results have not been obtained because two problems remain: (1) the approaches used in multifactor analysis are limited and (2) reliable data are lacking.
Previous studies have investigated the factors contributing to aggressive driving using existing approaches, such as support vector machines and random forests [10]. However, driving behavior is multifaceted, and the relationships between driving-safety factors and aggressive driving are challenging to explore. Previous studies have tended to emphasize the impact of a single characteristic on driving safety, and the effectiveness of combining subjective and objective indicators for safety evaluation has not yet been thoroughly researched with those approaches. Aggressive driving behaviors, in contrast, are complex and associated with multiple factors. For example, the typical aggressive driving behaviors (tailgating, abrupt speed changes, and sudden lane changes [11]) have different determining standards and manifest differently in different situations: speeding is significantly related to age, educational level, and mileage driven [12]; younger drivers are more likely to make sudden lane changes [9]; and tailgating is more likely to occur in normal weather [13]. Considering this situation, this paper uses a structural equation model (SEM), a multivariate statistical analysis technique used to analyze structural relationships. This technique combines factor analysis and multiple regression analysis and is used to analyze the structural relationship between measured variables and latent constructs. Moreover, the SEM can concurrently address the complex connections among endogenous and exogenous variables to explore the relationship between driving-safety factors and aggressive driving [3].
Various data sources have been used in past research to study aggressive driving. The earliest studies were conducted by collecting field data or interviewing drivers (e.g., [14]). However, one main concern with self-report survey data is that such data can be highly subjective. Driving simulator data have also often been used in driving-safety studies, but certain limitations are inherent to this approach: simulated driving behavior may not match normal driving behavior because the participants know that they are being observed.
Meanwhile, data validity has consistently been found to impact findings [15]. Recent developments in sensing and computing technologies have enabled the use of independent data-logging systems with embedded sensors [16] or global positioning system (GPS) data based on smartphone applications [17]. Although such systems help collect microscopic trajectory data with rich information, obtaining data for driver behavior studies has still been difficult due to the limitation of detection devices for capturing movement. Vehicle dynamics data, now more readily available thanks to decreased costs, have been used to study driver behavior through naturalistic driving study (NDS). For example, Guo [18] used the 100-Car Naturalistic Driving Study to identify factors associated with aggressive driving.
This research aims to develop an aggressive driving correlation model for exploring the relationship between multiple driving-safety factors (driver characteristics, vehicle driving characteristics, and road environment) and aggressive driving behavior based on SEM and FA. Driver characteristics include age, sex, distracted driving, and drowsy driving. Vehicle driving characteristics include speed and acceleration. The road environment includes road type, traffic flow, and weather.
2. Literature Review
This literature review focuses on the relationship between driving-safety factors and aggressive driving and the use of modeling in aggressive driving research. As mentioned above, driving-safety factors include driver characteristics, vehicle driving characteristics, and road environment characteristics. The literature on each factor and machine learning methods related to this paper is reviewed below.
2.1. Relationship between Driving-Safety Factors and Aggressive Driving
Several driver characteristics affect driving behavior. These characteristics include age, sex, driving habits, etc., among which sex and age are the most prominent factors [19]. Bedard et al. [5] found that aggressive driving varied with age and sex, and elderly female drivers had higher traffic safety needs than young male drivers. However, conflicting results were found by Al-Balbissia et al. [20], who found more significant aggressive driving among male drivers than among their female counterparts in all age groups. Antin [21] used Poisson regression models to determine that male drivers are more prone to exhibit severe aggressiveness among young drivers, whereas in the oldest groups, female drivers are more likely to exhibit aggressive driving. As seen in the above-discussed literature, the relationships between driver characteristics and aggressive driving remain unresolved.
The influence of vehicle driving characteristics, such as speed and acceleration, on aggressive driving remains largely unknown. Goebelbecker and Uzgiris [22] described a series of field tests carried out in the 1930s and 1940s to determine safe speeds and acceleration rates. The results established the levels of acceleration that led to driver discomfort. Recently, more vehicle driving variables have been considered to analyze the main factors affecting aggressive driving. For example, Reymond et al. [23] proposed an aggressive driving model that considers both maximal lateral acceleration and predictable steering corrections and found that both lateral and longitudinal variables are connected to aggressive driving. Eboli [24] found that speed and acceleration describe the motion of a vehicle and that these parameters are fundamental to defining aggressive driving.
The literature on the impact of the road environment on driving behavior is quite limited. Wang [6] concluded that differences in aggressive driving between road types can be explained by the characteristics of the road types. Several researchers have found that adverse weather has an impact on aggressive driving [25]. However, different results have been found in other research, such as Kordani [26], who added to the current knowledge by incorporating real-time traffic and weather data from urban arterials to investigate accident occurrence and accident severity mechanisms. They found that weather variables do not significantly affect aggressive driving behavior. Zhou [27] found that aggressive speeding by taxi drivers is linked to longer daily driving distance and cruise distance, shorter delivery time, higher hourly income, driving at night, and driving on roads with low speed limits.
In summary, most literature on driving-safety factors has focused on the impacts of single or only limited factors on aggressive driving; however, those findings are incomplete. Meanwhile, sometimes results are inconsistent. Note that driver behavior is multifaceted. The combined effect of subjective and objective factors on aggressive driving should be analyzed to improve driving safety.
2.2. Aggressive Driving Correlation Model
Support vector machines (SVMs) have been used in some aggressive driving research [28]. These results show that SVMs yield better predictions regarding the impact of factors on aggressive driving. Random forests (RFs) have occasionally been used in aggressive driving modeling, and their performance has been reported to be satisfactory. Furthermore, Fu [29] used a Bayesian dynamic extreme value modeling approach for conflict-based real-time safety analysis. However, driving behavior is complex and multifactorial, and the relationships between those factors and aggressive driving are challenging to identify. Researchers have recently tried to overcome these difficulties by developing methodologies based on SEM, which has several advantages for understanding complex relationships among factors [30]. For example, Hamdar [31] developed a quantitative intersection aggressiveness propensity index using SEM to capture environmental, situational, and behavioral factors. Zhao [8] used SEM to analyze the effects of driving factors on driving safety in a comprehensive system considering driver characteristics and vehicle driving characteristics.
In summary, SEM is more suitable than other existing methods for aggressive driving modeling in multifactor situations. However, existing studies relying on SEM have fallen short in considering the underlying relationships among contributing factors: factors are grouped into latent variables manually and subjectively, leading to biased results [32]. Therefore, this paper uses SEM integrated with factor analysis to explore the relationship between multiple factors of driving safety and aggressive driving.
3. Methodology
The framework of the methodology is presented in Figure 1. This framework uses three steps to analyze the relationship between driving-safety factors and aggressive driving. Step 1: data collection and filtering. In this step, all characterization variables are extracted from NDS data. Then, the data are filtered based on the data distribution, and the data noise is reduced. Step 2: defining aggressive driving variables. In this step, a judgment criterion for aggressive driving-related variables is defined for various situations. Step 3: correlation model for analyzing the relationship between driving-safety factors and aggressive driving. This step uses descriptive data analysis to develop an initial model based on experience. The model structure is then adjusted based on factor analysis to ensure that variables match their latent factors. Finally, an aggressive driving correlation model for the multifactor situation is constructed based on SEM.

3.1. Data Collection and Filtering
NDS information is typically in the format of a time series and is recorded at a certain frequency determined by the equipped devices. Each vehicle has a unique ID, and thus, its trajectory, driver characteristics, vehicle driving behavior, and details of the surrounding driving conditions (both the road environment and the surrounding vehicles) can be extracted. The data used in the study include timestamp information, driver characteristics, vehicle driving behavior, and road environmental data. Details of the data are provided in Table 1.
The raw NDS database has some data issues, such as missing information, outliers, and noise. Therefore, the data need to be cleaned. Outliers are first removed from the database, and records with missing data are interpolated. Given a vehicle with missing information for variable $x$ (in the time series $x_1, x_2, \ldots, x_T$), the missing information in $x$ can be interpolated linearly:

$$x_t = x_{t_1} + \frac{t - t_1}{t_2 - t_1}\left(x_{t_2} - x_{t_1}\right),$$

where data are missing between $t_1$ and $t_2$ ($t_1 < t_2$) and $x_t$ conveys the data to be interpolated, where $t_1 < t < t_2$.
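The interpolation step can be illustrated with a minimal Python sketch. It assumes equally spaced samples, linear interpolation between the nearest known neighbours, and `None` as the marker for a missing sample; the function name `interpolate_gaps` and the sample values are hypothetical and are not taken from the SH-NDS processing.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior runs of None by linear interpolation between known neighbours.

    Leading or trailing gaps have a known neighbour on one side only
    and are therefore left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        if right - left > 1:  # at least one missing sample between two known ones
            step = (filled[right] - filled[left]) / (right - left)
            for t in range(left + 1, right):
                filled[t] = filled[left] + step * (t - left)
    return filled


# Hypothetical speed series (km/h) with an interior gap and a trailing gap.
speed = [60.0, None, None, 66.0, 65.0, None]
print(interpolate_gaps(speed))  # [60.0, 62.0, 64.0, 66.0, 65.0, None]
```

Gaps at the start or end of a trip are left untouched in this sketch, since extrapolation would require an additional assumption.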
After data interpolation, noise in the data (such as erroneous data and random noise) should be attenuated [33]. Different digital signal filtering techniques, such as the Kalman filter and the Savitzky–Golay filter, can reduce noise. In this study, the Kalman filter, which is widely used in practice for correcting noisy time-series data, is applied. The details of the Kalman filter can be found in Kim [34]. In this paper, the speed, longitudinal acceleration, and lateral acceleration data for each vehicle were filtered using the Kalman filter.
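To make the filtering idea concrete, the following sketch implements a scalar Kalman filter with a random-walk state model. The state model, the variance values `process_var` and `measurement_var`, and the function name `kalman_smooth` are assumptions chosen for illustration; the filter configuration actually used for the SH-NDS data is not specified in the text.

```python
from typing import List


def kalman_smooth(measurements: List[float],
                  process_var: float = 1e-2,
                  measurement_var: float = 1.0) -> List[float]:
    """Scalar Kalman filter assuming the true signal follows a random walk."""
    if not measurements:
        return []
    estimate = measurements[0]    # initial state: the first measurement
    error_var = measurement_var   # initial uncertainty of that state
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed to persist, so only uncertainty grows.
        error_var += process_var
        # Update: blend the prediction and the measurement via the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        filtered.append(estimate)
    return filtered


# A hypothetical acceleration trace with one noisy spike.
print(kalman_smooth([0.0, 0.0, 10.0, 0.0]))
```

A larger `measurement_var` relative to `process_var` makes the filter trust the sensor less and smooths more strongly; the ratio would have to be tuned per signal.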
3.2. Selection of Aggressive Driving-Related Variables
To model aggressive driving behavior, the related output (the criteria describing the aggressive driving state) should be determined. Different definitions have been given to aggressive driving. Some researchers have described aggressive driving behavior as an offensive maneuver by the driver that endangers other persons or property [35]. Other work has focused on a series, or a combination, of such offenses. For example, the NHTSA has defined aggressive driving as a combination of such offensive driving behaviors [36]. This paper focuses on independent offensive maneuvers (the former). In the NDS data obtained, the criteria associated with aggressive driving include speed, time to collision (TTC) between vehicles, longitudinal acceleration, and lateral acceleration. Aggressive driving behaviors associated with these parameters include tailgating (described by TTC and longitudinal acceleration), abrupt speed changes (described by longitudinal acceleration), erratic or sudden lane changes (described by lateral acceleration), and speeding (described by speed). Accordingly, this paper classifies the parameters related to aggressive behavior into two types and considers both when analyzing risky driving behavior: (1) vehicle-control parameters, which represent driving operation, and (2) risky-scenario-related parameters, which represent safety-critical situations where risks occur. In this study, vehicle-control parameters involve longitudinal and lateral acceleration rates, and risky-scenario-related parameters include TTC and speeding state.
3.2.1. Vehicle-Control Parameters
Rules are needed to determine the behaviors that constitute aggressive vehicle-control behavior. The risks associated with the behaviors represented by vehicle-control parameters change with vehicle speed; therefore, the impact of speed on these parameters should be considered. Thresholds at different speeds are used to determine drivers' aggressive states in terms of the two behavioral parameters, longitudinal and lateral acceleration. A statistical approach is used to determine the thresholds for data outliers (observations of aggressive driving operation recorded on the driving-state dimension) according to the data distribution.
In this study, speed data are recorded to 0.1 km/h (values are saved to one decimal place); therefore, the distributions of the longitudinal acceleration and the lateral acceleration at each speed point are checked. In most cases, the data conform to the normal distribution, $N(\mu, \sigma^2)$. According to Zhang [37], data that fall outside the range $[\mu - k\sigma, \mu + k\sigma]$ are considered outliers (where $k > 0$). The value of $k$ at which 0.01% of the observations are set as outliers is applied in this study as the threshold. The details of determining outlier thresholds are provided in the case study section.
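The speed-specific thresholds can be sketched in Python as follows. The sketch assumes normality within each 0.1 km/h speed value and treats the outlier share as two-sided (split equally between both tails); this two-sided reading, the function names, and the sample data are assumptions and are not stated in the text.

```python
from collections import defaultdict
from statistics import NormalDist, mean, pstdev
from typing import Dict, Iterable, Tuple


def k_for_outlier_share(share: float) -> float:
    """Multiplier k such that a normal variable lies outside mu +/- k*sigma
    with probability `share` (two-sided; an assumption of this sketch)."""
    return NormalDist().inv_cdf(1.0 - share / 2.0)


def thresholds_by_speed(samples: Iterable[Tuple[float, float]],
                        k: float) -> Dict[float, Tuple[float, float]]:
    """Map each speed value (0.1 km/h resolution) to (lower, upper) bounds
    on acceleration, computed as mu -/+ k*sigma within that speed value."""
    by_speed = defaultdict(list)
    for speed, accel in samples:
        by_speed[round(speed, 1)].append(accel)
    bounds = {}
    for speed, accels in by_speed.items():
        if len(accels) < 2:
            continue  # too few observations to estimate a spread
        mu, sigma = mean(accels), pstdev(accels)
        bounds[speed] = (mu - k * sigma, mu + k * sigma)
    return bounds


# Hypothetical (speed in km/h, longitudinal acceleration in m/s^2) pairs.
samples = [(60.0, -1.0), (60.0, 1.0), (80.0, 0.5)]
print(thresholds_by_speed(samples, k=2.0))  # {60.0: (-2.0, 2.0)}
```

In practice, sparse speed values would need pooling into wider bins before the spread can be estimated reliably.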
3.2.2. Risky-Scenario-Related Parameters
For risky-scenario-related parameters, both the typical traffic conflict technique (the TTC measure) and the vehicle's speeding state are considered to determine whether a vehicle is in a risky scenario. A TTC threshold of 1 s (TTC < 1 s) is chosen according to a previous study that identified abnormal driver behavior using data from the same NDS [6]. Speeding is also an important type of aggressive behavior. However, since NDS data involve trips over a road network with diverse speed limits (which are not available in the data), measuring and quantifying speeding is challenging. This paper uses only the data from freeway sections to maintain consistency in the extraction criteria for aggressive driving behavior. Therefore, to extract aggressive driving behavior with obvious characteristics, this paper adopts the speed limit of the freeway (120 km/h) as the threshold. In future work, we will further study the impact of different roadway types and speed limits on aggressive driving behavior.
Once a driver's operation exceeds one of the thresholds above, it is judged as suspected aggressive driving. The data clips of suspected aggressive driving are then checked against the NDS video records. If tailgating or erratic or sudden lane changes are observed in the video, the suspected aggressive driving is marked as aggressive driving. The variables used in the study and their definitions are provided in Table 2.
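The threshold screening that precedes the video check can be sketched as below. The TTC and speed thresholds are the values stated in the text; the `Sample` record, its field names, and the acceleration bounds passed in by the caller are hypothetical. The manual video verification is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

TTC_THRESHOLD_S = 1.0     # from the text: TTC < 1 s
SPEED_LIMIT_KMH = 120.0   # from the text: freeway speed limit


@dataclass
class Sample:
    speed_kmh: float
    lon_acc: float
    lat_acc: float
    ttc_s: Optional[float]  # None when no lead vehicle is detected


def suspected_reasons(s: Sample,
                      lon_bounds: Tuple[float, float],
                      lat_bounds: Tuple[float, float]) -> List[str]:
    """Return the thresholds a sample exceeds; an empty list means 'not suspected'."""
    reasons = []
    if s.ttc_s is not None and s.ttc_s < TTC_THRESHOLD_S:
        reasons.append("ttc")
    if s.speed_kmh > SPEED_LIMIT_KMH:
        reasons.append("speeding")
    if not lon_bounds[0] <= s.lon_acc <= lon_bounds[1]:
        reasons.append("longitudinal_acceleration")
    if not lat_bounds[0] <= s.lat_acc <= lat_bounds[1]:
        reasons.append("lateral_acceleration")
    return reasons


# Hypothetical sample and speed-specific acceleration bounds.
sample = Sample(speed_kmh=125.0, lon_acc=0.1, lat_acc=0.0, ttc_s=0.8)
print(suspected_reasons(sample, (-3.0, 3.0), (-2.0, 2.0)))  # ['ttc', 'speeding']
```

Returning the list of exceeded thresholds, rather than a single flag, keeps the information needed to decide which behavior to look for in the video clip.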
3.3. Aggressive Driving Correlation Model Construction Based on SEM and Factor Analysis
This step uses descriptive data analysis to develop an initial SEM structure based on experience. To ensure that each variable matches its factor, the variables are reclassified based on factor analysis (FA). Then, the aggressive driving correlation model for the multifactor situation is constructed based on SEM. The flowchart is shown in Figure 2.

3.3.1. Stage 1: Initial SEM Structure
SEM has three parts. Part 1 is the endogenous variables measurement model (Y measurement as the aggressive driving model). Part 2 is the exogenous variables measurement model (X measurement as a driving-safety factor model). Part 3 is a structural model. The equations for the SEM applied in this study are presented in the following, and the description of the equation parameters is shown in Table 3:

$$y = \Lambda_y \eta + \varepsilon,$$

$$x = \Lambda_x \xi + \delta,$$

$$\eta = B\eta + \Gamma\xi + \zeta.$$
The SEM elements are summarized in a measurement model, structural model, and covariance matrix. This paper uses AMOS 20 to construct the correlation model. AMOS 20 can efficiently exchange data files within the SPSS framework and has a convenient graphical interface for creating SEMs. The aggressive driving correlation model is constructed based on the SEM. Driving-safety-related variables and aggressive driving variables are latent variables in the initial model. To ensure that each variable matches the driving-safety factors and aggressive driving, the variables are reclassified based on factor analysis, which constructs linear combinations of the original variables that explain a large part of the total variance.
3.3.2. Stage 2: SEM Frame Structure Adjustment Based on FA
FA studies the interrelationships among characteristic variables to find a new, smaller set of variables that express common factors [38]. The original structure of the SEM was constructed from the researchers' experience and a literature review. The original structure is therefore subjective and neglects differences in the data source. In this paper, FA is used to adjust the SEM structure based on the distribution of factor loadings. Note that the result of FA is used only as a reference for the adjustment, which does not impair the ability of the SEM to explain contributing factors. Before FA is applied, a validity test should be used to check that the variables are sufficiently correlated. The Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity were used in the validity test to examine the suitability of factor analysis, based on a previous study [9]: (1) the KMO test coefficient is >0.5; (2) Bartlett's test is statistically significant.
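Both validity checks can be computed directly from a correlation matrix. The sketch below uses the standard textbook definitions of the overall KMO measure and of Bartlett's sphericity statistic; it relies on NumPy, and the function names and the example matrix are hypothetical. It returns the chi-square statistic and its degrees of freedom, leaving the p-value lookup to a statistics package.

```python
import numpy as np


def kmo(corr: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(corr)
    # Partial correlations follow from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(corr: np.ndarray, n_obs: int):
    """Bartlett's test statistic and degrees of freedom for H0: corr = identity."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return float(chi2), dof


# Hypothetical 3-variable correlation matrix with all correlations equal to 0.5.
corr = np.array([[1.0, 0.5, 0.5],
                 [0.5, 1.0, 0.5],
                 [0.5, 0.5, 1.0]])
print(kmo(corr))                       # above the 0.5 criterion for this matrix
print(bartlett_sphericity(corr, 100))  # (chi-square statistic, degrees of freedom)
```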
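Several stages should be used to explore the fit of each variable. First, let $x_1, \ldots, x_p$ denote the original driving variables, normalized to a mean of zero and a variance of 1. Using a multifactor model, $x_i$ can be assumed to be

$$x_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, \ldots, p,$$

where $F_j$ is a common factor, $a_{ij}$ is the factor loading, and $\varepsilon_i$ is a unique factor. These factors satisfy the following conditions on their expectations, variances, and covariances:

$$E(F_j) = 0, \quad \operatorname{Var}(F_j) = 1, \quad E(\varepsilon_i) = 0, \quad \operatorname{Cov}(F_j, \varepsilon_i) = 0.$$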
Then, the main step of common factor extraction is correlation analysis, which decomposes each variable into the common factors $F_1, \ldots, F_m$. In this paper, some categorical variables do not obey the assumption of a normal distribution. At the same time, the goal of this step is to extract a comprehensive evaluation of variable contributions. Therefore, the correlation between variables and common factors is explored using the principal component analysis (PCA) method, which has advantages in handling these problems. PCA starts by extracting the maximum variance and assigning it to the first factor. It then removes the variance explained by the first factor and extracts the maximum remaining variance for the second factor. This process continues until the last factor. Details of the PCA method can be found in [39].
The common factors extracted in the previous stages should then be examined. Eigenvalues, calculated as the sum of squared factor loadings for each principal component (common factor), are used as the criterion to determine whether a common factor should be kept. Common factors with eigenvalues over 1 are retained [40]. After PCA, the factor loading matrix can be calculated, and the common factors $F_1, \ldots, F_m$ are then determined along with the factor loadings.
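The extraction and retention steps can be sketched with NumPy as follows. The sketch performs an unrotated principal component extraction on the correlation matrix and keeps the components with eigenvalues above 1; the varimax rotation applied later in the case study is not included, and the function name and the data matrix are hypothetical.

```python
import numpy as np


def extract_factors(data: np.ndarray):
    """Principal component extraction with the eigenvalue > 1 criterion.

    data: observations in rows, variables in columns.
    Returns all eigenvalues (descending) and the loadings of retained factors.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                      # retention criterion
    # Scaling unit eigenvectors by sqrt(eigenvalue) gives loadings whose
    # squared column sums equal the corresponding eigenvalues.
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals, loadings


# Hypothetical data: two strongly related variables and one unrelated variable.
data = np.array([[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 4.0, 0.0],
                 [4.0, 3.0, 1.0], [5.0, 6.0, 0.0], [6.0, 5.0, 1.0]])
eigvals, loadings = extract_factors(data)
print(eigvals)
print(loadings)
```

Because the variables are standardized, the eigenvalues sum to the number of variables, which is what makes "eigenvalue over 1" equivalent to "explains more than one variable's worth of variance".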
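Finally, the cumulative percentage of variance is calculated (using the contribution test) to check whether the retained common factors reasonably explain the relationship between the input variables. The cumulative percentage of variance is calculated as

$$\mathrm{CPV}_m = \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{p} \lambda_j} \times 100\%,$$

where $\lambda_j$ is the eigenvalue of the $j$th common factor, $m$ is the number of retained common factors, and $p$ is the number of original variables.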
In PCA, factors are listed in order of the variance that they explain; a higher cumulative percentage can be achieved by retaining more factors, and various threshold values have been used as the criterion. In this paper, as the number of original variables is small and the selection of the number of common factors is simple, we adopted the basic requirement of [41] to determine whether the selection of common factors is reasonable. After the factor analysis, the common factors and factor loadings are used to adjust the assignment of variables to latent factors in the SEM.
3.3.3. Stage 3: Aggressive Driving Correlation Modeling Structuring Based on SEM
After the structure adjustment, the SEM was constructed in AMOS 20. Five methods are available to estimate the parameters in AMOS 20: maximum likelihood (ML), generalized least squares (GLS), unweighted least squares (ULS), scale-free least squares (SFLS), and asymptotically distribution-free (ADF) methods. Generally, the ML method is the most stable and is used in this paper as the parameter estimator. In the ML method, the parameter vector ($\theta$) is computed by minimizing the discrepancy between the sample covariance matrix $S$ and the population covariance matrix expressed in terms of the unknown parameters, $\Sigma(\theta)$. The ML function is as follows:

$$F_{ML} = \ln\lvert\Sigma(\theta)\rvert + \operatorname{tr}\left(S\,\Sigma(\theta)^{-1}\right) - \ln\lvert S\rvert - k,$$

where $k$ is the number of observed variables. Values of $\theta$ are estimated to minimize the weighted sum of squared deviations of the elements $s$ of $S$ from the corresponding elements $\sigma(\theta)$ of $\Sigma(\theta)$.
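The ML discrepancy function in its standard form can be evaluated numerically as in the sketch below, which relies on NumPy. It only evaluates the function for a given model-implied covariance matrix; the optimization over the parameters, which AMOS performs internally, is not reproduced. The function name and the matrices are hypothetical.

```python
import numpy as np


def ml_discrepancy(S: np.ndarray, Sigma: np.ndarray) -> float:
    """Standard ML discrepancy: ln|Sigma| + tr(S Sigma^-1) - ln|S| - k."""
    k = S.shape[0]  # number of observed variables
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return float(logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - k)


# Hypothetical sample covariance matrix of two observed variables.
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])
print(ml_discrepancy(S, S))          # zero (up to rounding) when the model reproduces S
print(ml_discrepancy(S, np.eye(2)))  # positive when the model ignores the covariance
```

The function is zero only when the model-implied matrix equals the sample matrix, which is why smaller values indicate a closer reproduction of the observed covariances.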
The output of SEM programs includes matrices of the estimated relationships between variables in the model. Assessment of fit essentially calculates how similar the predicted data are to matrices containing the actual data relationships. There are different approaches to assessing fit. They range from traditional approaches, which start from a null hypothesis and reward more parsimonious models (i.e., those with fewer free parameters), to others, such as the Akaike information criterion (AIC), that focus on how little the fitted values deviate from a saturated model (i.e., how well they reproduce the measured values), taking into account the number of free parameters used. Some of the more commonly used measures of fit include chi-squared, the goodness-of-fit index (GFI), adjusted GFI (AGFI), the comparative fit index (CFI), the normed fit index (NFI), and the root mean square error of approximation (RMSEA) [42]. In this paper, the fit is considered good when the model displays values greater than 0.9 for GFI, AGFI, CFI, and NFI and a value lower than 0.05 for RMSEA.
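The acceptance rule can be written as a small check. The cutoffs are those stated in the text; the RMSEA formula is one common point-estimate definition (some software divides by the sample size rather than the sample size minus one), and the function names and index values are hypothetical, not results from this study.

```python
from math import sqrt
from typing import Dict


def rmsea(chi2: float, dof: int, n_obs: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - dof, 0) / (dof * (n_obs - 1)))."""
    return sqrt(max(chi2 - dof, 0.0) / (dof * (n_obs - 1)))


def acceptable_fit(indices: Dict[str, float]) -> bool:
    """Apply the cutoffs from the text: GFI, AGFI, CFI, NFI > 0.9 and RMSEA < 0.05."""
    high_enough = all(indices[name] > 0.9 for name in ("GFI", "AGFI", "CFI", "NFI"))
    return high_enough and indices["RMSEA"] < 0.05


# Hypothetical fit indices for illustration only.
example = {"GFI": 0.95, "AGFI": 0.93, "CFI": 0.96, "NFI": 0.92,
           "RMSEA": rmsea(chi2=60.0, dof=50, n_obs=8000)}
print(acceptable_fit(example))  # True
```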
The model may then need to be modified to improve the fit, thereby estimating the most likely relationships between variables. Many programs provide modification indices that may guide minor modifications. Modification indices report the change in χ² that results from freeing fixed parameters, that is, from adding a path that is currently fixed to zero in the model. Modifications that improve model fit may be flagged as potential changes that can be made to the model. Changes to the measurement model effectively indicate that the items/data are impure indicators of the latent variables specified by theory and experience [43]. In this paper, the model was modified based on the change in χ². Meanwhile, modifications to the model, especially to the structural model, are adjusted to be consistent with the actual situation.
4. Case Study
4.1. Data Preparation
As a case study, data collection and verification of the method's applicability are based on the Shanghai Naturalistic Driving Study (SH-NDS). The SH-NDS was jointly conducted by Tongji University, General Motors (GM), and the Virginia Tech Transportation Institute (VTTI). Driving data were collected daily from 60 licensed Shanghai drivers who, all together, traveled 161,055 km during the study period. The drivers drove according to their usual driving habits and routes without any assigned driving tasks, and a whole trip by an individual driver is counted as one record. Five GM light vehicles equipped with SHRP2 NextGen data acquisition systems (DAS) were used to collect naturalistic driving data. The DAS includes an interface box to collect vehicle controller area network data, an accelerometer for longitudinal and lateral acceleration, a GPS sensor for location, and four synchronized cameras that can be used to validate the sensor-based findings. As shown in Figure 3, the four cameras monitor the driver's face (Figure 3(a)), the driver's hand maneuvers (Figure 3(b)), the roadway in front of the vehicle (Figure 3(c)), and the roadway behind the vehicle (Figure 3(d)). A total of 60 participants were screened for inclusion, and 10,500 records were collected (each record is one trip stored as a separate file set: an Excel file with the driving data of the trip and four video files showing driver characteristics and road environments). After data filtering, a total of 8,000 records were retained. Comparing the coefficient of variation (C.V.) and standard deviation (S.D.) of the data before and after filtering shows that interpolation and noise removal improve the quality of the data. Taking speed as an example, the S.D. decreases after filtering (42.27 vs 37.76), and the C.V. changes in the same direction (0.019 vs 0.017).

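Figure 3: Views from the four synchronized cameras: (a) the driver's face; (b) the driver's hand maneuvers; (c) the roadway in front of the vehicle; (d) the roadway behind the vehicle.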
4.2. Aggressive Driving and Driving-Factor Variable Extraction
As mentioned above, aggressive driving behavior is represented using vehicle-control parameters, involving longitudinal and lateral acceleration rates, and risky-scenario-related parameters, including TTC and speeding state. The data beyond the judging thresholds are classified as aggressive driving behavior (aggressive = 1, safe = 0) based on the method proposed in Section 3. Then, NDS data are collected and coded for constructing the SEM. The vehicle driving behavior data are extracted from the database. Driver characteristics and road environment are collected from the recorded video. In detail, after the experiment, the experimenters coded the road environment as categorical variables based on the video records (every experimenter was trained before coding the data and followed the same standard). Then, these environment variables are combined with the vehicle behavior variables to construct the data set. Note that the age of the driver was separated into three binary classes to construct an exogenous measurement model: (1) 19–30: 1, others: 0; (2) 30–40: 1, others: 0; and (3) 40–50: 1, others: 0. Among the three binary variables, only (1) was statistically significant and was included in the final model. Meanwhile, this paper divides road types into horizontal curves and profiles. Table 4 shows the definitions of the variables and their codes. Note that the distribution of age and gender in the current sample is unbalanced. However, the sample is still growing as the experiment continues, and in future work the model will be optimized based on the new data set.
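The binary coding of age can be sketched as follows. Because the stated classes share the boundary ages 30 and 40, the sketch assumes half-open bands (lower bound included, upper bound excluded); this boundary handling, the variable names, and the function name are assumptions and are not taken from Table 4.

```python
from typing import Dict

# Age bands from the text; half-open [lower, upper) is an assumption of this sketch.
AGE_BANDS = {
    "age_19_30": (19, 30),
    "age_30_40": (30, 40),
    "age_40_50": (40, 50),
}


def age_dummies(age: int) -> Dict[str, int]:
    """Code a driver's age as three binary indicators (1 = in band, 0 = others)."""
    return {name: int(lower <= age < upper)
            for name, (lower, upper) in AGE_BANDS.items()}


print(age_dummies(25))  # {'age_19_30': 1, 'age_30_40': 0, 'age_40_50': 0}
print(age_dummies(30))  # {'age_19_30': 0, 'age_30_40': 1, 'age_40_50': 0}
```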
4.3. Driving-Safety SEM Construction
An initial SEM model is developed using the descriptive data reported in Table 4. Several variables, such as those based on driver characteristics, vehicle driving characteristics, and road environment, are set to “X” observed factors that could be divided into several factors with similar characteristics (i.e., exogenous latent variables in the SEM). Other variables are set to “Y” observed factors, which represent endogenous latent factors associated with aggressive driving, such as “Aggressive vehicle control” or “Risky-scenario-related characteristics,” in the SEM.
The initial model includes five categories: driver characteristics, vehicle driving characteristics, road environment, aggressive vehicle control, and risky-scenario-related characteristics. Notably, age, sex, distraction, and workload were chosen as representative driver variables because these variables were shown to significantly affect driving safety in previous studies [9]; the driver characteristics category is expressed by these variables. The vehicle driving characteristics category includes speed and acceleration. The road environment is expressed by road type, traffic flow, and weather. The aggressive vehicle control category includes aggressive lateral acceleration (ALA-ACC) and aggressive longitudinal acceleration (ALO-ACC). The risky-scenario-related characteristics include aggressive TTC (ATT) and speeding.
The initial model is developed using descriptive data analysis based on experience. The variables are then reclassified based on the factor analysis to ensure that each variable matches its latent factor. This paper used SPSS 20.0 to assess the suitability of the data for factor analysis. In this paper, the analysis revealed that KMO = 0.62 and . This shows that the indicator data are suitable for factor analysis. Then, five common factors are extracted based on the criterion of eigenvalues >1 (Figure 4). The total contribution of the five factors was 54.66%, where factor 1 explained 18.34%, factor 2 explained 10.86%, factor 3 explained 9.97%, factor 4 explained 8.51%, and factor 5 explained 6.97%. The varimax-rotated factor analysis results and the factor loads of each item are presented in Table 5. Finally, the variables of the latent factors are adjusted based on the factor loadings. The first factor is “risky-scenario-related characteristics” because the aggressive TTC and speeding items have high factor loads. The second factor is “aggressive vehicle control” because aggressive longitudinal and lateral behavior have high factor loads. The third factor is “road environment,” including weather, horizontal curves, profiles, and traffic flows. The fourth factor is “driver characteristic,” which is associated with age, sex, and distraction. The fifth factor is the “vehicle-driving characteristic” factor because items such as speed and lateral acceleration have high factor loads.
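The reported variance contributions can be checked with a few lines of arithmetic. The sketch below sums the five rounded percentages (the rounded values sum to 54.65, within rounding of the reported 54.66%) and converts each share into an implied eigenvalue. The conversion assumes that the percentages are shares of the total variance of 15 standardized variables, so that the eigenvalues sum to 15; this is an assumption, since rotated contributions need not equal the initial eigenvalues.

```python
# Sanity check on the reported factor contributions (values from the text).
contributions = [18.34, 10.86, 9.97, 8.51, 6.97]  # percent of total variance

# Assumption: 15 standardized observed variables, so total variance = 15.
n_variables = 15

total = round(sum(contributions), 2)

# Implied eigenvalue = variance share * number of variables.
implied_eigenvalues = [round(c / 100 * n_variables, 3) for c in contributions]

# Retention rule used in the text: keep factors with eigenvalue > 1.
retained = [ev > 1.0 for ev in implied_eigenvalues]

if __name__ == "__main__":
    print("total contribution (%):", total)
    print("implied eigenvalues:", implied_eigenvalues)
    print("all retained:", all(retained))
```

Under this assumption, even the weakest factor (6.97%) corresponds to an eigenvalue slightly above 1, which is consistent with retaining five factors.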

Importantly, the result of the factor analysis is used as a reference, and several variables are reclassified into other factors based on their practical meanings. For example, workload in the “vehicle-driving characteristic” factor is reclassified into factor 4 (driver characteristic). In addition, aggressive longitudinal acceleration in factor 5 is reclassified into factor 2 (aggressive vehicle control).
4.4. Results
Based on model tests conducted to select the available factors, five factors are used in the model, two of which are selected as endogenous latent factors. The exogenous latent factors are driver characteristics (age, sex, distraction, and workload), vehicle-driving characteristics (speed, longitudinal acceleration, and lateral acceleration), and road environment (weather, horizontal curve, profile, and traffic flow). The endogenous latent factors are risky-scenario-related characteristics (ATT and speeding) and aggressive vehicle control (ALO-ACC and ALA-ACC). This paper checked the multicollinearity between latent variables based on the correlation matrix. The results show that all correlation coefficients between variables are less than 0.5. There is no significant correlation between the variables; therefore, the influence of multicollinearity between latent variables is not considered in the SEM. Then, univariate normality was assessed to determine the estimation method of the SEM. The maximum likelihood (ML) estimation method is employed in this paper. The final SEM is depicted in Figure 5.
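A correlation-matrix screen of the kind described above can be sketched as follows. The helper names and the toy data are hypothetical and are not taken from the SH-NDS data set; the 0.5 limit is the one stated in the text.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def flag_high_correlations(data, limit=0.5):
    """Return variable pairs whose absolute correlation reaches the limit."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(data.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= limit:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up toy columns: x2 is an exact multiple of x1, x3 is unrelated to both.
toy = {
    "x1": [1, 2, 3, 4],
    "x2": [2, 4, 6, 8],
    "x3": [1, -1, -1, 1],
}

if __name__ == "__main__":
    print(flag_high_correlations(toy))
```

An empty result would correspond to the situation reported in the text, where no pair reaches 0.5.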

In Figure 5, rectangles represent observed variables, ellipses represent unobserved latent factors, and arrows from the observed variables to latent factors represent regression paths. Additionally, circles with arrows pointing toward each observed variable represent measurement error. Moreover, each latent factor is connected to every other factor by a curved two-headed arrow, denoting that every factor covaries with the other factors. In this paper, the SEM consists of 5 latent factors and 15 observed variables.
According to the SEM results shown in Figure 5, we compare the effect of each variable on the latent factors. In the X measurement model, age is the variable that most significantly affects driver characteristics (factor load = 0.99). Weather is the variable that most significantly affects the road environment (factor load = 0.99), whereas the horizontal curve is the variable that most significantly negatively affects it (factor load = −0.65). Speed is the variable that most significantly influences vehicle driving behavior (factor load = 0.99). In the structural model, the main factor influencing aggressive vehicle control (AVC) is vehicle driving characteristics, and the main factor influencing risky-scenario-related characteristics (RSC) is the road environment. Therefore, to improve driving safety, vehicle driving behavior and road environments must be fully considered. In the Y measurement model, AVC is mostly associated with COA, and ATT mainly influences RSC.
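The structure of the final model can be written down compactly. The sketch below records the five latent factors and their 15 indicators as listed in Section 4.4 and renders them in lavaan-style model syntax (`=~` for measurement, `~` for regression). The identifier names are hypothetical, and the structural paths (every exogenous factor pointing to each endogenous factor) are an assumption; Figure 5 is the authoritative specification. No model is fitted here.

```python
# Latent factors and their indicators, as listed in Section 4.4.
MEASUREMENT = {
    "driver_characteristics": ["age", "sex", "distraction", "workload"],
    "vehicle_driving_characteristics": ["speed", "longitudinal_acc", "lateral_acc"],
    "road_environment": ["weather", "horizontal_curve", "profile", "traffic_flow"],
    "risky_scenario_characteristics": ["ATT", "speeding"],
    "aggressive_vehicle_control": ["ALO_ACC", "ALA_ACC"],
}

# The two endogenous latent factors named in the text.
ENDOGENOUS = ["risky_scenario_characteristics", "aggressive_vehicle_control"]

def to_model_syntax(measurement, endogenous):
    """Render a lavaan-style model description as plain text."""
    exogenous = [f for f in measurement if f not in endogenous]
    # Measurement part: latent factor =~ indicator_1 + indicator_2 + ...
    lines = [f"{factor} =~ " + " + ".join(items)
             for factor, items in measurement.items()]
    # Structural part (assumed): each endogenous factor on all exogenous ones.
    lines += [f"{target} ~ " + " + ".join(exogenous) for target in endogenous]
    return "\n".join(lines)

spec = to_model_syntax(MEASUREMENT, ENDOGENOUS)
n_observed = sum(len(items) for items in MEASUREMENT.values())

if __name__ == "__main__":
    print(spec)
    print("observed variables:", n_observed)
```

Counting the indicators this way reproduces the 15 observed variables and 5 latent factors stated above.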
The chi-square test is a widely reported goodness-of-fit index used in SEM analysis. If the model fits the data well, the chi-square value should be small, and the p value associated with the chi-square should be relatively large. In this paper, the estimated model possesses 52 degrees of freedom, with a chi-square value of 65.379 (), suggesting that the model fit is acceptable. Table 6 also shows the goodness-of-fit statistics of the SEM.
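The reported statistics can be explored numerically with a small sketch. It computes the chi-square to degrees-of-freedom ratio (about 1.26 for the values above) and an approximate upper-tail probability using the Wilson-Hilferty normal approximation. The function name is hypothetical, and the result is an approximation computed here, not the p value reported by the authors.

```python
from math import erfc, sqrt

def chi2_sf_wilson_hilferty(x, df):
    """Approximate P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the Wilson-Hilferty cube-root transformation to a standard normal.
    """
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / sqrt(c)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * erfc(z / sqrt(2.0))

chi_square, dof = 65.379, 52  # values reported in the text

ratio = chi_square / dof
p_approx = chi2_sf_wilson_hilferty(chi_square, dof)

if __name__ == "__main__":
    print(f"chi-square / df = {ratio:.3f}")
    print(f"approximate p value = {p_approx:.3f}")
```

A small ratio and a p value above the chosen significance level are both read as signs of acceptable fit, in line with the interpretation given above.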
5. Discussion and Conclusions
In this paper, an aggressive driving correlation model is presented to explore the relationship between aggressive driving and driving-safety factors. This model integrates factor analysis and SEM using data collected from the NDS. Driving-safety factors and aggressive driving are defined as the latent factors in the initial SEM based on experience. The variables are then reclassified based on the factor analysis to ensure that each variable matches its driving-safety or aggressive-driving factor. The results from the case study using the SH-NDS show that the framework performs well in exploring the relationship between driving-safety factors and aggressive driving.
In this paper, 15 variables are grouped into five latent factors in the SEM. The findings reveal that, compared with other variables, age most positively affected driver characteristics, which in turn affected aggressive driving: older drivers exhibit high AVC and high TTC risk. This finding coincides with that of [44], which pointed out that young but experienced drivers are safer than others. This result may arise because young drivers respond quickly, which helps in urgent driving situations, and are good at adopting new smart traffic technology, such as connected vehicle technology. Adverse weather negatively affected driver behavior (lower speed and high longitudinal acceleration). In this paper, adverse weather included low-visibility situations due to rain, fog, and other conditions. These situations can increase driver anxiety, which can lead to aggressive driving. Consistent with the findings of this paper, adverse weather has been reported as the main environmental factor associated with aggressive driving [45].
In summary, lower driving safety levels were more strongly associated with older driver age, adverse weather conditions, and high-speed operating conditions than with other factors. The results also show that driver behavior was the main factor influencing aggressive vehicle control, and the environment was the main factor influencing risky-scenario-related aggressiveness. In addition, driving safety was affected by traffic control (outside the vehicle) and driver behavior control (inside the vehicle). Therefore, different institutions should adopt different measures to improve driving safety. Considering these results, the following recommendations are made to reduce aggressive driving and improve driving safety. Traffic management can improve driving safety by enhancing the road environment (e.g., optimizing road alignment). Enforcement can deter bad driving behaviors and increase driving safety. Meanwhile, the findings on driver characteristics in this paper could be considered in designing human-machine interfaces (HMIs), for example, through dedicated designs for older drivers.
Despite the above contributions, this study has some limitations. Additional research should be carried out on this topic to validate the proposed method. More data should be collected for model calibration and boundary curve determination, and road environment variables should be determined more precisely. The influence of other variables, such as overconfidence, driving under time pressure, law enforcement, professional or nonprofessional driver status, and education level, on aggressive driving behavior remains largely unknown. In future work, we will create a scale based on the driving anger scale and the accident proneness scale for collecting driver attitude data. The relationships between the identified common factors and additional variables, and between aggressive behavior and other driving-safety factors (e.g., driver actions, law enforcement), should be further explored. Meanwhile, future studies will examine the relationship between aggressive driving and crash risk (e.g., crashes or traffic conflicts).
Data Availability
The data used in this paper are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the National Key R&D Program of China (2019YFB1600703) and China Study Abroad Scholarship.