Abstract

This study aimed to evaluate the driving behavior of taxi drivers in Isfahan, Iran, and assess the probability of a driver being among the high-risk taxi drivers. To identify risky driving behaviors among taxi drivers, the Driver Behavior Questionnaire (DBQ) was used. By collecting data from 548 taxi drivers, exploratory factor analysis identified the significant components of DBQ including “Inattention errors,” “Inexperience errors,” “Lapses,” “Ordinary violations,” and “Aggressive violations.” K-means clustering was conducted to cluster taxi drivers into three risk groups of low-risk, medium-risk, and high-risk taxi drivers based on their self-reported annual traffic crashes and fines. In addition, logistic regressions identified the extent to which drivers’ crashes and traffic fines are related to their driving behavior, and therefore, what aberrant driving behaviors are more important in explaining the presence of taxi drivers in the high-risk cluster. The results revealed that the majority of participants (66.78%) were low-risk taxi drivers. Aggressive violations and ordinary violations were significant predictors of taxi drivers being in the high-risk group, while inattention errors and aggressive violations were significant predictors of being in the medium/high-risk cluster. The findings from this study are valuable resources for developing safety measures and training for new drivers in the taxi industry.

1. Introduction

Driving behavior is one of the most important predictors of crashes and one of the main factors affecting traffic safety [2]. However, differences have been observed between the driving behavior of people who drive for a living (such as taxi drivers) and people who drive for daily purposes [3]. Excessive driving hours, workplace stress, and constant presence in traffic lead professional drivers to exhibit more aberrant driving behavior, which can have significant effects on road safety. Aberrant driving behavior generally refers to driver behavior that can increase the likelihood of crashes or their severity. The taxi industry is an important part of the transport industry, as it accounts for a large share of intra-city trips. As a result, taxi drivers’ risky driving behavior poses a risk to a wide range of road users.

The risky behavior of a taxi driver can directly increase the chances of a taxi crash [2]. These behaviors mostly include speeding, failing to come to a complete stop when entering a main street, changing direction suddenly to pick up a passenger, and running red lights [4]. Due to long working hours and the pressure of delivering their service in repeatedly stressful conditions, there is growing evidence of a higher prevalence of stress and mental health issues among taxi drivers than in other occupations [5, 6]. For instance, Abd Rahman et al. [7] confirmed that there is a close relationship between mental workload and driving performance, and that drivers are more likely to commit violations in very complex traffic situations such as city driving. In addition, the emergence and popularity of shared taxis are negatively affecting taxi drivers’ work environment and income [8]. All of these factors may lead to a high probability of risky driving behavior by taxi drivers. Moreover, high self-confidence in one’s driving skills and the considerable effort required to search for passengers can exacerbate these high-risk driving behaviors.

Previous literature showed that drivers who work professionally are more prone to unsafe driving behavior, which increases their chances of crashes [3]. This is even more surprising when it comes to taxi drivers. Although taxi drivers often have extensive driving experience, strong driving skills, and a greater ability to control the car, they exhibit more unsafe and high-risk driving behavior than other drivers [9]. These unsafe behaviors affect not only other road users but also the many passengers who use taxis [10]. In addition, pedestrians play an important role in the safety performance of taxis. The World Health Organization, in its annual safety report, showed that the share of pedestrians in fatal crashes in Iran is 23% [11]. Consistent with these results, Adl [12] confirmed that taxi crashes with pedestrians are responsible for a large share of casualties and injuries in crashes in Iran. Even in developed countries, taxi and pedestrian crashes account for a high percentage of all crashes. For example, in Australia, they were one of the leading causes of death due to crashes [13], and taxis are disproportionately involved in pedestrian crashes [14].

1.1. Objectives and Scope of the Study

Taxi drivers in metropolitan areas are much more likely than other drivers to show risky driving behavior due to long working hours, high self-confidence in their driving skills, and the constant search for passengers. As a result, examining the safety of taxis and the factors affecting it is of great importance. Public transportation and, more specifically, taxis in most of Iran’s metropolises have always been associated with overcrowding, low comfort, long and unreliable waiting times, as well as long travel times. In addition, exposure to crashes and poor taxi safety are the greatest challenges faced by the taxi industry. This study aims to assess the driving behavior of taxi drivers and the significant factors affecting their risky behavior. The Driver Behavior Questionnaire (DBQ) was used to identify risky driving behaviors among taxi drivers. Furthermore, by investigating the rates of drivers’ crashes and traffic fines, drivers were clustered based on their driving risk. The probability of a driver being in each risk cluster according to the driver’s driving behavior was then determined. Finally, the factors of driving behavior that can affect the level of taxi safety and pose a risk to passengers and other road users were identified. In summary, the contributions of this study are as follows:

(1) This study investigated risky driving behavior among taxi drivers in Isfahan by using a driving behavior questionnaire and self-reported annual property damage only (PDO) crashes and traffic fines. This study converted DBQ questions into several uncorrelated components by exploratory factor analysis to assess the aberrant driving behavior among taxi drivers.

(2) Due to the high number of PDO crashes and traffic fines among taxi drivers in Iran, this study proposed to use the rates of annual PDO crashes and traffic fines as criteria to cluster taxi drivers into risk groups. Clustering drivers can help to build robust logistic regression models that identify significant factors affecting their risky behavior and the probability of a driver being in each risk cluster.

(3) Considering that not all types of risky driving behavior are significant in explaining high-risk taxi drivers, this study compared drivers’ risk groups based on their driving behaviors. This study established two logistic regression models to identify risky driving behaviors that are correlated with risk levels. These models can contribute to identifying the probability of a taxi driver being in each risk cluster according to the driving behavior.

The proposed approach in this study may help decrease the likelihood of crashes and violations for the taxi driver. This study may assist in the creation of more complete criteria for employing new drivers and continuing their activities in the taxi organization. Given that driving behavior directly affects the likelihood of crashes and driver risky behavior, identifying the factors influencing the formation of aberrant driving behavior can improve policies to increase taxi safety as well as improve driver training in this area.

The paper is organized as follows. Section 2 describes the previous literature by concentrating on taxi safety and taxi driver behaviors, and also their relation with taxi crashes. It is then followed by Section 3, which presents the methodology for survey design and procedure, participants and data analysis. Sections 4 and 5 present the results of the study and relevant discussions from the models, respectively. Finally, Section 6 shows the conclusions with a summary of the key results, implications for future research, and limitations.

2. Literature Review

2.1. Taxi Safety

The main goal of governments in providing public transportation is to offer citizens facilities with features such as good accessibility and affordability for all people. Among them, taxis are of particular importance. White and White [15] defined a taxi fleet as transportation for the general public without ownership. Dridi et al. [16] emphasized that public transportation, especially taxis, should provide fast, high-quality transportation for all members of society, especially those with greater vulnerabilities, by following regulations and safety schemes. Taxi driving has always been considered one of the riskiest jobs [17]. A constant presence in traffic, work-related fatigue, safety issues and, in addition to all of these, constant conflicts and disputes with passengers and peers are the causes of the high risk of this job. In the past, various methods have been considered in different countries to increase the safety of taxis and reduce work-related risks. These methods include changing tax-related policies or using more advanced equipment to continuously monitor taxis, such as the Global Positioning System [18]. Although these methods focus on examining the presence and severity of hazards in the workplace, addressing personal factors related to the driver is one of the most effective ways to increase the safety of the taxi fleet.

The nature of work as a taxi driver is very different from other jobs. Working hours are not clearly defined and can fluctuate on a daily basis. In addition, the emergence of shared taxis has led some travelers to use shared rides, which can negatively affect taxi drivers’ income. All of this makes it difficult for the management system to oversee taxis and their safety policies. One of the main problems in employing taxi drivers in Iran is the lack of a taxi license and of a specific system for evaluating and training taxi driver applicants. As a result, an applicant can start working as a taxi driver with a normal driver’s license and a minimum age of 22 years, without passing any training or exams [19]. These criteria can allow unsafe drivers to enter the taxi fleet and reduce the safety of taxis. Meanwhile, in Iran, taxis have a maximum capacity of 4 people (the maximum capacity in most countries with a taxi safety program is 2 people).

Previous studies on the role of driver characteristics examined the factors affecting taxi safety. Vahedi et al. [2] showed that factors such as the age of taxi drivers, marital status, violations of rules and regulations, and daily driving rates could affect the likelihood of crashes in taxis. Al-Ghamdi [20], studying the causes and factors of taxi crashes, reported that violations and speeding are the main causes of crashes among taxi drivers. Borowsky and Oron-Gilad [21] conducted a study on the effect of driving experience on the ability to perceive danger for non-professional drivers with low driving experience, highly experienced non-professional drivers, young drivers, and taxi drivers. This study showed that taxi drivers are more likely to perform risky behaviors than other drivers. La et al. [22] asserted that taxi drivers who are high-risk in terms of the likelihood of crashes tend to have a longer average daily driving time and a shorter daily rest time. Meng et al. [23] evaluated the effect of fatigue on the driving ability of taxi drivers and found that fatigue affects braking reaction speed, vehicle control, and steering control.

2.2. Taxi Drivers’ Driving Behaviors

A questionnaire survey is commonly used to identify high-risk behaviors of taxi drivers. Reason et al. [1] first developed a 50-item DBQ to assess violations, errors, and lapses. Other researchers developed new versions of the DBQ to evaluate other factors such as aggressive violations [24, 25]. Risky behaviors of taxi drivers have a significant impact on taxi safety. Rundmo and Iversen [26] showed that drivers’ driving behavior was closely related to the likelihood of crashes. Dangerous driving among taxi drivers is often seen in the form of violations, aggressive violations, driver errors, or lapses. Focusing on aggressive driving behavior, Sullman et al. [27] designed a questionnaire to identify the risky characteristics of taxi drivers. Zhang et al. [28] also proposed a questionnaire to identify the causes of high-risk behavior among Chinese taxi drivers. Four factors were obtained from the survey data. Their study also related driving behavior to driving attitude, driving skills, and driver personality to analyze the characteristics of high-risk behavior of Chinese drivers. In recent years, large datasets such as taxi location system data have also been used to collect information on driving behavior, which can help identify factors affecting high-risk driving behavior. In a recent study, Huang et al. [29] used data from the in-taxi location system and the taxi meter to confirm the effect of taxi mileage, travel time, and taxi fare on taxi drivers’ driving behavior. Combining GPS data with video data, Li et al. [30] stated that indicators of driver risk perception, decision-making, and driving style of taxi drivers can directly affect taxi performance, all of which are critical to assessing the safety of driving behaviors.

Some risky driving behaviors that can explain taxi crashes are not unique to the job and are common to all drivers. In contrast, other behaviors are especially common among taxi drivers, for example misjudging the distance to the car in front, sudden maneuvers and changes of direction, running traffic lights, and using cell phones while driving [31]. Other factors can also directly lead to risky behaviors of taxi drivers. For example, driving at night increases the risk of crashes and injuries [32]. Competing for more passengers can also lead to risky and aggressive driving behaviors. The important problem is that many of these risky behaviors seem normal among taxi drivers, so much so that they consider them necessary in order to earn more income. Tseng [3] examined the relationship between driver performance in terms of speed and risky behaviors. Among taxi drivers who are looking for passengers in various ways, speeding has been reported as the most important risky behavior.

2.3. Relationship between Crashes and Driving Behavior

The rate of traffic crashes in Iran is much higher than the global average. Accordingly, the country suffers from the far-reaching consequences of driving injuries, deaths, and costs imposed on society. Many Iranian transportation experts attribute this issue to various factors such as driver, vehicle, and road problems in Iran. Therefore, more research is needed on the relationship between traffic crashes and the factors that affect them. There is a growing trend among researchers to examine the relationship between crash predictors, such as road environment, traffic, and weather conditions, and the likelihood of crashes [33]. In addition, a large number of studies investigate human factors as the most important predictors of traffic crashes [34]. Since traffic crashes are the main cause of injuries and the second leading cause of death in Iran, it is important to identify the factors affecting these crashes. The main purpose of this section of the literature review is to focus on the factors affecting the occurrence of crashes, as well as on the role of risky driving behavior in predicting the probability of crashes among taxi drivers.

The main causes of traffic crashes can be divided into three categories: environmental factors of the road, factors related to the vehicle, and factors related to the driver. An examination of the contribution of road, vehicle, and driver factors to traffic crashes found that road factors can affect 28–34% of crashes, vehicle factors 8–12%, and driver factors 93–94% [35]. In this field, a proper understanding of the driver characteristics that can potentially lead to high crash risks is needed. Therefore, to prevent crashes and reduce their severity, it is necessary to examine each of these factors and provide the necessary solutions in accordance with the current conditions of the country under study.

Regarding the relationship between traffic crashes and driving behavior, one reason that researchers have generally used the DBQ is the significant relationship that has been reported between DBQ items and the number of driver crashes. The probability of crashes often has a direct and positive relationship with the rate of driver violations. Parker et al. [25] examined the relationship between DBQ factors and the crash rate and found that only violations have a significant relationship with the crash rate. Focusing on the role of the driver in crashes, they also classified these crashes into active and inactive groups and reported that the rate of driver violations is directly related to both active and inactive crashes. In a more recent study, af Wåhlberg et al. [36] found that errors and lapses could be as effective in predicting the likelihood of crashes as driving violations.

Studies on taxi drivers can be broadly divided into different areas, such as how the driver makes decisions, risky driving behavior, the crash rate and the factors affecting it, and the driver’s occupational health. In the context of taxi drivers’ decision-making style, Li et al. [37] used data on taxi drivers’ driving behavior to assess their behavioral preferences in choosing a route. Zhang et al. [28] used the complimentary game theory method to investigate the effect of passengers on the dangerous behavior of taxi drivers and found that high-speed driving, aggressive driving, and tired driving can lead to an increase in crashes. In China, a survey of taxi drivers found that their attitudes toward traffic violations affected their high-risk driving behavior. Sullman et al. [27] found that taxi drivers who exhibited more passive driving behaviors were more likely to be distracted and more likely to be involved in crashes. Cheng et al. [10] examined the impact of taxi drivers’ dangerous behaviors on traffic violations and found that crashes were directly related to the rate of driver violations.

The double pressure of workload and economic issues has caused taxi drivers to engage in risky driving behaviors frequently [38]. Many studies have shown that taxi crashes are closely related to high-risk driving. For example, Wang et al. [39] stated that severe taxi crashes are most likely due to speeding, driving without a seat belt, ignoring signs or symptoms, or committing other high-risk driving behaviors. Sullman et al. [27] stated that problems such as loss of vehicle control, loss of focus, and crashes due to errors are more common among taxi drivers. Newnam et al. [40] noted that 86% of taxi crashes in Addis Ababa might be due to driver error, and that older drivers are more likely to have crashes. In addition, a significant percentage of educated taxi drivers exhibit unsafe driving behaviors that include breaking the rules, not paying attention to the road, or driving in a state of fatigue. Furthermore, sleep problems and fatigue can reduce the driver’s attention span, alertness, and reaction, and often lead to traffic crashes [41].

3. Methodology

3.1. Data Collection

The survey was conducted among drivers affiliated with the Isfahan Taxi Organization (ITO), Iran. The ITO is responsible for certifying, training, and supervising drivers in Isfahan. In coordination with the Deputy of Transportation and Traffic of Isfahan Municipality, and in consultation with the deputy regional director of the ITO, the survey was conducted through in-person interviews at the ITO premises from August 3, 2021, to August 14, 2021. Since the survey was carried out during the COVID-19 pandemic, the related protocols were followed for the in-person interviews. A total of 548 male taxi drivers with valid taxi driving licenses participated in the survey. Participants were 22 to 74 years old (mean = 45.91, SD = 12.37), with driving experience of 2 to 45 years (mean = 23.49, SD = 9.52) and taxi driving experience of 1 to 40 years (mean = 13.24, SD = 7.69). Respondents stated that they drive on average 30 to 80 hours a week (mean = 55.88, SD = 17.70) and 50,000 to 130,000 kilometers a year (mean = 86,751, SD = 20,820). During the last year, participants experienced 0 to 4 property damage only (PDO) crashes for which they were responsible (mean = 0.68, SD = 1.01). They were also fined by the police 0 to 6 times during the past year (mean = 1.37, SD = 1.51). Table 1 presents a summary of demographic and driver-related information.

The authors of this study were present at the ITO throughout the survey and resolved taxi drivers’ problems and ambiguities regarding the questionnaire. Before completing the questionnaire, drivers were given the necessary explanations about the purpose of the study and the questions, and they were assured that their identities would not be disclosed. The pilot test showed that it took drivers about 10 minutes to complete the questionnaire.

3.2. Item Measurement

The questionnaire consisted of two main sections. In the first part, drivers provided demographic and driver-related information along with their record of traffic fines and crashes. The second part included 27 questions about taxi drivers’ driving behavior, in line with recent studies of driving behavior [42, 43]. The validated 27-item DBQ [24] was applied to measure risky driving behavior. The original DBQ included four scales: errors, lapses, ordinary violations, and aggressive violations. The back-translation method was applied to the survey questions [44]: the original English items were translated into Persian and then translated back into English by an English expert. Drivers were asked to indicate how often they committed each of the 27 behaviors in the past year. Respondents answered on a 6-point Likert scale ranging from never (1) to always (6). The items of the second part of the questionnaire are shown in Table 2.

3.3. Data Analysis

3.3.1. Factor Analysis

Since DBQ questions were highly correlated with each other, Principal Component Analysis (PCA) was conducted to determine the structure of the DBQ items and define significant components, thereby addressing multicollinearity. PCA uses an orthogonal transformation to convert correlated variables into uncorrelated principal components. The Kaiser–Meyer–Olkin (KMO) statistic and Bartlett’s test of sphericity were used to assess sampling adequacy and the suitability of the data for factor analysis. According to Tabachnick et al. [45], the KMO index ranges from 0 to 1, with values above 0.50 considered suitable for factor analysis, while Bartlett’s test of sphericity should be significant (p-value < 0.05) for factor analysis to be suitable. The eigenvalue of each principal component was used as the main indicator for selecting significant components: components with an eigenvalue equal to or greater than 1 are extracted [46].
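
As an illustration of the extraction rule described above, the following minimal Python sketch applies the eigenvalue-of-at-least-1 criterion and computes the cumulative percentage of variance explained. The eigenvalues are hypothetical values for a six-item correlation matrix, and the function names are illustrative; neither is taken from this study.

```python
# Hypothetical eigenvalues of a correlation matrix for 6 standardized items.
# For a correlation matrix, the eigenvalues sum to the number of items.
eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]


def kaiser_retained(eigs, threshold=1.0):
    """Return the indices of components whose eigenvalue is >= threshold."""
    return [i for i, ev in enumerate(eigs) if ev >= threshold]


def cumulative_variance(eigs):
    """Return the cumulative share of total variance explained, in percent."""
    total = sum(eigs)
    running = 0.0
    shares = []
    for ev in eigs:
        running += ev
        shares.append(100.0 * running / total)
    return shares


retained = kaiser_retained(eigenvalues)
cum_var = cumulative_variance(eigenvalues)
print("retained components:", [i + 1 for i in retained])
print("cumulative variance (%):", [round(v, 2) for v in cum_var])
```

In practice, the eigenvalue rule is usually read together with the scree plot, since the rule alone can retain more components than are interpretable.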

3.3.2. Clustering

The present study used K-means clustering for the initial categorization of taxi drivers. In this clustering method, parameters are selected as the clustering criteria so that a specific label can be assigned to each data point. Previous studies have considered different criteria for clustering taxi drivers, for example near-crash situations, police reports of driver crashes, or self-reports of driver violations [43, 47]. Considering that 39.8% of taxi drivers reported being involved in a crash in the past year, and 59% reported being fined by the police at least once in the past year, the criteria of traffic crashes and fines were expected to be appropriate for clustering in the context of Iran. In this study, the self-reported crash rate and the self-reported traffic fine rate during the past year were selected as criteria for clustering taxi drivers. Annual mileage was used as the exposure factor to calculate these rates. Based on previous studies, taxi drivers were assigned to three clusters (low-risk, medium-risk, and high-risk drivers) according to their crash and traffic fine rates. The average Silhouette was used to evaluate the validity of the number of clusters as well as the appropriateness of the clustering. The Silhouette value is one of the most widely used criteria for evaluating the effectiveness of clustering and estimating its accuracy [48]. It ranges from −1 to +1, and the closer it is to 1, the better the clustering. In general, the average Silhouette is a measure of how well the data are clustered. This study used Python to conduct the clustering and calculate the average Silhouette value.
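
The clustering code used in the study is not given in the text, so the following is only a minimal sketch of the procedure described above, written with the Python standard library. It converts counts into exposure-based rates, runs a plain K-means (Lloyd's algorithm) with three clusters, and computes the average Silhouette. The scaling per 100,000 km, the hand-picked initial centroids, the function names, and the eight toy drivers are all assumptions made for illustration.

```python
import math


def rate_per_100k_km(count, annual_km):
    """Events per 100,000 km driven (the scaling unit is an assumption)."""
    return count * 100_000 / annual_km


def kmeans(points, init_centroids, n_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = list(init_centroids)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def mean_silhouette(points, labels):
    """Average of (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster, or only one cluster
            continue
        a = sum(own) / len(own)                             # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())    # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): (crashes, fines, annual km) for eight drivers.
drivers = [
    (0, 0, 80_000), (0, 1, 100_000), (0, 1, 125_000),
    (0, 4, 80_000), (1, 4, 100_000), (0, 5, 100_000),
    (3, 3, 60_000), (4, 5, 100_000),
]
points = [(rate_per_100k_km(c, km), rate_per_100k_km(f, km)) for c, f, km in drivers]
# Hand-picked seeds keep the example deterministic; a real analysis would
# typically use several random or k-means++ initializations.
labels, centroids = kmeans(points, [points[0], points[3], points[6]])
score = mean_silhouette(points, labels)
print(labels, round(score, 3))
```

Because both clustering variables are rates on the same scale here, no standardization is applied; whether the study standardized the variables before clustering is not stated.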

3.3.3. Logistic Regression Analysis

After clustering drivers into different groups based on their driving risk, a logistic regression model was fitted to estimate the probability of each taxi driver being in the high-risk group. Logistic regression is a regression method that is often used when the dependent variable is binary. The results of the logistic regression models in this study showed which of the DBQ factors play a greater role in increasing the probability of taxi drivers being in the high-risk group and how reducing each of these behaviors can reduce taxi drivers’ driving risk, which can help to increase the safety of the taxi fleet. The role of annual driving mileage as the exposure criterion was also checked in the regression model. The Receiver Operating Characteristic (ROC) curve was used to evaluate and validate the model and determine its predictive ability. This curve plots the true positive rate (sensitivity) of the model against its false positive rate across classification thresholds. In this study, the sensitivity of the model was assessed for the probability of a driver being in the high-risk cluster versus the other clusters, alongside determining the significant DBQ factors for the presence of the driver in these groups. The predictive power of the model is given by the area under the ROC curve (AUC), a number between 0 and 1; the higher the value, the greater the predictive power of the model.
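
To make the AUC concrete, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen high-risk driver receives a higher predicted probability than a randomly chosen driver from the other clusters, with ties counted as one half. It also shows the logistic function that turns a linear predictor into a probability. The labels, scores, and function names are hypothetical and are not taken from the fitted models.

```python
import math


def predict_proba(intercept, coefs, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = high-risk cluster, 0 = any other cluster.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.8]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the sample size, which is acceptable for a sample of a few hundred drivers; sorting-based implementations give the same value more efficiently.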

4. Results

4.1. Exploratory Factor Analysis

Using the survey results obtained from taxi drivers in Isfahan, the DBQ items were tested. The KMO value was 0.591 and Bartlett’s test of sphericity was significant (p-value < 0.001), which showed an acceptable fit (a KMO value greater than 0.5 is appropriate). According to the results of PCA, 8 components with eigenvalues greater than 1 were identified. Eigenvalues for each factor are shown in Table 3 and Figure 1.

According to Table 3, the results of PCA showed that 8 components had eigenvalues greater than 1, indicating that the 27 DBQ questions can be converted into 8 significant components containing substantial information. Visual inspection of the scree plot shows that, after the fifth component, the remaining components form an almost smooth line, which can be interpreted to mean that most of the variance is explained by the first 5 components (62.731%); therefore, a 5-component structure was used to convert the DBQ questions into sets of uncorrelated items. To define each component and its DBQ questions, the rotated component matrix is presented (see Table 4). For each component, DBQ questions with a coefficient of less than 0.3 were removed due to their lower importance in explaining variance; thus, 3 questions (ERR4, LAP2, and AV2) were removed from the analysis. Following Parker et al. [25], each component was explained based on its original type, including lapses, errors, or violations. The first component includes questions related to “Lapses.” The second component includes questions related to driver errors; considering the type of questions in this component, it is referred to as “Inattention errors.” The third component includes questions related to “Ordinary violations.” The fourth component also consists of error questions; however, it differs from the second component in the type of errors, so it is called “Inexperience errors.” The fifth component contains violations; due to the aggressive nature of these violations, it is called “Aggressive violations.”

The results of PCA and reliability indices for the DBQ items are presented in Table 4.

4.2. Risk Clusters

Drivers were clustered into three risk groups: low-risk, medium-risk, and high-risk taxi drivers. To determine the appropriateness of the clustering, the within-cluster sum of squares and the average Silhouette were calculated. The clustering results showed that the average Silhouette is 0.589, which, being higher than the threshold of 0.5, indicates that the clustering is appropriate. The within-cluster sum of squares is 457.66. Figure 2 shows the clustering results based on the annual crash and traffic fine rates of taxi drivers.

The clustering results showed that drivers with low driving risk are the most frequent group among taxi drivers. These drivers have lower crash and traffic fine rates than the other groups. The second group includes drivers with moderate driving risk: although they have a low number of crashes per year, they commit several traffic violations annually. The third group consists of taxi drivers with high driving risk. These drivers have high annual crash and fine rates. Information about the drivers of each cluster can be seen in Table 5. Information on DBQ items for each cluster is presented in Table 6.

The results of Table 5 showed that 66.7% of the participants were low-risk drivers, 21.5% were medium-risk drivers, and 11.6% were high-risk drivers. The average number of crashes per year for high-risk drivers is 3.01, which is much higher than for low- and medium-risk drivers. In terms of traffic fines, high-risk drivers had the most, with an average of 2.91 annual traffic violations, followed by medium-risk drivers with an average of 2.64. According to the results of Table 6 and Figures 3–7 in the appendix, in terms of Lapses, high-risk drivers had a weighted average of 1.81 and low-risk drivers a weighted average of 1.84, which indicated a small difference between the two clusters. Medium-risk drivers had the fewest Lapses. With regard to inattention errors, high-risk drivers had the highest index with a weighted average of 2.35, followed closely by medium-risk drivers with a value of 2.10. In terms of ordinary violations, high-risk drivers committed the most, and the same held for aggressive violations. In terms of inexperience errors, all three groups were approximately equal; however, medium-risk drivers made slightly more inexperience errors.

4.3. Logistic Regressions

While clustering based on annual crash and traffic fine rates divided taxi drivers into three risk groups, the main question is which aspects of their driving behavior make them high-risk drivers. To answer this question, two logistic regression models were estimated. The first model aimed to identify the DBQ factors that significantly predict a driver being in the high-risk group. The second model identified the DBQ factors that predict a driver being in either the high-risk or the medium-risk group. In both hierarchical logistic regression models, the exposure factor (annual mileage) was entered first to check the effect of exposure, and the DBQ components were added in the next step. Table 7 presents the results of the first hierarchical logistic regression model.
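The two-step structure described above can be written out as follows. This is a generic formulation of a hierarchical (blockwise) logistic regression. The symbols are introduced here for illustration only, and reading the reported chi-square as a likelihood-ratio statistic for each step is an assumption; the exact specification is the one given in Tables 7 and 9.

```latex
% Step 1: exposure only (y_i = 1 if driver i is in the target risk group)
\operatorname{logit} P(y_i = 1) = \beta_0 + \beta_1\,\mathrm{mileage}_i

% Step 2: exposure plus the five DBQ component scores
\operatorname{logit} P(y_i = 1) = \beta_0 + \beta_1\,\mathrm{mileage}_i
    + \sum_{k=1}^{5} \gamma_k\,\mathrm{DBQ}_{ik}

% Improvement of a step over the preceding, smaller model
\chi^2 = -2\left(\ln L_{\text{previous}} - \ln L_{\text{current}}\right)

% Odds ratio associated with a one-unit increase in component k
\mathrm{OR}_k = e^{\gamma_k}
```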

In the first step of the first logistic regression model, the exposure factor (annual driving mileage) was entered into the model. The results showed that this variable significantly predicted the probability of a driver being in the high-risk cluster, with greater annual distance traveled increasing the chances of a driver being in the high-risk group (Chi-square = 28.62, p-value = 0.001). In the next step, aggressive violations and ordinary violations had the most significant impact on taxi drivers being in the high-risk group. According to the odds ratio, drivers with more aggressive violations have up to 28 times higher odds of being among the high-risk drivers.
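To make the reported odds ratio more concrete, the Python sketch below converts an odds ratio of 28 into the corresponding logistic regression coefficient and shows what it does to a probability. The 5% baseline probability and the function names are hypothetical choices made here for illustration, not values from the study.

```python
import math

def odds_ratio_to_coef(odds_ratio):
    """Logistic regression coefficient implied by an odds ratio (OR = exp(beta))."""
    return math.log(odds_ratio)

def shifted_probability(p_base, odds_ratio):
    """Probability obtained after multiplying the baseline odds by the odds ratio."""
    odds = p_base / (1 - p_base) * odds_ratio
    return odds / (1 + odds)

beta = odds_ratio_to_coef(28)          # coefficient implied by the reported odds ratio
p_new = shifted_probability(0.05, 28)  # 0.05 is a hypothetical baseline probability
print(round(beta, 3), round(p_new, 3))
```

The point of the example is that an odds ratio multiplies the odds rather than the probability: with the assumed 5% baseline, a 28-fold increase in odds raises the probability to roughly 60%, not to 28 times 5%.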

The ROC curve was used to assess the performance of the model. Figure 8 shows the ROC curve for the first model. The area under the curve (AUC) typically ranges from 0.5 (no better than chance) to 1, and the closer it is to 1, the more accurately the model predicts. The first model has an AUC of 0.906, which indicates good predictive performance. Table 8 shows the number of correct predictions of the logistic regression model by cluster. In the high-risk group, the model correctly predicted 39 of 64 drivers. In the combined low- and medium-risk group, it correctly predicted 461 of 484 drivers.
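One way to read the AUC is as the probability that a randomly chosen high-risk driver receives a higher predicted score than a randomly chosen driver from the other group. The Python sketch below computes the AUC in exactly this pairwise way for a few invented labels and scores; the data and the function name are assumptions for illustration and do not come from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive case
    has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities; 1 marks a high-risk driver
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = auc_from_scores(labels, scores)
print(round(auc, 3))
```

Under this definition a model that ranks drivers at random scores about 0.5, and a model that ranks them systematically in the wrong order scores below 0.5.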

Table 9 presents the results of the second hierarchical logistic regression model. In this model, the DBQ factors affecting taxi drivers being in the medium/high-risk clusters were evaluated.

In the second logistic regression model, the exposure factor was again entered first. The results showed that annual mileage significantly explained the placement of drivers in the combined medium/high-risk group, with higher annual mileage increasing the chances of a driver being in this group (Chi-square = 81.07, p-value = 0.001). In the next step, aggressive violations and inattention errors had the greatest impact on taxi drivers being in the medium/high-risk group. The ROC analysis indicated adequate predictive power for the model, with an AUC of 0.861. Figure 9 shows the ROC curve for the second model, and Table 10 shows the prediction accuracy by cluster. According to Table 10, the model correctly predicted 310 of the 366 drivers in the low-risk group and 135 of the 182 drivers in the medium/high-risk cluster.
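The counts quoted from Tables 8 and 10 can be turned into the usual classification summary measures. The Python sketch below does so; treating the higher-risk group as the positive class and the function name `summarize` are choices made here for illustration, and the classification cutoff that produced the counts is not stated in this section.

```python
def summarize(correct_pos, total_pos, correct_neg, total_neg):
    """Sensitivity, specificity, and overall accuracy from per-group correct counts."""
    return {
        "sensitivity": correct_pos / total_pos,
        "specificity": correct_neg / total_neg,
        "accuracy": (correct_pos + correct_neg) / (total_pos + total_neg),
    }

# Model 1 (Table 8): positive class = high-risk drivers
model_1 = summarize(39, 64, 461, 484)
# Model 2 (Table 10): positive class = medium/high-risk drivers
model_2 = summarize(135, 182, 310, 366)

for name, result in [("model 1", model_1), ("model 2", model_2)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Reporting sensitivity and specificity separately is useful here because the groups are unbalanced: overall accuracy for the first model is dominated by the much larger low- and medium-risk group.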

5. Discussion

The purpose of this study was to investigate risky driving behavior among taxi drivers using the Driver Behavior Questionnaire (DBQ) and self-reported annual PDO crashes and traffic fines. Because PDO crashes and traffic fines are frequent among taxi drivers in Iran, this study used the annual rates of PDO crashes and traffic fines as the criteria for clustering taxi drivers into risk groups. Based on the results of the clustering, two logistic regression models were estimated to identify the significant factors affecting risky behavior and the probability of a driver being in each risk cluster.

According to the results of the principal component analysis, 24 of the 27 DBQ questions included in the survey were retained as significant. These questions loaded on five main components: aggressive violations, ordinary violations, inattention errors, inexperience errors, and lapses. Consistent with previous studies, several distinct components of taxi drivers’ driving behavior were extracted from the DBQ. In all of these studies, the three factors of errors, lapses, and violations explained a large proportion of the variance in aberrant driving behavior [1]. Other studies, however, added new factors so that they could further explain aberrant driving behavior. Rimmö and Åberg [49] presented four factors of driving behavior using the DBQ: inattention errors, inexperience errors, lapses, and violations. Based on information from Iranian drivers, Tavakoli Kashani et al. [50] identified four components of driving behavior: errors, lapses, ordinary violations, and aggressive violations. This was in line with the study of Mesken et al. [51] regarding the most significant DBQ components. In a more recent study, consistent with the results of the current study, Wang and Xu [43] used exploratory factor analysis to divide driving behavior questions into five components: inattention errors, inexperience errors, lapses, ordinary violations, and aggressive violations. Compared to the three-factor structure of violations, lapses, and errors first developed by Parker et al. [25], this study examined violations and errors in more detail by splitting them into more components. The differences between inattention and inexperience errors, and between violations of differing severity, were also discussed.

Based on the annual rates of PDO crashes and traffic fines of taxi drivers, K-means clustering was conducted to classify taxi drivers so that the significant driving behavior factors of each cluster could be investigated. Previous studies have considered different criteria for driver clustering. Due to the importance of crashes in explaining aberrant driving behavior, many studies have used traffic crashes as a clustering criterion. For example, Parker et al. [25] divided crashes into active and passive categories and examined the role of the driver in these crashes. The results showed that both active and passive crashes were associated with driver violations. In addition, af Wåhlberg et al. [36] showed that the extent to which drivers make driving errors could predict crashes. Also, in a review of DBQ-based studies, de Winter and Dodou [52] showed that in 42 of 76 surveys there was a significant relationship between the crash rate and aberrant driving behavior. In contrast to many previous studies, Wang and Xu [43] suggested using sensors installed in the car to record near-crash situations; drivers were then clustered into different risk groups based on the number of near-crash situations. Some studies have also stated that crash and traffic fine rates have limitations as clustering criteria because of the low rate of crashes among drivers [43, 53]. However, in Iran, where this study was conducted, 39.8% of taxi drivers reported being involved in a crash in the past year, and 59% reported being fined by the traffic police at least once in the past year. These high rates indicate that traffic crashes and fines are appropriate clustering criteria in the context of Iran.

The results of the clustering revealed that the majority of taxi drivers (66.78%) were low-risk drivers, while 21.53% were clustered in the medium-risk group and 11.67% were high-risk drivers. For low-risk drivers, the average annual numbers of PDO crashes and traffic fines were 0.35 and 0.29, respectively. The medium-risk drivers reported a marginally higher number of annual PDO crashes (0.69) than the low-risk cluster; however, their number of annual traffic fines was 2.64, which was considerably higher than that of the low-risk group. Both figures for high-risk drivers were higher than those of the low- and medium-risk groups (2.91 annual traffic fines and 3.01 annual PDO crashes). According to these results, taxi drivers with a higher number of traffic fines are more likely to be medium-risk drivers, and taxi drivers with a higher number of PDO crashes are more likely to be high-risk drivers. According to the mean values of the DBQ components for each cluster, low-risk drivers reported a higher level of lapses than the other risk groups. This can be interpreted as showing that taxi drivers’ lapses (e.g., misreading the signs and exiting a roundabout on the wrong road, or having no clear recollection of the road just traveled) do not increase drivers’ risk, and that other factors play a more significant role in explaining why drivers are clustered in higher-risk groups. The results also revealed that medium-risk drivers reported more inexperience errors (e.g., underestimating the speed of an oncoming vehicle when overtaking, or braking too quickly on a slippery road) than the other clusters. For the other components (inattention errors, ordinary violations, and aggressive violations), high-risk drivers reported higher levels than low- and medium-risk drivers. These results are consistent with those of Wang and Xu [43], who stated that high-risk taxi drivers had the highest values in ordinary violations and inattention errors. To better visualize the distribution of the DBQ components among risk clusters, boxplots are presented in the Appendix. The higher mean values of ordinary violations and aggressive violations for high-risk drivers are evident in the boxplots.
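The reported cluster means can be used to illustrate how K-means assigns a driver to a group: each driver goes to the cluster whose centroid is nearest. The Python sketch below treats the reported averages of annual PDO crashes and traffic fines as centroids. This is an assumption made for illustration, since the study clustered on rates calculated with annual mileage as exposure and its exact feature scaling is not given here; the function name and the example drivers are hypothetical.

```python
from math import dist

# (annual PDO crashes, annual traffic fines) averages reported for each cluster,
# used here as stand-ins for the K-means centroids (an assumption)
CENTROIDS = {
    "low": (0.35, 0.29),
    "medium": (0.69, 2.64),
    "high": (3.01, 2.91),
}

def nearest_cluster(crashes, fines):
    """Return the label of the centroid closest in Euclidean distance."""
    return min(CENTROIDS, key=lambda name: dist((crashes, fines), CENTROIDS[name]))

# Hypothetical drivers
print(nearest_cluster(0, 0))  # no crashes, no fines
print(nearest_cluster(0, 3))  # no crashes, several fines
print(nearest_cluster(3, 2))  # several crashes, some fines
```

Under these assumptions, a driver with several fines but no crashes falls nearest the medium-risk centroid, while a driver with several crashes falls nearest the high-risk centroid, which mirrors the interpretation given above.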

The results of this study showed that violations (ordinary and aggressive) play the most important role in placing taxi drivers in the high-risk group. The exposure factor (average annual mileage) was also found to be significant in explaining high-risk membership. In the case of ordinary violations, most of the aberrant behaviors do not have a direct and dangerous effect on other drivers. In aggressive violations, however, other drivers are directly affected by the risky driving behavior of taxi drivers; examples include expressing anger by chasing a car or suddenly changing lanes. The logistic regression model indicated that drivers who commit ordinary and aggressive violations are more likely to be in the high-risk cluster; in other words, they have higher rates of crashes and traffic fines. These results were also confirmed by Wang and Xu [43], who stated that ordinary violations and inattention errors are the most important factors that can put the driver in a near-crash situation. In addition, the results of this study regarding the role of aggressive violations among high-risk taxi drivers in both models were consistent with previous studies. The previous literature asserted that ordinary and aggressive violations are the most important predictors of traffic crashes [1]. According to Parker et al. [25], drivers who commit more traffic violations more often contribute to the occurrence of a crash. In terms of policy, the Taxi Organization should consider the important role of driving violations in increasing the rate of crashes and traffic fines; training programs and workshops could therefore be started by the Taxi Organization to present the current unsafe situation of Iran’s taxi industry and the main reasons for it. Furthermore, as a possible solution, it is necessary to control such behaviors, especially among drivers who tend to commit violations, by changing their attitudes, recognizing their psychological background, and changing driving beliefs and norms [54]. The main positive outcome of such an intervention would be increased safety for taxi drivers.

This study also showed that inattention errors could explain taxi drivers being in the medium/high-risk cluster. According to the DBQ, inattention errors include failing to notice pedestrians crossing when turning into a side street from the main road, failing to see a right-of-way sign, and risking a collision with cyclists while driving. Regan et al. [55] defined inattention errors as paying insufficient attention to driving or the road because of a lack of concentration or engagement in other activities while driving. As a possible solution, Regan et al. [55] proposed driver education as one of the most effective measures. In addition, safety training such as classes and workshops to improve drivers’ decision-making in critical situations, as well as driver assistance systems for taxi drivers (e.g., collision warning systems), can be widely used to control errors caused by inattention. Since policymakers and experts in the Tehran Taxi Organization believed that driver training, including workshops and in-vehicle training, is very effective in improving the safety of the taxi fleet [19], training programs for taxi drivers that emphasize the role of violations in being a high-risk driver and of errors in being a medium-risk driver can be expected to help decrease the number of crashes and improve the safety of the fleet. Also, the annual records of taxi drivers’ crashes and fines can be used by the Taxi Organization as a reference to supervise taxi drivers and identify high-risk ones. It is therefore suggested that the Taxi Organization cooperate with the police to receive crash reports on a regular basis. Among the DBQ factors examined in relation to the rates of traffic fines and crashes, lapses and inexperience errors had the least effect on increasing the risk of taxi drivers. Since inexperience errors and lapses are both related to driving skills and experience, one possible reason for their lack of significance compared to violations is inaccurate self-reporting in the questionnaire. This is plausible because many drivers may be overconfident about their driving skills and may therefore have reported inexperience errors and lapses below their actual levels.

6. Conclusions

6.1. Summary

This study aimed to cluster taxi drivers according to their driving risk using different statistical models and to identify the factors that can put a taxi driver in the high-risk group. According to the results of the PCA, 24 of the 27 initial DBQ questions included in the survey were retained as significant. These questions loaded on five main components: aggressive violations, ordinary violations, inattention errors, inexperience errors, and lapses. According to the results of clustering, three risk-related clusters were proposed for taxi drivers: drivers with low driving risk, drivers with medium driving risk, and drivers with high driving risk. The annual crash rate and the annual traffic fine rate were adopted as the main criteria for clustering taxi drivers, with the average annual mileage used as the exposure criterion to calculate these rates. According to the results, the majority of taxi drivers (66.78%) were low-risk drivers. They reported a higher level of lapses than the other risk groups, suggesting that lapses do not contribute significantly to increasing the number of PDO crashes and traffic fines or to placing drivers in higher-risk groups. High-risk drivers reported considerably higher levels of violations (ordinary and aggressive) than low- and medium-risk drivers. Committing ordinary violations as well as aggressive violations can increase the probability of a driver being in the high-risk group. Medium-risk drivers reported a higher level of inexperience errors; however, drivers in the medium/high-risk group were more likely to report a higher level of aggressive violations and inattention errors.

6.2. Limitations and Future Research

The present study has several limitations. First, this study used self-report questionnaires to assess the rates of driver crashes and traffic fines, an approach with known weaknesses such as a low number of annual crashes or inaccurate reports of the number of crashes and violations. However, because PDO crashes and traffic fines are frequent among taxi drivers in Iran, crash and fine rates can be appropriate criteria in this context. For future studies in other countries with a lower percentage of crashes and traffic fines, it is recommended to coordinate with the taxi organization and the traffic police to collect reports of crashes and violations more accurately. In such settings, official crash records would be more practical than self-reported data because of the lower rate of crashes. Second, the present study focused on DBQ factors that can affect the presence of drivers among high-risk taxi drivers, and many other factors were not considered. For example, economic issues, management systems, and the work environment can be examined in more depth, and their impact on the aberrant behavior of taxi drivers can be determined. Finally, this study was conducted during the COVID-19 pandemic; these conditions may therefore have affected the results.

Appendix

Distribution of 5 DBQ Components among Risk Clusters

Figures 3–7 show the distributions of the five DBQ components among risk clusters via boxplots.

Data Availability

Data can be made available by contacting the second coauthor, Dr. Kayvan Aghabayk (kayvan.aghabayk@ut.ac.ir).

Conflicts of Interest

The authors declare that they have no conflicts of interest.