Abstract
As the need to assess the level of road safety grows, experts increasingly tend to use a single overall composite index that combines information on a number of safety performance indicators (SPIs). Indicators commonly used in road safety assessment are numerical, and their natural uncertainty and vagueness are often overlooked. However, some SPIs are essentially linguistic, such as data on driver behavior, which are most often collected through questionnaires and are considered qualitative, imprecise, and fuzzy. Together with an inappropriate selection of weighting and aggregation methods, such data can be a source of uncertainty and can lead to unreliable results and erroneous conclusions. In this regard, the present study provides a systematic and efficient hybrid method that integrates three different procedures to deal with the unavoidable uncertainty in each step of index construction. Fuzzy linguistic rating captures the ambiguity intrinsic to drivers’ self-assessment. Entropy weights each observed behavior by quantifying the disorder of the system. Grey relational analysis aggregates the behavioral indicators into a composite index without assuming that they are sufficient and complete. A case study of Montenegro is provided to demonstrate the practical applicability of the proposed method in safety assessment under uncertainty. The results identified not wearing a seatbelt as the most common negative behavior among drivers in Montenegro, followed by using the telephone while driving, speeding, and driving under the influence of alcohol. In addition, municipalities are ranked according to the level of road safety.
1. Introduction
The aggregation of different variables into a single safety performance index is a popular concept for evaluating road safety and comparing the performance of territories. Researchers have established a comprehensive set of safety performance indicators (SPIs) that take into account both the direct and indirect influences of road safety risk factors. Consequently, a large number of SPI combinations have been developed to create an overall composite index (see, e.g., [1–3]).
Indicators commonly used in road safety assessment are numerical, and their natural uncertainty and vagueness are often overlooked. The collected data may result from measurement error or from the use of proxy data, so their reliability is questionable [4]. Furthermore, SPIs that describe drivers’ attitudes and behavior are usually collected either with physical measuring devices or through questionnaires [5–7]. The indicator value, which is usually the mean of all respondents’ answers, cannot be considered completely accurate [8]. Respondents can introduce uncertainty and vagueness into the collected data not only by recording their opinions and attitudes arbitrarily and carelessly but also by misunderstanding the questionnaire (or some part of it) [9]. Given that a road safety score is only as reliable as the data on which it is constructed, and that its quality should improve in parallel with the quality of the information [10], the reliability of a composite index constructed from such data is open to question. In addition, the selection of weighting and aggregation techniques in the index construction process can itself make the results unreliable. Therefore, ignoring uncertainty as an inherent feature of such data is likely to lead to unreliable results, and uncertain parameters must be considered to preserve the robustness of the safety performance index.
Various methods for handling imprecise data have been proposed, and some of them have emerged as widely used and effective tools for capturing uncertainty. Existing models for road safety evaluation commonly consider data uncertainty in only one of the steps: data modeling [11], weighting [12–14], or aggregation [15, 16], and they usually do so only when the subjective opinion of road safety experts is involved [17, 18]. The main aim of this paper is to propose a hybrid method for constructing a road safety performance index that considers the vague and uncertain nature of SPIs related to driver behavior by integrating uncertainty-handling techniques into each step. Such a method is needed because most countries still lack a reliable road safety database and sufficiently accurate traffic safety data, and data on driver behavior are most often collected through questionnaires. The main contributions of the proposed approach are as follows. First, both qualitative and quantitative input data can be considered and aggregated together into one composite index. Second, the weights of all indicators can be obtained without any external subjective information, while reflecting the uncertainty of the selected indicators. Third, the proposed aggregation technique handles uncertainty in system evaluation when decision makers are unsure about the sufficiency and completeness of the available data, providing a robust road safety performance index for each territory that allows a full ranking. The robust and logical results suggest that the proposed method could also be applied in many fields besides road safety.
The rest of the paper is organized as follows: in the next section, the features of SPIs will be discussed and ways to address their natural uncertainty will be outlined. In Section 3, the case study will be presented, followed by the results in Section 4. In Sections 5 and 6, the research will be discussed and concluded.
2. Uncertain Nature of Safety Performance Indicators
It has been confirmed that every safety analysis involves some degree of uncertainty [10]. Indicators usually used to describe the state of road safety rely on empirical data, such as the number of fatalities and injured persons, the number of accidents, and the motorization level [14, 19]. Although these data are crisp, in real-world applications there are numerous reasons for uncertainty, imprecision, or greyness in their values. Measurement errors and the use of proxy data are frequent sources of inaccuracy. For example, recorded SPIs such as the number of drivers who speed or drive under the influence of alcohol are never fully known, because accurate information on them cannot be obtained. SPIs based on traffic violations, which some researchers use as indicators [20, 21], also make the final score vague, since this type of data can represent the approximate situation on the roads but not the precise state of safety and drivers’ behavior. Likewise, Lloyd and Forster [22] stated that risk exposures (especially accident risk, defined as the rate of accidents per vehicle kilometer traveled) represent uncertain information. Hence, the reliability of a performance index derived from this kind of data is questionable. However, none of the mentioned studies took the uncertainty of the data into account when calculating the composite index, despite the proven uncertain nature of these data.
Furthermore, data on self-reported behavior, which are often used as SPIs in index construction, represent a qualitative measure of respondents’ opinions about their own behavior. When data on behavior and perception are collected (usually through questionnaires), the main difficulty lies in how the answers are represented, since in many cases respondents are not able to express their judgments clearly. Besides road users, decision makers are sometimes asked to express their opinions and preferences on particular road safety domains. Since these subjective measures are unlikely to be precise, all final values derived from self-reported and perception data are assumed to be imprecise [23] and fuzzy [24], and identifying and addressing these uncertainties is essential when processing such data. Several attempts to handle them can be found in the literature.
The simplest way to cope with uncertain data is averaging, which summarizes the probability distribution by statistics such as the mean and variance. Tsang et al. [25] applied decision trees to handle data sets with uncertain values and showed that the results are more accurate than those obtained using an average value. Xiong et al. [11] analyzed roadway traffic accidents based on rough sets (for data) and Bayesian networks (for aggregation). Chen et al. [12] performed a road safety risk evaluation applying entropy weighting. Entropy weighting is also used in [14] to assess urban road safety and to rank the provinces accordingly. Ayati et al. [26] applied the evidential reasoning (ER) approach to account for the subjective nature of evaluating roadside hazard severity and to assess the situation more realistically. Rassafi et al. [10] collected the opinions of road users and experts to assign weights in the process of road safety assessment; they also used the ER approach together with Dempster–Shafer theory to handle the uncertainty of the data. Among other multicriteria decision-making methods, the fuzzy analytic hierarchy process [18, 27] and data envelopment analysis [28, 29] have mainly been used. It has been shown that, despite their differences, all existing models for addressing uncertainty are related to one another [30].
The widely used fuzzy set theory [31] is an appropriate method for processing imprecise and inaccurate data, notably when they are expressed in human language. The same holds for grey theory [32]. Both fuzzy and grey theories deal with the uncertainty of human perception through similar reasoning mechanisms. The application of these theories for addressing uncertainty is described in the following sections.
2.1. Fuzzy Linguistic Rating
When questionnaires are used to collect road safety behavioral data, researchers are often unsure whether to use parametric or nonparametric procedures, since it is not entirely clear whether the data are interval or ordinal. Questionnaires used to collect perceptions, ratings, or judgments on many subjects in various fields usually rely on the well-known and widely used Likert scale [33, 34]. The Likert scale traditionally uses linguistic variables encoded as ranks (in most cases, from 1 to 5–7, e.g., from “strongly disagree” as the lowest to “strongly agree” as the highest). This way of measuring opinions is very popular because it is easy to conduct. The final data represent the average of evenly spaced ranks over all answers, which allows some statistical tools to be applied. However, this type of scale has some drawbacks. For instance, some information may be lost because respondents must select a single value from the 5–7 options on the rating scale, while the actual distances between adjacent ranks are not necessarily equal. Furthermore, different respondents may perceive the transitions between ranks differently, and, most importantly, the interpretation of the results is not very reliable because the applicability of statistical tools is limited in most cases [33]. For these reasons, some authors have turned to fuzzy set theory, which addresses the uncertainty and imprecision of data and relies on possibility rather than probability [35].
As part of fuzzy set theory, fuzzy numbers have already been recognized as a tool in psychometric studies when imprecision and subjectivity need to be captured. There are two ways of dealing with these issues: one is to introduce alternative scales, question formats, etc., and the other is to develop new methods for analyzing the data collected in this way. Recently, applying fuzzy set theory to capture the subjectivity and imprecision of each collected answer has become widespread. In addition, fuzzy numbers are employed to gain insight into the ambiguity intrinsic to human assessment and rating [9]. In the literature, this method is known as the fuzzy linguistic approach and has been used by many authors in different fields.
The fuzzy linguistic rating scale is most commonly used for obtaining criteria weights in multiattribute decision problems. Bao et al. [36] used triangular fuzzy numbers to calculate the criteria weights for composite index construction. Mousavi et al. [37] used trapezoidal fuzzy numbers to rate the alternatives and criteria in a manufacturing system. Tseng and Chiu [38] calculated the weights of green supply chain criteria based on fuzzy linguistic preferences. Awasthi and Kannan [39] applied triangular fuzzy numbers to derive criteria and alternative weights in order to assess suppliers’ environmental performance. A triangular fuzzy rating scale was used by Stanković et al. [40] and Rostamzadeh et al. [41] to evaluate solutions for road sections and to calculate the significance of selected transport and logistics problems, respectively. Wersi Qazvini et al. [42] applied the fuzzy TOPSIS approach to aggregate alternatives when identifying and analyzing the black spots of suburban areas, while Memiş et al. [17] evaluated the criteria using the fuzzy pivot pairwise relative criteria importance assessment approach to determine and rank road transportation risk factors.
The above-mentioned studies validate the fuzzy linguistic rating scale as a tool for capturing the imprecise and uncertain nature of Likert-type ratings. As can be seen, triangles and trapezoids are the most commonly used shapes of fuzzy numbers, representing a compromise between descriptive power and ease of calculation. Unlike previous practice, in which fuzzy numbers describe decision makers’ opinions and are then used to calculate indicator weights, in this paper they are employed to describe drivers’ self-reported behavior and, accordingly, to calculate the values of the indicators.
The result of the fuzzy approach depends heavily on how well the fuzzy numbers represent the qualitative data, i.e., on assigning appropriate membership functions to the fuzzy numbers. In this paper, the fuzzy rating of each input indicator is described by linguistic expressions. Calculations were done by transforming these variables into trapezoidal fuzzy numbers (a1, a2, a3, a4) with the following membership function:
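For reference, a trapezoidal fuzzy number with parameters a1 ≤ a2 ≤ a3 ≤ a4 conventionally has the membership function below. This is the standard textbook form, given here under the assumption that the paper uses the usual definition; its own notation may differ.

```latex
\mu_{\tilde{A}}(x) =
\begin{cases}
0, & x < a_1,\\[2pt]
\dfrac{x - a_1}{a_2 - a_1}, & a_1 \le x < a_2,\\[6pt]
1, & a_2 \le x \le a_3,\\[2pt]
\dfrac{a_4 - x}{a_4 - a_3}, & a_3 < x \le a_4,\\[6pt]
0, & x > a_4.
\end{cases}
```

The plateau [a2, a3] holds the values regarded as fully compatible with a linguistic term, while the two sloped sides encode the gradual transition to neighboring terms.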
The frequency of a particular driving behavior reported by each respondent is described with one of the six linguistic expressions given in Table 1. The defuzzified value represents the final (crisp) value that serves as input data for constructing the behavior index.
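A minimal Python sketch of how a distribution of answers over the six linguistic terms can be turned into one trapezoidal fuzzy number and then defuzzified. The trapezoids in the hypothetical `SCALE` dictionary are illustrative placeholders, not the values of Table 1, and the graded mean integration formula (a1 + 2a2 + 2a3 + a4)/6 is an assumed defuzzifier that may differ from the paper's equation (2). Weighting each term's trapezoid by the share of respondents who chose it is equivalent to averaging the respondents' fuzzy numbers, because addition and positive scalar multiplication of trapezoidal fuzzy numbers act componentwise.

```python
from typing import Dict, Tuple

# A trapezoidal fuzzy number (a1, a2, a3, a4) with a1 <= a2 <= a3 <= a4.
Trapezoid = Tuple[float, float, float, float]

# HYPOTHETICAL linguistic scale (illustrative values, not the paper's Table 1).
# Questions ask how often a risky behavior occurs, so "never" gets the highest
# (most positive) rating, matching the convention higher = positive behavior.
SCALE: Dict[str, Trapezoid] = {
    "never": (0.8, 0.9, 1.0, 1.0),
    "rarely": (0.6, 0.7, 0.8, 0.9),
    "sometimes": (0.4, 0.5, 0.6, 0.7),
    "often": (0.2, 0.3, 0.4, 0.5),
    "very often": (0.0, 0.1, 0.2, 0.3),
    "always": (0.0, 0.0, 0.0, 0.1),
}


def aggregate(shares: Dict[str, float]) -> Trapezoid:
    """Share-weighted sum of the terms' trapezoids, computed componentwise.

    `shares` maps each chosen term to the fraction of respondents choosing it.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("response shares must sum to 1")
    a1, a2, a3, a4 = (
        sum(share * SCALE[term][i] for term, share in shares.items())
        for i in range(4)
    )
    return (a1, a2, a3, a4)


def defuzzify(t: Trapezoid) -> float:
    """Graded mean integration representation (an assumed defuzzifier)."""
    a1, a2, a3, a4 = t
    return (a1 + 2 * a2 + 2 * a3 + a4) / 6


# Hypothetical answer distribution for one question in one municipality.
shares = {"never": 0.40, "rarely": 0.30, "sometimes": 0.20, "often": 0.10}
fuzzy = aggregate(shares)
print([round(v, 3) for v in fuzzy], round(defuzzify(fuzzy), 3))
```

Swapping in a different membership scale or defuzzifier only requires changing `SCALE` or `defuzzify`, which is convenient for the kind of sensitivity analysis on membership functions discussed in the conclusions.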
In this manner, a fuzzy rating scale is used here to model road users’ self-reported behavior, similar to [38, 43]. Two further methods are then applied to these data for weighting and aggregation, with the main aim of modeling all existing uncertainties and constructing one reliable and robust overall road safety behavior index in an uncertain environment.
2.2. Weighting and Aggregating Methods
Combining SPIs into a complex performance index includes assessing the relative importance of each indicator (assigning weights) and aggregating indicators.
As a weighting technique based on a statistical measure of the amount of information in a variable, Shannon’s entropy method assesses the importance of indicators by reflecting the unevenness of the dataset. It is therefore suitable for reflecting the uncertainty of behavioral indicators, and in the context of traffic safety it has recently been widely used to determine criteria weights in the form of a common set of weights [12, 14, 44].
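The entropy weighting step can be sketched with the common formulation p_ij = x_ij / Σ_i x_ij, e_j = −(1/ln m) Σ_i p_ij ln p_ij, d_j = 1 − e_j, and w_j = d_j / Σ_j d_j, where m is the number of territories. This is an illustrative NumPy sketch of the standard Shannon entropy method, not a transcription of the paper's Table 2; the function name and the example matrix are hypothetical.

```python
import numpy as np


def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Shannon entropy weights for an (m alternatives x n indicators) matrix.

    Assumes every entry is positive, e.g. defuzzified scores in (0, 1].
    """
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    if m < 2:
        raise ValueError("at least two alternatives are required")
    # p_ij: share of alternative i in the column total of indicator j.
    P = X / X.sum(axis=0)
    # e_j = -(1 / ln m) * sum_i p_ij ln p_ij, so that 0 <= e_j <= 1.
    # Zero shares contribute 0 (the limit of p ln p as p -> 0).
    logP = np.log(np.where(P > 0, P, 1.0))
    e = -(P * logP).sum(axis=0) / np.log(m)
    # d_j = 1 - e_j: the more uneven an indicator, the larger its weight.
    # Clipping removes tiny negative values caused by rounding.
    d = np.clip(1.0 - e, 0.0, None)
    if d.sum() == 0:
        # All indicators perfectly even: fall back to equal weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


# Hypothetical defuzzified scores: 4 municipalities x 3 behaviors.
X = np.array([
    [0.70, 0.60, 0.90],
    [0.70, 0.40, 0.85],
    [0.70, 0.55, 0.88],
    [0.70, 0.30, 0.86],
])
w = entropy_weights(X)
print(np.round(w, 3))
```

An indicator that takes the same value in every territory carries no discriminating information, so it receives (practically) zero weight; the weights are driven entirely by the dispersion of the data rather than by external judgment.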
In addition, grey theory is a useful model for analyzing uncertain systems with partially known and partially unknown information in many fields. Grey relational analysis (GRA), as part of grey theory, is a normalization-based technique: the performance of all alternatives is translated into comparable data sequences, which requires positive data values, and, unlike other methods for addressing uncertainty, it works well with small samples. A combination of fuzzy theory and GRA was used by Ma et al. [45] to create three different sets of road safety indicators based on experts’ attitudes: one for highways, one for urban roads, and one for regional roads. Hu et al. [46] combined GRA with fuzzy theory and data envelopment analysis to evaluate public transport networks, and both Liu et al. [47] and Grdinić-Rakonjac et al. [48] used GRA to calculate the weights of road safety indicators. In this paper, GRA is applied to aggregate the weighted indicators and construct the index, as shown in Figure 1.
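A minimal NumPy sketch of GRA used as an aggregation step: the indicators are normalized, compared with a reference (ideal) sequence, converted into grey relational coefficients, and combined with the indicator weights into a weighted grey relational grade for each territory. The larger-the-better min-max normalization, the distinguishing coefficient ζ = 0.5, the function name, and the data are assumptions chosen for illustration.

```python
import numpy as np


def grey_relational_grades(X, weights, zeta=0.5):
    """Weighted grey relational grades for an (m x n) benefit-type matrix.

    zeta is the distinguishing coefficient; 0.5 is a common default.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Larger-the-better min-max normalization per indicator (an assumed choice).
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant indicator: avoid division by zero
    N = (X - col_min) / col_range
    # Reference sequence: the best normalized value of every indicator.
    reference = N.max(axis=0)
    delta = np.abs(reference - N)  # deviation sequences
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:
        xi = np.ones_like(delta)  # all alternatives identical
    else:
        # Grey relational coefficients.
        xi = (d_min + zeta * d_max) / (delta + zeta * d_max)
    # Weighted grey relational grade of each alternative.
    return xi @ w


# Hypothetical scores (3 municipalities x 2 behaviors) and weights.
X = np.array([[0.9, 0.6], [0.7, 0.8], [0.5, 0.4]])
grades = grey_relational_grades(X, weights=[0.6, 0.4])
print(np.round(grades, 3))
```

Because the coefficients depend on the normalized deviations, replacing the normalization (for example, dividing by the column maximum) can change the grades, which is the normalization sensitivity mentioned in the conclusions.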

With this novel hybrid methodology (entropy-based weighting combined with GRA-based aggregation), it is possible to create a reliable composite safety performance index of the territory and additionally establish a more rational ranking among the territories under evaluation. The steps of entropy and grey relational analysis are summarized in Table 2.
3. Case Study
The proposed approach was tested on a set of 21 Montenegrin municipalities. Since one of the goals of this research was to identify the most influential road safety risk behavior, the driver was chosen as the unit of analysis rather than any other road user. The following unwanted behaviors were investigated: driving above the speed limit, driving under the influence of alcohol, using the telephone while driving, and not wearing the seatbelt. Data were collected through face-to-face and online questionnaire surveys of 1309 drivers, with the sample size for each municipality determined a priori (3% margin of error and 95% confidence level). Descriptive statistics are given in Table 3.
The questionnaire was compiled in accordance with the ESRA and SARTRE methodology, which means that each respondent describes the frequency of certain risk behaviors while driving using one of the values of the linguistic variable: never, rarely, sometimes, often, very often, and always. All questions are divided into those related to main roads, regional roads, and local/urban roads. In total, drivers answered up to 25 questions grouped into four main domains (V1–V9, A1–A3, T1–T4, and S1–S9), presented in Table 4. Their answers are given as percentages in Table 5.
4. Results
Fuzzy theory addresses the uncertainty and imprecision of data. Hence, in this paper, all self-reported behaviors were processed and converted into numerical values using trapezoidal fuzzy numbers and their operations. The data are organized so that higher values represent positive behavior. The final fuzzy number (Table 6) for a particular risk behavior (for example, speeding in the municipality AN) was derived as the weighted sum of the fuzzy numbers calculated for each question. Internal weights (see Figure 2) were determined by experts on the basis of road safety figures (the number of accidents and fatalities on different types of roads). Defuzzification was conducted via equation (2), and the final values were obtained (Table 7). Not wearing the seatbelt received the lowest average score (0.568), identifying this negative behavior as the most common among drivers in Montenegro, followed by using the telephone while driving, speeding, and driving under the influence of alcohol (0.593, 0.650, and 0.881, respectively).
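The weighted-sum step described above can be sketched as follows, assuming one question-level trapezoidal fuzzy number per road type. The question-level numbers and internal weights are hypothetical (not the values in Table 6 or Figure 2), and the graded mean integration representation is used as an assumed stand-in for equation (2).

```python
from typing import Sequence, Tuple

Trapezoid = Tuple[float, float, float, float]


def weighted_sum(numbers: Sequence[Trapezoid],
                 weights: Sequence[float]) -> Trapezoid:
    """Weighted sum of trapezoidal fuzzy numbers with nonnegative weights."""
    if len(numbers) != len(weights):
        raise ValueError("one weight per question is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("internal weights must be nonnegative and sum to 1")
    a1, a2, a3, a4 = (
        sum(w * t[i] for t, w in zip(numbers, weights)) for i in range(4)
    )
    return (a1, a2, a3, a4)


def graded_mean(t: Trapezoid) -> float:
    """Graded mean integration defuzzification (an assumed choice)."""
    a1, a2, a3, a4 = t
    return (a1 + 2 * a2 + 2 * a3 + a4) / 6


# HYPOTHETICAL question-level fuzzy numbers for one behavior in one municipality.
question_numbers = [
    (0.60, 0.70, 0.80, 0.86),  # e.g., question on main roads
    (0.50, 0.60, 0.70, 0.80),  # e.g., question on regional roads
    (0.40, 0.50, 0.60, 0.70),  # e.g., question on local/urban roads
]
internal_weights = [0.5, 0.3, 0.2]  # HYPOTHETICAL expert-assigned weights

behavior = weighted_sum(question_numbers, internal_weights)
print([round(v, 3) for v in behavior], round(graded_mean(behavior), 3))
```

Keeping the result fuzzy until the very end, and defuzzifying only the behavior-level number, preserves the spread of the answers through the aggregation over questions.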

Following the steps described in Section 2, the entropy-based weighting technique was applied, and not wearing a seatbelt was found to have the greatest influence on the final score, with an assigned weight of 0.528. The lowest weight (0.046) was assigned to driving under the influence of alcohol (first row in Table 8).
These weights were then combined with the grey relational coefficients to obtain the weighted grey relational grade, which represents the final FEGRA behavior index. The results are presented in Table 8. The municipality of Tivat (TV) received the highest behavior index (0.778), indicating that local drivers behave relatively better than drivers in other territories. It is followed by the municipalities of Kotor (KO), Bar (BR), Šavnik (ŠA), and Mojkovac (MK). These scores are not surprising, given that these municipalities are all small, with mostly low motorization levels and no main roads in their territories. In contrast, the municipality of Budva (BD) was assigned a relatively low score (0.371) and was identified as the worst-performing municipality, along with the municipality of Žabljak (ŽB). Both of these municipalities are tourist centers; hence, the increased number of vehicles and people in their territories may be the cause of their low performance.
5. Discussion
Since the construction of a composite index involves (more or less subjective) methodological choices in several consecutive steps, a robustness analysis of the proposed methodology was performed as a quality assurance tool [49]. The influence of the data modeling method and of the weighting and aggregation schemes on the final outcome (i.e., the shift in rank across the entire set of municipalities) was examined. In addition to this uncertainty analysis, the Pearson product-moment correlation coefficient was chosen as a variance-based measure of the linear dependence between input and output data; for linear relationships, it is directly related to the main effect index, or first-order sensitivity index [50, 51].
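The two comparison measures used in this analysis, the Pearson correlation between two rankings and the average absolute rank shift, can be computed as in this NumPy sketch. The helper names and the index scores are hypothetical. When there are no ties, the Pearson correlation of two rank vectors equals Spearman's rank correlation coefficient.

```python
import numpy as np


def to_ranks(scores):
    """Rank 1 = highest index score (ties are not handled in this sketch)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks


def compare_rankings(ranks_a, ranks_b):
    """Mean absolute rank shift and Pearson correlation of two rankings."""
    a = np.asarray(ranks_a, dtype=float)
    b = np.asarray(ranks_b, dtype=float)
    mean_abs_shift = np.abs(a - b).mean()
    pearson = np.corrcoef(a, b)[0, 1]
    return mean_abs_shift, pearson


# Hypothetical index scores for five territories under two methods.
method_a = [0.78, 0.74, 0.60, 0.52, 0.37]
method_b = [0.70, 0.75, 0.58, 0.40, 0.45]
shift, r = compare_rankings(to_ranks(method_a), to_ranks(method_b))
print(shift, round(r, 3))
```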
In the first step, the impact of the data modeling method is examined. As stated earlier, the FEGRA index is constructed from questionnaire data. When opinions and attitudes are involved, a common data modeling practice is to use a Likert scale, usually by averaging the collected answers. Accordingly, the survey answers were first treated as ordinal, with “1” representing unwanted behavior and “6” representing positive behavior, and the final crisp score for each behavior was calculated as the mean of all collected answers using the previously defined internal weights. These values were then normalized (see Table 9). In that case, however, the uncertainty of the drivers’ subjective assessment of their own behavior is not addressed. The comparative influence of the fuzzy numbers applied to resolve input data uncertainty is presented in Figure 3. The calculated rank correlation coefficient of 0.69 indicates a relatively low correlation between the rankings, meaning that the use of fuzzy numbers had a considerable effect on the final performance index. Figure 3 shows that the largest rank variations occur for the two top-ranked territories (rank shifts of 13 and 8 places, respectively). Apart from these two, the data modeling technique mainly affects the bottom-ranked territories.
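The Likert baseline described above can be sketched as follows. The coding assumes every question is phrased as a risky behavior, so “always” is coded 1 and “never” 6; the min-max rescaling (x − 1)/5 onto [0, 1] is an assumed normalization, and Table 9 may use a different one. The answer shares and internal weights are hypothetical.

```python
from typing import Dict, Sequence

# Ordinal coding from the text: 1 = unwanted behavior, 6 = positive behavior.
# Assumes every question is phrased as a risky behavior ("How often do you ...?").
ORDINAL: Dict[str, int] = {
    "always": 1, "very often": 2, "often": 3,
    "sometimes": 4, "rarely": 5, "never": 6,
}


def likert_score(questions: Sequence[Dict[str, float]],
                 internal_weights: Sequence[float]) -> float:
    """Internal-weighted mean of ordinal answers, rescaled to [0, 1].

    Each question is a mapping term -> share of respondents choosing it.
    """
    means = [sum(share * ORDINAL[term] for term, share in q.items())
             for q in questions]
    weighted_mean = sum(w * m for w, m in zip(internal_weights, means))
    # Assumed min-max normalization of the 1..6 scale onto [0, 1].
    return (weighted_mean - 1) / 5


# HYPOTHETICAL answer shares for two questions about one behavior.
q1 = {"never": 0.4, "rarely": 0.3, "sometimes": 0.2, "often": 0.1}
q2 = {"never": 0.2, "rarely": 0.3, "sometimes": 0.3, "often": 0.2}
single = likert_score([q1], [1.0])
score = likert_score([q1, q2], [0.6, 0.4])  # hypothetical internal weights
print(round(single, 3), round(score, 3))
```

This baseline treats the six codes as equally spaced, which is exactly the Likert-scale assumption criticized in Section 2.1; the fuzzy variant replaces each code with a trapezoid whose spacing and spread can be chosen freely.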

In the second step, the influence of the weighting method is considered. The weights attached to each indicator are usually chosen to reflect its importance with respect to the concept being measured [50]. A common initial approach is to assume that each input variable contributes equally; equal weighting is the most common scheme in composite index construction [52]. Another weighting method that has recently become very popular is data envelopment analysis (DEA), since, as a data-driven technique, it does not rely on subjective input from outside. Here, the rankings obtained with these two methods are compared with the ranking obtained with the proposed entropy weighting. The DEA model from [43] is used, and separate weights are obtained for each territory. Figure 4 (left graph) compares these rankings: addressing data uncertainty with the entropy weighting scheme improves the rank of eight and twelve territories compared with equal and DEA weighting, respectively, while it worsens the rank of twelve and six territories, respectively. The higher correlation coefficients (0.72 and 0.70 for equal weighting and DEA, respectively) indicate that the weighting method has a smaller (but not negligible) impact than data modeling.

In terms of aggregation techniques, the additive (simple linear) approach and the multicriteria decision-making (MCDM) approach were chosen for comparison. Additive aggregation, the most commonly used method for composite indicators, allows the marginal contribution of each variable to be assessed separately and entails full compensability. However, when the assigned weights are considered “importance coefficients” (i.e., measures of importance), noncompensatory aggregation should be used for composite index construction [52]. In that sense, the MCDM method TOPSIS was used, which orders the alternatives by their similarity to the ideal solution. Figure 4 (right graph) compares the rankings obtained by additive and MCDM aggregation with the ranking obtained by FEGRA, in which grey relational analysis is employed. The relatively high correlation coefficients between the compared rankings indicate that the aggregation method has only a minor influence on the final score, with a total absolute average rank shift of 1.05 places when GRA is replaced with additive aggregation and 1.5 places when aggregation is conducted with the MCDM technique (Table 10). Table 10 gives the rank shifts (for each municipality and in total), along with the measures of linear dependence (Pearson product-moment correlations) for all considered alternatives.
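For comparison, the two alternative aggregation schemes can be sketched in NumPy: a weighted additive score and TOPSIS with vector normalization and the closeness coefficient C_i = S_i⁻ / (S_i⁺ + S_i⁻). All indicators are assumed to be benefit-type (higher values mean better behavior, as in the data organization described in Section 4) and, for the additive score, already on a comparable scale; the function names and data are hypothetical.

```python
import numpy as np


def additive_scores(X, weights):
    """Weighted linear (fully compensatory) aggregation."""
    return np.asarray(X, dtype=float) @ np.asarray(weights, dtype=float)


def topsis(X, weights):
    """TOPSIS closeness coefficients for benefit-type indicators."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Vector normalization of each indicator column, then weighting.
    norms = np.sqrt((X ** 2).sum(axis=0))
    norms[norms == 0] = 1.0
    V = X / norms * w
    # Positive and negative ideal solutions (benefit criteria).
    ideal, anti_ideal = V.max(axis=0), V.min(axis=0)
    d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((V - anti_ideal) ** 2).sum(axis=1))
    denom = d_plus + d_minus
    # Relative closeness to the ideal; 0.5 if all alternatives coincide.
    return np.divide(d_minus, denom, out=np.full_like(denom, 0.5),
                     where=denom > 0)


# HYPOTHETICAL defuzzified scores (3 municipalities x 2 behaviors) and weights.
X = np.array([[0.9, 0.6], [0.7, 0.8], [0.5, 0.4]])
w = np.array([0.6, 0.4])
print(additive_scores(X, w), topsis(X, w))
```

The additive score lets a high value on one indicator fully offset a low value on another, whereas TOPSIS measures distances to both the best and the worst observed profiles, which is why the two can order the same territories differently.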
Finally, the ranking obtained with the complete FEGRA method was compared with the ranking obtained with the following combination of choices: Likert scale (data modeling), equal weights (weighting), and additive approach (aggregation). This comparison is illustrated in Figure 5, with error bars representing a deviation of ±3 places from the FEGRA rank. The scatter plot shows that larger deviations occur in the first half of the scale (among the better-ranked municipalities) and that the maximum shift is +7/−5 positions.

The demonstrated robustness of the novel hybrid methodology FEGRA (Fuzzy number–Entropy–GRA) confirms that it is able to create a reliable composite safety performance index of the territory and additionally establish a more rational ranking among the territories under evaluation.
6. Conclusions
Evaluating the state of road safety is a complicated task, especially in an uncertain environment. Uncertainty can arise from various sources: from the selected indicators and the collection of their data, through the weighting process, to the chosen aggregation approach. This paper has focused on assessing road safety with a composite performance index, treated as a complex decision-making problem under uncertain circumstances. To overcome the drawbacks of data uncertainty, a systematic hybrid method, FEGRA, has been proposed. The method integrates different techniques for dealing with uncertainty at each step, namely, trapezoidal fuzzy numbers for modeling indicator data, entropy-based weighting for determining the nominal importance of indicators, and grey relational analysis for aggregating the weighted indicators. Data on the self-reported behavior of Montenegrin drivers, collected through a questionnaire, were used as safety performance indicators (SPIs). The presented method made it possible to reliably assess traffic safety and rank the municipalities of Montenegro on the basis of these behavioral SPIs.
The results show that not wearing a seatbelt is the most common negative behavior of drivers in Montenegro and has the largest impact on the final score and ranking (i.e., it is the most important issue that policymakers should address in the future). Ranking positions are understandably very sensitive to the “quality of presentation” of input data, especially when the data are qualitative (opinions, judgments), as here. The robustness (comparative) analysis of FEGRA confirms that the final ranking of the territories is mostly influenced by the data modeling and weighting techniques (with a total absolute average rank shift of 3.5 places), while the choice of aggregation method is, as expected, less influential (with an average rank shift of one place).
One advantage of the proposed FEGRA method is its efficiency and practical usefulness, since it does not require complicated software; in addition, it can serve as a single methodology for periodically monitoring road user behavior and the achievements of implemented measures, strategies, and policies. However, the analysis showed that indicator values have a large influence on the final ranking, and the result of the fuzzy approach depends heavily on how well fuzzy numbers represent qualitative data; hence, the calculated results might vary with the assigned membership function, which is the main shortcoming of this method. Another disadvantage stems from the fact that GRA is a normalization-based technique, so the calculated results might vary with the type of normalization. Therefore, future work should conduct a deeper sensitivity analysis of both aspects, i.e., assigning different membership functions and applying alternative data normalizations, and should also explore the methods that best fit data representing road safety behavior.
Data Availability
Data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.