Abstract

In transport demand analysis, the calibration of a model means the estimation of its (endogenous) parameters from observed data through a statistical inference estimator. These considerations apply to any choice behaviour model, such as those derived from Random Utility Theory or from any other choice modelling theory. Calibration of choice models can be carried out from disaggregate vs. aggregate data, while statistical estimators can be specified through Bayesian vs. Classic (or Frequentist) approaches. In this paper, the resulting Bayesian or Classic, disaggregate or aggregate, calibration methods are discussed, analysed in detail, and compared from the mathematical point of view. These methods are applied to calibrate Logit choice models describing path choice behaviour at the national scale on a small sample. The Logit choice model can be derived from Random Utility Theory (or be considered an instance of the Bradley–Terry model). Path choice set definition is also discussed, and specialised indicators are used to compare results. The main contributions of this study concern the use of two different estimation approaches, Bayesian vs. Classic, adopting and introducing some indicators of goodness of estimation. The results of this work, relative to the adopted sample of users, show that the Bayesian approach provides a better estimate than the Classic approach: the calibrated parameters are more stable, the alternative-specific constants decrease, and the resulting models show better values of the clearly right indicator.

1. Introduction

User path choice behaviour models are one of the two main elements of any method for travel demand assignment to a transportation network, the other being the arc cost-flow functions modelling user driving behaviour and the resulting congestion, apart from the OD demand flow matrix (details can be found in [1] and [2]).

Traditionally, within assignment methods, path choice behaviour is modelled:
(i) By assuming the path choice set defined by
(a) either exhaustive enumeration of all elementary paths,
(b) or selective enumeration of some of them according to given criteria (i.e., efficient paths as defined in [3]);
(ii) By defining the user choice strategy among the alternatives in the path choice set.

From a more general point of view, Manski [4] proposed to model any choice behaviour as a two-level decision:
(i) an explicit definition of the perceived choice set;
(ii) a description of the choice among the alternatives in the choice set.

This approach includes the first one as a special case; it is described below in detail for path choice behaviour, considering that the application described in this paper focuses on the explicit approach.
(i) Alternative Choice Set.
(a) Generation of Perceived Alternatives. Given an origin and destination pair, the routing alternatives can be generated through the exhaustive approach (considering all elementary, say loop-less, paths) or the selective approach (only some elementary paths). The exhaustive approach may be less effective because users likely perceive only a few alternatives. For this reason, some candidate paths are first generated according to some reasonable criteria [5, 6].
(b) Definition of the Perceived Choice Set. Then, from the candidate alternatives, the perceived choice set may be defined through the application of deterministic [5] or probabilistic perception models [7, 8].
(ii) Choice Strategy among Alternatives. The choice of the alternative from the perceived choice set is commonly described through Random Utility Models (RUM; see [5, 9, 10]), Fuzzy Utility Models (FUM; [11–13]; see also [14, 15]), Quantum Utility Models (QUM; [16, 17]), psychological choice modelling, and ranking models (for instance, [18–20]), as well as by applying any other choice modelling theory, for instance, comparing pairs of alternatives [21, 22] for the probability estimation.

According to the most widely used utility-based choice theories, given the choice set defined as in point (i), each user moving between an origin–destination (od) pair:
(a) knows all paths in the choice set;
(b) associates to each path in the choice set a perceived utility;
(c) chooses the path of maximum perceived utility (or minimum perceived disutility).
Furthermore, according to any Uncertainty Utility Theory:
(d) the perceived utility is modelled by a continuous uncertainty number, due to several sources of uncertainty regarding the users as well as the modellers.

In RUT, the perceived utility is modelled by a continuous random variable whose expected value is called the systematic utility; thus, the choice probability of an alternative is given by the probability that its perceived utility is the maximum among all alternatives, and the path choice proportions are assumed to be given by the path choice probabilities. When the perceived utility covariance matrix is non-singular, a probabilistic path choice function is obtained, fully specified by the perceived utility probability density function (pdf).

Assuming the perceived utilities independent and distributed, with some common parameters, according to a Gumbel or a Weibull r.v., the Logit or the Weibit model [23] is obtained, respectively, both in closed form. Independence between alternatives does not allow modelling overlapping paths; this issue can be addressed within the utility specification (e.g., C-Logit in [24]; Path Size in [25]). Other approaches, not in closed form, model overlapping paths through covariance: Probit [26] or Gammit [27], based on MVNormal or MVGamma distributions. More details are given in de Luca [28]. As already stressed, the Logit and the Weibit choice models can also be derived from the psychological Bradley and Terry [21] model (BLT).

Whatever the choice modelling theory, the parameters of the choice model, including those in the utility specification, have to be calibrated against observed data.

From the statistical point of view, estimators can be specified according to:
(i) Bayesian (B) statistical inference: parameters are assumed to be described by random variables, and (point/interval) estimates are defined through the posterior distribution, given by the prior distribution times the likelihood of the observations (duly normalised); the resulting estimators can easily be fitted in a dynamic context by applying the well-known Bayes' theorem.
(ii) Classic or Frequentist (C) statistical inference: parameters are assumed to have a (deterministic, even though unknowable) value, and their estimates are given by the point of maximum of the likelihood, i.e., the joint probability of observing the sample of input data (or by other methods such as Least Squares).

The Bayesian statistical estimator is a random variable, whose uncertainty is described by a probability distribution; the Classic statistical estimator is a point value based on the Frequentist approach. Bayesian methods start from a prior pdf (probability density function) and yield a posterior pdf, possibly different in specification and parameters; Classic estimation does not consider any prior belief.

Bayesian methods have recently been adopted for travel demand estimation and traffic models in a dynamic context. The dynamic travel demand is estimated in Yu et al. [29], considering a normal prior distribution and a posterior distribution obtained from added observed traffic counts; the errors decrease with the observations. Considering a generalized Bayesian approach and path choice, Zhu et al. [30, 31] analysed traffic models in stochastic transportation systems under user-equilibrium and non-user-equilibrium conditions, studying convergence through numerical experiments on day-to-day dynamics for path choice.

In transportation demand analysis, the estimation of the endogenous parameters of the demand model from observed data is commonly called calibration; it is developed through the steps of specification of the functional form, parameter estimation, and validation with formal and informal statistical indicators. For this reason, the term calibration is used below to denote the estimation of the endogenous travel demand model parameters, with specific reference to the application to path choice. Calibration can be carried out from:
(i) Disaggregate data: the input data are the alternative actually chosen by each user in a sample, assuming that their choice decisions are independent.
(ii) Aggregate data: the input data are the frequencies, that is, the number of times that each alternative is observed.

The parameter calibration for path choice has been carried out through Classic estimators [5, 24] or Bayesian ones [32, 33], with multinomial Logit or mixed Logit probabilities. Washington et al. [32] report theoretical aspects of the calibration of multinomial Logit models, applied to path choice, through Classic or Bayesian estimators.

In this paper, the Bayesian calibration approach is analysed in detail and compared with the Classic one by calibrating Logit path choice models at the national scale. Path choice set definition is also discussed. The paper is structured as follows. Section 2 presents the methodology. Section 3 discusses the results of an application to a real case. Conclusions and further developments are given in Section 4. The main contributions of this paper concern the use of:
(i) two different calibration approaches, Bayesian vs. Classic (or Frequentist) (Section 2);
(ii) two different sources of information, observed choices or frequencies (Section 2);
(iii) some indicators of goodness of calibration (Section 3).

2. Methodology

Path choice behaviour for each user class (u.c.) n can be described by applying any model derived from a discrete choice theory (such as the RUM, FUM, and QUM quoted above). Applying Random Utility Theory (RUT), it is assumed that, for a journey:
(i) Sn is the choice set containing the perceived alternatives for u.c. n.
(ii) Each user in u.c. n associates a perceived utility Uk,n to each path k belonging to the choice set Sn of perceived alternatives.
(iii) Each user in u.c. n chooses the (an) alternative of maximum perceived utility.
(iv) The perceived utility Uk,n is modelled as a (continuous) random variable, with pdf fU(•), due to several sources of uncertainty for both users and modeller; its expected value E(Uk,n) = vk,n is called the systematic utility (several specifications can be considered for the pdf of Uk,n).

According to the above assumptions, the choice proportion pk,n of path k is given by the (mass) probability that the perceived utility of path k is the maximum (that is, the choice probability):

pk,n = pn(k | θn, Sn) = Prob[Uk,n ≥ Uh,n ∀ h ∈ Sn]  (1)

where θn is the choice function parameter vector, including the perceived utility pdf parameters, whose meaning depends on the choice model specification; this vector may differ across user classes n. According to equation (1), the choice probability pn(k | θn, Sn) depends on the values of the systematic utilities vh,n, h ∈ Sn.

The systematic utility is commonly assumed, both in research and practice, to be a linear combination of a vector xk,n of attributes (such as travel time and monetary costs due to tolls, fees, …) through utility parameters θv included in vector θ:

vk,n = θv′ xk,n  (2)

All parameters in vector θ are to be calibrated against observations, and they are usually considered deterministic parameters. Recently, they have also been considered random variables Θ with a joint pdf f(θ|Sn), to model for instance users’ heterogeneity, VoT distribution, etc.

For each u.c. n, different RUMs are obtained depending on the distribution of the perceived utility, as already said. In this paper, the multinomial Logit is used, where the perceived utilities Uk,n are independently and identically distributed (i.i.d.) as Gumbel r.v.'s with means given by the systematic utilities and dispersion parameter θo; this parameter plays the role of a utility scale factor and thus cannot be distinguished from θv; therefore, the only actual parameters of the model are θ = θv/θo. Equation (2) becomes

vk,n/θo = θ′ xk,n

and, if all parameters in vector θ are assumed to be deterministic, the choice probabilities can be defined in closed form:

pn(k | θ, Sn) = exp(θ′ xk,n) / Σh∈Sn exp(θ′ xh,n)  (3)

As already noted, the Logit choice model can also be obtained as a psychological one.
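As an illustration of equation (3), the following minimal Python sketch computes the closed-form Logit probabilities for one choice set; the attribute values, parameter values, and function names are illustrative, not taken from the paper's data.

```python
import numpy as np

def logit_probabilities(theta, X):
    """Closed-form Logit probabilities of equation (3) for one choice set.

    theta : (J,) vector of utility parameters (already scaled by theta_o).
    X     : (K, J) matrix of attributes, one row per path k in the choice set.
    """
    v = X @ theta                  # systematic utilities v_k = theta' x_k
    expv = np.exp(v - v.max())     # shift by the maximum for numerical stability
    return expv / expv.sum()

# Hypothetical choice set: 3 paths, attributes = [travel time (h), motorway share, label]
X = np.array([[1.5, 0.80, 1.0],
              [1.8, 0.95, 0.0],
              [2.1, 0.40, 0.0]])
theta = np.array([-2.0, 2.0, 2.0])  # signs as expected: time negative
print(logit_probabilities(theta, X))
```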

As already said, calibration of Random Utility Models for path choice (or of any other model), that is, estimation of the parameters θ, can be carried out from:
(i) Disaggregate data: the input data are the alternative actually chosen by each user within a sample of N users, assuming that their choices are stochastically independent.
(ii) Aggregate data: the input data are the frequencies, that is, the number of times each alternative is chosen, with N being the sum of the frequencies over all alternatives.

Similar considerations may apply to other RUMs, such as Probit, Weibit, or Gammit, or to other psychological choice models, which will be discussed in a future paper.

In the following, Sn is omitted to simplify the notation. For application purposes, each user class n is assumed to be made up of one user.

2.1. The Classic (or Frequentist) Approach

According to the Classic (or Frequentist) approach to estimation, each parameter is assumed to have a given (even though unknowable) value, and its estimate is given by the point of maximum of the likelihood, i.e., the joint probability of observing the sample of input data. This approach shows several drawbacks, well known in the literature, such as the assumption of the availability of infinitely many samples under the same conditions.

The joint distribution of the sample of the observations given the values of parameters θ can be specified by the disaggregate or the aggregate approaches described above.

2.1.1. Disaggregate

Given a sample of N users whose choice decisions are assumed independent, for each user n the alternative xn actually chosen is observed, and the values xn are collected in the vector x = […, xn, …]′.

The likelihood for the N observed chosen alternatives is the joint probability of observing the alternative effectively chosen by each user.

Assuming that the random variables Xn describing the alternative chosen by each user n are independent, with probability mass function pn(• | θ) specified as in the previous subsection, the likelihood is:

P(x | θ) = Πn pn(xn | θ)
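A minimal sketch of this disaggregate likelihood, in log form, reusing logit_probabilities from the earlier sketch; the data layout (a list of per-user choice sets) is an assumption for illustration.

```python
import numpy as np

def disaggregate_log_likelihood(theta, samples):
    """Log of the disaggregate likelihood: sum over users of the log
    probability of the alternative each user actually chose.

    samples : list of (X_n, chosen_n) pairs, where X_n is the (K_n, J)
              attribute matrix of user n's choice set and chosen_n the
              index of the chosen path within it.
    """
    ll = 0.0
    for X_n, chosen_n in samples:
        p = logit_probabilities(theta, X_n)   # from the earlier sketch
        ll += np.log(p[chosen_n])
    return ll
```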

2.1.2. Aggregate

Given M alternatives, let zm ≥ 0 be the number of times (frequency) that alternative m is observed; the values zm are collected in the vector z = […, zm, …]′ (so denoted to avoid any confusion with vector x), with Σm zm = N, and the observations are assumed independent.

Assuming that z is a realisation of a multinomial random variable Z with probabilities p(m | θ), specified above, the likelihood of observing the vector z is given by:

P(z | θ) = (N! / Πm zm!) Πm p(m | θ)^zm
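A corresponding sketch of the aggregate (multinomial) log-likelihood, under the assumption of a single choice set shared by all observations; gammaln is used for the factorial terms, and logit_probabilities comes from the earlier sketch.

```python
import numpy as np
from scipy.special import gammaln

def aggregate_log_likelihood(theta, X, z):
    """Multinomial log-likelihood: X is the (M, J) attribute matrix of the
    shared choice set, z the (M,) vector of observed frequencies (sum = N)."""
    p = logit_probabilities(theta, X)         # from the earlier sketch
    # ln N! - sum_m ln z_m! + sum_m z_m ln p_m
    return gammaln(z.sum() + 1) - gammaln(z + 1).sum() + (z * np.log(p)).sum()
```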

2.1.3. Estimation Method

For estimating parameters θ, the likelihood L is considered as a function of θ given by L (θ; x) = P (x | θ). Thus, according to the maximum likelihood method, the estimates θML are given by the solution of the maximisation problem: Maxθ L(θ; x).

Very often a logarithmic transformation simplifies the problem without changing the solution: l(θ; x) = ln(L(θ; x)); thus, the estimates θML are given by the solution of Maxθ l(θ; x). In Classic statistical inference, θML maximises the probability of observing the actually observed sample with respect to all the others.
(i) Disaggregate: according to the above assumptions, the ML estimates θML are given by

θML = argmaxθ Σn ln pn(xn | θ)

(ii) Aggregate: according to the above assumptions, the ML estimates θML are given by

θML = argmaxθ Σm zm ln p(m | θ)

where the term ln(N!/Πm zm!) is omitted since it does not depend on θ.
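A sketch of the ML estimation step: the negative log-likelihood is minimised with a standard quasi-Newton routine. The starting point and the choice of BFGS are assumptions, not the authors' settings; the aggregate variant is obtained by swapping in aggregate_log_likelihood.

```python
import numpy as np
from scipy.optimize import minimize

def ml_estimate(samples, theta0):
    """Maximum likelihood estimate: maximise l(theta; x) by minimising its negative."""
    result = minimize(lambda th: -disaggregate_log_likelihood(th, samples),
                      x0=np.asarray(theta0, dtype=float), method="BFGS")
    return result.x   # theta_ML

# e.g., theta_ml = ml_estimate(samples, theta0=[-1.0, 1.0, 1.0])
```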

Following the Least Squares method, the estimates θLS are instead given by minimising the squared differences between observed and modelled frequencies:

θLS = argminθ Σm (zm − N·p(m | θ))²

2.2. The Bayesian Approach

According to Bayesian estimation, parameters are assumed to be described by random variables, and estimates are given by the median, the mean, or the mode (if the posterior pdf is unimodal) of their posterior joint pdf, which is given by their prior joint pdf times the likelihood of the observations (duly normalised). This approach shows several useful features, well known in the literature: it is suitable for successive re-estimations when further information is collected, possibly in a dynamic context, and it allows a consistent definition of credible interval estimators.

The Bayesian approach considers that the posterior conditional joint pdf P(θ | x) of the parameters θ, conditional on the observation vector x, is equal to the product of the prior P(θ) and the likelihood P(x | θ), duly normalised by the factor 1/P(x):

P(θ | x) = P(θ) · P(x | θ) / P(x)

The initial knowledge about the parameters θ is described by the prior P(θ); multiplying it by the likelihood P(x | θ) from a set of observations x (and normalising) yields an update of that knowledge, described by the posterior P(θ | x). It should be noted that the normalising factor 1/P(x) does not depend on the parameters θ.

2.2.1. Prior Specification

For the vector θ of parameters, a prior joint pdf f(θ) can be assumed known. It derives from the past information available about θ, for instance from previous estimation procedures or previous experience (e.g., sign, probable interval, probabilistic distribution).

Assuming θ to be continuous random variables, the prior probability P(θ) is given by the prior joint pdf of θ:

P(θ) = f(θ)

Assuming that all the parameters θj are independently distributed, with pdf f(θj) for parameter θj, the prior probability has the specification:

f(θ) = Πj f(θj)
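A sketch of such an independent prior: the joint log prior is the sum of univariate normal log pdfs, one per parameter. The normal form anticipates the application of Section 3.1; the argument names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def log_prior(theta, mu, sigma):
    """Joint log prior of independent normal parameters:
    ln f(theta) = sum_j ln f(theta_j)."""
    return norm.logpdf(np.asarray(theta), loc=np.asarray(mu),
                       scale=np.asarray(sigma)).sum()
```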

2.2.2. Likelihood Specification

Our knowledge of the parameters θ, as modelled by the prior pdf, can be improved from observations. The likelihood L of the observations given the parameters can be specified in two ways:
(a) observing the alternative chosen by each user of a sample, x;
(b) observing the frequencies of the alternatives chosen by a sample of users, z.

The specifications given above are repeated here for the reader's convenience:
(i) Disaggregate: P(x | θ) = Πn pn(xn | θ)
(ii) Aggregate: P(z | θ) = (N!/Πm zm!) Πm p(m | θ)^zm

Note that the probability mass function pn(k | θ) can be specified in several ways, some of which are reported in Section 2; the meaning of the parameters θ depends on the type of model.

The likelihood can be considered as a function of θ:

L(θ; x) = P(x | θ)

2.2.3. Normalising Factor

As said above, the normalising factor (1/P(x)) depends on observations but does not depend on parameters θ.

2.2.4. Posterior Definition

As reported above, the posterior conditional joint pdf P(θ | x) of the parameters θ is the probability given to the parameters θ after the observations have been collected:
(i) Disaggregate: P(θ | x) = f(θ) Πn pn(xn | θ) / P(x)
(ii) Aggregate: P(θ | z) = f(θ) (N!/Πm zm!) Πm p(m | θ)^zm / P(z)

2.2.5. Point and Interval Estimators

In both cases, a point estimate of the parameters θ may be given by the mean, the median, the mode, or any other location indicator of the posterior distribution, depending on the error function; credible intervals are easily defined from the posterior joint pdf as well. Taking the mode (maximum a posteriori), the point estimates θB are:
(i) Disaggregate: θB = argmaxθ f(θ) Πn pn(xn | θ)
(ii) Aggregate: θB = argmaxθ f(θ) Πm p(m | θ)^zm

(1) Normalising Factor. It can be easily noted that the normalising factor does not affect the point of maximum, and thus it can be neglected.

(2) Prior pdf. Several approaches are available to define the prior distribution, such as conjugate priors and improper priors. As the size of the sample goes to infinity, the results of the Bernstein–von Mises theorem can be used:
(i) the posterior tends to the likelihood, and the role of the prior becomes less and less relevant;
(ii) the likelihood tends to be normally distributed.

Thus, for large samples, (1) the mode θBM of the posterior pdf goes to the point of maximum of the likelihood and (2) the mean and the median go to the mode; therefore:

θBM → θML

Moreover, this theorem links Bayesian inference with Classic or Frequentist inference.

(3) Remark. In most cases, it is convenient to maximise a log-transformation of the objective function, which (as any increasing transformation) does not affect the point of maximum. Assuming (naïve) pair-wise conditional independence between the N observations and independently distributed parameters with pdfs f(•), and applying the likelihood specifications above to the posterior, the estimators are given by:
(i) Disaggregate: θB = argmaxθ [Σj ln f(θj) + Σn ln pn(xn | θ)]
(ii) Aggregate: θB = argmaxθ [Σj ln f(θj) + Σm zm ln p(m | θ)]
where the term ln((Σm zm)!/Πm zm!) has been omitted since it does not affect the point of maximum.
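A sketch of the disaggregate estimator above: the posterior mode maximises log prior plus log likelihood, the normalising factor being dropped. It reuses the earlier sketches; the optimiser choice is an assumption.

```python
import numpy as np
from scipy.optimize import minimize

def map_estimate(samples, mu, sigma, theta0):
    """Posterior mode (disaggregate):
    argmax of sum_j ln f(theta_j) + sum_n ln p_n(x_n | theta)."""
    neg_log_post = lambda th: -(log_prior(th, mu, sigma)
                                + disaggregate_log_likelihood(th, samples))
    return minimize(neg_log_post, x0=np.asarray(theta0, dtype=float),
                    method="BFGS").x
```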

2.2.6. Application in a Dynamic Context

The Bayesian estimators can easily be fitted in a dynamic context by applying the well-known Bayes' theorem. Indeed, once the posterior distribution has been defined, it can be used as the prior distribution when other data, possibly of different types, become available.
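A sketch of this sequential use of Bayes' theorem on a discretised parameter space: the normalised posterior over a grid after one batch of observations is returned in a form directly usable as the prior for the next batch. The grid and batch structures are illustrative assumptions; disaggregate_log_likelihood comes from the earlier sketch.

```python
import numpy as np

def update_on_grid(log_prior_grid, batch, grid_points):
    """One Bayesian updating step on a parameter grid.

    log_prior_grid : (G,) log prior mass at each grid point;
    grid_points    : (G, J) parameter values;
    batch          : list of (X_n, chosen_n) new observations.
    Returns the (G,) normalised log posterior, reusable as the next prior.
    """
    log_post = log_prior_grid + np.array(
        [disaggregate_log_likelihood(th, batch) for th in grid_points])
    log_post -= log_post.max()          # stabilise before exponentiating
    post = np.exp(log_post)
    return np.log(post / post.sum())
```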

3. Results and Discussion

In this section, the results of an application are discussed. The whole (small) sample contains 216 randomly sampled independent interviews (one interview per user). Data were collected through an on-road revealed preference (RP) survey carried out in Italy at the national scale, referring to short- and long-distance extra-urban car trips. Different OD pairs between main cities, at least partially connected through motorways, were considered, such that the observed distances are in the range 20–1100 km and the travel times in the range 0.2–10.5 h.

The travel survey's main purpose was to test some calibration methods for route choice models at the national scale; therefore, the whole OD matrix was not estimated, since it is not relevant for the analysis of user route choice behaviour (it would, of course, be relevant for any kind of assignment, which is not a topic of this paper).

The level-of-service attributes for the Italian national road transport system, and the comparison between the generated and the chosen alternatives, are computed on a road network composed of all motorways and main roads, with around 5000 nodes and 16000 links.

The main purpose of the experimentation is the comparison of different methodologies for model parameter estimation; to this aim, a small sample has been used for computational simplicity. For practical applications, larger samples must be used.

The alternative choice set is generated with a deterministic multicriteria approach, using criteria that guarantee a high degree of coverage of the interviews, with a significant share of chosen paths for each criterion.

The user choice strategy among alternatives is described by a Logit choice model, either as a psychological choice model or obtained as a RUM assuming the perceived utilities independently Gumbel distributed with common scale parameter θo. The calibration of the distribution parameters, including those of the systematic utility specification, is based on the Bayesian and Classic approaches.

3.1. Main Assumptions and Preliminary Data Analysis

The quantitative characteristics (attributes) considered for each path, which give the best statistical results in the calibration, are:
(i) yTk,n, travel time (in hours), with parameter θT;
(ii) yPk,n, percentage of distance (as a decimal) on motorway links, with parameter θP;
(iii) yLk,n, a label variable (0 or 1), with parameter θL, referring to the path generated with the minimum-time criterion (it is 1 if the minimum-time path coincides with the path generated by another adopted criterion).

The expected value of the perceived utility has the specification:

vk,n = θT yTk,n + θP yPk,n + θL yLk,n

It is assumed that each entry of the prior vector θ = [θT; θP; θL]′ is independently distributed with a normal probability density function fN(θ), with:
(i) expected value μθ = [μθT; μθP; μθL]′ = [−2 h⁻¹; 2; 2]′;
(ii) standard deviation σθ = [σθT; σθP; σθL]′ = [1 h⁻¹; 1; 1]′ (and all covariances σθTP, σθTL, σθPL equal to 0, by independence).

3.1.1. Alternative Choice Set

In Italy, it is extremely rare for an extra-urban OD pair between main cities to be connected by more than three paths. We did not find any two (out of more than 100) province capitals (population roughly from 25 000 to 2 500 000) for which a commonly used navigation service suggests more than three (actually different) paths, mostly due to Italian geography and the available road facilities.

Starting from the sample of 216 users, at most three paths are generated for each user according to three criteria: minimum travel time path; maximum motorway path; and minimum monetary cost path. These criteria have been selected as the most effective with respect to the revealed chosen path, from a pool of several other criteria.

The three criteria adopted ensure that, for 200 out of 216 users (92.6%), at least one generated path coincides with the path actually chosen by the user. Other criteria have been tested, including the second-best path generated for each criterion, but the coverage does not increase with statistical significance.

There are 14 users with only one generated path in the perceived choice set; they are not further considered in the parameter calibration. The choice sets with at least two available alternatives generated through the three criteria count 186 interviews out of 216 (86.1%).

3.1.2. Choice among Alternatives

The observed paths cover the different ranges of travel distance at the Italian national scale.

The main statistical characteristics of all the generated paths relative to the 186 users are:
(i) travel time between 0.2 h and 10.3 h;
(ii) distance between 23 km and 1069 km;
(iii) monetary cost between 2 € and 190 €, due to tolls and fuel;
(iv) percentage on motorway between 0% and 99%.

These statistics do not refer to the paths generated for a single user: the variability of travel times, distances, and motorway percentages refers to the variability among all the users in the sample.

In Figure 1, the travel time and the motorway percentage of the minimum travel time path are compared for all 186 users. Long-distance paths tend to have a high percentage on motorway.

3.2. Calibration or Parameter Estimation

To test the Bayesian (B) and the Classic (C) approaches to calibration, two different types of information are adopted: disaggregate (D) and aggregate (A). The Logit choice model is adopted. To test the effect on the calibrated parameters, two samples are considered: 124 users (2/3 of 186), randomly extracted, and 186 users (all). This simulates a dynamic context.

Considering the small number of interviews and the number of calibrated parameters, in the Bayesian approach the posterior distributions are obtained through a numerical approximation over a uniform grid (50 × 50 × 50), and the modes are obtained through numerical optimisation. As the numbers of calibrated parameters and interviews increase, a Markov chain Monte Carlo (MCMC) method, such as the Metropolis–Hastings algorithm, can be used for the Bayesian calibration.
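A sketch of the grid approximation just described: the log posterior is evaluated over a uniform 50 × 50 × 50 grid of (θT, θP, θL), and the grid point of maximum is taken as an approximation of the posterior mode. The grid bounds are illustrative assumptions; the earlier sketches are reused.

```python
import itertools
import numpy as np

def grid_posterior_mode(samples, mu, sigma, bounds, n=50):
    """Approximate posterior mode over a uniform n x n x n parameter grid."""
    axes = [np.linspace(lo, hi, n) for lo, hi in bounds]
    best_theta, best_lp = None, -np.inf
    for theta in itertools.product(*axes):
        th = np.array(theta)
        lp = log_prior(th, mu, sigma) + disaggregate_log_likelihood(th, samples)
        if lp > best_lp:
            best_theta, best_lp = th, lp
    return best_theta

# e.g., with the prior of Section 3.1 and hypothetical grid bounds:
# mode = grid_posterior_mode(samples, mu=[-2, 2, 2], sigma=[1, 1, 1],
#                            bounds=[(-5, 0), (0, 5), (0, 5)])
```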

3.2.1. Bayesian Approach

The main results of the calibration considering the Bayesian approach (B), the disaggregate data (D), the Logit specification, and the two user samples defined above are reported in Figure 2, which highlights the effects of adding 62 interviews to the initial 124. The prior of θ is a normal distribution (in the figure, the prior may look non-centred due to the range of values on the horizontal axis). In all cases, the posterior pdfs are less dispersed than the prior pdfs. Furthermore, increasing the number of observations in the sample reduces the variance, as expected.

It is also worth noting that the probability of a parameter θ having the wrong sign decreases from the prior to the posterior, as well as from the smaller to the larger sample, due to the reduction of variance.

3.2.2. Bayesian vs. Classic: Parameter Point Estimation

The main results of the calibration considering disaggregate and aggregate data are reported, respectively, in Tables 1 and 2. The estimates are given by the mode of the posterior distribution in the Bayesian approach or by the point of maximum likelihood in the Classic approach. For simplicity’s sake, detailed statistics are not reported for aggregate data.

At first glance, all calibrations look fine, since each estimated parameter has the expected sign; indeed, the parameter of:
(i) travel time is negative;
(ii) the percentage of travel on motorway links is positive (indicating the attraction of a high-quality path);
(iii) the label is positive (indicating the generation of the same path by several criteria).

Considering the disaggregate approach, the t-statistics in the Bayesian approach are always higher than those in the Classic approach. Furthermore, all parameters are significant at the 95% confidence level in the Bayesian approach, while in the Classic approach two out of three parameters are not significant, with a very low t-statistic for the parameter of the label variable.

As expected, the logarithm of the optimal likelihood with aggregate data is worse than that with disaggregate data; moreover, the optimal log-likelihood of the Bayesian approach is (slightly) worse than that of the Classic approach, also as expected.

For a comparison of maximum likelihood values, the traditional pseudo-ρ2 is also computed, as one minus the ratio between the logarithm of the optimal likelihood and the logarithm of the likelihood with all parameters equal to zero. This indicator allows comparing the results of the different methodologies on the scale [0; 1]. The two estimation methods give similar values of the pseudo-ρ2 statistic.
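A sketch of the pseudo-ρ2 computation as defined above, reusing the disaggregate log-likelihood sketch; with all parameters at zero, the model gives equal probability to every alternative in each choice set.

```python
import numpy as np

def pseudo_rho2(samples, theta_hat):
    """1 - l(theta_hat) / l(0): values closer to 1 indicate a better fit."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    ll_hat = disaggregate_log_likelihood(theta_hat, samples)
    ll_zero = disaggregate_log_likelihood(np.zeros_like(theta_hat), samples)
    return 1.0 - ll_hat / ll_zero
```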

In all cases, the Bayesian approach provides more stable values of the parameters. Moreover, with disaggregate data, the Bayesian approach provides a much lower value of the label parameter; this result is highly relevant, as this variable captures information that is not described by the other variables (see the analysis reported below).

Of course, a further advantage of the Bayesian approach over the Classic approach is the definition of the posterior probability distribution of the parameters, which supports the more theoretically sound interval estimation described in the next subsection.

3.2.3. Bayesian Parameter Interval Estimation

Table 3 compares the posterior statistics (mode, mean, variance, and coefficient of variation, CV) of the parameters calibrated by the disaggregate Bayesian approach, for the two samples. Given the posterior probability distribution, it is possible to evaluate the significance level for each interval, and the variation reported in Table 3 is a measure of the interval estimation for a given significance level. The variance of the label variable is greater than the other variances; for the sake of completeness, the t-statistic significance values are also reported.

3.2.4. Clearly Right and Clearly Wrong Statistics

Figure 3 shows the clearly right and clearly wrong statistics proposed in de Luca and Cantarella [34], for the full sample and the Classic and Bayesian disaggregate calibrations. The clearly right (wrong) index is the percentage of users in the sample (shown on the y axis) with a modelled probability for the chosen (not chosen) alternative greater than a predetermined threshold (shown on the x axis). The values are plotted for thresholds greater than 50%, since for lower values the statistic is not meaningful. As expected, the values are similar, since the two approaches lead to similar models; it is important to note, however, that in the left area of the plot the clearly wrong index of the Bayesian approach is better than that of the Classic approach, and very often its clearly right index is better as well. This indicates that, in this area, the Bayesian approach gives better results than the Classic approach with the adopted data.
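A sketch of the two indices as defined above [34], computed for one threshold and reusing the earlier sketches; with thresholds above 50%, at most one alternative per user can exceed the threshold.

```python
import numpy as np

def clearly_right_wrong(samples, theta, threshold=0.5):
    """Shares of users whose modelled probability for the chosen
    (clearly right) or for a not-chosen (clearly wrong) alternative
    exceeds the given threshold."""
    right = wrong = 0
    for X_n, chosen_n in samples:
        p = logit_probabilities(theta, X_n)   # from the earlier sketch
        if p[chosen_n] > threshold:
            right += 1
        if any(p[k] > threshold for k in range(len(p)) if k != chosen_n):
            wrong += 1
    n = len(samples)
    return right / n, wrong / n
```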

3.2.5. Analysis of Some Users

Starting from the 186 users, 5 representative users are randomly selected with the characteristics reported in Table 4. They are sorted according to the percentage use of the motorway. The characteristics of these users are adopted for post-calibration analysis. These users are represented with a red dot in Figure 4.

In Table 4, the effect of the label variable in the Classic approach compared to the Bayesian approach is evident. To compare the percentage contribution of each attribute component j to each alternative k, the percentage weight is evaluated through the indicator:

wj,k,n = |θj · yj,k,n| / Σj′ |θj′ · yj′,k,n|

with θj the value of the j-th parameter and yj,k,n the value of the j-th attribute for alternative k and user n.
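A sketch of this contribution indicator for one alternative; the use of absolute values (so that contributions are positive and sum to 100%) is an assumption consistent with the percentages reported below.

```python
import numpy as np

def contribution_weights(theta, y_kn):
    """Percentage weight of each attribute j in the utility of one alternative:
    |theta_j * y_{j,k,n}| / sum_j' |theta_j' * y_{j',k,n}|."""
    terms = np.abs(np.asarray(theta) * np.asarray(y_kn))
    return terms / terms.sum()   # fractions summing to 1
```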

Considering the four users numbered 103, 102, 88, and 59, the effect of the label variable is zero, since the label attribute equals zero. For the user numbered 87, the effect of the label variable is 78% in the Classic approach and 37% in the Bayesian approach. Evaluating the average contribution over the whole sample (all 186 users), the Classic vs. Bayesian contribution is 71% against 30%.

Considering all 186 users, the average contribution of each attribute component j to each alternative k in the expected value of the utility is as follows:
(i) in the Classic approach, the contributions of the components are not balanced (13%, 17%, and 71% with three attributes; 41% and 59% with two attributes);
(ii) in the Bayesian approach, the contributions of the components are better balanced (37%, 33%, and 30% with three attributes; 50% and 50% with two attributes).

This shows that the unexplained information absorbed by the label variable is greater in the Classic approach than in the Bayesian approach.

This indicator does not aim to measure the accuracy of a method; it aims to check whether any single parameter makes a high contribution to the utility computation. A high contribution indicates that it alone could significantly influence the probability computation. In practice, models are preferable where the weights of the different contributions are balanced and the variation of the corresponding attributes significantly affects user choice.

Table 5 shows the probabilities computed for the 5 representative users with the optimised parameters reported in Tables 1 and 2. It can be observed that, for the chosen alternative, the Bayesian probabilities are higher than the others. In the specific case of user number 87, the disaggregate Classic approach provides a 100% probability for alternative 1, owing to the high value of the label parameter.

4. Conclusions

In this paper, path choice model calibration is tackled by comparing the Bayesian (obtaining the posterior pdf of the parameters) and Classic (obtaining the maximum likelihood parameters) estimation methods and two different types of data: disaggregate (observing the choices actually made by a sample of users) and aggregate (observing the frequencies of the chosen alternatives).

The Bayesian approach makes it possible to obtain the posterior distribution of the parameters, and therefore also to easily obtain interval estimates (credible intervals). The posterior distribution can be adopted for forecasting scenarios, or as a new prior distribution when new data become available. Indeed, the approach is suitable for successive re-estimations as further information is collected, in a dynamic context, for decision making in changing environments.

As expected from Bayesian statistical theory, as the number of observations increases, the parameter estimates asymptotically tend to the maximum likelihood estimates. On the other hand, apart from very large samples, Bayesian estimators provide estimates more consistent with the sample size and allow preliminary knowledge to be taken into account through the prior distribution.

With reference to the adopted sample, the Bayesian and Classic estimation methods give quite similar values of the pseudo-ρ2 indicator. However, the comparison through several other indicators shows that the Bayesian estimation approach provides a better estimate than the Classic approach, for several reasons: the calibrated parameters are more stable; the absolute values of the label parameters, which play a role similar to alternative-specific constants, are lower; and the resulting models show better values of the clearly right indicator.

The proposed method is rather general and can be applied and transferred to other contexts. The path choice model has general validity and, with appropriate specifications, can be tested in other contexts, also to compare different estimation methodologies.

The results obtained refer to the sample used; in the future, the proposed method should be tested on other samples, possibly referring to other contexts. In addition, the analyses should be extended by considering choice models different from the Logit, such as the Weibit model, as well as comparisons and integrations of econometric and psychological choice models. Application in a dynamic context is another issue worthy of further research effort.

Data Availability

Data are not available due to privacy restrictions.

Disclosure

This study was performed as part of the employment of the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Giulio Erberto Cantarella was responsible for supervision, conceptualization, methodology, formal analysis, and reviewing and editing. Antonino Vitetta was responsible for methodology, formal analysis, investigation, software, data curation, validation, visualization, and reviewing and editing.