Abstract
This paper proposes a linear mixed model of route speed distributions that separates the variability into an intertraveller component, consistent across days and time intervals for each recurrent traveller, and an intratraveller component representing uncertainty. The intratraveller variability corresponds to travel time uncertainty, while the total variability is typically captured by empirical measurements and used in travel time reliability assessments. The intratraveller and the total variability differ if there are systematic differences in speed between different recurrent travellers. The paper also investigates to what degree vehicles traversing a route during the morning or evening peak over multiple days are recurrent travellers. Using data from Bluetooth and Wifi sensors on 26 routes in Stockholm, Sweden, over a three-month period, we find that the traveller recurrence is higher towards the city in the morning peak and out from the city in the afternoon. Model estimation results show that the relative intratraveller variability is also significantly higher in the commute direction (towards the city in the morning and out from the city in the afternoon) and on routes with high congestion levels. The relations revealed in this paper may be used to estimate the relevant intratraveller variance based on the total variance and readily available route attributes. Without this correction, the costs associated with travel time variability may be overestimated.
1. Introduction
It is well known that the travel time on a given route is not constant but typically varies between trips. Some part of the variability is usually predictable by a recurrent traveller based on systematic demand variations across hours, weekdays, months, and years. Part of the variability, however, arises from factors that are difficult for travellers to foresee, such as demand fluctuations, weather conditions, and incidents. This part of the travel time variability gives rise to travel time uncertainty. The extent to which the travel time can be predicted in advance is referred to as travel time reliability. Low reliability is associated with considerable costs due to late arrivals and the need for safety margins in departure times [1, 2].
It is by now widely accepted that travel time variability, and not only the expected travel time, is associated with economic costs. A large literature has appeared in recent years, covering the theoretical aspects of reliability, traveller behaviour under travel time uncertainty, valuation studies through stated and revealed preference experiments, as well as empirical investigations of travel time variability [1–3].
A common metric of travel time variability borrowed from general statistics is the standard deviation. However, this metric can be difficult to interpret for travellers and practitioners, and various other metrics have been proposed [3]. The works of [4, 5] derived a foundation for using the standard deviation based on microeconomic scheduling models, in which costs arise from early or late arrivals. Later extensions have found that other metrics such as the travel time variance can be motivated given other assumptions about travellers’ scheduling preferences [6].
Many studies have assessed the behaviour of travellers facing travel time uncertainty, in particular their trade-offs between uncertainty, average travel time, and travel cost. The value of travel time reliability has been studied in the context of mode choice [7], route choice [8, 9], departure time choice [10], and toll road usage and road pricing [11, 12]. Across studies, the ratio between the value of reliability and the value of travel time has been found to lie in the range between 0.2 and 1.5 with a typical value just under 1 [2].
Economic appraisals of transport policies and infrastructure projects are generally based on travel demand forecasting and traffic assignment models that treat link travel times as deterministic or, interpreted differently, only consider expected travel times [2]. In order to incorporate travel time reliability in appraisals, a number of studies have therefore sought to establish analytical relations between the mean and the standard deviation (or some other metric of variability) of travel time at the link level [13–15]. Other studies have modelled the correlations between link travel time distributions along a route or in a network [16–18].
On the empirical side, a long line of research has sought to model and characterise the variability of travel times on different types of roads, links, and routes. Many studies have found travel time distributions to be asymmetric with long upper tails [19, 20]. Proposed closed-form distribution functions include log-normal [21], stable [19], gamma [20], and Burr Type XII [22]. Other approaches model the asymmetric distributions as mixtures of multiple underlying distributions representing different traffic states [23–25]. A nonparametric approach to estimating route travel time distributions based on floating car data is proposed in [26].
A taxonomy of sources of travel time variability is provided in [27], who separate them into (1) traffic influencing events, including traffic incidents and accidents, road construction work, weather, and environmental conditions, (2) traffic demand, including day-to-day fluctuations and special events, and (3) physical road features, including traffic control infrastructure and road capacity. Several authors note that travel time variability can be separated into day-to-day variability, within-day (interval-to-interval) variability, and vehicle-to-vehicle variability [28, 29]. A mixture model of compound Gamma-Gamma distributions to jointly represent day-to-day and vehicle-to-vehicle variability under different traffic states is proposed in [24]. The model is further developed and applied to empirical data in [20].
For some types of trips such as the commute to and from work, travellers tend to use the same route around the same time of day repeatedly over multiple days. An important distinction can be made between the individual traveller’s uncertainty across days and the total variability across all trips and travellers. While the individual uncertainty determines the associated reliability cost, empirical travel time measurements typically provide the total travel time variability without distinguishing between recurrent vehicles. The intratraveller and the total variability differ if there are systematic differences in speed between different recurrent travellers. These differences, referred to here as intertraveller variability, could be due to heterogeneous preferred driving speeds, or because frequent travellers may be able to strategically plan their departure times to reduce travel time variability. The total travel time variability across trips, travellers, and days can thus be conceptually decomposed as
$$\text{total variability} = \text{intertraveller variability} + \text{intratraveller variability}. \tag{1}$$
If the relative magnitude of the intertraveller variability component is significant, this must be taken into account in economic valuations [2] and modelling [30] of travel time reliability.
While some existing studies distinguish between day-to-day and vehicle-to-vehicle variability, no study that we are aware of has considered that some vehicles, i.e., travellers, repeatedly traverse the same route over multiple days. A reason for this is presumably that data on travellers’ mobility over multiple days have been lacking historically. Analysis of public transport smart card data reveals that travel patterns exhibit strong regularity between days [31], but evidence from private cars is so far limited due to difficulties in data collection. Several studies have used travel diaries to study the regularity of individuals’ activity-travel patterns across days and have found the highest degree of repetition for essential activities such as commuting [32]. Further, the level of repetition of daily activity-travel patterns is more correlated with commitments and obligations than with travel mode choice [33]. Meanwhile, studies have shown that familiarity with a route is coupled with higher driving speed [34], which suggests that there is a consistent vehicle-specific component of travel time variability.
If there are consistent travel time variations among different travellers, the vehicle-to-vehicle and day-to-day dimensions are intertwined. To correctly capture the part of the total travel time variability that is experienced by an individual traveller, a more elaborate model is needed, which decomposes the total variability into day-to-day variability, interval-to-interval variability, traveller-to-traveller variability, and residual (within-day-period-traveller) variability.
The aim of this paper is to highlight the distinction between inter- versus intratraveller travel time or speed variability, to assess the prevalence of recurrent travellers on urban motorway and arterial routes, to extract the inter- and intratraveller variability from the total variability, and to assess their relative magnitudes. The analysis utilizes disaggregate travel time observations from Bluetooth and Wifi devices installed on multiple routes.
The paper is organized as follows. Section 2 describes the methodology for analysing traveller recurrence and speed variability and describes a case study on multiple arterial and motorway routes in Stockholm, Sweden. Section 3 presents and discusses the results from the case study, and Section 4 concludes the paper.
2. Method and Case Study
This section proposes a method for analysing variations in traveller recurrence between routes based on disaggregate travel time observations with associated (possibly rehashed) device IDs from Bluetooth and Wifi data. Further, a linear mixed model of route space-mean speeds is proposed that separates total variance into intertraveller and intratraveller variance components.
2.1. Disaggregate Travel Time Data
The methodology is based on route travel time measurements from individual trips. For each measurement, three items of information are assumed to be available: (1) the measured travel time, (2) the date and time of the measurement, and (3) a consistent identifier (ID) for the vehicle. This type of data may be collected through various technologies, e.g., Bluetooth and Wifi sensors or automatic number plate recognition (ANPR) cameras. The data set used in the case study has some limitations with respect to the third item, consistent IDs, and the implications for the analysis are addressed in the following.
We use travel time data from a set of routes in the Stockholm region, as shown in Figure 1. Each route is defined as a pair of detectors that capture the MAC addresses of Bluetooth and Wifi devices in vehicles (including mobile phones and other devices inside the vehicles). Travel times are measured by matching the MAC addresses and associated time stamps between the pair of detectors. The data cover the three-month period from 1 January 2019 to 27 March 2019, which includes 61 work days (Monday–Friday).

Each row of data contains information about the following:
(i) Route ID
(ii) Anonymized device ID
(iii) Trip time stamp (the time passing the upstream detector)
(iv) Trip travel time (the time difference between passing the two detectors)
(v) Level of outlier
The MAC addresses were hashed into anonymous ID numbers before they were made accessible for our analysis. The extent to which the addresses are rehashed over time is not known to us, which is a disadvantage of the data set. Further, it is possible that mobile devices internally resample their MAC addresses at certain time intervals. These factors imply that the same vehicle may be recorded under multiple IDs over the three-month period, which means that the true number of unique vehicles is lower than the number of IDs. In any case, it can be assumed that this bias is similar across all routes, and we can study variations in traveller recurrence and speed variability components across different route characteristics.
The data are separated into two time period categories: morning peak (Monday–Friday, 8-9 am) and afternoon peak (Monday–Friday, 4-5 pm). For each route and time period, we remove all observations marked as outliers. After this trimming, we discard any route with fewer than 100 observed trips, or with observed trips from fewer than 15 distinct days, in either the morning or the afternoon peak. This produces a final set of 26 routes, as listed in Table 1. The table also shows the length of each route and three attributes used in the subsequent analysis: the geographical region (north or south of the city center), road type (motorway or arterial), and direction (towards or away from the city center). The number of observed trips for each route in the morning and afternoon peaks, respectively, is shown in Table 2. The average number of days per route with available data is 57.1 and 58.0 for the morning and afternoon periods, respectively.
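The selection steps above can be sketched in Python as follows. This is a minimal illustration, not the processing code used in the study: the record keys (`route`, `timestamp`, `outlier`) and the convention that an outlier level of 0 means "not an outlier" are assumptions made for the example, while the thresholds of 100 trips and 15 days are taken from the text.

```python
from collections import defaultdict
from datetime import datetime

MIN_TRIPS = 100                    # minimum trips per route and peak (from the text)
MIN_DAYS = 15                      # minimum distinct days per route and peak (from the text)
PEAK_HOURS = {"am": 8, "pm": 16}   # 8-9 am and 4-5 pm


def peak_period(ts):
    """Return 'am' or 'pm' for a Monday-Friday peak-hour timestamp, else None."""
    if ts.weekday() >= 5:          # Saturday or Sunday
        return None
    for period, hour in PEAK_HOURS.items():
        if ts.hour == hour:
            return period
    return None


def select_routes(trips, min_trips=MIN_TRIPS, min_days=MIN_DAYS):
    """Return the routes with enough non-outlier data in BOTH peak periods.

    `trips` is an iterable of dicts with the hypothetical keys 'route',
    'timestamp' (a datetime) and 'outlier' (0 is assumed to mean "keep").
    """
    counts = defaultdict(int)      # (route, period) -> number of trips
    days = defaultdict(set)        # (route, period) -> distinct dates
    routes = set()
    for trip in trips:
        routes.add(trip["route"])
        period = peak_period(trip["timestamp"])
        if period is None or trip["outlier"] != 0:
            continue               # outside the peaks, or marked as outlier
        key = (trip["route"], period)
        counts[key] += 1
        days[key].add(trip["timestamp"].date())
    return {
        r for r in routes
        if all(counts[(r, p)] >= min_trips and len(days[(r, p)]) >= min_days
               for p in PEAK_HOURS)
    }
```

A route is kept only if both thresholds are met in both peaks, which mirrors the "in either the morning or the afternoon peak" exclusion rule.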
2.2. Traveller Recurrence
As an indicator of the recurrence of travellers on a route and time period, we use the mean number of trips per vehicle ID and day. A recurrence value of 1 thus means that the traveller uses the route in the same time period once per day on average. Due to rehashing of vehicle IDs, we expect that the actual recurrence is higher than what is observed from the data. In any case, we are interested in the variation in recurrence across different routes and time periods, in particular the morning and afternoon peaks. To assess the influence of route characteristics and peak period on the recurrence $r_{kp}$, a linear regression model is estimated,
$$r_{kp} = \beta_0 + \beta_1 \mathrm{South}_k + \beta_2 \mathrm{Motorway}_k + \beta_3 \mathrm{Out}_k + \beta_4 \mathrm{PM}_p + \beta_5 \mathrm{Out}_k \mathrm{PM}_p + \varepsilon_{kp}, \tag{2}$$
where $p \in \{\mathrm{AM}, \mathrm{PM}\}$ indicates the time period, $k = 1, \ldots, K$ indicates the route, and $K$ is the number of routes. $\mathrm{South}_k$, $\mathrm{Motorway}_k$, $\mathrm{Out}_k$, and $\mathrm{PM}_p$ are dummy variables for the south region, motorway road type, direction out from the city center, and afternoon peak, respectively. The error terms $\varepsilon_{kp}$ are i.i.d. normal. The parameters $\beta_0, \ldots, \beta_5$ are to be estimated. The model includes the region, road type, and direction of the route, the time period, as well as the interaction between direction and time period, as explanatory variables. This allows us to distinguish between the morning and afternoon commutes towards and out from the city center.
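The recurrence indicator can be illustrated with a short Python sketch. The input layout (a list of `(vehicle_id, date)` pairs for one route and time period) is hypothetical, and dividing by the number of days with available data, rather than by all work days in the study period, is an assumption of this example.

```python
def recurrence(trips, n_days):
    """Mean number of trips per vehicle ID and day for one route and period.

    `trips` is a list of (vehicle_id, date) tuples (hypothetical layout);
    `n_days` is the number of days with available data (an assumption).
    """
    if not trips or n_days <= 0:
        raise ValueError("need at least one trip and a positive number of days")
    n_ids = len({vehicle_id for vehicle_id, _ in trips})
    trips_per_id = len(trips) / n_ids   # mean trips per distinct ID
    return trips_per_id / n_days        # normalized per day
```

As a rough plausibility check, an average of 2.22 trips per ID over 57.1 days gives a recurrence of about 0.039, which is the order of magnitude reported in Section 3.1.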
2.3. Linear Mixed Model of Route Speed
In order to model the speed variability on each route and time period, we divide both morning and afternoon peak periods into time intervals of approximately 15 minutes. Let $t_{dij}$ denote the travel time on a certain route and time period on day $d$ during time interval $i$ for vehicle $j$ (we omit route and time period indices here for clarity). We model the space-mean speed $v_{dij} = L / t_{dij}$, where $L$ is the length of the route. The reason for this choice is twofold. First, it normalizes the measurements across different routes. Second, the space-mean speed distribution tends to be more symmetric and similar to a normal distribution for the routes considered in the case study (compare Figure 2).
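The two preprocessing steps described above, converting a travel time to a space-mean speed and assigning a trip to a time interval, can be sketched as follows. The units (metres, seconds, km/h) and the exact 15-minute interval boundaries are assumptions for illustration; the text only states that intervals are approximately 15 minutes long.

```python
from datetime import datetime


def space_mean_speed_kmh(route_length_m, travel_time_s):
    """Space-mean speed L / t, converted from m/s to km/h (units assumed)."""
    if travel_time_s <= 0:
        raise ValueError("travel time must be positive")
    return 3.6 * route_length_m / travel_time_s


def interval_index(ts, period_start_hour, interval_min=15):
    """Index (0, 1, 2, ...) of the interval within a one-hour peak period."""
    minutes = (ts.hour - period_start_hour) * 60 + ts.minute
    if not 0 <= minutes < 60:
        raise ValueError("timestamp is outside the one-hour peak period")
    return minutes // interval_min
```

For example, a trip that passes the upstream detector at 08:20 on a 1 km route and takes 60 s falls in interval 1 of the morning peak with a speed of 60 km/h.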

The space-mean speed $v_{dij}$ of vehicle $j$ in time interval $i$ on day $d$ is modelled in a linear mixed model (LMM) framework [35] as the sum of a deterministic component $\mu_{di}$ and a zero-mean random term $\epsilon_{dij}$,
$$v_{dij} = \mu_{di} + \epsilon_{dij}. \tag{3}$$
The deterministic term is used to control for parts of the variability that can be predicted based on systematic temporal features. It is modelled as a linear function of a set of predictors (fixed effects),
$$\mu_{di} = \beta_0 + \mathbf{x}_{I}^{\top} \boldsymbol{\beta}_{I} + \mathbf{x}_{W}^{\top} \boldsymbol{\beta}_{W} + \mathbf{x}_{M}^{\top} \boldsymbol{\beta}_{M}, \tag{4}$$
where $\beta_0$ is an intercept and $\mathbf{x}_{I}$, $\mathbf{x}_{W}$, and $\mathbf{x}_{M}$ are categorical variables for the time interval, weekday, and month of the observation, respectively, represented as sets of dummy variables with associated parameter vectors $\boldsymbol{\beta}_{I}$, $\boldsymbol{\beta}_{W}$, and $\boldsymbol{\beta}_{M}$.
The focus of our analysis is on the random component $\epsilon_{dij}$, which is split into an intertraveller part $\epsilon^{T}_{j}$ (consistent across days and time intervals) and an intratraveller part. The intertraveller part represents systematic differences in travel speed across travellers, and each traveller is associated with a specific realization of the term. The intratraveller part is assumed not to be known by the traveller before the trip and reflects the individual’s travel time uncertainty. Speeds are further assumed to vary randomly between days, between time intervals within each day, and even between all trips within each day and time interval. The intratraveller component is thus decomposed into a day-to-day component $\epsilon^{D}_{d}$, a within-day interval-to-interval component $\epsilon^{I}_{di}$, and a residual component $\epsilon^{R}_{dij}$. The final specification of the (random effects) model is
$$v_{dij} = \mu_{di} + \epsilon^{T}_{j} + \epsilon^{D}_{d} + \epsilon^{I}_{di} + \epsilon^{R}_{dij}. \tag{5}$$
The random terms are assumed to be mutually independent and normally distributed. This assumption implies that the individual variance is equal for every traveller. The intertraveller component has variance $\sigma_T^2$, the day-to-day component has variance $\sigma_D^2$, the interval-to-interval component has variance $\sigma_I^2$, and the residual component has variance $\sigma_R^2$.
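The independence assumption determines which variance components are shared between pairs of trips. The following derivation is a sketch in notation assumed here for illustration: $v_{dij}$ is the speed of vehicle $j$ in interval $i$ on day $d$, and $\sigma_T^2$, $\sigma_D^2$, $\sigma_I^2$, $\sigma_R^2$ are the intertraveller, day-to-day, interval-to-interval, and residual variances, with the interval effect nested within the day.

```latex
\begin{align*}
\operatorname{Var}(v_{dij}) &= \sigma_T^2 + \sigma_D^2 + \sigma_I^2 + \sigma_R^2, \\
\operatorname{Cov}(v_{dij}, v_{d'i'j}) &= \sigma_T^2
  && (d \neq d':\ \text{same traveller, different days}), \\
\operatorname{Cov}(v_{dij}, v_{dij'}) &= \sigma_D^2 + \sigma_I^2
  && (j \neq j':\ \text{different travellers, same day and interval}), \\
\operatorname{Cov}(v_{dij}, v_{di'j'}) &= \sigma_D^2
  && (i \neq i',\ j \neq j':\ \text{different travellers and intervals, same day}).
\end{align*}
```

Under these assumptions, the intertraveller variance is identified only through vehicle IDs that are observed on more than one occasion, which is why rehashing of IDs matters for the estimates.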
The speed model is estimated separately for each combination of route and time period (morning or afternoon peak), with the restricted maximum likelihood (REML) method [35]. The estimation is carried out using the fitlme routine in Matlab R2018b with the fminunc unconstrained optimization routine.
2.4. Intratraveller Speed Variability
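Across all days, periods, and travellers, the magnitude of the intratraveller variance is $\sigma_D^2 + \sigma_I^2 + \sigma_R^2$, while the total speed variance is $\sigma_T^2 + \sigma_D^2 + \sigma_I^2 + \sigma_R^2$, where $\sigma_T^2$, $\sigma_D^2$, $\sigma_I^2$, and $\sigma_R^2$ denote the intertraveller, day-to-day, interval-to-interval, and residual variance components, respectively. We define the relative intratraveller variance (RIV) as the ratio between the intratraveller and total variances,
$$\mathrm{RIV} = \frac{\sigma_D^2 + \sigma_I^2 + \sigma_R^2}{\sigma_T^2 + \sigma_D^2 + \sigma_I^2 + \sigma_R^2}. \tag{6}$$
This ratio, which lies between 0 and 1, captures the degree to which the total speed variability across all trips represents the travellers’ uncertainty.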
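The ratio can be computed directly from the four estimated variance components, and it can be applied as a scaling factor to a measured total variance. The sketch below uses hypothetical function and argument names and assumes that the components have already been estimated.

```python
def relative_intratraveller_variance(var_inter, var_day, var_interval, var_resid):
    """RIV: share of the total variance that is intratraveller variance."""
    components = (var_inter, var_day, var_interval, var_resid)
    if any(c < 0 for c in components):
        raise ValueError("variance components must be non-negative")
    intra = var_day + var_interval + var_resid
    total = var_inter + intra
    if total == 0:
        raise ValueError("total variance is zero")
    return intra / total


def intratraveller_variance(total_variance, riv):
    """Scale a measured total variance down to the intratraveller part."""
    if not 0 <= riv <= 1:
        raise ValueError("RIV must lie between 0 and 1")
    return riv * total_variance
```

With four equal components the RIV is 0.75, and with no intertraveller variance it is 1, in which case the total variance and the travellers' uncertainty coincide.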
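Rehashing of vehicle IDs implies that some of the intratraveller variability is incorrectly attributed to intertraveller variability. This means in turn that the relative intratraveller variance is underestimated. However, the influence of route and time period characteristics can be assessed qualitatively. For this purpose, a linear regression model is estimated for route $k$ and time period $p$ as follows:
$$\mathrm{RIV}_{kp} = \gamma_0 + \gamma_1 \mathrm{South}_k + \gamma_2 \mathrm{Motorway}_k + \gamma_3 \mathrm{Out}_k + \gamma_4 \mathrm{PM}_p + \gamma_5 \mathrm{Out}_k \mathrm{PM}_p + \gamma_6 c_{kp} + \eta_{kp}. \tag{7}$$
The parameters $\gamma_0, \ldots, \gamma_6$ are to be estimated. The error terms $\eta_{kp}$ are i.i.d. normal. $\mathrm{South}_k$, $\mathrm{Motorway}_k$, $\mathrm{Out}_k$, and $\mathrm{PM}_p$ are dummy variables for the south region, motorway road type, direction out from the city center, and afternoon peak. In addition to these previously introduced variables, the congestion index $c_{kp}$ is defined as the ratio between the route space-mean speed during off-peak hours (measured on Sundays 8-9 am) and during period $p$, as shown in Table 2. The value 1 corresponds to the same speed as during off-peak, while higher numbers indicate higher congestion. As can be seen, some routes even display higher speeds during peak hours than off-peak hours, which could be due to more aggressive driving behaviour.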
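The congestion variable can be illustrated with a small Python sketch. Averaging the individual trip speeds arithmetically within each period is an assumption of this example (the text does not state how the route-level speed is aggregated), and the function name is hypothetical.

```python
from statistics import fmean


def congestion_index(offpeak_speeds, peak_speeds):
    """Ratio of the mean off-peak speed to the mean peak-period speed.

    A value of 1 means equal speeds; values above 1 mean slower peak traffic.
    """
    if not offpeak_speeds or not peak_speeds:
        raise ValueError("both speed samples must be non-empty")
    peak_mean = fmean(peak_speeds)
    if peak_mean <= 0:
        raise ValueError("mean peak speed must be positive")
    return fmean(offpeak_speeds) / peak_mean
```

A route with a mean speed of 80 during the off-peak reference hour and 40 during the peak has an index of 2, while an index below 1 corresponds to the routes that are faster in the peak than off-peak.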
3. Results and Discussion
This section presents results from the analysis of traveller recurrence and the relative intratraveller speed variance in Stockholm.
3.1. Day-to-Day Traveller Route Recurrence
Figure 3 shows the distribution of the number of trips per vehicle ID and per route across all 26 routes during the analysis period. Blue and red bars indicate the morning and afternoon peak hours, respectively. Around one-third of all observed trips are generated by vehicles whose ID appears only once on the same route during the analysis period. Thus, around two-thirds of the trips are generated by vehicles that are observed at least twice on the same route. The average number of trips per ID is 2.22 in the morning peak and 2.05 in the afternoon peak, which corresponds to an average recurrence of 0.0391 and 0.0355, respectively. The lower recurrence in the afternoon could reflect more varied travel habits compared to the morning.

Table 2 shows the mean recurrence for each route and the morning and afternoon peaks separately. The recurrence ranges from 0.0245 for route 18 in the afternoon to 0.0472 for route 23 in the morning. The linear regression model in (2) is estimated based on the information in Tables 1 and 2. Estimation results are shown in Table 3.
The intercept represents the recurrence in the baseline case: an arterial route in the north region aligned towards the city center during the morning peak. There is no statistically significant difference between the north and south regions, nor between arterials and motorways. However, there is a significantly lower recurrence in the afternoon than in the morning towards the city (p value $7.9 \times 10^{-5}$). Further, an F-test shows that the recurrence is significantly higher out from the city than towards the city in the afternoon (F-statistic 11.5, p value 0.0015). Finally, the recurrence out from the city in the afternoon peak is not significantly different from that towards the city in the morning peak (F-statistic 0.600, p value 0.443). The $R^2$ of the model is 0.33, which indicates that a large portion of the variation in recurrence is not explained by the included factors. The rehashing of device IDs could be a partial explanation.
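The three comparisons above correspond to linear restrictions on the regression coefficients. As a sketch, with labels assumed here for illustration, let $\beta_0$ be the intercept and $\beta_3$, $\beta_4$, and $\beta_5$ the coefficients on the out-from-city dummy, the afternoon dummy, and their interaction. Holding region and road type at the baseline, the expected recurrence in the four direction and period combinations, and the null hypotheses being tested, are:

```latex
\begin{align*}
\mathrm{E}[r \mid \text{towards, morning}]   &= \beta_0, \\
\mathrm{E}[r \mid \text{out, morning}]       &= \beta_0 + \beta_3, \\
\mathrm{E}[r \mid \text{towards, afternoon}] &= \beta_0 + \beta_4, \\
\mathrm{E}[r \mid \text{out, afternoon}]     &= \beta_0 + \beta_3 + \beta_4 + \beta_5, \\[4pt]
\text{afternoon vs. morning, towards the city:} \quad & H_0:\ \beta_4 = 0, \\
\text{out vs. towards, afternoon:} \quad & H_0:\ \beta_3 + \beta_5 = 0, \\
\text{out in the afternoon vs. towards in the morning:} \quad & H_0:\ \beta_3 + \beta_4 + \beta_5 = 0.
\end{align*}
```

The first hypothesis involves a single coefficient, whereas the other two involve sums of coefficients, which is why they are evaluated with F-tests.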
All in all, the results show that a relatively small part of the traffic flow on urban routes consists of recurring travellers, although the rehashing of vehicle IDs biases the estimated number of trips per traveller downwards. The recurrence is higher towards the city in the morning peak and out from the city in the afternoon peak compared to other situations. This is in line with expectations that commute trips tend to be regular and follow these directions.
3.2. Inter- and Intratraveller Speed Variability
Figure 2 shows the observed space-mean speed distributions for all 26 routes and the morning and afternoon peaks separately, encapsulating both the intertraveller and the intratraveller variability. For each histogram, the number of bins is selected based on Sturges’ rule. Many routes show distinctly different speed distributions in the morning and the afternoon peaks, which indicates a clear difference in directionality of the traffic between the two periods. With a few exceptions such as routes 13 and 14, most such routes are on motorways. Some routes display bimodal distributions, which suggest that traffic conditions vary between congested and uncongested.
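For reference, Sturges' rule sets the number of histogram bins to $\lceil \log_2 n \rceil + 1$ for $n$ observations. A minimal Python sketch (the function name is hypothetical) is:

```python
import math


def sturges_bins(n_obs):
    """Number of histogram bins under Sturges' rule: ceil(log2(n)) + 1."""
    if n_obs < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n_obs)) + 1
```

A route at the inclusion threshold of 100 observed trips would thus be drawn with 8 bins, and the bin count grows only logarithmically with the number of trips.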
The speed model in Section 2.3 is estimated separately for each route and time period combination. Table 4 shows the coefficient, the speed variance components, and the relative intratraveller variance (RIV) according to (6). Some patterns can be observed from the results. For example, routes 14–18 have lower intertraveller variance than intratraveller variance components (the day-to-day, interval-to-interval, and residual variances $\sigma_D^2$, $\sigma_I^2$, and $\sigma_R^2$) in the morning period but higher intertraveller variance in the afternoon period. For routes 19–23, the opposite pattern holds. Table 1 shows that the former group is aligned towards the city center, while the latter goes out from the city center. Thus, there appear to be systematic variations in the nature of speed variability depending on the direction of the route and the time period.
The RIV values range from 0.35 for route 21 in the morning to 0.98 for route 17 in the morning, with average values 0.71 in the morning peak and 0.69 in the afternoon peak. The differences in variance decomposition among the routes observed above are also manifested in the RIV values, which are higher when the intratraveller variance components are larger and vice versa.
To assess the influence of route characteristics and time periods, the linear regression model in (7) is estimated using the independent variables in Tables 1 and 2. Two model versions are estimated, without (Model 1) and with (Model 2) the congestion variable. Estimation results are shown in Table 5. In Model 1, there is no statistically significant difference between arterials and motorways. The relative intratraveller variance in the morning peak is significantly higher towards the city center than out from it (p value $9.6 \times 10^{-10}$). Further, the RIV is significantly higher in the morning peak than in the afternoon peak towards the city (p value $2.5 \times 10^{-7}$), but significantly higher in the afternoon than in the morning out from the city (F-statistic 28.26, p value $3.03 \times 10^{-6}$).
In Model 2, the relations found for Model 1 above are also present. Further, the level of congestion on the route has a significant positive impact on the relative intratraveller variance (p value $1.6 \times 10^{-5}$). This suggests that each driver has less influence on the chosen speed in congested traffic conditions. An F-test for the combined effect of route direction and peak period reveals that the relative intratraveller variance out from the city in the afternoon peak and towards the city in the morning peak are not significantly different (F-statistic 2.34, p value 0.133). Unlike in Model 1, the RIV is significantly higher in the south region than in the north, but this result may not be robust.
An alternative model formulation using the traveller recurrence as an explanatory variable finds no significant effect of this variable. This indicates that there is no clear link between the average familiarity of the travellers and the variance composition of the route, at least at the aggregate level.
The analysis reveals that the relative intratraveller variance varies systematically with route and time period characteristics. Specifically, it tends to be higher in circumstances associated with heavy commuting traffic and congestion. This likely reflects that each driver has less influence on the chosen speed in such traffic conditions. Meanwhile, the type of road (motorway or arterial) has no impact, which suggests that the traffic characteristics of the routes are more important for the speed variance composition than the infrastructural characteristics.
4. Conclusions
This paper has investigated to what extent the vehicles traversing a route are recurring travellers depending on attributes such as road type, direction relative to the city center, and time of day. Using data from Bluetooth and Wifi sensors over a three-month period, we have found that the average number of trips per vehicle ID is higher towards the city in the morning peak and out from the city in the afternoon, which is consistent with the knowledge that commute trips tend to have the highest regularity across days.
Motivated by the finding that a substantial share of trips are made by recurrent travellers, the paper has proposed a model of route speed distributions that separates the variability into an intertraveller component, consistent across days and time intervals, and an intratraveller component. The intratraveller component is further split into day-to-day, interval-to-interval, and residual variability. Model estimation results show that the relative intratraveller variance is significantly higher in the commute direction (towards the city in the morning and out from the city in the afternoon) and on routes with high congestion levels. This is consistent with the intuition that more congestion leads to lower flexibility in the speed choice.
Due to some rehashing of vehicle IDs in the case study data, the precise magnitudes of intertraveller and intratraveller variances are difficult to estimate. However, the results indicate that a distinction must be made between the intratraveller variance, which corresponds to travel time uncertainty, and the total variance that is typically used in travel time reliability assessments. The relative magnitudes of the two terms vary systematically with route characteristics (direction and congestion) and time periods. The relations revealed in this paper may be used to estimate the relevant intratraveller variance based on the total variance and readily available route attributes. Without this correction, the costs associated with travel time variability may be overestimated.
Further research is needed to assess the generality of the findings in varying settings. The robustness of the results should also be verified by applying the analysis to data that do not suffer from the limitation of rehashed vehicle IDs. Other topics for future work include exploring alternative structures for the speed variance model, potentially extending the linear mixed model formulation proposed here, and extending the time frame of the analysis to incorporate seasonal variations. Finally, an interesting research direction is to investigate the causes of the intertraveller speed variability and the relation between frequency of recurrence and speed.
Data Availability
The Bluetooth/Wifi travel time data used to support the findings of this study are the property of Trafik Stockholm and so cannot be made freely available. Requests for access to these data should be made to the author and are subject to approval by Trafik Stockholm.
Conflicts of Interest
The author declares no conflicts of interest.
Acknowledgments
The author would like to thank Trafik Stockholm for kindly providing the data used in the study and Maria Börjesson, Leonid Engelson, and Karin Brundell-Freij for valuable comments on the work. This research was funded by the Swedish National Transport Administration under grant no. TRV 2018/16380.