Abstract
Mobile phone location data enable us to obtain accurate and temporally detailed long-distance travel distributions. However, traditional long-distance travel distribution models cannot normally handle this detailed temporal information. This study proposes an approach for handling temporally detailed information on long-distance travel distribution. In this approach, the origin-destination (OD) matrix is decomposed into two variables (indicators): destination amenity and travel cost. They can be interpreted as composite indicators of the several variables that are treated in a travel-destination choice multinomial logit model. Because they are calculated only from the OD matrix, we can discuss their detailed temporal variations. In this study, time changes in the destination amenities and travel costs of interprefectural travel in Japan are calculated to confirm the value of this approach. These indicators succeed in describing the pattern of domestic long-distance travel in Japan. The quantified indicators facilitate understanding of the national land structure and are useful as outcome measures for policy-making. Moreover, these indicators reveal the temporal applicability of the destination choice model. Specifically, the destination amenities show a large seasonal variation. This indicates that the parameters of the destination amenity model (i.e., the coefficients of the destination variables) are not seasonally stable, which must be considered when dealing with destination choice for long-distance travel.
1. Introduction
Travel behavior outside metropolitan areas is more difficult to characterize than that within a metropolitan area, which is often routine and stable. In this study, we call such travel behavior outside the metropolitan area “long-distance travel” and address the difficulty caused by two of its features: the low frequency of trips per person and the large variation in travel frequency among individuals. These characteristics make random-sampling questionnaire surveys inefficient because most respondents do not undertake any long-distance travel during the year, whereas some people travel long distances so frequently that they cannot easily record a full year of trips. Thus, few surveys can accurately capture country-wide long-distance travel.
To overcome these data limitations, several scholars have developed techniques to infer the whole distribution from small samples. For example, the gravity model [1], which has long been used in transportation modeling, describes a simple regularity: the relationship between trip distribution and zone size/travel level of service (LOS). This regularity was pointed out more than 70 years ago [2], and the model is still commonly used in recent studies (e.g., Lenormand et al. [3], Zhang et al. [4], Cordera et al. [5], Chai et al. [6], and Grosche et al. [7]). Erlander and Stewart [1] showed that the parameters of the gravity model can be reasonably estimated even with samples as small as 1,000. Although their calculations concern intra-urban traffic, the results show the ability of the gravity model to complement and predict the entire origin-destination (OD) matrix from a small data sample. The disaggregate discrete choice approach to travel destination choice (e.g., Kato et al. [8], Yao and Morikawa [9], and Fu et al. [10]) serves a similar purpose. These models allow us to complement the observed travel distribution and predict its future values. The data required to estimate their coefficients are disaggregate questionnaire responses, and the number of samples required is much smaller than that needed to estimate the traffic volume of every OD pair directly.
Passively collected location data (e.g., mobile phone data) can solve the sample size problem because they provide information on a sample large enough to create an OD matrix for long-distance travel. Moreover, several studies have already used mobile phone location data to study long-distance travel [11–14]. Therefore, the need to complement the travel distribution through simple regularities is smaller than it has been in the past.
However, attempts to apply mobile phone location data to traditional models face the following problems. First, some information that is readily available from questionnaires is lacking. For example, information about the “destination,” the trip purpose, and the transportation mode is not directly available. This will become less of a problem in the future, however, because several approaches have already been proposed to infer this information, such as trip purpose and destination identification [13, 15, 16] and travel mode identification [17]. Second, the temporal detail of the model is constrained by the explanatory variables. We can now use mobility information with fine temporal and spatial detail, but socioeconomic and travel LOS indicators often lack sufficient temporal and spatial resolution. Consequently, the temporal detail of mobility data may be sacrificed in the traditional approach. Pitombo et al. [18, 19] proposed a more accurate distribution model by applying decision tree algorithms; nevertheless, it has the same limitation with respect to temporal detail. Therefore, several approaches have been proposed to find new patterns (regularities) by applying factorization to mobility data [14, 20, 21].
Here, mobile phone location data are combined with a traditional model to extract newly available information while avoiding the second problem. This study proposes an approach based on this idea, in which temporally detailed information is described by the simple form of the traditional travel-destination choice model. A key feature of this approach is that the two indicators, namely, destination amenities and travel costs, are estimated directly from mobile phone data, rather than as functions of variables such as travel time or population. Because destination amenities and travel costs have different spatial patterns, these indicators can be estimated from a simple assumption and an OD matrix in which most of the elements are filled. If multiday OD matrices are used, these indicators can be estimated for each of those days. Thus, we can determine detailed temporal changes in travelers’ perceptions of destination amenities and travel costs, information that is difficult to obtain from a model describing them as functions of population or travel time. Because these indicators are estimated directly, this model also differs from other travel destination choice models that use large-scale data records [22]. The “pairwise constant” and region dummy in the models of Zhu and Ye [23] and Kristoffersson et al. [24] are conceptually similar in part to the two indicators used here; nonetheless, they have different meanings and spatiotemporal treatments.
Here, the proposed approach decomposes the wealth of information from the large sample by focusing on spatial patterns, so that travel costs and amenities arising from unknown factors are also reflected. Thus, compared with the traditional approach, the proposed approach is less capable of explaining the underlying mechanism, but it provides additional information for understanding the national land structure without losing information at the hypothesis-setting stage.
To confirm the value of this approach, we calculate the time changes in destination amenities and travel costs of interprefectural travel in Japan. The results confirm the following two values: (1) the simple indicators describe the pattern of long-distance travel in Japan, and (2) the temporal variability of these indicators informs the temporal applicability of the destination choice model. Regarding the first value, the information obtained here is consistent with the expected (known) patterns, indicating that we have successfully quantified destination amenities and travel costs as perceived by travelers. In addition, for the travel-cost indicator, we succeeded in revealing the national land structure without generating complex LOS data for long-distance travel.
The second value concerns the additional information obtained from the proposed approach. The time-series features of destination amenities require attention because this indicator changes constantly and shows a large seasonal variation. This indicates that the parameters of a destination amenity model (the coefficients of the destination variables) estimated from information biased toward a particular season are difficult to apply to other seasons. This point must be considered when dealing with destination choice for long-distance travel. Mobile phone location data are a powerful source for addressing this concern because they can provide a sufficient sample at any given time. We also succeeded in characterizing some seasonal variations in people’s sensitivity to LOS.
The rest of this study is structured as follows. Section 2 presents the research design, including the OD matrix data and the methodology used to calculate and analyze destination amenities and travel costs. Section 3 discusses the static results derived from the OD matrix aggregated over all time points to clarify the spatial characteristics of the two indicators. Section 4 presents the estimated time trends of the two indicators based on four years of daily data (1,461 days) and summarizes the characteristics of their seasonal changes. Section 5 concludes the study, and Section 6 discusses the value of the results obtained in this study and the scope of application of the proposed approach.
2. Research Design
2.1. Data
We use the OD matrix data from “Mobile Spatial Statistics” [25], population distribution statistics generated from mobile phone network operational records. “Mobile Spatial Statistics” have also been used in the studies of Kubo et al. [26], Yamaguchi and Nakayama [14], and Hara and Yamaguchi [27]. These aggregate data are generated from the operational data of 85 million cellular phones provided by NTT DOCOMO, Inc., and they capture Japan’s population distribution at an approximately hourly frequency. We analyze 1,461 daily interprefectural OD matrices created from the stay and residence information in the operational data, expressed as follows:

$$\mathbf{N}^t = \left( n_{ij}^t \right)_{i, j \in Z}, \quad t \in T, \tag{1}$$

where $t$ indicates the date (i.e., $t \in T$, the set of 1,461 days), and $i$ and $j$ denote the residence and the travel destination, respectively, each aggregated to one of the 47 prefectures in the set $Z$. Therefore, $n_{ij}^t$ is the estimated number of people staying in zone $j$ on date $t$ whose place of residence, inferred from the mobile phone operational data, is zone $i$. This study uses only prefecture-based aggregate data; i.e., no personal information is handled. These aggregate data were generated by NTT DOCOMO, Inc. from mobile phone operational data after the processing necessary to protect privacy (deidentification, estimation, and disclosure limitation). Terada et al. [25] explain the detailed procedures.
Here, we analyze the aggregate data $\mathbf{N}^t$ generated from information during the specific time period 13:00–14:00, when people are most likely to be staying at their travel destinations. That is, we assume that the travel destination is the prefecture where the traveler stays at 13:00–14:00. This assumption may deviate from people’s own perceptions of their destinations, and this study presents an analysis under this assumption. The proposed approach does not depend on this timing assumption and can be applied directly if more accurate and detailed OD matrices, such as those of Bachir et al. [28], are obtained.
$Z$ is thus the set of all 47 prefectures in Japan. In this study, interprefectural travel is treated as an example of long-distance travel. The prefectural administrative boundaries shown in Figure 1 were determined historically; in Japan, most boundaries lie along straits or mountain ranges. As a result, many of them coincide with the boundaries of daily living areas, so this matrix mainly captures non-routine trips. As the scale in Figure 1 shows, most nonadjacent OD pairs are more than 100 km apart in straight-line distance, which is also close to the definition (longer than 100 km) adopted by Axhausen et al. [29].
2.2. Basic Characteristics of the Target OD Matrix
Here, we summarize the basic characteristics of the data used in the analysis. The number of elements analyzed is 47 × (47 − 1) × 1,461 = 3,158,682. Of these, only 19,138 (0.61%) were either not observed or withheld as confidential owing to small sample sizes. Thus, the data used in the analysis are mostly filled with numbers greater than zero.
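The cell count and missing share above can be reproduced with a few lines of NumPy. This is a sketch under an assumed data layout: the daily matrices are taken to be stacked in an array of shape (days, 47, 47), with NaN marking unobserved or confidential cells. The layout and the function names are illustrative and not part of the Mobile Spatial Statistics product.

```python
import numpy as np

def offdiagonal_cell_count(n_zones: int, n_days: int) -> int:
    """Number of interzonal (i != j) OD cells summed over all days."""
    return n_zones * (n_zones - 1) * n_days

def missing_share(od: np.ndarray) -> float:
    """Share of off-diagonal cells that are NaN in a (days, Z, Z) array.

    Diagonal cells (staying in the home zone) are not counted as analyzed elements.
    """
    n_days, n_zones, _ = od.shape
    off_diag = ~np.eye(n_zones, dtype=bool)      # True where i != j
    missing = np.isnan(od[:, off_diag]).sum()    # od[:, off_diag] has shape (days, Z*(Z-1))
    return float(missing) / offdiagonal_cell_count(n_zones, n_days)

total = offdiagonal_cell_count(47, 1461)
print(total)                               # 3158682
print(round(19138 / total * 100, 2))       # 0.61 (percent)
```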
Here, we examine the original day-to-day dynamics of the data. Figure 2 shows the day-to-day dynamics of the number of travelers for two OD pairs over one year. These figures show two characteristics of the time-series change in long-distance travel. (1) There are several peaks in the period, but they differ among OD pairs. For example, travel from Tokyo to Hokkaido (a) has large short-term peaks in May, August, and January, whereas travel from Fukuoka to Tokyo (b) has relatively long and weak peaks in April and late January. (2) The volume of off-peak travel also varies with season, even on the same day of the week. For example, travel from Tokyo to Hokkaido (a) differs significantly between July and February.
These time trends illustrate the difficulty of classifying dates into several groups, since no clear grouping can be established. Special peaks are observed in more than five periods in Figure 2, and the number of peaks may increase if other OD pairs are examined. Furthermore, even if we exclude the peaks and consider each day of the week separately, seasonal differences remain. In addition, the seasonal variation is likely to differ for each OD pair. In contrast, our approach allows us to characterize these complex temporal variations by extracting information on two types of spatial patterns at every time point.
2.3. OD Matrix Decomposition Method
This section describes how to derive the two types of indicators by decomposing the OD matrices using the concept of the destination choice model.
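We consider a multinomial logit model for long-distance travel destination choice as follows:

$$P_{ij}^t = \frac{\exp\left(V_{ij}^t\right)}{\sum_{k \in Z} \exp\left(V_{ik}^t\right)}, \tag{2}$$

$$V_{ij}^t = A_{ij}^t + C_{ij}^t, \tag{3}$$

where $P_{ij}^t$ is the probability that a resident of prefecture $i$ travels to prefecture $j$ (stays there at 13:00) on date $t$, and $V_{ij}^t$ is the corresponding systematic utility, composed of the two indicators $A_{ij}^t$ and $C_{ij}^t$ defined below. This is the most basic form of the multinomial logit model [30]. If $i = j$, $P_{ii}^t$ corresponds to the probability of “not traveling outside the residence prefecture.” The indicators $A_{ij}^t$ and $C_{ij}^t$ are directly estimated in this study as unknown variables through maximum likelihood.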
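The multinomial logit probabilities above are a row-wise softmax over destinations. The minimal sketch below assumes that the utilities for one date are held in a Z × Z NumPy array `V` whose row index is the residence zone and whose column index is the destination, with zero on the diagonal for staying in the home prefecture; the numbers are made up for illustration and the function name is hypothetical.

```python
import numpy as np

def choice_probabilities(V: np.ndarray) -> np.ndarray:
    """Row-wise multinomial logit probabilities P_ij = exp(V_ij) / sum_k exp(V_ik)."""
    V = np.asarray(V, dtype=float)
    shifted = V - V.max(axis=1, keepdims=True)   # subtracting the row max avoids overflow
    expV = np.exp(shifted)
    return expV / expV.sum(axis=1, keepdims=True)

# Three hypothetical zones; the diagonal (stay in the home zone) has utility 0.
V = np.array([[0.0, -2.0, -3.0],
              [-1.5, 0.0, -2.5],
              [-2.0, -1.0, 0.0]])
P = choice_probabilities(V)
print(P.sum(axis=1))   # each row sums to 1
```

Because only utility differences within a row matter, the ratio `P[i, j] / P[i, i]` equals `exp(V[i, j])` when the diagonal utility is zero.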
The systematic component of utility in most destination choice logit models is formulated as a linear sum of relevant indicators multiplied by coefficients. In such models, socioeconomic indicators of destinations (destination variables) and the transportation level of service (LOS) are used as the relevant indicators. For example, Marrocu and Paci [31] used the following destination variables in destination choice models of long-distance travel: GDP, population density, natural environment (e.g., area of natural parks), and cultural and recreational attractions (e.g., number of museum visitors and restaurants). They also used geographical distance and prices as transportation LOS variables. Similarly, Kato et al. [8] used the following destination variables for destination choice in Japan: population, working population, GDP per capita, percentage of employees in the service sector, number of accommodations per area, and Hokkaido and Okinawa dummies. However, these indices vary much less over time than OD travel volumes. Therefore, models using them can only deal with temporally aggregated behavior, such as annual totals, and cannot deal with detailed temporal changes such as the daily variation in (1).
In contrast, this study calculates two types of indices, namely, $A_{ij}^t$ and $C_{ij}^t$, at each time point $t$. These two indicators are obtained by decomposing the OD matrix information into two types of spatial patterns, the same patterns as those used in previous studies. For example, the indicators in the model of Marrocu and Paci [31] fall into two types of spatial patterns: indicators defined per destination (GDP, population density, natural environment, and cultural and recreational attractions) and indicators defined per zone pair (transportation LOS). The proposed approach estimates the indicator values themselves, imposing only the constraints that express these two types of spatial patterns, rather than modeling them from population or other indicators. With this approach, two types of more accurate indicators can be calculated for each time point.
Thus, the temporal variation of the interprefectural travel distribution can be explained by the time-series changes of the two calculated indices ($A_{ij}^t$ and $C_{ij}^t$). Because the model’s form corresponds to the destination choice logit model in (2), the calculated indices can be interpreted as composite indices of the multiple variables used in several destination choice models.
The two indicators ($A_{ij}^t$ and $C_{ij}^t$) calculated in this study are defined and interpreted as follows. First, $A_{ij}^t$ is assumed to be a variable defined for each destination on each date, satisfying the following equations:

$$A_{ij}^t = A_j^t \quad \forall i \neq j, \tag{4}$$

$$A_{ii}^t = 0 \quad \forall i \in Z. \tag{5}$$

Equation (4) indicates that the same value applies to every origin with the same travel destination, excluding the diagonal component ($i = j$). Equation (5) shows that this indicator is zero for “not traveling outside the prefecture” ($i = j$). Index $A_{ij}^t$ is thus the composite index of the destination variables described above and is expected to account for cultural and recreational attractions, population, and economic size, as pointed out by Marrocu and Paci [31]. Hereafter, we refer to $A_{ij}^t$ as the “travel destination amenity.”
Second, let $C_{ij}^t$ be a variable defined for each OD pair that satisfies the following equations:

$$C_{ij}^t = C_{ji}^t \quad \forall i, j \in Z, \tag{6}$$

$$C_{ii}^t = 0 \quad \forall i \in Z. \tag{7}$$

Equation (6) shows that the variable remains unchanged when the OD direction is reversed. Equation (7), together with (5), sets the deterministic utility of “not traveling outside the prefecture” to zero ($V_{ii}^t = 0$). The variable that Kato et al. [8] set for each OD pair $(i, j)$, regardless of direction, is a log-sum variable derived from a mode choice model; it is the expected utility of the mode choice obtained by synthesizing transportation LOS attributes such as travel time, fare, frequency, and transit time for each mode. The $C_{ij}^t$ calculated in this study is likewise an index that synthesizes multiple transportation LOS attributes. Hereafter, $C_{ij}^t$ is referred to as the “travel cost.”
In this study, these two types of variables are treated as unknowns, and their values are estimated by maximizing the log-likelihood as follows:

$$\max_{\{A_j^t\},\, \{C_{ij}^t\}} \; \mathcal{L}^t = \sum_{i \in Z} \sum_{j \in Z} n_{ij}^t \ln P_{ij}^t, \tag{8}$$

where the variables cannot all be determined uniquely owing to rank deficiency. This is because, in this model, the following relation holds for any constant $\delta$:

$$V_{ij}^t = A_{ij}^t + C_{ij}^t = \left(A_{ij}^t + \delta\right) + \left(C_{ij}^t - \delta\right) \quad \forall i \neq j. \tag{9}$$

This issue can be resolved by fixing one variable, either one of the $A_j^t$ or one of the $C_{ij}^t$. Therefore, we estimate the unknown variables with one additional constraint:

$$A_{\mathrm{Tokyo}}^t = 0. \tag{10}$$

That is, Tokyo’s destination amenity is fixed at zero, and all other variables are estimated as relative values. Note that the destination amenities and travel costs presented hereafter are relative values estimated under constraint (10); their absolute values are meaningless, and only the spatial relationships among values should be interpreted. More care is needed when comparing the same value across periods, because any change in it includes the change in Tokyo’s destination amenity. In this study, we focus on indicators of spatial differences, which remain comparable under this constraint.
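The maximum-likelihood estimation described above, with one reference zone’s amenity fixed at zero, can be sketched as follows for a single date. This is a minimal sketch, not the authors’ implementation: the counts are assumed to be a Z × Z NumPy array, the zone index `ref` stands in for Tokyo, and the log-likelihood is maximized by Newton–Raphson ascent with step halving, an optimizer chosen here for illustration. Fixing the reference amenity removes the shift invariance described above, so the negative Hessian is positive definite and the maximizer is unique. All function names are hypothetical.

```python
import numpy as np

def design_matrix(Z, ref):
    """Rows: flattened cells (i, j); columns: A_j for j != ref, then C_ij for i < j."""
    a_cols = {j: k for k, j in enumerate(j for j in range(Z) if j != ref)}
    pairs = {(i, j): k for k, (i, j) in
             enumerate((i, j) for i in range(Z) for j in range(i + 1, Z))}
    X = np.zeros((Z * Z, len(a_cols) + len(pairs)))
    for i in range(Z):
        for j in range(Z):
            if i == j:
                continue                                   # V_ii = 0: row stays empty
            if j in a_cols:
                X[i * Z + j, a_cols[j]] = 1.0              # amenity of destination j (A_ref = 0)
            X[i * Z + j, len(a_cols) + pairs[(min(i, j), max(i, j))]] = 1.0  # symmetric cost
    return X

def log_likelihood(theta, X, n):
    """Return sum_ij n_ij ln P_ij and the probability matrix P."""
    Z = n.shape[0]
    V = (X @ theta).reshape(Z, Z)
    m = V.max(axis=1, keepdims=True)
    log_p = V - m - np.log(np.exp(V - m).sum(axis=1, keepdims=True))  # stable log-softmax
    return float(np.sum(n * log_p)), np.exp(log_p)

def fit_amenity_cost(n, ref=0, max_iter=100, tol=1e-8):
    """Newton-Raphson maximum likelihood for one date; returns (a, C) with a[ref] = 0."""
    n = np.asarray(n, dtype=float)
    Z = n.shape[0]
    X = design_matrix(Z, ref)
    theta = np.zeros(X.shape[1])
    n_i = n.sum(axis=1)                                    # travelers per residence zone
    ll, P = log_likelihood(theta, X, n)
    for _ in range(max_iter):
        grad = X.T @ (n - n_i[:, None] * P).ravel()        # score of the log-likelihood
        if np.max(np.abs(grad)) < tol:
            break
        H = np.zeros((X.shape[1], X.shape[1]))             # negative Hessian
        for i in range(Z):
            Xi = X[i * Z:(i + 1) * Z]
            H += Xi.T @ (n_i[i] * (np.diag(P[i]) - np.outer(P[i], P[i]))) @ Xi
        step = np.linalg.solve(H, grad)
        t = 1.0
        new_ll, new_P = log_likelihood(theta + step, X, n)
        while new_ll < ll and t > 1e-8:                    # halve the step if ll drops
            t *= 0.5
            new_ll, new_P = log_likelihood(theta + t * step, X, n)
        theta, ll, P = theta + t * step, new_ll, new_P
    a = np.insert(theta[:Z - 1], ref, 0.0)                 # restore A_ref = 0
    C = np.zeros((Z, Z))
    C[np.triu_indices(Z, k=1)] = theta[Z - 1:]
    return a, C + C.T
```

Expected counts generated from known parameters make the score vanish at those parameters, so fitting such synthetic counts should recover them, which is a convenient check before using observed daily matrices. With 47 zones, each Newton step solves a system with 1,127 unknowns (46 amenities plus 1,081 pair costs).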
The method described here is unique in that it focuses on the spatial patterns of OD matrices. The two composite indices obtained here are identified solely by the spatial characteristics expressed in equations (4)–(7). Other basic approaches to analyzing matrices (such as principal component analysis) cannot handle indices such as $C_{ij}^t$, because basic matrix decomposition focuses on columns and rows. However, unlike other matrix (or table) data, the diagonal components of a travel OD matrix have a meaning significantly different from that of the other elements, and each element is strongly related to the element in the opposite direction. Our proposed approach is unique in extracting components tailored to these properties of travel information.
The proposed decomposition approach also corresponds to a simple case of decomposing a matrix of odds ratios into two spatial patterns. The decomposed components can be interpreted as the components of the utility function of destination choice behavior in the multinomial logit model. If destination amenities and travel costs are consistent with the assumed spatial patterns, we obtain two complete sets of indicators that also reflect travelers’ sensitivities. Note that the information presented in the following sections results from calculations based on these assumptions.
2.4. Analytical Procedures for Japanese Interprefecture Travel Data
In this study, the proposed approach is applied to Japanese interprefecture data to clarify its characteristics.
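First, we apply the aggregated matrix in (11) to examine the static characteristics of the interprefectural OD matrix and to confirm the validity of the proposed approach. The aggregated matrix is expressed as follows:

$$\bar{\mathbf{N}} = \sum_{t \in T} \mathbf{N}^t, \quad \text{i.e.,} \quad \bar{n}_{ij} = \sum_{t \in T} n_{ij}^t. \tag{11}$$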
By applying this matrix to the proposed approach, the average destination amenity and travel cost can be calculated.
In this study, the characteristics of the calculated indicators are presented through several figures and regression analyses. For destination amenities, the following regression model is applied:

$$\hat{A}_j = \beta_0 + \beta_1 \ln \mathrm{POP}_j + \beta_2 \ln \mathrm{EMP}_j + \varepsilon_j, \tag{12}$$

where $\mathrm{POP}_j$ is the population of prefecture $j$ and $\mathrm{EMP}_j$ is the number of employees in the lodging industry in prefecture $j$. The population data are from the 2015 census, and the numbers of employees are from the 2014 economic census [32]. This number of employees is considered to indicate the size of the service industry for visitors, such as tourism, in each prefecture.
The following regression analysis is applied to the estimated travel cost to clarify its relationship with transportation LOS:

$$\hat{C}_{ij} = \gamma_0 + \gamma_1 \ln \mathrm{DIST}_{ij} + \gamma_2 \ln \mathrm{TIME}_{ij} + \varepsilon_{ij}, \tag{13}$$

where $\mathrm{DIST}_{ij}$ and $\mathrm{TIME}_{ij}$ are explanatory variables for the travel LOS between the prefectural offices of $i$ and $j$: $\mathrm{DIST}_{ij}$ is the straight-line distance between the prefectural offices, and $\mathrm{TIME}_{ij}$ is the shortest travel time using multiple transportation modes (air, rail, bus, and ships). For $\mathrm{TIME}_{ij}$, we use the values published in national government statistics [33].
Then, by applying the daily matrices $\mathbf{N}^t$ to the proposed approach, destination amenities and travel costs are calculated for each date. The temporal variation of the two indicators is discussed using the correlation coefficients of the estimates and the coefficients of the following regression analysis, which is the same simple model as in the static case but applied to each day separately:

$$\hat{A}_j^t = \beta_0^t + \beta_1^t \ln \mathrm{POP}_j + \beta_2^t \ln \mathrm{EMP}_j + \varepsilon_j^t. \tag{14}$$
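Here, because the assumption in (10) affects only the constant term $\beta_0^t$, the other coefficients can be compared across dates $t$. A similar regression analysis is also performed for the estimated travel cost:

$$\hat{C}_{ij}^t = \gamma_0^t + \gamma_1^t \ln \mathrm{DIST}_{ij} + \gamma_2^t \ln \mathrm{TIME}_{ij} + \varepsilon_{ij}^t. \tag{15}$$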
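The regressions described above amount to ordinary least squares with an intercept, once for the aggregated estimates and once per date. The sketch below assumes logarithmic forms for all explanatory variables (an assumption of this sketch) and uses NumPy’s `lstsq`; the helper names are hypothetical. Its check also illustrates why the reference constraint (10) affects only the intercept: adding a common constant to every amenity shifts the intercept by that constant and leaves the slopes unchanged.

```python
import numpy as np

def ols(y, X):
    """Least squares with an intercept column; returns (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, float(r2)

def amenity_regression(a, pop, emp):
    """A_j on ln(population) and ln(lodging employees); log forms are assumed."""
    return ols(np.asarray(a, dtype=float), np.column_stack([np.log(pop), np.log(emp)]))

def cost_regression(C, dist, time):
    """C_ij on ln(distance) and ln(travel time), counting each unordered pair once."""
    iu = np.triu_indices(C.shape[0], k=1)    # C is symmetric, so the upper triangle suffices
    return ols(C[iu], np.column_stack([np.log(dist[iu]), np.log(time[iu])]))

def daily_amenity_coefficients(A_daily, pop, emp):
    """Apply the static amenity regression to every date; A_daily has shape (days, Z)."""
    return np.array([amenity_regression(a_t, pop, emp)[0] for a_t in A_daily])
```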
3. Decomposition Results of the Static OD Matrix
In this section, we describe the characteristics of the proposed decomposition method, showing the results of applying the proposed approach to OD matrix data aggregated at only one point in time, as shown in (11).
3.1. Difference between the Observed and Estimated Results
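We compare the log-ratio matrix of the observed data, $\mathbf{L} = (l_{ij})$, with the estimated matrix $\hat{\mathbf{A}} + \hat{\mathbf{C}}$ to verify the validity of the decomposition. Here, $\mathbf{L}$ is the matrix defined by (16) and (17) as follows:

$$l_{ij} = \ln \frac{\bar{n}_{ij}}{\bar{n}_{ii}} \quad (i \neq j), \tag{16}$$

$$l_{ii} = 0 \quad (i \in Z), \tag{17}$$

where, from (2) and (3), $l_{ij}$ has the following relationship with the model:

$$l_{ij} \approx \ln \frac{P_{ij}}{P_{ii}} = V_{ij} - V_{ii} = A_{ij} + C_{ij}. \tag{18}$$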
Therefore, comparing the left- and right-hand sides of (18) shows how well the two indicators represent the long-distance travel distribution.
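The comparison in this subsection can be sketched with two small functions: one builds the log-ratio matrix from a count matrix, and one computes the correlation between the off-diagonal entries of two matrices. The sketch assumes counts in a Z × Z NumPy array; zero or missing cells would give non-finite log ratios, so they are simply dropped from the correlation here, which is our choice rather than a procedure stated in the text. Function names are hypothetical.

```python
import numpy as np

def log_ratio_matrix(n):
    """l_ij = ln(n_ij / n_ii) off the diagonal and l_ii = 0 (each row divided by its own diagonal)."""
    n = np.asarray(n, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        L = np.log(n / np.diag(n)[:, None])
    np.fill_diagonal(L, 0.0)
    return L

def offdiag_correlation(L, V):
    """Pearson correlation of the finite off-diagonal entries of two Z x Z matrices."""
    mask = ~np.eye(L.shape[0], dtype=bool) & np.isfinite(L) & np.isfinite(V)
    return float(np.corrcoef(L[mask], V[mask])[0, 1])
```

If the counts are exactly the expected counts of the logit model, the log-ratio matrix reproduces the utility matrix, which is the relationship the comparison relies on.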
Figure 3 confirms the following three features of the observed matrix $\mathbf{L}$: (1) The diagonal components are all zero, as defined in (17). (2) The closer a component is to the diagonal, the larger its value (i.e., the more likely that zone is to be selected as a travel destination), and the farther it is from the diagonal, the smaller its value. The zones in this figure are arranged by prefecture code, which runs from northeast to southwest, so zones adjacent in the figure are spatially close to each other. Therefore, this feature indicates that many travelers choose neighboring zones as their destinations. (3) Only the columns for Tokyo and Osaka as destinations have large values for most residential areas, regardless of location. This indicates that Tokyo and Osaka are major travel destinations for many travelers despite the distance, possibly because Japan’s major economic functions are concentrated in these two cities.
A comparison between Figures 3 and 4 confirms that the observed and estimated matrices approximately agree. This agreement is also confirmed by the scatter plot in Figure 5 and the corresponding correlation coefficient. These results indicate that the characteristics of the interprefectural OD matrix can be explained by the two estimated components: the travel cost matrix $\hat{\mathbf{C}}$ and the destination amenity matrix $\hat{\mathbf{A}}$.
In our decomposition method, the two types of spatial patterns represented by the two matrices are fully accounted for, because all elements of the matrices are estimated under minimal spatial constraints. Under this assumption, the residuals, excluding observation error, can arise only for the following two reasons (cases in which the model assumptions are not satisfied): (1) travel costs differ by direction; for instance, the perceived LOS differs because of different values of time. (2) Destination values vary by residential zone; for example, Hokkaido’s destination value may be greater for people in metropolitan areas than for people in rural areas. Figure 5 indicates that the effects of these differences are very small for Japan’s interprefectural OD matrix. We therefore disregard them and focus on the destination amenities and travel costs to clarify their characteristics.
3.2. Estimated Results of the Destination Amenity
Figure 6 shows the estimated destination amenity $\hat{A}_{ij}$. The figure reconfirms the meaning of the three constraint equations for the destination amenity, namely, (4), (5), and (10). Constraint (4) implies that the destination amenity of each prefecture is the same for all residential areas (excluding the diagonal component), and constraints (5) and (10) imply that the diagonal components and the destination amenity of Tokyo are zero, as Figure 6 shows.
Next, we examine the differences in destination amenities across prefectures. The destination amenity indicates the attractiveness of a destination to travelers. For example, the probability of choosing a destination is expected to be roughly proportional to the number of trips related to business, weddings, funerals, and other relationship-based purposes, and the number of such trips tends to scale with the destination’s population. Figure 7 confirms this relationship by plotting the estimated destination amenity against each prefecture’s population: the destination amenity of each zone is strongly correlated with the logarithm of the population. Therefore, the destination amenity is mainly determined by population size. This result is consistent with a previous study that developed an interregional travel demand model for Japan using a traditional approach (Kato et al. [8]).
There are, however, several exceptions. For example, Tokyo, Hokkaido, and Okinawa lie above the regression line, indicating high destination amenities: their urban functions and tourist attractions give them higher value than their population size alone would suggest, so they strongly attract travelers. Conversely, Kanagawa, Saitama, and Nara prefectures lie below the regression line (low destination amenity), indicating a weak ability to attract travelers relative to their population size.
Next, we examine the ranking of destination amenities by prefecture in Table 1. The table shows the top and bottom five prefectures under two rankings. The left ranking is by estimated destination amenity, which is highly correlated with population size, as confirmed in Figure 7. The right ranking is by the residuals from the simple regression on population shown in Figure 7; it therefore lists the prefectures with high and low destination amenities relative to population size. Okinawa, Hokkaido, and Nagano prefectures, in addition to Tokyo, have high destination amenities per population size, and apart from Tokyo, these zones are recognized as resort areas. These results indicate that the destination amenity is a quantitative index that integrates multiple attractions related to long-distance travel.
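The right-hand ranking of Table 1 can be sketched as follows: fit a simple regression of the estimated amenity on log population, then sort prefectures by residual. The use of `np.polyfit` and the function name are illustrative choices, and the example data are synthetic.

```python
import numpy as np

def rank_by_population_residual(amenity, pop, names, k=5):
    """Return (top-k, bottom-k) names by residual of amenity ~ ln(population)."""
    x = np.log(np.asarray(pop, dtype=float))
    y = np.asarray(amenity, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)     # degree-1 fit returns the highest power first
    resid = y - (intercept + slope * x)
    order = np.argsort(resid)                  # ascending residuals
    bottom = [names[i] for i in order[:k]]
    top = [names[i] for i in order[::-1][:k]]
    return top, bottom

# Synthetic example: z2 gets an extra +1 and z4 an extra -1 above a pure population effect.
names = [f"z{i}" for i in range(6)]
pop = np.array([1e6, 2e6, 3e6, 4e6, 5e6, 6e6])
amen = 0.5 * np.log(pop) + np.array([0.0, 0.0, 1.0, 0.0, -1.0, 0.0])
top, bottom = rank_by_population_residual(amen, pop, names, k=2)
```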
In contrast, Saitama and Nara prefectures have the smallest population-corrected destination amenities. Both are adjacent to large cities (Tokyo and Osaka, respectively), and many of their residents commute to Tokyo or Osaka for work. When meeting people from other prefectures, they may often prefer to meet near their workplace (Tokyo or Osaka) because of its better accessibility and recreational attractions. Consequently, their destination amenity (ability to attract travelers) is small relative to their population size. Conversely, the destination amenities of Tokyo and Osaka largely reflect this mechanism.
In addition, Table 2 shows the results of the regression analysis defined in (12). Both explanatory variables are statistically significant and positive, as expected, and the coefficient of determination indicates that they explain most of the variation in destination amenity. Thus, the results obtained by aggregating all periods are almost identical to the conclusions of traditional approaches. The value of our approach for destination amenities becomes clear in the next section, where a similar model is applied to multiple time points.
3.3. Estimated Results of Travel Cost
We examine the estimated travel cost matrix $\hat{\mathbf{C}}$ shown in Figure 8. The diagonal components are zero by condition (7), and the matrix is fully symmetric by condition (6). The farther an element is from the diagonal, the larger the absolute value of the travel cost tends to be. Because the zones in Figure 8 are arranged in the same order as in Figure 3 (i.e., from northeast to southwest), pairs farther from the diagonal tend to be farther apart geographically. Thus, the travel costs in Figure 8 broadly follow the actual Euclidean distances.
The zone pairs departing from or arriving in Tokyo have relatively small absolute travel costs, even for distant zones. This indicates that convenient transportation connects Tokyo with most areas, suggesting a strong influence of the air transportation network centered on Haneda Airport.
Moreover, close examination of the elements near the diagonal reveals blocks with small absolute travel costs. The two yellow squares in the lower right corner of Figure 8 correspond to the four prefectures of Shikoku and the seven prefectures of Kyushu, respectively, indicating frequent mutual travel within the same island. Proceeding from the southwest (the lower right of Figure 8), other blocks corresponding to regional groupings can also be identified.
The exponential of this estimated travel cost is used as a distance. To visualize this distance matrix, each prefecture is plotted on a two-dimensional plane. The positions of Tokyo and Osaka are fixed, and the coordinates of the other prefectures are calculated so that the ratios of the distances between prefectures on the plane match those derived from the estimated travel cost. For comparison, Figure 9 plots each prefectural office at its original (geographic) coordinates. In this figure, as in Figure 10, only the major rail links (high-speed rail (HSR), in purple) are drawn as links connecting the nodes.
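This section does not spell out the fitting procedure for the plane, so the following is only one possible way to produce such a layout, swapped in for illustration: classical (Torgerson) multidimensional scaling of the distance matrix, followed by a similarity transform (rotation, uniform scaling, and translation) that places the two anchor zones exactly at fixed positions. Because a similarity transform preserves distance ratios, the planar ratios equal those in the input whenever the distances are exactly representable in two dimensions; otherwise classical MDS gives an approximation (it minimizes a strain criterion on inner products, not distance errors). The distance matrix `D` is taken as given, and the function and argument names are hypothetical.

```python
import numpy as np


def embed_with_anchors(D, anchor_idx, anchor_xy):
    """Place zones on a plane so that planar distances are proportional to D.

    D: (n, n) symmetric distance matrix derived from the travel cost.
    anchor_idx: indices (i, k) of the two fixed zones (e.g. Tokyo, Osaka).
    anchor_xy: (2, 2) fixed coordinates for those two zones.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Classical MDS: double-center the squared distances, keep the top 2 axes.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigval, eigvec = np.linalg.eigh(B)
    top = np.argsort(eigval)[::-1][:2]
    X = eigvec[:, top] * np.sqrt(np.clip(eigval[top], 0.0, None))

    # Treat 2-D points as complex numbers: z -> a*z + b is a similarity map
    # (rotation + uniform scale + translation) that preserves distance ratios.
    z = X[:, 0] + 1j * X[:, 1]
    w = np.asarray(anchor_xy, dtype=float) @ np.array([1.0, 1j])
    i, k = anchor_idx
    a = (w[1] - w[0]) / (z[k] - z[i])
    b = w[0] - a * z[i]
    z_new = a * z + b
    return np.column_stack([z_new.real, z_new.imag])
```

The sketch uses only orientation-preserving maps, so the result may be a mirror image of the map; reflecting it across the line through the two anchors keeps them in place and preserves all distance ratios.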
Figure 10 reveals three characteristics of the estimated travel cost. (1) Japan’s transportation network is centered on Tokyo, and the shape of the country derived from the estimated travel cost is a circle centered on Tokyo. This structure arises because, beyond a certain distance, travel cost is determined by the LOS of air routes, and direct flights from Haneda Airport serve almost all prefectures. (2) Within this circular structure, cities on the Pacific Ocean side lie on the inner side and those on the Sea of Japan side lie on the outer side. Cities on the Sea of Japan side, such as Akita, Ishikawa, and Tottori, are located on the outer edge of the circle; they are farther apart from one another than cities on the Pacific Ocean side are, and they are also distant from other cities in Japan. This pattern approximately reflects the shape of the current HSR network. (3) Cities along the Tokaido HSR (Tokyo-Osaka) and Tohoku HSR (Tokyo-Miyagi-Aomori) lines do not recede from Tokyo along a straight line but follow a distorted path. Zones where express trains (Nozomi, Hayabusa) stop with high-frequency service tend to lie relatively close to the center even though they are geographically far from Tokyo. The travel cost derived in this study is thus an index that reflects such detailed LOS.
Table 3 shows the results of the regression analysis defined in (13). Both variables are statistically significant with negative coefficients, as expected. However, the R² value indicates that 25% of the variance remains unexplained by these two variables. These residuals are expected to reflect other information, such as service frequency, fares, and differences between transportation modes. Building a model that addresses these details, however, requires data that are not easy to assemble. The proposed approach allows us to understand the level of transportation service across the country as a whole, as shown in Figure 10, without detailed data preparation.
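The regression in (13) runs over prefecture pairs rather than prefectures. A minimal sketch, assuming NumPy and assuming the explanatory variables are a distance matrix and a travel time matrix (suggested by the coefficients discussed for Figure 13, but not confirmed in this section), uses each unordered pair i < j once, since the cost matrix is symmetric with a zero diagonal. It returns the residuals as a symmetric matrix so that the unexplained part can be inspected pair by pair. The function name is hypothetical.

```python
import numpy as np


def regress_cost_on_pairs(C, dist, time):
    """Regress travel cost on distance and travel time over zone pairs i < j.

    C: (n, n) symmetric travel cost matrix with a zero diagonal.
    dist, time: (n, n) explanatory matrices (hypothetical inputs).
    Returns (coefficients [intercept, distance, time], R^2, residual matrix).
    """
    C = np.asarray(C, dtype=float)
    iu = np.triu_indices(C.shape[0], k=1)   # each unordered pair used once
    y = C[iu]
    X = np.column_stack([
        np.ones(y.size),
        np.asarray(dist, dtype=float)[iu],
        np.asarray(time, dtype=float)[iu],
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    # Rebuild a symmetric residual matrix to map what the two variables miss.
    R = np.zeros_like(C)
    R[iu] = resid
    R = R + R.T
    return beta, r2, R
```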
4. Time-Series Change in the Two Decomposed Indicators
The time variability of intercity travel in Japan can be described by the time-series changes in the two indicators proposed in this study. This section therefore calculates the two indices for each day and examines their time-series changes over 1,461 days.
First, we compare the daily indices with those calculated from the all-period aggregate OD matrix (Section 3). Correlation coefficients are calculated for each date. The correlation coefficient is not affected by the assumption in (10) because it is invariant to the constant term. Table 4 shows the mean, 1st and 5th percentiles, and minimum of the correlation coefficients over the 1,461 days.
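The comparison behind Table 4 can be sketched as follows, assuming NumPy and arrays with one row per day. `daily_correlations` computes the Pearson correlation of each day's indicator with the all-period indicator; because each row is centered, adding a constant to a day's values, as a normalization like (10) would, does not change the result. For travel cost, `upper_triangle_rows` first turns the daily symmetric matrices into vectors of unordered pairs. All function names are hypothetical, and the percentile convention (NumPy's default linear interpolation) may differ from the one used for Table 4.

```python
import numpy as np


def daily_correlations(daily, reference):
    """Pearson correlation between each day's indicator and the all-period one.

    daily: (n_days, m) array; row t is one day's indicator (e.g. destination
           amenity per prefecture, or upper-triangle travel costs).
    reference: (m,) indicator estimated from the all-period aggregate OD matrix.
    Row-wise centering makes the result invariant to a per-day constant.
    """
    daily = np.asarray(daily, dtype=float)
    ref = np.asarray(reference, dtype=float)
    d = daily - daily.mean(axis=1, keepdims=True)
    r0 = ref - ref.mean()
    return (d @ r0) / (np.linalg.norm(d, axis=1) * np.linalg.norm(r0))


def summarize(r):
    """Mean, 1st and 5th percentiles, and minimum of the daily correlations."""
    r = np.asarray(r, dtype=float)
    return {"mean": r.mean(), "p1": np.percentile(r, 1),
            "p5": np.percentile(r, 5), "min": r.min()}


def upper_triangle_rows(daily_matrices):
    """Flatten (n_days, n, n) symmetric travel cost matrices to pair vectors."""
    daily_matrices = np.asarray(daily_matrices, dtype=float)
    iu = np.triu_indices(daily_matrices.shape[1], k=1)
    return daily_matrices[:, iu[0], iu[1]]
```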
For destination amenity, the mean correlation coefficient is high (0.961), but the correlation falls below 0.877 on approximately 5% of days (approximately 70 days) and below 0.584 on approximately 1% of days (approximately 18 days). This indicates that, at certain times, the relative magnitudes of destination amenities among prefectures differ substantially from the average pattern.
Figure 11 shows the time series of this correlation coefficient. The coefficient fluctuates in nearly the same pattern every year, indicating that seasonal variation in destination amenity is larger than interannual variation. The correlation coefficient is low in three periods, shaded gray in Figure 11. From left to right, these are Golden Week (GW, a run of consecutive national holidays), Obon (a religious holiday period in Japan), and the New Year holidays. Among these, Obon and the New Year holidays show the lowest correlations. During these periods, the distribution of destination amenity (attractiveness to travelers) across prefectures differs markedly from the average.
Next, we examine the correlation coefficients of travel cost in Table 4. The correlation between the travel cost on each date and the all-period travel cost is 0.930 even at its minimum, so it remains high throughout the study period. Therefore, the relative distances between prefectures shown in Figure 10 are approximately the same at all time points.
We then apply the regression analysis defined in (14) to obtain more detailed time-series features of destination amenity. The daily R² values in Figure 12(c) indicate that destination amenity during the three peak periods (GW, Obon, and New Year holidays) is hardly explained by the two variables, suggesting that a different model is required for these periods. Furthermore, the estimated coefficients in Figures 12(a) and 12(b) show that the model parameters differ substantially across time periods. For example, the population coefficient is statistically insignificant in the three peak periods and relatively large in April and late January. The coefficient of the number of employees also varies considerably with the season as well as the day of the week. Thus, because of the large seasonal variation in destination amenities, a traditional model estimated with only a few groupings of periods may be applicable only at limited points in time. These daily estimates reveal important caveats about the characteristics and temporal variation of destination amenities in Japan.
Figure 12. Daily regression of destination amenity: (a), (b) estimated coefficients; (c) R².
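The daily regressions summarized in Figure 12 amount to fitting the same cross-sectional model separately for each day and stacking the results. The sketch below assumes NumPy, a (days × prefectures) array of daily destination amenity, and a (days × prefectures × k) array of explanatory variables (hypothetically, population and number of employees, which may be constant across days); it returns one row of coefficients and one R² per day. Significance testing, which the text also discusses, is omitted for brevity, and the function name is ours.

```python
import numpy as np


def daily_regressions(Y, X_daily):
    """Fit a separate OLS for each day and collect coefficients and R^2.

    Y: (n_days, n_zones) daily destination amenity per prefecture.
    X_daily: (n_days, n_zones, k) explanatory variables per day (hypothetical).
    Returns coefs (n_days, k + 1) with the intercept first, and r2 (n_days,).
    """
    Y = np.asarray(Y, dtype=float)
    X_daily = np.asarray(X_daily, dtype=float)
    n_days, n_zones, k = X_daily.shape
    coefs = np.empty((n_days, k + 1))
    r2 = np.empty(n_days)
    for t in range(n_days):
        X = np.column_stack([np.ones(n_zones), X_daily[t]])
        beta, *_ = np.linalg.lstsq(X, Y[t], rcond=None)
        resid = Y[t] - X @ beta
        tss = np.sum((Y[t] - Y[t].mean()) ** 2)   # total sum of squares for day t
        coefs[t] = beta
        r2[t] = 1.0 - resid @ resid / tss
    return coefs, r2
```

Days with unusually low R², such as the peak holiday periods described above, can then be located with a comparison like `np.flatnonzero(r2 < threshold)`, where `threshold` is a user choice.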
A similar regression analysis, defined in (15), is performed for the estimated travel cost. The daily R² values in Figure 13(c) indicate that approximately 25% of the variance remains in the residuals on all dates. The estimated coefficients in Figures 13(a) and 13(b) show some seasonal differences, even though the daily travel costs are highly correlated with the all-period values (Table 4). For example, the coefficient of travel time is small during the New Year and Obon periods, indicating lower resistance to traveling to destinations that take more time during these periods. The pattern differs during GW (early May), when it is the coefficient of distance, rather than that of travel time, that becomes smaller.
Figure 13. Daily regression of travel cost: (a), (b) estimated coefficients; (c) R².
5. Discussion
Calculating the two indicators in the proposed approach has the following benefits.
First, the approach provides evidence on the temporal stability of the destination choice model for travel in Japan. The spatial pattern of travel cost is nearly constant over time (Table 4), which is a favorable property: coefficients estimated from data at one point in time can be applied at other points. In contrast, clear seasonal differences were found in destination amenities. This finding raises an important concern for many destination choice models because it implies that the coefficients of destination variables may change substantially from season to season. Consequently, a model estimated from data biased toward a particular period cannot be applied to other periods. Obtaining data that cover all seasons through questionnaires is difficult because of respondents’ limited recall and the cost of surveys. Passively collected large-sample data, such as mobile phone location data, address this issue; they are valuable because they readily provide large samples over long periods.
Second, such a simple, quantitative description of the spatio-temporal characteristics of long-distance travel behavior is useful for transportation policy. The calculated travel costs reveal the structure of the country as perceived by travelers. Figure 10 clearly shows the current status of long-distance transportation networks, which should inform future policy discussions. In addition, the destination amenity identified here quantifies the ability of each prefecture to attract travelers. It may be a useful outcome measure for strategies to attract travelers because it reflects the ability to attract people from all over Japan and is less sensitive to the influence of a few specific origins.
Furthermore, the decomposition approach proposed in this study is not limited to long-distance travel; it can be applied to any OD matrix in which most elements are greater than zero. However, analyzing OD matrices of daily traffic, for which many influencing factors have already been identified and the phenomenon is stable over time, may not yield many new findings. We therefore believe that the proposed approach is best suited to understanding highly variable patterns such as long-distance travel.
The approach and results of this study have some limitations. To confirm the validity of the patterns obtained, it is desirable to analyze the effect of setting the time period at . The most critical limitation is that the approach is difficult to use for policy simulation because the model contains no variables that can be controlled directly. To discuss the effects of future policies, a model with transportation LOS and destination variables, as in traditional approaches, is more appropriate. The proposed approach is instead advantageous for calculating outcome measures and for confirming the temporal stability of prediction models, as discussed above. In addition, the results of this study may be limited to long-distance travel in Japan. In particular, further analysis is required to answer the question, “Do the warnings discussed here apply to other countries or other types of trips?” This question can be answered by applying the proposed approach to other OD matrices observed at multiple time points.
6. Conclusion
This study proposes a method for decomposing long-term and daily interprefectural OD matrices into two types of spatial patterns. The indices obtained from this decomposition are referred to as destination amenity and travel cost. They are identified by the differences in the spatial patterns through which they influence the OD matrix, and these patterns correspond to the variables treated in the traditional destination choice model. Therefore, the two indices (destination amenity and travel cost) are composites of several variables treated in destination choice models.
Using mobile phone location data, we can derive detailed time-series changes in these indices because they can be calculated directly from the OD matrices. This study analyzes the changes in the two indices of interprefectural travel in Japan over 1,461 days.
By calculating the two indicators in the proposed approach, the characteristics of interprefectural travel in Japan can be summarized as follows: (1) Destination amenity, which indicates a zone’s attractiveness to travelers, is strongly correlated with population size. (2) Hokkaido, Okinawa, and Tokyo have higher destination amenities than their population size would suggest; they are attractive travel destinations beyond what population alone explains. (3) Destination amenities change substantially with the seasons; therefore, seasonal variation should be taken into account when modeling destination amenity. (4) The travel costs show that Japan’s national land structure is circular, with Tokyo at the center. Prefectures facing the Pacific Ocean are located inside the circle, whereas those facing the Sea of Japan are located at its outer edge. Long-distance transportation services are relatively underdeveloped in prefectures facing the Sea of Japan. (5) Unlike destination amenity, this travel cost structure shows little variation over time.
Data Availability
The MSS data used to support the findings of this study were supplied by DOCOMO InsightMarketing, Inc. under license and therefore cannot be made freely available. Requests for access to these data should be made to DOCOMO InsightMarketing, Inc. Although the full data are not freely available, the most recent data can be viewed at https://mobakumap.jp/.
Ethical Approval
Not applicable.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Hiromichi Yamaguchi was responsible for conceptualization, data curation, formal analysis, methodology, validation, visualization, writing of the original draft, and review and editing. Mashu Shibata was responsible for formal analysis, methodology, visualization, and review and editing. Shoichiro Nakayama was responsible for conceptualization and review and editing.
Acknowledgments
This study was supported by JSPS (Japan Society for the Promotion of Science) KAKENHI Grant-in-Aid for Scientific Research (B) 20H02267, 20H02270, and 21H01455 and by the Ministry of Education, Culture, Sports, Science and Technology, Japan (MEXT) LEADER project.