Abstract

The Hong Kong-Zhuhai-Macao Bridge (HZMB) is an important transportation facility connecting Hong Kong, Zhuhai, and Macao. Analyzing the characteristics of cross-border travel behavior is therefore crucial for enhancing the smart travel experience of the HZMB. Discrete choice models (e.g., logit models) are commonly used to describe travel mode choice behavior. The multinomial logit (MNL) model is subject to the independence of irrelevant alternatives (IIA) assumption, and the nested logit (NL) model does not consider the heterogeneity of individual travelers. The mixed logit (MXL) model can overcome these limitations, but it may neglect model uncertainty. Therefore, a Bayesian model averaging (BMA) approach is applied to model travel mode choice behavior using revealed preference/stated preference (RP/SP) fusion data collected by online questionnaires. A structural equation model (SEM) is adopted to explore the potential relationship between latent variables, and two travel modes (i.e., cross-border bus and cross-border private car) are selected to analyze the cross-border travel mode choice of the HZMB. The results reveal that the MXL-BMA approach can better explain the cross-border travel mode choice behavior. In addition, the transportation modes used to arrive at and depart from the HZMB have a significant impact on the travel mode choice of the HZMB. The findings of this study can provide suggestions for designing personalized travel services for travelers across the HZMB.

1. Introduction

The Hong Kong-Zhuhai-Macao Bridge (HZMB) serves as a crucial transportation facility that connects Hong Kong, Zhuhai, and Macao. Analyzing the travel mode choice behavior of travelers crossing the HZMB is of utmost importance for regional traffic planning, management, and control [1–3]. Travel mode choice behavior analysis can explore the cross-border travel preferences of HZMB users and identify the significant factors in travel decisions. Moreover, it explains the internal mechanism of travel demand for cross-border travelers and can then guide travelers to choose a more reliable, efficient, safe, and environmentally friendly cross-border travel mode. These insights can support the rational utilization of road transportation infrastructure resources [4–7].

The HZMB is a vital connection between Guangdong province, Hong Kong, and Macao. Several studies have analyzed the travel behavior between Hong Kong, Zhuhai, and Macao. Lu et al. [8] applied a nested logit (NL) model to explore the travel behavior characteristics of passengers between Hong Kong and Zhuhai via HZMB. Hu [9] adopted the K-means method to analyze the spatiotemporal characteristics of travelers entering Hong Kong using cell phone data, revealing a significant increase in entering passenger flow during holidays, with tourists predominantly originating from Shenzhen or neighboring cities. Liu and Shi [10] utilized the smart card dataset of Shenzhen Metro to describe the cross-border travel behavior between Hong Kong and Shenzhen and further explore the travel spatiotemporal characteristics.

For travel behavior analysis, discrete choice models are the common methods, such as the logit family of models, including multinomial logit (MNL), NL, and mixed logit (MXL). These models represent individual choice behavior by maximizing random utility [11–13]. Broach et al. [14] proposed an MNL model to explore the travel mode choice behavior among walking, bike, car, and bus. However, the MNL model assumes that the random terms of alternative travel modes follow the Gumbel distribution and neglects the relationship between alternative travel modes. Moreover, it is subject to the independence of irrelevant alternatives (IIA) assumption. To overcome this issue, the NL model explains the relationship between different travel modes by letting them share a part of the random errors, which helps to describe individual travel behavior accurately [15, 16]. Fan [17] utilized RP/SP fusion data to establish an NL model for travel time and travel mode (i.e., car and bus) choice, aiming to identify the crucial factors influencing travel mode choice behavior. However, the NL model does not consider the correlation between datasets and the heterogeneity among individuals. The MXL model introduces random coefficients to represent the heterogeneity of individual travel choice behavior [18]. Wu et al. [19] identified the crucial factors of transportation mode choice behavior at urban subway entrances and exits and further revealed the importance of transfer facilities and environment for travel mode choice behavior. Ye et al. [20] adopted an MXL model to explore the impact of shared bikes on travel mode and quantified the influence of various factors, such as travel attributes, built environment, and travel characteristics, on the willingness to use shared bikes under different scenarios.

Machine learning-based models (e.g., random forest and neural network (NN)) treat travel mode choice modeling as a classification problem and have been shown to be feasible for describing travel mode choice behavior [21–23]. Lindner et al. [24] applied artificial neural networks (ANN) and classification trees (CT) to model travel mode choice using multicollinear data with multiple dimensions. Lhéritier et al. [25] presented a nonparametric machine learning model to describe airline itinerary choice behavior considering the nonlinear relationship between attributes of travel mode and individual characteristics. Golshani et al. [26] explored the relative importance of explanatory variables and used ANN to model travel mode and timing. Machine learning-based models outperform logit models in fitting performance and predictive capability for travel mode choice behavior, but their interpretability is poor [27, 28].

To sum up, previous studies on cross-border travel behavior between Hong Kong, Zhuhai, and Macao mainly focused on analyzing the characteristics of travel behavior (e.g., travel purpose and travel spatiotemporal distribution). Few studies have explored the transportation mode and travel behavior characteristics. Moreover, the MXL model is more capable of capturing the heterogeneity among individuals in travel behavior with RP/SP fusion data. However, it is affected by model uncertainty, which may result in poor fitting performance. The Bayesian model averaging (BMA) approach was proposed to overcome model uncertainty and has improved model performance in many fields (e.g., transportation safety and energy) [29–31]. The BMA approach weights the candidate models and combines their outputs into a new model to handle the model uncertainty, which takes the fitting performance and advantages of the candidate models into account. Zhou et al. [32] proposed an ensemble approach that integrated copula-based Bayesian model averaging (CBMA) and multiple deterministic artificial neural networks (ANN) to accomplish accurate probabilistic PM2.5 prediction tasks.

Therefore, with the RP/SP fusion data, this study proposes an MXL-BMA approach to interpret the cross-border travel mode choice behavior of the HZMB. This approach can address the model uncertainty of the discrete choice model. The contributions of this study are twofold: (1) the SEM is utilized to explore the causal relationships and mechanisms between latent variables and the corresponding observation variables, as well as among latent variables, which supports an accurate description of the cross-border travel behavior of the HZMB, and (2) the MXL-BMA approach is proposed to analyze the cross-border travel mode choice behavior of the HZMB, which can handle the model uncertainty and better interpret the cross-border behavior to inform transportation planning and policy making for improving smart services.

The rest of the article is organized as follows. Section 2 describes the data source and processing procedure in detail. Section 3 presents the models, and Section 4 analyzes the results of travel mode choice behavior. Section 5 briefly concludes the study.

2. Data Collection and Preprocessing

2.1. Sample Size

Because COVID-19 has affected travel between Guangdong province, Hong Kong, and Macao, the cross-border travel behavior dataset of HZMB users traveling from Zhuhai to Hong Kong was collected via an online questionnaire survey. As of October 23, 2019, the average daily passenger trips to the three ports via the HZMB were approximately 66,900. Taking this figure as the overall survey population, the minimum sample size is calculated from the coefficient of reliability, the variance, the margin of error, and the population size. The coefficient of reliability is set to 1.96 (the confidence level is set as 0.95, i.e., $\alpha = 0.05$), the variance is 0.5, the margin of error is 0.05, and the overall survey population is 66,900. This study acquired 811 valid samples, which exceeds the minimum sample size of 390.
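The sample size equation itself is not reproduced in this text, so the following is only a minimal sketch of one commonly used form: Cochran's formula with a finite-population correction. The function name `min_sample_size` is hypothetical, and reading the reported 0.5 as a standard deviation (i.e., a variance of 0.25) is an assumption rather than the authors' stated choice. Under these assumptions the result falls in the high 300s, the same order as the reported minimum, and well below the 811 valid samples.

```python
import math


def min_sample_size(population, z=1.96, sigma=0.5, margin=0.05):
    """Minimum sample size from Cochran's formula with finite-population correction.

    This is an assumed form, not necessarily the equation used in the study.
    sigma is treated as a standard deviation (assumption).
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * (sigma ** 2) / (margin ** 2)
    # Shrink it slightly to account for the finite survey population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n_min = min_sample_size(66900)
print(n_min, n_min < 811)
```

Because the population is large relative to the sample, the finite-population correction changes the result only marginally.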

2.2. Questionnaire Design

Previous studies suggested that the travel mode choice behavior is associated with various socioeconomic attributes of travelers (e.g., gender, age, and education), travel-related factors (e.g., travel purpose and travel time), and psychological perception factors (e.g., safety, reliability, and comfort). Therefore, the questionnaire of this study includes traveler socioeconomic attributes, revealed preference (RP), and stated preference (SP).

2.2.1. Traveler Socioeconomic Attributes

Table 1 shows the socioeconomic attributes of travelers, including age, gender, education, and so on.

2.2.2. Revealed Preference

The RP survey encompasses travel-related factors and psychological perception factors. Travel-related factors are explicit variables because they are easily observable (see Table 2). The psychological perception factors are latent variables that are harder to observe (see Table 3). The corresponding observation variables are utilized to represent the latent variables and thus to explore the perceived service of each travel mode for cross-border travel. These latent variables include reliability, safety, convenience, and comfort, and the corresponding observation variables are quantitatively measured on a five-point scale.

2.2.3. Stated Preference

To investigate the factors that influence cross-border travel mode choice behavior, various attribute combinations were designed for travelers to choose from. Note that various travel modes and travel times result in varying travel patterns among public transport users. Hence, different observation variables are utilized for different travel modes and six levels are established (see Table 4). According to the uniform design experiment theory, six survey scenarios are generated considering various combinations of variables, as shown in Table 5.

2.3. Data Quality Control

To ensure the quality of the data collected from the questionnaire survey, reliability and validity tests were conducted. The reliability test utilized Cronbach’s alpha coefficient, while the validity test used the Kaiser–Meyer–Olkin (KMO) and Bartlett spherical tests. The results show that Cronbach’s alpha coefficient is 0.845, which is larger than the threshold value of 0.8, indicating that the data are reliable. Moreover, the KMO value is 0.891, which is greater than 0.8, and the significance level (i.e., $p$ value) of the Bartlett spherical test is less than 0.05, indicating that the variables are sufficiently correlated for factor analysis. To sum up, the collected travel data are suitable for analyzing the travel characteristics of the HZMB.
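As an illustration of the reliability test, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard definition. The function name `cronbach_alpha` and the synthetic scores are hypothetical; the survey data are not reproduced here.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Synthetic example: four items driven by one shared factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
scores = factor + 0.5 * rng.normal(size=(200, 4))
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Values above the chosen threshold (0.8 in this study) are read as acceptable internal consistency.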

2.4. RP/SP Data Fusion

RP and SP data reflect travel behavior characteristics from different perspectives. RP data capture the actual travel behavior patterns, while SP data provide information on the travel preferences and choices of travelers. Combined, RP/SP fusion data provide more comprehensive information for understanding travel behavior. In addition, previous studies demonstrated that more accurate results can be obtained in travel mode choice analysis using RP/SP fusion data rather than using RP data or SP data alone [33, 34]. Therefore, this study integrates RP and SP data with the help of the panel data method to analyze the cross-border travel behavior of the HZMB. In the random utility theory-based discrete choice model, the relationship between the random errors in the utility functions of RP and SP data can be represented as follows:

$$\operatorname{Var}\left(\varepsilon^{RP}\right) = \mu^{2}\operatorname{Var}\left(\varepsilon^{SP}\right) \tag{2}$$

where $\varepsilon^{RP}$ and $\varepsilon^{SP}$ denote the random errors of RP data and SP data, and $\mu$ denotes the measured coefficient, which can be calculated using the random errors obtained from the models based on RP data alone and SP data alone.

3. Model Description

This study utilizes the SEM to explore the potential correlation among the latent variables, and then proposes an MXL-BMA model to interpret the cross-border travel behavior under different travel modes.

3.1. Structural Equation Model

SEM is a statistical method that combines exploratory factor analysis and path analysis to explain the relationships between latent variables and observation variables, as well as among latent variables, and to explore the causal relationships and mechanisms between multiple variables. It can handle problems such as multiple mediation and multicollinearity to provide an accurate interpretation of the interrelationships between variables, and it also allows measurement error in the observation variables [35, 36]. In previous studies, SEM has been utilized to investigate the relationships among the latent variables that affect travel mode choice behavior, and between latent variables and the corresponding observation variables. It should be noted that the observation variables solely serve to measure the latent variables and do not impact the travel mode choice behavior. The SEM can be expressed as follows:

$$\eta = B\eta + \Gamma\xi + \zeta, \qquad x = \Lambda_{x}\xi + \delta, \qquad y = \Lambda_{y}\eta + \varepsilon \tag{3}$$

where $\xi$ and $\eta$ represent the exogenous latent variables and endogenous latent variables, respectively. $B$ denotes the regression coefficient matrix of relations among endogenous latent variables, and $\Gamma$ denotes the regression coefficient matrix of relations between exogenous latent variables and endogenous latent variables. $x$ and $y$ are the observation variables of the exogenous latent variables and endogenous latent variables, and $\Lambda_{x}$ and $\Lambda_{y}$ are the regression coefficient matrices between exogenous observation variables and exogenous latent variables and between endogenous observation variables and endogenous latent variables, respectively. $\zeta$, $\delta$, and $\varepsilon$ are the error vectors.

3.2. MXL-BMA Model
3.2.1. Mixed Logit Model

The MXL model [37, 38] assumes that travelers can repeatedly choose a travel mode and that the utility is potentially correlated with unobservable items. In addition, the observation variables are not required to be normally distributed or independent. In the MXL model, the coefficients of the explanatory variables are random parameters drawn from a distribution, which effectively addresses the IIA defect of the traditional MNL model. Let $\beta_{n}$ denote the coefficient vector of traveler $n$, which follows a distribution with parameter $\theta$, that is, $\beta_{n} \sim f(\beta \mid \theta)$. Then, the constructed MXL model can approximate any random utility model. The utility function can reflect the travel preference characteristics of individuals and provide a more comprehensive explanation of cross-border travel mode choice behavior. Random utility theory uses the utility function to represent the utility value and measure the travel mode choice. For the MXL model, the utility value of traveler $n$ choosing travel mode $i$ under scenario $t$ is determined by the coefficients of the explanatory variables, the estimated parameter of the factors affecting state-dependent heterogeneity, and two binary data indicator variables. The first indicator equals 1 if the travel choice of traveler $n$ matches the actual choice in the RP survey and 0 otherwise; the second equals 1 if traveler $n$ chooses travel mode $i$ under scenario $t$ and 0 otherwise.

The random parameter $\beta_{n}$ is utilized to capture individual differences in preference, and it is typically modeled using a probability distribution, such as the normal, log-normal, and uniform distributions, among others. This study selected four latent variables as random parameters and assumed that the parameters follow distribution. Hence, the utility function can be rewritten as the sum of random utility and error as follows:

$$U_{nit} = V_{nit} + \varepsilon_{nit} \tag{5}$$

where $V_{nit}$ is the random utility of the utility function and $\varepsilon_{nit}$ is the error. The probability of traveler $n$ choosing travel mode $i$ under scenario $t$ can be calculated as follows:

$$P_{nit} = \int \frac{\exp\left(\mu V_{nit}\right)}{\sum_{j}\exp\left(\mu V_{njt}\right)} f(\beta \mid \theta)\, d\beta \tag{6}$$

where $\mu$ denotes the measured coefficient of RP/SP data fusion.
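Because the mixed logit choice probability is an integral over the random coefficients, it is usually approximated by simulation: draw coefficients, compute a standard logit probability for each draw, and average. The sketch below illustrates this idea only. The function name `simulated_mxl_prob`, the normal mixing distribution, the linear utility, and all attribute values are assumptions for illustration and are not the specification estimated in this study.

```python
import numpy as np


def simulated_mxl_prob(x, beta_mean, beta_sd, mu=1.0, n_draws=2000, seed=0):
    """Simulated mixed logit probabilities for one choice situation.

    x: (n_alternatives, n_features) attribute matrix.
    beta_mean, beta_sd: mean and standard deviation of each random coefficient
    (a normal mixing distribution is assumed here).
    mu: scale applied to the utilities (1.0 means no rescaling).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # One coefficient vector per draw, shape (n_draws, n_features).
    betas = rng.normal(beta_mean, beta_sd, size=(n_draws, x.shape[1]))
    v = mu * betas @ x.T                   # utilities, (n_draws, n_alternatives)
    v -= v.max(axis=1, keepdims=True)      # guard against overflow in exp
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)      # logit probabilities for each draw
    return p.mean(axis=0)                  # average over the draws


# Hypothetical attributes (travel time in minutes, cost) for bus and private car.
x = [[30.0, 20.0], [45.0, 60.0]]
beta_mean = [-0.05, -0.02]
beta_sd = [0.02, 0.01]
probs = simulated_mxl_prob(x, beta_mean, beta_sd)
print(probs)
```

Setting every standard deviation to zero collapses the simulation to the ordinary multinomial logit probabilities, which is a convenient sanity check.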

3.2.2. Bayesian Model Averaging

The MXL model can avoid the bias arising from the IIA property, but it neglects model uncertainty; that is, it cannot determine whether an explanatory variable should be taken into account when analyzing travel behavior. This reduces the accuracy of the interpretation. The BMA approach chooses a set of candidate models as the model space and weights the candidate models by their posterior probabilities to take their fitting performance into account. It then integrates the outputs of the candidate models into a new model to depict the travel behavior. Therefore, the BMA approach is applied to overcome the model uncertainty of the constructed MXL models and improve the interpretability of travel behavior analysis [39, 40].

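BMA weights the models in the model space by their posterior probabilities and combines the results into a single model to depict the travel choice behavior. The model space contains $K$ candidate models, denoted by $\mathcal{M} = \{M_{1}, M_{2}, \ldots, M_{K}\}$. For the given data set $D$, the posterior probability of the quantity of interest $\Delta$ can be represented as follows:

$$p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_{k}, D)\, p(M_{k} \mid D) \tag{7}$$

where $p(\Delta \mid M_{k}, D)$ denotes the posterior probability of $\Delta$ under the candidate model $M_{k}$, $p(M_{k} \mid D)$ is the posterior model probability, and $\sum_{k=1}^{K} p(M_{k} \mid D) = 1$. According to the Bayes rule, $p(M_{k} \mid D)$ can be calculated by the following equation:

$$p(M_{k} \mid D) = \frac{p(D \mid M_{k})\, p(M_{k})}{\sum_{l=1}^{K} p(D \mid M_{l})\, p(M_{l})} \tag{8}$$

where $p(M_{k})$ denotes the prior probability of candidate model $M_{k}$ when it is regarded as the “true” model and $p(D \mid M_{k})$ is the marginal model likelihood of candidate model $M_{k}$, that is,

$$p(D \mid M_{k}) = \int p(D \mid \theta_{k}, M_{k})\, p(\theta_{k} \mid M_{k})\, d\theta_{k} \tag{9}$$

where $\theta_{k}$ is the parameter vector of model $M_{k}$, $p(\theta_{k} \mid M_{k})$ denotes the prior probability distribution of parameter $\theta_{k}$ under model $M_{k}$, and $p(D \mid \theta_{k}, M_{k})$ is the likelihood under the model $M_{k}$ with parameter $\theta_{k}$. Therefore, the posterior mean and variance can be calculated as follows:

$$E[\Delta \mid D] = \sum_{k=1}^{K} E[\Delta \mid D, M_{k}]\, p(M_{k} \mid D), \qquad \operatorname{Var}[\Delta \mid D] = \sum_{k=1}^{K} \left(\operatorname{Var}[\Delta \mid D, M_{k}] + E[\Delta \mid D, M_{k}]^{2}\right) p(M_{k} \mid D) - E[\Delta \mid D]^{2} \tag{10}$$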
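The weighting step can be sketched numerically. The example below approximates each marginal likelihood by exp(-BIC/2), which is a common shortcut and an assumption here, since the text does not state how the marginal likelihoods were evaluated. The function names and all numbers are hypothetical.

```python
import numpy as np


def posterior_model_probs(bic, prior=None):
    """Posterior model probabilities using p(D | M_k) ~ exp(-BIC_k / 2) (assumed)."""
    bic = np.asarray(bic, dtype=float)
    if prior is None:
        prior = np.full(bic.shape, 1.0 / bic.size)   # uniform model prior
    prior = np.asarray(prior, dtype=float)
    log_w = -0.5 * (bic - bic.min()) + np.log(prior)
    w = np.exp(log_w - log_w.max())                  # stable exponentiation
    return w / w.sum()


def bma_mean_var(estimates, variances, pmp):
    """Model-averaged posterior mean and variance of one coefficient."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    pmp = np.asarray(pmp, dtype=float)
    mean = np.sum(pmp * est)
    # Within-model variance plus the spread of the estimates across models.
    total_var = np.sum(pmp * (var + est ** 2)) - mean ** 2
    return mean, total_var


pmp = posterior_model_probs([1000.0, 1002.0, 1010.0])
mean, var = bma_mean_var([0.24, 0.20, 0.31], [0.004, 0.005, 0.006], pmp)
print(pmp, mean, var)
```

The averaged variance is never smaller than the weighted within-model variance, which is how disagreement between candidate models shows up as additional uncertainty.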

One difficulty in implementing the BMA approach is how to determine a proper model space. A proper model space not only saves effort on low-performing models but also improves interpretability. A popular method is Occam’s window, which is based on the principle of parsimony [41]. It advocates selecting the simplest models with the fewest assumptions, which mitigates the problems of overfitting, poor generalization, and low interpretability in complex models. Therefore, Occam’s window is applied to determine a proper model space. It mainly follows these two principles:

First, if the posterior probability of a model is much smaller than that of the optimal model, this model should be removed from the model space, as shown in equation (11):

$$\frac{\max_{l} p(M_{l} \mid D)}{p(M_{k} \mid D)} > C \tag{11}$$

where $\max_{l} p(M_{l} \mid D)$ is the maximum posterior model probability and $M_{k}$ is the model to be removed. The value of $C$ is determined according to the actual situation. After many attempts, $C$ is set to 20 in this study.

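Second, if the posterior probability of a complex model is smaller than that of a simpler model nested within it, the complex model should not be considered, as shown in equation (12):

$$\frac{p(M_{l} \mid D)}{p(M_{k} \mid D)} > 1, \qquad M_{l} \subset M_{k} \tag{12}$$

where $M_{l}$ is the simpler model and $M_{k}$ is the complex model. This principle is used to exclude some complex models, and the model space is generally reduced to one or two models.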

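To sum up, equation (7) can be rewritten as follows:

$$p(\Delta \mid D) = \sum_{M_{k} \in \mathcal{A}} p(\Delta \mid M_{k}, D)\, p(M_{k} \mid D) \tag{13}$$

where $\mathcal{A}$ is the set of candidate models retained by Occam’s window.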
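The two pruning rules and the final renormalization can be sketched as a small filter over posterior model probabilities. The function name `occams_window`, the model labels, and the probabilities are hypothetical, and the list of (simpler, more complex) pairs must be supplied by the analyst, since nesting cannot be inferred from the probabilities alone.

```python
def occams_window(pmp, c=20.0, nested=()):
    """Prune a model space with Occam's window and renormalize the weights.

    pmp: dict mapping model name to posterior model probability.
    c: a model is dropped when the best model is more than c times as probable.
    nested: iterable of (simple, complex) pairs; the complex model is dropped
            when the simpler model nested in it is more probable.
    """
    best = max(pmp.values())
    # Rule 1: drop models far less probable than the best one.
    keep = {m for m, p in pmp.items() if p * c >= best}
    # Rule 2: drop a complex model beaten by a simpler nested model.
    for simple, complex_model in nested:
        if simple in keep and complex_model in keep and pmp[simple] > pmp[complex_model]:
            keep.discard(complex_model)
    total = sum(pmp[m] for m in keep)
    return {m: pmp[m] / total for m in keep}


weights = occams_window(
    {"A": 0.60, "B": 0.30, "C": 0.09, "D": 0.01},
    c=20.0,
    nested=[("B", "C")],   # hypothetical: B is a restricted version of C
)
print(weights)
```

In this toy case model D fails the first rule, model C fails the second, and the remaining weights are rescaled to sum to one before averaging.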

4. Results

4.1. Latent Variables Analysis

The SEM is utilized to explore the internal relationships between latent variables, as well as between latent variables and the corresponding observation variables. The calibration results of the SEM are listed in Table 6. It can be observed that the coefficients linking the latent variables to the corresponding observation variables are all greater than 0.85 (the minimum value is 0.858), which indicates that the observation variables represent the latent variables well. Moreover, the absolute values of the coefficients between different latent variables exceed 0.5, indicating a strong association. Note that a positive coefficient denotes a positive relationship between latent variables, and a negative coefficient represents a negative relationship. For example, the coefficient between reliability and convenience is −1.397, indicating that the convenience decreases when the reliability increases. In addition, all $p$ values are below 0.05, revealing that the calibration results are statistically significant.

To further verify the calibration results of the SEM, the chi-square ($\chi^{2}$), the goodness of fit index (GFI), root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis index (TLI), and incremental fit index (IFI) are selected as evaluation metrics, and the results are listed in Table 7. It can be seen that all the evaluation metrics meet their respective thresholds, illustrating that the constructed SEM is well calibrated and can effectively capture the potential relationship between latent variables and observation variables. Note that the threshold of $\chi^{2}$ varies with the sample size.

4.2. Factors Analysis of Travel Mode Choice

The MXL-BMA model is applied to explain the characteristics of cross-border travel behavior, and Occam’s window method is utilized to remove the complex models with poor performance. The results are shown in Table 8. Four high-performance MXL models (denoted as model #1, model #2, model #3, and model #4) with posterior probabilities larger than 0.05 are selected, and their total posterior probability is 96.7%. Notably, model #1 has the highest posterior probability, at only 56.8%, which suggests the presence of model uncertainty in analyzing travel behavior.

Taking the travel mode of the cross-border taxi as the reference, the parameters of the cross-border bus and cross-border private car travel modes are estimated, and the fitting results of the four high-performance models are listed in Tables 9 and 10. It should be mentioned that the variables with only a slight impact on travel modes are not listed in the tables. The measured coefficient for RP/SP data fusion is calculated by equation (2) from the models based on RP data and SP data. Its value is 0.245, which is in the range [0, 1], indicating that the proposed choice models for cross-border travel match the utility maximization principle. Note that the parameter estimates of the SP variables are revised values, which are multiplied by the measured coefficient of RP/SP data fusion.

For the socioeconomic attributes of travelers, gender has a significant effect on the choice of cross-border bus. The coefficient is negative, indicating that female travelers are more inclined to choose buses for cross-border travel. Travel purpose shows a noticeable impact on the choice of cross-border private car, and tourists tend to take the cross-border bus (the coefficient is negative). The transportation mode from the origin to Zhuhai plays a salient role in cross-border travel, with the effects of bus and private car being particularly pronounced. Specifically, the coefficient for the transportation mode of bus in the cross-border bus travel mode is 4.272, while that of private car in the cross-border private car travel mode is 5.075. For the transportation mode from Hong Kong to the destination, the travelers who prefer to take the bus or subway are more inclined to choose the cross-border bus travel mode, which is consistent with the transportation mode from the origin to Zhuhai. As for the cross-border private car travel mode, the choice of transportation mode is consistent with the previous leg (i.e., from the origin to Zhuhai): these travelers tend to take a private car for cross-border travel. In addition, the travel time from Hong Kong to the destination shows a more significant influence on travel mode, particularly for the cross-border private car travel mode, with a coefficient of 0.238, suggesting that time-conscious travelers tend to consider using a private car for cross-border travel. Moreover, whether travelers travel together between Hong Kong and Macao and the number of transfers significantly affect the choice of the cross-border bus travel mode.

For the variables of the travel scenario, the travel time, travel cost, and waiting time show a significant impact on travel mode choice behavior, while parking time (or parking cost) has a negligible effect. The coefficient of travel time is −0.149, indicating a more pronounced impact on travel mode choice behavior than travel cost and waiting time. This suggests that travelers are more sensitive to travel time when deciding on a travel mode. Additionally, the travel scenario preference reflects the dependence effect with a coefficient of −0.894 < 0. This illustrates a noticeable difference in travel behavior between the SP and RP scenarios, implying that travelers tend to seek alternative cross-border travel modes when the services and policies of the current travel mode vary.

For the latent variables, safety, comfort, and convenience have a noteworthy impact on cross-border travel choice behavior. The coefficient of safety for the cross-border bus travel mode is positive, indicating that travelers who pay more attention to safety in cross-border travel tend to take buses. In addition, the coefficients of comfort are positive for both the cross-border bus and cross-border private car travel modes, with a higher coefficient value for the cross-border private car. This reflects that travelers who prioritize comfort tend to choose private cars for cross-border travel. The variable of convenience has a significant impact on both travel modes, with negative coefficients in both cases. This indicates that travelers who choose the cross-border bus place a higher emphasis on convenience than those who take cross-border private cars. To sum up, convenience shows a more remarkable impact on cross-border travel mode choice behavior than the other latent variables.

4.3. Discussion

The BMA approach integrates the outputs of the candidate models into a single model to improve the interpretability of travel mode choice behavior. To quantitatively evaluate the fitting performance of the MXL-BMA model and the MXL models in the model space, the Bayesian information criterion (BIC) and Akaike information criterion (AIC) are used as metrics, and the results are listed in Table 11. The goodness of fit results demonstrate that the MXL-BMA model provides a superior description of cross-border travel behavior compared to the MXL model. Notably, the MXL model selected in this section is the one with the best fitting performance among the candidate models.
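For reference, both criteria follow directly from a model's maximized log-likelihood, its number of estimated parameters, and the sample size, with lower values indicating a better trade-off between fit and complexity. The sketch below uses the standard definitions; the log-likelihoods and parameter counts are hypothetical and are not the values in Table 11.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fits on 811 observations.
candidates = {"model_a": (-410.0, 12), "model_b": (-405.0, 18)}
for name, (ll, k) in candidates.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, 811), 1))
```

Because BIC penalizes each parameter by ln(n) rather than 2, it favors the smaller model more strongly than AIC once the sample exceeds about eight observations.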

To comprehensively compare the explanatory performance of the MXL-BMA model and the MXL model in analyzing cross-border travel mode choice behavior, the variables that contribute strongly to the utility function (i.e., have a great impact on travel mode choice behavior) and differ most between the two models are taken as examples, namely the transportation modes from the origin to Zhuhai (i.e., bus and private car) and from Hong Kong to the destination (i.e., bus, private car, and subway), as shown in Table 12. It can be observed that the parameter coefficients of the MXL-BMA model are more effective than those of the MXL model in capturing the influence of connecting transportation modes on cross-border travel behavior. For the MXL-BMA model, the transportation mode of private car from the origin to Zhuhai differs from the other transportation modes, indicating that travelers who choose the cross-border bus travel mode do so because they lack a private license plate permitted to pass through the HZMB. As for the transportation mode of bus from Hong Kong to the destination, the MXL-BMA model exhibits a 7.5% increase in coefficient compared to the MXL model for the cross-border private car travel mode, while showing a 4.5% decrease for the cross-border bus travel mode. Moreover, regarding the transportation mode of subway, the coefficient of the MXL-BMA model increases by 8%, indicating that the MXL-BMA model provides a more accurate representation of the influence of the developed public transport system of Hong Kong on cross-border travel. In addition, the coefficient differences of the other variables between the two models are small, which reflects their comparatively lower impact on the overall utility function.

5. Conclusions

This study proposed an MXL-BMA model to describe the cross-border travel behavior of HZMB travelers using RP/SP survey data. An SEM is constructed to explore the potential correlations between latent variables, and the BMA approach is applied to the MXL models to explain the cross-border travel mode choice characteristics. The main conclusions are summarized as follows:

(1) The MXL-BMA model effectively captures the cross-border travel mode choice behavior on the HZMB. Specifically, the number of transfers and the connecting transportation modes during cross-border travel on the HZMB exhibit a significant influence on the cross-border bus travel mode, whereas time-conscious travelers tend to select the cross-border private car. To sum up, the transportation modes used to arrive at and depart from the HZMB have a noticeable impact on cross-border travel modes.

(2) The MXL-BMA model demonstrates a superior ability to explain cross-border travel mode choice behavior compared to the MXL model. Furthermore, for the section from the origin to Zhuhai, the MXL-BMA model reveals that travelers choose the cross-border bus instead of the private car because of restricted access to the HZMB. Conversely, for the section from Hong Kong to the destination, the MXL-BMA model can more accurately explore the characteristics of the travelers who choose the private car.

Note that the dataset used in this study was collected from an online questionnaire survey, which may not entirely represent the actual travel behavior. For future work, the data collection range can be expanded to improve data quality, and machine learning-based methods (e.g., Bayesian-related approaches and decision tree-based approaches) can be applied to explore cross-border travel mode choice behavior.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Data curation was performed by Bo Lin; Formal analysis was performed by Yajie Zou, Wanbing Han, and Shubo Wu; Funding acquisition was performed by Bing Wu and Linbo Li; Investigation was performed by Bo Lin and Malik Muneeb Abid; Methodology was performed by Yajie Zou, Wanbing Han, and Bo Lin; Supervision was performed by Yajie Zou, Bing Wu, and Linbo Li; Validation was performed by Wanbing Han, Bo Lin, and Malik Muneeb Abid; Yajie Zou, Bo Lin, and Shubo Wu wrote the original draft; Yajie Zou, Wanbing Han, and Malik Muneeb Abid reviewed and edited the manuscript. All authors reviewed the results and approved the final version of the manuscript.

Acknowledgments

This research was funded by the National Key Research and Development Program (2019YFB1600703) and the Shanghai Science and Technology Committee (19210745700).