Abstract

With the fast development of metro systems in many big cities, it is important for management to study the characteristics of passenger flows based on metro data in order to guarantee service quality and safety. In this article, we build statistical models for the data of passengers’ tap-in and tap-out times in both no-transfer and one-transfer cases, and propose a Bayesian approach to estimate the parameters in the models. These estimators can be used to evaluate a number of measures that describe degrees of congestion and comfort, and to quantify their uncertainties. An application of our approach to the Beijing metro shows that passenger flows follow different patterns across routes and between off-peak and peak hours.

1. Introduction

As urban rail transit systems become more complex, there is a pressing need to ensure the safety and operational efficiency of urban rail transit in big cities. To alleviate traffic congestion and to meet passengers’ travel comfort requirements, operation management departments have introduced many measures, such as adding current-limiting fences and adjusting train operations. It is therefore very important to analyse metro passenger flows, especially during peak hours. Such a study can provide technical support for operators to implement reasonable management and efficient guidance, which are crucial to assuring the quality and safety of the metro service.

With the continuous improvement of rail transit, automatic fare collection (AFC) systems can collect a wealth of passenger travel information and provide important data support for metro operation departments to study passenger travel characteristics, formulate passenger flow control measures, and adjust driving plans. In recent years, mining the behavior characteristics of passengers from AFC data has become an active branch of passenger flow research (see, e.g., Paul [1]; Kusakabe et al. [2]; Sun and Xu [3]; Zhao and Yao [4]; Zhou et al. [5]; Sun and Schonfeld [6]; Chen et al. [7]; and Wu et al. [8]). For a fixed pair of origin-destination stations, studying the behavior of passengers based only on AFC data is essentially a black-box problem because boarding information is lacking, which makes it challenging to describe passenger flows accurately. Probabilistic models are appropriate tools for this problem. Fu et al. [9]; Lee and Sohn [10]; and Hörcher et al. [11] estimated the posterior probabilities of choosing different routes. Zhu et al. [12] proposed a probabilistic passenger-to-train assignment model to infer the probability of the passenger boarding each feasible train. Zhu et al. [13] inferred left-behind passengers in congested metro systems based on the probabilistic passenger-to-train assignment model. Besides AFC data, these probabilistic models need more information, such as the distribution of access time. Zhao et al. [14] developed a method to estimate passengers’ route choice pattern using smart card data. However, their method assumes that the time passengers spend walking between the platform and the entrance/exit of the origin or destination station is less than the departure interval between two adjacent trains. In practice, this assumption may not hold, especially during peak hours. Leurent and Xie [15] proposed a stochastic model to reposition the distances along platforms during train waiting. Their method requires both smartphone and AFC data, but smartphone data are not easy to obtain in many studies. With only AFC data, few studies have presented accurate probability distribution models. Recently, Xiong et al. [16] proposed a new passenger-to-train assignment model. By deriving the probability distributions of AFC data, they provided maximum-likelihood estimates of the unknown parameters in this model in no-transfer situations.

This article presents a Bayesian analysis of the statistical models of AFC data in both no-transfer and one-transfer situations. Our work extends Xiong et al. [16] in two directions. First, Xiong et al. [16] provided only point estimates of the parameters, without uncertainty quantification. Instead, we propose a Bayesian inferential method that quantifies the uncertainty of parameter estimation, so that interval estimates for the parameters and related passenger indices can be derived. Second, we show that the estimation method of Xiong et al. [16] cannot be straightforwardly extended to one-transfer situations because of an identifiability problem. With some prior information, our Bayesian method avoids this problem and yields reasonable estimators. Examples including real data applications and simulation results show the effectiveness of the proposed method.

The article is organized as follows. Section 2 presents the Bayesian inferential method for no-transfer situations. Section 3 describes our method for one-transfer situations. Section 4 gives applications of Bayesian estimators in deriving passengers’ characteristic indices. Section 5 conducts simulation studies. Section 6 provides real data examples in the Beijing metro system. Section 7 draws conclusions.

2. Bayesian Inference for No-Transfer Situations

In this section, we focus on a typical metro route in which the passenger taps in at the origin station and taps out at the destination station after taking only one train; that is, there is no transfer station. We make Bayesian inference for the metro passenger flow, including estimating the probabilities of taking trains and the probability density function of egress time, where egress time is the time it takes to walk to the tap-out fare gate after alighting from the train.

Since millions of people take metro trains every day in big cities like Beijing, it is not difficult to collect sufficient data from passengers whose tap-in times are almost the same in large stations [16]. Suppose we get the AFC data of passengers whose tap-in times are . Let denote the set of their tap-out times. In addition, we assume that the egress times of the passengers are independently and identically distributed from the log-normal distribution with unknown parameters , where and . A random variable is distributed from if is distributed from the normal distribution with mean and variance . The log-normal distribution is commonly used to describe passengers’ walking times at metro stations [17]. Define

Here, and can be obtained from the automatic vehicle location (AVL) system. Let , and . We need to estimate the unknown parameters . Based on the joint distribution of , the likelihood function can be given by

and the posterior distribution is

where the prior distributions of and denoted by and , respectively, are independent.

Here, we choose the prior distribution of as the noninformative prior [18]. For the probability parameters , the Dirichlet distribution [19] is a natural prior. With hyperparameter vector , let , that is,

where and . In our problem, it does not seem reasonable to use , which gives uniform priors for . We instead adopt a data-driven method to specify the hyperparameters. Let be the maximum-likelihood estimators of (see Xiong et al. [16]). We use in (4). It should be pointed out that the selection of does not heavily influence the posterior when the sample size is sufficiently large.

We apply the Metropolis–Hastings algorithm [20, 21] to draw random samples from the posterior in (3), from which the corresponding Bayesian point and interval estimates of the parameters can be obtained.
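The sampling step can be illustrated with a minimal random-walk Metropolis–Hastings sketch in Python. It is not the authors' implementation, and several ingredients are assumptions made only for illustration: the tap-out time is modelled as a train arrival time plus a log-normal egress time, the noninformative prior on the log-normal parameters is taken proportional to 1/sigma, and the names `arrivals`, `alpha`, `mu`, and `sigma`, the flat Dirichlet hyperparameters, and all numerical values are hypothetical.

```python
import numpy as np

def log_posterior(theta, tap_outs, arrivals, alpha):
    """Unnormalised log posterior for theta = (p_1, ..., p_{J-1}, mu, sigma)."""
    n_trains = len(arrivals)
    head = theta[:n_trains - 1]
    p = np.append(head, 1.0 - head.sum())       # last probability is implied
    mu, sigma = theta[n_trains - 1], theta[n_trains]
    if np.any(p <= 0.0) or sigma <= 0.0:
        return -np.inf                           # outside the parameter space
    # Assumed model: tap-out time = train arrival time + log-normal egress time.
    lags = tap_outs[:, None] - arrivals[None, :]
    dens = np.zeros_like(lags)
    ok = lags > 0.0
    dens[ok] = np.exp(-(np.log(lags[ok]) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        lags[ok] * sigma * np.sqrt(2.0 * np.pi))
    mixture = dens @ p
    if np.any(mixture <= 0.0):
        return -np.inf
    # Dirichlet(alpha) prior on p; assumed prior proportional to 1/sigma on (mu, sigma).
    log_prior = np.sum((alpha - 1.0) * np.log(p)) - np.log(sigma)
    return np.sum(np.log(mixture)) + log_prior

def metropolis_hastings(log_post, theta0, n_iter, step, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    theta = np.array(theta0, dtype=float)
    current = log_post(theta)
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_post(proposal)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[i] = theta
    return draws

rng = np.random.default_rng(0)
arrivals = np.array([10.0, 13.0, 16.0])          # hypothetical arrival times (minutes)
true_p, true_mu, true_sigma = np.array([0.6, 0.3, 0.1]), 0.3, 0.4
trains = rng.choice(3, size=300, p=true_p)
tap_outs = arrivals[trains] + rng.lognormal(true_mu, true_sigma, size=300)
alpha = np.ones(3)                               # flat Dirichlet, for illustration only
start = np.array([1.0 / 3.0, 1.0 / 3.0, 0.0, 1.0])
draws = metropolis_hastings(
    lambda th: log_posterior(th, tap_outs, arrivals, alpha), start, 4000, 0.03, rng)
posterior = draws[1000:]                         # discard burn-in
print(posterior.mean(axis=0))                    # posterior means of (p_1, p_2, mu, sigma)
print(np.quantile(posterior, [0.025, 0.975], axis=0))
```

Proposals that leave the parameter space receive a log posterior of minus infinity and are always rejected, which keeps the symmetric proposal valid without any reparameterisation. The step size and burn-in length here are untuned guesses and would need checking on real data.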

3. Bayesian Inference for One-Transfer Situations

This section discusses one-transfer situations. We first show that the parameters in the likelihood function are not identifiable based on only automated data. Then, we give the Bayesian estimation method with some data-driven priors.

A one-transfer metro trip with two segments can be described by the following four steps: (1) tapping in and walking to the platform at the origin station, (2) waiting on the platform until boarding the desired train, (3) transferring between platforms and waiting until boarding another train, and (4) getting off the train and tapping out at the exit gate (see Figure 1).

Similar to the previous section, suppose the tap-in times of the passengers are . Let denote the set of their tap-out times, and assume that the egress times of the passengers are independently and identically distributed from . Define

Without loss of generality, let the smallest values of in all be 1; that is, and both begin with and . We have and . For example, if , then , and . Let , , and .

Let denote the probability that the passenger takes the th feasible itinerary on segment 1 and the th feasible itinerary on segment 2 for , . We have . Given the observations , the likelihood function is

where , and . In the maximum-likelihood estimation method that maximizes (6), we can only obtain the estimators of , while the individual are inestimable, . For example, let and . The parameter vector contains four components. By maximizing (6), we can estimate only two quantities, and . The estimators of individual cannot be determined. Therefore, the maximum-likelihood estimation method in Xiong et al. [16] for no-transfer situations cannot be extended to the one-transfer situations.
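The identifiability problem can be made concrete with a small numerical check. The Python sketch below assumes, purely for illustration, that the tap-out density depends on an itinerary only through the arrival time of the second-segment train, so that itinerary probabilities sharing the same final train enter the likelihood only through their sum. The arrival times, tap-out times, and parameter values are hypothetical.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Log-normal density, set to zero for non-positive arguments."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0.0
    out[pos] = np.exp(-(np.log(x[pos]) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        x[pos] * sigma * np.sqrt(2.0 * np.pi))
    return out

def log_likelihood(tap_outs, final_arrivals, p, mu, sigma):
    """p[j, k]: probability of the j-th train on segment 1 and k-th on segment 2."""
    p = np.asarray(p, dtype=float)
    dens = np.zeros(len(tap_outs))
    for j in range(p.shape[0]):
        for k in range(p.shape[1]):
            # Assumption: only the final train k determines the tap-out density.
            dens += p[j, k] * lognormal_pdf(tap_outs - final_arrivals[k], mu, sigma)
    return float(np.sum(np.log(dens)))

tap_outs = np.array([22.0, 23.5, 25.0, 26.2])    # hypothetical tap-out times (minutes)
final_arrivals = np.array([20.0, 24.0])          # hypothetical arrivals at the destination
p_a = [[0.5, 0.2], [0.1, 0.2]]                   # column sums 0.6 and 0.4
p_b = [[0.3, 0.1], [0.3, 0.3]]                   # different entries, same column sums
ll_a = log_likelihood(tap_outs, final_arrivals, p_a, mu=0.5, sigma=0.4)
ll_b = log_likelihood(tap_outs, final_arrivals, p_b, mu=0.5, sigma=0.4)
print(ll_a, ll_b)
```

Under this assumption the two probability matrices are indistinguishable from the data alone, so only the sums can be recovered by maximum likelihood, and separating the individual entries requires information from elsewhere, such as a prior.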

To overcome the above identifiability problem, we use prior information of the parameters and present a Bayesian inferential approach. Similar to the previous section, let the prior distributions of and be independent. We choose the prior distribution of as the noninformative prior and assign the Dirichlet prior to , that is,

where are hyperparameters.

Consequently, the posterior distribution of is

where is given in (6). It can be seen from (8) that the individual is not completely involved in . The Metropolis–Hastings algorithm can be used to draw random samples from the posterior distribution of each component in the parameter vector .

We now provide a method to specify the hyperparameters in (7). First, we can compute the maximum-likelihood estimators of in (6) for . Next, consider the segment from the transfer station to the destination station, which is a no-transfer route. Let denote the probability that the passenger, whose tap-in time is , takes his/her th feasible itinerary in this no-transfer route for , where can be different from . Based on the automated data on the segment, we can use the maximum-likelihood estimation method in Xiong et al. [16] to give estimators of . For , we modify as . Consequently, we select

4. Applications

This section presents some applications of the Bayesian estimators derived in the previous sections to the analysis of metro passenger flows. First of all, we should point out that the parameters and can be viewed as measures that describe some aspects of characteristics of metro passenger flows. Therefore, our estimators can be used to infer these measures. Furthermore, let denote the boarding time, that is, the time from tap-in to boarding a train; and let denote the transfer time, that is, the time from alighting from the first train to boarding the second train. In the following, we infer and from both population and individual perspectives based on the estimators of and .

4.1. Population Inference

In no-transfer situations, for a passenger whose tap-in time is , his/her boarding time is a random variable, which only takes the finite values . It is not difficult to derive the probability distribution, mean, and standard deviation of boarding time as

respectively. With the estimators of in Section 2, we can estimate these quantities.
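As a minimal numerical sketch of this population-level calculation, the Python snippet below treats the boarding time as a discrete random variable. It assumes that the boarding time on the j-th feasible train equals that train's departure time minus the tap-in time; the function name, the departure times, and the probabilities are hypothetical.

```python
import numpy as np

def boarding_time_summary(tap_in, departures, probs):
    """Support, mean, and standard deviation of a discrete boarding time."""
    departures = np.asarray(departures, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("boarding probabilities must sum to one")
    values = departures - tap_in                 # assumed boarding time per train
    mean = float(np.sum(probs * values))
    sd = float(np.sqrt(np.sum(probs * (values - mean) ** 2)))
    return values, mean, sd

# Hypothetical example: tap-in at minute 0, trains leaving at minutes 1, 3, and 5.
values, mean, sd = boarding_time_summary(0.0, [1.0, 3.0, 5.0], [0.5, 0.3, 0.2])
print(values, mean, sd)
```

Applying the same function to each posterior draw of the boarding probabilities would give a posterior sample of the mean boarding time, from which an interval estimate follows.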

For one-transfer situations, the probability distribution, mean, and standard deviation of of passengers who tap in at time are

respectively. With the estimators of in Section 3, we can also give estimators of the above quantities. Similar to , the transfer time is also a random variable that takes the finite values . We can derive the probability distribution, mean, and standard deviation of as

respectively, and the estimators of can also be plugged into the above equations.

4.2. Individual Inference

For no-transfer situations, Zhu et al. [12] presented a probabilistic passenger-to-train assignment model (PTAM), which can be used to infer the most likely train that a certain passenger took once its parameters are estimated. Xiong et al. [16] proposed a modified PTAM (MPTAM) for both no-transfer and one-transfer situations. The parameters in MPTAM are actually those in the likelihood functions (1) and (6), which can be estimated by the proposed Bayesian method.

We first consider no-transfer situations. If the tap-in time and tap-out time of a passenger are and , respectively, then the MPTAM formula gives

With the estimators of and in Section 2, (13) can infer the probability that the passenger took each itinerary. In addition, we can derive the posterior distribution, mean, and standard deviation of boarding time of this passenger as

respectively, and the estimators of in (13) can be plugged into the above equations to give estimated values.
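The individual-level inference can be sketched with Bayes' rule. The Python snippet below is an illustration under assumptions, not the MPTAM formula itself: it supposes that the probability of a train given the tap-out time is proportional to the boarding probability of that train times the log-normal egress density evaluated at the tap-out time minus the train's arrival time. All names and numbers are hypothetical.

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Log-normal density, set to zero for non-positive arguments."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0.0
    out[pos] = np.exp(-(np.log(x[pos]) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        x[pos] * sigma * np.sqrt(2.0 * np.pi))
    return out

def train_posterior(tap_out, arrivals, probs, mu, sigma):
    """Probability that the passenger took each feasible train, given the tap-out time."""
    arrivals = np.asarray(arrivals, dtype=float)
    weights = np.asarray(probs, dtype=float) * lognormal_pdf(tap_out - arrivals, mu, sigma)
    total = weights.sum()
    if total <= 0.0:
        raise ValueError("tap-out time precedes every feasible arrival")
    return weights / total

arrivals = [10.0, 13.0, 16.0]                    # hypothetical arrival times (minutes)
probs = [0.6, 0.3, 0.1]                          # hypothetical boarding probabilities
early = train_posterior(12.0, arrivals, probs, mu=0.3, sigma=0.4)
late = train_posterior(14.0, arrivals, probs, mu=0.3, sigma=0.4)
print(early)   # only the first train has arrived by minute 12
print(late)    # the first two trains share the probability mass
```

A passenger who taps out before the second train arrives can only have taken the first train, which is why such probabilities stay at 1 over a stretch of tap-out times and then switch between trains.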

For one-transfer trips, the MPTAM formula is

The posterior distributions, means, and standard deviations of boarding time and transfer time of this passenger are

respectively. Estimators of the above quantities can be computed based on the estimators of and in Section 3.

5. Simulation Study

5.1. No-Transfer Cases

In this section, we conduct a simulation study to evaluate the proposed Bayesian method for no-transfer situations. Consider the combinations of parameters as

We assume that all passengers tap in at time 0, and let , , (unit: minute). According to each parameter combination, we generate tap-out times with and 300, and then use the proposed Bayesian method in Section 2 to estimate the parameters. The R package MCMC is adopted to implement the Bayesian inference. Root mean square errors (RMSEs) over 100 replicates are displayed in Table 1. It can be seen that the estimates become more accurate as increases.
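The data-generating step and the RMSE criterion can be sketched as follows in Python. The generator assumes that a tap-out time equals the arrival time of the boarded train plus a log-normal egress time; the arrival times, probabilities, log-normal parameters, and the replicate estimates passed to `rmse` are made-up values, not those of the study.

```python
import numpy as np

def simulate_tap_outs(n, arrivals, probs, mu, sigma, rng):
    """Draw n tap-out times as train arrival time plus log-normal egress time."""
    arrivals = np.asarray(arrivals, dtype=float)
    trains = rng.choice(len(arrivals), size=n, p=probs)   # which train each passenger boards
    egress = rng.lognormal(mean=mu, sigma=sigma, size=n)
    return arrivals[trains] + egress

def rmse(estimates, truth):
    """Root mean square error of replicate estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

rng = np.random.default_rng(1)
tap_outs = simulate_tap_outs(300, [10.0, 13.0, 16.0], [0.6, 0.3, 0.1], 0.3, 0.4, rng)
print(tap_outs[:5])
print(rmse([0.28, 0.33, 0.31], truth=0.3))       # made-up replicate estimates
```

In a full study, each replicate would generate a fresh dataset, run the sampler, and record the posterior mean of each parameter before the RMSE is computed across replicates.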

5.2. One-Transfer Cases

We next conduct a simulation study for one-transfer cases. The following parameter combinations are considered:

We assume that all passengers tap in at time 0, and let , (unit: minute). Observations of tap-out times are generated according to each parameter combination with and 200. In the proposed Bayesian method, we use the method in the last paragraph of Section 3 to determine the hyperparameters in (7). We first compute the estimators . Since there are no data to estimate , we randomly generate in with . The hyperparameters are set as . The value of describes the distance between and . The smaller the , the more informative the prior.

We also use the R package MCMC to implement the Bayesian inference proposed in Section 3. For and , root mean square errors (RMSEs) of our estimators over 100 replicates are shown in Table 2. As expected, the proposed estimators perform well with small .

6. Case Study

6.1. Two No-Transfer Routes

In this section, we apply our method to analyse several datasets from Beijing metro. The Beijing metro system is complex with 311 stations and 23 lines [22]. We first consider two no-transfer routes in Line 6 of Beijing metro as follows:

Route 1: from Qingnianlu to Dongdaqiao (4 stops)

Route 2: from Huangqu to Dongdaqiao (6 stops)

For Route 1, there are 25 and 649 trips collected during off-peak and peak hours, respectively. For Route 2, we have the corresponding data of sample sizes 15 and 525, respectively. All the data were extracted from weekdays in one month of 2019 at tap-in time 7:00 am (peak hour) and 1:00 pm (off-peak hour). We conduct the Bayesian inferential method in Section 2 via the R package MCMC. Table 3 presents the results of estimated parameters. The point estimators (posterior means) and the limits of credible intervals are reported. From Table 3, we can see that is larger than in Route 1 during off-peak hours, and this is most likely because the passengers’ tap-in times are very close to the departure time of the first train. Besides, during peak hours, the probabilities are not highly concentrated on a single train, which reflects the presence of crowded stations at that time. The origin station of Route 1, Qingnianlu, lies on Route 2, and during peak hours it is more difficult for passengers on Route 1 to board a train than for those on Route 2, which is reflected by the fact that and of Route 1 are larger than those of Route 2.

Figure 2 depicts the density curves of the egress times, and Table 4 presents some characteristics including means, standard deviations (SDs), and modes of the estimated densities. These results show that there is no big difference in egress time between the peak and off-peak hours for both Route 1 and Route 2. This is because Dongdaqiao, the destination station of the two routes, is a relatively unpopular station in Line 6. Thus, even during peak hours, this station is not so crowded. The passengers on the crowded trains are more likely to get off at the next stations in Line 6.
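The characteristics of an estimated egress-time density follow from the standard closed-form expressions for a log-normal distribution. The short Python sketch below evaluates them; here `mu` and `sigma` stand for the mean and standard deviation of the logarithm of the egress time, and the numerical values are hypothetical rather than the estimates reported in the tables.

```python
import math

def lognormal_summary(mu, sigma):
    """Mean, standard deviation, and mode of a log-normal distribution."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    sd = math.sqrt((math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2))
    mode = math.exp(mu - sigma ** 2)
    return mean, sd, mode

# Hypothetical parameter values on the log scale (egress time in minutes).
mean, sd, mode = lognormal_summary(mu=0.5, sigma=0.4)
print(mean, sd, mode)
```

Because the log-normal density is right-skewed, the mode always lies below the mean, so the two can give noticeably different impressions of a typical egress time.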

Besides parameter estimation, as mentioned in Section 4, we can infer boarding times of the passengers. Table 5 presents the means and credible-interval limits of boarding times. We can see that the boarding times of peak and off-peak hours are close in Route 1, and this is because the headway between every two trains during off-peak hours is longer than that during peak hours. On the other hand, the boarding times in Route 1 are longer than those in Route 2, which is consistent with the fact that in Route 1 is much less than that in Route 2.

We next present an application of our method in individual inference with MPTAM. We take Route 1 during peak hours as an example and let

for . Using (13), we compute the values of for given and show their curves in Figure 3. It can be seen that , and are very close to 1 within a certain period of time. Besides, we find that the uncertainties of them are very small since the uncertainties of the estimators of have little effect on . We also compute the mean boarding times of the passenger whose tap-out time is , , and show its curve in Figure 4. As expected, the curve increases with a ladder-like shape as increases.

6.2. A One-Transfer Route

The AFC and AVL data used here are from a one-transfer route in Beijing metro over 15 weekdays of 2018. The first and second segments of this one-transfer route both contain six stops. We have passengers’ AFC data, and their tap-in times are all 8:00 am. Due to a confidentiality agreement with the organization that provides the data, the original data, including the station and train information, cannot be made public.

The proposed Bayesian method is used to analyse the passengers’ tap-out times, . The hyperparameters are specified by the method in Section 3 based on 52 passengers’ data of the second segment. Table 6 shows the parameter estimation results. We can see that the passengers’ boarding probabilities, , are dispersed. This indicates that there were many left-behind passengers during peak hours because of crowded platforms and train carriages. Table 7 gives the means and limits of 95% credible intervals of boarding times and transfer times. Compared with the train headway (about two minutes), the boarding and transfer times seem too long, and this may reflect uncomfortable travel experience of the passengers.
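Posterior means and equal-tailed credible limits of this kind are typically read directly off the sampler output. The following Python sketch shows one way to do so; it assumes equal-tailed intervals, which the text does not specify, and the draws in the example are synthetic rather than taken from the case study.

```python
import numpy as np

def summarize_draws(draws, level=0.95):
    """Posterior mean and equal-tailed credible limits from a 1-D array of draws."""
    draws = np.asarray(draws, dtype=float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return float(draws.mean()), float(lower), float(upper)

# Synthetic stand-in for posterior draws of a mean boarding time (minutes).
rng = np.random.default_rng(2)
fake_draws = rng.gamma(shape=40.0, scale=0.1, size=5000)
mean, lower, upper = summarize_draws(fake_draws)
print(mean, lower, upper)
```

For a derived quantity such as a mean boarding time, the quantity is first computed for every posterior draw of the parameters and the resulting values are then summarized in the same way.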

Similar to the no-transfer situation, we compute

for based on our estimators and MPTAM (15). Figure 5 shows their trajectories. For an individual passenger, we can use his/her tap-out time to infer which combination of itineraries he/she may take via the trajectories.

Figure 6 shows the curves of and computed by (9) and (10). They can be used to infer the mean boarding and transfer times of an individual passenger given his/her tap-out time.

7. Conclusion

In this article, we build statistical models for AFC data in both no-transfer and one-transfer cases, and then propose a Bayesian approach to estimate the parameters in the models. These estimators can be used to evaluate many measures that describe degrees of congestion and comfort in metro passenger flows. Applications to simulated and real datasets indicate that our approach is effective.

Compared with maximum-likelihood estimation, the Bayesian method possesses some appealing features, including flexibility in small-sample cases, improvement in model identifiability, and quantification of uncertainty. With appropriate prior information, it is a very useful tool for analysing metro data. In the future, we will extend our approach, based on automated data, to more complex situations in metro systems, such as those with more than one route choice. In addition, the log-normal assumption for the egress time may be violated if the station has more than one exit [16]. A future topic is to apply Bayesian nonparametric density estimation methods to such situations.

Data Availability

The data used in Section 6, which contain the automatic fare collection data and automatic vehicle location data of several routes in Beijing metro, are from income accounting in several subway companies. The authors can use them in scientific research, but are not authorized to make them publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

Xiong’s work was partially supported by the National Key R&D Program of China (Grant nos. 2021YFA1000300, 2021YFA1000301, and 2021YFA1000303) and the National Natural Science Foundation of China (Grant nos. 12171462 and 11871033). Sun’s work was partially supported by the Foundation Sciences of Beijing Jiaotong University (Grant no. 2020YJS206). Qin’s work was partially supported by the High-level Talents Training Program of Ministry of Transportation of China (Grant no. I18I00010).