Abstract
Accurately predicting passenger flow at rail stations is an effective way to reduce operation and maintenance costs, improve the quality of passenger travel, and meet future travel demand. Improved data acquisition capabilities now allow fine-grained, large-scale built environment data to be extracted. This paper therefore investigates the relationship between the built environment around rail stations and station passenger flow, and examines whether built environment data can be used to predict station passenger flow. First, an evaluation system of the factors influencing station passenger flow is built from multisource data. The relationship between built environment factors and station passenger flow is then examined using Pearson correlation analysis. On this basis, a multilayer perceptron (MLP)-based model is developed to predict the passenger flow at key stations. The results show that built environment factors affect station passenger flow and that the MLP model achieves better prediction accuracy and applicability than the benchmark models. Because the model does not require historical passenger flow data, it can be used to predict the passenger flow scale of rail stations without such data, including new stations.
1. Introduction
With rapid economic development, urban transportation demand and motor vehicle ownership keep growing. Because the land available for urban construction is limited, the resulting imbalance between transportation supply and demand causes traffic congestion and safety problems in large cities and lowers the quality of residents' travel. Many large cities have therefore chosen to vigorously develop rail transit to address these problems.
Urban rail transit, however, has high operation and maintenance costs and cannot be expanded indefinitely. Passenger flow prediction for existing stations usually relies on their historical passenger flow data. Once the land use or population distribution around a station changes, historical data no longer determine the future passenger flow scale, so a direct relationship between built environment data and passenger flow needs to be established. As big data technology matures, fine-grained, large-scale data on building attributes, population characteristics, and other factors have become available, offering new approaches to station passenger flow prediction. Exploring the relationship between the built environment around a station and its passenger flow, and predicting passenger flow from built environment data, makes it possible to estimate a station's passenger flow scale accurately and to minimize operating costs.
In recent years, researchers have studied urban rail transit passenger flow prediction in depth, focusing mainly on the factors influencing rail transit station passenger flow and on prediction targets and methods.
Research on the factors influencing rail passenger flow currently focuses on three aspects: population characteristics, transportation facilities, and location factors. The current state of this research is summarized in Table 1.
In analyzing the factors affecting rail passenger flow, researchers have paid the most attention to location factors, mainly land use, often building geographically weighted regression models with GIS technology; this is the current focus of research in this area [6]. However, few studies consider all three types of factors (population characteristics, transportation facilities, and location factors) simultaneously. Most consider only one or two of them, which cannot fully explain why passengers choose rail transit and leads to biased station passenger flow predictions.
Metro station passenger flow predictions determine the construction scale and internal layout of a station, so their accuracy affects both station operation and passengers' travel experience. Current prediction methods for rail transit station passenger flow fall into two main categories: the four-stage method and the direct prediction method [7].
The four-stage method is an aggregate prediction method consisting of four steps: trip generation, trip distribution, mode split, and traffic assignment. It was first proposed in the Chicago Regional Transportation Study and remains one of the most commonly used passenger flow prediction methods [8]. However, the method aggregates travel behavior by traffic analysis zone, so when it is applied at the finer scale of a single rail station, it cannot accurately capture the relationship between the built environment around the station and its passenger flow, and prediction accuracy cannot be guaranteed [9]. Moreover, because the four steps are chained, even small errors in one step are passed on to the next, which can produce large deviations in the final prediction [10].
The direct demand model (DDM) takes rail stations as the unit of analysis, considers the economic conditions, built environment, and other characteristics around each station, and quantifies their relationship with passenger flow to predict future ridership. Compared with the four-stage method, the DDM is simpler to apply and more accurate [11], and it can be implemented with linear regression models [12], geographically weighted regression (GWR) [13], artificial neural networks (ANN) [14], and k-nearest neighbor (KNN) [15]. A linear regression model characterizes the correlation between variables through a linear functional relationship [16]. Currie et al. [17] used linear regression to relate rail passenger flow to employment density, car ownership, station level of service, and fares, and built a predictive model for rail passenger flow; their results showed that station level of service is the main driver of passenger travel. GWR is a relatively fine-grained spatial analysis method that can reveal local travel characteristics within the study area [18]. Cardozo et al. [19] used a GWR model to predict entries at rail stations in Madrid and found that it was more accurate than ordinary least squares (OLS), indicating that spatial analysis techniques can serve as direct prediction models for rail station passenger flow. KNN predicts the value of a record from the k most similar records in the dataset. Building on KNN, Bai et al. [20] incorporated the trend factor and time interval factor of passenger flow to reduce the risk posed by the original method's limited evaluation criteria in the matching process.
The multilayer perceptron (MLP), a commonly used artificial neural network capable of adaptive and real-time learning, has been applied to traffic prediction [21–23]. Lin et al. [24] modeled the spatial correlation of passenger flows from both the single-station and whole-network perspectives and used an MLP to predict metro and bus passenger flows.
As metro networks expand, the attributes of metro stations and the land use around them also change, so passenger flow and land use interact. Existing studies rarely examine in depth how strongly different variables influence station passenger flow, which may reduce prediction accuracy as land use changes. Once a new urban rail transit station is built, its scale and internal layout are difficult to adjust. If the passenger flow of a new station is overestimated, construction costs are wasted and operation and maintenance costs are high; if it is underestimated, the station cannot meet the travel demand of surrounding passengers, leading to congestion, long queuing times, lower passenger satisfaction, and poor service levels. In addition, few studies provide comprehensive forecasts of passenger flow across different time dimensions. Therefore, to help rail operators accurately predict the passenger flow scale of newly opened stations that lack historical data, a direct passenger flow prediction model based on built environment data is needed. The main work of this paper is as follows:
(i) Obtaining data on land use, residential and working populations, and the number of other transportation connections around each station, building an evaluation system of the factors influencing station passenger flow, and analyzing the impact of each factor on rail transit station passenger flow.
(ii) Building a multilayer perceptron-based passenger flow prediction model for rail transit stations to predict the average daily and peak hourly passenger flow of stations on weekdays and nonweekdays, and comparing its prediction accuracy with that of multiple linear regression, k-nearest neighbor, and radial basis function neural network models.
The rest of this paper is organized as follows. Section 2 selects the factors influencing station passenger flow and develops the multilayer perceptron-based prediction model and the comparison models. Section 3 applies them to Beijing metro stations, presenting the data sources, the correlation analysis between the influencing factors and passenger flow, and the prediction results. Section 4 discusses how the predictions can be applied. Section 5 summarizes the contributions and limitations of this paper and the outlook for future work.
2. Methodology
2.1. Influence Factor Selection
Based on the current state of domestic and international research, four criterion layers are selected (land development intensity, station connectivity characteristics, population characteristics around the station, and other transportation connections), and the influence of each on station passenger flow is explored through seven calculated indicators.
2.1.1. Land Development Intensity
Land development intensity is the ratio of the total building land area in a region to the area of that region. The higher the land development intensity of a region, the more population it attracts, and therefore the greater the passenger flow at its rail stations.
2.1.2. Station Connectivity Characteristics
Station connectivity characteristics describe how accessible a station is within the entire rail network. Commonly used indicators include betweenness centrality, closeness centrality, and the number of connected stations, which is the number of other stations directly connected to a station. Interchange stations are generally considered more accessible than noninterchange stations because passengers can change lines and directions there, so their passenger flow is generally higher.
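A minimal sketch of two of these indicators follows: the number of connected stations, read off as the degree of a station in an adjacency list, and closeness centrality computed by breadth-first search. The toy network and station names are hypothetical, path length is measured in hops rather than travel time (an assumption), and betweenness centrality is left out for brevity.

```python
from collections import deque

# Hypothetical toy network (not Beijing data): station -> adjacent stations.
NETWORK = {
    "A": ["B"],
    "B": ["A", "C", "E"],  # B behaves like an interchange with three neighbours
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["B"],
}


def connected_stations(graph, station):
    """Number of connected stations, i.e. the degree of the station."""
    return len(graph[station])


def closeness_centrality(graph, station):
    """(reachable stations - 1) / sum of shortest-path lengths in hops."""
    dist = {station: 0}
    queue = deque([station])
    while queue:  # breadth-first search gives hop distances on an unweighted graph
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


for name in sorted(NETWORK):
    print(name, connected_stations(NETWORK, name),
          round(closeness_centrality(NETWORK, name), 3))
```

A weighted shortest-path search (for example, Dijkstra on inter-station travel times) would be the natural substitute if hop counts are considered too coarse.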
2.1.3. Population Characteristics around the Station
Population density and residential and working population densities are mainly considered. Residential and working populations have stable commuting needs: they travel between home and workplace on working days, which generates a steady demand for rail travel.
2.1.4. Other Transportation Connections
The densities of shared bicycle connections and bus stop connections around the station are mainly considered. Generally, the more shared bicycles and bus stops there are within an acceptable walking distance of a station, the more convenient the station is for passengers; otherwise, passengers are likely to use other metro stations with more convenient connections.
2.2. Prediction Model Based on Multilayer Perceptron
2.2.1. Model Construction Ideas
The prediction targets of the model are the average daily passenger flow and the peak hourly passenger flow of each station; these data are shown in Table 2. The independent variables, which serve as the model's input parameters, are the built environment factors affecting passenger flow; these data are shown in Table 3. A multilayer perceptron with two hidden layers is used to build the prediction model for urban rail transit station passenger flow, and its structure is shown in Figure 1. The built environment data and passenger flow of the stations are used to train the model and predict station passenger flow.

2.2.2. Model Principle and Parameter Setting
The model has four layers: an input layer, hidden layer 1, hidden layer 2, and an output layer. The input layer has 7 neurons, one for each independent variable, and the output layer has 1 neuron. The number of neurons in each hidden layer is calculated with the empirical formula (1), where p is the number of hidden-layer neurons and n is the number of input-layer neurons. With n = 7, the formula gives p = 15; because hidden-layer sizes are conventionally taken as an integer power of 2, the value is set to 16.
The 7 neurons in the input layer correspond to the 7 independent variables. The input vector of the mth rail station is $X_m = (x_{m1}, x_{m2}, x_{m3}, x_{m4}, x_{m5}, x_{m6}, x_{m7})$, giving 288 input vectors in total. Let $w_{ij}^{(k)}$ be the weight of the connection between the ith neuron in layer k−1 and the jth neuron in layer k, and let $b_j^{(k)}$ be the threshold of the jth neuron in layer k. The model initializes the weights and thresholds from a normal distribution, with the weights in the range (0, 1). Let $(a_1^{(k-1)}, a_2^{(k-1)}, \ldots, a_{s_{k-1}}^{(k-1)})$ be the output of layer k−1, which has $s_{k-1}$ neurons (when layer k−1 is the input layer, this is the input vector itself). Each neuron in layer k computes a weighted sum of the outputs of the previous layer and passes the result through the activation function f, as shown in Equation (2):

$$ a_j^{(k)} = f\left( \sum_{i=1}^{s_{k-1}} w_{ij}^{(k)} a_i^{(k-1)} + b_j^{(k)} \right) \tag{2} $$

where $a_j^{(k)}$ is the output of the jth neuron in layer k. The activation function is generally nonlinear, which introduces nonlinearity into the model. In this paper, the Sigmoid function is chosen as the activation function, as shown in Equation (3):

$$ f(z) = \frac{1}{1 + e^{-z}} \tag{3} $$
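The forward pass of Equations (2) and (3) can be sketched in plain Python as below. This is an illustration under assumptions, not the authors' code: the form p = 2n + 1 for the empirical formula (1) is an assumption chosen because it reproduces p = 15 for n = 7, the initialization spread `sigma` is an assumed value, and, unlike the paper, the sketch does not restrict initial weights to (0, 1). Because this sketch also applies the sigmoid at the output neuron, targets would need to be scaled into (0, 1) before training.

```python
import math
import random


def hidden_neurons(n_inputs):
    # ASSUMPTION: formula (1) taken as p = 2n + 1, which gives p = 15 for n = 7;
    # the result is rounded up to the next power of 2.
    p = 2 * n_inputs + 1
    return 1 << (p - 1).bit_length()


def sigmoid(z):
    # numerically stable form of Eq. (3): f(z) = 1 / (1 + exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_mlp(n_inputs=7, seed=0, sigma=0.1):
    """List of (weights, thresholds) per layer; weights[j][i] links neuron i
    of layer k-1 to neuron j of layer k. sigma is an assumed spread."""
    h = hidden_neurons(n_inputs)
    sizes = [n_inputs, h, h, 1]  # input, hidden 1, hidden 2, output
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, sigma) for _ in range(n_in)] for _ in range(n_out)],
             [rng.gauss(0.0, sigma) for _ in range(n_out)])
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(layers, x):
    """Return the outputs of every layer, input vector first (Eq. (2))."""
    outputs = [list(x)]
    for weights, thresholds in layers:
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * a for w, a in zip(row, prev)) + b)
                        for row, b in zip(weights, thresholds)])
    return outputs
```

Calling `forward(build_mlp(), x)` on a normalized 7-element indicator vector `x` returns the activations of all four layers; the last element holds the single scaled prediction.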
The weights and thresholds of the MLP model are usually trained with the error backpropagation (BP) algorithm. The BP algorithm calculates the output-layer error according to Equation (4), propagates this error backward to every neuron in each layer, and iteratively updates the thresholds and weights according to Equations (5) and (6). These operations are repeated until any of the following termination conditions is satisfied, at which point the output of the output layer is taken as the final result.
Termination conditions are as follows:
(1) The training time exceeds 20 minutes.
(2) The number of iterations exceeds 600.
(3) The error E of the training set is within 10%.
In Equation (4), n is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the output of the MLP model. The thresholds and weights are updated as

$$ b' = b - \eta \frac{\partial E}{\partial b} \tag{5} $$

$$ w' = w - \eta \frac{\partial E}{\partial w} \tag{6} $$

where b and w are the threshold and weight before correction, b' and w' are the threshold and weight after correction, and η is the learning rate, which takes values in (0, 1); in this paper, η = 0.0001.
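A compact backpropagation loop with the three termination conditions listed above is sketched next. Several details are interpretations rather than information from the text: per-sample (stochastic) updates, the loss E = (1/2n)Σ(ŷ − y)² on targets scaled into (0, 1), and reading condition (3) as E ≤ 0.10. The function and parameter names are hypothetical.

```python
import math
import random
import time


def sigmoid(z):
    # numerically stable logistic activation, as in Eq. (3)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(layers, x):
    """Outputs of every layer, input vector first."""
    acts = [list(x)]
    for W, b in layers:
        acts.append([sigmoid(sum(w * a for w, a in zip(row, acts[-1])) + bj)
                     for row, bj in zip(W, b)])
    return acts


def train_mlp(X, y, sizes=(7, 16, 16, 1), eta=1e-4, max_epochs=600,
              max_seconds=20 * 60, target_error=0.10, seed=0):
    """Backpropagation that stops on time, iteration count, or training error.

    X holds normalized inputs; y holds targets scaled into (0, 1).
    Returns (layers, epochs_run, final_error).
    """
    rng = random.Random(seed)
    layers = [([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)],
               [rng.gauss(0.0, 0.1) for _ in range(n_out)])
              for n_in, n_out in zip(sizes[:-1], sizes[1:])]
    start, epoch, error = time.monotonic(), 0, float("inf")
    while epoch < max_epochs and time.monotonic() - start < max_seconds:
        epoch += 1
        for x, target in zip(X, y):
            acts = forward(layers, x)
            out = acts[-1][0]
            # output delta for the per-sample loss 1/2 (out - target)^2
            deltas = [(out - target) * out * (1.0 - out)]
            for k in range(len(layers) - 1, -1, -1):
                W, b = layers[k]
                prev = acts[k]
                # deltas of layer k-1, computed before layer k is updated
                prev_deltas = [sum(W[j][i] * deltas[j] for j in range(len(W)))
                               * prev[i] * (1.0 - prev[i]) for i in range(len(prev))]
                for j, d in enumerate(deltas):
                    b[j] -= eta * d                # threshold step, cf. Eq. (5)
                    for i, a in enumerate(prev):
                        W[j][i] -= eta * d * a     # weight step, cf. Eq. (6)
                deltas = prev_deltas
        # ASSUMED form of E: half the mean squared error on the scaled targets
        error = sum((forward(layers, x)[-1][0] - t) ** 2
                    for x, t in zip(X, y)) / (2 * len(X))
        if error <= target_error:
            break
    return layers, epoch, error
```

The learning rate is exposed as a parameter because, with η = 0.0001 and at most 600 iterations, the error may decrease only slowly on some data.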
2.2.3. Parameter Settings of the Comparison Models
The MLP model is compared with three benchmark models, namely a radial basis function (RBF) neural network, KNN, and multiple linear regression, whose parameters are set as follows.
The RBF network is constructed in the same way as the MLP model, except that it has only one hidden layer with 16 neurons and uses a radial basis function as the transfer function; its input and output layers are the same as those of the MLP model.
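To illustrate the radial basis transfer function, the sketch below evaluates a single-hidden-layer RBF network. The Gaussian kernel, the linear output neuron, and the way centers and widths are supplied are assumptions; the text does not state how the 16 centers are chosen or how the output weights are trained.

```python
import math


def gaussian_rbf(x, center, width):
    """Gaussian basis: exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_predict(x, centers, widths, out_weights, out_bias):
    """One hidden layer of Gaussian units followed by a linear output neuron."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
```

Each hidden unit responds most strongly to stations whose indicator vectors lie close to its center, which is the main structural difference from the MLP's weighted-sum neurons.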
The KNN hyperparameter n_neighbors is chosen by iterating over the values 1 to 10 and keeping the optimum. The Euclidean distance is used to measure the distance between samples, and each neighbor is weighted by the reciprocal of its distance when predicting the target.
The multiple linear regression model estimates the regression equation by least squares. The regression equations for the four scenarios are given in Equations (7)–(10).
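The KNN baseline described above (Euclidean distance, reciprocal-distance weights, k searched over 1 to 10) can be sketched as follows. The text does not say on which data the optimal k is judged, so selecting k by mean absolute error on a held-out validation set is an assumption, as is returning the matching target when a query coincides exactly with a training sample.

```python
import math


def knn_predict(train_X, train_y, x, k):
    """Inverse-distance-weighted mean of the k nearest training targets."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    for d, yi in nearest:
        if d == 0.0:
            return yi  # identical feature vector: 1/d is undefined, reuse its target
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)


def select_k(train_X, train_y, val_X, val_y, k_values=range(1, 11)):
    """Pick the k in 1..10 with the lowest mean absolute error on validation data."""
    def mae(k):
        errors = [abs(knn_predict(train_X, train_y, x, k) - y)
                  for x, y in zip(val_X, val_y)]
        return sum(errors) / len(errors)
    return min(k_values, key=mae)
```

In practice the seven indicators differ widely in scale, so normalizing them before computing Euclidean distances keeps one indicator from dominating the neighbor search.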
For the weekday scenario with average daily passenger flow as the forecast target, the regression equation is Equation (7).
For the weekday scenario with peak hour passenger flow as the forecast target, the regression equation is Equation (8).
For the nonworking day scenario with average daily passenger flow as the forecast target, the regression equation is Equation (9).
For the nonworking day scenario with peak hour passenger flow as the forecast target, the regression equation is Equation (10).
In Equations (7)–(10), x1 is the building density around the station, x2 is the number of connected stations, x3 is the total population density around the station, x4 is the residential population density around the station, x5 is the working population density around the station, x6 is the shared bicycle connection density around the station, and x7 is the bus stop connection density around the station.
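The least-squares fit behind Equations (7)–(10) can be sketched without external libraries by solving the normal equations. This is an illustrative implementation, not the authors' code, and it does not reproduce their fitted coefficients. Because the three population indicators (x3 to x5) are strongly correlated with one another (Section 3.2.2), the normal-equation matrix can be ill-conditioned, in which case a QR- or SVD-based solver is the safer choice.

```python
def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y = b0 + sum(bi * xi)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # augmented normal equations [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear predictors)")
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_ols(beta, x):
    """Evaluate the fitted regression equation at indicator vector x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
```

One model would be fitted per scenario and target, giving four coefficient vectors that correspond to Equations (7)–(10).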
3. Beijing Metro Station Passenger Flow Forecast
The 2017 Beijing metro network was selected as the case study; 90% of the station samples were randomly selected as the training set and the remaining 10% as the prediction set. For each station in the prediction set, the prediction targets are derived from that station's independent variables. Each model is trained three times, and the run with the best training accuracy is kept as the prediction model. For the prediction results of individual stations, several typical stations in the prediction set are selected for detailed discussion.
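The sampling protocol (random 90/10 split, three training runs, keep the best) might look like this. The helper names and the callables `train_fn` and `train_error_fn` are hypothetical placeholders; rounding the prediction-set size down is an assumption, which for 288 stations yields 28 prediction stations.

```python
import random


def split_stations(station_ids, test_frac=0.1, seed=None):
    """Randomly split stations into (training set, prediction set).

    The prediction-set size is rounded down, so 288 stations give 28.
    """
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    return ids[n_test:], ids[:n_test]


def best_of_runs(train_fn, train_error_fn, runs=3):
    """Train `runs` times and keep the model with the lowest training error."""
    models = [train_fn(run) for run in range(runs)]
    return min(models, key=train_error_fn)
```

Passing a different `run` index to `train_fn` lets each run use its own random initialization, which is what makes repeating the training worthwhile.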
3.1. Data Sources
3.1.1. POI Data
A point of interest (POI) is a geographical object that can be abstracted as a point marker. This study uses 2017 data; the heat map of the processed POI distribution in Beijing is shown in Figure 2, and the distribution of POIs by administrative district is shown in Figure 3. Overall, POIs in Beijing are concentrated in the central urban areas and sparsely distributed in the suburbs. Chaoyang District has the most POIs (296,117), and Mentougou District has the fewest (only 14,997). POI density is highest in the Dongcheng and Xicheng Districts, at 1592.33/km2 and 1557.07/km2, respectively, while Mentougou District, Yanqing County, Huairou District, and Miyun County have densities below 20/km2.


3.1.2. AFC Data
Automatic fare collection (AFC) data are fine-grained, large in scale, and quickly updated, and they have been widely applied in research on passenger flow prediction and residents' travel behavior patterns [25]. This paper uses AFC data for 11 days, from April 12 to 22, 2017. A schematic diagram of the 2017 metro network, with 19 lines and 288 operating stations, is shown in Figure 4. The AFC system records 19 fields in total, and the commonly used key fields are listed in Table 4. The 11 selected days include eight working days and three nonworking days, with about 6 million records per day on working days and about 4 million on nonworking days.

This study counted the daily passenger flow of the entire metro network from April 16 to April 22, 2017. As Figure 5 shows, network-wide passenger flow varied little from Monday to Thursday, remaining around 5.48 million, and rose on Friday to around 5.74 million. Network-wide passenger flow on the weekend dropped markedly compared with weekdays, to 3.7 million and 3.35 million, respectively. Rail station passenger flow on working days therefore differs from that on nonworking days, indicating that commuter travel is the main market served by Beijing's rail transit system.

Passenger flow also varies from station to station. For example, the size of the residential population around a residential station affects the scale of its passenger flow [26], and the intensity of land development around a station affects how many passengers it attracts [27]. Figure 6 compares the average daily passenger flow of interchange and noninterchange stations; interchange stations generally have higher average daily passenger flow.

3.1.3. Population Data
Baidu Wise-Eye Population Data comes from a commercial geographic intelligence platform launched by Baidu Maps. Built on Baidu Maps' massive location, geographic, and road condition big data, it can be mined to obtain highly accurate, wide-coverage distributions of residential and employed populations. The processed population distribution of Beijing in 2017 is shown in Figures 7–9. Overall, Beijing's resident population is concentrated in the central urban areas and sparse in the suburbs. Chaoyang District and Haidian District have the largest resident populations, 2,581,997 and 2,336,486, respectively, while Yanqing County and Mentougou District have the smallest, neither exceeding 220,000. Xicheng District has the highest population density, 17,862.61 persons/km2, while Yanqing County has the lowest, below 2,000 persons/km2.



3.2. Analysis of the Relationship between Influencing Factors and Passenger Flow
3.2.1. Calculation Results of Impact Factor Indicators
To simplify statistics and calculations, each station's catchment is represented as a circle of uniform radius centered on the station. Because passengers usually walk from the station to their destination, or walk to shared bicycles and bus stops, the radius is chosen as the acceptable walking distance; based on survey results from previous studies, it is generally taken as 800 m [28, 29]. Where the circular buffers of neighboring stations overlap, the overlapping area is divided using Thiessen polygons, so that the final catchments do not overlap. The 800 m catchments of Beijing rail stations are shown in Figure 10.
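For point features such as POIs, shared bicycles, and bus stops, clipping the 800 m buffers with Thiessen polygons is equivalent to assigning each point to its nearest station and discarding it if that station is more than 800 m away, since a Thiessen cell contains exactly the points closest to its station. The sketch below counts points per catchment on that basis. Planar coordinates in metres and the example station layout are assumptions; computing densities would additionally require the area of each clipped catchment, which is not shown.

```python
import math


def assign_to_stations(points, stations, radius=800.0):
    """Count points per station catchment (buffer clipped by Thiessen cells).

    A point lies in a station's clipped buffer iff that station is its nearest
    station and is within `radius`. Coordinates are planar metres (assumed).
    """
    counts = {name: 0 for name in stations}
    for p in points:
        name, d = min(((n, math.dist(p, xy)) for n, xy in stations.items()),
                      key=lambda t: t[1])
        if d <= radius:
            counts[name] += 1
    return counts
```

With hypothetical stations `{"S1": (0, 0), "S2": (1000, 0)}`, a point at (450, 0) is counted for S1 only, even though it also lies inside the 800 m circle of S2.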

The building density around the stations is shown in Figure 11. Building density is generally higher in the central city than in the suburbs; the stations with the highest building density are Jinsong Station, Unity Lake Station, Wangjing Station, and Futong Station, where land use is dominated by commercial service facilities.

The number of station connections is shown in Figure 12. The visualization results show that the interchange stations of the Beijing metro line network are mainly used for the connection of loop and radial lines without obvious aggregation characteristics, and the number of station connections is mostly four, connecting four directions.

The population characteristics around the stations are shown in Figure 13. Population density is higher in Beijing's central city than in the suburbs; the residential population is concentrated mainly in the southern part of the city, while the working population is denser in the northern part, which generates commuting demand between the two.

Other transportation connections are shown in Figure 14. From the results, it can be concluded that Beijing’s rail stations are relatively well connected, with most stations having more than 225 bike-sharing connections and more than 20 bus stop connections.

The average daily and peak hour passenger flows are the prediction targets of this paper and reflect a station's passenger flow over two time spans: the whole day and the peak hour. The average daily passenger flow reflects the overall level and scale of a station's passenger flow, while the peak hour passenger flow reflects the number of passengers a station must accommodate in a short period. The average weekday daily passenger flow, average nonworkday daily passenger flow, weekday peak hour passenger flow, and nonworkday peak hour passenger flow of Beijing rail stations are shown in Figure 15.
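The two prediction targets can be derived from AFC records roughly as follows. The record layout `(station_id, entry_time)` is hypothetical (the real fields are listed in Table 4), counting entries only is an assumption, and taking the busiest clock hour of each day as that day's peak hour is also an assumption, since the text does not define how the peak hour is chosen.

```python
from collections import defaultdict
from datetime import datetime


def is_workday(d):
    # Monday to Friday; public holidays are not handled (none fall on 12-22 April 2017)
    return d.weekday() < 5


def daily_and_peak(records, station, keep_day=lambda d: True):
    """Average daily entries and average peak-hour entries for one station.

    records: iterable of (station_id, entry_time) pairs (hypothetical layout).
    Days on which the station has no kept records are not counted.
    """
    per_day = defaultdict(lambda: defaultdict(int))  # date -> hour -> entries
    for sid, t in records:
        if sid == station and keep_day(t.date()):
            per_day[t.date()][t.hour] += 1
    if not per_day:
        return 0.0, 0.0
    n_days = len(per_day)
    avg_daily = sum(sum(hours.values()) for hours in per_day.values()) / n_days
    avg_peak = sum(max(hours.values()) for hours in per_day.values()) / n_days
    return avg_daily, avg_peak


if __name__ == "__main__":
    demo = [("S1", datetime(2017, 4, 17, 8, 5)), ("S1", datetime(2017, 4, 17, 8, 40)),
            ("S1", datetime(2017, 4, 17, 18, 10)), ("S1", datetime(2017, 4, 22, 10, 0))]
    print(daily_and_peak(demo, "S1", keep_day=is_workday))
```

Passing `keep_day=is_workday` or its negation yields the weekday and nonworking day versions of both targets, which matches the eight-versus-three day split of the study period.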

3.2.2. Correlation Analysis
The results of the Pearson correlation analysis are shown in Figure 16.

The correlation analysis shows that the correlation coefficients between every indicator and passenger flow exceed 0.3, so all indicators are correlated with passenger flow, and the correlations are generally higher on weekdays than on nonworking days. The correlation coefficients of each indicator with average daily passenger flow and with peak hour passenger flow are relatively close.
In both scenarios, passenger flow is most strongly correlated with the density of shared bicycle connections around the station, indicating that many travelers use shared bicycles to reach or leave the station. The relationship runs both ways: higher station passenger flow means more people entering and leaving the station and thus more shared bicycles needed nearby, and conversely, the availability of shared bicycles influences which rail station passengers choose. In the weekday scenario, the lowest correlations are for residential population density and bus stop connection density around the station; in the nonworking day scenario, the lowest is for working population density.
Among the indicators themselves, the three population indicators are highly correlated with one another because of how they are calculated: they are derived from the same population data and are partly linearly related. The number of connected stations has the lowest correlation with each of the other indicators, and the remaining indicators are not strongly correlated with one another. Therefore, these indicators capture multiple distinct aspects of the factors influencing station passenger flow.
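For reference, the Pearson coefficient used in Figure 16 can be computed and used to rank indicators as sketched below. The indicator names and values passed in are up to the user, and ranking by |r| is an illustrative convenience rather than a step described in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_indicators(indicators, flow):
    """Sort (name, r) pairs by |r| with passenger flow, strongest first."""
    scores = {name: pearson(values, flow) for name, values in indicators.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Applying `pearson` to pairs of indicators, rather than to an indicator and passenger flow, gives the inter-indicator correlations discussed above, such as those among the three population indicators.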
3.3. Predicted Results
3.3.1. Accuracy Analysis of Model Prediction
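Model accuracy is evaluated with the mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R2), calculated as in Equations (11)–(14):

$$ \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right| \tag{11} $$

$$ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2} \tag{12} $$

$$ \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left| \frac{y_i - \hat{y}_i}{y_i} \right| \tag{13} $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2} \tag{14} $$

where n is the number of samples, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values.
Tables 5 and 6 report the accuracy of the four models, evaluated on the 10% prediction set (28 samples).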
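Equations (11)–(14) translate directly into code; a plain-Python sketch follows. It assumes no true value is zero, since MAPE divides by each true value.

```python
import math


def evaluate(y_true, y_pred):
    """MAE, RMSE, MAPE (in percent), and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAPE": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "R2": 1.0 - ss_res / ss_tot,
    }
```

Computing all four metrics from the same residuals keeps the comparison among the MLP, RBF, KNN, and regression models consistent.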
Overall, the MLP model performs well under all four conditions: it is the most accurate model for average daily ridership, whereas the multiple linear regression model is the most accurate for peak hour ridership. Although the MLP model's advantage in MAE, RMSE, and MAPE is small, its R2 is higher than that of the other models, indicating a better fit.
Comparing the two prediction targets, all four models have slightly higher errors for peak hour passenger flow than for average daily passenger flow, but the difference is small. All four models are therefore applicable to both prediction targets.
Comparing the working day and nonworking day scenarios, the RBF model is slightly more accurate on nonworking days than on working days, the multiple linear regression model performs better on working days, and the MLP and KNN models show no significant difference between the two scenarios. This indicates that multiple linear regression is better suited to the working day scenario, whose data are more regular, whereas the other models are suitable for both scenarios.
3.3.2. Typical Site Prediction Results
From the 28 Beijing rail transit stations in the prediction set, typical stations are selected, based on each station's passenger flow scale, surrounding land use, and population distribution, to verify the accuracy of the prediction model: Wangjing Station, Xuanwumen Station, Wangfujing Station, Sanyuanqiao Station, and Sun Palace Station.
The absolute error (AE) and relative error (RE) between the predicted and true values are used to evaluate model accuracy, calculated as in Equations (15) and (16):

$$ \mathrm{AE}_i = \left| y_i - \hat{y}_i \right| \tag{15} $$

$$ \mathrm{RE}_i = \frac{\left| y_i - \hat{y}_i \right|}{y_i} \times 100\% \tag{16} $$

where $i = 1, \ldots, n$ indexes the samples, n is the number of samples, $y_i$ is the true value, and $\hat{y}_i$ is the predicted value.
The prediction accuracy of the four models for the typical stations is compared in Tables 7 and 8. Overall, the MLP model performs well in all four cases and is the best model overall across station types. The prediction accuracy of the RBF and KNN models fluctuates considerably, making them less stable, and the multiple linear regression model is markedly more accurate on weekdays than on nonworking days.
Comparing the weekday and nonworkday scenarios, all four models are generally more accurate on weekdays. This is related to weekday commuting: passengers have definite travel needs and travel between home and workplace regularly, whereas nonworkday travel is more random, so passenger flow predicted from built environment data may deviate more.
Comparing the two prediction targets, all four models have slightly higher errors for peak hour passenger flow than for average daily passenger flow, because some stations do not show a distinct peak in their hourly passenger flow profile, so the selected peak hour may not be representative.
The prediction results for Wangjing Station are compared in Figure 17. The MLP model performs best in both stability and accuracy, with errors below 30%. The multiple linear regression and KNN models are less accurate but relatively stable, whereas the RBF model is poor in both accuracy and stability, with errors ranging from 16.77% to 70.34%.

The comparison of prediction results of Xuanwumen Station is shown in Figure 18. In the prediction results, the MLP model and the RBF model perform better, with prediction errors below 40%, while the remaining two comparison models are less stable, with errors greater than 50%. The four models have the largest errors in the prediction results of nonworking day peak hour passenger flows.

The prediction results for Wangfujing Station are shown in Figure 19. The MLP model is the most accurate, with all errors below 20%. However, no model has a clear advantage in predicting peak hour passenger flow, especially on nonworking days, which is related to the station's temporal distribution of passenger flow. Most predictions for this station are lower than the actual values because the area around the station contains large shopping malls, offices, and commercial districts whose passenger flow is higher than expected and whose visitors travel somewhat randomly.

The prediction results for Sanyuanqiao Station are shown in Figure 20. For daily average passenger flow, the MLP model is more accurate than the comparison models. For peak-hour passenger flow, the results of the models differ, and the error of the MLP model is at a medium level among them.

The prediction results for Sun Palace Station are shown in Figure 21. The MLP model is the most accurate in the weekday scenario and slightly less accurate than the comparison models in the nonworking-day scenario, but all prediction errors are below 15%. This station also has the highest prediction accuracy of all the stations. Its average daily passenger flow is about 40,000–50,000 passengers with a “bimodal” profile, which is common in Beijing’s rail transit system, so the training data contain more samples of this type and the prediction accuracy is higher.
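The “bimodal” profile (a morning and an evening commuting peak) can be checked mechanically. The sketch below is one simple heuristic, not the criterion used in this study: it finds local maxima in a 24-hour profile and calls the profile bimodal if at least one peak falls in a morning window and one in an evening window, each reaching a chosen fraction of the daily maximum. The windows (07:00-09:59 and 17:00-19:59), the 0.5 threshold, and the function names are hypothetical parameters.

```python
from typing import List, Sequence


def local_peaks(flow: Sequence[float]) -> List[int]:
    """Indices of hours whose flow strictly exceeds both neighbours."""
    return [h for h in range(1, len(flow) - 1)
            if flow[h] > flow[h - 1] and flow[h] > flow[h + 1]]


def is_bimodal(flow: Sequence[float],
               morning: range = range(7, 10),
               evening: range = range(17, 20),
               min_ratio: float = 0.5) -> bool:
    """Heuristic bimodality test for a 24-hour station profile.

    The profile counts as bimodal if it has a local peak in the morning
    window and one in the evening window, each at least min_ratio times
    the largest hourly flow. Windows and threshold are assumptions.
    """
    top = max(flow)
    strong = [h for h in local_peaks(flow) if flow[h] >= min_ratio * top]
    return (any(h in morning for h in strong)
            and any(h in evening for h in strong))
```

Grouping stations with such a test would show how many training samples share each profile type, which is the explanation offered for the higher accuracy at stations with a bimodal profile.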

4. Discussion
In the decision-making stage before a line or station is built, the predicted average daily passenger flow can serve as a reference for determining the construction scale of the station and the vehicle types used on the line. On this basis, the impact of new lines or stations on existing lines or stations can subsequently be analyzed.
Peak-hour passenger flow reflects the number of passengers a station must accommodate within a short period. Its prediction provides a theoretical basis for designing station facility parameters such as escalator width, avoiding both crowding when the actual passenger flow exceeds the design flow and wasted resources when the actual flow is far below it. This serves the goals of reducing rail transit operation and maintenance costs and improving passenger travel quality.
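As an illustration of how a predicted peak-hour flow feeds into facility design, the sketch below sizes escalators with the generic capacity relation n = ceil(Q_peak × k / C), where Q_peak is the predicted peak-hour flow (passengers/h), k ≥ 1 is an assumed surge factor for short peaks within the hour, and C is the design capacity of one escalator (passengers/h). This is a simple capacity check, not the method of this study; k and C must be taken from the applicable design code, and no specific values are assumed here. The utilization ratio makes the over-provision case (resources wasted) visible.

```python
import math


def escalators_needed(peak_hour_flow: float, capacity_per_escalator: float,
                      surge_factor: float = 1.0) -> int:
    """Smallest number of escalators whose capacity covers peak demand.

    capacity_per_escalator comes from the applicable design code;
    surge_factor (>= 1) is a hypothetical allowance for surges.
    """
    if capacity_per_escalator <= 0 or surge_factor < 1:
        raise ValueError("capacity must be positive and surge_factor >= 1")
    demand = peak_hour_flow * surge_factor
    return max(1, math.ceil(demand / capacity_per_escalator))


def utilization(peak_hour_flow: float, capacity_per_escalator: float,
                n: int, surge_factor: float = 1.0) -> float:
    """Peak demand divided by installed capacity; low values mean over-provision."""
    if n <= 0 or capacity_per_escalator <= 0:
        raise ValueError("n and capacity must be positive")
    return peak_hour_flow * surge_factor / (n * capacity_per_escalator)
```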
5. Conclusion
Based on Beijing POI data, Beijing rail transit AFC data, and Baidu Wise-Eye population data, this article establishes a multilayer perceptron (MLP)-based model for predicting the average daily and peak-hour passenger flow of stations and verifies the accuracy of its predictions using Wangjing Station, Xuanwumen Station, Wangfujing Station, Sanyuanqiao Station, and Sun Palace Station as examples. The predictions of the RBF model, the KNN model, and the multiple linear regression model are compared with those of the MLP model to verify its effectiveness. The main results and conclusions are as follows:
(1) Seven indicators in four categories, namely land development intensity, station connectivity characteristics, population characteristics around the station, and connections with other transportation modes, were examined for their influence on rail transit station passenger flow. The correlation between each indicator and weekday passenger flow is generally higher than that for nonworking-day passenger flow. The indicator with the highest correlation is the density of shared-bicycle connections around the station, indicating that many travelers use shared bicycles to reach or leave the station. Stations with higher passenger flow have more people entering and exiting, so more shared bicycles are needed nearby; conversely, the number of shared bicycles affects whether passengers choose to travel by rail.
(2) An MLP-based rail station passenger flow prediction model was constructed, and three comparison models, the RBF model, the KNN model, and the multiple linear regression model, were set up to analyze the prediction results for typical stations. The training results of the RBF model fluctuate at individual stations and are widely dispersed, which also makes its station predictions unstable.
(3) Predictions for representative typical stations show that prediction accuracy differs between stations. The Wangjing and Sun Palace stations, which have the highest prediction accuracy, both have an average daily passenger flow of 40,000 to 50,000 passengers and an obvious “bimodal” passenger flow profile. This type is common in Beijing’s rail transit system, so the training data contain more samples of it and the prediction accuracy is higher.
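The indicator analysis in conclusion (1) relies on Pearson correlation. A minimal sketch of that computation follows, assuming each indicator and the passenger flow are given as equal-length lists with one value per station. Ranking indicators by |r| is an illustrative step, and the function names and any indicator names passed in are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a sample with zero variance")
    return sxy / math.sqrt(sxx * syy)


def rank_indicators(indicators: Dict[str, Sequence[float]],
                    flow: Sequence[float]) -> List[Tuple[str, float]]:
    """Indicators sorted by absolute correlation with station passenger flow."""
    scores = [(name, pearson_r(values, flow))
              for name, values in indicators.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Running `rank_indicators` once against weekday flow and once against nonworking-day flow gives the two sets of coefficients that conclusion (1) compares.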
However, owing to limitations of time, expertise, and research conditions, this study has some shortcomings. For example, the prediction target is a single station, and the impact of a new station on the whole network is not considered.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The research was supported by the National Key Research and Development Program of China (grant no. 2020YFB1600703).