Abstract
This paper summarizes the results of an effort aimed at improving train operation schedules on the Wuhan-Guangzhou high-speed railway (WG-HSR). Real records of train operations and passenger ticket bookings on the WG-HSR are used for a statistical analysis of train service quality and passenger distribution. More specifically, the train service frequency and interval time at each station are analyzed. Based on this, the temporal and spatial distribution of capacity utilization in each section is investigated. To obtain a holistic view of passenger flow characteristics, the passenger volumes during different time periods and between several origin and destination (OD) pairs are investigated to characterize travellers’ spatial-temporal preferences. The passenger distributions on some long-distance trains are presented to obtain the number and proportion of cross-line passengers travelling on the WG-HSR. Moreover, for a better understanding of the seat capacity utilization of trains, the load rates of trains in various sections and time periods are investigated. Specifically, the relationship between the average load rate of trains and their running distance is explored, and it is found that the longer a non-cross-line train travels, the higher its average load rate is. This study provides findings that help in understanding HSR operation and in conducting further research.
1. Introduction
Demand for HSR transport in China is increasing rapidly. By the end of 2021, the operating mileage of HSR in China had reached 40,000 km, ranking first in the world [1]. By April 30th, 2021, the cumulative passenger volume had exceeded 10 billion, accounting for 40% of all passengers travelling by train [2]. HSR has become a preferred means of transportation owing to its comfort, speed, convenience, and punctuality. For instance, since the Beijing-Guangzhou HSR started to operate in 2011, the airline passenger volume from Beijing to Wuhan has dropped by 23.8% [3].
The WG-HSR, one of the busiest railway corridors in the Chinese HSR network, has been transporting a large number of passengers for thirteen years, since 2009. With the rapid expansion of the Chinese HSR network, WG-HSR has gradually become one of the backbone lines of the network. It connects other HSR lines; thus, many trains originating on other lines converge on WG-HSR, leading to capacity bottlenecks. In addition, the ever-increasing travel demand puts continuous high pressure on the existing capacity of the railway infrastructure. Railway operators are required to constantly improve train operations on the line and make the best possible use of the potential capacity while maintaining acceptable levels of train performance, punctuality, and safety. Before capacity is extended at railway stations and in line sections, it is necessary to get a view of the current train operation quality, including train service pattern, capacity utilization, passenger distribution, and load rate of trains. Based on this, more efficient rolling stock usage, train operation, and intelligent traffic management strategies can be developed.
A systematic analysis of train operation and passenger flow on WG-HSR based on real record data can result in a better understanding of the current situation of HSR operations, such as the train service frequency at stations, the interval time between train departures and arrivals, the spatial-temporal distribution of passengers, and the seat occupancy rates of trains. These measurements are significant for high-speed railway operators in providing effective, efficient, and viable train timetables and services.
In the recent decade, the availability of reliable and accurate data on railway operations has created important opportunities in the railway planning process. Such data can be used for evaluating timetable performance, providing insight into the stochastic processes of railway operations, evaluating station capacity, detecting bottlenecks, and analyzing the passenger volume distribution. Visualizing train operation data with computer and information technology is also valuable, as it helps operators better identify problems in practical operation.
The performance of HSR trains can be evaluated from the perspectives of train operation, passenger distribution, and load rate. The objective of this study is to present the results of a comprehensive analysis of train operations on HSR in China by analyzing large-scale train operation data, ticket-booking data, and real record timetables from WG-HSR, obtained from China Railway Company. Thus, we can obtain an overview of train service frequency, capacity utilization, passengers’ preferences for various train services, and the distribution of train load rates. The contributions and innovations of this paper can be summarized as follows:
(1) The paper proposes a real record data-driven analysis framework and states clearly how to conduct research to quantify the performance of HSR operation. These provide theoretical references and insights for other studies or designs. Statistical analysis and data visualization methods are applied in the framework to examine the underlying mechanism of train operation, passenger distribution, and load rates. Data-driven methods based on observation records are currently widely used in transportation research, and the present paper is an attempt in this direction.
(2) Statistical analysis results of real record data can provide a theoretical supplement for capacity evaluation, ticket reservation, and short-term passenger prediction. A detailed description of train operation quality, train capacity utilization, and passenger distribution characteristics is presented. The weaknesses and potential defects in train operation are identified. The results are helpful for improving the performance of HSR operation in various aspects, such as capacity bottleneck identification, train line planning, train operation trend assessment, and ticket allocation.
(3) The paper makes a first attempt to comprehensively illustrate the characteristics of WG-HSR operations from a long-term perspective and to investigate the evolution of WG-HSR operation. Real-world operation analysis of WG-HSR is conducted with long-term real record data from multiple sources, involving train timetables, ticket booking, and passenger volumes. The statistical analysis results reflect the evolutionary change in WG-HSR operation.
(4) Train operation analysis can benefit timetable optimization and give insights into how to improve capacity utilization. The train service pattern, the interval time distribution of train events, and the capacity utilization are presented in detail. In particular, the spatial-temporal distribution of operating trains, the frequency histogram of train arrival intervals, and the unevenness of capacity utilization across time periods are addressed by means of statistical analysis and visualization methods.
(5) Passenger flow analysis is helpful for timetable rescheduling, train line planning optimization, and passenger prediction. Passenger flow is analyzed from various perspectives. In particular, the temporal-spatial passenger distribution during one day at stations and in sections is addressed. The proportion of long-distance passengers on trains and the uneven distribution of passengers during holidays are identified and visualized. The characteristics of OD passengers along WG-HSR are specifically investigated. The multisource data-based passenger analysis method is an attempt at a deeper and more comprehensive exploration of the passenger flow evolution mechanism, which supplements existing passenger analysis of high-speed railways.
(6) The load rates of different trains are compared. The general relationship between the load rate and the travelling distance of trains is explored, which can reveal weaknesses in train capacity utilization and provide a scientific basis for train capacity allocation.
The remainder of this paper is organized as follows. Section 2 presents a brief review of current studies concerning train operation data, passenger travel characteristics, passenger flow forecasting, and the seat allocation problem. Section 3 outlines the layout of WG-HSR, describes the real record data collected from the railway company, and provides the analysis framework of this paper. In Section 4, a statistical analysis of train operation and capacity utilization is carried out. In Section 5, passenger flow characteristics are examined systematically, including the spatial-temporal distribution of passengers and the composition of passenger flow on some long-distance trains. In Section 6, the load rate of trains is presented, and the relationship between train load rate and travelling distance is discussed. In Section 7, conclusions and further study are discussed.
2. Literature Review
In the recent decade, access to real record data has opened a variety of opportunities to analyze railway operations. Data-driven methods based on observation records are widely used in the transportation research field. Related studies are reviewed below.
Regarding studies of train operation data, Goverde and Hansen [4] analyzed daily train operation data using a tool named TNV-Prepare. TNV-Prepare data contains signaling and interlocking information of an entire traffic control area. Each train event time includes train description steps, section entries and clearances, signals, and point switches. Several researchers have focused on data-driven visualization based on train operation records. Ushida [5] proposed a method called the chromatic diagram, which shows at a glance where delays occur and how they propagate. Graffagnino [6] plotted graphs by means of mathematical methods and the graphical data visualization method in order to continually optimize the timetable with extensive use of historical train traffic data. Ochiai et al. [7] used the chromatic diagram to visualize train traffic record data collected from web-based train traffic information. Kecman and Goverde [8] presented a process mining tool based on event data records from the Dutch train describer system TROTS, including arrival and departure times at stations, train paths on track sections, and blocking time level. Nash and Ullius [9] used Open Time Table to analyze the differences between actual and scheduled timetable data and presented this information in a variety of graphical and statistical formats to improve timetables. Meng and Zhou [10] described the train flow in the time and space dimensions by using cumulative flow variables. Prasad and Shekhar [11] evaluated the passenger rail service quality of Indian railways by developing the RAILQUAL instrument based on SERVQUAL and rail transport quality. Hansen [12] described the main features of the planned line network, train services, passenger volume, and train operation quality based on empirical data. Wen et al. [13] used the real record train operation data from WG-HSR to present primary delay distributions, model the primary delay duration distribution, and show the distribution of affected trains. Chen [14] conducted a sustainability analysis of the WG-HSR, focusing on the efficiency and equity aspects of this rail line’s direct transportation impacts and indirect nontransportation impacts. Cheng and Chen [15] conducted an empirical analysis to assess how HSR influences the operational capacities of freight and conventional passenger rail services by examining the spatial and temporal variations of transport capacities of different rail systems. Ghofrani et al. [16] provided a comprehensive review of recent applications of big data in three areas of railway transportation, namely, operations, maintenance, and safety, using a taxonomy framework proposed by Mayring [17]. Eltved et al. [18] applied k-means clustering to smart card data to analyze the impacts of long-term planned disruptions on railway passenger travel behavior.
In terms of ticket-booking data, many studies based on real record passenger ticket-booking data have focused on passenger travel characteristics, passenger flow forecasting, and seat allocation. As for passenger travel characteristics, Li et al. [19] used WG-HSR ticket-booking data to analyze the imbalance of passenger flow in time and space. Based on empirical passenger data, Zhang and Yi [20] defined proportion factors to present statistical characteristics of passenger distribution at stations. Aoki and Toshiyuki [21] analyzed the flows of arriving and departing passengers at each station. Hetrakul and Cirillo [22] investigated the heterogeneous characteristics of passenger behavior across railway markets based on Internet booking data with limited individual variables. Celikkol-Kocak et al. [23] discussed the travel characteristics of current HSR users in Turkey from different aspects. Yang et al. [24] characterized passengers’ preferences between different HSR stations based on a detailed analysis of passenger ticket data from a Chinese HSR corridor. Sun et al. [25] derived passengers’ choice behaviors for various train services and their ticket purchase timing decisions by analyzing large-scale ticket sales data. Cirillo et al. [26] proposed a dynamic discrete choice model of ticket cancellation and exchange for railway passengers based on ticket data. Jiang et al. [27] developed a microsimulation model to address the scheduled timetable evaluation problem based on large-scale passenger data, taking into account the number of alighting and boarding passengers, train load factors, passengers’ waiting time for trains, and the number of waiting passengers on the platforms.
In terms of passenger flow forecast, Tsai et al. [28] constructed two novel neural network structures for short-term railway passenger demand forecasting. Jiang et al. [29] proposed a hybrid approach combining EEMD and GSVM for short-term prediction of HSR flows based on the passenger empirical data. Zheng et al. [30] proposed a two-stage time variants exploration method, and a real passenger flow dataset was collected from Taipei rapid transit corporation, in order to investigate the viability of the proposed method. Lai et al. [31] estimated a hybrid short-term passenger flow forecasting model considering the impact of train service frequency; the operational data of Beijing-Shanghai high-speed railway from 2012 to 2016 were used to verify the effectiveness of the model.
Regarding the seat allocation problem, Cheng [32] described the development of the Taiwan high-speed rail and presented the load factor of trains from January to July. Ongprasert [33] studied the seat allocation problem for intercity high-speed rail services in Japan, including revenue maximization, average passenger load factor, and the number of rejected passengers. Hetrakul and Cirillo [34] estimated passenger choice models to explain the ticket purchase timing of railway passengers based on ticket-booking information and solved a joint pricing and seat allocation problem for revenue management. Li et al. [35] presented the train attendance and load factor of trains running on WG-HSR and then explored the relationship between train attendance and load factors, which can contribute to better seat allocation. Yazdani et al. [36] proposed a real-time seat allocation algorithm to control the distribution of passengers with free-seat tickets, aiming at minimizing the boarding/alighting time across an entire route, and developed a simulation system and performance indicator to evaluate the efficacy of the proposed algorithm.
Despite the progress made in the above literature, there is a lack of systematic and detailed statistical analysis of HSR train operation and passenger distribution based on real record data. Besides, few studies focus on the load rate of trains because real record data is difficult to access. Specifically, the limitations of current research are as follows.
(1) Few studies examine HSR train operation from a systematic view. Most studies evaluate the performance of HSR from just one perspective, such as train operation quality, the passenger flow structure, the distribution of primary delay, or the capacity utilization. Few studies assess the operation quality of HSR from diverse aspects simultaneously.
(2) Current studies cannot reflect the evolution of train operation over a long period. Train operation is associated with the timetable, and timetables in China are updated frequently owing to remarkable variation in passenger demand or improvements in infrastructure. Current studies of timetable evaluation usually focus on a single timetable, and few studies cover various timetables in different time periods.
(3) Current studies on railway passenger flow usually take one day as the research unit or use survey data, which is too coarse to illustrate the microscopic features of passenger flow. It is necessary to consider origin and destination (OD) passengers’ temporal-spatial preferences during one day in conjunction with train dispatching decisions at the line planning stage, and only a few studies have characterized OD passengers’ travel behavior during different time periods [24].
(4) Few studies focus on the spatial-temporal analysis of load rate. Load rate is a key indicator of how well the carrying capacity of trains is utilized. The load rate depends heavily on passenger demand, train routes, train capacity, and departure times. Analyzing the relationship between load rates and these influencing factors is valuable for rescheduling the train operation plan.
Unlike the existing studies, this paper uses train operation records of the WG-HSR line collected from 2015 to 2016. Firstly, based on the long-term real record data, this paper conducts a systematic analysis of the operation quality of WG-HSR, concerning train operation, passenger distribution, and load rate, and thus provides a comprehensive overview of the operation quality of HSR. Then, to reflect the evolution of train operation over a long time, this paper evaluates the timetables during different time periods using multiple indicators, including service frequency of trains, interval time between train events, and capacity utilization of HSR. Meanwhile, to characterize the spatial-temporal preferences of passenger flow, this paper investigates passengers’ travel behaviors during different time periods and analyzes the distribution of OD passengers between several origin and destination pairs, based on the ticket-booking data from the WG-HSR corridor. Furthermore, for a better understanding of train load rates, this paper discusses the load rate distribution of cross-line trains and non-cross-line trains separately. The load rate is affected by the running distance and departure time of trains. The curves of load rate varying with trains’ routes and departure times are presented.
Analyzing the train operation quality and passenger distribution of HSR based on long-term real record data has a positive significance on further studies and practices, especially for Chinese HSRs, and can be helpful for the optimization of HSR operation.
3. Data Description and Analysis Framework
3.1. Data Description
3.1.1. Description of WG-HSR Layout
In southern China, the 1,069 km WG-HSR directly connects Wuhan with Guangzhou. Figure 1 shows the layout of WG-HSR and the stations on the line. There are 17 stations on the line, and Table 1 lists the abbreviation of each station. Note that Lechang East station (LCE) on WG-HSR did not serve passengers until May 1st, 2017; before that date, LCE operated only as a passing and overtaking station [37]. The train operation records and passenger ticket-booking data used in this paper were collected before 2017. Thus, LCE station is taken into consideration in the train operation analysis, since trains run through LCE station and this influences capacity utilization. In contrast, the passenger flow analysis is conducted without regard to LCE station because the station did not serve passengers before 2017.

With the development of the Chinese HSR network, WG-HSR and Kunming-Shanghai HSR (HK-HSR) intersect at Changsha South station, while WG-HSR and Chengdu-Shanghai HSR (HHR-HSR) intersect at Wuhan station. Nanning-Hengyang HSR (NH-HSR) joins WG-HSR at Hengyang station. The analysis of train operation quality and passenger distribution on WG-HSR is therefore a typical case for learning about the train service and passenger demand of HSR in China.
3.1.2. Train Operation Data
The daily train operation records of the WG-HSR line were collected. Only the train data related to 14 stations and 13 sections from Guangzhou North to Chibi North were obtained from the Guangzhou Railway Company. These stations are colored in orange in Figure 1. Note that LCE station can handle the departure and arrival of trains, so the train operation records of this station are considered. The data gathered from February 24th, 2015, to November 30th, 2016, includes 29662 records for up-direction trains and 29662 records for down-direction trains. Using the data, we can compute the dwell times and their distributions at each station. Table 2 shows several samples of train operation records.
In addition, train running records contain the following information.
(i) Train ID, including train types distinguished by G and D
(ii) Names of stations
(iii) Arrival times, departure times, planned arrival times, and planned departure times in the format of “year/month/day hour:minute:second”
(iv) The interval between train events at stations, including the interval between successively arriving trains and the interval between successively departing trains at each station.
Note that the statistical analysis mainly focuses on the G-type trains, for the following reasons. (1) G-type trains account for the vast majority of trains. There are about 100 G-type trains and just 5 D-type trains per day. (2) The D-type trains are overnight trains with comfortable sleeper cabins running between Guangzhou and Beijing. The departure times of D-type trains are mainly concentrated in the period from 20:00 to 21:00, and there are no G-type trains running at night. (3) The capacity of WG-HSR at night is sufficient, and the train operation is simple during the night.
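The interval between successively arriving trains at a station, item (iv) above, can be derived from the arrival timestamps. The following Python sketch illustrates one way to do this. The tuple layout of the records, the function name `arrival_intervals`, and the sample train IDs and times are all hypothetical and are not taken from the dataset; the sketch also assumes that all records passed in belong to the same operating day.

```python
from collections import defaultdict
from datetime import datetime

# Assumed rendering of the "year/month/day hour:minute:second" format.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def arrival_intervals(records):
    """Return {station: [seconds between successive arrivals]}.

    `records` is an iterable of (train_id, station, arrival_time) tuples.
    This layout is a hypothetical simplification of the real records.
    """
    arrivals = defaultdict(list)
    for _train_id, station, arrival in records:
        arrivals[station].append(datetime.strptime(arrival, TIME_FORMAT))

    intervals = {}
    for station, times in arrivals.items():
        times.sort()  # order arrivals chronologically before differencing
        intervals[station] = [
            (later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])
        ]
    return intervals


# Made-up sample records for illustration only.
sample = [
    ("G1102", "CSS", "2015/03/01 08:00:00"),
    ("G1104", "CSS", "2015/03/01 08:07:30"),
    ("G1106", "CSS", "2015/03/01 08:20:00"),
    ("G1102", "HYE", "2015/03/01 07:20:00"),
]
result = arrival_intervals(sample)
```

Departure intervals could be obtained in the same way by passing departure times instead of arrival times.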
3.1.3. Passenger Volume Data and Tickets-Booking Data
Passenger volume data and passenger tickets-booking data were obtained from the Railway Passenger Transport-Decision Support System (RPT-DSS) of Guangzhou Railway Company. RPT-DSS is a railway passenger tickets database, which can collect and process the passenger tickets by Online Analytical Processing (OLAP). This system could help the policy makers get a comprehensive understanding of passenger flow information, train seat occupancy, and other passenger trains’ operation technical indicators.
Daily passenger volume data between different stations in the segment from GZS station to WH station were collected from January 1st, 2012, to December 16th, 2015.
Because of the jurisdiction of Guangzhou Railway Company, only the ticket-booking information of Guangzhou Railway Company is available. Ticket information for journeys from GZS station to CSS station was gathered from October 1st, 2015, to October 31st, 2015.
Information on tickets whose travelling routes lie within the jurisdiction of Guangzhou Railway Company was also collected for a single day, December 2nd, 2015. Table 3 shows some samples of ticket-booking records. Note that LCE station did not serve passengers before May 1st, 2017, so there is no passenger ticket-booking data concerning the station. In this paper, passenger flow analysis of LCE station is not taken into consideration.
The passenger ticket-booking data gathered from RPT-DSS includes the following elements.
(i) The departure date of the passengers
(ii) The origin station and destination station of passengers
(iii) Train ID, including train types distinguished by G and D
(iv) The origin station and destination station of trains
(v) The classification of the seats, including the soft seats and the standby seats
(vi) The number of passengers in different seats.
3.2. Analysis Framework
To understand train operation and passenger distribution, we propose a real record data-driven analysis framework consisting of four parts. Figure 2 shows an overview of the analysis framework and methods for train operation and passenger distribution. Firstly, data cleaning and preparation are conducted to identify the outliers in the train operation data and obtain valid data for statistical analysis. Secondly, based on the prepared train operation data and ticket-booking data, data processing and mining are carried out to extract the evaluation indexes. Then, various graphs concerning the evaluation indexes of train operation and passenger flows are designed by means of data visualization, which is a convenient way to present the characteristics of train operation and passenger flow. Finally, graphic analysis related to train operation, passenger distribution, and the load rates is carried out to give a detailed assessment of the performance of WG-HSR operation. Mathematical statistics, statistical analysis, data visualization, and chart analysis methods are adopted in the framework to demonstrate the train operation quality and passenger distribution of WG-HSR.

3.2.1. Data Cleaning
The purpose of data cleaning is to find and process “dirty data” in the raw data. Since the raw data used in this paper was obtained from independent systems in Guangzhou Railway Company over a long time span, its quality is low. Data cleaning is necessary to eliminate faulty, noisy, and inconsistent data and to restore missing data in the dataset. Three issues are dealt with in the data cleaning process.
(1) The formats of the timestamps indicating the arrival and departure times of trains at stations are not uniform. Some timestamps were recorded in minutes, such as “65230 min,” while others were recorded in the format of “year/month/day hour:minute:second,” such as “2015/12/31 22:12:30.” In the data cleaning process, all timestamps are converted to seconds.
(2) There are outliers in the ticket-booking data, especially in the passenger flow time series data. Faulty data and noisy data are potential outliers, and these data should be identified and then deleted or repaired. In this paper, the distance-based outlier detection method is used, and the detected outliers are deleted from or restored in the original dataset.
(3) Missing data and duplicated data are common in the raw database collected from RPT-DSS. Since passenger flow data is derived from ticket-booking data, missing data and outliers should be restored. The interpolation method is applied to repair the missing data and outliers. Deduplication is conducted to remove duplicated records from the raw dataset.
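The cleaning steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually applied to the dataset. In particular, the reference epoch `REFERENCE`, the interpretation of minute counts as offsets from that epoch, the neighbour-count form of the distance-based outlier test with its `radius` and `min_neighbors` parameters, the use of linear interpolation, and all function names and sample values are assumptions introduced for illustration.

```python
from datetime import datetime

REFERENCE = datetime(2015, 1, 1)  # hypothetical reference epoch
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def to_seconds(raw):
    """Convert 'NNN min' or 'Y/m/d H:M:S' strings to seconds since REFERENCE."""
    text = raw.strip()
    if text.endswith("min"):
        # Assumption: minute counts are offsets from the same reference epoch.
        return float(text[:-3]) * 60.0
    text = text.replace(" : ", ":")  # tolerate spaces around the colons
    return (datetime.strptime(text, TIME_FORMAT) - REFERENCE).total_seconds()


def distance_based_outliers(values, radius, min_neighbors):
    """Flag values with fewer than `min_neighbors` other values within `radius`."""
    flags = []
    for i, v in enumerate(values):
        neighbors = sum(
            1 for j, w in enumerate(values) if j != i and abs(v - w) <= radius
        )
        flags.append(neighbors < min_neighbors)
    return flags


def interpolate(values, bad):
    """Replace flagged entries by linear interpolation between good neighbours."""
    good = [i for i, flag in enumerate(bad) if not flag]
    repaired = list(values)
    for i, flag in enumerate(bad):
        if not flag:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            continue  # no valid neighbour to interpolate from
        if left is None:
            repaired[i] = values[right]
        elif right is None:
            repaired[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            repaired[i] = values[left] + weight * (values[right] - values[left])
    return repaired


def deduplicate(records):
    """Drop exact duplicate records while keeping the original order."""
    return list(dict.fromkeys(records))


# Made-up daily passenger counts with one obvious outlier.
daily = [1000, 1020, 990, 5000, 1010, 1005]
flags = distance_based_outliers(daily, radius=100, min_neighbors=2)
repaired = interpolate(daily, flags)
```

In this sketch the outlier at the fourth position is replaced by the midpoint of its two neighbours; other repair rules, such as deleting the record, are equally compatible with the description above.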
3.2.2. Data Processing
Data processing aims at acquiring the prepared data with essential attributes according to the research objectives. In this paper, the raw data is processed into prepared data by means of data extraction, data transformation, and data integration. The prepared data concerning train operation, passenger flow, and load rate of trains can be described as follows.
Train operation data contains the following items:
(i) Train arrival intervals at each station per day
(ii) The number of up-direction trains on HSR per day
(iii) The service frequency of trains at each station
(iv) The service frequency of trains during different time periods at each station
(v) Capacity utilization in different segments and time periods.
Capacity utilization can be calculated by formulas (1) and (2):

$$U_i = \frac{n_i}{N_i}, \tag{1}$$

$$N_i = \frac{60}{h}, \tag{2}$$

where $U_i$ is the capacity utilization of segment $i$ during one hour, $n_i$ is the number of arriving trains in segment $i$ during one hour, $N_i$ is the maximum number of trains that can run in segment $i$ per hour, and $h$ is the minimum headway of trains in minutes (5 min in this study).
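As a worked illustration of the capacity utilization calculation, the short Python sketch below assumes that the maximum number of trains per hour equals 60 minutes divided by the minimum headway in minutes, and that utilization is the ratio of observed arrivals to that maximum. Both the function names and this reading of the formulas are assumptions made for illustration.

```python
def max_trains_per_hour(min_headway_min):
    """Maximum hourly train count, assumed to be 60 min / minimum headway."""
    return 60 / min_headway_min


def capacity_utilization(arrivals_in_hour, min_headway_min=5):
    """Ratio of observed hourly arrivals to the assumed hourly maximum."""
    return arrivals_in_hour / max_trains_per_hour(min_headway_min)


# With a 5 min headway, at most 12 trains fit into one hour,
# so 9 arriving trains correspond to a utilization of 0.75.
utilization = capacity_utilization(9)
```

Under this reading, a utilization close to 1 in a given hour indicates that the segment is running near its headway-limited capacity.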
Passenger flow data contains the following items:
(i) The total number of passengers conveyed by all the trains on HSR
(ii) The passenger flow per day on weekdays and during vacations
(iii) The number of passengers departing from each station
(iv) The number of passengers arriving at each station
(v) The number of passengers transported by trains during different time periods
(vi) The passenger volume between different origin and destination pairs (OD passenger)
(vii) OD passenger flow during different time periods.
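The OD-related items in this list can be obtained by aggregating ticket records. The Python sketch below shows one possible aggregation. The record layout `(origin, destination, departure_hour, passengers)` is hypothetical: the departure hour is assumed to have been attached beforehand by matching the train ID against the timetable, and the sample values are made up.

```python
from collections import Counter


def od_flows(tickets):
    """Aggregate passengers by OD pair and by (OD pair, departure hour).

    `tickets` is an iterable of (origin, destination, departure_hour,
    passengers) tuples, a hypothetical simplification of the real records.
    """
    by_pair = Counter()
    by_pair_hour = Counter()
    for origin, destination, hour, passengers in tickets:
        by_pair[(origin, destination)] += passengers
        by_pair_hour[(origin, destination, hour)] += passengers
    return by_pair, by_pair_hour


# Made-up sample records for illustration only.
sample_tickets = [
    ("GZS", "CSS", 8, 120),
    ("GZS", "CSS", 8, 30),
    ("GZS", "CSS", 17, 80),
    ("GZS", "HYE", 8, 45),
]
by_pair, by_pair_hour = od_flows(sample_tickets)
```

Summing `by_pair` over destinations gives the number of passengers departing from each station, and summing over origins gives the number arriving at each station.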
Load rate of trains contains the following items:
(i) The load rate of a train in different sections
(ii) The average load rate of a train over the whole trip.
The load rate and the average load rate of the train can be calculated by formulas (3) and (4):

$$\rho_{k,s} = \frac{q_{k,s}}{C_k}, \tag{3}$$

$$\bar{\rho}_k = \frac{1}{m_k}\sum_{s=1}^{m_k}\rho_{k,s}, \tag{4}$$

where $\rho_{k,s}$ is the load rate of train $k$ as the train runs in section $s$, $q_{k,s}$ is the number of passengers on train $k$ as the train runs in section $s$, $C_k$ is the seat capacity of train $k$, $\bar{\rho}_k$ is the average load rate of train $k$ over the whole trip, and $m_k$ is the total number of sections that train $k$ passes through.
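The load rate calculation can be illustrated with a short Python sketch. It assumes that the section load rate is the number of passengers on board divided by the seat capacity, and that the average load rate is the unweighted mean of the section load rates over all sections the train passes. The function names and sample numbers are hypothetical.

```python
def section_load_rates(passengers_by_section, seat_capacity):
    """Load rate per section: passengers on board divided by seat capacity."""
    return [p / seat_capacity for p in passengers_by_section]


def average_load_rate(passengers_by_section, seat_capacity):
    """Unweighted mean of the section load rates over the whole trip."""
    rates = section_load_rates(passengers_by_section, seat_capacity)
    return sum(rates) / len(rates)


# Made-up example: a 500-seat train passing three sections.
on_board = [250, 500, 375]
rates = section_load_rates(on_board, seat_capacity=500)
average = average_load_rate(on_board, seat_capacity=500)
```

A distance-weighted mean would be an alternative if sections differ strongly in length; the unweighted form is used here only because the description refers to the number of sections.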
All the investigated indicators listed above can be classified into macroscopic indicators and microscopic indicators. The microscopic indicators are the items with temporal features, which are generally measured by hour, including the service frequency of trains during different time periods at each station, capacity utilization during different time periods, and OD passenger flow during different time periods. Microscopic data indicates the fluctuation of train operation and passenger distribution, and its temporal attributes can be labeled according to the timetable of HSR. All the other items are macroscopic indicators, which are generally measured by day. Since the raw data is recorded by day, the macroscopic data can be easily calculated by summing the records for each date in the real record dataset.
3.2.3. Data Visualization
Data visualization is frequently used to describe multivariate data. In this paper, software tools such as MATLAB, EViews, and Python are applied to visualize the prepared data. Line graphs, bar graphs, scatter graphs, and histograms are produced in this way, which demonstrate the characteristics of the prepared data directly.
3.2.4. Charts Analysis
Charts and graphs are efficient ways to show the performance of train operation and the distribution of passenger flow. Based on the charts and graphs, charts analysis results are presented from the following perspectives: (1) train service frequency, the interval time of trains, and capacity utilization of WG-HSR; (2) spatial-temporal distribution of passengers departing from each station and the OD passengers between different stations; and (3) the load rate of trains and the distribution of load rate varying with distance on WG-HSR. The analysis results and conclusions are significant to improve train operation quality.
4. Statistical Analysis on Train Operation
In this section, the train operation quality is presented by dealing with train operation data, including the total number of service trains on the line, stopping frequency of trains, interval time between events, and the capacity utilization of each section.
4.1. The Service Frequency of Trains on WG-HSR
The number of up-direction trains running on WG-HSR per day from March 2015 to October 2016 was obtained. The boxplots in Figure 3 show the temporal-spatial distribution of trains running on WG-HSR from GZN station to CBN station.

[Figure 3: panels (a) and (b)]
Figure 3(a) shows the average number of trains per day during each month, which indicates the temporal distribution of trains running on WG-HSR. The horizontal axis is the month sequence, and the vertical axis is the average number of trains per day in one month. Each box illustrates the spread of the train quantity within a month, and the red line shows the monthly average number of trains per day. Over the investigation period, the red line rises with sharp fluctuations, indicating that the average number of trains tends to increase. WG-HSR therefore becomes busier over time, owing to the increasing passenger demand and the expansion of the HSR network in China. Meanwhile, the number of trains fluctuates noticeably from month to month. Figure 3(a) also contains some outliers, which correspond to days on which the number of operating trains is especially low. The fluctuation is affected by vacations and timetable adjustments. For example, the number of trains in 2015-10 and 2016-10 is higher than in most other months of those years, because more trains are operated during the National Day vacation to satisfy passengers’ travel demand.
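The quantities behind such a monthly boxplot can be sketched as follows. This is only an illustration under stated assumptions: the input layout and the sample counts are made up, and the 1.5 × IQR rule for flagging outliers is the common boxplot convention rather than a rule stated in the paper.

```python
from collections import defaultdict
from statistics import mean, quantiles


def monthly_box_stats(daily_trains):
    """Group daily train counts by month and return boxplot statistics.

    daily_trains maps an ISO date string ("YYYY-MM-DD") to the number of
    trains operated on that day (hypothetical input layout).
    """
    by_month = defaultdict(list)
    for day, count in daily_trains.items():
        by_month[day[:7]].append(count)  # "YYYY-MM" serves as the month key

    stats = {}
    for month, counts in sorted(by_month.items()):
        q1, median, q3 = quantiles(counts, n=4)  # quartiles of the month
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed whisker rule
        stats[month] = {
            "mean": mean(counts),  # the value a monthly-average line would show
            "q1": q1,
            "median": median,
            "q3": q3,
            "outliers": sorted(c for c in counts if c < low or c > high),
        }
    return stats


# Made-up counts for eight days of one month; the last day is unusually low.
sample = {f"2015-10-{d:02d}": n for d, n in
          enumerate([100, 102, 101, 103, 99, 104, 100, 60], start=1)}
october = monthly_box_stats(sample)["2015-10"]
```

With this made-up sample, the low-traffic day is the only value flagged as an outlier, which is the kind of point the text attributes to days with especially few operating trains.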
In Figure 3(b), the horizontal axis shows the station names and the vertical axis shows the average daily number of trains departing from each station. The number of trains running in the section from HYE station to CSS station is larger than in the other sections, while the number of trains running from MLE station to CBN station is lower. This uneven distribution is caused by the layout of the HSR network and by the operation of cross-line trains. In the railway network, HYE station connects to Nanning-Hengyang HSR while CSS station connects to Shanghai-Kunming HSR. Trains from Nanning-Hengyang HSR enter WG-HSR at HYE station; thus, more trains run in the section from HYE station to CSS station than in the section from GZS station to HYE station. Similarly, some cross-line trains leave WG-HSR for Shanghai-Kunming HSR at CSS station, which reduces the number of trains in the section from MLE station to CBN station. It can be concluded that the layout of the railway network plays an important role in train operation.
The distribution of train stopping frequency is shown in Figure 4. Figure 4(a) shows the stopping frequency at each station; the vertical axis is the average number of trains stopping at each station per day from March 2015 to October 2016. The stopping frequencies at SG, CZW, HYE, and CSS stations are much higher than at the other stations, indicating that passenger demand at these stations is higher. The number of stopping trains at ZZW and YYE stations is at a medium level, while the stopping frequency at the other stations is low. Since LCE station was not open for passenger service, no passenger trains stopped there.

[Figure 4: panels (a) and (b)]
Figure 4(b) shows the number of trains with different stopping frequencies. The horizontal axis is the number of stops made by trains running from GZS station to CBN station, and the vertical axis is the average number of trains per day with that number of stops. The stopping frequencies concentrate in the range from 3 to 5. Trains with 5 stops are the most numerous, while few trains stop 13 times on WG-HSR; that is, few trains stop at every station along the route.
This part examines trains’ travelling speeds. Figure 5(a) shows the histogram of travelling speeds. The travelling speeds range from 109 km/h to 385 km/h, with an average of 256.2 km/h, and most of them fall in the range from 240 km/h to 270 km/h. Figure 5(b) shows the scatter plot of travelling speed against stopping frequency, where the line represents the average travelling speed of trains with each stopping frequency. The line shows a decreasing trend, so the average travelling speed decreases as the stopping frequency increases. Given the mean travelling speed of 256.2 km/h, a train that stops more than twice on WG-HSR will, on average, travel more slowly than the overall mean.
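The comparison between the per-stop-count averages and the overall mean can be sketched as follows. The speed values are invented for illustration and the function names are hypothetical; only the grouping logic is the point.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (number of stops, travelling speed in km/h) pairs.
trains = [(1, 290.0), (2, 275.0), (2, 265.0), (3, 255.0),
          (4, 245.0), (5, 240.0), (5, 230.0)]


def mean_speed_by_stops(rows):
    """Average travelling speed for each stopping frequency."""
    groups = defaultdict(list)
    for stops, speed in rows:
        groups[stops].append(speed)
    return {stops: mean(speeds) for stops, speeds in groups.items()}


def stop_counts_below_overall_mean(rows):
    """Stopping frequencies whose group average is below the overall mean."""
    overall = mean(speed for _, speed in rows)
    by_stops = mean_speed_by_stops(rows)
    return [stops for stops in sorted(by_stops) if by_stops[stops] < overall]
```

For this invented sample, the groups with three or more stops fall below the overall mean, which is the same shape of result as the threshold of two stops reported above.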

[Figure 5: panels (a) and (b)]
4.2. The Interval Time between Train Events
Figure 6 shows the spatial-temporal distribution of trains’ arrival intervals. The boxplot in Figure 6(a) presents the distribution of arrival intervals at each station per day. The average arrival interval is 10 min. The arrival intervals at stations can be classified into three levels. First, the average arrival interval in the segment from HSW station to CSS station is the lowest, which means this segment is very busy. The up-direction trains originating from Nanjing-Hangzhou HSR occupy this segment, resulting in the shorter arrival intervals. Second, the average arrival interval in the segment from QY station to HYE station is at a medium level; in this segment, most trains originate from GZS station. Third, the average arrival interval in the segment from MLE station to CBN station increases and remains higher; this segment is less busy, since some trains leave for Shanghai-Kunming HSR at CSS station.

[Figure 6: panels (a) to (c)]
Figure 6(b) shows the temporal distribution of trains’ arrival intervals over the day. The boxplot shows the dispersion of arrival intervals in each hour, while the line shows the average arrival interval for that hour. Three primary time periods can be identified: (1) Early morning (typically 05:00–07:00): trains start to operate during this period; the service frequency on WG-HSR is low and the arrival intervals are long. (2) Peak time (typically 08:00–18:00): the arrival intervals decrease and then remain relatively short, which means the service frequency increases and the density of trains on the line is high. (3) Evening and night (typically 19:00–23:00): the arrival intervals increase slightly and then decrease at around 23:00.
Figure 6(c) presents a histogram of arrival interval durations. The horizontal axis is the arrival interval duration, and the vertical axis is the frequency of each duration. The blue bars indicate the frequency of each arrival interval duration; most durations fall in the range from 0 to 25 min. The red line represents the cumulative proportion of arrival intervals. Most arrival intervals are shorter than 20 min, and about 81% are shorter than 13 min.
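The arrival intervals and their cumulative proportion can be derived from a list of arrival times as in the sketch below. The arrival times are invented, and the helper names are hypothetical; the sketch assumes all arrivals fall on the same day at one station.

```python
def to_minutes(clock):
    """Convert an "HH:MM" string to minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def arrival_intervals(arrival_times):
    """Gaps in minutes between consecutive arrivals at one station."""
    minutes = sorted(to_minutes(t) for t in arrival_times)
    return [later - earlier for earlier, later in zip(minutes, minutes[1:])]


def share_below(intervals, threshold_min):
    """Proportion of intervals strictly shorter than the threshold."""
    return sum(1 for gap in intervals if gap < threshold_min) / len(intervals)


# Invented arrival times at a single station on a single day.
arrivals = ["08:00", "08:06", "08:15", "08:40", "08:52"]
gaps = arrival_intervals(arrivals)
```

Evaluating `share_below(gaps, 13)` over all stations and days would give a figure comparable to the cumulative proportion read from the red line.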
4.3. Analysis on the Capacity Utilization of WG-HSR
The scheduled railway timetable in China is revised every year or every season in response to changes in passenger demand or infrastructure, especially when new lines start to operate. The scheduled timetable of WG-HSR was revised on 2015/07/01, 2016/01/10, and 2016/05/15, leading to four timetable periods. Statistical analysis based on the real-record timetables is carried out to show the spatial-temporal distribution of trains arriving at different stations. Within each timetable period, train operation followed the scheduled timetable; thus, the real-record operation data differ little from day to day. The average number of trains arriving at each station during each time period per day is calculated, as shown in Figure 7. In this heat map, the horizontal axis is the station and the vertical axis is the time of day; the number of trains arriving at a station is indicated by the color of the cell.

[Figure 7: panels (a) to (d)]
The uneven spatial-temporal distribution of trains is obvious. Spatially, fewer trains arrive at MLE, YYE, and CBN stations than at other stations, leading to lower capacity utilization in the segment from MLE to CBN. Conversely, many more trains arrive at HYE, HSW, ZZW, and CSS stations than at other stations, because Nanning-Hengyang HSR joins WG-HSR at HYE station and some trains enter WG-HSR from it there. Thus, train operation in the segment from HYE station to CSS station is busier, leading to a potential capacity bottleneck.
Temporally, there are hours during which trains arrive or depart intensively, leading to congestion of trains. These periods can be called train peak hours. In Figure 7, the darker the color, the denser the trains. Thus, a darker cell in Figure 7 can be regarded as a peak hour, which is also a potential bottleneck. First, since the number of trains departing from or arriving at a station varies over the day, the peak hours at different stations are diverse and dispersed. Second, from the view of the rail network, the peak hours propagating through different stations might congregate in some blocks, which may cause a serious bottleneck in train operations.
In Figure 7(a), from 11:00 to 12:00, peak hours can be seen at GZN, YDW, and HYE stations. As time goes by, the peak hours spread to other stations as the trains move along the line. At 17:00, peak hours can be found at HYE, HSW, ZZW, and CSS stations. This demonstrates that the peak hours differ between stations, owing to the uneven temporal distribution of trains and to the convergence of trains with different operation routes.
In Figure 7(b), in the early period from 7:00 to 9:00, the number of arriving trains in the segment from GZN to SG station is higher than in any other time period, since these are the first stations that trains depart from or pass, and they are generally busy in the morning. For the segment from HYE to CSS station, the peak starts at 11:00 and lasts until 17:00, which may lead to a capacity bottleneck.
In Figure 7(c), peak hours can be found in the segment from HYE to CSS station at 12:00 and 13:00. More trains run on WG-HSR in Figure 7(d) than in Figure 7(c), but the distribution of train arrivals in Figure 7(d) shows a trend similar to that in Figure 7(c).
To examine the temporal-spatial distribution of the capacity utilization of WG-HSR, the capacity utilization in different segments and time periods is investigated, as depicted in Figure 8. With a headway of 5 min for Chinese HSR, the maximum number of trains running in a section during one hour is 12. Based on this and the real-record train operation data, the capacity utilization of each section during each time period can be calculated. The capacity utilization in each section is calculated every three hours, from 6:00 to 23:00. In Figure 8, the horizontal axis is the segment and the vertical axis is the capacity utilization.
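The arithmetic behind this indicator can be sketched as follows. The paper does not spell out the exact formula, so the definition used here (observed trains divided by the maximum number of trains the headway allows in the window) is an assumption, as are the function names and the example count of 22 trains.

```python
def hourly_capacity(headway_min=5.0):
    """Maximum number of trains per hour in a section for a given headway."""
    if headway_min <= 0:
        raise ValueError("headway must be positive")
    return 60.0 / headway_min


def capacity_utilization(trains_in_window, window_hours=3, headway_min=5.0):
    """Observed trains divided by the theoretical maximum in the window.

    Assumed definition: utilization = N / (window_hours * 60 / headway).
    """
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return trains_in_window / (window_hours * hourly_capacity(headway_min))


# Example: 22 trains observed in one section during a three-hour window.
example = capacity_utilization(22)
```

With a 5 min headway the three-hour capacity is 36 trains, so the example of 22 observed trains corresponds to a utilization of roughly 61%.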

[Figure 8: panels (a) to (d)]
Overall, the capacity utilization during the timetable period from 2016/01/10 to 2016/05/14 is much lower than in the other timetable periods, as shown in Figure 8(c). Temporally, the capacity utilization from 9:00 to 17:00 is about 60%, which is higher than in other time periods; more trains are scheduled during these hours to satisfy the passenger demand. The capacity utilization from 18:00 to 20:00 is at a medium level, about 40%. During the early hours from 6:00 to 8:00, the capacity utilization decreases from GZN station to CBN station, since most of the up-direction trains originate from GZN station. Similarly, the capacity utilization increases from GZN station to MLE station from 18:00 to 21:00, since fewer trains depart from GZN at this time.
Spatially, the capacity utilization in the segment from HYE station to MLE station is higher than in other segments during most time periods, except from 6:00 to 8:00. This segment is prone to becoming a bottleneck because of its high capacity utilization. The capacity utilization drops significantly at MLE station and is lower in the segment from MLE to CBN, so more trains can be scheduled in this segment.
5. Passenger Distribution on WG-HSR
Figure 9 depicts the daily passenger volume carried by WG-HSR from January 1st, 2012, to December 16th, 2015. The orange line shows the moving average (moving window = 7). The passenger flow shows an increasing trend as well as periodic characteristics, and the fluctuations are similar from year to year. Generally, the passenger volume increases from the first days of January to the last days of February, which is affected by the New Year Holiday in January and the Spring Festival vacation during this period. The average number of passengers per day in February is usually higher than in January, since the Spring Festival rush is considered the largest human migration and many people travel back and forth between their hometowns and workplaces. During March, the passenger demand is low. The passenger volume then reaches brief peaks in April and May owing to the Tomb-Sweeping Day and Labor Day, respectively. From July to the end of August, the passenger volume follows an M shape, which is affected by the summer vacation, when many students travel back and forth between school and home. The passenger volume is low in September and then peaks in the first days of October owing to the National Day vacation. After the National Day vacation, the passenger volume decreases and remains steady in November and December.
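A seven-point moving average of a daily series can be computed as in the sketch below. Whether the authors used a trailing or a centered window is not stated; the trailing window here is an assumption, and the input values are placeholders.

```python
def moving_average(values, window=7):
    """Trailing moving average; one output per full window of observations."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Placeholder daily volumes, only to show the shape of the output.
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7, 8], window=7)
```

A window of seven days averages over one full week, which suppresses the weekday and weekend cycle and makes holiday peaks easier to see.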

5.1. The Spatial Distribution of Passengers
The daily passenger volumes between stations in the segment from GZS station to WH station are collected from January 1st, 2012, to December 16th, 2015, and the average daily passenger volume in each section is then obtained. The heat map in Figure 10(a) presents the spatial distribution of passengers between stations. The horizontal axis is the departure station, the vertical axis is the arrival station, and the number of passengers between stations is indicated by the color of the cell. The numbers of passengers departing from or arriving at GZS, CSS, and WH stations are larger than at other stations and account for a large proportion of the total. Since Guangzhou, Changsha, and Wuhan are prosperous cities, the passenger demand in these cities is more intense.

[Figure 10: panels (a) and (b)]
Figure 10(b) presents the number of passengers in each section in descending order. First, the passenger flows in the up-direction and down-direction are clearly symmetric. For example, the larger up-direction passenger volumes are found in the sections GZS-CSS and GZS-WH, while the larger down-direction passenger volumes are found in the sections CSS-GZS and WH-GZS. Second, the passenger demand varies greatly between sections. The number of passengers in the section GZS-CSS ranks first, followed by the section GZS-WH. Passenger volumes between big cities are larger; for instance, passenger demand among the GZS, CSS, and WH stations is larger, since these are important terminal stations located in big cities that attract passengers.
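Ranking OD sections and checking directional symmetry can be sketched as below. The passenger volumes are invented placeholders (not values from the dataset), and the imbalance measure is one possible choice introduced here for illustration.

```python
# Invented average daily volumes per (origin, destination) pair.
od_volume = {
    ("GZS", "CSS"): 9000, ("CSS", "GZS"): 8800,
    ("GZS", "WH"): 7000, ("WH", "GZS"): 7100,
    ("CSS", "WH"): 3000, ("WH", "CSS"): 2900,
}


def ranked_sections(od):
    """OD pairs sorted by passenger volume, largest first."""
    return sorted(od.items(), key=lambda item: item[1], reverse=True)


def directional_imbalance(od, a, b):
    """(a->b minus b->a) over their sum; 0 means perfectly symmetric flows."""
    forward, backward = od[(a, b)], od[(b, a)]
    return (forward - backward) / (forward + backward)
```

Values of `directional_imbalance` close to zero correspond to the symmetry described above, while clearly positive or negative values would indicate a dominant direction.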
Figure 11 shows the distribution of passengers departing from different stations. In Figures 11(a) and 11(b), the number of passengers departing from GZS station is far larger than that from the other stations, since Guangzhou is a big city with a strong economy and its passenger demand is great. The number of passengers departing from CSS station ranks second, since CSS station is an important terminal in the HSR network. The numbers of passengers departing from WH, CZW, and HYE stations are also large, since these stations are located in big cities. The passenger demand in small cities such as QY, YDW, and MLE is lower.

[Figure 11: panels (a) and (b)]
5.2. Temporal Distribution of Passengers
Figure 12 shows the uneven temporal distribution of passengers on WG-HSR. Limited by the data, we focus only on the down-direction passengers travelling from Changsha to Guangzhou on 2015/12/02. The number of passengers reaches one peak during 9:00–12:00 and another during 15:00–18:00, while the passenger volume dips during 6:00–9:00 and 18:00–21:00.

[Figure 12: panels (a) to (f)]
5.3. Passenger Fluctuation during Holidays and Weekends
Figure 13(a) shows the uneven temporal-spatial passenger distribution in each section on every day of October 2015. The passenger volume in most sections peaks in the first days of October, except for the down-direction passengers in the section WH-CSS. The passenger volumes then decrease from the 9th of October and remain steady for the rest of the month. The passenger volumes in the two directions of the same section are normally symmetric, but this is not the case during the National Day Holiday (from October 1st to October 8th): the number of down-direction passengers in the sections WH-GZS and CSS-GZS is much larger than that in the opposite-direction sections GZS-WH and GZS-CSS, which suggests that Guangzhou attracts more holiday passengers than Wuhan and Changsha. At the same time, the number of up-direction passengers in the section CSS-WH is much larger than the number of down-direction passengers in the section WH-CSS, which suggests that people prefer to spend their holidays in Wuhan rather than in Changsha. During the National Day Holiday, the number of passengers travelling between GZS station and WH station ranks first, followed by that between GZS station and CSS station, and the passenger volume in the sections CSS-WH and WH-CSS is the smallest.

[Figure 13: panels (a) and (b)]
Figure 13(b) shows the distribution of passengers departing from and arriving at the stations on WG-HSR on every day of October 2015. The number of passengers departing from or arriving at GZS is much larger than at other stations. The passenger volume at CSS station ranks second, followed by that at WH station. The passenger volumes at HYE station and YYE station are lower. Owing to the National Day Holiday, the passenger volume at each station during the first days of the month (2015/10/01–2015/10/08) is much higher than at any other time.
5.4. Passenger Distribution between Different Stations
Figure 14 shows the average passenger flows on weekdays and weekends in different months of 2015. The number of passengers in February is larger than in the other months, owing to the Spring Festival Holiday, which lasted seven days from February 18th, 2015, to February 24th, 2015. The Spring Festival Holiday is the most important vacation in China, and a large number of people travel by train between their workplaces and hometowns to celebrate it. The number of passengers in April is also larger because of the Tomb-Sweeping Day, an important spring festival. Activities such as hiking, planting trees, and sweeping tombs are popular during the festival, resulting in a large passenger demand. Usually, the passenger demand on weekends is higher than on weekdays, which is obvious in January and April. In February, there is no significant difference in passenger demand between weekdays and weekends, because the Spring Festival rush occupies most of the days of the month.
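The weekday and weekend averages per month can be obtained as in the sketch below. The volumes are invented, and the sketch treats Saturday and Sunday as the weekend without handling public holidays or make-up working days, which is a simplifying assumption.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekday_weekend_means(daily_passengers):
    """Average volume per (year, month, "weekday" or "weekend")."""
    groups = defaultdict(list)
    for day, volume in daily_passengers.items():
        kind = "weekend" if day.weekday() >= 5 else "weekday"  # Sat=5, Sun=6
        groups[(day.year, day.month, kind)].append(volume)
    return {key: mean(volumes) for key, volumes in groups.items()}


# Invented volumes for a Friday, Saturday, Sunday, and Monday in January 2015.
sample = {
    date(2015, 1, 2): 100,
    date(2015, 1, 3): 130,
    date(2015, 1, 4): 150,
    date(2015, 1, 5): 110,
}
means = weekday_weekend_means(sample)
```

Comparing the two entries of each month in `means` gives the kind of weekday versus weekend contrast discussed above.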

[Figure 14: panels (a) to (f)]
6. Load Rates of Trains
6.1. The Load Rate of Non-Cross-Line Trains
In China, HSR trains are classified into two types according to their departure and arrival stations. For WG-HSR, if both the origin and destination stations of a train are on WG-HSR, the train is called a non-cross-line train; otherwise, it is called a cross-line train.
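This definition translates directly into a membership test, sketched below. The station set is assembled from the abbreviations used in this paper and may be incomplete; the function name is hypothetical.

```python
# Station codes that appear in this paper for WG-HSR (possibly incomplete).
WG_HSR_STATIONS = {
    "GZS", "GZN", "QY", "YDW", "SG", "LCE", "CZW", "HYE",
    "HSW", "ZZW", "CSS", "MLE", "YYE", "CBN", "WH",
}


def classify_train(origin, destination, line_stations=WG_HSR_STATIONS):
    """Return "non-cross-line" if both terminals lie on the line."""
    if origin in line_stations and destination in line_stations:
        return "non-cross-line"
    return "cross-line"
```

For example, a train from GZS to WH is classified as non-cross-line, while a train from GZS to Shanghai Hongqiao (SHHQ) is classified as cross-line.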
The load rate data of non-cross-line trains on WG-HSR from March 23rd, 2015, to March 29th, 2015, are collected. Based on the data, the load rates in the sections GZS-CSS and GZS-WH are compared in Figure 15. The load rate fluctuates more dramatically in the section GZS-CSS than in the section GZS-WH. In the section GZS-CSS, the load rate peaks at 9:00 and 15:00 and reaches its trough at 19:54. The load rate in the section GZS-WH decreases to its lowest level around 8:00 and then fluctuates from 10:00 to 17:40 without a peak.

[Figure 15: panels (a) and (b)]
6.2. The Load Rate of Cross-Line Trains
There were 135 up-direction trains running on WG-HSR on December 2nd, 2015, of which 81 were cross-line trains and 54 were non-cross-line trains. Several cross-line trains, shown in Figure 16, are taken as examples to analyze the load rate of trains in each section. The cross-line trains on WG-HSR are classified into four types according to their running routes and directions; the load rates of these trains are shown in the four panels of Figure 17.


[Figure 17: panels (a) to (d)]
For the trains running on WG-HSR and HHR-HSR in the direction of Chengdu (Figure 17(a)), the load rates vary in a similar way. The load rate peaks in the section CSS-WH and then decreases dramatically after WH station, leading to a serious waste of seat capacity. The average load rates of G1032 and G1312 approach 80% in the section CSS-WH. The average load rate of G312 and G1316 is about 50% in the section GZS-WH.
For the trains running on WG-HSR and HHR-HSR in the direction of Shanghai (Figure 17(b)), the average load rate in the section GZS-WH is higher than in other sections, and the load rate decreases steadily after WH station. The average load rates of G276 and G280 are higher than those of G294 and G288.
For the trains running on WG-HSR and HK-HSR in the direction of Huaihua South (HHS) station (Figure 17(c)), the load rates are higher in the section from Shenzhen North (SZN) station to CZW station and then remain steady. The average load rate of train G6142 ranks first at about 70%, while that of train G9666 ranks last at about 35%. The load rates of the other three trains are nearly equal, at about 40%.
For the trains running on WG-HSR and HK-HSR in the direction of Shanghai Hongqiao (SHHQ) station (Figure 17(d)), the load rate fluctuates from station to station. The load rates of all trains decrease sharply at Nanchang West (NCW) station. The average load rate in the section SG-NCW is higher than in the section NCW-SHHQ.
In conclusion, the load rate varies with the train and the section. The load rate of a train in its WG-HSR section is higher than in the other sections of its route. The existing train operation plan needs to be improved, because the load rate in some sections is low, leading to wasted seat capacity.
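How a section-by-section load rate profile arises from boarding and alighting counts can be sketched as follows. The paper does not define the load rate formally in this section, so the definition used here (passengers on board divided by seat capacity) is an assumption; the stop list, the terminal name END, and all counts are invented.

```python
def section_load_rates(stops, boardings, alightings, seats):
    """Load rate in each section between consecutive stops.

    Assumed definition: passengers on board after each stop / seat capacity.
    """
    if not (len(stops) == len(boardings) == len(alightings)):
        raise ValueError("stops, boardings and alightings must align")
    on_board = 0
    rates = {}
    for i in range(len(stops) - 1):
        on_board += boardings[i] - alightings[i]
        rates[(stops[i], stops[i + 1])] = on_board / seats
    return rates


# Invented cross-line train that continues beyond WH to a terminal "END".
rates = section_load_rates(
    stops=["GZS", "CSS", "WH", "END"],
    boardings=[400, 200, 50, 0],
    alightings=[0, 100, 350, 200],
    seats=500,
)
```

In this invented case many passengers alight at WH and few board, so the load rate drops sharply in the section after WH, which is the pattern of unused seat capacity described above.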
6.3. Distribution of Load Rate Varying with Distance and Time
The load rate data of trains running on WG-HSR in March 2015 are collected, based on which scatter plots of load rate against running distance and time period are made, as shown in Figure 18. In terms of distance, most of the scatter points concentrate between 350 km and 550 km, which means that the travelling distances of most passengers on WG-HSR range from 350 km to 550 km. There is a positive correlation between load rate and running distance: the scatter plot indicates that the load rate increases with the travelling distance, in contrast to cross-line trains, whose load rate decreases as the running distance increases. The load rate approaches 100% when the travelling distance is longer than 850 km. Before 9:00, the load rate is relatively low but increases noticeably. From 9:00 to 19:00, the load rate remains steady, and it decreases after 19:00. Compared with distance, time has little effect on the load rate.
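The sign and strength of such a relationship can be quantified with the Pearson correlation coefficient, sketched below. The paper does not state which statistic the authors used, so this choice is an assumption, and the distance and load rate values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented running distances (km) and load rates of non-cross-line trains.
distance_km = [350, 450, 550, 700, 850]
load_rate = [0.55, 0.62, 0.70, 0.85, 0.97]
r_non_cross = pearson(distance_km, load_rate)
```

A coefficient near +1 would match the positive relationship reported for non-cross-line trains, and a negative coefficient would match the behaviour reported for cross-line trains.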

[Figure 18: panels (a) and (b)]
The scatter plots of the load rate of cross-line trains against running distance and time are shown in Figure 19. In the plot of load rate against running distance, most of the scatter points concentrate in the range from 100 km to 1200 km; when the running distance is longer than 1200 km, the number of scatter points decreases. A trend line is plotted to show the relationship between load rate and running distance. It indicates that the load rate remains steady between 100 km and 700 km and then decreases once the running distance exceeds 700 km, which means the travelling distance of most passengers on cross-line trains is between 100 km and 700 km. In the plot of load rate against time period, the scatter points are dispersed over the period from 6:00 to 24:00. From 6:00 to 7:00 and from 22:00 to 24:00, there are fewer scatter points than during the rest of the day. A trend line shows how the load rates vary with time period: it increases from 6:00 to 12:00 and then decreases from 12:00 to 24:00. The fluctuation of the trend line reflects passengers’ travelling preferences during different time periods.

[Figure 19: panels (a) and (b)]
7. Conclusions and Further Study
This paper presents a statistical analysis of HSR train operation and passenger volume based on real train operation records and ticket-booking data of WG-HSR. The temporal-spatial distributions of train operation, capacity utilization, and passenger flow are analyzed. The main conclusions are as follows.
For train operation, the number of trains running in the section from Hengyang East station to Changsha South station is larger than in other sections, while the number of trains running from Miluo East station to Chibi North station is lower. The stopping frequency of trains concentrates in the range from 3 to 5, and trains stopping 5 times on WG-HSR are the most numerous. The travelling speed of trains ranges from 109 km/h to 385 km/h, with an average of 256.2 km/h. A train that stops more than twice on WG-HSR will, on average, travel more slowly than the overall mean. There are peak hours during which trains arrive intensively, and the peak hours spread over time to different stations. From the view of the rail network, the peak hours propagating through different stations might congregate in some blocks, causing a bottleneck in train operations.
The spatial and temporal characteristics of passenger flow are analyzed. The passenger volume between big cities such as Guangzhou, Wuhan, Changsha, and Shanghai is higher. Passengers prefer to travel during 9:00–12:00. Cross-line passengers make up a large proportion of the passengers travelling on WG-HSR, and most of them travel from Guangzhou to Shanghai.
Finally, the paper establishes the relationship between load rate, running distance, and running time based on the real-record load rate data. The correlation between load rate and running distance is positive for non-cross-line trains and negative for cross-line trains. Compared with running distance, running time has little effect on the load rate. The load rate of trains is uneven across sections, and the existing train operation plan should be improved based on the actual passenger demand.
This work is part of our research on the capacity evaluation of Chinese HSR. Some limitations should be addressed in future work. For instance, more ticket-booking data on cross-line trains should be obtained to analyze the passenger distribution on WG-HSR, and the relationship between the characteristics of passenger flow and capacity utilization should be explored. In our future study, an updated train operation plan and timetable will be proposed considering the passenger demand and capacity utilization.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Fundamental Research Funds for Chengdu University of Information Technology (Grant no. KYTZ202262), the National Natural Science Foundation of China (Grant nos. U1834209 and 71871188), and the Fundamental Research Funds for the Central Universities (Grant no. 2682021CX051). The authors are grateful for the contributions made by their project partners.