Abstract

Commuting is one of the most important travel patterns on the road network, and its analysis can support public transport system optimization, policy formulation, and urban planning. In this study, a key commuting route mining algorithm framework based on license plate recognition (LPR) data is proposed. The framework can be transferred to similar spatiotemporal data, such as GPS trajectory data. Commuting pattern vehicles are first extracted, and then the spatiotemporal trip chains of all commuting pattern vehicles are mined. Based on the spatiotemporal trip chains, the spatiotemporal similarity matrix is constructed with the dynamic time warping (DTW) algorithm. Finally, the characteristics of the commuting pattern are analysed with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Unlike other studies that analyse the commuting pattern by applying machine learning algorithms to all data, this study first extracts commuting pattern vehicles and then designs a key commuting route mining algorithm framework for these vehicles. Taking Hangzhou as an example, the proposed framework successfully mines the commuting pattern characteristics and key commuting routes in Hangzhou, and policy suggestions based on the analysis results are put forward.

1. Introduction

With the increasing number of motor vehicles, roads are becoming more and more congested, especially during morning and evening peak hours [1]. Commuting is the most important travel purpose on the road network and also the main cause of road network congestion in the morning and evening peak hours. Generally speaking, commuters are mainly people who travel to work or to school. Commuting refers to travelling from home to the company or school in the morning peak hours and from the company or school back home in the evening peak hours. The morning and evening peak hours are the periods in the morning and afternoon, respectively, when the traffic flow of the urban road network is very large. Analysing the commuting pattern is expected to help ease traffic congestion [2]. Vigorously developing the public transport system is recognized by transportation scientists as the best solution to alleviate traffic congestion [3]. However, without a theoretical basis, the unplanned expansion of public transport will impose a high financial burden on the government [4, 5]. Therefore, many measures to optimize and develop public transport lines have been proposed [6, 7]. The public transport system is one of the main travel modes that commuters rely on, so analysing the travel behavior of commuters can support its optimization. For example, the commuting time and trajectory of commuters can be analysed to help adjust bus deployment strategies and optimize bus lines [8]. The analysis of commuting travel behavior can also help public transport companies formulate differentiated market strategies and attract more commuters to travel by public transport [9]. For the currently popular customized shuttle buses, the analysis of the commuting pattern is particularly important [10].
The formulation of many policies also requires the analysis of commuting pattern as a theoretical basis, such as the travel restriction policy used in many big cities of China. This travel restriction policy has a huge impact on commuters, because commuters generally travel in the morning and evening peak hours, and their travel flexibility is small; it is difficult for them to avoid the impact of the travel restriction policy. Therefore, the determination of travel restriction time period, area, and rules that need to be determined for the implementation of the restriction policy requires the analysis of commuting pattern as a theoretical basis [11]. In addition, the analysis of commuting pattern is also very meaningful for urban planning and transportation planning. For instance, Ma et al. [12] analysed the commuters using public transport in Beijing based on the transit smart card data and found that there had been a serious imbalance between occupational and residential areas in Beijing. In the subsequent urban planning, some measures such as adding some commercial areas in residential areas can be considered to reduce the imbalance. Therefore, commuting pattern analysis is a very important research direction, which can provide support for public transport system optimization, policy formulation, urban planning, and traffic planning.

With the development of intelligent transportation system (ITS) [13, 14], a large amount of spatiotemporal big data is collected, such as mobile phone signaling data, GPS trajectory data, and license plate recognition (LPR) data. With the help of these data, commuting pattern can be analysed in depth. Based on LPR data, this study first identifies commuting pattern vehicles on the road network and then analyses the spatiotemporal travel characteristics of the commuting pattern vehicles. The morning and evening peak spatiotemporal trip chains of these vehicles are extracted, and the travel time similarity and travel trajectory similarity between vehicles are calculated, respectively, based on the spatiotemporal trip chains. Then, the spatiotemporal similarity matrix between vehicles is constructed by travel time similarity and travel trajectory similarity. On this basis, this paper analyses the spatiotemporal characteristics of the commuting pattern of the road network, summarizes the rules of urban commuting pattern, and provides support for policy formulation and urban planning.

The remainder of this paper is organized as follows: Section 2 summarizes the relevant literature. Section 3 introduces the data used in this study. Spatiotemporal characteristics of commuting pattern vehicles are analysed in Section 4. Then, Section 5 introduces the method of similarity measurement of spatiotemporal trajectory and the algorithm framework of key commuting route mining based on the spatiotemporal trajectory similarity matrix. On this basis, Hangzhou is taken as an example to mine key commuting routes. In Section 6, we discuss the results of key commuting route mining and the land use properties of the origins and destinations of key commuting routes. Finally, the conclusion of this work is given in Section 7.

Although commuting is a very important type of travel pattern, currently, there is no unified definition of the commuting pattern. This study adopts the definition proposed by Yao et al. [2], which defines the commuting pattern as a type of regular trip, usually occurring in the morning or evening peak hours, during which the commuter travels between the same origin and destination (generally home and workplace or home and school) over a long period of time.

Currently, there are many studies on commuting behavior. From the perspective of data sources, the studies can be roughly divided into two categories. One carries out research on the commuting pattern and commuting travel mode choice through traditional survey data, such as questionnaires or household surveys. The other analyses the commuting pattern based on large samples or even full samples of spatiotemporal big data. The studies based on questionnaire data mainly analyse the factors that influence the choice of commuting mode. Ingvardson et al. [15] proposed an integrated choice and latent variable model, analysed the choice of commuting mode based on questionnaire data, and took psychological needs into account in the model. Lizana et al. [16] analysed the impact of attitude, habit, socio-economic factors, bicycle facilities, and riding experience on commuting mode choice based on questionnaire data and found that attitude has an important impact on the commuting mode choice. EK et al. [17] analysed the influencing factors of walking and cycling in commuting based on questionnaire data and found that the most important influencing factors are health, environment, and accessibility. Studies of this kind are typically detailed, and their description of behavior is comprehensive and in-depth. In addition, questionnaire data can be used to analyse the correlation between commuting and factors related to people's subjective emotions and attitudes, which is an advantage that spatiotemporal big data do not have. For example, based on 7,837 questionnaires from 327 communities in China, Yin and Shao [18] analysed the relationship between commuting, built environment, and happiness and found that there is a nonlinear relationship between commuting time and happiness.
However, limited by data collection methods, the amount of questionnaire data is generally small, and the robustness of conclusions is easily affected by factors such as sampling and questionnaire quality, while spatiotemporal big data have advantages in these aspects. With the rapid development of ITS [19], more and more multi-source spatiotemporal big data have been accumulated, and the analysis of commuting pattern based on multi-source spatiotemporal big data has become a hotspot.

Mobile phone signaling data have received widespread attention because of their wide coverage and full sampling rate, and a large number of studies analyse the commuting pattern based on them. Bonnetain et al. [20] proposed a framework for processing mobile phone signaling data, performing trip chain interruption and finely reshaping the trajectory. Based on this framework, they processed the mobile phone signaling data of more than 10 million individuals and mined useful information about the city, such as the proportion of various travel modes, commuting routes, urban activities, and traffic flow. Yang et al. [21] obtained the home and work locations of each individual based on mobile phone signaling data. On this basis, they analysed the population distribution, urban commuting pattern, job-housing balance, and some other issues. Based on the mobile phone signaling data of Wuhan city, China in 2016, Liu et al. [22] used multiple regression and moderated multiple regression models to study the impact of land use type and accessibility on commuting flows. They found that residential areas are the main origin of commuting flows, commercial areas attract a lot of commuting flows, and the type of land use plays an important role in regulating the relationship between the commuting distance and commuting flows. Despite these advantages, mobile phone signaling data are generally located through mobile phone base stations. Therefore, their positioning accuracy is not high, and there exist data drift and the “ping-pong effect.” As a result, there are some limitations in the analysis of micro-commuting behavior based on mobile phone signaling data [23].

Compared with mobile phone signaling data, GPS trajectory data have the advantages of high precision and high acquisition frequency. So based on GPS trajectory data, a more refined analysis of commuting behavior can be performed. Li et al. [24] analysed the travel behavior of shared bicycles based on e-bike GPS trajectory data and found that 9.4% of the morning peak trips were spillover commuting. Fu et al. [25] analysed the commuting behavior and its spatiotemporal distribution characteristics of taxis based on the taxi GPS trajectory data and effectively identified the place of employment and residence according to OD analysis. Gokasar and Cetinel [26] analysed the road network congestion situation and bottleneck during morning and evening peak hours using bus GPS trajectory data. All kinds of GPS trajectory data play an important role in commuting pattern analysis with the advantages of high precision and high acquisition frequency of track points. However, the GPS trajectory data are often only about a single industry or a single type of vehicle, so the analysis scope is limited.

LPR data are a new type of data source and are generally collected by the government. Owing to privacy concerns, LPR data are difficult to obtain. However, LPR data are still receiving widespread attention for their wide coverage of motor vehicles, full sampling rate, and high accuracy. Once the privacy problem is resolved, LPR data can be expected to become far more widely used. Therefore, the analysis of the commuting pattern based on LPR data is a very promising research direction. Chen et al. [27] extracted ten features that reflect the travel characteristics of motor vehicles based on LPR data and used the k-means algorithm for the clustering analysis of vehicles. They divided the vehicles on the road network into nine categories, including the commuting pattern vehicles. Commuting pattern vehicles are vehicles with commuting behavior characteristics. They are usually used for commuting to and from work or school. Chang et al. [28] proposed a two-stage k-means clustering algorithm to identify commuting pattern vehicles on the road network. Yao et al. [2] proposed a novel commuting pattern vehicle recognition algorithm based on LPR data and extracted commuting pattern vehicle recognition rules, which can identify commuting pattern vehicles on large sample datasets. Currently, the research on commuting pattern analysis based on LPR data focuses on extracting indicators reflecting commuting characteristics and then identifying commuting pattern vehicles on the road network. However, the analysis of spatiotemporal characteristics after identifying commuting pattern vehicles is relatively weak. That is, current studies examine the differences between the commuting pattern and other travel patterns but lack in-depth analysis of the spatiotemporal travel behavior of commuting pattern vehicles alone, especially with respect to key commuting routes.

To address these gaps in commuting pattern analysis based on LPR data, this study makes two main contributions. First, unlike previous research on commuting pattern analysis, this study first profiles the vehicles on the road network and identifies the commuting pattern vehicles, and then extracts the key commuting routes and analyses the characteristics of the commuting pattern based on the spatiotemporal trajectories of these vehicles. Compared with pattern recognition on the whole sample dataset, this approach reduces the heterogeneity of the sample dataset and focuses on the commuting pattern without interference from other travel patterns. Second, the algorithm framework proposed in this study can mine key commuting routes, which current research cannot obtain.

3. Data Description

3.1. The License Plate Recognition Data

This study uses LPR data for commuting pattern analysis, and the LPR data used in the case study comes from the entire month of June 2016 in the urban area of Hangzhou City, Zhejiang Province, China. The LPR data contain the information of passing cars captured by the cameras installed on the road network, which includes the encrypted license plate, the lane where the vehicle is captured, vehicle type, the time when the vehicle is captured, and the longitude and latitude of the shooting camera. The details and examples about the data are shown in Table 1. The distribution of all cameras is shown in Figure 1; the blue line represents the boundary of the urban area of Hangzhou City, which also contains most cameras. Therefore, the LPR data can well capture the vehicle travel behavior in the urban area of Hangzhou. The urban area of Hangzhou is also the main commercial area of Hangzhou. The largest hospitals, schools, and companies in Hangzhou are gathered here. This area is also the most congested area in Hangzhou. Therefore, the government has also implemented the travel restriction policy for this area.

3.2. Data Preparation

Descriptive statistical analysis and data cleaning of the LPR data showed that there were 1,472 cameras in Hangzhou in June 2016, that about 1.255 million vehicles were detected per day on average, and that the average number of detection records per day was 8.81 million. About 93.93% of vehicles were detected fewer than 20 times a day, and only about 6.07% of vehicles were detected more than 20 times.

The data quality problems fell into three categories: missing data, incorrect license plate number recognition, and repeated detection of vehicles. Missing data refer to records in which at least one field is missing, including license plate number, detected time, detected location, and vehicle type. Such records accounted for 4.17% of the total data. Because the subsequent algorithm cannot work normally once a field is missing, the missing data were deleted. The analysis of license plate recognition errors showed that incorrectly recognised license plate numbers accounted for only 0.03% of the total data, a very small proportion. These records were directly deleted. As for repeated detection, a camera sometimes captures not only vehicles in its own lane but also vehicles in adjacent lanes, so the same vehicle may be detected repeatedly at the same time when the adjacent lanes are also equipped with cameras. The statistical results showed that repeated detection data accounted for 1.00% of the total data. For the repeated records, one of them was kept at random.
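
The missing-data and repeated-detection rules above can be sketched in Python as follows. This is a minimal illustration and not the authors' Oracle implementation: the field names (`plate`, `time`, `camera`, `vehicle_type`) are hypothetical, records sharing the same plate and timestamp are assumed to be repeated detections, and the first copy is kept for reproducibility where the paper keeps one at random. Incorrectly recognised plates are not handled because the source does not describe how they were detected.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]
# Hypothetical field names; the real schema is described in Table 1 of the paper.
REQUIRED_FIELDS = ("plate", "time", "camera", "vehicle_type")


def clean_records(records: List[Record]) -> List[Record]:
    """Drop incomplete records, then keep one record per (plate, time) pair."""
    seen = set()
    cleaned = []
    for rec in records:
        # Rule 1: a record with any missing field is deleted.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Rule 2 (assumption): same plate at the same time means the vehicle
        # was captured by adjacent cameras; only the first copy is kept here.
        key = (rec["plate"], rec["time"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"plate": "A1", "time": "2016-06-13 08:00:00", "camera": "c1", "vehicle_type": "car"},
    {"plate": "A1", "time": "2016-06-13 08:00:00", "camera": "c2", "vehicle_type": "car"},
    {"plate": None, "time": "2016-06-13 08:05:00", "camera": "c1", "vehicle_type": "car"},
    {"plate": "B2", "time": "2016-06-13 08:10:00", "camera": "c3", "vehicle_type": "car"},
]
cleaned = clean_records(raw)
```

In this toy input, the second record is a repeated detection and the third has a missing plate, so two records remain.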

4. Spatiotemporal Characteristics of Commuting Pattern Vehicles

Commuting pattern vehicles have attracted extensive attention for their high proportion, strong travel regularity, small travel flexibility, and great impact on the traffic state of the road network. This section mainly identifies commuting pattern vehicles on the road network and then analyses the spatiotemporal travel characteristics of the commuting pattern vehicles, laying a foundation for the analysis of the subsequent sections.

Using the commuting pattern vehicle recognition method proposed by Yao et al. [2], a total of 124,570 commuting pattern vehicles are identified in Hangzhou in June 2016. The recognition method consists of five steps. Firstly, nine spatiotemporal features reflecting commuting travel behavior are extracted. Secondly, dimension reduction is carried out. Because the nine extracted features are highly correlated, the factor analysis method is used to reduce the dimensionality, and three linearly independent factors are obtained. Thirdly, the three factors obtained in the second step are clustered with the iterative self-organizing data analysis technique algorithm (ISODATA) to obtain commuting pattern vehicle labels in the sub-dataset. Fourthly, a decision tree model is trained on the sub-dataset with the commuting pattern vehicle labels obtained in step 3, and the commuting rules are extracted on this basis. Fifthly, the commuting rules obtained in step 4 are used to identify the commuting pattern vehicles in the whole dataset.

After identifying the commuting pattern vehicles, their spatiotemporal travel characteristics are analysed. First, the distributions of the first and last detected times of the commuting pattern vehicles are analysed. The first and last detected times of a commuting pattern vehicle reflect when it starts and ends its travel in a day, which are generally the times at which the commuter leaves home for work and returns home from work. An interval of 30 minutes is used, so each day is divided into 48 intervals, namely 00:00-00:30, ..., 23:30-24:00. For each time interval of each day, the number of vehicles whose first or last detected time falls in it is counted. The average number of vehicles in each time interval is then calculated over all weekdays and over all weekends. The results are shown in Figure 2.
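
The 30-minute binning of first and last detected times can be illustrated with a short Python sketch. The input layout (a dictionary from plate to that day's detection times) and the function names are assumptions made for the example; averaging over weekdays and weekends would be a further step over several such daily counts.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Tuple


def half_hour_slot(ts: datetime) -> int:
    """Map a timestamp to one of the 48 half-hour intervals (0 = 00:00-00:30)."""
    return ts.hour * 2 + ts.minute // 30


def first_last_counts(detections: Dict[str, List[datetime]]) -> Tuple[Counter, Counter]:
    """Count vehicles by the interval of their first and last detection in a day."""
    first, last = Counter(), Counter()
    for times in detections.values():
        if not times:
            continue
        first[half_hour_slot(min(times))] += 1
        last[half_hour_slot(max(times))] += 1
    return first, last


# Two hypothetical vehicles on one day.
day = {
    "A1": [datetime(2016, 6, 13, 7, 40), datetime(2016, 6, 13, 18, 5)],
    "B2": [datetime(2016, 6, 13, 7, 55), datetime(2016, 6, 13, 17, 20)],
}
first, last = first_last_counts(day)
```

Here both vehicles are first detected in interval 15 (07:30-08:00), while their last detections fall in intervals 36 and 34.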

It can be seen from Figure 2 that on weekdays, the first and last detected times of commuting pattern vehicles are concentrated in the morning and evening peak hours, respectively, and there is almost no vehicle starting or ending traveling during the off-peak hours and night time. On weekends, the travel time distribution of commuting pattern vehicles is relatively more dispersed, and the travel volume is smaller. The peak value on weekends is about 1/3 of the peak value on weekdays.

The distribution of the detected frequency of commuting pattern vehicles in each hour is analysed, and the results are shown in Figure 3. It can be seen from Figure 3 that on weekdays, commuting pattern vehicles mainly travel during the morning and evening peak hours, and there are also some commuting pattern vehicles traveling during off-peak hours, the amount of which is far less than the amount of commuting pattern vehicles that travel during the morning and evening peak hours. On weekends, the travel distribution of one day is more evenly distributed, but there is also a phenomenon that more commuting pattern vehicles travel during the morning and evening peak hours.

In addition to analysing the commuters’ travel behavior from the temporal perspective, it is also very important to analyse the spatial distribution characteristics of commuting pattern vehicles. First, the first and last detected locations of the commuting pattern vehicles are extracted for the morning and the evening peak hours. In general, the first and last detected locations of a commuting pattern vehicle in the morning and evening peak hours are near the commuter’s home and workplace. Therefore, the distribution of the first and last detected locations during the morning and evening peak hours can reflect the occupational and residential distribution in Hangzhou. This study adopts the research result of Yao et al. [2] to determine the morning and evening peak hours in Hangzhou City, which are 6:30–10:30 and 15:30–21:30, respectively. Figure 4 shows the heat maps of the first and last detected locations of commuting pattern vehicles during the morning and evening peak hours, respectively. The heat maps show that most of the first and last detected locations are concentrated in the urban area of Hangzhou, because the LPR data collection equipment is mainly concentrated there. The heat maps also show that the origin and destination points are evenly distributed in the urban area of Hangzhou, which means that the distribution of workplace and residence in Hangzhou is relatively balanced.
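
A minimal Python sketch of extracting the first and last detected locations inside a peak window is given below. The peak windows are those stated above; the data layout, the function name, and the coordinates are hypothetical and serve only as an illustration.

```python
from datetime import datetime, time
from typing import List, Optional, Tuple

Location = Tuple[float, float]          # (latitude, longitude)
Detection = Tuple[datetime, Location]   # (capture time, camera location)

MORNING_PEAK = (time(6, 30), time(10, 30))
EVENING_PEAK = (time(15, 30), time(21, 30))


def first_last_location(
    detections: List[Detection], window: Tuple[time, time]
) -> Optional[Tuple[Location, Location]]:
    """Return the first and last detected locations inside a peak window."""
    start, end = window
    inside = sorted(
        (d for d in detections if start <= d[0].time() <= end),
        key=lambda d: d[0],
    )
    if not inside:
        return None  # the vehicle was not detected in this window
    return inside[0][1], inside[-1][1]


# One hypothetical vehicle with made-up camera coordinates.
vehicle = [
    (datetime(2016, 6, 13, 7, 35), (30.30, 120.10)),
    (datetime(2016, 6, 13, 8, 5), (30.27, 120.15)),
    (datetime(2016, 6, 13, 12, 0), (30.27, 120.16)),
    (datetime(2016, 6, 13, 18, 10), (30.27, 120.15)),
    (datetime(2016, 6, 13, 18, 45), (30.30, 120.10)),
]
morning = first_last_location(vehicle, MORNING_PEAK)
evening = first_last_location(vehicle, EVENING_PEAK)
```

Collecting these location pairs over all commuting pattern vehicles gives the point sets from which heat maps such as those in Figure 4 could be drawn.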

5. Commuting Pattern Mining and Analysis

After getting commuting pattern vehicles, in-depth commuting pattern characteristics can be obtained by mining the spatiotemporal trajectory of commuting pattern vehicles.

5.1. Spatiotemporal Trajectory Similarity Measurement

LPR data cannot accurately record the complete trajectory of all vehicles on the road network. Vehicles are captured and recorded only where detection equipment is deployed. For a commuting pattern vehicle i, its spatiotemporal trajectory in a day can be expressed as Si = {[li1, ti1],…, [lij, tij],…, [lin, tin]}, where lij is the latitude and longitude of the detector that captures this vehicle at time tij and n is the number of times this vehicle is detected in the whole day. Spatiotemporal trajectory Si can be divided into temporal trajectory STi and spatial trajectory SSi, where STi = {ti1,…, tij,…, tin} and SSi = {li1,…, lij,…, lin}. The spatiotemporal trajectory S of each commuting pattern vehicle can be mined, so a spatiotemporal trajectory set D = {S1,…, Si,…, SN} containing the trajectories of N commuting pattern vehicles is generated every day. Similarly, the spatiotemporal trajectory set can be divided into temporal trajectory set DT and spatial trajectory set DS, where DT = {ST1,…, STi,…, STN} and DS = {SS1,…, SSi,…, SSN}.
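
The construction of the trajectory set D and its split into temporal and spatial parts can be sketched as follows. The record layout, the function names, and the sample values are assumptions for illustration; times are kept as zero-padded "HH:MM:SS" strings so that sorting them as text is chronological.

```python
from typing import Dict, List, Tuple

Location = Tuple[float, float]            # (latitude, longitude)
Record = Tuple[str, str, Location]        # (plate, "HH:MM:SS", camera location)
Trajectory = List[Tuple[Location, str]]   # S_i = [(l_i1, t_i1), ..., (l_in, t_in)]


def build_trajectory_set(records: List[Record]) -> Dict[str, Trajectory]:
    """Group one day's records by vehicle and sort each group by time."""
    d: Dict[str, Trajectory] = {}
    for plate, t, loc in records:
        d.setdefault(plate, []).append((loc, t))
    for s in d.values():
        s.sort(key=lambda point: point[1])
    return d


def split(s: Trajectory) -> Tuple[List[str], List[Location]]:
    """Split S_i into its temporal part ST_i and spatial part SS_i."""
    st = [t for _, t in s]
    ss = [loc for loc, _ in s]
    return st, ss


records = [
    ("A1", "08:05:00", (30.27, 120.15)),
    ("B2", "07:50:00", (30.25, 120.20)),
    ("A1", "07:35:00", (30.30, 120.10)),
]
D = build_trajectory_set(records)
ST_A1, SS_A1 = split(D["A1"])
```

Applying `split` to every trajectory in `D` yields the sets DT and DS used in the similarity calculation.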

Similarity measurement between different spatiotemporal trajectories is an important concept in pattern mining. There are many similarity measurement methods, the most common of which uses the Euclidean distance. Many other methods have also been proposed, such as the dynamic time warping (DTW) method [29]. Besides judging the similarity of two trajectories from their spatiotemporal information alone, semantic similarity can also be considered. A series of measurement methods that jointly consider trajectory similarity and semantic similarity have been proposed [30, 31]; these use not only the temporal and spatial information of a trajectory but also its semantic information. This study mainly focuses on the spatiotemporal characteristics of the commuting pattern and does not consider semantic information. Therefore, the DTW method is adopted to measure the similarity of different spatiotemporal trajectories. The DTW method was first applied to speech matching and was then extended to many other fields. It is now also widely used in the transportation field, such as in traffic flow theory and traffic control [32, 33]. The DTW algorithm first finds the optimal matching of two trajectories and then calculates their similarity. Its advantage is that the two trajectories can be of unequal length, which is required when comparing trajectories because the number of detections in a day often differs between vehicles. The important concepts and steps of the DTW algorithm are introduced next.

Assume that there are two travelers whose temporal trajectories are STi and STj, where STi = {ti1,…, tin} and STj = {tj1,…, tjm}. To obtain the similarity of the two temporal trajectories with the DTW algorithm, first construct an n × m matrix Mn,m as shown in (1):

$$M_{n,m} = \begin{bmatrix} d_{11} & \cdots & d_{1m} \\ \vdots & \ddots & \vdots \\ d_{n1} & \cdots & d_{nm} \end{bmatrix} \tag{1}$$

In (1), duw denotes the distance between tiu and tjw, and it is calculated as in (2):

$$d_{uw} = \left| \mathrm{sec}(t_{iu}) - \mathrm{sec}(t_{jw}) \right| \tag{2}$$

where sec(.) is the function that converts time into seconds.

After obtaining the distance matrix Mn,m, the warping path W needs to be calculated. W is a contiguous sequence of elements of Mn,m that defines a mapping between STi and STj. W is shown in Equation (3):

$$W = \{w_1, \ldots, w_k, \ldots, w_K\} \tag{3}$$

where w_k is the k-th element of the path and K is its length. W needs to meet the following three conditions.
(1) Boundary conditions: w_1 = (1, 1) and w_K = (n, m), which means that W starts with the d11 element of matrix Mn,m and ends with the dnm element.
(2) Continuity: given w_{k-1} = (a’, b’), the next point of the path w_k = (a, b) must satisfy a - a’ ≤ 1 and b - b’ ≤ 1. This condition requires that adjacent elements in W be adjacent or diagonal elements in matrix Mn,m.
(3) Monotonicity: given w_{k-1} = (a’, b’) and w_k = (a, b), it must hold that a - a’ ≥ 0 and b - b’ ≥ 0. This condition makes the elements in W monotonic in time.
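
The three conditions can be checked mechanically. The following Python sketch validates a candidate warping path given as a list of 1-based (a, b) index pairs; the function name is hypothetical, and the extra requirement that each step actually advances (no repeated element) is a standard assumption that the text does not state explicitly.

```python
from typing import List, Tuple


def is_valid_warping_path(path: List[Tuple[int, int]], n: int, m: int) -> bool:
    """Check boundary, continuity, and monotonicity for a 1-based index path."""
    # Boundary: start at (1, 1) and end at (n, m).
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    for (a_prev, b_prev), (a, b) in zip(path, path[1:]):
        da, db = a - a_prev, b - b_prev
        # Continuity (step <= 1) and monotonicity (step >= 0) in both indices.
        if not (0 <= da <= 1 and 0 <= db <= 1):
            return False
        # Assumption: a step must move to a new element of the matrix.
        if da == 0 and db == 0:
            return False
    return True


ok = is_valid_warping_path([(1, 1), (2, 2), (3, 2)], n=3, m=2)
jump = is_valid_warping_path([(1, 1), (3, 2)], n=3, m=2)
backwards = is_valid_warping_path([(1, 1), (2, 1), (1, 2), (3, 2)], n=3, m=2)
```

The first path satisfies all conditions, the second skips a row and violates continuity, and the third steps backwards and violates monotonicity.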

There are many paths that satisfy the above three conditions. The similarity between the two temporal trajectories is defined as Equation (4),

Generally, the step weight takes one value when the path moves from (a-1, b-1) to (a, b) and another value when it moves from (a-1, b) or (a, b-1) to (a, b). This setting compensates for the different lengths of different warping paths. The similarity based on the DTW algorithm can be calculated by dynamic programming, but the time complexity of dynamic programming is still high, so computing the DTW-based similarity is generally slow.
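
To make the dynamic programming step concrete, the following Python sketch computes a DTW distance between two temporal trajectories. It is a sketch under stated assumptions rather than the paper's exact formulation: the element distance is taken as the absolute difference in seconds, every step has unit weight, and no path-length normalisation is applied, so the weighted form of Equation (4) is not reproduced. The helper `sec` and the sample times are illustrative.

```python
from typing import List


def sec(hhmmss: str) -> int:
    """Convert an "HH:MM:SS" string into seconds since midnight."""
    h, m, s = (int(x) for x in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def dtw_temporal(st_i: List[str], st_j: List[str]) -> float:
    """Minimal cumulative distance over all warping paths (unit step weights)."""
    n, m = len(st_i), len(st_j)
    inf = float("inf")
    # cost[u][w] = minimal cumulative distance of a path ending at (u, w).
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for u in range(1, n + 1):
        for w in range(1, m + 1):
            d = abs(sec(st_i[u - 1]) - sec(st_j[w - 1]))  # assumed element distance
            cost[u][w] = d + min(cost[u - 1][w - 1], cost[u - 1][w], cost[u][w - 1])
    return cost[n][m]


st_a = ["07:30:00", "07:40:00", "07:55:00"]
st_b = ["07:32:00", "07:50:00"]
distance = dtw_temporal(st_a, st_b)  # 900.0 seconds for this toy pair
```

Each pair of trajectories costs O(nm) operations, and a full similarity matrix over N vehicles needs on the order of N² such computations, which is consistent with the slowness noted above.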

The similarity calculation method between two temporal trajectories based on the DTW algorithm is introduced above. The similarity between the two spatial trajectories SSi and SSj is calculated in the same way, except that when constructing matrix Mn,m, the element distance between the two spatial trajectories is calculated by Equation (5):

$$d_{uw} = \mathrm{dist}(l_{iu}, l_{jw}) \tag{5}$$

In (5), liu denotes the u-th element in SSi, ljw denotes the w-th element in SSj, and dist(.) is the function that gets the straight-line distance.
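
A spatial counterpart can be sketched by passing a distance function into a generic DTW routine. The paper does not say how the straight-line distance is computed from latitude and longitude; the haversine great-circle distance used here is an assumption, as are the function names, the unit step weights, and the sample coordinates.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
Location = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(p: Location, q: Location) -> float:
    """Great-circle distance in metres (assumed stand-in for dist(.))."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def dtw(seq_a: Sequence[T], seq_b: Sequence[T], dist: Callable[[T, T], float]) -> float:
    """DTW with unit step weights; dist supplies the element distance."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for u in range(1, n + 1):
        for w in range(1, m + 1):
            d = dist(seq_a[u - 1], seq_b[w - 1])
            cost[u][w] = d + min(cost[u - 1][w - 1], cost[u - 1][w], cost[u][w - 1])
    return cost[n][m]


ss_a = [(30.30, 120.10), (30.28, 120.13), (30.27, 120.15)]
ss_b = [(30.30, 120.10), (30.27, 120.15)]
spatial_distance = dtw(ss_a, ss_b, haversine_m)
```

Because `dtw` only depends on the supplied distance function, the same routine could serve the temporal case by passing a function that compares times in seconds.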

5.2. Commuting Pattern Characteristics Mining

With the help of the DTW algorithm, the similarity between the spatiotemporal trip chains of any two commuting pattern vehicles can be calculated. After obtaining the similarity matrix Ws, how to mine the commuting pattern characteristics through Ws is the key problem to be solved in this section.

Since the trajectories of commuting pattern vehicles on the road network are different, and even a large number of them are completely opposite in space, there will be a lot of outliers and abnormally large values in Ws. These abnormal values will cause great interference to the results. Most clustering algorithms require the data to be cleaned first, and cluster analysis is performed when there is no obvious abnormality in the data. This kind of clustering algorithm cannot play a good role in this study. We tried the spectral clustering and hierarchical clustering and found that spectral clustering and hierarchical clustering cannot successfully mine the characteristics of commuting pattern through Ws. The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [34] is different from the partition and hierarchical clustering algorithm. It defines the cluster as the largest set of points connected by density, divides the high-density area into clusters, and can identify and filter the noise points. Therefore, the DBSCAN algorithm is very suitable for the situation. This study will use the DBSCAN algorithm to mine commuting pattern characteristics based on matrix Ws.

The DBSCAN clustering algorithm is a density-based clustering algorithm. It does not require the number of clusters to be specified in advance; the number is determined automatically by the algorithm. In addition, it can automatically eliminate outliers. The DBSCAN algorithm divides all sample points into three categories. The first category is the core points. A point is a core point if and only if at least minpts points are within distance eps of it. A point is directly reachable from a core point if it lies within distance eps of that core point. The second category is the density-reachable points. Point q is density-reachable from p if and only if there is a path that starts at p and ends at q such that any two adjacent points on the path are within distance eps of each other and every point on the path, except possibly q, is a core point. The third category is the outliers. A point is an outlier if and only if it is not reachable from any other point. With these three categories of points, the samples can be automatically divided into different clusters.

The pseudocode of the DBSCAN clustering algorithm is shown in Algorithm 1.

Input: Similarity matrix Ws, a pre-defined parameter minpts, a pre-defined parameter eps.
Output: Cluster set C = {C0,…, Ck}
Algorithm steps:
(1) Traverse all objects to get the core point set Ω. If Ω is an empty set, clustering cannot be performed; otherwise, set k = 0, mark all samples as unvisited, and proceed to the next step.
(2) Randomly select an unvisited core point p, mark it as visited, and initialize the queue Q = <p>.
(3) Take elements out of the queue Q in turn until Q is empty. If an element is a core point, mark the unvisited objects in its eps neighborhood as visited and add them to Q.
(4) Generate the cluster Ck, which includes all the objects that changed from unvisited to visited in this round, and set k = k + 1.
(5) Repeat steps (2) to (4) until no unvisited core point remains.
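
The steps of Algorithm 1 can be sketched in plain Python on a precomputed distance matrix. This is an illustrative sketch, not the code used in the study: core points are taken in index order instead of at random so that the result is reproducible, a point's eps neighborhood is assumed to include the point itself, and points that end up in no cluster receive the label -1.

```python
from collections import deque
from typing import List

NOISE = -1


def dbscan_precomputed(dist: List[List[float]], eps: float, minpts: int) -> List[int]:
    """Label each sample with a cluster index, or NOISE for outliers."""
    n = len(dist)
    # eps neighborhood of every point (the point itself is included).
    neighbours = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = [len(nb) >= minpts for nb in neighbours]
    labels = [NOISE] * n
    visited = [False] * n
    k = 0
    for p in range(n):
        if visited[p] or not core[p]:
            continue
        # Start a new cluster from an unvisited core point.
        visited[p] = True
        labels[p] = k
        queue = deque([p])
        while queue:
            q = queue.popleft()
            if not core[q]:
                continue  # border points join the cluster but do not expand it
            for r in neighbours[q]:
                if not visited[r]:
                    visited[r] = True
                    labels[r] = k
                    queue.append(r)
        k += 1
    return labels


# Toy one-dimensional example with two dense groups and one isolated point.
points = [0, 1, 2, 10, 11, 12, 50]
toy_dist = [[abs(a - b) for b in points] for a in points]
labels = dbscan_precomputed(toy_dist, eps=1.5, minpts=2)
```

In practice, scikit-learn's `DBSCAN` accepts `metric="precomputed"` together with `eps` and `min_samples`, which allows a precomputed matrix to be clustered without custom code.
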
5.3. Case Study

Based on LPR data, this study mines the characteristics of the commuting pattern during morning and evening peak hours in Hangzhou. Commuting pattern vehicles are defined as motor vehicles that travel between the same origin and destination with a certain regularity over a long period of time. They usually travel during the morning and evening peak hours and travel between home and workplace or home and school. For the mining of the commuting pattern, the most important aspect is the commuting behavior during the morning and evening peak hours. According to the analysis of Yao et al. [2], the morning and evening peak hours in Hangzhou are 6:30–10:30 and 15:30–21:30, respectively. This study uses these two time periods for analysis. First, mine the spatiotemporal trip chains of all commuting pattern vehicles during morning and evening peak hours based on the LPR data. Next, construct the spatiotemporal trajectory set D and create the temporal trajectory set DT and the spatial trajectory set DS, respectively. After that, construct the temporal similarity matrix MT and the spatial similarity matrix MS based on DT and DS, respectively, through the similarity measurement method of spatiotemporal trajectory proposed in Section 5.1. Then, construct the spatiotemporal similarity matrix MST by combining MT and MS with a scale factor α, which is set to 0.5 for commuting pattern mining in Hangzhou. Different values of α can be set according to the temporal trajectory similarity and spatial trajectory similarity in the actual analysis to give the time dimension and the space dimension different weights in the spatiotemporal similarity matrix.
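
One plausible way to combine the two matrices is sketched below. Both the convex combination α·MT + (1 - α)·MS and the min-max scaling applied beforehand (so that seconds and metres become comparable) are assumptions made for this illustration; the exact formula used in the study is not reproduced here, and the function names and sample values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def min_max_scale(matrix: Matrix) -> Matrix:
    """Rescale all entries to [0, 1]; a constant matrix maps to zeros."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


def combine(mt: Matrix, ms: Matrix, alpha: float = 0.5) -> Matrix:
    """Assumed form: alpha * scaled MT + (1 - alpha) * scaled MS."""
    mt_s, ms_s = min_max_scale(mt), min_max_scale(ms)
    return [
        [alpha * t + (1 - alpha) * s for t, s in zip(row_t, row_s)]
        for row_t, row_s in zip(mt_s, ms_s)
    ]


mt = [[0.0, 600.0], [600.0, 0.0]]      # temporal DTW distances in seconds
ms = [[0.0, 2000.0], [2000.0, 0.0]]    # spatial DTW distances in metres
mst = combine(mt, ms, alpha=0.5)
```

Raising `alpha` above 0.5 would give the time dimension more weight, and lowering it would favour the space dimension.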

In order to describe the commuting pattern more accurately, we select three days of data in 2016 for commuting pattern mining: June 13 (Monday), June 15 (Wednesday), and June 17 (Friday). Because of the large amount of data and the high time complexity of constructing the similarity matrix with the DTW algorithm, results cannot be obtained in an acceptable time when using the data of the whole of Hangzhou. Therefore, random sampling is carried out for each day at a sampling rate of 7% during the morning and evening peak hours. As a result, about 5,000 commuting pattern vehicles are randomly sampled for each of the morning and evening peak periods every day. Since three weekdays of data are analysed, consistent conclusions across the three days would indicate that the randomness of sampling has little influence on the results and that the conclusions are relatively robust. After obtaining the spatiotemporal similarity matrix MST, the commuting pattern characteristics are mined with the DBSCAN algorithm. The results show that the DBSCAN algorithm reveals the commuting pattern characteristics very well; the specific results are shown in Section 6. All experiments are run on a laptop with a 2.6 GHz Intel Core i7 CPU and 8 GB of DDR memory. The data are stored in an Oracle database, where data cleaning is also carried out. Data preprocessing, commuting pattern vehicle recognition, and spatiotemporal similarity matrix construction are implemented in Python 3.7, and the DBSCAN algorithm is provided by the scikit-learn package.
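The motivation for sampling can be made concrete with a small sketch. A similarity matrix over n vehicles requires n(n − 1)/2 pairwise DTW comparisons, so about 5,000 sampled vehicles already imply 12,497,500 comparisons per peak period. The helper names `sample_vehicles` and `pairwise_comparisons` and the fixed seed are hypothetical and serve only to make the illustration reproducible.

```python
import random
from typing import List, Optional, Sequence


def sample_vehicles(vehicle_ids: Sequence[str], rate: float = 0.07,
                    seed: Optional[int] = None) -> List[str]:
    """Draw a simple random sample of vehicles without replacement."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must lie in (0, 1]")
    k = round(rate * len(vehicle_ids))
    rng = random.Random(seed)  # a local generator keeps the global state untouched
    return rng.sample(list(vehicle_ids), k)


def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs, i.e. DTW evaluations for a symmetric matrix."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    ids = [f"plate-{i}" for i in range(1000)]  # placeholder identifiers
    sampled = sample_vehicles(ids, rate=0.07, seed=42)
    print(len(sampled), pairwise_comparisons(len(sampled)))
    print(pairwise_comparisons(5000))
```

Repeating the analysis on several days, as done here, is one way to check that the conclusions do not depend on a particular random draw.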

In Sections 4 and 5, we introduced a key commuting route mining algorithm based on LPR data. Figure 5 gives the framework of the algorithm. With this framework, key urban commuting routes can be identified, which can provide support for policy-making.

6. Results and Discussion

6.1. Analysis of Commuting Pattern Characteristics Based on DBSCAN Algorithm

Commuting pattern analysis is conducted on three days of data in 2016: June 13 (Monday), June 15 (Wednesday), and June 17 (Friday). June 15 is taken as the example for visualization; the results for the other dates can be found in Attachment 1 of the Supplementary Materials. Figures S1–S4 and S5–S8 in the Supplementary Materials show the first four clusters obtained by the DBSCAN algorithm during the morning and evening peak hours on June 13, respectively, and Figures S9–S12 and S13–S16 show the same for June 17. The commuting pattern during the morning peak hours on June 15, 2016, is clustered into 94 clusters. The first four clusters contain significantly more vehicles than the others: 122, 100, 81, and 63 vehicles, respectively. After excluding outliers, these four clusters account for 11.2%, 9.1%, 7.4%, and 5.8% of the total number of vehicles, respectively. The vehicle trajectories of the first four clusters are visualized in Figure 6. The green dots represent the first detected locations of the vehicles, and the blue dots represent the last detected locations. The trajectories are represented by orange lines; the darker the color, the more vehicles pass along that trajectory. It should be noted that the outliers here are not caused by detection errors but are the outliers (noise points) identified by the DBSCAN algorithm during the analysis. 
These outliers represent the vehicles whose spatiotemporal trajectories differ greatly from those of the other vehicles in the data. Cluster 0 and Cluster 2 both run from the north to the south of the urban area of Hangzhou, and Cluster 1 and Cluster 3 both run from the south to the north. The origins of Cluster 0 and Cluster 2 are mostly concentrated in Xihu District, where there are many residential areas and schools, and the destinations are mostly concentrated in the center of Hangzhou and Qianjiang New City, where there are many commercial areas and large companies. The commuting flows of Cluster 1 and Cluster 3 run in the opposite direction, mostly from the south of the urban area of Hangzhou to the north of Xihu District. This is because, in recent years, the government of Hangzhou has focused on reducing the job-housing imbalance in urban planning, and job opportunities in Xihu District have increased considerably. In addition, with the development of technology companies such as Alibaba Group, Yuhang District has become a gathering area for big technology companies. However, this dataset contains no data for Yuhang District. Therefore, the destinations are mostly distributed in the north of Xihu District, which vehicles must pass through on the way to Yuhang District. The situations on Monday and Friday are similar to that on Wednesday, indicating that the main commuting flows during the morning peak hours on weekdays run from the north to the south of Hangzhou, mainly from Xihu District to the center of Hangzhou and Qianjiang New City. As Figure 6 shows, the key routes of commuting pattern vehicles include Yuhangtang Road, Moganshan Road, Shangtanggaojia Road, Zhonghegaojia Road, and so forth. The commuting flows in the opposite direction are also very obvious. 
When optimizing traffic control, special attention should be paid to these two flow directions, the key commuting routes, and the traffic management strategies of the origin and destination areas. Signal control in the morning peak hours can be optimized according to the key commuting routes in that period; for example, green wave control could be considered for Yuhangtang Road and Moganshan Road. In addition, the boundary of the travel restriction policy can be adjusted based on the key commuting route mining results; for example, the restricted area can be redrawn as the area that contains the main commuting routes. In this way, the effect of the travel restriction policy can be ensured while the restricted area is reduced.

Similarly, a total of 75 clusters are obtained during the evening peak hours on June 15, 2016. The first four clusters contain significantly more vehicles than the others: 144, 105, 88, and 30 vehicles, respectively. After excluding outliers, these four clusters account for 18.4%, 13.3%, 11.2%, and 3.8% of the total number of vehicles, respectively. The vehicle trajectories of the first four clusters are visualized in Figure 7. The directions of the commuting flows during the evening peak hours are opposite to those during the morning peak hours. The flows in Cluster 0, Cluster 1, and Cluster 3 run from the south to the north of the urban area of Hangzhou, and the flows in Cluster 2 run from the north to the south. This shows that the main commuting flows during the evening peak hours run from the Hangzhou city center and Qianjiang New City to Xihu District, with a certain percentage of vehicles commuting in the opposite direction. The situations on Monday and Friday are again similar. The commuting pattern during the morning and evening peak hours further verifies that commuting is indeed a strongly regular pattern: from home to work during the morning peak hours and back home from work during the evening peak hours. The analysis results for the morning and evening peak hours can support signal control optimization and policy formulation. For example, green wave control could be considered for Yuhangtang Road and Moganshan Road. For Hangzhou, setting up tidal lanes may not be very suitable, because there are large commuting flows in both directions. In addition, the boundary of the travel restriction policy can also be adjusted according to the analysis results.
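The cluster sizes and proportions reported in this section follow directly from the label vector returned by the clustering step. The sketch below shows one way to compute them. It assumes the scikit-learn convention that outliers carry the label -1, and the function name `cluster_shares` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cluster_shares(labels: Sequence[int], top: int = 4,
                   noise: int = -1) -> List[Tuple[int, int, float]]:
    """Return (label, size, share) for the largest clusters, outliers excluded."""
    counts = Counter(label for label in labels if label != noise)
    total = sum(counts.values())  # number of vehicles assigned to any cluster
    if total == 0:
        return []
    return [(label, size, size / total)
            for label, size in counts.most_common(top)]


if __name__ == "__main__":
    # Toy label vector: three clusters plus four outliers.
    toy = [0] * 5 + [1] * 3 + [2] * 2 + [-1] * 4
    for label, size, share in cluster_shares(toy):
        print(f"Cluster {label}: {size} vehicles, {share:.1%}")
```

Because the denominator excludes outliers, the shares describe the distribution among clustered vehicles only, which matches the phrase "after excluding outliers" used above.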

6.2. Analysis of the Land Use of Origin and Destination

There is an obvious commuting pattern during the morning and evening peak hours in Hangzhou. Based on the commuting pattern mining algorithm framework proposed in this research, the characteristics of the commuting pattern in Hangzhou have been mined successfully. In order to further analyse these characteristics, Point of Interest (POI) data are crawled from Amap, which belongs to Alibaba Group, and combined with the clustering results to analyse the relationship between the commuting pattern and land use.

This study uses POI data to reflect the type of land use around a given location. To this end, the POI types are reclassified with reference to “GB/T 21010-2017” [35] and “GB 50137-2011” [36]. The POI data are reclassified into four categories, as given in Table 2.

Take the morning and evening peak hours on June 15, 2016, as an example to show the relationship between the commuting pattern and land use. Generally, there are two approaches to this kind of analysis: dividing the city into grids, or analysing the land use properties of a buffer area. Since the origins and destinations in this study are gathered at the detection equipment rather than scattered across the whole city, the second approach is used. There is no settled choice for the buffer area radius: most existing studies use 1 km [37–39], and some use 500 m or other distances [40]. Because of the characteristics of LPR data, the origin and destination points gather at the detection equipment, which is installed at intersections; to capture the land use properties around these points, a relatively large radius is needed, so the buffer area radius is set to 1 km in this study. The origins and destinations of Cluster 0 are given in Section 6.1. A buffer area with a radius of 1 km is set around each origin and destination point, and the land use types within the buffer areas of the origin points and of the destination points are analysed separately. Since the number of POIs differs considerably between categories, it is difficult to reach a credible conclusion by comparing the raw POI counts of each category directly. To remove the influence of this factor, the frequency density (FD) and category ratio (CR) of each reclassified category are calculated through (7) and (8),

where FDi denotes the frequency density of POIs of class i in the buffer area, ni is the number of POIs of class i in the buffer area, Ni is the number of POIs of class i in the whole of Hangzhou, and CRi denotes the category ratio of POIs of class i.
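The buffer analysis can be illustrated with the following sketch. It assumes the commonly used definitions FDi = ni / Ni and CRi = FDi / Σj FDj × 100%, which agree with the symbol definitions above but are not copied from equations (7) and (8). The function names, the (latitude, longitude, category) tuple layout, and the use of the haversine distance are likewise assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS-84 style coordinates."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_in_buffer(center: Tuple[float, float],
                    pois: Iterable[Tuple[float, float, str]],
                    radius_m: float = 1000.0) -> Dict[str, int]:
    """Count POIs per category inside a circular buffer around `center`."""
    counts: Dict[str, int] = {}
    for lat, lon, category in pois:
        if haversine_m(center[0], center[1], lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


def frequency_density(n_buffer: Dict[str, int],
                      n_city: Dict[str, int]) -> Dict[str, float]:
    """Assumed FD_i = n_i / N_i for every category present in the city totals."""
    return {c: n_buffer.get(c, 0) / total for c, total in n_city.items() if total > 0}


def category_ratio(fd: Dict[str, float]) -> Dict[str, float]:
    """Assumed CR_i = FD_i / sum_j FD_j, expressed as a percentage."""
    total = sum(fd.values())
    return {c: (100.0 * v / total if total else 0.0) for c, v in fd.items()}


if __name__ == "__main__":
    # Placeholder POIs and city totals, not data from this study.
    pois = [(30.005, 120.0, "residential"), (30.0, 120.0, "commercial"),
            (30.02, 120.0, "residential")]
    n = count_in_buffer((30.0, 120.0), pois)
    fd = frequency_density(n, {"residential": 10, "commercial": 20})
    print(category_ratio(fd))
```

Dividing by the city-wide count of each category before taking ratios is what prevents a numerous category from dominating the comparison simply because it has more POIs overall.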

The CRi values within the buffer areas of the origin and destination points of Cluster 0 are calculated for the morning and evening peak hours, respectively. The results are shown in Table 3. In general, the POI category distributions of the origin and destination points during the morning and evening peak hours are all relatively uniform. That is, the origin and destination points in both periods are located on land of mixed categories. The separation of workplace and residence is not very obvious, which reflects the Hangzhou government's insistence on job-housing balance in its later urban planning. Even so, a comparison of the CR results reveals a pattern. During the morning peak hours, CR1 of the origin points is greater than that of the destination points, while CR2 of the destination points is greater than that of the origin points. During the evening peak hours, the relationship is reversed: CR2 of the origin points is greater than that of the destination points, and CR1 of the destination points is greater than that of the origin points. This still reflects a commuting pattern between land in the residential area and life services category and land in the commercial area and companies category.

The CRi values within the buffer areas of the origin and destination points of Cluster 1 are calculated in the same way for the morning and evening peak hours. The results are shown in Table 4. In Cluster 1, the POI categories are again relatively evenly distributed, so the origin and destination points during the morning and evening peak hours are all located on land of mixed categories as well. This means that the commuting pattern of this cluster also occurs between areas of mixed land use.

7. Conclusions

The commuting pattern is one of the most important travel patterns on the road network. In this study, a framework for a key commuting route mining algorithm based on LPR data is proposed. It can be migrated to any similar spatiotemporal data, such as GPS trajectory data and mobile phone signaling data. Commuting pattern vehicles are first extracted, and then the spatiotemporal trip chains of all commuting pattern vehicles are mined. Based on these trip chains, the spatiotemporal similarity matrix is constructed with the DTW algorithm. Finally, the characteristics of the commuting pattern are analysed with the DBSCAN algorithm. The proposed framework has two main innovations. First, unlike previous research on commuting pattern analysis, this study first profiles the vehicles on the road network, identifies the commuting pattern vehicles, and only then extracts the key commuting routes. Compared with pattern recognition on the whole sample dataset, this reduces the heterogeneity of the sample and keeps the focus on the commuting pattern without interference from other travel patterns. Second, the framework makes it possible to mine the key commuting routes in a city. Using Hangzhou as an example, this study verifies the framework and successfully mines the commuting pattern characteristics and key commuting routes of the city. The main commuting flows during the morning peak hours on weekdays run from the north to the south of the urban area of Hangzhou, mainly from Xihu District to the center of Hangzhou and Qianjiang New City. The commuting flows from the south to the north of the urban area are also very obvious. The key commuting routes include Yuhangtang Road, Moganshan Road, Shangtanggaojia Road, Zhonghegaojia Road, and so forth. 
Therefore, these two flow directions and the key commuting routes should be considered when formulating traffic demand management measures or optimizing signal control. On weekdays, the commuting flows during the morning and evening peak hours are opposite in direction. As this example shows, the proposed framework is useful in three main respects: first, it exposes problems on the road network; second, it supports signal control optimization; third, it helps policy formulation. In Hangzhou, the mining of commuting pattern characteristics and key commuting routes shows that tidal lanes may not be very suitable, because there are large commuting flows in both directions; this corresponds to the first respect. For the second, signal control measures such as green wave control on Yuhangtang Road and Moganshan Road can be considered. For the third, the boundary of the travel restriction policy can be adjusted. Beyond these three respects, the method proposed in this paper can also help bus companies optimize bus routes and ride-hailing companies formulate business strategies.

Combining multi-source data to mine commuting patterns is the focus of future research. Different data sources have their own advantages; integrating them for pattern analysis is therefore expected to lead to more meaningful conclusions.

Data Availability

The data supporting the results in this paper were authorized by “Hangzhou City Brain.” Thus, the LPR database can only be accessed with the authorization and permission of that organization.

Conflicts of Interest

The authors declare that there are no conflicts of interest in this paper.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 92046011).

Supplementary Materials

Attachment 1: visualization results of the first four clusters during the morning peak hours and evening peak hours on June 13 and June 17, respectively, based on the DBSCAN algorithm. Figures S1–S4: the visualization results of the first four clusters during the morning peak hours on June 13 based on the DBSCAN algorithm. Figures S5–S8: the visualization results of the first four clusters during the evening peak hours on June 13 based on the DBSCAN algorithm. Figures S9–S12: the visualization results of the first four clusters during the morning peak hours on June 17 based on the DBSCAN algorithm. Figures S13–S16: the visualization results of the first four clusters during the evening peak hours on June 17 based on the DBSCAN algorithm.