Abstract
Predicting spatiotemporal congestion situations of a traffic network is a prerequisite for urban traffic control. This study proposes a spatiotemporal traffic congestion prediction method based on a gated recurrent unit-convolutional neural network (GRU-CNN). Considering the temporal and spatial attributes of traffic data, a third-order tensor of the traffic data is extracted from the time domain, and the GRU is used to predict the traffic flow parameters of the traffic network. Then, the third-order tensor of multisource spatiotemporal traffic data is compressed into traffic data images and combined with the spatial structure, and a CNN is used to extract and identify the congestion features of the traffic network. Actual urban traffic network data are selected for model verification. The multistep prediction of the traffic flow parameters effectively ensures prediction accuracy. The proposed model is trained on an actual classification dataset, and the prediction results on the test set demonstrate the model’s reliability. Based on the predicted traffic parameters of the network, the model can give a highly accurate judgment of the traffic situation for the entire network. Compared with other models, the proposed model further improves the accuracy of road network traffic state discrimination and has better robustness.
1. Introduction
Although the traffic flow of an urban road network changes over a certain period, its basic structure is fixed for a long period [1]. Therefore, the changes in the traffic flow in the urban road network are closely related to the choices of travelers. Identifying traffic congestion conditions can help travelers and decision-makers quantitatively grasp the changes in traffic conditions and decide on path selection, road planning, and traffic control according to the changes [2, 3]. For the development of intelligent transportation, it is necessary to expand the analysis of traffic data characteristics for a single road section to the overall traffic state characteristics at the road network level [4].
This study takes an urban traffic network as a whole and analyzes its operation. The topology of the traffic network is abstracted based on the location and correlation of the detector. As the road network structure is considered fixed over a long period of time, the defining characteristic of urban traffic network change is the time-varying traffic flow, and the specific storage object is the traffic data, which are continuously generated [5]. Due to the fixed location of the detection equipment, the acquired traffic data can match with the corresponding spatial information, and the changes in that data can depict the dynamic characteristics of the urban traffic network [6]. Therefore, considering the actual situation of the current large amount of traffic data, this study combines the constructed urban traffic network model and the description form of the traffic data tensor to determine the spatiotemporal state matrix of the dynamic road network and combines the time and space perspectives to predict traffic congestion. Considering the temporal and spatial attributes of the traffic data, third-order tensors of the traffic data are extracted from the time domain, and a GRU is used to predict the traffic flow parameters of the traffic network. Afterward, the third-order tensors of multisource spatiotemporal traffic data are compressed into traffic data images and combined with the spatial structure. A CNN is used to extract and identify traffic network congestion features. This paper combines the advantages of GRU and CNN to predict traffic parameters in the time domain and identify traffic states in the space domain and finally provide information support for traffic diversion after regional traffic congestion identification.
2. Evaluation of Traffic Congestion in an Urban Road Network
A traffic congestion status is generally a category of traffic congestion within a selected area. It is used either to reflect the subjective feelings of people stuck in traffic or to meet the decision-making requirements of management agencies. Specific grades are defined by corresponding ranges of common indicator variables or traffic parameters. The Highway Capacity Manual published by the Transportation Research Board divides the traffic service level into six grades according to selected parameters. Germany proposed dividing the traffic status of an expressway into five levels according to intervals of the traffic density [7]. In 2016, the Beijing Transportation Development Research Center proposed the Evaluation Index System of Urban Road Traffic Congestion, which classified different types of traffic status based on sections and defined road network congestion [8]. In this system, the traffic classification of sections is mainly based on the average speed of the section. The congestion degree of the road network is divided into very smooth, smooth, slightly congested, moderately congested, and severely congested.
There are two problems with the current standards of traffic congestion. First, there is only one single measurement index, and the state prediction lacks comprehensiveness. Second, there is a limited number of standards for classifying the traffic status of a road network, and the daily traffic congestion index ignores the changes in rules of the road network on each statistical day [9, 10]. Therefore, the five-level division of road network congestion is chosen as the classification of traffic congestion levels for the indicator system. However, two aspects should be added to the corresponding indicators. The first is to increase the comprehensiveness of the state measurements. Three common traffic parameters are used as the basis of measurements to jointly analyze the congestion situation. The second aspect is the percentage of traffic congestion mileage every 15 minutes, as shown in Table 1. It facilitates the analysis of daily traffic congestion changes in the road traffic network.
The calculation of the percentage of road congestion mileage every 15 minutes is as follows:
(1) Taking 15 minutes as the statistical interval, the operation level of each road section is judged jointly according to the three traffic parameters. Clustering is used for the classification of the above levels.
(2) The percentage of road section mileage running in a severely congested road network is calculated (road section mileage is included as the coefficient).
(3) The percentage of road network congestion mileage in 15 minutes is calculated by weighting the vehicle mileage (select the recommended value of the proportion of vehicle mileage in the index system).
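The three steps above can be sketched in Python as follows. This is a minimal illustration, not the index system's exact procedure: the record fields `length_km`, `level`, and `vkt_weight`, and the use of one weight per section, are assumptions made for the example, and the operation level of each section (1 to 5) is taken as already determined.

```python
def congestion_mileage_percentage(sections, severe_level=5):
    """Weighted share (in percent) of mileage running at the severe level.

    Each section is a dict with hypothetical keys:
      length_km  - section mileage, used as the coefficient
      level      - operation level of the section in this 15-minute interval
      vkt_weight - assumed vehicle-mileage weight for the section
    """
    weighted_total = sum(s["length_km"] * s["vkt_weight"] for s in sections)
    if weighted_total == 0:
        return 0.0
    weighted_severe = sum(
        s["length_km"] * s["vkt_weight"]
        for s in sections
        if s["level"] == severe_level
    )
    return 100.0 * weighted_severe / weighted_total


example_sections = [
    {"length_km": 2.0, "level": 5, "vkt_weight": 1.0},
    {"length_km": 3.0, "level": 2, "vkt_weight": 1.0},
    {"length_km": 5.0, "level": 5, "vkt_weight": 0.5},
]
example_percentage = congestion_mileage_percentage(example_sections)
```

The resulting percentage would then be mapped to a road network congestion level using the thresholds of Table 1, which are not reproduced here.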
3. Image Extraction Method of Traffic Data
3.1. Construction of the Urban Traffic Network Model considering the Detector Layout
Each collected traffic dataset contains temporal and spatial attributes. Since the time attribute is the first-order attribute and the space attribute is a second-order attribute, the actual traffic data are represented as a third-order tensor.
Urban traffic is a complex, open, self-adaptive system characterized by abrupt changes [11]. The network topology used for traffic feature extraction is abstracted on the basis of graph theory. According to the basic definition in graph theory [12], a graph G = (V, E) consists of a set of nodes V and a set of edges E connecting pairs of nodes in V. If the graph also contains a set of edge weights W corresponding to the elements in E, the graph is called a weighted graph and denoted as G = (V, E, W).
In this paper, the traffic data are from fixed detectors. Generally, fixed traffic detectors are placed at different locations where the flow of a section changes. Common locations are the entrances of traffic flow at intersections and ramps. To facilitate the loading of traffic data, the traffic network topology constructed in this section is mainly composed of road connection points connected through sections with limited lengths; that is, the original abstract method is used to construct the urban traffic network. In actual traffic operations, the traffic flow changes at the entrances and exits of intersections and ramps, so the traffic flow varies on the street sections in the urban network. A fixed point where the traffic flow state changes obviously is regarded as the connection point between sections, which is abstracted as a node in the urban road network. The network model adopts the directed graph structure in the original abstract method, where some parameters, including the road grade, number of lanes, and length, are not considered when extracting the topology. The dynamics of the network are determined by the traffic flow on the roads, which is embodied in the dynamic traffic data of the traffic network.
Each traffic data entity records the time at which the detection data are collected together with the traffic parameters reported by the detector. A traffic detector can obtain many traffic parameters; however, only three common traffic parameters are selected here: the location-averaged speed, the traffic flow, and the time occupancy. For the traffic network, V is the set of nodes, and each node represents a location where the actual road traffic flow state changes significantly. E is the set of road sections, and each edge represents a one-way road section in the traffic network, corresponding to the road section on which a detector collects data. The set of edge weights assigns to each edge the traffic parameter data matched to the corresponding section. To clarify the problem, three different parameters representing the traffic state are selected to form traffic networks with dynamically changing parameters. In the first network, the edge weight is based on the location-averaged speed on the section (identified by its linkID and the corresponding ordered node pair) within a given time segment, considered together with the maximum speed limit on that section. In the second network, the edge weight is the traffic flow on the section within the same time segment. In the third network, the edge weight is the time occupancy on the section within the same time segment.
3.2. Traffic Data Tensor Description Based on Traffic Network Structure
A tensor is a multidimensional array of data [13]. Tensors can be understood as extensions of vectors and matrices in a multidimensional space. Scalars, vectors, and matrices are representations of tensors in low-dimensional spaces: scalars are zero-order tensors, vectors are first-order tensors, and matrices are second-order tensors. Each element in a tensor is associated with multiple indexes. The general form of a tensor is shown in Equation (1). The number of indexes is a positive integer called the order of the tensor, and the size along each index is called a dimension of the tensor. A tensor of a given order whose elements are real numbers belongs to the real space with the corresponding dimensions.
To describe the tensor characteristics of the traffic data in the traffic network described above, the spatial information of the traffic data is represented as a network adjacency matrix. The adjacency matrix represents the network structure as in Equation (2), in which each element is indexed by a pair of node numbers of the traffic network; a graphical representation is shown in Figure 1(a). The third-order tensor of space-time traffic data is shown in Equation (3), in which two indexes are node numbers of the traffic network and the third index is the sequential number of the traffic data collection time. The adjacency matrix of the urban traffic network weighted by the average road speed at a certain time is used to describe the network. Figure 1(a) is the heatmap of this adjacency matrix for a network with 100 nodes.

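Figure 1: (a) Heatmap of the adjacency matrix of a network with 100 nodes, weighted by the average road speed at a certain time. (b) Example of the third-order tensor representing spatiotemporal traffic data.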
The time attribute of the traffic data is combined with the spatial attribute of the road section on which the data are generated by adding a time coordinate to the weighted adjacency matrix. The data collected in the urban traffic network can thus be represented as a third-order tensor using three coordinates: the intersections (nodes) at the two ends of the section on which the data are generated are taken as the spatial row and column coordinates, and the time at which the data are generated is taken as the time coordinate. Figure 1(b) shows an example of the third-order tensor representing spatiotemporal traffic data.
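As an illustration of this representation, the following minimal Python sketch assembles a third-order tensor from detector records. It assumes the records have already been reduced to hypothetical tuples of (time index, from node, to node, value) for a single traffic parameter; the function name and the zero default for node pairs without a road section are assumptions of the example.

```python
def build_tensor(records, num_nodes, num_steps):
    """Return tensor[t][i][j]: value on the one-way section i -> j at time step t.

    Node pairs with no section (or no record) keep the default value 0.0.
    """
    tensor = [
        [[0.0] * num_nodes for _ in range(num_nodes)]
        for _ in range(num_steps)
    ]
    for t, i, j, value in records:
        tensor[t][i][j] = value
    return tensor


# Two time steps on a three-node network with sections 0 -> 1 and 1 -> 2.
speed_records = [
    (0, 0, 1, 42.0), (0, 1, 2, 35.5),
    (1, 0, 1, 40.0), (1, 1, 2, 30.0),
]
speed_tensor = build_tensor(speed_records, num_nodes=3, num_steps=2)
```

Each `speed_tensor[t]` is one frontal slice, that is, the weighted adjacency matrix at time step `t`.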
3.3. Tensor Compression of Spatiotemporal Traffic Data Based on Tensor Tube Fibers
Extracting the tube fibers of the traffic data tensor maintains the temporal relationship in the traffic data. The tube fiber of each one-way road section is extracted from the third-order tensor of the space-time traffic data, and the fibers are then tiled and reconstructed to form a second-order tensor of the space-time traffic data. The row coordinates of the recombined data are labeled according to the sequence along the tensor tube fibers, and the column coordinates are labeled according to the sequence of the sections on the frontal slice of the tensor. The extraction process is illustrated in Figure 2. Each frontal slice of the third-order tensor of single-parameter spatiotemporal traffic data is an image representation of the adjacency matrix corresponding to the traffic network. The elements of the adjacency matrix are shown in Equation (2). The size of the second-order tensor is determined by the number of ordinal couples (edges) in the directed traffic network and the total number of frontal slices of the third-order tensor. The expression of the elements in the second-order tensor of the space-time traffic data is shown in Equation (4). The traffic data tensor is essentially a way of describing traffic data in both the time and space dimensions. To extract the time characteristics of the traffic data, the data retaining the time sequence relationship are first extracted along the time dimension while the spatial relationship is ignored, and the extraction results are then converted into the compression matrix of the spatiotemporal traffic data tensor.

In Equation (4), each section is indexed by the position of its ordinal couple; that is, the edges are numbered in the order in which they appear in the adjacency matrix of the traffic network, with the row as the primary order. The other index is the frontal slice number of the third-order tensor of the corresponding spatiotemporal traffic data. To facilitate image processing, the resulting matrix is standardized, as shown in Figure 2.
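The fiber extraction described above can be sketched in Python as follows. This is an illustration under stated assumptions: the tensor is stored as nested lists indexed `[t][i][j]`, rows of the result correspond to time steps and columns to edges in row-major order, and min-max scaling stands in for the standardization step, whose exact form is not given in the text.

```python
def extract_fibers(tensor, edges):
    """Tile the tube fibers into a matrix: one row per time step, one column per edge."""
    ordered_edges = sorted(edges)  # row-major order: by row index, then column index
    return [[frontal[i][j] for (i, j) in ordered_edges] for frontal in tensor]


def min_max_normalize(matrix):
    """Scale all entries to [0, 1]; an assumed form of the standardization step."""
    values = [v for row in matrix for v in row]
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lowest) / span for v in row] for row in matrix]


# Two frontal slices of a three-node network with edges 0 -> 1 and 1 -> 2.
toy_tensor = [
    [[0.0, 40.0, 0.0], [0.0, 0.0, 20.0], [0.0, 0.0, 0.0]],
    [[0.0, 30.0, 0.0], [0.0, 0.0, 10.0], [0.0, 0.0, 0.0]],
]
toy_matrix = extract_fibers(toy_tensor, edges=[(1, 2), (0, 1)])
toy_scaled = min_max_normalize(toy_matrix)
```

Each column of `toy_matrix` is the time series of one road section, which is the form consumed by the temporal prediction step.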
3.4. Extraction of Traffic Data Tensor considering the Combination of Multiparameter Data
This section selects three parameters to jointly describe the temporal and spatial traffic data, according to the actual detection situation and computational complexity. According to the extraction method described in the previous section, each parameter can be described as a second-order tensor of the space-time relationship, from which a row vector is extracted at each time step. Considering the high performance of CNNs in image processing, a matrix of traffic data is constructed for each time step, and the matrix data are provided by the corresponding row vector. If the number of ordered pairs in the original graph is less than the number of elements in the matrix, the remaining positions at the end of the matrix are filled with zeros.
Existing analysis of the traffic state mainly consists of the identification and evaluation of single-parameter data, such as the daily traffic congestion index. This index mainly determines the operation level of each section in the road network through the average travel speed on the section. Evaluation of single-parameter data is simple and has a low computational cost; however, the resulting identification of the traffic state is not reliable. For example, if the actual state of the traffic flow on the road is free, the traffic state determined by the subjective driving speed of a driver may be inconsistent with reality. Therefore, it is more reasonable to evaluate the traffic state by combining multiple commonly used traffic parameters. The image extracted from the time-based multiparameter traffic data has three RGB color channels: the red channel corresponds to the average traffic speed parameter, the green channel corresponds to the traffic flow parameter, and the blue channel corresponds to the traffic time occupancy parameter. An example of an RGB image is shown in Figure 3. The spatial traffic data in red, green, and blue are allocated according to the different traffic parameters, as shown in Figures 3(a)–3(c). The superposition of the three channels is shown in Figure 3(d). The “stacking” of multiple traffic parameters is realized by the three-layer channel stacking of the RGB images. This data representation is useful for identifying the network traffic state represented by multiple traffic parameters through image processing in subsequent steps.
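The zero-padding and channel stacking described in this section can be illustrated with a short Python sketch. It assumes a square image whose side is the smallest integer that fits all road sections; the text does not fix the image shape, so this choice and the function names are assumptions of the example.

```python
import math


def vector_to_square(vector):
    """Reshape a per-section vector into a square matrix, zero-padding the tail."""
    side = math.isqrt(len(vector))
    if side * side < len(vector):
        side += 1
    padded = list(vector) + [0.0] * (side * side - len(vector))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def stack_rgb(speed, flow, occupancy):
    """Stack three parameter vectors as the R, G, and B channels of one image."""
    if not (len(speed) == len(flow) == len(occupancy)):
        raise ValueError("all three parameter vectors must cover the same sections")
    red, green, blue = (vector_to_square(v) for v in (speed, flow, occupancy))
    side = len(red)
    return [
        [(red[y][x], green[y][x], blue[y][x]) for x in range(side)]
        for y in range(side)
    ]


# Five sections do not fill a 3 x 3 image, so four zero pixels are appended.
image = stack_rgb(
    speed=[0.9, 0.8, 0.4, 0.2, 0.7],
    flow=[0.1, 0.3, 0.8, 0.9, 0.2],
    occupancy=[0.1, 0.2, 0.6, 0.9, 0.3],
)
```

Each pixel then carries the three normalized parameters of one road section at one time step.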

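Figure 3: Example RGB image of multiparameter traffic data. (a)–(c) Red, green, and blue channels allocated to the different traffic parameters. (d) Superposition of the three channels.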
4. Spatiotemporal Congestion Prediction for Road Networks Based on Deep Learning
The large amount of traffic data generated every day plays a key role in improving the generalization ability of deep learning models [14]. With recent advances in architecture design, deep learning has demonstrated the capability of fitting complex functions in various applications [15].
Deep learning is a subtype of machine learning developed based on neural networks. With a “deep” neural network structure, it overcomes multiple drawbacks of traditional machine learning techniques. The hierarchical structure of deep learning is composed of several layers between the input layer and the output layer, and a nonlinear information processing unit formed by these hierarchical structures can realize feature learning [16]. Considering this characteristic, deep learning extracts features from the original data through a deep neural network and eliminates the empirical setting of the original method. Many case studies have shown that high-level features extracted by deep neural networks are highly effective if sufficient training data exist.
Based on the prediction of traffic network parameters by a recurrent neural network, the congestion state of the spatial traffic network is identified to predict the overall situation of the traffic network. Figure 4 presents the spatiotemporal congestion situation prediction model. The model consists of two parts: the time feature extraction of the traffic network using GRU and the congestion feature extraction of the traffic network using CNN.

5. GRU Model for Multistep Prediction of Traffic Flow
The first part of the prediction model uses a GRU neural network to predict the network traffic parameters. Because road network congestion should be evaluated with several commonly used traffic state parameters, tensors are extracted for multiple parameters from the spatiotemporal road traffic data. The time series of each parameter are then processed by parallel GRUs with a multistep iterative method, which extracts the time characteristics of the data for multistep prediction of the traffic parameters. The predicted traffic parameters are used to identify the congestion state of the traffic network.
GRU is a variant of the long short-term memory (LSTM) neural network. It maintains the effectiveness of LSTM while having a simplified structure [17]. It merges the input gate and the forget gate in LSTM to form an update gate. GRU only contains an update gate and a reset gate. The update gate mainly determines the amount of hidden layer information from the previous time step that can be transferred directly to the current time step. The reset gate determines how much hidden layer information from the previous time step contributes to generating the current storage. Like LSTM, GRU contains a gated unit that regulates the information flow inside the unit. GRU replaces the original self-updating storage state unit with a hidden state, which makes GRU more effective in data training.
Let the data be a time series of inputs, one for each time step t, with a fixed series length. The GRU computation outputs the hidden state of each unit, that is, the hidden state of the j-th GRU at time t, as shown in Equation (5), which involves a linear transformation of the input vector. The GRU algorithm contains four important parts.
Considering the basic time interval of the congestion evaluation and the time rule of the continuous expansion of traffic congestion, the time delay of the model is determined. In addition, the prediction results in this part cover three time steps; that is, the traffic parameter values of the three time steps after the current time step can be predicted using historical data.
Details of the algorithm are as follows:
(Step 1) Identify the traffic situation according to three common parameters: the average speed, flow, and time occupancy.
(Step 2) Extract the tube fibers (i.e., traffic parameter time series) of each section from the third-order tensors corresponding to the three traffic parameters, determine the GRU model parameters, and group the time series of each section according to the model delay.
(Step 3) Extract the frontal slice of each time interval from the third-order tensors corresponding to the three types of traffic parameters.
(1) Step 3.1: with a statistical interval of 15 minutes, judge the operation level of each section jointly according to the three traffic parameters.
(2) Step 3.2: calculate the mileage percentage of road sections running at grade 5 in the frontal slices.
(3) Step 3.3: calculate the percentage of road network congestion mileage in 15 minutes by weighting the number of car kilometers.
(4) Step 3.4: according to the rules in Table 1, map out the corresponding road network traffic status of each group in the historical data.
(Step 4) Use the GRU to predict the traffic parameters, and estimate a group of test set data for each section of the road network.
(1) Step 4.1: initialize the number of units, the network structure, and the input vector in each group of data in the training set.
(i) Step 4.1.1: determine the input of the j-th GRU.
(ii) Step 4.1.2: calculate the reset gate in the j-th unit according to Equation (6), which involves the input weight vector, the recurrent weight vector of the reset gate, the bias vector, the input vector at time t, and the sigmoid activation function.
(iii) Step 4.1.3: calculate the candidate state in the j-th unit according to Equation (7), which uses the hyperbolic tangent function (tanh) and vector multiplication.
(iv) Step 4.1.4: calculate the update gate in the j-th GRU according to Equation (8), which involves the input weight vector, the recurrent weight vector of the update gate, the bias vector, the input vector at time t, and the sigmoid activation function.
(v) Step 4.1.5: obtain the output of the j-th GRU using Equation (9).
(vi) Step 4.1.6: advance to the next time step; if the end of the time series has not been reached, repeat from Step 4.1.2.
(2) Step 4.2: establish the training model according to Step 4.1, and obtain the estimated value from the group execution in the test set.
(3) Step 4.3: repeat Step 4.1 twice and obtain the corresponding estimate for each parameter.
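Steps 4.1.2 to 4.1.5 can be sketched as a single GRU time step in plain Python. This is a minimal sketch using the common GRU formulation, not a reproduction of Equations (6) to (9): the parameter names (`W_r`, `U_r`, `b_r`, and so on) are assumptions, and the output convention (the update gate weights the previous hidden state) is chosen to match the description of the update gate given earlier.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def gru_step(x, h_prev, params):
    """One GRU time step; params maps assumed names to weight rows and biases."""
    units = range(len(h_prev))
    # Reset gate of each unit j (Step 4.1.2).
    r = [sigmoid(dot(params["W_r"][j], x) + dot(params["U_r"][j], h_prev)
                 + params["b_r"][j]) for j in units]
    # Candidate state computed from the reset-gated previous state (Step 4.1.3).
    gated = [r[j] * h_prev[j] for j in units]
    candidate = [math.tanh(dot(params["W_h"][j], x) + dot(params["U_h"][j], gated)
                           + params["b_h"][j]) for j in units]
    # Update gate of each unit j (Step 4.1.4).
    z = [sigmoid(dot(params["W_z"][j], x) + dot(params["U_z"][j], h_prev)
                 + params["b_z"][j]) for j in units]
    # Output: blend of the previous hidden state and the candidate (Step 4.1.5).
    return [z[j] * h_prev[j] + (1.0 - z[j]) * candidate[j] for j in units]


def zero_params(input_size, hidden_size):
    """All-zero parameters, used here only to make the example easy to check."""
    params = {}
    for gate in ("r", "z", "h"):
        params["W_" + gate] = [[0.0] * input_size for _ in range(hidden_size)]
        params["U_" + gate] = [[0.0] * hidden_size for _ in range(hidden_size)]
        params["b_" + gate] = [0.0] * hidden_size
    return params


h_next = gru_step(x=[0.3], h_prev=[1.0, -2.0], params=zero_params(1, 2))
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so each hidden value is halved; trained weights would replace `zero_params` in practice.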
Traffic data are time-dependent, and previous traffic flow states may have a long-term influence on the current state. To grasp the traffic congestion situation in advance, the deployment of emergency plans must be determined some time ahead of the forecast time. To ensure both the time reserved for decision-making and the prediction accuracy, a multistep iteration method is adopted, as shown in Equation (10). In Equation (10), the length of each input is the embedding dimension of the data, the left side of each equation is the GRU input, the right side is the predicted output value, and each input is built from row vectors of the compressed spatiotemporal traffic data matrix. Because there are several types of traffic parameters, separate matrices represent the average speed, traffic flow, and time occupancy data, respectively.
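The multistep iteration idea can be illustrated with a short Python sketch in which each prediction is fed back as input for the next step. The one-step predictor is passed in as a callable (`predict_one`, a hypothetical stand-in for the trained GRU), and `window` plays the role of the embedding dimension; both names are assumptions of the example.

```python
def multistep_forecast(history, predict_one, window, steps):
    """Iteratively forecast `steps` values, reusing earlier forecasts as inputs."""
    if len(history) < window:
        raise ValueError("history is shorter than the input window")
    buffer = list(history[-window:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_one(buffer)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast.
        buffer = buffer[1:] + [next_value]
    return forecasts


# Toy predictor for illustration only: continue the last observed increment.
def linear_trend(window_values):
    return window_values[-1] + (window_values[-1] - window_values[-2])


three_step = multistep_forecast([10.0, 12.0, 14.0], linear_trend, window=3, steps=3)
```

Because forecasts are reused as inputs, errors can accumulate with the horizon, which is one reason the number of predicted steps is kept small.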
The forward propagation output in model training is shown in Equation (5). The error term of each gated unit is calculated using backpropagation. When , the error of the GRU unit is reversely transmitted as , and when , can be expressed as in the following equation: where E is the sample loss, which can be obtained from the loss function at all times.
According to Equation (11) and the gating unit process, the error at each time-step can be computed as in Equation (12), where , .
Equations (13), (14), and (15) can be computed by accumulating the gradient at each time step, and the updated weight and bias gradient can be obtained.
The backward-propagated error term is calculated from the partial derivative of the loss function, as shown in Equation (16).
6. CNN Model to Identify Regional Traffic Congestion
The second part mainly identifies the traffic states of the road network and refines the classification of the daily traffic congestion levels according to the original daily evaluation rules. Taking a time unit of 15 minutes as the investigation benchmark, the spatiotemporal congestion characteristics of the whole road network are identified from the predicted parameters to predict the spatiotemporal congestion situation. Multiparameter traffic data images are formed from the predicted data, with the red, green, and blue channels allocated to the different traffic parameters, and the three channels are superposed to form RGB images. A CNN is well suited to such images, particularly for predicting the network traffic state represented by multiple traffic parameters. According to the classification of the traffic network congestion state corresponding to the 15-minute traffic congestion index in Section 2, the number of categories to be identified is set to 5.
Since the traffic data tensor is compressed and extracted as an image, a CNN is a good choice because CNNs are well suited to images. The principle of the CNN is inspired by the human visual nervous system. Its basic structure includes an input layer, convolutional layer, pooling layer, fully connected layer, and output layer.
(1) Convolution layer: the convolution operation mainly carries out feature abstraction of the input image, and the features extracted by each layer are iterated through multilayer superposition to obtain the complex features of the image. In the CNN, each convolution layer is composed of several convolution units, and the parameters of each convolution unit are optimized using the backpropagation algorithm. The basic formal expression of the convolution layer is shown in Equation (17), which involves an activation function, the index of the current layer, the convolution window covered at each convolution, the convolution kernel, and the bias of the current layer.
(2) Pooling layer: the pooling operation aggregates over space or feature types and reduces the spatial dimension; that is, in a small area, a specific sample value is taken as the input value, as shown in Equation (18), which involves a sampling function, the multiplier parameter of the current layer, and the bias of the current layer. Generally, features with large dimensions are obtained after the convolution layer. The features are split into several regions, and their maximum or average values (i.e., maximum pooling or average pooling, corresponding to the sampling function) are taken to obtain new features with small dimensions, reducing the parameters of the whole neural network.
(3) Fully connected layer: each fully connected node is connected with all the features of the previous layer, and all the extracted features are integrated. For the classification task, the fully connected layer is mainly responsible for training a classifier. The learned features are used as the inputs, and the outputs are the classification results.
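The convolution and pooling operations described above can be sketched in plain Python for a single channel. The sketch is illustrative only: it uses a "valid" sliding window without kernel flipping (as most deep learning libraries do), and the ReLU activation and 2 x 2 maximum pooling are assumptions rather than the configuration used in this paper.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image, add the bias, and apply ReLU."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for y in range(out_rows):
        row = []
        for x in range(out_cols):
            total = bias
            for dy in range(k_rows):
                for dx in range(k_cols):
                    total += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(max(0.0, total))  # ReLU, an assumed activation function
        output.append(row)
    return output


def max_pool(feature, size=2):
    """Take the maximum of each non-overlapping size x size region."""
    return [
        [
            max(feature[y + dy][x + dx] for dy in range(size) for dx in range(size))
            for x in range(0, len(feature[0]) - size + 1, size)
        ]
        for y in range(0, len(feature) - size + 1, size)
    ]


channel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
feature_map = conv2d_valid(channel, kernel=[[1, 0], [0, 1]])
pooled = max_pool(feature_map, size=2)
```

For an RGB traffic image, the same operation would be applied to each channel and the channel results summed before the activation.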
The proposed CNN-based algorithm is as follows:
(Step 1) Calculate the estimated value of the road network prediction according to Equation (4), extract it in road section form as a matrix, and superpose the matrices corresponding to the three parameters to obtain the traffic state image of the road network.
(Step 2) Use the superimposed traffic state images obtained from the test set data grouping to identify the traffic state of the road network using CNN.
(1) Step 2.1: initialize the weights of the network, determine the network hierarchy, and set the number of labels (identified categories) to 5 according to the traffic network congestion classification.
(i) Step 2.1.1: according to the network hierarchy, pass the input data through the convolution layer, the pooling layer, the fully connected layer, and the softmax layer to obtain the output value.
(ii) Step 2.1.2: calculate the error between the output value of the network and the target value.
(iii) Step 2.1.3: when the error is greater than the expected value, transmit the error back to the network, and successively obtain the error of each layer; if the error is equal to or less than the expected value, go to Step 2.2.
(iv) Step 2.1.4: update the weights according to the obtained errors and return to Step 2.1.2.
(2) Step 2.2: obtain the training model according to Step 2.1, and apply it to the superimposed traffic state images derived from the test set data grouping to classify the road network traffic state.
(Step 3) Output the road network traffic state classification; that is, obtain the road network traffic states through the forecast data.
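The final classification in Step 2.1.1 can be illustrated with a softmax layer over five outputs. In this minimal Python sketch the input scores are hypothetical values standing in for the fully connected layer output, and the label order follows the five congestion levels listed in Section 2.

```python
import math

CONGESTION_LEVELS = [
    "very smooth",
    "smooth",
    "slightly congested",
    "moderately congested",
    "severely congested",
]


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the maximum for stability)."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most probable congestion level and the full distribution."""
    if len(scores) != len(CONGESTION_LEVELS):
        raise ValueError("expected one score per congestion level")
    probabilities = softmax(scores)
    best = probabilities.index(max(probabilities))
    return CONGESTION_LEVELS[best], probabilities


label, probabilities = classify([0.1, 0.2, 3.0, 0.5, -1.0])
```

During training, the error in Step 2.1.2 would be computed between these probabilities and the target label of each image.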
7. Evaluation of the Algorithm
This section describes the implementation details and result analysis of the proposed method.
7.1. Instance Dataset
The traffic detection dataset of a medium-sized city in China is selected for model verification. The dataset contains the traffic detection data for each section from December 1 to December 31, 2014 (2,763,066 detection records). There are traffic detection records of 484 sections (one-way driving) in the road network. The overall topology of the road network is shown in Figure 5. First, the example dataset required by the experiment is extracted from the original traffic database using MySQL. The dataset records, including the section ID, collection time, associated intersection, associated direction, average speed, traffic flow, and time occupancy, are obtained by screening. The instance dataset is divided into two parts: a training set and a test set. The data of the first 25 days are used as the training set, and the data of the remaining six days are used as the test set.

Each type of traffic parameter in the data record is extracted from the actual data and stored in an independent data file. Each independent parameter dataset extracted is screened and eliminated by the verification method in traffic data preprocessing. Considering the influence of traffic lights on urban road intersections, the time interval for fetching the fusion data when preprocessing the dataset is 5 minutes. The changes in the weighted adjacency matrix of the traffic flow parameters of the urban transportation network are shown in Figures 6–8. In the selected experimental dataset, the average traffic speed, traffic flow, and time occupancy rate changes at 50 intersections on December 1 were obtained from 6:00 to 7:30.



7.2. Experimental Programme Design
To evaluate the effectiveness of the temporal and spatial congestion prediction model based on the proposed GRU-CNN method, the model is verified according to the two structural parts in the model design.
The first part of the model predicts the parameters of each section of the traffic network, and the accuracy of this prediction determines the reliability of the subsequent congestion state prediction. To verify the accuracy of the prediction part of the model, the model is compared with common prediction methods, such as the autoregressive integrated moving average (ARIMA) method, the support vector machine (SVM), and the recurrent neural network (RNN). For all the methods, considering the basic time interval of the congestion evaluation and the way traffic congestion spreads over time, the changes in the traffic parameters during the 30 minutes preceding the prediction time are used to predict the future values. The mean absolute percentage error (MAPE) is selected to evaluate the prediction performance; it is computed over the predicted traffic flow parameter data for each group of data in the test set. The MAPE is the average of the absolute percentage errors, as shown in Equation (19), and reflects the actual proportion of the prediction error:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\% \quad (19)$$

where $y_i$ represents the true value of the $i$-th road section corresponding to the test data group, $\hat{y}_i$ represents the predicted parameter value of the $i$-th road section corresponding to the test data group, and $n$ is the number of road sections in the experimental road network.
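The error measure of Equation (19) can be sketched in a few lines. This is an illustrative implementation of the standard mean absolute percentage error, not the authors' code; the function name `mape` and the sample values are hypothetical, and the sketch assumes every true value is nonzero.

```python
def mape(true_values, predicted_values):
    """Mean absolute percentage error (in percent) over paired values.

    Assumes every true value is nonzero, since each error is divided by it.
    """
    if len(true_values) != len(predicted_values):
        raise ValueError("inputs must have the same length")
    terms = [abs((t - p) / t) for t, p in zip(true_values, predicted_values)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical average speeds (km/h) of four road sections in one test group.
true_speed = [40.0, 25.0, 50.0, 20.0]
pred_speed = [38.0, 26.0, 45.0, 21.0]

error = mape(true_speed, pred_speed)
print(f"MAPE = {error:.1f}%")  # relative errors 5%, 4%, 10%, 5% average to 6%
```

Because each error is divided by the true value, sections with very small flow or occupancy can dominate the average, which is one reason to inspect the full distribution rather than only the mean.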
The second part of the model mainly identifies the overall state of the transportation network: it extracts the images built from the multiparameter predicted values and uses each group of the corresponding images as the input. To evaluate the classification accuracy, the model is compared with other common classification methods, such as fuzzy C-means (FCM), k-nearest neighbors (KNN), and the support vector machine (SVM).
In addition to accuracy, the precision, sensitivity, and specificity are also selected for model evaluation. The relevant definitions are shown in Equations (20)–(23). For a given class, TP represents the number of samples of that class that are correctly classified as that class; TN is the number of samples not in that class that are identified as not belonging to it; FP represents the number of samples not in that class that are identified as belonging to it; and FN is the number of samples of that class that are recognized as samples of other classes.
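As an illustration of how these four measures are commonly derived per class from a multiclass confusion matrix, the sketch below uses the standard one-vs-rest definitions. The function name `per_class_metrics`, the row/column convention (rows are true classes, columns are predicted classes), and the toy matrix are assumptions for illustration; they are not taken from Equations (20)–(23) or from Figure 12.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k.

    confusion[i][j] is the number of samples of true class i predicted as class j
    (assumed convention).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                       # class k predicted as another class
    fp = sum(confusion[i][k] for i in range(n)) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy 3-class confusion matrix (hypothetical counts).
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]

metrics = per_class_metrics(toy, 0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The sketch does not guard against a class that is never predicted or never present, in which case precision or sensitivity would divide by zero.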
7.3. Model Parameters
Python and TensorFlow are used to train and test the proposed framework. In the first part of the model, since the aggregation time interval of the selected experimental data is 5 minutes, the time lag is set based on the extraction of the historical data. According to the model setting, since the data time interval is 5 minutes, the traffic parameters are predicted 5, 10, and 15 minutes after the input time. The training set and test set are grouped so that each group covers 30 minutes of the daily data; i.e., 283 groups of data are extracted for the daily traffic parameters of the network structure.
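One reading that is consistent with these figures is a sliding window: 30 minutes at a 5-minute resolution is six samples, and moving such a window one sample at a time over the 288 samples of a day gives 288 − 6 + 1 = 283 windows. The sketch below only demonstrates that arithmetic; the window length of six, the stride of one, and the helper name `sliding_windows` are assumptions inferred from the numbers, not stated settings.

```python
def sliding_windows(series, window_length):
    """Return every contiguous window of the given length (stride of one sample)."""
    return [
        series[start:start + window_length]
        for start in range(len(series) - window_length + 1)
    ]


INTERVAL_MIN = 5
steps_per_day = 24 * 60 // INTERVAL_MIN  # 288 five-minute samples per day
window_length = 30 // INTERVAL_MIN       # 6 samples per 30-minute window (assumed)

day = list(range(steps_per_day))         # stand-in for one day of one traffic parameter
groups = sliding_windows(day, window_length)

print(len(groups))  # number of 30-minute input windows in one day
```

Under this reading, the 5, 10, and 15 minute targets of a window would be the one, two, and three samples that follow it.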
In the second part of the model, according to the actual road network extraction data, the first layer input in the convolution process is determined from the traffic data tensor (RGB image) of the multiparameter data combination. The settings of the convolution process are shown in Table 2.
The training process of the model relies on a forward propagation and a backpropagation step to achieve optimization. The forward propagation transmits hierarchical feature information, and the backpropagation updates the weights and bias vectors. The forward propagation computes the output of the GRU neural network (Equation (6)). The backpropagation of the GRU neural network uses the partial derivative of the loss function to calculate the backward passing error term and the weight gradients and uses the weight gradients to update the weights using the gradient descent method. The forward propagation of the convolutional layer in the CNN model outputs . There is no activation function in the pooling layer. In forward propagation, this model selects the maximum pooling to compress the input. In the forward propagation of the output layer, the softmax activation function is used to calculate the classification probability. The backpropagation of the CNN is the same as above. The loss function measures the deviation between the network predictions and the actual values. The smaller the loss function is, the more robust the model is. In the proposed method, the cross-entropy cost function is used as the loss function, and the gradient descent method is used to modify the optimized parameter values for the loss function so that the loss function tends to lower the error during the training process.
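The interplay of the softmax output, the cross-entropy loss, and gradient descent can be sketched on a deliberately reduced problem. In the sketch below the five output scores (logits) are themselves treated as the trainable parameters, which is a simplification for illustration; in the actual network the gradient is propagated further back to the convolutional and GRU weights. The learning rate, iteration count, and function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: classification probabilities that sum to one."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy loss for a single sample with an integer class label."""
    return -math.log(probs[label])


def gradient_step(logits, label, learning_rate):
    """One gradient-descent step on the logits.

    For softmax followed by cross-entropy, d(loss)/d(logit_k) = p_k - 1[k == label].
    """
    probs = softmax(logits)
    return [
        z - learning_rate * (p - (1.0 if k == label else 0.0))
        for k, (z, p) in enumerate(zip(logits, probs))
    ]


logits = [0.0] * 5  # five congestion classes, uniform initial scores
label = 2           # hypothetical target class

loss_before = cross_entropy(softmax(logits), label)
for _ in range(50):
    logits = gradient_step(logits, label, learning_rate=0.5)
loss_after = cross_entropy(softmax(logits), label)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

With uniform initial scores the loss starts at ln 5 and decreases as the probability of the target class grows, which is the behaviour the training loop aims for.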
7.4. Experimental Results
The spatiotemporal traffic data tensors based on tube fibers are used to construct the data of the training and test sets. Figure 9 shows the spatiotemporal traffic data tensor images for all the sections in the urban traffic network over the 24 hours of December 1. Figure 9(a) shows the average speed data, corresponding to the red channel of the image. Figure 9(b) shows the traffic flow data, corresponding to the green channel of the image. Figure 9(c) shows the time occupancy data, corresponding to the blue channel of the image. Figure 9(d) shows the spatiotemporal traffic data tensor image combining the three channels.

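Figure 9: Spatiotemporal traffic data tensor images on December 1: (a) average speed (red channel); (b) traffic flow (green channel); (c) time occupancy (blue channel); (d) three-channel image.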
According to the scheme design in this section, the data of 484 one-way traffic sections in the experimental dataset are first verified for the prediction model of the traffic flow parameters. The prediction effects of the proposed model are compared with the ARIMA, SVM, RNN, and LSTM output for the same experimental data. According to the test set data for each type of traffic parameter, the corresponding traffic parameter prediction values of 5 min, 10 min, and 15 min are predicted, and the average absolute percent error of each group of data for each type of traffic parameter (including the average speed, traffic flow, and time occupancy according to the model design) is calculated.
The MAPE boxplots of the experimental results for each parameter are shown in Figure 10. The three rows of subfigures in Figure 10 present the boxplots of the MAPE of the predictions on the test set data for each section for the average speed, traffic flow, and time occupancy, respectively. For each traffic parameter, the MAPE values are computed separately for the prediction results at 5 min, 10 min, and 15 min. Figures 10(a), 10(d), and 10(g) show the MAPE boxplots of the 5 min prediction results; Figures 10(b), 10(e), and 10(h) show those of the 10 min prediction results; and Figures 10(c), 10(f), and 10(i) show those of the 15 min prediction results.

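Figure 10: MAPE boxplots of the test set predictions. Rows: average speed (a)–(c), traffic flow (d)–(f), and time occupancy (g)–(i). Columns: 5 min, 10 min, and 15 min predictions.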
The comparison of the boxplots shows that the overall error fluctuation of the recurrent neural network-based methods is smaller than that of the ARIMA and SVM methods, and that the overall error of the 5 min prediction is lower than that of the longer prediction horizons. The recurrent neural network models are analyzed in more detail by comparing the experimental data of the RNN, LSTM, and GRU methods. The MAPE data of the 5 min prediction show no significant difference among the three methods for any of the three types of traffic parameters; at this horizon, most methods in the recurrent neural network class are applicable. In the MAPE data at 10 min and 15 min, the gated methods (LSTM and GRU) show better prediction stability than the RNN method for the three traffic parameters, which means that both LSTM and GRU have advantages at longer prediction horizons. Across the prediction horizons and the categories of predicted traffic parameters, the method used in this model has the most stable fluctuation range of the MAPE on the actual traffic time series data, and thus, the proposed method is practical.
The CNN model is trained using the training set. It is then applied to the multiparameter traffic data images extracted from the test set predictions, and the road network traffic state is predicted by the CNN image recognition algorithm. The images extracted from the predicted multiparameter traffic data are shown in Figure 11. In that figure, 144 groups of images are composed by superposing the 15 min prediction parameters corresponding to the groups of data on December 29 in the test set.

The experiment is designed according to the hierarchical parameters of the model, and the images extracted from the multiparameter traffic prediction data are used to predict the traffic congestion state. The accuracy, precision, sensitivity, and specificity of each state prediction are evaluated. This second part thus also evaluates the results of the first part of the model. The proposed method is compared with the FCM, KNN, and SVM methods in predicting the traffic state in real time. Using the FCM model in [18], the clustering centers of the flow [19 34 52 42 37] (veh/min), speed [22 41 34 23 13] (km/h), and time occupancy [7 15 30 41 72] (%) are selected according to the experimental data. The membership degrees of the corresponding five types of road network traffic states are [0.72 0.1 0.09 0.06 0.03], [0.24 0.6 0.1 0.04 0.02], [0.11 0.21 0.55 0.1 0.03], [0.02 0.08 0.1 0.65 0.15], and [0.02 0.14 0.09 0.19 0.56]. The KNN model in [19] is applied with the number of neighbors searched between 5 and 20; based on the experimental data, the number of neighbors is set to 15. The SVM model in [20] is used with an optimized RBF kernel function parameter of 2.4 and a penalty coefficient of 11.7. Figures 12(a)–12(d) present the confusion matrices corresponding to the test set for the CNN model, the FCM model, the KNN model, and the SVM model, respectively.
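To make the FCM baseline more concrete, the sketch below computes standard fuzzy C-means membership degrees of one sample with respect to the five cluster centers listed above. Several points are assumptions for illustration: that the j-th entries of the flow, speed, and occupancy lists belong to the same cluster, that distances are plain Euclidean distances on unscaled values, and that the fuzzifier is m = 2. The function name and the sample point are hypothetical, and this is not the configuration of the model in [18].

```python
import math

# Cluster centers quoted in the text; pairing by position is an assumption.
FLOW = [19, 34, 52, 42, 37]       # veh/min
SPEED = [22, 41, 34, 23, 13]      # km/h
OCCUPANCY = [7, 15, 30, 41, 72]   # %
CENTERS = list(zip(FLOW, SPEED, OCCUPANCY))


def fcm_memberships(sample, centers, m=2.0):
    """Standard FCM membership of one sample in each cluster (fuzzifier m > 1)."""
    dists = [
        math.sqrt(sum((s - c) ** 2 for s, c in zip(sample, center)))
        for center in centers
    ]
    if 0.0 in dists:
        # The sample coincides with a center: full membership in that cluster.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


sample = (50, 33, 31)  # hypothetical (flow, speed, occupancy) observation
memberships = fcm_memberships(sample, CENTERS)
state = max(range(len(memberships)), key=lambda k: memberships[k])

print([round(u, 3) for u in memberships], "-> state index", state)
```

The memberships of a sample sum to one, and the traffic state is taken here as the cluster with the largest membership.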

Figure 12: Confusion matrices on the test set: (a) CNN model; (b) FCM model; (c) KNN model; (d) SVM model.
The confusion matrix reflects the classification performance of the model. The diagonal of the matrix is highlighted in cyan; its values indicate the numbers of correct classifications and the corresponding percentages. The off-diagonal cells give the numbers and percentages of samples of the row classification that are assigned to the column classification. The discriminant rate and misjudgment rate of each classification are given in the grey cells of each row and column. The data in the lower right corner of the matrix are the overall accuracy and misjudgment rate over all state classifications of the model. Taking Figure 12(a) as an example, the numbers of correctly classified samples for the five road network traffic states are 730, 333, 257, 136, and 88, with the corresponding percentages of 43.0%, 19.6%, 15.1%, 8.0%, and 5.2%, respectively. For this model, the accuracy rate and misjudgment rate of the traffic state judgment of the whole road network are 90.9% and 9.1%, respectively.
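The figures quoted for Figure 12(a) can be cross-checked with a few lines of arithmetic. The total of 1698 test samples is an inference (6 test days times 283 groups per day, from the dataset split and grouping described earlier), not a number stated for the confusion matrix, so the sketch should be read as a consistency check under that assumption.

```python
correct_counts = [730, 333, 257, 136, 88]  # diagonal of Figure 12(a)
total_samples = 6 * 283                    # assumed test set size: 6 days x 283 groups

# Share of all test samples that fall on each diagonal cell, in percent.
percentages = [round(100 * c / total_samples, 1) for c in correct_counts]

# Overall accuracy is the sum of the diagonal divided by the total.
accuracy = round(100 * sum(correct_counts) / total_samples, 1)
misjudgment = round(100 - accuracy, 1)

print(percentages)
print(accuracy, misjudgment)
```

Under this assumption the computed shares and the overall accuracy agree with the quoted 43.0%, 19.6%, 15.1%, 8.0%, 5.2%, and 90.9%.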
Comparing all the confusion matrices, the overall accuracies of the KNN and SVM models differ by 1.3%, and those of the FCM and SVM models differ by 0.3%, while the CNN model produces the highest overall accuracy. In the comparison of the misjudgment rates of the road network traffic states, although the misjudged samples account for a small proportion, the states misassigned by the CNN and FCM models are close to the original categories. In contrast, the misjudgments of the KNN and SVM models are more scattered, indicating serious category misjudgment. This shows that the model proposed herein achieves better stability in the identification of road network traffic states. For all four models, the misjudgment rate is high in the classification of moderate congestion and severe congestion, which is due to the relatively small proportion of training set data in these categories in the actual network classification.
8. Conclusions
The overall experimental results show that the spatiotemporal congestion situation prediction model based on GRU-CNN can predict the traffic parameters of the whole network and, on that basis, give a highly accurate judgment of the traffic situation. Compared with other models, the proposed model improves the accuracy of road network traffic state discrimination and is more robust. Because of limitations in data collection, the current model is developed with a limited amount of training data; its performance is expected to improve when more data are collected.
This manuscript is part of the supporting technology for the prediction of spatiotemporal traffic data in the analysis of urban traffic network dynamics. This study takes the whole urban traffic network as the analysis object and uses deep learning technology to extract the characteristics of the network to predict the congestion situation.
Using an urban traffic network model and traffic data tensors, this study determines the space-time state matrix of the dynamic road network and predicts the road network congestion situation from the perspectives of time and space. In the time dimension, there is a strong correlation in the traffic time series, and the gated recurrent unit (GRU) neural network is used to predict the traffic network parameters from historical data. In the space dimension, the locations where congestion often occurs are fixed, so the CNN is used to identify the congestion state of the spatial traffic network based on the predicted values of the overall situation. Effectively predicting the trend of traffic network operation is helpful for planning the deployment of traffic control and for releasing early warnings and guidance information.
The study is mainly based on the specific road network data and traffic detection data of a medium-sized city, so the main results of the research are limited to the road network of medium-sized cities. There are a large number of complex interchanges in the road network structure in large cities. Since there is no data information corresponding to large-scale cities, the conclusions of this study cannot be extended to the traffic problems of large-scale cities. In the next stage, we will combine the previous research conclusions, aiming at the prediction of traffic congestion areas and fully considering the congestion avoidance mechanism when providing corresponding route recommendations to travelers. This is to alleviate the congestion level of the overall traffic network and improve the efficiency of the traffic flow in road network operation.
Data Availability
The traffic detection dataset of a medium-sized city in China is selected for model verification. The dataset is provided by the project sponsor, and it is confidential. The dataset contains the traffic detection data of each road section from December 1 to December 31, 2014 (2763066 detection records), with 484 sections (one-way driving) in the urban road network.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
This research has been jointly supported by the Science and Technology Development Plan Project of Jilin Province (Natural Science Foundation of Jilin Province, Grant No. 20210101416JC).