CLSTAN: ConvLSTM-Based Spatiotemporal Attention Network for Traffic Flow Forecasting

Xiong, Liyan; Ding, Weihua; Huang, Xiaohui; Huang, Weichun

doi:https://doi.org/10.1155/2022/1604727

Mathematical Problems in Engineering

Research Article | Open Access

Volume 2022 | Article ID 1604727 | https://doi.org/10.1155/2022/1604727

Liyan Xiong,¹ Weihua Ding,¹ Xiaohui Huang,¹ and Weichun Huang²

Academic Editor: Jie Hu

Received 07 May 2022

Revised 21 Jun 2022

Accepted 24 Jun 2022

Published 11 Jul 2022

Abstract

Traffic flow forecasting is an essential part of intelligent transportation systems (ITS); it helps protect traffic safety and improve traffic management capability. Nevertheless, it remains a challenging problem because traffic flow is influenced by many complex factors, including regional distribution and external factors (e.g., holidays and weather). To combine these factors when forecasting traffic flow, we present a novel neural network structure called the ConvLSTM-based Spatiotemporal Attention Network (CLSTAN). Specifically, our proposed model is composed of four modules: a preliminary feature extraction module, a spatial attention module, a temporal attention module, and an information fusion module. Together, the spatial and temporal attention modules learn the complex spatiotemporal patterns of traffic flow through the attention mechanism. The spatial attention module takes a series of initial traffic flow maps as input and obtains the weights of the various regions through a ConvLSTM. The temporal attention module takes the spatially weighted traffic flow maps as input and captures the complex spatiotemporal patterns of traffic flow with a ConvLSTM augmented by an attention mechanism. Finally, the information fusion module integrates spatiotemporal information from multiple time dimensions to forecast future traffic flow. To confirm the validity of our method, we conducted extensive experiments on the TaxiBJ and BikeNYC datasets, where CLSTAN outperformed the baseline methods.

1. Introduction

Intelligent transportation systems [1] are very important for the construction and development of modern cities. Traffic flow forecasting [2, 3], as an indispensable part of the intelligent transportation system, can be used as an index to evaluate the road state. Through traffic flow forecasting, the government can better conduct urban management [4] as well as social security management. According to related reports, the average monthly speed in Shanghai, China, is 23 km/h. Severe traffic congestion often causes considerable inconvenience for travelers. If traffic flow can be forecasted, early warnings can be issued based on the forecasts and measures can be taken in advance, thus avoiding city-wide or even intercity congestion. Therefore, in recent years, traffic flow forecasting has attracted extensive research interest from academia and industry.

Although this problem has been extensively studied in recent years, accurately forecasting traffic flow remains challenging because it is affected by many factors, such as region distribution, weather, traffic accidents, and other external factors.

The main challenges are as follows:

(1) Temporal correlation: as shown in Figure 1, the temporal correlation is composed of sequentiality and periodicity. Sequentiality indicates that the traffic flow changes smoothly between adjacent time intervals, as shown in Figure 1(a). For instance, the traffic flow changes smoothly from 2 pm to 3 pm. Periodicity indicates that traffic flow usually repeats at a certain frequency. For instance, the traffic flow at 11 am today is similar to the flow at 11 am yesterday, as shown in Figure 1(b).

(2) Spatial correlation: spatial correlation refers to the characteristic that traffic flow may show different trends in different regions. As shown in Figure 2, the red area has serious traffic congestion due to more vehicles, while the green area has good traffic conditions. Therefore, compared with other areas, the traffic flow in the red area may change more drastically.

(3) External factors: in a transportation system, different environmental factors, such as holidays, weather, and events, influence the traffic flow to different degrees.

To resolve the challenges mentioned above, we propose an innovative deep learning model CLSTAN to combine various factors to forecast traffic flow. This model consists of four main modules: (1) the preliminary feature extraction module, (2) the spatial attention module, (3) the temporal attention module, and (4) the information fusion module. In subsequent sections, we will describe the implementation details of each module.

In summary, our main contributions are as follows:

(1) We propose the spatial attention module and the temporal attention module based on ConvLSTM to enhance the capture of spatial and temporal correlations in spatiotemporal data.

(2) Building on the spatial attention module and the temporal attention module, we introduce the preliminary feature extraction module and the information fusion module to design an innovative neural network called CLSTAN. The model integrates temporal correlation, spatial correlation, and external factors for traffic flow forecasting.

(3) We conducted numerous experiments on two publicly available datasets (TaxiBJ and BikeNYC). The results of the experiments demonstrate the effectiveness of our proposed model.

The outline of the remaining sections is as follows. Section 2 describes the related work. Section 3 describes the concepts related to traffic flow and defines traffic flow forecasting. Section 4 details our proposed model. Section 5 describes the study of our experiments. In the end, we summarize our work and look forward to the future in Section 6.

2. Related Works

2.1. Traffic Flow Forecasting

Traffic flow forecasting has been extensively investigated, and researchers have achieved numerous results. Existing studies fall mainly into three categories: statistical models, traditional machine learning methods, and deep learning methods.

Statistical models used for traffic flow forecasting include HA, ARIMA, and VAR [5–7]. The most representative is the autoregressive integrated moving average (ARIMA) and its variants [8–10]. Alghamdi et al. [11] propose a method to model traffic congestion using ARIMA. Compared to other models such as ARCH [12] and its variants [13], ARIMA ignores the spatial dependence. Ding et al. integrated the ARIMA and GARCH algorithms to propose ARIMA-GARCH [14] to make short-term traffic forecasts. These methods require data that meet certain assumptions, but traffic flow data are so complex that they do not meet these assumptions, so they often do not work well in practice.

Later, research breakthroughs led to machine learning methods being widely used in various fields. For example, Hu et al. [15–18] proposed various machine learning models for the iron ore sintering process based on Fuzzy C-Means Clustering and Differential Evolution algorithms. These models can predict carbon efficiency under different conditions and greatly improve the prediction of carbon efficiency. Machine learning methods were likewise applied to traffic flow forecasting, such as KNN [19] and SVM [20]. Castro-Neto et al. [21] proposed a machine learning model based on online support vector machines to make short-term traffic forecasts. Sun et al. [22] proposed a machine learning method based on the Bayesian network to make short-term traffic forecasts, in which the traffic flow among adjacent road links in the traffic network is modeled as a Bayesian network. These methods can model more complex data, but they do not work well on larger datasets.

Recently, DNN-based methods have been widely used in different areas with many achievements. Inspired by these studies, many researchers have employed deep learning algorithms to solve traffic flow forecasting problems. For instance, Zhang et al. [23] proposed a DNN-based model, DeepST, for extracting spatiotemporal attributes from traffic flow data. Its spatiotemporal component is designed to model both near and distant spatial dependencies. Later, they developed a deep learning framework, ST-ResNet [24], based on ResNet [25], which uses various temporal attributes of traffic flow (proximity, periodicity, and trend) for city-wide traffic flow forecasting. Yao et al. [26] presented a deep multiview spatiotemporal network for traffic flow forecasting, which takes an approach similar to graph convolution to obtain the spatial dependence. Liu et al. [27] proposed a component called the Attentive Traffic Flow Machine (ATFM), which efficiently extracts spatiotemporal information from the traffic flow. Lin et al. [28] proposed a model called SpAE-LSTM, which extracts spatial features with a sparse autoencoder and temporal features with an LSTM. Yao et al. [29] proposed a traffic gating mechanism to extract the dynamic correlation between different regions and a periodic attention mechanism to handle long-term time-series data. Ma and Song et al. [30–33] proposed a series of deep learning models for daily traffic flow forecasting. These methods focus on mining the relationship between traffic flow patterns and contextual factors, and their experiments demonstrate that combining contextual factors with traffic patterns can improve prediction performance. Although all of the above studies have achieved good results, each leaves room for improvement.

2.2. Deep Learning

Convolutional neural network (CNN) [34], as a classical deep learning method [35], extracts image features through different receptive fields, so it can be used to extract the spatial characteristics of traffic flow. However, it cannot be used for feature extraction of time series. Recurrent neural network (RNN) [36] can be used to extract time-series features. Building on the recurrent neural network, researchers have developed various variants, such as the long short-term memory network (LSTM) [37] and the gated recurrent unit (GRU) [38]. Experiments have shown that these variants model time series better, and they have been used to explore the temporal relationships of traffic flow. Shi et al. [39] combined the above approaches and presented the convolutional long short-term memory network (ConvLSTM). Xiong et al. [40] then employed ConvLSTM for spatiotemporal modeling of traffic flow and demonstrated that it can extract the spatiotemporal information of traffic flow effectively.

2.3. Attention Mechanism

In recent studies, attention mechanisms have been widely used in different tasks such as natural language processing [41, 42], image captioning [43], and speech recognition [44, 45]. Xu et al. [46] proposed two attention mechanisms in an image recognition task and used visualization to demonstrate the effects of the attention mechanisms. Veličković et al. [47] presented a network structure with an attention mechanism and experimented on graph-structured data, showing that it could attend to the most critical parts of the graph-structured data. Liang et al. [48] presented a multilevel attention network to predict time series with excellent results.

3. Preliminaries

In this section, we will introduce some relevant definitions of traffic flow forecasting.

3.1. Traffic Networks

In previous studies [49, 50], researchers have used a variety of methods to split a city into areas, such as zip codes or latitude and longitude. In this study, we will split the city into square grid maps according to latitude and longitude. Each grid represents a different geographical location of the city. Specifically, we represent each grid as .
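
As a minimal sketch of the grid partition described above, the following maps a coordinate to a grid cell. The bounding box, the 32 x 32 grid size, and the function name `to_grid_cell` are illustrative assumptions, not values taken from the paper.

```python
def to_grid_cell(lat, lon, bbox, rows, cols):
    """Map a (lat, lon) point to a (row, col) grid cell, or None if outside bbox.

    bbox is (lat_min, lat_max, lon_min, lon_max); all values are hypothetical.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    if not (lat_min <= lat < lat_max and lon_min <= lon < lon_max):
        return None
    row = int((lat - lat_min) / (lat_max - lat_min) * rows)
    col = int((lon - lon_min) / (lon_max - lon_min) * cols)
    return row, col


# Hypothetical bounding box, split into a 32 x 32 grid.
BBOX = (39.8, 40.1, 116.2, 116.6)
cell = to_grid_cell(39.9, 116.43, BBOX, rows=32, cols=32)
outside = to_grid_cell(41.0, 116.43, BBOX, rows=32, cols=32)
```

Points outside the bounding box return `None`, so they can be dropped before any flow counting.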

3.2. Traffic Flow Map

In the real world, we are able to obtain a large number of tracks of taxis and bicycles through cellphone signals and GPS signals. By using these tracks, we can get the amount of bikes or taxis entering and leaving a certain area in a certain time interval. In this study, we denote the amount of vehicles entering and leaving a given area in a given time interval as inflow and outflow. Specifically, we refer to the traffic flow map at time interval on day as , where the first channel is inflow and the second channel is outflow. Figure 3 shows an example of inflow and outflow.
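
The inflow and outflow counts described above can be sketched as follows. This is an illustrative simplification: each trajectory is assumed to be already converted into the ordered list of grid cells a vehicle visits within one time interval, and the function name `flow_map` is hypothetical.

```python
def flow_map(trajectories, rows, cols):
    """Count inflow and outflow per grid cell for one time interval.

    trajectories: list of cell sequences [(row, col), ...], one per vehicle,
    giving the cells visited in order during the interval (hypothetical input).
    Returns [inflow, outflow], each a rows x cols nested list.
    """
    inflow = [[0] * cols for _ in range(rows)]
    outflow = [[0] * cols for _ in range(rows)]
    for cells in trajectories:
        for prev, curr in zip(cells, cells[1:]):
            if prev != curr:  # the vehicle crossed a cell boundary
                outflow[prev[0]][prev[1]] += 1
                inflow[curr[0]][curr[1]] += 1
    return [inflow, outflow]


# Two vehicles on a 2 x 2 grid.
trips = [[(0, 0), (0, 1), (1, 1)], [(1, 0), (1, 0), (1, 1)]]
flow = flow_map(trips, rows=2, cols=2)
```

The returned pair mirrors the two-channel layout of the traffic flow map, with inflow first and outflow second.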

3.3. External Factors

Zhang et al. [24] demonstrated that traffic flow is influenced by complicated external factors. For instance, an unexpected downpour will cause a sudden traffic congestion in a certain area, or people may congregate in a busy commercial area on a holiday, causing a large increase in traffic flow in the area compared to normal. In this study, we mainly focus on the impact of weather and holidays on traffic flow forecasting. We encode the weather and holiday information by the One-Hot Encoding method and connect all the external factors to a tensor. Specifically, we denote the external factors for the time interval on day as .
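
A minimal sketch of the one-hot encoding step, assuming hypothetical category lists (`WEATHER`, `HOLIDAY`); the actual categories are defined by each dataset.

```python
def one_hot(value, categories):
    """Return a one-hot list marking the position of value in categories."""
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical category lists; the datasets define their own.
WEATHER = ["sunny", "cloudy", "rainy", "snowy"]
HOLIDAY = [False, True]


def encode_external(weather, is_holiday):
    """Concatenate the one-hot codes of all external factors into one vector."""
    return one_hot(weather, WEATHER) + one_hot(is_holiday, HOLIDAY)


external = encode_external("rainy", True)
```

Concatenating the individual codes yields a single fixed-length vector per time interval, which is the form a fully connected layer expects.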

3.4. Convolutional LSTM Network

Shi et al. first proposed the ConvLSTM and used it for short-term precipitation forecasting, which they defined as a spatiotemporal sequence forecasting problem. However, LSTM needs to flatten the spatiotemporal data into a 1D vector when solving such a problem, which causes the spatial information to be lost. To solve this problem, Shi et al. replaced the fully connected layer of each gate in LSTM with a convolution. Therefore, ConvLSTM not only models temporal relationships like LSTM but also extracts spatial features like CNN. Experiments demonstrate that ConvLSTM captures spatiotemporal correlations better than LSTM.

Traffic flow is also typical spatiotemporal data, and ConvLSTM is well suited to processing such data. We therefore build our spatial attention module and temporal attention module on ConvLSTM. The structure of ConvLSTM is similar to that of the traditional LSTM, and we define the computational procedure of the ConvLSTM as follows.

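For a ConvLSTM cell in a layer, the input consists of the past cell state $C_{t-1}$, the past hidden state $H_{t-1}$, and the input $X_t$. The output is the updated hidden state $H_t$ and the updated cell state $C_t$. The cell state is controlled by the gating mechanism ($i_t$, $f_t$, and $o_t$). The input gate $i_t$ determines to what degree new information is recorded into the cell state, the forget gate $f_t$ determines how much of the previous cell state is forgotten, and the output gate $o_t$ determines to what degree information in $C_t$ is transferred to the hidden state $H_t$. The updating formulas for ConvLSTM, in the form given by Shi et al., are as follows:

$$
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right),\\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right),\\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right),\\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right),\\
H_t &= o_t \circ \tanh\left(C_t\right),
\end{aligned}
$$

where $*$ denotes the convolution operator, $\circ$ denotes the Hadamard product, $\sigma$ is the logistic sigmoid function, and the weights $W$ and biases $b$ are all learnable parameters.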

3.5. Traffic Flow Forecasting

Given a series of past traffic flow maps up to the time interval on day and external factors, our goal is to forecast the future traffic flow map for the time interval on day :

4. ConvLSTM-Based Spatiotemporal Attention Network

In this section, we detail our proposed ConvLSTM-based Spatiotemporal Attention Network (CLSTAN), i.e., our forecasting function. As described above, traffic flow forecasting is influenced by periodicity and sequentiality in the temporal dimension. We therefore structure the network as two channels that learn periodicity and sequentiality, respectively, and finally fuse the prediction results of the two channels to complete traffic flow forecasting. Figure 4 illustrates the structure of our model, which consists of four main components: a preliminary feature extraction module (PFE), a spatial attention module (SAM), a temporal attention module (TAM), and an information fusion module (Fusion).
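
To illustrate how the two channels might be fed, the sketch below picks the frame indices for a sequential channel (the most recent intervals) and a periodic channel (the same interval on previous days). The window lengths, the value of 48 intervals per day, and the function name `select_inputs` are assumptions made for this example, not settings reported in the paper.

```python
def select_inputs(t, n_seq, n_per, intervals_per_day):
    """Return indices of the sequential and periodic frames used to predict frame t.

    Frames are indexed consecutively over days, so the same interval on the
    previous day is intervals_per_day steps earlier. All parameters are
    hypothetical illustration values.
    """
    sequential = [t - k for k in range(n_seq, 0, -1)]
    periodic = [t - k * intervals_per_day for k in range(n_per, 0, -1)]
    return sequential, periodic


seq_idx, per_idx = select_inputs(t=200, n_seq=4, n_per=3, intervals_per_day=48)
```

Both lists are ordered from oldest to newest, which is the order a recurrent module would consume them in.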

4.1. Preliminary Feature Extraction Module

In this section, we describe in detail how this module performs feature extraction for traffic flow and external factors.

For the traffic flow maps, we use two convolutional layers and multiple residual network units to obtain feature embedding from a given set of traffic flow maps , as shown in Figure 5(a). Each residual unit consists of two convolutional layers; the specific structure is shown in Figure 5(b). By feeding the traffic flow map into the preliminary feature extraction module, we can obtain the extracted traffic flow features .

For the external factors, we employ two fully connected layers to extract features from the given external factors , as shown in Figure 6. Since the subsequent work needs to fuse the extracted traffic flow features with external factors features , we reshape the obtained external factor features into .

After the preliminary extraction of traffic flow and external factors’ features, we fuse these two extracted features to generate a new feature and denote as follows:where is a combination of traffic flow features and external factors at a particular time, which will be fed to subsequent modules for learning their features in spatial and temporal dimensions.

4.2. The Spatial Attention Module

For a specific time, traffic flow changes in different regions are different. For instance, during the morning rush hour, the change in traffic flow in residential areas and industrial parks is undoubtedly huge compared to other areas such as commercial areas. Therefore, as shown in Figure 7, we believe that, to make traffic flow forecasting more accurate, we need to assign higher weights to areas with more dramatic traffic flow changes. Therefore, we propose the spatial attention module for inferring the spatial weights of each region and assigning the obtained spatial weights to the original traffic flow data.

The specific structure of the spatial attention module is shown in Figure 8. The spatial attention module uses the hidden output state of a ConvLSTM with the current input to deduce the spatial weights of each region. And then, the obtained spatial weights of each region are assigned to the current input and used as the input of the temporal attention module.

Specifically, through ConvLSTM, we can obtain the future state of the traffic flow map :

And then, combine the obtained future state with the current input to obtain the spatial weights of each region :where denotes the concatenation operation and denotes the convolution operation with the convolution kernel of .

Finally, we multiply the obtained spatial attention weights with the current input , according to the element positions to get the spatially weighted traffic flow data :where denotes the Hadamard product.
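
The three steps above (hidden state, spatial weights, element-wise weighting) can be sketched on single-channel maps as follows. This is only an illustration under stated assumptions: the ConvLSTM hidden state is passed in as a given array, the convolution is taken to have a 1 x 1 kernel, and a sigmoid is applied to the result; none of these choices is confirmed by the text, and the function name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(hidden, x, w_h, w_x, bias):
    """Weight each region of x by an attention map inferred from hidden and x.

    hidden and x are single-channel 2D maps (nested lists). Concatenating the
    two maps and applying a 1 x 1 convolution reduces, per region, to
    w_h * hidden + w_x * x + bias; a sigmoid then squashes it into (0, 1).
    The 1 x 1 kernel and the sigmoid are assumptions of this sketch.
    """
    weights = [
        [1.0 / (1.0 + math.exp(-(w_h * h + w_x * v + bias))) for h, v in zip(h_row, x_row)]
        for h_row, x_row in zip(hidden, x)
    ]
    # Hadamard product: scale every region of the input by its own weight.
    weighted = [[w * v for w, v in zip(w_row, x_row)] for w_row, x_row in zip(weights, x)]
    return weights, weighted


hidden = [[0.0, 2.0], [-2.0, 0.0]]  # stand-in for the ConvLSTM hidden state
x = [[10.0, 10.0], [10.0, 10.0]]
weights, weighted = spatial_attention(hidden, x, w_h=1.0, w_x=0.0, bias=0.0)
```

In this toy input, regions with a larger hidden-state value receive a larger weight, so the same input flow is emphasized more strongly there.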

4.3. The Temporal Attention Module

To model temporal relationships, we advocate the use of LSTM as the main part of the temporal attention module; considering the particularity of traffic flow maps, ConvLSTM is better suited to this task. However, traditional ConvLSTM focuses on the extraction of temporal information and ignores the fact that different time intervals are of different importance for the subsequent time-series prediction. Therefore, we introduce the attention mechanism into the traditional ConvLSTM to form our temporal attention module.

Figure 9 illustrates the specific structure of the temporal attention module. It takes a series of spatially weighted traffic flow maps as input and feeds them into a ConvLSTM to obtain a series of outputs. Finally, these outputs are multiplied by the temporal attention scores to obtain the spatiotemporally weighted outputs.

Specifically, by feeding spatially weighted traffic flow to ConvLSTM, we can obtain the series of outputs :

And then, we randomly initialized a series of vectors , in which is the query vector of . With and , we can get the corresponding attention score . This step can be represented by the following equation:

Finally, the prediction can be obtained by an operation of weighed sum and :
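
A minimal sketch of the weighted-sum step, assuming that each score is the dot product of an output with its query vector and that scores are normalized with a softmax. The scoring function is an assumption of this example, the outputs are flattened to plain vectors, and the name `temporal_attention` is hypothetical.

```python
import math


def temporal_attention(outputs, queries):
    """Fuse a sequence of ConvLSTM outputs with softmax attention weights.

    outputs: list of T feature vectors (flattened maps); queries: list of T
    query vectors of the same length. Scoring by dot product followed by a
    softmax is an assumption of this sketch.
    """
    scores = [sum(q * h for q, h in zip(qv, hv)) for qv, hv in zip(queries, outputs)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the outputs over time.
    fused = [sum(a * hv[i] for a, hv in zip(alphas, outputs)) for i in range(len(outputs[0]))]
    return alphas, fused


outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero queries give uniform weights
alphas, fused = temporal_attention(outputs, queries)
```

In a trained model the query vectors would be learned parameters; zeros are used here only so that the result is easy to check by hand.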

4.4. The Information Fusion Module

As described above, traffic flow forecasting is affected by periodicity and sequentiality in the temporal dimension. How these two properties are weighted is important for the forecast performance. To address this issue, we introduce an information fusion module, whose structure is shown in Figure 10. This module dynamically learns the weights of these two properties from external factors. These weights are used as fusion weights to fuse the information in the two time dimensions and obtain the final prediction results.

Specifically, we define the periodic prediction results and sequential prediction results as , respectively. Through the information fusion module, we obtain the fusion weights of the two prediction results as . The periodic and sequential predictions are fused according to the fusion weights to obtain the final prediction , denoted as follows:
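
One way to write the fusion step described above is as an element-wise weighted sum. The symbols are assumptions chosen for illustration: $Y_p$ and $Y_s$ for the periodic and sequential predictions, $W_p$ and $W_s$ for the fusion weights learned from the external factors, and $\hat{Y}$ for the final prediction.

```latex
\hat{Y} = W_p \circ Y_p + W_s \circ Y_s
```

Here $\circ$ denotes the Hadamard product. This sketch leaves open whether the two weights are normalized to sum to one.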

5. Experiments

In this section, we verify the validity of CLSTAN on two publicly available datasets, TaxiBJ and BikeNYC. We will then describe the experiments in detail.

5.1. Dataset

In this study, we select two representative publicly available datasets for city-wide traffic forecasting, including the TaxiBJ dataset and the BikeNYC dataset. The two datasets are publicly accessible and different comparison algorithms can be fairly compared on the same dataset. The summary of TaxiBJ and BikeNYC is shown as follows.

TaxiBJ dataset: this dataset collected over 34,000 taxis’ GPS data and external factors for over 16 months from 2013 to 2016. External factors include holidays, temperature, weather, and wind speed. Specifically, the first fifteen months of data are divided into the training set and the remaining data are divided into the test set.

BikeNYC dataset: this dataset contains rental data for over 4,300 bicycles and external factors from April to October 2014. The external factors include 20 types of holidays. As with the TaxiBJ dataset, the data are split chronologically: the first 172 days are used as the training set and the last 10 days as the test set.
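
The chronological split described for BikeNYC can be sketched as below. The sampling rate of 24 frames per day and the function name `chronological_split` are assumptions of this example.

```python
def chronological_split(frames, frames_per_day, test_days):
    """Split a time-ordered list of frames, keeping the last test_days for testing."""
    cut = len(frames) - test_days * frames_per_day
    return frames[:cut], frames[cut:]


# 182 days of frames (24 per day is an assumed sampling rate).
frames = list(range(182 * 24))
train, test = chronological_split(frames, frames_per_day=24, test_days=10)
```

Splitting by time rather than at random keeps every test frame strictly later than every training frame, which matches how a forecasting model is used in practice.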

5.2. Evaluation Metric

To evaluate the model, we compare the Root Mean Square Error (RMSE) of the baseline methods and our method, which is calculated as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
$$

where $y_i$ and $\hat{y}_i$ denote the true and predicted values of traffic flow, respectively, and $n$ denotes the number of samples employed for validation.
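
The RMSE described above can be computed with a few lines of standard-library Python; the function name `rmse` and the sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

For the sample values the squared errors are 1, 0, and 4, so the result is the square root of 5/3.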

5.3. Methods for Comparison

HA: historical average method forecasts traffic flow by using historical average.

ARIMA: autoregressive integrated moving average (ARIMA) is a classical time-series forecasting model that combines moving average and autoregressive components to model time series.

SARIMA [51]: seasonal ARIMA is a variation of ARIMA, which considers seasonal effects.

VAR [52]: vector autoregression (VAR) is a classical stochastic process model that captures the linear relationship between several time series.

DeepST: DeepST is a model based on deep learning that is the first one to capture spatial information by convolution.

ST-ANN [53]: it extracts spatial features by the values of 8 adjacent regions and temporal features by the values of the prior 8 time intervals and uses the extracted spatiotemporal features for traffic flow forecasting.

ST-ResNet [53]: ST-ResNet is also a traffic forecasting method based on deep learning. This method combines proximity, period, and trend data with external factors for traffic flow forecasting.

VPN [54]: the video pixel network (VPN) is a probabilistic video model for multiframe forecasting.

PredNet [55]: PredNet is a CNN-based method for modeling the dependencies among successive video inputs and subsequent frames.

PredRNN [56]: PredRNN is a method used to generate subsequent frames of video sequences by capturing spatiotemporal features of the input frames through recurrent neural networks.

5.4. Performance Comparison

5.4.1. Comparison with Baseline Methods

Table 1 shows the results of our method compared with the baseline methods. Among all methods, CLSTAN achieved the smallest RMSE, 15.23 on TaxiBJ and 5.65 on BikeNYC, improving on the best baseline by 6.79% and 5.68%, respectively. Classic time-series methods (such as HA, VAR, and ARIMA) perform poorly on both datasets. For instance, HA has RMSE of 57.79 and 21.57 on the two datasets, respectively, because such methods rely exclusively on historical values for forecasting and do not explore the complex spatiotemporal patterns in the data. With the emergence of deep learning, CNN-based methods (such as ST-ANN, DeepST, and ST-ResNet) improved the accuracy of traffic flow forecasting to a certain extent. For example, DeepST reduces the RMSE to 18.18 and 7.43 on the TaxiBJ and BikeNYC datasets. However, using CNN alone does not fully explore the temporal patterns of the data. RNN-based methods (such as VPN, PredNet, and PredRNN), which use recurrence to explore the temporal relationships of traffic flow, solve part of the problems faced by CNN-based models. Nonetheless, these models ignore the spatial relationship of traffic flow changes and the fact that different time steps are of different importance for subsequent time-series predictions. In contrast, our method uses the spatial and temporal attention modules to learn the weight of each region and each time interval for subsequent traffic flow forecasting, which allows it to explore the complex spatiotemporal patterns in the traffic flow data more accurately and thus further improves the performance of the model. The experimental results also demonstrate that our method improves the prediction accuracy and outperforms other methods.
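
As a worked check of the reported percentages, the sketch below assumes that "improvement" means the relative reduction in RMSE with respect to the best baseline. The baseline values it produces are back-computed from the figures quoted in the text under that assumption, not read from Table 1, and the function names are hypothetical.

```python
def relative_improvement(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse * 100.0


def implied_baseline(model_rmse, improvement_pct):
    """Baseline RMSE implied by a model RMSE and a reported improvement."""
    return model_rmse / (1.0 - improvement_pct / 100.0)


taxibj_best_baseline = implied_baseline(15.23, 6.79)
bikenyc_best_baseline = implied_baseline(5.65, 5.68)
```

Under this reading, the best baseline RMSE works out to roughly 16.34 on TaxiBJ and 5.99 on BikeNYC.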

5.4.2. Comparison with Different Variants of Our Model

In our experiments, we conducted ablation experiments on our model to verify the effectiveness of different components on the TaxiBJ dataset and the BikeNYC dataset. Specifically, there are five types of our model and its variants:

CLSTAN: the complete model we proposed.

CLSTAN + ConvGRU: replace ConvLSTM in the CLSTAN model with ConvGRU.

CLSTAN-SA: remove the spatial attention module in the CLSTAN model.

CLSTAN-TA: remove the temporal attention module in the CLSTAN model.

CLSTAN-STA: remove both the spatial and the temporal attention modules in the CLSTAN model.

We conducted experiments for the above five cases, and the results are shown in Table 2. Firstly, as can be seen in Table 2, CLSTAN with both spatiotemporal attention modules applied shows the best performance. Using both spatiotemporal attention modules, we can better capture the spatiotemporal relationship of traffic flow, thus improving the accuracy of forecasting. Then, using only the spatial attention module or only the temporal attention module, we can also reduce RMSE to some extent, which demonstrates the effectiveness of these two modules. Finally, as a variant of ConvLSTM, ConvGRU also achieves good results, but it is slightly inferior to ConvLSTM. Therefore, we choose ConvLSTM as the main structure of the spatiotemporal attention module.

5.4.3. Comparison with Different Attentional Time Step Sizes

Furthermore, we investigated the effect of different attentional time steps on the final prediction results. In our experiments, we varied the attentional time step size from 0 to 6, and the results are presented in Figure 11.

Figures 11(a) and 11(b) show the prediction results of different attentional time steps on the two datasets. As can be seen from the figures, our proposed model obtains the best prediction performance when the attentional time step size is set to 6. Compared with the worst results, the RMSE of the model with an attentional time step of 6 decreases by 4.88% on the BikeNYC dataset and by 5.17% on the TaxiBJ dataset. Therefore, we set the attentional time step size to 6.

5.4.4. Training Process

The change in RMSE of our model during training on the BikeNYC and TaxiBJ datasets is presented in Figure 12. As the number of training epochs increases, the accuracy of the model steadily improves and eventually stabilizes. When training approaches epoch 150 on BikeNYC and epoch 400 on TaxiBJ, the RMSE decreases only slowly and stabilizes. Therefore, we set the number of early stopping steps to 20 for the BikeNYC dataset and 50 for the TaxiBJ dataset to avoid overfitting.
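
Early stopping of this kind can be sketched as follows, assuming that the "number of early stopping steps" acts as a patience value, i.e., the number of consecutive epochs without improvement that is tolerated. The function name and the RMSE history are hypothetical.

```python
def train_with_early_stopping(val_rmse_per_epoch, patience):
    """Return (best_epoch, best_rmse), stopping after `patience` epochs without improvement.

    val_rmse_per_epoch stands in for a real training loop: item k is the
    validation RMSE measured after epoch k (hypothetical values).
    """
    best_rmse = float("inf")
    best_epoch = -1
    for epoch, rmse in enumerate(val_rmse_per_epoch):
        if rmse < best_rmse:
            best_rmse, best_epoch = rmse, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_rmse


history = [9.0, 7.5, 6.8, 6.9, 6.85, 6.9, 6.7]
best_epoch, best_rmse = train_with_early_stopping(history, patience=3)
```

With a patience of 3, training stops at epoch 5 and never reaches the later value of 6.7, which shows the trade-off: a small patience saves time but can miss late improvements, so a noisier or slower-converging dataset may call for a larger value.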

6. Conclusion and Discussion

We present an innovative spatiotemporal attention neural network based on ConvLSTM for traffic flow forecasting. Our approach focuses on designing a temporal attention module and a spatial attention module. These two modules dynamically capture the complicated spatiotemporal relationships within the traffic flow data to better forecast the future traffic flow. Specifically, the spatial attention module aims to explore those areas where future traffic flow changes will be more dramatic so that our model can focus more on these areas when predicting. And the temporal attention module aims to discover those time intervals that will have more impact on future traffic flow changes so that our model can focus more on those time intervals when predicting. We conducted experiments on both the TaxiBJ dataset and the BikeNYC dataset, and the experimental results showed that CLSTAN outperformed other baseline experimental methods. Furthermore, the ablation experiment once again validated the performance of the proposed spatiotemporal attention module.

In recent years, GCN-based methods have begun to be extensively employed in traffic flow forecasting, and they can better extract spatial relationships. In future work, we will combine our model with GCN-based methods and design a suitable network to achieve more accurate prediction results.

Data Availability

The datasets for this research were obtained from the study “Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction.”

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 62067002, 61967006, and 62062033), the Science and Technology Project of Transportation Department of Jiangxi Province (nos. 2021X0011 and 2022X0040), and the Natural Science Foundation of Jiangxi Province (no. 20212BAB202008).

References

J. Zhang, F. Y. Wang, K. Wang, W. H. Lin, X. Xu, and C. Chen, “Data-driven intelligent transportation systems: a survey,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 4, pp. 1624–1639, 2011.
View at: Publisher Site | Google Scholar
N. G. Polson and V. O. Sokolov, “Deep learning for short-term traffic flow prediction,” Transportation Research Part C: Emerging Technologies, vol. 79, pp. 1–17, 2017.
View at: Publisher Site | Google Scholar
C. Wu, T. Yin, S. Ge, and K. Yu, “Ensemble learning for crowd flows prediction on campus,” in Proceedings of the International Conference on Smart Computing and Communication, pp. 103–113, Springer, Berlin, Germany, 2017, December.
View at: Google Scholar
Y. Zheng, L. Capra, O. Wolfson, and H. Yang, “Urban computing,” ACM Transactions on Intelligent Systems and Technology, vol. 5, no. 3, pp. 1–55, 2014.
View at: Publisher Site | Google Scholar
X. Song, Q. Zhang, Y. Sekimoto, and R. Shibasaki, “Prediction of human emergency behavior and their mobility following large-scale disaster,” in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 5–14, 2014, August.
Z. Fan, X. Song, R. Shibasaki, and R. Adachi, “Citymomentum: an online approach for crowd behavior prediction at a citywide level,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pp. 559–569, September 2015.
M. Lippi, M. Bertini, and P. Frasconi, “Short-term traffic flow forecasting: an experimental comparison of time-series analysis and supervised learning,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 871–882, 2013.
X. Li, G. Pan, Z. Wu et al., “Prediction of urban human mobility using large-scale taxi traces and its applications,” Frontiers of Computer Science, vol. 6, no. 1, pp. 111–121, 2012.
J. Guo, W. Huang, and B. M. Williams, “Adaptive Kalman filter approach for stochastic short-term traffic flow rate prediction and uncertainty quantification,” Transportation Research Part C: Emerging Technologies, vol. 43, pp. 50–64, 2014.
L. Moreira-Matias, J. Gama, M. Ferreira, J. Mendes-Moreira, and L. Damas, “Predicting taxi-passenger demand using streaming data,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1393–1402, 2013.
T. Alghamdi, K. Elgazzar, M. Bayoumi, T. Sharaf, and S. Shah, “Forecasting traffic congestion using ARIMA modeling,” in Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), pp. 1227–1232, IEEE, Tangier, Morocco, June 2019.
R. F. Engle, “Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation,” Econometrica, vol. 50, no. 4, pp. 987–1007, 1982.
T. Bollerslev, “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, vol. 31, no. 3, pp. 307–327, 1986.
C. Ding, J. Duan, Y. Zhang, X. Wu, and G. Yu, “Using an ARIMA-GARCH modeling approach to improve subway short-term ridership forecasting accounting for dynamic volatility,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 4, pp. 1054–1064, 2017.
J. Hu, M. Wu, X. Chen et al., “A multilevel prediction model of carbon efficiency based on the differential evolution algorithm for the iron ore sintering process,” IEEE Transactions on Industrial Electronics, vol. 65, no. 11, pp. 8778–8787, 2018.
J. Hu, M. Wu, L. Chen, and W. Pedrycz, “A novel modeling framework based on customized kernel-based fuzzy C-means clustering in iron ore sintering process,” IEEE, vol. 27, no. 2, pp. 950–961, 2021.
J. Hu, M. Wu, L. Chen, K. Zhou, P. Zhang, and W. Pedrycz, “Weighted kernel fuzzy c-means-based broad learning model for time-series prediction of carbon efficiency in iron ore sintering process,” IEEE Transactions on Cybernetics, vol. 52, 2020.
J. Hu, M. Wu, X. Chen, W. Cao, and W. Pedrycz, “Multi-model ensemble prediction model for carbon efficiency with application to iron ore sintering process,” Control Engineering Practice, vol. 88, pp. 141–151, 2019.
T. Abeywickrama, M. A. Cheema, and D. Taniar, “K-nearest neighbors on road networks: a journey in experimentation and in-memory implementation,” 2016, arXiv preprint arXiv:1601.01549.
A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
M. Castro-Neto, Y. S. Jeong, M. K. Jeong, and L. D. Han, “Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions,” Expert Systems with Applications, vol. 36, no. 3, pp. 6164–6173, 2009.
S. Sun, C. Zhang, and G. Yu, “A Bayesian network approach to traffic flow forecasting,” IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 1, pp. 124–132, 2006.
J. Zhang, Y. Zheng, D. Qi, R. Li, and X. Yi, “DNN-based prediction model for spatio-temporal data,” in Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 1–4, October 2016.
J. Zhang, Y. Zheng, and D. Qi, “Deep spatio-temporal residual networks for citywide crowd flows prediction,” in Proceedings of the 31st AAAI Conference on Artificial Intelligence, February 2017.
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
H. Yao, F. Wu, J. Ke et al., “Deep multi-view spatial-temporal network for taxi demand prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32, no. 1, April 2018.
L. Liu, J. Zhen, G. Li et al., “Dynamic spatial-temporal representation learning for traffic flow prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 11, pp. 7169–7183, 2020.
F. Lin, Y. Xu, Y. Yang, and H. Ma, “A spatial-temporal hybrid model for short-term traffic prediction,” Mathematical Problems in Engineering, vol. 2019, Article ID 4858546, 12 pages, 2019.
H. Yao, X. Tang, H. Wei, G. Zheng, and Z. Li, “Revisiting spatial-temporal similarity: a deep learning framework for traffic prediction,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 1, pp. 5668–5675, 2019.
D. Ma, X. B. Song, J. Zhu, and W. Ma, “Input data selection for daily traffic flow forecasting through contextual mining and intra-day pattern recognition,” Expert Systems with Applications, vol. 176, Article ID 114902, 2021.
X. Song, W. Li, D. Ma, D. Wang, L. Qu, and Y. Wang, “A match-then-predict method for daily traffic flow forecasting based on group method of data handling,” Computer-Aided Civil and Infrastructure Engineering, vol. 33, no. 11, pp. 982–998, 2018.
D. Ma, X. Song, and P. Li, “Daily traffic flow forecasting through a contextual convolutional recurrent neural network modeling inter-and intra-day traffic patterns,” IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 5, pp. 2627–2636, 2020.
L. Qu, W. Li, W. Li, D. Ma, and Y. Wang, “Daily long-term traffic flow forecasting based on a deep neural network,” Expert Systems with Applications, vol. 121, pp. 304–312, 2019.
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
D. Bau, J. Y. Zhu, H. Strobelt, A. Lapedriza, B. Zhou, and A. Torralba, “Understanding the role of individual units in a deep neural network,” Proceedings of the National Academy of Sciences, vol. 117, no. 48, pp. 30071–30078, 2020.
Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term dependencies with gradient descent is difficult,” IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
R. Yu, Y. Li, C. Shahabi, U. Demiryurek, and Y. Liu, “Deep learning: a generic approach for extreme condition traffic forecasting,” in Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 777–785, June 2017.
J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv preprint arXiv:1412.3555.
X. Shi, Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, “Convolutional LSTM network: a machine learning approach for precipitation nowcasting,” Advances in Neural Information Processing Systems, vol. 28, 2015.
F. Xiong, X. Shi, and D. Y. Yeung, “Spatiotemporal modeling for crowd counting in videos,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 5151–5159, Venice, Italy, October 2017.
L. Chen, H. Zhang, J. Xiao et al., “Sca-cnn: spatial and channel-wise attention in convolutional networks for image captioning,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5659–5667, Honolulu, HI, USA, July 2017.
J. Lu, C. Xiong, D. Parikh, and R. Socher, “Knowing when to look: adaptive attention via a visual sentinel for image captioning,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 375–383, Honolulu, HI, USA, July 2017.
A. Vaswani, N. Shazeer, N. Parmar et al., “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4945–4949, IEEE, Shanghai, China, March 2016.
J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, “Attention-based models for speech recognition,” Advances in Neural Information Processing Systems, vol. 28, 2015.
H. Xu and K. Saenko, “Ask, attend and answer: exploring question-guided spatial attention for visual question answering,” in Computer Vision - ECCV 2016, Proceedings of the European Conference on Computer Vision, pp. 451–466, Springer, Berlin, Germany, October 2016.
P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph Attention Networks,” 2017, arXiv preprint arXiv:1710.10903.
Y. Liang, S. Ke, J. Zhang, X. Yi, and Y. Zheng, “Geoman: multi-level attention networks for geo-sensory time series prediction,” IJCAI, vol. 2018, pp. 3428–3434, July 2018.
D. Deng, C. Shahabi, U. Demiryurek, L. Zhu, R. Yu, and Y. Liu, “Latent space model for road networks to predict time-varying traffic,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1525–1534, August 2016.
L. Liu, Z. Qiu, G. Li, Q. Wang, W. Ouyang, and L. Lin, “Contextualized spatial-temporal network for taxi origin-destination demand prediction,” IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 10, pp. 3875–3887, 2019.
B. M. Williams, P. K. Durvasula, and D. E. Brown, “Urban freeway traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models,” Transportation Research Record: Journal of the Transportation Research Board, vol. 1644, no. 1, pp. 132–141, 1998.
S. Johansen, “Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models,” Econometrica, vol. 59, no. 6, pp. 1551–1580, 1991.
J. Zhang, Y. Zheng, D. Qi, R. Li, X. Yi, and T. Li, “Predicting citywide crowd flows using deep spatio-temporal residual networks,” Artificial Intelligence, vol. 259, pp. 147–166, 2018.
N. Kalchbrenner, A. Oord, K. Simonyan et al., “Video pixel networks,” International Conference on Machine Learning, pp. 1771–1779, 2017.
W. Lotter, G. Kreiman, and D. Cox, “Deep predictive coding networks for video prediction and unsupervised learning,” arXiv preprint arXiv:1605.08104, 2016.
Y. Wang, M. Long, J. Wang, Z. Gao, and P. S. Yu, “Predrnn: recurrent neural networks for predictive learning using spatiotemporal lstms,” Advances in Neural Information Processing Systems, vol. 30, 2017.

Copyright

Copyright © 2022 Liyan Xiong et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
