Abstract
With the continuous development of deep learning, researchers have built increasingly large models, leading to an exponential growth in the number of model parameters. Among these models, the convolutional recurrent network is a widely used deep learning method for handling spatiotemporal data, e.g., traffic data. However, because of its large number of parameters, the convolutional recurrent network consumes considerable computing resources and time during training. To reduce this resource consumption, we propose a sparse convolutional recurrent network with a sparse gating mechanism, which reduces the complexity of the network through an improved gate unit while maintaining the performance of the model. We evaluate the proposed network on traffic flow datasets, and the experimental results show that the number of model parameters is significantly reduced while achieving prediction accuracy similar to that of the traditional convolutional recurrent network.
1. Introduction
Deep learning [1] has achieved great success in various fields, e.g., computer vision [2, 3] and natural language processing [4], and it has also been introduced into the field of traffic flow prediction [5–7]. Meanwhile, with economic development, transportation networks have expanded rapidly in recent years, and traffic congestion has become an increasingly serious issue. Predicting the traffic flow at key points in the transportation network is the basis for addressing this issue. To improve the accuracy of traffic flow prediction, various complex deep networks have been developed, which leads to an overwhelming increase in parameters and resource consumption. The convolutional long short-term memory (ConvLSTM) network was proposed by Shi et al. [8] for precipitation nowcasting; it introduced the convolutional operation into the fully connected LSTM [9]. Ballas et al. [10] proposed the convolutional gated recurrent unit (ConvGRU) for video representation, which uses a convolutional structure to capture spatial features. Both models have since been applied to traffic flow prediction by many researchers. For example, Zonoozi et al. [11] proposed a ConvGRU method based on the periodic characteristics of traffic flow data, which employs ConvGRU to extract the spatiotemporal representation of the input data. Zang et al. [12] developed a deep learning approach for traffic speed prediction, which uses the ConvLSTM network to extract the temporal and spatial dependencies of historical traffic data.
Since convolutional recurrent networks can extract the temporal and spatial information hidden in spatiotemporal data simultaneously, they have naturally been applied to the traffic flow prediction problem. However, a convolutional recurrent network has a large number of parameters and requires substantial resources during training and testing. Therefore, it is necessary to compress the convolutional recurrent network by reducing its number of parameters when the model is used for traffic flow prediction, especially on resource-limited devices. In this paper, we propose sparse convolutional recurrent networks that use sparse gates in ConvLSTM and ConvGRU to reduce the resource requirements. The contributions of this paper are summarized as follows:
(i) Based on ConvLSTM and ConvGRU, we develop four sparse convolutional recurrent networks: SConvLSTM, SConvLSTM+, SConvGRU, and SConvGRU+. The numbers of parameters of the proposed networks are significantly reduced compared with their original versions (ConvLSTM and ConvGRU) while the performance of the algorithms is maintained.
(ii) We design a sparse gating mechanism for the original gating units of ConvGRU and ConvLSTM, which reduces the parameters of the networks by reducing the input data of the gating units.
(iii) We evaluate the proposed networks on three real traffic flow datasets. Compared with the baseline models, the proposed networks reduce resource consumption and save training time while achieving competitive prediction accuracy.
The outline of this paper is as follows: Section 2 describes related work on traffic flow prediction and sparse neural networks. The notation and problem definition are given in Section 3. Section 4 describes our proposed model (the sparse convolutional recurrent network) in detail. Section 5 gives the specific parameter settings and experimental analysis on three real traffic flow datasets. Finally, in Section 6, we summarize the work of this paper and outline future work.
2. Related Work
2.1. Traffic Flow Prediction
In essence, traffic flow prediction is a time series forecasting problem, which predicts the future traffic value based on historical observations. Generally speaking, traffic flow forecasting methods can be divided into two categories: traditional prediction approaches and deep learning-based prediction approaches.
The first category comprises the traditional forecasting methods. HA [13] is a classic traffic flow prediction method, which uses the historical average as the prediction for the next time interval. However, it can only estimate the traffic data of future time intervals and cannot capture the correlation between different time intervals. ARMA [14] is a time series forecasting method for stationary series data, which builds a prediction model from the autoregression (AR) model and the moving average (MA) model. ARIMA [15] is a traditional time series prediction approach for nonstationary series data: the input data is first converted into a stationary series by differencing, and then the ARMA method is used to build a prediction model. The disadvantage of these methods is that they cannot capture the nonlinear characteristics of traffic data. VAR [16], proposed by Sims in 1980 for time series prediction, uses a system of simultaneous equations to obtain the relationships between different traffic flows. Variants derived from these methods include SARIMA [17], SARIMAX [18], and VARMA [19]. The limitation of all these traditional methods is that it is difficult for them to capture the complex spatial and temporal features of traffic flow data because of their limited model capacity.
The second category comprises deep learning-based prediction approaches. Deep learning-based methods are strong at modeling nonlinear structured data and complex spatiotemporal series data, so many researchers have introduced them into traffic flow prediction. Fu et al. [20] first employed the LSTM and GRU networks in the field of traffic flow forecasting, and their experimental results on the PeMS dataset show that the prediction performance is better than that of the ARIMA model. To capture the hidden patterns of traffic flow data, Dai et al. [21] proposed the DeepTrend model based on a fully connected neural network and an LSTM network. Kang et al. [22] used the LSTM network to analyze the impact of different forms of traffic data on the prediction results, and their experimental results show that more external information helps improve the performance of the model. Luo et al. [23] developed a deep learning method based on the K-Nearest Neighbors (KNN) model [24] and the LSTM network; experimental results on a real-time traffic flow dataset show that it outperforms existing prediction models. To capture the proximity, periodicity, and trend of traffic data, Wang et al. [25] proposed a deep learning model based on a convolutional recurrent network, which can effectively extract temporal and spatial dependencies. Chen et al. [26] developed a deep learning method based on the Convolutional Neural Network (CNN) [27], the LSTM network, and a fully connected neural network; the results show improved prediction accuracy compared with LSTM and its variants. To improve the accuracy of taxi demand prediction, Li et al. [28] proposed a deep learning model that uses a ConvLSTM network to capture spatiotemporal features.
2.2. Sparse Method of Neural Network
Increasing the number of layers of a neural network brings an increase in parameters and resource consumption. Therefore, it is necessary to compress the neural network without significant performance degradation.
To improve the training speed and generalization ability of networks, Louizos et al. [29] developed a sparse method using regularization, which reduces model complexity by pruning parameters. Liu et al. [30] compressed the model by reducing the number of redundant parameters; their experimental results show that training speed is effectively improved while the loss of accuracy is minimized. To improve computational performance and reduce the transmission of redundant data, Mukkara et al. [31] proposed a sparse convolutional neural network, which compresses parameters by pruning zero-valued weights. Dettmers et al. [32] proposed sparse momentum to improve the training speed of deep neural networks; experimental results on the MNIST, CIFAR-10, and ImageNet datasets show that sparse momentum achieves state-of-the-art sparse performance. To address the problem of performance being limited by resources, Alford et al. [33] developed a pruning-based sparse neural network, which prunes the low-magnitude weights of a trained densely connected network. Luo et al. [34] proposed a compression method for deep neural networks, which reduces the parameters of the network by pruning filters. In this paper, we develop a sparse gating mechanism for convolutional recurrent neural networks, which compresses the model by reducing the input data of the gating mechanism.
3. Notations and Problem Definition
3.1. Grid Map
As shown in Figure 1, we partition a city into an $I \times J$ grid map based on longitude and latitude [35], where grid $(i, j)$ denotes the area located in the $i$-th row and $j$-th column of the map.

3.2. Inflow/Outflow Matrix
Given the $t$-th time interval, for region $(i, j)$, the inflow is defined as the total crowd flow from other regions into region $(i, j)$ during the $t$-th time interval, and the outflow is defined as the total crowd flow from region $(i, j)$ to other regions during the $t$-th time interval. In this way, we obtain the inflow and outflow of region $(i, j)$ at the $t$-th time interval. Applying the same method to all areas yields the inflow and outflow of the whole map for the $t$-th time interval [35]. Figure 2 shows the inflow and outflow of all areas during a certain time interval in Beijing. Therefore, the inflow and outflow matrices at any time interval can be expressed as a tensor $\mathbf{X}_t \in \mathbb{R}^{2 \times I \times J}$.

We use $P_t$ to represent the set of all crowd trajectories during the $t$-th time interval, and each trajectory is represented by a tuple $Tr = (g_s, t_s, g_e, t_e)$, which means that the trajectory starts at region $g_s$ at time $t_s$ and ends at region $g_e$ at time $t_e$. Then, the inflow and outflow of region $(i, j)$ can be defined as follows:

$$x_t^{\mathrm{in},(i,j)} = \left|\left\{ Tr \in P_t : g_e = (i,j) \wedge g_s \neq (i,j) \wedge t_e \in [T_s, T_e] \right\}\right|,$$
$$x_t^{\mathrm{out},(i,j)} = \left|\left\{ Tr \in P_t : g_s = (i,j) \wedge g_e \neq (i,j) \wedge t_s \in [T_s, T_e] \right\}\right|,$$

where $Tr$ is a trajectory, $x_t^{\mathrm{in},(i,j)}$ and $x_t^{\mathrm{out},(i,j)}$ denote the inflow and outflow of region $(i, j)$ at the $t$-th time interval, $T_s$ represents the start time of the $t$-th time interval, $T_e$ represents the end time of the $t$-th time interval, and $|\cdot|$ denotes the cardinality of a set.
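For concreteness, the following sketch computes the inflow/outflow tensor of one time interval from such trajectory tuples; the tuple layout and the half-open interval check are our assumptions, not the paper's exact preprocessing code.

```python
import numpy as np

def flow_matrix(trajectories, I, J, T_s, T_e):
    """Build the 2 x I x J inflow/outflow tensor of one time interval.

    Each trajectory is assumed to be a tuple (g_s, t_s, g_e, t_e) of
    start cell, start time, end cell, and end time, as defined above.
    """
    X = np.zeros((2, I, J))  # channel 0: inflow, channel 1: outflow
    for (g_s, t_s, g_e, t_e) in trajectories:
        # A trip ending in cell g_e during the interval counts toward inflow.
        if g_e != g_s and T_s <= t_e < T_e:
            X[0, g_e[0], g_e[1]] += 1
        # A trip leaving cell g_s during the interval counts toward outflow.
        if g_s != g_e and T_s <= t_s < T_e:
            X[1, g_s[0], g_s[1]] += 1
    return X
```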
3.3. Inflow/Outflow Prediction Problem
As shown in Figure 3, traffic flow prediction aims to predict the traffic flow values of the next $k$ time intervals based on the historical data of the previous $n$ time intervals. The formula is as follows:

$$[\hat{\mathbf{X}}_{t+1}, \ldots, \hat{\mathbf{X}}_{t+k}] = F(\mathbf{X}_{t-n+1}, \ldots, \mathbf{X}_{t}),$$

where $F$ denotes a flow prediction model, $\mathbf{X}_{t-n+1}, \ldots, \mathbf{X}_{t}$ represent the observed values of the previous $n$ time intervals, and $\hat{\mathbf{X}}_{t+1}, \ldots, \hat{\mathbf{X}}_{t+k}$ represent the predicted values of the $k$ future time intervals.
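A minimal sketch of how such (history, target) training pairs can be sliced from a flow series, assuming the tensors from Section 3.2 are stacked along a leading time axis; the function name and layout are illustrative only.

```python
import numpy as np

def make_samples(flows, n, k):
    """Slice a (T, 2, I, J) flow series into (previous n, next k) pairs."""
    inputs, targets = [], []
    for t in range(n, flows.shape[0] - k + 1):
        inputs.append(flows[t - n:t])   # observations of the previous n intervals
        targets.append(flows[t:t + k])  # ground truth of the next k intervals
    return np.stack(inputs), np.stack(targets)
```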

4. Sparse Convolutional Recurrent Neural Network
In this section, we introduce the details of the sparse convolutional recurrent neural network. There are two popular forms of convolutional recurrent neural networks: the convolutional long short-term memory network and the convolutional gated recurrent unit network. Based on the idea of eliminating redundant parameters in sparse neural network methods, we develop the sparse convolutional LSTM networks SConvLSTM and SConvLSTM+ from the convolutional long short-term memory network, and the sparse convolutional GRU networks SConvGRU and SConvGRU+ from the convolutional gated recurrent unit network.
4.1. Sparse Convolutional LSTM (SConvLSTM) Network
Figure 4(a) shows the convolutional long short-term memory unit proposed by Shi et al. [8], which is composed of a main line part and several gating units. The inputs of the ConvLSTM unit are the cell state $\mathbf{C}_{t-1}$ and the hidden state $\mathbf{H}_{t-1}$ of the previous time interval, together with the feature matrix $\mathbf{X}_t$ of the current time interval. The outputs are the cell state $\mathbf{C}_t$ and the hidden state $\mathbf{H}_t$ of the current time interval. The main line part updates the cell state and the hidden state, and the specific update equations are as follows:

$$\mathbf{C}_t = f_t \circ \mathbf{C}_{t-1} + i_t \circ g_t,$$
$$g_t = \tanh(\mathbf{W}_{xg} * \mathbf{X}_t + \mathbf{W}_{hg} * \mathbf{H}_{t-1} + b_g),$$
$$\mathbf{H}_t = o_t \circ \tanh(\mathbf{C}_t),$$

where $\circ$ represents the Hadamard product and $*$ represents the convolutional operation. The first equation updates the cell state, the second computes the input update $g_t$, and the third updates the hidden state. $\tanh$ represents the hyperbolic tangent function; $f_t$, $i_t$, and $o_t$ represent the forget gate, the input gate, and the output gate, respectively; $\mathbf{X}_t$ denotes the input data of the $t$-th time interval; and $\mathbf{W}$ and $b$ are the learnable parameters.

The gating unit consists of a forget gate, an input gate, and an output gate. The forget gate determines the degree to which $\mathbf{C}_{t-1}$ is forgotten. The input gate controls the degree to which the input update $g_t$ enters the cell state, and the output gate determines the degree to which the cell state $\mathbf{C}_t$ contributes to the output state $\mathbf{H}_t$. The gating mechanism of the ConvLSTM unit takes the hidden state $\mathbf{H}_{t-1}$ of the previous time interval, the input $\mathbf{X}_t$ of the current time interval, and the bias $b$ as input. The update formulas of the gating mechanism are as follows:

$$f_t = \sigma(\mathbf{W}_{xf} * \mathbf{X}_t + \mathbf{W}_{hf} * \mathbf{H}_{t-1} + b_f),$$
$$i_t = \sigma(\mathbf{W}_{xi} * \mathbf{X}_t + \mathbf{W}_{hi} * \mathbf{H}_{t-1} + b_i),$$
$$o_t = \sigma(\mathbf{W}_{xo} * \mathbf{X}_t + \mathbf{W}_{ho} * \mathbf{H}_{t-1} + b_o),$$

where $\sigma$ denotes the sigmoid function and $\mathbf{W}$ and $b$ represent the learnable parameters.
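To make the unit concrete, here is a minimal PyTorch sketch of a ConvLSTM cell following the equations above (without the peephole connections used in some ConvLSTM variants); the class and argument names are ours, and fusing the four convolutions into one is an implementation convenience.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """ConvLSTM unit: the gates f, i, o and the input update g all read
    both X_t and H_{t-1}, as in the equations above."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        # One convolution produces f, i, o, and g stacked along the channel axis.
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        f, i, o, g = torch.chunk(self.conv(torch.cat([x, h], dim=1)), 4, dim=1)
        c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, c_next
```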
The SConvLSTM unit is shown in Figure 4(b). We introduce a sparse gating mechanism to cut out the redundant parameters of the network. The main line part of the SConvLSTM unit is consistent with that of the ConvLSTM unit. The input of the gating mechanism in the ConvLSTM unit includes $\mathbf{X}_t$, $\mathbf{H}_{t-1}$, and the bias $b$; in contrast, the input of the gating mechanism in the SConvLSTM unit is only $\mathbf{H}_{t-1}$ and the bias $b$. In this way, we can effectively reduce the complexity of the model while ensuring that $\mathbf{X}_t$ is still propagated to subsequent sequences through the input update $g_t$.
In Section 5, our experimental results on three traffic datasets verify this conclusion. The update formulas of the gating mechanism in the SConvLSTM unit are as follows:

$$f_t = \sigma(\mathbf{U}_f * \mathbf{H}_{t-1} + b_f), \quad i_t = \sigma(\mathbf{U}_i * \mathbf{H}_{t-1} + b_i), \quad o_t = \sigma(\mathbf{U}_o * \mathbf{H}_{t-1} + b_o),$$

where $\mathbf{U}$ and $b$ are the learnable parameters.
The SConvLSTM+ unit removes the bias $b$ from the SConvLSTM unit, which further reduces the number of network parameters. The update formulas of the gating mechanism in the SConvLSTM+ unit are as follows:

$$f_t = \sigma(\mathbf{U}_f * \mathbf{H}_{t-1}), \quad i_t = \sigma(\mathbf{U}_i * \mathbf{H}_{t-1}), \quad o_t = \sigma(\mathbf{U}_o * \mathbf{H}_{t-1}).$$
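The sparse variants change only the gate convolutions. Below is a sketch under the same assumptions as the ConvLSTMCell above; the bias flag is our shorthand for switching between SConvLSTM and SConvLSTM+.

```python
import torch
import torch.nn as nn

class SConvLSTMCell(nn.Module):
    """Sparse ConvLSTM unit: the gates f, i, o read only H_{t-1} (plus bias),
    while the input update g still reads both X_t and H_{t-1}, so X_t keeps
    propagating through the cell state. Set bias=False for SConvLSTM+."""
    def __init__(self, in_ch, hid_ch, k=3, bias=True):
        super().__init__()
        self.gates = nn.Conv2d(hid_ch, 3 * hid_ch, k, padding=k // 2, bias=bias)
        self.update = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):
        f, i, o = torch.chunk(self.gates(h), 3, dim=1)  # gates see h only
        g = torch.tanh(self.update(torch.cat([x, h], dim=1)))
        c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * g
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, c_next
```

Relative to the dense cell, each gate convolution drops its `in_ch` input channels, which is where the parameter savings reported in Section 5 come from.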
4.2. Sparse Convolutional GRU (SConvGRU) Network
As shown in Figure 5(a), the ConvGRU unit was proposed by Ballas et al. [10] in 2016 to solve the problem of video representation. It is also composed of a main line part and gating units. Compared with ConvLSTM, it has fewer gating mechanisms and parameters, and it avoids propagating a separate cell state through the subsequent network. The inputs of the ConvGRU unit are the input $\mathbf{X}_t$ of the current time interval and the hidden state $\mathbf{H}_{t-1}$ of the previous time interval, and the output is the hidden state $\mathbf{H}_t$ of the current time interval. The main line part updates the hidden state. The update formulas are as follows:

$$\mathbf{H}_t = (1 - z_t) \circ \mathbf{H}_{t-1} + z_t \circ \tilde{\mathbf{H}}_t,$$
$$\tilde{\mathbf{H}}_t = \tanh(\mathbf{W} * \mathbf{X}_t + \mathbf{U} * (r_t \circ \mathbf{H}_{t-1}) + b),$$

where $z_t$ and $r_t$ represent the update gate and the reset gate, respectively; the first equation updates the hidden state $\mathbf{H}_t$; $\tilde{\mathbf{H}}_t$ represents the input update; and $\mathbf{W}$, $\mathbf{U}$, and $b$ are the learnable parameters.

The gating units are composed of an update gate and a reset gate. The reset gate determines the importance of the hidden state $\mathbf{H}_{t-1}$ of the previous time interval to the input update $\tilde{\mathbf{H}}_t$, and the update gate determines the influence of the current input state on $\mathbf{H}_t$. As in ConvLSTM, the input of the gating mechanism of the ConvGRU unit includes the input state $\mathbf{X}_t$ of the current time interval, the hidden state $\mathbf{H}_{t-1}$ of the previous time interval, and the bias $b$. The specific update equations are as follows:

$$z_t = \sigma(\mathbf{W}_z * \mathbf{X}_t + \mathbf{U}_z * \mathbf{H}_{t-1} + b_z),$$
$$r_t = \sigma(\mathbf{W}_r * \mathbf{X}_t + \mathbf{U}_r * \mathbf{H}_{t-1} + b_r),$$

where $\mathbf{W}$, $\mathbf{U}$, and $b$ are the learnable parameters.
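A minimal PyTorch sketch of the ConvGRU cell under the same assumptions as the earlier cells; the fused two-gate convolution is again an implementation convenience.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """ConvGRU unit: the gates z, r and the input update read X_t and H_{t-1}."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=k // 2)
        self.update = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], dim=1))), 2, dim=1)
        h_tilde = torch.tanh(self.update(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde
```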
The SConvGRU unit is shown in Figure 5(b). As in the SConvLSTM unit, we introduce a sparse gating mechanism to reduce the parameters of the reset gate and the update gate. The main line part of SConvGRU is consistent with that of ConvGRU, and the input of the gating mechanism is composed of only the hidden state $\mathbf{H}_{t-1}$ of the previous time interval and the bias $b$. The current input state $\mathbf{X}_t$ is brought into the subsequent spatiotemporal sequence propagation by the input update formula. Compared with ConvGRU, SConvGRU effectively reduces the number of parameters and the training time of the network while keeping a similar prediction accuracy. The update equations of the gating mechanism in the SConvGRU unit are as follows:

$$z_t = \sigma(\mathbf{U}_z * \mathbf{H}_{t-1} + b_z), \quad r_t = \sigma(\mathbf{U}_r * \mathbf{H}_{t-1} + b_r),$$

where $\mathbf{U}$ and $b$ are the learnable parameters.
The SConvGRU+ unit is based on the SConvGRU unit, and we further remove the bias $b$ to reduce the number of network parameters. The update formulas of the gating mechanism in the SConvGRU+ unit are as follows:

$$z_t = \sigma(\mathbf{U}_z * \mathbf{H}_{t-1}), \quad r_t = \sigma(\mathbf{U}_r * \mathbf{H}_{t-1}),$$

where $\mathbf{U}$ is the learnable parameter.
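The corresponding sparse GRU sketch, mirroring SConvLSTMCell; again, bias=False is our shorthand for the SConvGRU+ variant.

```python
import torch
import torch.nn as nn

class SConvGRUCell(nn.Module):
    """Sparse ConvGRU unit: z and r read only H_{t-1} (plus bias), while
    X_t enters only through the input update. Set bias=False for SConvGRU+."""
    def __init__(self, in_ch, hid_ch, k=3, bias=True):
        super().__init__()
        self.gates = nn.Conv2d(hid_ch, 2 * hid_ch, k, padding=k // 2, bias=bias)
        self.update = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=k // 2)

    def forward(self, x, h):
        z, r = torch.chunk(torch.sigmoid(self.gates(h)), 2, dim=1)
        h_tilde = torch.tanh(self.update(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde
```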
4.3. The Overall Framework of Traffic Flow Prediction
In this paper, we use the four algorithms SConvLSTM, SConvLSTM+, SConvGRU, and SConvGRU+ to forecast traffic flow. The overall framework of traffic flow prediction is shown in Figure 6, in which the input of the model is the historical traffic flow matrix. Firstly, we use a multilayer convolutional neural network to extract the spatial dependence of the traffic data. Then, a multilayer network built from the above-mentioned units (SConvLSTM unit, SConvLSTM+ unit, SConvGRU unit, and SConvGRU+ unit) is used to capture the temporal dependence while further extracting spatial features. Finally, a multilayer deconvolutional neural network is employed to obtain a prediction matrix with the same dimension as the input.
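A compact sketch of this encoder-recurrent-decoder pipeline, parameterized by one of the cells above; the channel sizes, strides, and single recurrent layer here are illustrative defaults, not the paper's exact configuration (which is given per dataset in Section 5).

```python
import torch
import torch.nn as nn

class FlowPredictor(nn.Module):
    """Conv2d encoder -> (sparse) convolutional recurrent cell -> deconv decoder."""
    def __init__(self, cell, in_ch=2, hid_ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, hid_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(hid_ch), nn.ReLU(),
            nn.Conv2d(hid_ch, hid_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(hid_ch), nn.ReLU(),
        )
        self.cell = cell  # e.g., SConvGRUCell(hid_ch, hid_ch)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hid_ch, hid_ch, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(hid_ch, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, seq):  # seq: (batch, time, 2, I, J), I and J divisible by 4
        h = None
        for t in range(seq.size(1)):
            z = self.encoder(seq[:, t])
            h = torch.zeros_like(z) if h is None else h
            h = self.cell(z, h)  # GRU-style cell; an LSTM cell also carries c
        return self.decoder(h)  # prediction with the input's spatial dimensions
```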

5. Experiment
In this section, we evaluate our proposed methods on three real-world traffic flow datasets (TaxiBJ, BikeNYC, and TaxiNYC). We describe the experimental process in detail in terms of dataset description, baseline models, evaluation metric, hyperparameter settings, and experimental analysis.
5.1. Datasets Description
As shown in Table 1, the TaxiBJ dataset is collected from Beijing taxi GPS data from 07/01/2013 to 10/30/2013. Before the experiment, we first divide the city into a grid of areas, set the time interval to 30 minutes, and count the traffic flow matrix within each 30-minute interval. Furthermore, the data are divided into three subsets: the training set is the data from 07/01/2013 to 10/20/2013, the validation set is the data from 10/21/2013 to 10/25/2013, and the test set is the data from 10/26/2013 to 10/30/2013.
The TaxiNYC dataset comes from the New York taxi system and covers 01/01/2015 to 03/01/2015. For this dataset, we divide the city into a grid of areas and set the time interval to 30 minutes. We use the data from 01/01/2015 to 02/09/2015 as the training set and the data from 02/10/2015 to 02/19/2015 as the validation set; the rest of the data is used as the test set.
The BikeNYC dataset comes from the New York Citi Bike system and covers 01/01/2016 to 06/30/2016. For this dataset, we first divide the city into a grid of areas and set the time interval to 1 hour. We set the training set, validation set, and test set to the data from 01/01/2016 to 06/10/2016, from 06/11/2016 to 06/20/2016, and from 06/21/2016 to 06/30/2016, respectively.
5.2. Baseline
In this paper, we compare the following models with our proposed models.
(i) HA [13]: HA is a classic time series forecasting method, which predicts the traffic flow using the historical average.
(ii) ARMA [14]: ARMA is a time series forecasting method for stationary series data, which is based on the AR model and the MA model to predict future traffic flow.
(iii) VAR [16]: VAR uses a system of simultaneous equations to obtain the linear relationships between different traffic flows.
(iv) ConvLSTM [8]: ConvLSTM is a deep learning approach for spatiotemporal sequence prediction, which introduces the convolutional structure into the fully connected LSTM.
(v) ConvGRU [10]: ConvGRU employs the convolutional operation to capture spatial features while preserving the structure of the gated recurrent unit.
5.3. Evaluation Metric
In the experiment, we predict the traffic flow of the next time interval based on the previous 10 steps of historical traffic data. We use the root mean square error (RMSE) as the evaluation metric for our proposed models. The formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2},$$

where $y_i$ denotes the ground truth of region $i$, $\hat{y}_i$ represents the predicted value of region $i$, and $m$ represents the number of prediction areas.
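The metric in code, as a minimal sketch matching the formula above:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over all predicted regions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```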
5.4. Experiment on TaxiBJ Dataset
5.4.1. Hyperparameter Setting
For the TaxiBJ dataset, Min-Max normalization is first applied to scale the data. The overall framework of the model is described in Section 4.3, and the detailed settings are as follows: a two-layer convolutional neural network is employed to extract the spatial features of the data, and each convolutional operation is followed by batch normalization and the rectified linear unit (ReLU). Furthermore, the number of layers of the units (SConvLSTM unit, SConvLSTM+ unit, SConvGRU unit, and SConvGRU+ unit) is set to 1. In addition, a two-layer deconvolutional neural network is used to obtain predicted values with the same dimension as the real values. The detailed description of each module is shown in Table 2.
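A sketch of the normalization step, assuming the common $[0, 1]$ target range (the exact range is not stated above); train-set statistics should be kept so that predictions can be mapped back to flow values before computing the RMSE.

```python
import numpy as np

def min_max_scale(x, x_min=None, x_max=None):
    """Scale data to [0, 1] using (optionally precomputed) extrema."""
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return (x - x_min) / (x_max - x_min), x_min, x_max

def inverse_scale(x_scaled, x_min, x_max):
    """Map scaled values back to the original flow range."""
    return x_scaled * (x_max - x_min) + x_min
```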
5.4.2. Convergence Analysis
Figures 7(a) and 7(b) show the loss curves of the ConvGRU, SConvGRU, and SConvGRU+ models on the training set and validation set. From the figure, we can see that as the number of training epochs increases, the RMSE on the training set and validation set gradually decreases and finally converges. Furthermore, the SConvGRU+ model converges to a lower RMSE on the validation set than the ConvGRU and SConvGRU models. In addition, on the training set, the three models maintain a consistent convergence trend. In summary, the convergence speed and performance of our proposed methods are not reduced despite compressing the original models.

Figures 7(c) and 7(d) show the loss curves of the ConvLSTM, SConvLSTM, and SConvLSTM+ models on the training set and validation set. We can see from the figure that the three models maintain basically consistent convergence behavior on the training set and validation set, and each model converges on the validation set. In short, the convergence performance of our proposed models is not reduced.
5.4.3. The Comparative Results of Different Models
Table 3 reports the RMSE comparison between the traditional models and our proposed models on the test set. First of all, we can see that the deep learning methods outperform the traditional prediction models, which shows that deep learning can better capture the nonlinear characteristics of the data. Compared with the ConvGRU model, the RMSE of the SConvGRU+ model increases by 1.8%. This result shows that the parameters of the model are effectively reduced at the cost of only a limited loss of prediction accuracy. In addition, compared with ConvLSTM, SConvLSTM+ improves the prediction accuracy by 4.5% while reducing the number of model parameters. Overall, SConvLSTM achieves the best performance on the TaxiBJ dataset. This result demonstrates the effectiveness of the sparse convolutional recurrent network.
5.4.4. The Rate of Model Compression
Table 4 shows the comparison of the model compression rates. We use ConvGRU and ConvLSTM as the baselines to calculate the rate of parameter reduction of our proposed models. Compared with ConvGRU, the parameter reduction rates of SConvGRU and SConvGRU+ are 13.3% and 13.5%, respectively. Compared with ConvLSTM, the parameter reduction rates of SConvLSTM and SConvLSTM+ are 15% and 15.1%, respectively. Under the same hyperparameters in Table 2, ConvLSTM, SConvLSTM, and SConvLSTM+ have more parameters than ConvGRU, SConvGRU, and SConvGRU+, respectively. The reason is that ConvLSTM, SConvLSTM, and SConvLSTM+ have more gating units than ConvGRU, SConvGRU, and SConvGRU+.
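A sketch of how such a parameter reduction rate can be computed for the hypothetical cell classes sketched in Section 4; the exact rates depend on the channel and kernel settings in Table 2.

```python
def compression_rate(base, sparse):
    """Fraction of parameters removed by the sparse variant (cf. Table 4)."""
    n_base = sum(p.numel() for p in base.parameters())
    n_sparse = sum(p.numel() for p in sparse.parameters())
    return 1 - n_sparse / n_base

# Example: compression_rate(ConvGRUCell(2, 32), SConvGRUCell(2, 32))
```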
5.5. Experiment on TaxiNYC Dataset
5.5.1. Hyperparameter Setting
For the TaxiNYC dataset, we similarly first use Min-Max normalization to scale the original data. The framework of the model is described in Section 4.3. Different from the setting on the TaxiBJ dataset, the number of layers of the units (SConvLSTM unit, SConvLSTM+ unit, SConvGRU unit, and SConvGRU+ unit) is set to 2; the hidden channels of the ConvGRU, SConvGRU, and SConvGRU+ units are set to 32 (first layer) and 64 (second layer), and the hidden channels of the ConvLSTM, SConvLSTM, and SConvLSTM+ units are set to 32 (first layer) and 32 (second layer). The detailed settings are shown in Table 5. Other hyperparameter settings are shown in Table 2 and are the same as those of the TaxiBJ dataset.
5.5.2. Convergence Analysis
Figures 8(a) and 8(b) show the loss curves of the ConvGRU, SConvGRU, and SConvGRU+ models on the training set and validation set. From the figure, it can be seen that the three models maintain consistent convergence curves, and the RMSE gradually decreases as the epoch increases on both the training set and the validation set and finally converges.

Figures 8(c) and 8(d) show the loss curves of the ConvLSTM, SConvLSTM, and SConvLSTM+ models. All models converge on the validation set, and their convergence behavior on the training set and validation set is basically the same, with the baseline model converging to a slightly lower value on the validation set.
5.5.3. The Comparative Results of Different Models
From Table 6, among the traditional models, VAR achieves a good result of 14.306. Compared with ConvGRU, the RMSE of SConvGRU+ increases by 0.07; similarly, compared with ConvLSTM, the RMSE of SConvLSTM+ increases by 0.029. These results show that the parameters of the models are effectively reduced at the cost of only a limited loss of prediction accuracy. On the whole, the prediction results of the ConvLSTM, SConvLSTM, and SConvLSTM+ models are better than those of the ConvGRU, SConvGRU, and SConvGRU+ models, which shows that ConvLSTM and the sparse ConvLSTM variants better capture the nonlinear structures of this dataset.
5.5.4. The Rate of Model Compression
The compression rates of the different models on the TaxiNYC dataset are shown in Table 7. Because of the different numbers of hidden channels, the ConvLSTM-based models have fewer parameters than the ConvGRU-based models. Compared with the ConvGRU model, the compression rates of the SConvGRU and SConvGRU+ models are 22.2% and 22.3%, respectively. Compared with the ConvLSTM model, the compression rates of the SConvLSTM and SConvLSTM+ models are 32.1% and 32.3%, respectively.
5.6. Experiment on BikeNYC Dataset
5.6.1. Hyperparameter Setting
Similarly, the overall framework of the model is described in Section 4.3 and consists of a two-layer convolutional neural network, two recurrent unit layers, and a two-layer deconvolutional neural network. The specific details of the framework are shown in Table 8. Other hyperparameter settings are shown in Table 2 and are the same as those for the TaxiBJ dataset.
5.6.2. Convergence Analysis
Figures 9(a) and 9(b) show the loss curves of the ConvGRU, SConvGRU, and SConvGRU+ models on the training set and validation set. From Figure 9(a), we can see that the RMSE of each model gradually decreases as the epoch increases. Furthermore, the baseline (ConvGRU) reaches a lower RMSE than SConvGRU and SConvGRU+ during training; however, ConvGRU performs worse on the validation set than our proposed models. From Figure 9(b), we can see that the RMSE first decreases and then increases as the epoch increases; the reason may be that the model is overfitting. In addition, the SConvGRU and SConvGRU+ models achieve better performance on the validation set, which indicates that the prediction accuracy is not reduced while the number of parameters is reduced.

Figures 9(c) and 9(d) show the loss curves of the ConvLSTM, SConvLSTM, and SConvLSTM+ models. As in Figures 9(a) and 9(b), ConvLSTM attains a lower RMSE on the training set than SConvLSTM and SConvLSTM+; however, ConvLSTM shows worse performance on the validation set than SConvLSTM and SConvLSTM+. The reason may be that the model overfits once the number of training epochs exceeds a certain value. On the validation set, the SConvLSTM and SConvLSTM+ models achieve a lower RMSE.
5.6.3. The Comparative Results of Different Models
On the BikeNYC dataset, it can be seen from Table 9 that SConvLSTM achieves the best performance. Compared with the ConvGRU model, the SConvGRU and SConvGRU+ models achieve a lower RMSE; similarly, compared with the ConvLSTM model, the SConvLSTM and SConvLSTM+ models also achieve better performance. This result shows that the prediction accuracy improves even as the number of model parameters is reduced. The reason may be that part of the parameters of the gating mechanisms in ConvLSTM and ConvGRU are redundant. Furthermore, on the whole, the ConvLSTM, SConvLSTM, and SConvLSTM+ models achieve a better RMSE than the ConvGRU, SConvGRU, and SConvGRU+ models, which shows that the ConvLSTM model and its sparse variants have a stronger ability to capture spatiotemporal information.
5.6.4. The Rate of Model Compression
Table 10 shows the compression rates of the different models on the BikeNYC dataset. The ConvGRU and ConvLSTM models are used as the benchmarks to measure the compression rates of our proposed models. Compared with the ConvGRU model, the compression rates of the SConvGRU and SConvGRU+ models are 25.6% and 25.7%, respectively. Compared with the ConvLSTM model, the compression rates of the SConvLSTM and SConvLSTM+ models are 28.8% and 28.9%, respectively.
6. Conclusion and Future Work
In this paper, we study how to compress convolutional recurrent networks while keeping results competitive with the original algorithms. To this end, we propose a sparse convolutional recurrent network framework in which a sparse gating mechanism is developed. For ConvGRU, we develop the SConvGRU and SConvGRU+ units; for ConvLSTM, we develop the SConvLSTM and SConvLSTM+ units. Experimental results on three real traffic flow datasets show that our proposed methods effectively reduce the number of model parameters while maintaining prediction accuracy.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors have no conflicts of interest.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (nos. 62062033 and 62067002), the Natural Science Foundation of Jiangxi Province (nos. 20212BAB202008 and 20192ACBL21006), the Key Research and Development Project of Jiangxi Province (no. 20203BBE53034), and the Education Department Project of Jiangxi Province (no. GJJ200604).