Abstract

Pavement performance prediction is a crucial issue in big data maintenance. This paper develops a hybrid grey relation analysis (GRA) and support vector machine regression (SVR) technique to predict pavement performance. The prediction model addresses the shortcomings of traditional models, including consideration of a single factor, a short prediction period, and a tendency to overfit. GRA is employed to select the main factors affecting the performance of asphalt pavement, and SVR is then used to predict the performance. Finally, the data collected from the weather station installed on Guangyun Expressway were adopted to verify the validity of the GRA-SVR model. Meanwhile, the model was contrasted with the grey model (GM (1, 1)), the genetic algorithm optimized BP neural network (GA-BP), and the PPI model; the relative errors of GRA-SVR, GM (1, 1), GA-BP, and PPI were 0.081%, −0.823%, 1.270%, and −4.569%, respectively. The study concluded that the nonlinear and multivariate prediction model established by GRA-SVR has higher precision and operability and can be used in long-period pavement performance prediction.

1. Introduction

Big data maintenance is a central issue in highway management. By the end of 2018, the highway mileage under maintenance accounted for 97.7% of the total mileage open to traffic in China. Notably, the expressway network has shifted from the construction period to the maintenance period. With the popularity of big data technology, roads have entered the era of big data maintenance. The performance of asphalt pavement is a vital component of maintenance management and operation because sound maintenance decisions and the rational allocation of maintenance funds in the later period depend on an accurate prediction model. Therefore, the scientific establishment of the pavement performance prediction model is significant for asphalt pavement maintenance and can provide a model for big data maintenance.

The pavement management system (PMS) is applied for road life cycle management. It generally uses analytical tools and statistical methods to predict pavement performance [1]. Predicting pavement performance is critical but very complex, because the performance of asphalt pavement is affected by the combination of structural design, material properties, construction quality, traffic load, natural factors, and maintenance [2]. The pavement performance prediction model is a relationship that characterizes the variation of pavement performance with time, material, and traffic load [3].

There are different methods available for the determination of pavement performance; many scholars have attempted to develop a scientifically derived accurate model. There are four types of prediction models: uncertainty model, certainty model, dynamic model, and bionic model [4].
(i) Uncertainty model: the commonly used model is the grey theoretical model that has the characteristics of a small amount of data, high prediction accuracy, and a simple calculation method. Therefore, it is widely used in pavement performance prediction. For example, Zhang et al. [5], Shen and Du [6], Wang and Li [7], and Zhang and Ji [8] used this model to predict pavement smoothness and rutting. Peng et al. [9] applied Weibull distribution to pavement performance prediction and obtained ideal results.
(ii) Certainty model: it is an empirical method; it takes advantage of using traditional regression as a tool to fit the data that come from experiments and finite element mechanics to get the form and parameters of the model. For example, Sun and Liu [10] proposed the decay equation of asphalt pavement performance which was obtained through engineering experiment. Abed et al. [11] investigated the variability effect of thickness and stiffness of pavement layers; they used the Monte Carlo method to obtain the probability distribution function of pavement performance by using the parameters. Gong et al. [12] proposed a regularized regression method to estimate the asphalt concrete moduli with data available from the long-term pavement performance (LTPP) database.
(iii) Bionic model: this model has high prediction accuracy. Yang et al. [13] used the genetic neural network model to estimate rutting and driving quality. Bianchini and Bandini [1] proposed the neuro-fuzzy hybrid model to predict the present serviceability index (PSI). Ferreira and Lima Cavalcante [14] and Beltran and Romo [15] presented the application of artificial neural networks (ANN) in pavement performance.
(iv) Dynamic model: it is based on the traditional model. For instance, Shen et al. [16] improved the traditional grey model and proposed a dynamic grey model. Chen et al. [17] combined the US PME model with grey prediction theory and mechanical experience method, proposed a dynamic grey prediction model, and established a DGM-PME combination model to forecast the rutting. Chu and Durango-Cohen [18] used the autoregressive moving average time series state space method to predict the structural strength of the pavement. El-Badawy et al. [19] developed a comprehensive bottom-up fatigue cracking distress dynamic prediction model integrating the Mechanistic-Empirical Pavement Design Guide (MEPDG) and performance test methodology.

To date, various pavement performance prediction models have been proposed by scholars, but the models still have defects. For example, the grey model adopts only the time factor and does not take into account other factors, such as natural environment and traffic load, which may have the greatest impact. Moreover, as the forecast period increases, the stability and accuracy of its prediction decrease. The Weibull distribution model is only suitable for small sample data prediction. The certainty model is mainly determined by factors like the initial performance index of asphalt pavement and the road age. It is simple and convenient to use, but because it does not consider dynamic data, it can only predict short period performance. The genetic neural network and ANN models are prone to overfitting when data are insufficient. The dynamic prediction model can make full use of the later data to predict longer periods; however, because it is based on time series, it can only consider the impact of time on pavement performance. Hence, a new model needs to be devised for pavement performance prediction.

Recently, support vector machines have been applied in various fields. Zhao et al. [20] proposed a k-means and SVM hybrid model for the development of an electric vehicle urban driving cycle. Hoang et al. [21] used it to recognize pavement cracks. Wang et al. [22] proposed a support vector machine online model for predicting metro ridership. Karballaeezadeh et al. [23] applied this model to the prediction of road residual life and compared it with artificial neural network (ANN) and multilayer perceptron (MLP) models. The results show that the support vector machine model has the highest accuracy.

In this paper, the factors affecting the performance of asphalt pavement are first processed by GRA. The SVR, with its advantages of structural risk minimization and strong generalization performance, is then used to establish a hyperplane as a decision surface. Finally, the asphalt pavement performance prediction model is established to provide a model that can be applied to maintenance decision-making, maintenance fund investment, and big data pavement maintenance.

The structure of this paper is as follows. Section 2 mainly introduces the main principles of GRA-SVR. Section 3 contains the modeling process of the whole model. Section 4 mainly uses the model to verify the example. Finally, the results are analyzed.

2. Methodology

2.1. Basic Principles of GRA-SVR
2.1.1. Basic Principle of Grey Relation Analysis

The grey system theory holds that the complex objective systems which are all ordered and discrete data must contain inherent laws [24]. There are many factors affecting the performance of asphalt pavement, but the effects of various factors are not very clear, so that we can call the factors grey. Therefore, GRA is used to quantitatively reflect the correlation between asphalt pavement performance and various factors. This method can find the main factors from many factors that affect pavement performance. The corresponding statistical data of the influencing factors in the system are converted into geometric curves by the method, and the closer the curve geometry is to the dependent variable, the greater the degree of association is [25].

2.1.2. Basic Principle of Support Vector Machine Regression

SVR is a model derived from the support vector machine (SVM) proposed by Vapnik [26]. The SVM model is a machine learning method that mainly solves the classification problems of small samples, nonlinearities, and high-dimensional data [27, 28]. Its principle is based on the VC theory of statistical learning and structural risk minimization, and the optimal solution in data mining is sought by establishing an optimal hyperplane [29]. Usually, we reduce the dimension of the sample to simplify the problem, while the SVM method does the opposite. It uses the kernel function to map the sample points to a high-dimensional or even infinite-dimensional space so that the problem can be handled as a linear one, as shown in Figure 1.

Regression is essentially similar to classification. The SVM classification model seeks a plane such that the support vectors of the two classification sets, or all the data, are farthest from the classification plane, whereas the SVR model seeks a regression plane such that all data of a collection are closest to the plane, as shown in Figure 2. The SVR predicts the output for the test data by establishing a nonlinear relationship between the training data and the support vectors. Most of the influencing factors of asphalt pavement performance act nonlinearly. The specific method is as follows.

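Assume the sample set is $\{(x_i, y_i)\}$, $i = 1, 2, \ldots, n$, with $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. Then, y and x in the sample set can be expressed as follows [2]:

$$y = f(x) = \omega \cdot x + b, \quad (1)$$

where $\omega$ and b are the coefficients of the hyperplane.

If the original data fit well with the support vector machine regression, then the fitting error satisfies the following [2]:

$$|y_i - (\omega \cdot x_i + b)| \le \varepsilon, \quad i = 1, 2, \ldots, n, \quad (2)$$

where ε is a positive number.

Equation (1) is transformed into (3) by introducing Lagrange multipliers [2]:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})(x_i \cdot x) + b, \quad (3)$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrange multipliers of the samples. They take a value of zero in most cases; the samples with nonzero multipliers are the support vectors.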

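The above process is the linear regression principle of SVR, but the effects of factors including rainfall, traffic volume, maximum temperature, and minimum temperature on the pavement performance are nonlinear. When dealing with the nonlinear problem of the SVR, the sample $x_i$ is mapped to a high-dimensional space by $\varphi(x)$, and an optimal hyperplane is constructed in that space. To avoid the "curse of dimensionality", the inner product operation is implemented using the original spatial variables even when $\varphi(x)$ is unknown. The kernel function $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ can be obtained when it satisfies the condition of Mercer [30]. At the same time, the Lagrange transformation is introduced to get equation (4) [31]:

$$\max_{\alpha, \alpha^{*}} \; -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*}) K(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^{*}), \quad (4)$$

subject to $\sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) = 0$ and $0 \le \alpha_i, \alpha_i^{*} \le c$, where c is the penalty parameter.

Finally, the transformed regression function [31] is as follows:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x_i, x) + b. \quad (5)$$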
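As an illustration only, the kernel form of the regression function, f(x) = Σ (α_i − α_i*) K(x_i, x) + b, can be sketched in a few lines of Python. The function names and all numbers below are hypothetical toy values, not fitted results from this study. With the plain inner product as the kernel, the sketch reduces to the linear form ω · x + b with ω = Σ (α_i − α_i*) x_i.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svr_predict(
    x: Vector,
    support_x: Sequence[Vector],
    dual_coef: Sequence[float],
    b: float,
    kernel: Callable[[Vector, Vector], float] = dot,
) -> float:
    """Evaluate f(x) = sum_i dual_coef[i] * K(x_i, x) + b.

    dual_coef[i] stores the difference alpha_i - alpha_i* of sample i.
    """
    return sum(c * kernel(xi, x) for c, xi in zip(dual_coef, support_x)) + b


# Toy support vectors and multipliers (made up for illustration).
support_x: List[List[float]] = [[1.0, 0.0], [0.0, 2.0]]
dual_coef = [0.5, -0.25]
b = 1.0
x_new = [2.0, 4.0]

# Dual (kernel) form with the linear kernel.
f_dual = svr_predict(x_new, support_x, dual_coef, b)

# Equivalent primal form: omega = sum_i (alpha_i - alpha_i*) x_i.
omega = [sum(c * xi[j] for c, xi in zip(dual_coef, support_x)) for j in range(2)]
f_primal = dot(omega, x_new) + b
```

Passing a nonlinear kernel as the `kernel` argument changes only the similarity measure; the structure of the sum stays the same.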

This method can avoid overfitting caused by traditional methods. SVR nonlinear regression fitting could control the fitting process by increasing the dimension. The high generalization performance that is closely related to the choice of kernel function is a big advantage of SVR.

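Commonly used kernel functions are listed as follows [32]:
(1) Linear kernel function: $K(x_i, x_j) = x_i \cdot x_j$;
(2) Polynomial kernel function: $K(x_i, x_j) = (\mu \, x_i \cdot x_j + r)^{p}$;
(3) RBF kernel function: $K(x_i, x_j) = \exp(-\mu \|x_i - x_j\|^{2})$;
(4) Sigmoid kernel function: $K(x_i, x_j) = \tanh(\mu \, x_i \cdot x_j + r)$.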

μ, r, and p are parameters of the kernel function.
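The four kernels can be written out directly. The sketch below assumes the parameterization commonly used by LIBSVM, with μ as the scale, r as the offset, and p as the polynomial degree; that mapping of the symbols is an assumption made here for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Vector, v: Vector) -> float:
    return dot(u, v)


def polynomial_kernel(u: Vector, v: Vector, mu: float, r: float, p: int) -> float:
    return (mu * dot(u, v) + r) ** p


def rbf_kernel(u: Vector, v: Vector, mu: float) -> float:
    # Squared Euclidean distance between the two samples.
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-mu * sq_dist)


def sigmoid_kernel(u: Vector, v: Vector, mu: float, r: float) -> float:
    return math.tanh(mu * dot(u, v) + r)


u, v = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v, mu=1.0, r=1.0, p=2)
k_rbf_same = rbf_kernel(u, u, mu=0.5)
k_rbf = rbf_kernel(u, v, mu=0.5)
```

The RBF kernel depends only on the distance between samples, so it equals 1 for identical samples and decays toward 0 as they move apart; μ controls how fast that decay is.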

However, each type of kernel function has different advantages and disadvantages:
① Linear kernel functions are used to generalize linear samples.
② Polynomial kernel functions are mostly used to process text data.
③ Although the Sigmoid kernel function has higher accuracy, it is complicated, which increases the complexity of the whole model.

Therefore, in this paper, the RBF kernel function is used for support vector machine regression prediction.

2.2. Construction of GRA-SVR Asphalt Pavement Performance Model
2.2.1. Selection of the Best Parameters

It is important to select the appropriate penalty parameter c and kernel function parameter to ensure the accuracy of the entire model when using SVR for prediction. The cross-validation (CV) method, a statistical analysis method for verifying the performance of a model, is generally adopted to solve this problem. The principle is to group the original data and divide them into verification and training sets. In this way, it is possible to effectively avoid the states of underlearning and overlearning and ultimately obtain a reliable accuracy estimate. Common CV methods are as follows:
(1) Hold-Out Method: the method randomly divides the data into two sets: one is the training set used to train the model, and the other is the verification set used to verify the model [20]. The accuracy on the verification set is the performance metric of the model.
(2) LOO-CV: assuming there are N samples in the original data, each sample is used once as an independent verification set and the remaining N − 1 samples form the training set; thus, N models are obtained, which is why the method is also called N-CV. The average accuracy over the N verification sets is used as the performance indicator for the model. However, due to the high computational cost, the method is difficult to apply in practice.
(3) K-CV: the original data are equally divided into K groups. The data of each group are used as the verification set once, and the remaining data of the other K − 1 groups are used as the training set; therefore, K models are obtained. Then, the average of the accuracy calculated from the verification sets of those K models is used as the performance index of this model [33]. This method is more accurate because it can effectively avoid the states of underlearning and overlearning.
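The K-CV partitioning described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it uses contiguous, unshuffled folds, which is an assumption rather than the paper's stated procedure. Setting K equal to the number of samples gives LOO-CV.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def k_fold_splits(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, verification indices), one pair per fold."""
    folds = k_fold_indices(n_samples, k)
    for i, verification in enumerate(folds):
        training = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield training, verification


# Seven samples split into three folds; K = 7 would give leave-one-out.
splits = list(k_fold_splits(7, 3))
loo_splits = list(k_fold_splits(7, 7))
```

Each sample appears in exactly one verification set, so averaging the K verification errors uses every observation once for testing.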

After comparing the three methods, the K-CV model is finally adopted to cross-validate and select the best penalty parameter c and kernel function parameter. The specific method is as follows. Firstly, the penalty parameter c and the kernel function parameter are limited to a specific range, and then the K-CV model is applied to the training set over that range to obtain the accuracy. Finally, the pair of parameters that gives the training set the highest accuracy is selected as the optimal parameters. The procedure can be implemented using the LIBSVM 3.20 tool.
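The search over the two parameters can be sketched as a grid search on base-2 exponents. In the sketch below, `cv_error` stands in for a routine that trains an SVR on the training folds and returns the K-CV error; the name `g` for the kernel function parameter, the exponent range, and the toy error surface are all assumptions made for illustration.

```python
import math
from typing import Callable, Optional, Sequence, Tuple


def grid_search_log2(
    cv_error: Callable[[float, float], float],
    log2_c_values: Sequence[int],
    log2_g_values: Sequence[int],
) -> Tuple[float, float, float]:
    """Return (lowest error, best c, best g) over a grid of powers of two."""
    best: Optional[Tuple[float, float, float]] = None
    for log2_c in log2_c_values:
        for log2_g in log2_g_values:
            c, g = 2.0 ** log2_c, 2.0 ** log2_g
            err = cv_error(c, g)
            if best is None or err < best[0]:
                best = (err, c, g)
    if best is None:
        raise ValueError("the parameter grid is empty")
    return best


def toy_cv_error(c: float, g: float) -> float:
    # Made-up smooth surface with its minimum at log2(c) = 6, log2(g) = -8.
    # A real implementation would run K-CV with an SVR here.
    return (math.log2(c) - 6.0) ** 2 + (math.log2(g) + 8.0) ** 2


err, c_best, g_best = grid_search_log2(toy_cv_error, range(-10, 11), range(-10, 11))
```

A coarse pass with integer exponents can be followed by a second pass over a narrower range with a finer step around the best pair.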

2.2.2. Construction of Asphalt Pavement Performance Model

The pavement performance is affected by many factors. The factors, acting on performance, are uncertain and nonlinear. Hence, the performance and factors integrate a grey system. Therefore, the grey correlation analysis can be used as an attribute processor to select several important influencing factors, and then the SVR is used to perform the regression prediction. Through the establishment of the comprehensive model GRA-SVR to predict the trend of pavement performance under the influence of various factors, the specific modeling process is shown in Figure 3.

Specific steps are as follows:
(1) Select dependent and independent variables.
(2) Establish a raw data matrix: the reference sequence $X_0 = (x_0(1), x_0(2), \ldots, x_0(n))$ and the factor sequences $X_i = (x_i(1), x_i(2), \ldots, x_i(n))$, $i = 1, 2, \ldots, m$, where $x_i(k)$ represents the $k$th observation of the $i$th influencing factor.
(3) Data normalization.
(4) Calculating the difference sequence [34] is as follows:

$$\Delta_i(k) = |x_0(k) - x_i(k)|, \quad k = 1, 2, \ldots, n. \quad (6)$$

(5) Achieving the largest and smallest difference of the sequence [34] is as equation (7). Write the maximum value as M and the minimum value as N:

$$M = \max_i \max_k \Delta_i(k), \quad N = \min_i \min_k \Delta_i(k). \quad (7)$$

(6) Calculating the correlation coefficient of each sample [35] is as follows:

$$\gamma_{0i}(k) = \frac{N + \xi M}{\Delta_i(k) + \xi M}, \quad (8)$$

where ξ is called the resolution coefficient. When ξ ≤ 0.5463, the resolution is the best. Usually, the value of ξ is 0.5, which is also taken in this paper.
(7) Calculating the correlation between each influencing factor and the system [35] is as follows:

$$\gamma_{0i} = \frac{1}{n} \sum_{k=1}^{n} \gamma_{0i}(k). \quad (9)$$

(8) Choose the factors that have a greater influence on pavement performance.
(9) To improve the accuracy and training speed of the model and to prevent large values from swamping small ones during the calculation, the data should be normalized to the interval [0, 1].
(10) The RBF kernel has been shown to give high precision [36, 37], and this paper selects the RBF kernel function to predict the performance.
(11) The K-CV model is used to cross-validate and select the best penalty parameter c and kernel function parameter.
(12) Using the optimal parameters for SVR fitting, the prediction data are obtained.
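Steps (2) to (7) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: min-max scaling is assumed for the normalization step, which the text does not specify, and the data are made-up toy sequences.

```python
from typing import List, Sequence


def min_max(seq: Sequence[float]) -> List[float]:
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_degrees(
    reference: Sequence[float],
    factors: Sequence[Sequence[float]],
    xi: float = 0.5,
) -> List[float]:
    """Grey relational degree of each factor sequence to the reference sequence."""
    x0 = min_max(reference)
    normalized = [min_max(f) for f in factors]
    # Difference sequences between the reference and each factor.
    deltas = [[abs(a - b) for a, b in zip(x0, xf)] for xf in normalized]
    # Largest (M) and smallest (N) differences over all factors and periods.
    big_m = max(max(row) for row in deltas)
    small_n = min(min(row) for row in deltas)
    if big_m == 0.0:
        return [1.0 for _ in factors]
    degrees = []
    for row in deltas:
        # Correlation coefficient per period, then its mean as the degree.
        coeffs = [(small_n + xi * big_m) / (d + xi * big_m) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Toy data: the first factor follows the reference exactly, the second opposes it.
reference = [1.0, 2.0, 3.0, 4.0]
factors = [[10.0, 20.0, 30.0, 40.0], [4.0, 3.0, 2.0, 1.0]]
degrees = grey_relational_degrees(reference, factors)
```

Factors would then be ranked by their degree, and those above a chosen threshold retained as SVR inputs.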

3. Case Verification

3.1. Data Acquisition

This paper is based on the highway from Guangzhou to Yunfu (Guangyun highway), where a weather station was installed in 2010 to collect climate data including road temperature, humidity, wind speed, and solar radiation. The installation details and pavement structure are shown in Figures 4 and 5. The pavement temperature is measured with the ZDR-41 temperature sensor, and the subgrade temperature and humidity are measured with a 5TE sensor (see Figure 6). The climate of Guangdong province is humid and the temperature is extremely high, rising to 41°C. Under the influence of large traffic volume, the rutting is serious, as shown in Figure 7. The RDI prediction models GRA-SVR, PPI, GA-BP, and GM (1, 1) were established from the RDI, maintenance funds, traffic volume, and data collected by the weather station from 2011 to 2018 (see Table 1 for the survey results), and the accuracy of each model was analyzed.

Pavement structure and materials should also be considered as factors in performance prediction. Usually, the pavement structure needs to be expressed as a numerical value. To address this issue, the structural number [12, 38–40] is usually adopted. However, it needs to be calculated in two cases as follows:
(i) Different structures: in this case, the thickness and material of each layer of the road are different. The structural number (SN) [41] is adopted according to the AASHTO guide for design of pavement structures. The road network level performance prediction can apply this case. The specific calculation method is as follows:

$$SN = \sum_{i=1}^{n} a_i D_i m_i, \quad (10)$$

where $a_i$ is the $i$th layer coefficient, which needs to be obtained through experiments, $D_i$ is the $i$th layer thickness, and $m_i$ is the $i$th layer drainage coefficient.
(ii) Same structure: the performance of the pavement material can be affected by the environment, and the structural bearing capacity is changed. The pavement structural bearing capacity can be expressed by the pavement structure strength ratio (SSR) [42]. The specific calculation method is

$$SSR = \frac{l_d}{l_0}, \quad (11)$$

where $l_d$ is the pavement deflection standard value (0.01 mm) and $l_0$ is the measured representative deflection of the pavement (0.01 mm), which needs to be obtained with a multifunction vehicle.
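The two structural indicators are simple to compute. The sketch below takes SN as the sum of layer coefficient times thickness times drainage coefficient, and reads SSR as the ratio of the standard deflection to the measured representative deflection; that reading of the ratio, the function names, and all input values are assumptions for illustration.

```python
from typing import Iterable, Tuple


def structural_number(layers: Iterable[Tuple[float, float, float]]) -> float:
    """Sum a_i * D_i * m_i over layers given as (a_i, D_i, m_i) tuples."""
    return sum(a * d * m for a, d, m in layers)


def structure_strength_ratio(standard_deflection: float, measured_deflection: float) -> float:
    """Ratio of the standard deflection to the measured representative deflection.

    Both deflections must be in the same unit (for example 0.01 mm).
    """
    if measured_deflection <= 0.0:
        raise ValueError("measured deflection must be positive")
    return standard_deflection / measured_deflection


# Hypothetical two-layer pavement: (layer coefficient, thickness, drainage coefficient).
sn = structural_number([(0.44, 4.0, 1.0), (0.14, 8.0, 1.0)])

# Hypothetical deflections in units of 0.01 mm.
ssr = structure_strength_ratio(20.0, 25.0)
```

Under this reading, a value of SSR below 1 means the measured deflection exceeds the standard value, that is, the bearing capacity has weakened.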

The project studied in this paper has only one pavement structure, so the SSR is calculated to represent the influence of pavement structure on pavement performance.

3.2. Grey Relation Analysis

The correlation of the data in Table 1 is analyzed by GRA, and the correlation degree of each influencing factor is obtained, as shown in Table 2.

The effects of various factors on rutting are sorted as follows:

Generally, the greater the degree of relevance, the more closely the factor follows the main direction of system development, that is, the greater the influence of this factor on the evaluation index. When γ > 0.8, the factor is well correlated; when γ = 0.6∼0.8, the correlation is good. We can see that γ of these 18 factors is greater than 0.6, indicating that these factors have an impact on the rutting. Among the 19 factors, γ of 12 factors is greater than 0.8, indicating that these 12 factors have a strong influence on the formation of rutting.

Therefore, the most strongly correlated factors were selected to establish the model, and the other factors were removed. The selected results are as follows:
Equivalent single axle loads > maintenance funds > pavement structure strength ratio > mean value of soil moisture > highest temperature in the middle surface > highest temperature in the road surface > annual cumulative total radiation > annual average rainfall > lowest temperature in middle surface > highest temperature in the upper surface > lowest temperature of upper surface > highest temperature in lower surface.

The following can be observed from the above analysis:
(1) The primary factor in the formation of rutting is the equivalent single axle loads. The greater the equivalent single axle loads are, the more serious the rutting is. The reason is that, under the action of traffic load, large shear stress will be generated in the asphalt pavement, which will cause irreversible cumulative deformation in the surface layer.
(2) The maintenance funds have a significant repairing effect on the rutting. For example, in this section of the highway, the maintenance funds were RMB 81,500 in 2013. The traffic volume and rainfall increased, but the rutting disease was significantly improved in 2014.
(3) The degree of relevance of SSR is 0.9301. It shows that SSR has a greater impact on the rutting. The specific reason is that water, solar radiation, and temperature have an impact on the pavement material, and the structural bearing capacity becomes insufficient, resulting in the occurrence of rutting.
(4) The annual cumulative radiation ages the asphalt and accelerates the formation of the rutting. After the aging of the asphalt, the overall shear resistance of the asphalt surface layer is reduced, resulting in a decrease in the rutting resistance. For example, the annual cumulative radiation was the largest in 2015, and the rutting in 2016 was more serious.
(5) The maximum shear stress generally occurs in the middle surface, and the rainfall and wind speed accelerate the heat dissipation of the highest temperature of the environment and road surface. Based on the above factors, the influence of the highest temperature of the middle layer on the formation of the rutting is greater than that of the highest temperature of the road surface and the upper layer.
(6) Under the action of traffic load, the water infiltrated into the asphalt surface layer from soil and rainfall will become high-pressure water, which will reduce the bond between asphalt and aggregate, resulting in lower pavement strength and lower resistance to rutting.
(7) The lowest temperature of the road surface would cause other diseases on the asphalt pavement, which indirectly lead to the occurrence of rutting.

The dimensionally reduced data are normalized by software, and the processing results are shown in Table 3.

3.3. Penalty Parameter Selection

In this paper, the best penalty parameter c and kernel function parameter are selected by the K-CV cross-validation model (see Figure 8). The abscissa indicates the base 2 logarithm of c, and the ordinate indicates the base 2 logarithm of the kernel function parameter. Contour lines indicate the errors over the searched range of the two parameters. The pair of parameters with the smallest error is the best. First, a coarse search is carried out over initial ranges of c and the kernel function parameter. The smallest error is 0.0572, at which the penalty parameter is c = 64.0 and the kernel function parameter is 0.0039.

After the coarse search, the ranges of c and the kernel function parameter are narrowed (see Figure 9). At the same time, the interval between the contour lines in the contour and three-dimensional views is reduced. The smallest error is then 0.0605, at which the penalty parameter is c = 4.0 and the kernel function parameter is 0.0884.

4. Results and Discussion

The GRA-SVR, GM (1, 1) [43], GA-BP [44], and PPI models were applied and compared in predicting the RDI of 2018, based on the training set consisting of the various factors and RDI from 2011 to 2017. The PPI model [10] is as follows:

$$PPI = PPI_0 \left\{ 1 - \exp\left[ -\left( \frac{\alpha}{y} \right)^{\beta} \right] \right\}, \quad (12)$$

where PPI is the performance index; PPI0 is the initial performance index; y is the road age; α and β are model parameters. In this paper, PPI0 = 94; y = 8; α = 13.2; β = 1.409.
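For illustration, the PPI model can be evaluated with the parameters quoted above, assuming the decay equation takes the form PPI = PPI0 {1 − exp[−(α/y)^β]}. The function name is hypothetical, and the form of the equation is an assumption of this sketch.

```python
import math


def ppi_decay(ppi0: float, age: float, alpha: float, beta: float) -> float:
    """Performance index at a given road age under the assumed decay equation."""
    if age <= 0.0:
        raise ValueError("road age must be positive")
    return ppi0 * (1.0 - math.exp(-((alpha / age) ** beta)))


# Parameters quoted in the text: PPI0 = 94, y = 8, alpha = 13.2, beta = 1.409.
ppi_year_8 = ppi_decay(94.0, 8.0, 13.2, 1.409)

# The index stays close to PPI0 for a young pavement and decays with age.
ppi_year_2 = ppi_decay(94.0, 2.0, 13.2, 1.409)
```

Because the only time-varying input is the road age, the curve is fixed once PPI0, α, and β are chosen, which is why the model cannot react to later changes in traffic or climate.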

The comparative analysis of the predicted and actual values of different models is shown in Table 4, the accuracy comparison is shown in Table 5, and the corresponding variation trend and actual value of different models are shown in Figures 10 and 11.

The evaluation parameters of the four models obtained from Table 5 in predicting RDI are as follows:
Correlation coefficient: GM (1, 1) (0.856) < PPI (0.879) < GA-BP (0.984) < GRA-SVR (0.992)
RMSE: GA-BP (0.298) < GRA-SVR (0.499) < GM (1, 1) (1.304) < PPI (3.270)
Relative error: GRA-SVR (0.081) < GM (1, 1) (0.823) < GA-BP (1.270) < PPI (4.569)
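The three evaluation parameters can be computed as sketched below. The sign convention for the relative error, (predicted − actual) / actual, is an assumption, and the numbers are made-up toy values rather than the data of Tables 4 and 5.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def relative_error_percent(actual: float, predicted: float) -> float:
    """Signed relative error in percent; negative means underprediction."""
    return (predicted - actual) / actual * 100.0


def correlation_coefficient(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy actual and predicted index values (hypothetical).
actual = [90.0, 88.0, 86.0]
predicted = [91.0, 88.0, 85.0]

toy_rmse = rmse(actual, predicted)
toy_r = correlation_coefficient(actual, predicted)
toy_rel = relative_error_percent(100.0, 99.0)
```

The correlation coefficient measures how well the trend is followed, while RMSE and relative error measure the size of the deviation, so a model can rank differently on each.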

The GRA-SVR and GA-BP models both showed good performance in terms of the overall correlation and deviation of the predicted value from the true value. However, with respect to relative error in 2018, GRA-SVR is the best, followed by GM (1, 1). Figure 11 shows the relative errors of the predicted and true values for the four models from 2011 to 2018. It can be observed that the relative error of the GA-BP model is the smallest from 2011 to 2015 but is higher than that of GRA-SVR in 2016 and higher than that of GM (1, 1) in 2018. This is because the GA-BP model is prone to overfitting for small samples, resulting in reduced prediction accuracy.

The trends of the predicted and actual RDI values from the different models are depicted in Figure 10(a). It can be seen that the GRA-SVR and GA-BP models display nonlinear trends, which are close to the actual value. The other two models show a linear relationship, which is different from the actual value.

All four models have good accuracy in short period prediction (see Figure 10(b)), but the accuracy changes as the prediction period increases (see Figure 10(c)). The GRA-SVR model has the highest prediction accuracy because the old data are replaced by the new prediction data to form the new training set. The GA-BP model takes second place. The GM (1, 1) model is third: it uses only the 7 years of data, and its accuracy decreases over time because new data are not replenished. The PPI model has the worst prediction accuracy because it only uses the first-year data for prediction. As the prediction period increases, the controllability of the model decreases. To further verify the accuracy of the GRA-SVR model, it was also applied to predict the pavement surface condition index (PCI) and the pavement skidding resistance index (SRI). The relative errors were −0.115% and 0.111%, respectively.

In the modeling process of the GRA-SVR and GA-BP models, more of the important factors that affect the formation of rutting are considered, so the modeling process is more complex than that of the other two models, but the prediction results are stable. The PPI model considers only the age and regional conditions, and the main factors affecting the pavement performance are not utilized; therefore, its prediction accuracy is lower. The GM (1, 1) model considers only the time factor, and its prediction accuracy depends greatly on the accuracy of the annual data. If the data of a certain year deviate, the whole system trend will have a large error; the ease of operation of this model lies between that of the other models. Therefore, the GRA-SVR model is suitable for multivariate, long-period, and nonlinear prediction of pavement performance.

The accuracy, prediction period, and operability of the three models are compared and analyzed. The results are shown in Table 6.

Overall, our study establishes a model that offers better performance than the other models. However, there are also limitations. In future work, we want to choose the best parameters with better methods, including the genetic algorithm and particle swarm optimization. These algorithms are also widely used in other fields. If we find a better optimization method, we can make the prediction accuracy higher. We will also build a database with more road information. Then, the GRA-SVR model at the computing terminal will be used to predict the performance, and a decision model will be applied to the maintenance decision. Finally, the results will be uploaded to the pavement management system (see Figure 12). We firmly believe that this will have far-reaching implications for road maintenance projects.

5. Conclusion

In this study, a GRA-SVR predictive hybrid model, combining the grey correlation analysis with support vector machine regression, was proposed for the first time to predict the performance of asphalt pavement. The main conclusions are drawn as follows:
(1) The main factors, including equivalent single axle loads, maintenance funds, highest temperature in the middle surface, pavement structure strength ratio, average value of soil moisture, highest temperature in the road surface, lowest temperature in the road surface, highest temperature in the upper surface, annual average rainfall, annual cumulative total radiation, lowest temperature of upper surface, highest temperature in lower surface, lowest temperature in lower surface, and annual maximum wind speed, are well correlated with pavement performance.
(2) Compared with other models, the GRA-SVR model is highly accurate and time-independent, which makes it suitable for short and long period predictions.

In conclusion, the GRA-SVR model is applicable to multivariate, long-period, and nonlinear prediction of pavement performance and is restricted by the amount of data. It is reliable for asphalt pavement maintenance decision-making. At the same time, this model can also be applied to big data road maintenance prediction.

Data Availability

This work is part of a project of the Guangdong Provincial Department of Transportation (2015-02-011), and the data come from the project team's experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by Guangdong Provincial Communication Department, Science and Technology Project (Grant no. 2015-02-011). The authors’ special thanks go to all the subjects that participated in the data acquisition.