Abstract
To address the large errors that uncertain factors introduce when fitting the traditional multiplicative seasonal model, this study applies a Markov chain to the multiplicative seasonal model to optimize its prediction results. The residuals between the theoretical and actual values are divided into states according to the interval in which they fall. Transition probability matrices are then established, and the weighted sum of the prediction probabilities is used to select the optimal prediction state. Real azimuth data from light buoys at Meizhou Bay Port are used to verify the prediction effect of the model, and MAE, MAPE, RMSE, RRMSE, SSE, and R2 are used to measure the error between the predicted and actual values. The results show that, compared to the traditional multiplicative seasonal model and other prediction models, the MAE of the MC-SARIMA model decreases by 2.19003794, the MAPE by 0.66%, the RMSE by 2.092671823, the RRMSE by 0.006221352, and the SSE by 404.0231931, while the R2 increases by 0.224686247. The multiplicative seasonal model optimized by the Markov chain therefore predicts the azimuth data of the light buoy more effectively than the traditional multiplicative seasonal model and other prediction models.
1. Introduction
As an important time series analysis method, the multiplicative seasonal model (SARIMA) [1] establishes a specific mathematical model according to the correlation between the data and integrates the seasonal, trend, random interference, and other characteristics of the time series to predict future data. The predicted data can then be compared with the actual data, and the prediction results are generally good. Since the study of the offset azimuth of the light buoy is still at a preliminary stage, this study reviews the literature from the perspectives of SARIMA optimization models and time series prediction models.
Prediction models based on optimization of the multiplicative seasonal model are widely used in medicine, economics, meteorology, hydrology, and transportation throughout the world [2–6]. For example, Parviz [7] used a time series decomposition hybrid model based on SARIMA and applied cluster analysis to the time series. The results show that the hybrid model is a valuable decision-making tool that effectively improves precipitation forecasts. Peirano et al. [3] combined LSTM with SARIMA to predict the inflation rate of five emerging economies in Latin America. The results show that the LSTM-SARIMA model predicts inflation with high precision. To predict the incidence of cholera effectively, Daisy et al. [8] proposed an SVM-SARIMA model that incorporates two main influencing factors, rainfall and maximum temperature, to evaluate their relationship with the incidence of cholera. The results show that the model can effectively predict the incidence of cholera and support better preparation for future public health management. Based on the demographic and distribution characteristics of AHC, Liu et al. [9] constructed an ETS-SARIMA model to predict the trend of the incidence of AHC in mainland China. The results showed that AHC mainly occurred among farmers and children under 9 years old in southern and eastern China, and the prediction error was low. Wang and Zhou [10] proposed a new runoff prediction method that combines FPCA and time series analysis. The results show that, compared to traditional one-dimensional FPCA and SARIMA, the combination of FPCA and time series analysis is more suitable for runoff prediction. Berke et al. [11] used the ANN-SARIMA model to predict the monthly incidence of cryptosporidiosis and used MAPE and RMSE to check the error. The results show that the ANN-SARIMA model can effectively predict the public health time series of the monitoring system. To reduce the dependence on fossil fuels and optimize the integration of renewable resources, Blazquez-Garcia et al. [1] proposed a SARIMA model to predict, over different short periods, the energy consumption of green elevators integrated with photovoltaics and batteries, and used a genetic algorithm to optimize the combination of the selected SARIMA parameters. The results show that the model performs well in short-term prediction. Qian et al. [6] proposed a SARIMA-GARCH model based on the periodicity and trend of monthly passenger traffic time series. This model can effectively improve the characterization accuracy and eliminate the influence of heteroscedasticity, which was checked with the ARCH test. The results show that the short-term prediction performance of the SARIMA-GARCH model is better than that of the SARIMA model.
The improved methods above raise the prediction accuracy of SARIMA to a certain extent and take into account different special circumstances of time series, but they also have shortcomings. For example, ANN, as a widely used algorithm, can approximate nonlinear relationships with complex structures and adopts parallel distributed processing. However, its cost in human and material resources is high, since controlling the details of the algorithm takes a lot of time, and its well-known "black box" problem remains unsolved. The SVM model offers nonlinearity, high precision, and good generalizability, but it is difficult to train on time series with large amounts of data, so it requires a lot of machine memory and computation time, and SVM also has some difficulty with classification problems. The ETS algorithm is flexible, widely applicable, and can obtain the required prediction results from relatively little training data. However, because ETS gives a large weight to the short term and a small weight to the long term, it cannot predict a time series far into the future. Moreover, the selection of its smoothing index involves a degree of randomness that does not withstand scrutiny.
Prediction models based on time series have also been studied widely [12]. A PCA prediction method for sports event model evaluation has been proposed to solve the problems of poor average fitness and low risk prediction accuracy in traditional evaluation and prediction methods for sports event models. The results show that the method has good average fitness. Tian [13] proposed a new prediction method for short-term wind speed based on local mean decomposition (LMD) and a combined kernel function least squares support vector machine (LSSVM), and the results show that the proposed method has higher prediction accuracy and correctly reflects the laws of wind speed. Li et al. [14] constructed a WebGIS scheme for a forest information system, based on an open-source network geographic information system oriented to the spatial information field and supporting OGC, and used the grey model to predict the development trend. The results showed that the method realized growth in both forest area and forest volume, and the forest coverage rate reached a new level. Liu et al. [15] used data from TCGA to create multigene features and evaluated the predictive significance for survival of each lncRNA related to cell pyroptosis. Tian et al. [16] proposed a prediction approach for short-term wind speed using ensemble empirical mode decomposition-permutation entropy and a regularized extreme learning machine, and the results show that the approach has higher reliability under the same confidence level. Zeng et al. [17] proposed a time series prediction method based on pattern analysis and used the probability relaxation method to classify the probability vectors of the basic pattern. Inspired by the kernel method and SVR, Yuan et al. [18] proposed a kernel-HFCM model based on kernel mapping and HFCM to predict time series. Sebastian et al. [19] proposed a fractal interpolation method that can generate finer-grained time series from insufficient data sets. Her et al. [20] applied the grey theory model GM (1, 1) to the prediction of the price of Panama-type two-wheeled ships, to deal with bulk carriers with different periods and different sample sizes. Mei et al. [21] used the BP-ANN model to predict COD removal efficiency and TEC and used particle swarm optimization to optimize the weights and thresholds of the BP-ANN, thereby improving prediction accuracy. Duan et al. [22] used a Bayesian model to predict the time and location of suspicious criminals and thus the traces of their social activities.
The Markov chain (MC) [23] refers to a stochastic process in mathematical statistics and probability theory that has the Markov property and is defined on a discrete state space and a discrete index set. When the index set is continuous, it is also called a Markov process. As an important method for studying the state transition law of things, the Markov process can determine the most likely state of development according to the initial probability, the state distribution of the research object, and the transition probability matrix, and make the corresponding prediction.
Therefore, based on the analysis of the latest research on the optimization of the SARIMA model, a multiplicative seasonal model based on Markov chain optimization is proposed in this study. Based on the prediction of the multiplicative seasonal model, the prediction results of the model are optimized to further improve the prediction accuracy of the model.
The angle between the drift position of the sediment marine light buoy and the position of the sediment in Meizhou Bay Port is used as data to verify the model. The results show that the prediction error of the multiplicative seasonal model optimized by the Markov chain is lower than that of the traditional multiplicative seasonal model, so the optimized model can serve as a scientific basis for prediction.
2. Multiplicative Seasonal Model
The multiplicative seasonal model is the product of the differenced autoregressive moving average model and the random seasonal model. It is an important method for processing time series with seasonality and trend. The general form is SARIMA (p, d, q) × (P, D, Q) S, where the parameters p, d, q, P, D, Q, and S denote the nonseasonal autoregressive order, nonseasonal difference order, nonseasonal moving average order, seasonal autoregressive order, seasonal difference order, seasonal moving average order, and seasonal period, respectively. The general form is as follows:
$$\phi(B)\,\Phi(B^{S})\,\nabla^{d}\nabla_{S}^{D}x_{t}=\theta(B)\,\Theta(B^{S})\,\varepsilon_{t},$$
where $\nabla^{d}=(1-B)^{d}$ and $\nabla_{S}^{D}=(1-B^{S})^{D}$.
2.1. ARIMA
The autoregressive integrated moving average model (ARIMA) [24] is a time series prediction method proposed by Box and Jenkins. It is mainly used to study time series with periodicity, trend, and seasonality. ARIMA (p, d, q) combines the autoregressive model AR (p) and the moving average model MA (q); their combined general form is shown in the following formula:
$$\phi(B)x_{t}=\theta(B)\varepsilon_{t},\qquad \phi(B)=1-\phi_{1}B-\cdots-\phi_{p}B^{p},\qquad \theta(B)=1-\theta_{1}B-\cdots-\theta_{q}B^{q},$$
where $B$ is the delay operator, $\varepsilon_{t}$ is a zero-mean white noise sequence, $x_{t}$ is the time series, $\phi(B)$ is the autoregressive coefficient polynomial, $\theta(B)$ is the moving average coefficient polynomial, and $\phi_{i}$ and $\theta_{j}$ are the corresponding coefficients.
After AR (p) and MA (q) are combined, d-order differencing of the trend part yields the ARIMA (p, d, q) model, whose general form is shown in the following formula:
$$\phi(B)\nabla^{d}x_{t}=\theta(B)\varepsilon_{t},$$
where $\nabla=1-B$ is the difference operator, $\nabla^{d}$ is the trend difference, and $\nabla^{d}=(1-B)^{d}$.
2.2. Random Seasonal Model
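The random seasonal model [25] describes a time series with only periodicity and seasonality. It is obtained by combining the seasonal autoregressive model and the seasonal moving average model through seasonal periodic differencing. The general form of the combined AR (P) and MA (Q) models is shown in the following equation:
$$\Phi(B^{S})x_{t}=\Theta(B^{S})\varepsilon_{t},\qquad \Phi(B^{S})=1-\Phi_{1}B^{S}-\cdots-\Phi_{P}B^{PS},\qquad \Theta(B^{S})=1-\Theta_{1}B^{S}-\cdots-\Theta_{Q}B^{QS},$$
where $\Phi(B^{S})$ is the seasonal autoregressive coefficient polynomial, $\Theta(B^{S})$ is the seasonal moving average coefficient polynomial, and $\Phi_{i}$ and $\Theta_{j}$ are the corresponding coefficients.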
After AR (P) and MA (Q) are combined, differencing with period S and order D is applied to the periodic and seasonal parts, and the general expression of the resulting ARIMA (P, D, Q) model is shown in the following equation:
$$\Phi(B^{S})\nabla_{S}^{D}x_{t}=\Theta(B^{S})\varepsilon_{t},$$
where $\nabla_{S}=1-B^{S}$ is the periodic difference and $\nabla_{S}^{D}=(1-B^{S})^{D}$ is the seasonal difference.
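The two difference operators used by the model, $(1-B)^{d}$ for the trend and $(1-B^{S})^{D}$ for the season, can be illustrated with a minimal sketch. The function names `difference` and `sarima_differences` are hypothetical and are not taken from the study; the sketch only shows the differencing step, not model fitting.

```python
def difference(series, lag=1, order=1):
    """Apply the operator (1 - B^lag)^order to a list of numbers."""
    out = list(series)
    for _ in range(order):
        # Each pass subtracts the value observed `lag` steps earlier.
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def sarima_differences(series, d=1, D=1, S=24):
    """Trend differencing (1 - B)^d followed by seasonal differencing (1 - B^S)^D."""
    trend_removed = difference(series, lag=1, order=d)
    return difference(trend_removed, lag=S, order=D)


if __name__ == "__main__":
    # Toy series: linear trend plus a pattern that repeats every 3 steps.
    toy = [2 * t + [0, 5, 1][t % 3] for t in range(12)]
    print(sarima_differences(toy, d=1, D=1, S=3))
```

With d = 1, D = 1, and a seasonal period equal to the true period of the toy series, both the trend and the repeating pattern are removed, which is the behavior the model relies on before the AR and MA parts are fitted.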
3. Markov Chain
In a Markov process, a system is divided into several states, and each state can transfer to another state according to its inherent transition probability. A Markov chain thus represents the process of moving from one state to another, where the probability of each move depends only on the current state and the state that follows it [26]. Its basic definition is as follows: let $\{X_{n}, n \geq 0\}$ be a sequence of random variables taking values in a discrete state space I. If, for any time n and any states $i_{0}, i_{1}, \ldots, i_{n-1}, i, j \in I$,
$$P(X_{n+1}=j \mid X_{0}=i_{0}, X_{1}=i_{1}, \ldots, X_{n-1}=i_{n-1}, X_{n}=i)=P(X_{n+1}=j \mid X_{n}=i),$$
then the sequence $\{X_{n}\}$ is called a Markov chain.
3.1. Transition Probability Matrix
If $\{X_{n}, n \geq 0\}$ is a Markov chain, then $p_{ij}(n)=P(X_{n+1}=j \mid X_{n}=i)$ is the one-step transition probability at time n. If the one-step transition probability is independent of n, the Markov chain is called a homogeneous Markov chain, and the one-step transition probability matrix takes the following form:
$$P=(p_{ij})_{i,j \in I},\qquad p_{ij} \geq 0,\qquad \sum_{j \in I}p_{ij}=1.$$
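As an illustration of how such a matrix can be estimated from data, the following minimal sketch counts the observed transitions in a state sequence and normalizes each row. The function name `transition_matrix` is hypothetical, states are assumed to be coded 0, 1, 2, and so on, and estimating the k-step matrix by counting pairs that are k steps apart is an assumption of this sketch (an alternative is to raise the one-step matrix to the k-th power).

```python
def transition_matrix(states, n_states, step=1):
    """Estimate frequency and transition probability matrices from a state sequence.

    states   : list of integer state labels in 0 .. n_states - 1
    n_states : number of states
    step     : distance between the paired observations (1 = one-step matrix)
    """
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair every observation with the one `step` positions later.
    for current, later in zip(states, states[step:]):
        counts[current][later] += 1

    probabilities = []
    for row in counts:
        total = sum(row)
        # A state that never occurs as a starting point gets an all-zero row.
        probabilities.append([c / total if total else 0.0 for c in row])
    return counts, probabilities


if __name__ == "__main__":
    sequence = [0, 1, 0, 1, 1, 0]
    freq, prob = transition_matrix(sequence, n_states=2)
    print(freq)
    print(prob)
```

Each nonzero row of the returned probability matrix sums to 1, matching the row-sum property of a transition probability matrix.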
3.2. Autocorrelation Coefficient
The correlation coefficient [27] measures the degree of interaction between the time series of two different things, whereas the autocorrelation coefficient measures the degree of correlation between different time points of the same series. According to the initial time-series data, the optimal multiplicative seasonal model is first established. After the theoretical value is obtained, the actual value is subtracted from the theoretical value to obtain the residual sequence. The autocorrelation coefficient of each order of the residual sequence is then calculated, and its general form is shown in the following formula:
$$r_{k}=\frac{\sum_{i=1}^{n-k}(x_{i}-\bar{x})(x_{i+k}-\bar{x})}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}},$$
where $\bar{x}$ is the residual mean, $x_{i}$ is the i-th data in the residual sequence, $x_{i+k}$ is the i + k-th data in the residual sequence, and n is the length of the residual sequence.
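The k-th order autocorrelation coefficient of a residual sequence can be computed directly from its standard definition (lagged products of deviations from the mean, divided by the total sum of squared deviations). The sketch below is illustrative; the function name `autocorrelation` is hypothetical.

```python
def autocorrelation(residuals, k):
    """k-th order sample autocorrelation coefficient of a residual sequence."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Numerator: products of deviations that are k positions apart.
    numerator = sum(
        (residuals[i] - mean) * (residuals[i + k] - mean) for i in range(n - k)
    )
    # Denominator: total sum of squared deviations.
    denominator = sum((x - mean) ** 2 for x in residuals)
    return numerator / denominator


if __name__ == "__main__":
    toy_residuals = [1.0, 2.0, 3.0, 4.0, 5.0]
    for lag in (1, 2):
        print(lag, autocorrelation(toy_residuals, lag))
```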
3.3. Markov Chain Weight and Probability
A weight expresses the importance of a factor relative to the whole; it is not merely the percentage that the factor contributes but emphasizes how much the factor matters. The weight of the Markov chain reflects the importance of the training data at each step length to the future prediction: the higher the weight, the greater its influence on the predicted state. The model weight is obtained by normalizing the autocorrelation coefficient of each order, and its general form is shown in the following formula:
$$w_{k}=\frac{|r_{k}|}{\sum_{k=1}^{m}|r_{k}|},$$
where $w_{k}$ is the weight of the k-th step, $r_{k}$ is the k-th order autocorrelation coefficient, and m is the largest order of the time series.
After the weights of the Markov chain and the transition probability matrices of the different steps are obtained, the state probabilities given by each order are multiplied by the corresponding weight, and the weighted probabilities of the same state are added to obtain the prediction probability of that state. The state with the largest weighted probability is taken as the predicted state. The general form is shown in the following formula:
$$P_{i}=\sum_{k=1}^{m}w_{k}P_{i}^{(k)},$$
where $P_{i}^{(k)}$ is the probability of state i given by the k-step transition probability matrix.
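The weighting and summation just described can be sketched as follows. The normalization by absolute autocorrelation coefficients is the form commonly used in weighted Markov chain prediction and is assumed here; the function names and the 0-based state coding are hypothetical.

```python
def markov_weights(autocorrs):
    """Normalize the absolute autocorrelation coefficients so they sum to 1."""
    total = sum(abs(r) for r in autocorrs)
    return [abs(r) / total for r in autocorrs]


def weighted_state_probabilities(weights, rows):
    """Weighted sum of state probabilities.

    rows[k] is the row of the (k + 1)-step transition matrix that belongs to
    the state observed k + 1 steps before the time point being predicted.
    """
    n_states = len(rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, rows)) for j in range(n_states)
    ]


def predict_state(weights, rows):
    """Return the index of the state with the largest weighted probability."""
    probs = weighted_state_probabilities(weights, rows)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    w = markov_weights([0.4, -0.1])          # weights 0.8 and 0.2
    rows = [[0.25, 0.75, 0.0], [0.0, 0.0, 1.0]]
    print(predict_state(w, rows))
```

In the toy call, the one-step row dominates because its autocorrelation is larger in magnitude, so the predicted state is the one it favors.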
4. MC-SARIMA
The multiplicative seasonal model uses a specific mathematical model to describe the trend in a set of time-dependent random variables. It can comprehensively consider components such as season, trend, and random interference and predicts time series well. However, its predictions of future data are often too high or too low, or even deviate seriously from the actual situation, and the model places certain requirements on the stationarity of the data [28]. The Markov chain uses the theory of random processes to describe the development law and prediction direction of the selected data. It is itself a time series in which the next value depends only on the present state and not on the earlier history. In terms of data fitting, however, it usually needs to be combined with other models to achieve its best prediction effect [29]. Considering the advantages of both, this study first applies the multiplicative seasonal model to the selected time series, so that the relationship between the existing data is taken into account, and then uses the Markov chain model to optimize the prediction results and improve the prediction accuracy.
The MC-SARIMA model is based on the SARIMA model and uses the Markov chain to further reduce the error. The predicted value obtained by the SARIMA model is taken as the theoretical value and compared with the actual value to obtain the residuals of the original data. According to their magnitude, the residuals are divided into states; the Markov chain test and the probability calculation are then carried out to optimize the next predicted value. The data window is then shifted forward by one time node as a whole, and the value at the following time node is predicted and optimized by the same method. The steps are detailed as follows:
Stage 1. Data Detection. The stationarity of the data is an important prerequisite for applying the multiplicative seasonal model. The ADF test can be used to test the data for stationarity and white noise. If the data are stationary, go to Stage 3; if the data are not stationary, go to Stage 2.
Stage 2. Differential Processing. Differencing is an important method for making a time series stationary. A difference reflects the change between successive discrete observations.
Stage 3. Parameter Selection. Use the AIC minimum criterion to select the optimal parameters and substitute them into the multiplicative seasonal model to obtain the theoretical value data.
Stage 4. Acquisition of Residuals. The residual value is obtained by the direct difference between the theoretical value and actual value of the multiplicative seasonal model, which is used as the training data of the Markov chain model.
Stage 5. State Division. Divide the residual value into different state intervals according to its size and distribution, and calculate the MAE, MAPE, RMSE, RRMSE, R2, and SSE [30].
Stage 6. Markov Chain Test. The Markov chain test [31] is an important prerequisite for testing whether trained time series conform to the one-dimensional Markov chain distribution and also testing the independence of transitions between different states, and the statistical formula is shown in equation (9).
Stage 7. Transfer Probability Matrix Calculation. According to different state intervals, the transition probability from the target state to different states is an important process condition to obtain the prediction results.
Stage 8. Prediction Optimization. After the optimal probability is calculated from the autocorrelation coefficients, weights, and probabilities of the Markov chain model, the predicted value is optimized according to the state interval corresponding to the optimal probability, using the optimization formula shown in formula (10). The quantities in formula (10) are the predicted value after optimization, the theoretical value, the upper and lower bounds of the corresponding state interval, and the number of states divided.
5. Case Analysis
In this study, the light buoy azimuth data for 96 consecutive hours from August 11 to August 14, 2018, at Meizhou Bay Port are taken as the actual value data and are fed into the MC-SARIMA model to predict the light buoy azimuth data for 12 hours on August 15, 2018. First, the training data that pass the stationarity test and the white noise test are fed into the SARIMA model, and the corresponding parameters are selected to obtain the theoretical value data. Then, the residual data from 1:00 on August 11 to 24:00 on August 14 are fed into the MC-SARIMA model to obtain the residual prediction at 1:00 on August 15, and the residual data from 2:00 on August 11 to 1:00 on August 15 are fed into the MC-SARIMA model to obtain the residual prediction at 2:00 on August 15. By analogy, the residual predictions from 1:00 to 12:00 on August 15 are obtained in turn. Finally, the predicted azimuth data are obtained according to the relevant formulas. The specific steps are described in the following subsections.
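The rolling procedure described above, in which a fixed-length window slides forward by one hour after each prediction, can be sketched as follows. The function `rolling_forecast` and the placeholder predictor passed to it are hypothetical; in the study, the predictor would be the Markov chain step applied to the residual window.

```python
def rolling_forecast(history, actual_future, window, predict_next):
    """One-step-ahead forecasts with a sliding window of fixed length.

    history       : observed values available before the first forecast
    actual_future : values observed afterwards, revealed one at a time
    window        : number of most recent values passed to the predictor
    predict_next  : function mapping a window (list) to the next value
    """
    data = list(history)
    forecasts = []
    for observed in actual_future:
        # Forecast from the most recent `window` values only.
        forecasts.append(predict_next(data[-window:]))
        # The newly observed value enters the window for the next forecast.
        data.append(observed)
    return forecasts


if __name__ == "__main__":
    # Placeholder predictor: the window mean (an assumption for illustration).
    window_mean = lambda values: sum(values) / len(values)
    print(rolling_forecast([1, 2, 3, 4], [5, 6], window=4, predict_next=window_mean))
```

With a window of 96 hourly values and 12 future hours, this loop reproduces the scheme in the text: hours 1 to 96 predict hour 97, hours 2 to 97 predict hour 98, and so on.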
5.1. Stationarity and White Noise Test
To meet the requirements of the multiplicative seasonal model for the stability of time series and determine whether it is a white noise sequence, it is necessary to carry out differential processing, and the initial time-series data are tested by ADF to detect whether the time series can meet the stability requirements. The trend diagram and the first-order difference diagram are shown in Figures 1 and 2.


It can be seen from the comparison between Figures 1 and 2 that the first-order difference has reached the stability requirements, but to ensure the accuracy of the data, it is necessary to perform the ADF test. If it cannot meet the stability requirements, further difference processing is carried out.
An augmented Dickey–Fuller test statistic [32] (ADF) is one of the important methods to detect whether the time series are stationary and white noise sequences. The test results are listed in Table 1.
It can be seen from Table 1 that the ADF test statistics for the time series are less than the detection values of the corresponding test levels of 1%, 5%, and 10%, and the probability meets the requirements of the test observations. Therefore, the time series is a stationary sequence and a nonwhite noise sequence.
5.2. Parameter Selection
Because the trend of the time series disappears after first-order differencing, the parameter d = 1; because the seasonality disappears after first-order seasonal differencing, the parameter D = 1. Then, the ACF and the PACF are combined to determine the values of the parameters p and q. The general range of the parameters p and q is [0, 2]. The analysis of the ACF and PACF is shown in Figures 3 and 4.


The autocorrelation function diagram and the partial autocorrelation function diagram show no obvious tailing or truncation. Therefore, the time series is decomposed into a seasonal trend, a random fluctuation trend, and a growth trend, as shown in Figure 5.

It can be seen from Figure 5 that there are two peaks every 24 hours, so the time series is affected by periodicity and seasonality. The period of the time series is therefore S = 24, and there is no obvious law in the random fluctuation trend or the growth trend. With d = 1, D = 1, and S = 24 known, the AIC minimum information criterion method [33] is used to select the optimal combination of the remaining parameters. The smaller the AIC value, the better the model. The AIC calculation results of the time series are listed in Table 2.
Table 2 shows that the minimum AIC value is 320.38, which corresponds to the parameters p = 0, q = 1, P = 0, and Q = 1. SARIMA (0,1,1) × (0,1,1) 24 is therefore chosen as the best multiplicative seasonal prediction model.
5.3. Residual Acquisition
According to the requirements of the Markov chain for time series data, the actual value data of 1–96 hours are taken as training data, and the multiplicative seasonal model is introduced to obtain the theoretical value data of 97–120 hours. Finally, the actual value is subtracted from the theoretical value to obtain the residual value. Some data are listed in Table 3.
5.4. State Division
State division is an important process in the Markov chain, which can divide each training data into different states, to make a theoretical basis for calculating the state probability. After the residual sequence is obtained, the mean value A and the mean square deviation M are calculated, and then the residual sequence is divided into four different states. The classification level is listed in Table 4.
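A minimal sketch of a four-state division is given below. The classification levels of Table 4 are not reproduced in this text, so the cut points used here (A − M, A, and A + M, with M taken as the population standard deviation) are an assumption for illustration only, and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def state_bounds(residuals):
    """Cut points A - M, A, A + M (assumed; Table 4 may use other levels)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Population standard deviation used as the "mean square deviation" M.
    std = math.sqrt(sum((x - mean) ** 2 for x in residuals) / n)
    return [mean - std, mean, mean + std]


def assign_states(residuals, bounds):
    """Map each residual to a state index 0..3 according to the cut points."""
    return [bisect_right(bounds, x) for x in residuals]


if __name__ == "__main__":
    toy_residuals = [-3.0, -1.0, 1.0, 3.0]
    cuts = state_bounds(toy_residuals)
    print(cuts)
    print(assign_states(toy_residuals, cuts))
```

The resulting sequence of state indices is the input from which the frequency matrix and the transition probability matrices are counted.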
5.5. Probability Matrix and Markov Chain Test
According to the state partition diagram, the frequency matrix is calculated and the one-step transition probability matrix is obtained, and the results are as follows:
After the two-step to four-step transition probability matrices and the marginal probabilities $p_{\cdot 1}$ = 0.136842105, $p_{\cdot 2}$ = 0.29473684, $p_{\cdot 3}$ = 0.4210526, and $p_{\cdot 4}$ = 0.147368 are calculated, the Markov chain test is carried out on the residual sequence.
The Markov chain test is an important prerequisite to test whether the trained time series conform to the one-dimensional Markov chain distribution. The statistical formula is shown in formula (9) and the results are listed in Table 5.
Given the significance level $\alpha$ = 0.05, the statistic calculated according to Table 5 is $\chi^{2}$ = 14.66091051 and the critical value is $\chi_{\alpha}^{2}$ = 14.067. Since $\chi^{2} > \chi_{\alpha}^{2}$, the residual sequence satisfies the Markov chain test and the predicted value can be optimized by the Markov chain.
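Formula (9) is not reproduced in this text. A statistic commonly used for this test in weighted Markov chain studies is $\chi^{2}=2\sum_{i}\sum_{j}f_{ij}\left|\ln(p_{ij}/p_{\cdot j})\right|$, where $f_{ij}$ is the transition frequency, $p_{ij}$ the one-step transition probability, and $p_{\cdot j}$ the marginal probability. The sketch below assumes that form; it may differ from the formula actually used in the study, and the function name is hypothetical.

```python
import math


def markov_chi_square(counts):
    """Chi-square statistic for the Markov property (assumed common form).

    counts[i][j] is the number of observed transitions from state i to state j.
    """
    m = len(counts)
    total = sum(sum(row) for row in counts)
    # Marginal probability of arriving in state j (column sum / total).
    marginal = [sum(counts[i][j] for i in range(m)) / total for j in range(m)]

    statistic = 0.0
    for i in range(m):
        row_total = sum(counts[i])
        for j in range(m):
            f_ij = counts[i][j]
            if f_ij == 0:
                continue  # zero-frequency cells contribute nothing
            p_ij = f_ij / row_total
            statistic += f_ij * abs(math.log(p_ij / marginal[j]))
    return 2.0 * statistic


if __name__ == "__main__":
    print(markov_chi_square([[5, 5], [5, 5]]))    # transitions independent of state
    print(markov_chi_square([[10, 0], [0, 10]]))  # transitions fully state dependent
```

A value above the critical value, such as the 14.067 reported above, indicates that transitions depend on the current state, which is what the test requires.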
5.6. State Prediction and Prediction Optimization
According to the residual data of 96 hours from August 11 to August 14, 2018, and the corresponding state probability matrix, and combined with the weighted sum formula, the 12 hours on August 15, 2018, are predicted. The predicted results at 1:00 on August 15, 2018, are listed in Table 6.
It can be seen from Table 6 that max $P_{i}$ = 0.453017, corresponding to state i = 3. The predicted value is then optimized according to the interval of the predicted state, the number of states divided, and the corresponding interval values; the optimization formula is shown in equation (10).
According to the optimization formula, the optimization results at 1 o’clock on August 15, 2018, were 320.6522 m, and the data in the next 11 hours can be predicted using the same method and compared with the actual value, ARIMA prediction value, and SARIMA prediction value. The comparison figure is shown in Figure 6, and the specific data are listed in Table 7.

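The prediction error is measured by the mean absolute error (MAE), sum of squares due to error (SSE), R-square (R2), mean absolute percentage error (MAPE), root mean square error (RMSE), and relative root mean square error (RRMSE) [34, 35]. The formulas are shown in equations (11)–(16), and the comparison of prediction errors is shown in Figure 7.
$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{i}-y_{i}\right| \tag{11}$$
$$\mathrm{SSE}=\sum_{i=1}^{n}\left(\hat{y}_{i}-y_{i}\right)^{2} \tag{12}$$
$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(\hat{y}_{i}-y_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i}-\bar{y}\right)^{2}} \tag{13}$$
$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_{i}-y_{i}}{y_{i}}\right| \tag{14}$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{i}-y_{i}\right)^{2}} \tag{15}$$
$$\mathrm{RRMSE}=\frac{\mathrm{RMSE}}{\bar{y}} \tag{16}$$
where $\hat{y}_{i}$ is the prediction value, $y_{i}$ is the actual value, $\bar{y}$ is the average of actual values, and n is the number of predicted points.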
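The six error measures can be computed with a short function. The sketch below uses the standard definitions of these measures and assumes that RRMSE is the RMSE divided by the mean of the actual values; the function name `error_metrics` and the toy numbers are hypothetical and are not data from the study.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, MAPE (in percent), RMSE, RRMSE, SSE, and R2 as a dict."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]

    sse = sum(e * e for e in errors)                    # sum of squared errors
    sst = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    rmse = math.sqrt(sse / n)

    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSE": rmse,
        "RRMSE": rmse / mean_actual,
        "SSE": sse,
        "R2": 1.0 - sse / sst,
    }


if __name__ == "__main__":
    toy_actual = [100.0, 200.0, 300.0]
    toy_predicted = [110.0, 190.0, 300.0]
    for name, value in error_metrics(toy_actual, toy_predicted).items():
        print(name, value)
```

These definitions are consistent with the figures reported for the SARIMA model: with 12 predicted hours, the square root of 991.6988198 / 12 is approximately 9.0907, which agrees with the reported RMSE.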

Figure 7: Comparison of prediction errors, panels (a) to (f).
Figure 7 shows that compared to the traditional multiplicative seasonal model and other prediction models, the prediction MAE of the MC-SARIMA model is decreased by 2.19003794 and the MAPE is decreased by 0.66%, the RMSE is decreased by 2.092671823, the RRMSE is decreased by 0.006221352, the SSE is decreased by 404.0231931, and the R2 is increased by 0.224686247.
It shows that the multiplicative seasonal model optimized by the Markov chain can predict the azimuth data of the light buoy more effectively.
6. Conclusion
In this study, a single multiplicative seasonal model is first established and the time series is predicted based on the periodic characteristics of the offset azimuth of the light buoy. For this model, the azimuth prediction MAE is 7.456869211, MAPE is 2.22%, RMSE is 9.090740801, RRMSE is 0.02722108, SSE is 991.6988198, and R2 is 0.448494319. To further improve the prediction accuracy, the Markov chain is then used to optimize the multiplicative seasonal model. For the optimized model, the azimuth prediction MAE is 5.266831271, MAPE is 1.57%, RMSE is 6.998068978, RRMSE is 0.020999727, SSE is 587.6756267, and R2 is 0.673180566. The MC-SARIMA model is compared with the ARIMA model, the SARIMA model, BP-ANN, and GM (1, 1). The results show that, compared with the single multiplicative seasonal model and the other prediction models, the multiplicative seasonal model optimized by the Markov chain better meets the needs of the navigator and has a lower prediction error, which provides a new idea for the offset warning of the light buoy. However, the model does not fully take into account special circumstances under the influence of nonseasonal cycles. Follow-up work will study the possible changes in the azimuth of the buoy in special cases such as typhoons and ship collisions and make predictions according to the prevailing conditions.
Data Availability
The light buoy position data used to support the findings of this study were supplied by Jinxing Shao under license and so cannot be made freely available. Requests for access to these data should be made to Jinxing Shao; affiliation: Aids to Navigation Department, Xiamen; e-mail: 13606931987@139.com.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Natural Science Foundation of Fujian Province (Grant nos. 2020J01658 and 2019J01325), the Open Project Fund of the National Local Joint Engineering Research Center for Ship Assisted Navigation Technology (Grant no. HHXY2020002), and the Doctoral Start-up Fund of Jimei University (Grant no. ZQ2019012).