Abstract
According to the definition of nonlinear cointegration, this article studies the small-sample nonlinear cointegration test and the NECM (Nonlinear Error Correction Model) based on an LS-SVM (Least Squares Support Vector Machine) optimized by PSO (Particle Swarm Optimization), and designs the logical flow of the method. We then carry out empirical research on the ship maintenance cost index and several price indexes. After judging the type of cointegration relationship, the nonlinear cointegration test for small samples is carried out, and an NECM for predicting the ship maintenance cost index is established and compared with a linear VAR model. The results show that the small-sample nonlinear cointegration test and modeling method based on the LS-SVM optimized by PSO can describe the nonlinear cointegration relationship of a small-sample system, and that the NECM has better predictive performance and can effectively forecast small-sample nonlinear systems. We also compare the prediction results with a wavelet neural network algorithm; the comparison shows that the LS-SVM optimized by PSO has better generalization ability and higher prediction accuracy on small samples.
1. Introduction
The modeling method of traditional econometrics is based on the asymptotic theory of least squares estimation, which is premised on the stationarity of the time series. In economic systems, especially in macroeconomics, variables are mostly nonstationary, so traditional statistical inference cannot be applied directly. As a result, Engle and Granger proposed the cointegration theory [1] and pointed out that a cointegrated system has three main forms, namely, the VAR (Vector Auto Regressive) model, the VMA (Vector Moving Average) model, and the ECM (Error Correction Model), which provide an approach to modeling nonstationary time series. Cointegration theory describes the long-term linear equilibrium relationship between the component sequences of an economic system, and this relationship can be expressed by the ECM. Because it effectively avoids the spurious regression phenomenon, the ECM based on linear cointegration has been widely used in economic forecasting [2–4].
Linear cointegration implies two basic conditions: first, each time series is integrated of an integer order, and second, all series share the same order of integration; these conditions are the basis of linear cointegration theory. However, many time series in economic systems exhibit nonlinearity and long memory [5]. A single time series may be integrated not of an integer order but of a fractional order, or the series may have different integer orders, which means that the individual series are nonlinear with fractal features, and the equilibrium relationship between the series is often nonlinear as well. For nonlinear systems, linear cointegration theory is not applicable, and nonlinear cointegration theory must be used to study the nonlinear equilibrium relationship between time series. Literature [6] gave the following definition of nonlinear cointegration:
For a vector time series Xt = (x1t, x2t, …, xnt)T, the component sequences of {Xt} are said to be nonlinearly cointegrated if (1) each xit (i = 1, …, n) is an LMM (Long Memory in Mean) sequence or an I(1) sequence, and (2) there is a nonlinear function f(·) such that yt = f(x1t, x2t, …, xnt) is an SMM (Short Memory in Mean) sequence.
The function f(·) is called the nonlinear cointegrating function. If the nonlinear cointegration function is linear in its arguments, that is,

f(x1t, x2t, …, xnt) = α1x1t + α2x2t + ⋯ + αnxnt, (1)

where α = (α1, α2, …, αn) is a vector in Rn, and the long memory of each component sequence of {Xt} takes the form of an I(1) sequence, then the nonlinear cointegration reduces to a linear cointegration relationship. Therefore, the definition of nonlinear cointegration is a generalization of the concept of linear cointegration, and linear cointegration is a special case of nonlinear cointegration.
For the vector time series Xt = (x1t, x2t, …, xnt)T, suppose the component sequences are all LMM sequences (every I(1) sequence is an LMM sequence, but an LMM sequence that is I(1) need not have the same order of integration as the others), and there exists a nonlinear function f(·) such that f(x1t, x2t, …, xnt) is an SMM sequence. Then {Xt} is said to be nonlinearly cointegrated.
The key to testing and modeling a nonlinear cointegration relationship lies in estimating the nonlinear cointegration function f(·). Since general linear cointegration analysis cannot realize the nonlinear cointegration test and modeling, literature [6] proposed a neural network algorithm suitable for testing and modeling the nonlinear cointegration relationship of vector time series and discussed the theoretical basis and feasibility of the method. By using a wavelet neural network, which can approximate nonlinear functions arbitrarily well, literature [7] studied the NECM of nonlinear cointegration systems, provided the modeling method, and carried out the relevant analysis. These studies found that, compared with the linear VAR model, the NECM has a better prediction effect and can effectively predict nonlinear economic systems.
For nonlinear system modeling problems, many meta-heuristic algorithms have emerged in recent years, such as the Monarch Butterfly Optimization (MBO) [8], Slime Mould Algorithm (SMA) [9], Moth Search Algorithm (MSA) [10], Hunger Games Search (HGS) [11], Runge-Kutta optimizer (RUN) [12], Colony Predation Algorithm (CPA) [13], and Harris Hawks Optimization (HHO) [14]. These algorithms are simple in principle and easy to implement and perform well on optimization problems of complex nonlinear systems when large samples are available. However, the basic models of these algorithms also have obvious defects, leaving considerable room for improvement.
For example, MBO has a limited local search capability and restricted search positions, which can lead to degradation of the population, and its global search method is too simple to make full use of the population's information; moreover, its elite retention strategy requires parameter settings and sorting operations, which increases the algorithm's complexity [15]. The structure of SMA is simple and clear, but the algorithm still has shortcomings such as easily falling into local optima, slow convergence, and low accuracy [16]. The MSA has the advantages of a simple structure, few parameters, high accuracy, and strong robustness, but its exploitation ability still needs improvement and it easily falls into local optima [17]. In numerical analysis, Runge-Kutta methods are an important family of implicit and explicit iterative methods for approximating solutions of ordinary differential equations, but compared with other intelligent algorithms, the computation of the RUN optimizer is more complex. The CPA has a long running time, intermittent edges, and parameters that are difficult to adjust [18]. For the HHO algorithm, balancing the exploration and exploitation capabilities and alleviating premature convergence are two critical concerns [19]. Like the above algorithms, HGS requires a large number of training samples; it works well on large-sample problems but is not suitable for small samples [20].
The above research provides effective methods and technical tools for testing and modeling nonlinear cointegration relationships. However, these methods place certain requirements on the quantity and quality of each series and are suitable for large-sample time series (generally more than 100 observations). Neural network and wavelet neural network algorithms are prone to overfitting, and the generalization ability of the resulting models is poor [21], so their applicability to relationship testing and modeling is limited. In view of this, this article proposes applying the LS-SVM [22], which has excellent learning performance and strong generalization ability, to test the nonlinear cointegration relationship among small-sample time series. Meanwhile, the PSO algorithm, which is widely used in function optimization, neural network training, fuzzy system control, and other fields, is selected to optimize the parameters of the LS-SVM. This article further studies the modeling of the small-sample NECM by the LS-SVM optimized by PSO. Using the proposed method, the nonlinear cointegration relationship among the ship maintenance cost index and several price indexes is tested and modeled, and an empirical analysis is carried out and compared with the VAR model and the wavelet neural network algorithm.
2. Nonlinear Cointegration Test and NECM Model
2.1. Nonlinear Cointegration Test
According to the definition of nonlinear cointegration, if each component of the vector time series Xt = (x1t, x2t, …, xnt)T is an LMM sequence, an SMM sequence with a constant mean shall be used as the target value, and after transformation by a nonlinear function f(·) we obtain the output sequence yt = f(x1t, x2t, …, xnt). However, it cannot yet be confirmed that this f(·) is an estimate of the nonlinear cointegration function; it is necessary to check whether the residual sequence {ε̂t} between the output sequence and the target sequence is an SMM sequence. If {ε̂t} is an SMM sequence, then f(·) can be regarded as an estimate of the nonlinear cointegration function, which shows that there is a nonlinear cointegration relationship among the component sequences of {Xt}. The test procedure for {ε̂t} is as follows:
2.1.1. Unit Root Test
The null hypothesis is H0: {ε̂t} has a unit root; the alternative hypothesis is H1: {ε̂t} has no unit root.
If the null hypothesis H0 is accepted, then {ε̂t} is an LMM sequence, which means that there is no nonlinear cointegration relationship among the component sequences of {Xt}. If the alternative hypothesis H1 is accepted, then {ε̂t} does not have a unit root; however, it cannot yet be concluded that it is an SMM sequence, and a sequence memory test of {ε̂t} is also required.
2.1.2. Sequence Memory Test
Calculate the autocorrelation function sequence {ρk} of {ε̂t}, namely γk = Cov(ε̂t, ε̂t+k) (k = 1, 2, …) and ρk = γk/γ0.
The memory of a time series is mainly judged from the decay speed of the autocorrelation function ρk. For a short-memory time series, ρk decays at a negative exponential (geometric) rate as k ⟶ ∞, that is, ρk ∼ c·a^(−k) (k = 1, 2, …), where a and c are constants with a > 1 and c > 0. For a long-memory time series, ρk decays at the much slower hyperbolic rate ρk ∼ c·k^(2d−1) (k ⟶ ∞), with 0 < d < 0.5. Therefore, the faster the autocorrelation function of a time series decays, the more likely it is to be a short-memory time series.
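The decay-rate comparison above can be illustrated numerically. The following is a minimal sketch (not the paper's code) that estimates the sample autocorrelation function and shows the geometric decay of a short-memory AR(1) series; the AR coefficient 0.6 and sample size are illustrative choices:

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation rho_k = gamma_k / gamma_0 for k = 1..max_lag."""
    x = np.asarray(x, float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    return np.array([np.dot(xc[:n - k], xc[k:]) / n / gamma0
                     for k in range(1, max_lag + 1)])

# A short-memory AR(1) series: rho_k should decay roughly like 0.6**k
rng = np.random.default_rng(0)
n = 2000
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + e[t]
rho = autocorr(x, 20)  # rho[0] is near 0.6; by lag 10 it is close to zero
```

A long-memory series would instead show ρk shrinking only slowly, roughly like k^(2d−1).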
Quantitative methods for memory test of time series include R/S test [21], modified R/S test [22], KPSS test [23], and LM test [24].
2.2. NECM Model
Similar to the error correction model (ECM) derived from linear cointegration, for the vector time series Xt = (x1t, x2t, …, xnt)T, when there is a nonlinear cointegration relationship among its component series, the derived NECM is

ΔXt = Γ0 f(Xt−k; θ) + Σ_{j=1}^{k} Γj ΔXt−j + εt, (2)

where f(·) is the nonlinear cointegration function, θ is a parameter vector, εt = (ε1t, ε2t, …, εnt)T is a random error sequence, Γj (j = 0, 1, 2, …, k) are the coefficient matrices, and k is the lag order of the endogenous variables.
Then, the NECM of the i-th component of {Xt} is

Δxit = γi0 f(Xt−k; θ) + Σ_{j=1}^{k} γij ΔXt−j + εit, (3)

where γi0 is the i-th element of Γ0 and γij is the i-th row of Γj.
Estimate each parameter in formula (3) and substitute the estimates back into the formula; the NECM corresponding to each component is then obtained. The model can be used to predict each component sequence, and the h-step prediction model of the i-th component is

Δx̂i,t+h = γ̂i0 f(Xt+h−k; θ̂) + Σ_{j=1}^{k} γ̂ij ΔXt+h−j. (4)
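As an illustration of the NECM prediction step described above (here with h = 1), the following sketch computes a one-step forecast of the differenced vector. The coefficient values are made up for the example and are not estimates from the paper:

```python
import numpy as np

def necm_one_step(dX_lags, f_val, gamma0, Gammas):
    """One-step NECM forecast of the differenced vector:
    dX_hat = gamma0 * f_val + sum_j Gammas[j] @ dX_lags[j],
    where f_val is the estimated equilibrium error f(X; theta_hat),
    gamma0 holds the loadings on the error correction term, and
    dX_lags[j] is the lag-(j+1) difference vector."""
    dX_hat = np.asarray(gamma0, float) * f_val
    for G, dX in zip(Gammas, dX_lags):
        dX_hat = dX_hat + np.asarray(G, float) @ np.asarray(dX, float)
    return dX_hat

# Toy 2-component system with made-up coefficients (not estimates)
gamma0 = np.array([0.1, 0.2])      # loadings on the error correction term
Gammas = [0.5 * np.eye(2)]         # one lag of differences
dX_lags = [np.array([1.0, 2.0])]   # last observed difference vector
dX_hat = necm_one_step(dX_lags, 1.0, gamma0, Gammas)
# the level forecast is then X_hat = X_t + dX_hat
```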
The estimate f(Xt−k; θ̂) of the nonlinear function in the NECM is the equilibrium error of {Xt} at time t−k, and its role is to correct the differenced sequence values, the same as the role of the error correction term in linear cointegration. As h varies, the estimate f(Xt+h−k; θ̂) of the nonlinear cointegration function also varies. Therefore, the NECM's predictions of the variables at different periods are independent of each other.
3. Parameter Optimization of LS-SVM Model Based on PSO
The Support Vector Machine (SVM) is an effective tool for nonlinear problems developed in recent years, with great advantages in solving small-sample, nonlinear, and high-dimensional problems [25]. The LS-SVM [26] is an extension of the standard SVM; it has good robustness, requires fewer parameters to be optimized, and is widely applied [27].
3.1. LS-SVM Model and Selection of Kernel Function
The LS-SVM is obtained by transforming the inequality constraints of the standard SVM algorithm into equality constraints [28]; it is the form of the SVM under a quadratic loss function.
In LS-SVM modeling, the role of the kernel function is equivalent to projecting the samples into a high-dimensional space, transforming it into a linear regression problem, and then constructing the optimal regression curve [29]. Therefore, the selection of the kernel function directly affects the generalization ability of the curve.
Common kernel functions mainly include the polynomial kernel K(x, xi) = (xTxi + 1)^d (d = 1, 2, …), the sigmoid kernel K(x, xi) = tanh(βxTxi + θ), the spline kernel, and the radial basis function (RBF) kernel K(x, xi) = exp(−‖x − xi‖²/(2σ²)). Selecting a kernel involves determining both its functional form and its parameters. Considering the strong generalization ability of the RBF kernel [30, 31], this article selects the RBF kernel for the modeling research.
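To make LS-SVM regression with the RBF kernel concrete, the following is a minimal self-contained sketch of the standard LS-SVM dual linear system. The toy data and the values of gamma and sigma2 are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix: K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * sigma2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def lssvm_fit(X, y, gamma, sigma2):
    """Train LS-SVM regression by solving the linear system
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, support values alpha

def lssvm_predict(Xnew, X, b, alpha, sigma2):
    """y_hat(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(Xnew, X, sigma2) @ alpha + b

# Fit a toy 1-D curve; gamma and sigma2 here are illustrative values only
X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=1000.0, sigma2=0.05)
yhat = lssvm_predict(X, X, b, alpha, sigma2=0.05)
```

The regularization parameter gamma trades training error against smoothness, and sigma2 controls the width of the RBF kernel; these are exactly the two quantities optimized in the next subsection.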
3.2. Optimization of LS-SVM Model Parameters Based on PSO
After selecting the RBF kernel for the LS-SVM, two parameters must be determined, namely the regularization parameter γ and the kernel parameter σ2. These two parameters largely determine the learning and prediction ability of the LS-SVM, so it is necessary to find the optimal combination of γ and σ2.
At present, the common way to select the LS-SVM parameters is the grid search method, which reaches a satisfactory result through repeated experiments; however, it is time-consuming and inefficient. Some scholars proposed using the gradient descent method to select the LS-SVM parameters [32], but this method requires the kernel function to be differentiable and easily falls into local minima during the search. Others have tried an immune algorithm to optimize the LS-SVM parameters [33], reducing the blindness of parameter selection and improving the prediction accuracy of the LS-SVM, but the implementation of this method is complicated. Still others proposed using a genetic algorithm to determine the LS-SVM parameters [34], but genetic algorithms need crossover and mutation operations and many parameters to adjust, which is computationally complex and inefficient.
The PSO is a stochastic optimization algorithm based on swarm intelligence. Like the genetic algorithm, PSO is a population-based random search tool, but it has no crossover or mutation operations; particles follow the current optimal particle to search the solution space. It features parallel processing, good robustness, simplicity of implementation, high computational efficiency, and a high probability of finding the global optimum. Therefore, this article selects the PSO algorithm to optimize the LS-SVM parameters and tests the nonlinear cointegration relationship of small-sample time series based on the LS-SVM optimized by PSO.
3.3. PSO Algorithm
In the PSO algorithm, each alternative solution is called a “Particle”. Multiple particles coexist and cooperatively search for optimization. Each particle flies to a better position in the problem space and searches for the optimal solution according to its own “Experience” and the best “Experience” of the adjacent particle swarm. Each particle in the particle swarm represents a potential solution of the system. Each particle is represented by three indicators: position, speed, and fitness. The mathematical expression of PSO algorithm is as follows [35]:
Suppose that in an n-dimensional search space, m particles form a swarm, where Xi = (Xi1, Xi2, …, Xin) is the current position of particle i; Vi = (Vi1, Vi2, …, Vin) is its current flight velocity; Pi = (Pi1, Pi2, …, Pin) is the position with the best fitness value experienced by particle i, called the individual optimal position; and Pg = (Pg1, Pg2, …, Pgn) is the position with the best fitness value found by the whole swarm so far, called the global optimal position. The velocity and position of each particle evolve according to

Vi(k+1) = ωVi(k) + c1·rand1()·(Pi − Xi(k)) + c2·rand2()·(Pg − Xi(k)), (5)
Xi(k+1) = Xi(k) + Vi(k+1), (6)

where k denotes the k-th generation of evolution; ω is the inertia weight, indicating to what extent the particle retains its original velocity: a larger ω gives better global search ability, while a smaller ω gives stronger local convergence, and to slow the particles down and prevent oscillation as they approach Pg, ω is usually set to decrease linearly with evolution. c1 and c2 are two positive learning factors: c1 adjusts the step size of a particle flying toward its individual optimal position Pi, and c2 adjusts the step size toward the global optimal position Pg. The larger their values, the greater the acceleration toward Pi and Pg; they are usually set to 2.0. rand1() and rand2() are random numbers in the range [0, 1].
Meanwhile, to reduce the possibility of particles leaving the search space during evolution, Vi is usually limited to a certain range, namely Vi ∈ [−Vmax, Vmax], where Vmax is a constant. In general, if the search space of the problem is limited to [−Xmax, Xmax], then Vmax = αXmax can be set, with 0.1 ≤ α ≤ 1.0. The number of particles m is generally 20–40; the particle length n is determined by the optimization problem; the range of the particles is also determined by the problem, and a different range can be set for each dimension. The termination condition of the algorithm can be a maximum number of iterations or a minimum error requirement, or it can be defined by the specific problem [36].
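The update rules above can be sketched as a compact PSO minimizer. The swarm size, bounds, velocity clamp factor, and test function below are illustrative assumptions:

```python
import numpy as np

def pso(fitness, bounds, m=20, T=200, c1=2.0, c2=2.0, seed=0):
    """Minimise `fitness` over a box using the velocity/position updates
    described in the text; the inertia weight decreases linearly from
    0.9 to 0.4 and velocities are clamped to [-Vmax, Vmax]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    vmax = 0.2 * (hi - lo)                      # Vmax = alpha * range, alpha = 0.2
    X = rng.uniform(lo, hi, size=(m, len(lo)))  # positions
    V = rng.uniform(-vmax, vmax, size=X.shape)  # velocities
    P = X.copy()                                # individual best positions
    Pval = np.array([fitness(x) for x in X])
    g = P[Pval.argmin()].copy()                 # global best position
    for k in range(T):
        w = 0.9 - 0.5 * k / T                   # linearly decreasing inertia
        r1, r2 = rng.random(X.shape), rng.random(X.shape)
        V = np.clip(w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X), -vmax, vmax)
        X = np.clip(X + V, lo, hi)
        val = np.array([fitness(x) for x in X])
        better = val < Pval
        P[better], Pval[better] = X[better], val[better]
        g = P[Pval.argmin()].copy()
    return g, float(Pval.min())

# Illustrative run: minimise a 2-D sphere function centred at (1, 1)
best_x, best_val = pso(lambda x: float(((x - 1.0) ** 2).sum()),
                       ([-5.0, -5.0], [5.0, 5.0]))
```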
3.4. Basic Procedures for LS-SVM Optimized by PSO
The LS-SVM kernel function selected in this article is the RBF kernel, and the parameters to be optimized are γ and σ2, so the particle length in PSO is 2. The particle fitness is the mean square error (MSE) between the LS-SVM target value and output value:

MSE = (1/n) Σ_{i=1}^{n} (yi − ŷi)², (7)

where n is the number of training samples, yi is the target value of the LS-SVM, and ŷi is the output value. The smaller the particle fitness, the better the current particle position.
According to the basic principle and steps of the PSO algorithm, the logical flow for optimizing the parameters γ and σ2 of the LS-SVM is shown in Figure 1. The optimization steps are as follows:
Step 1. PSO initialization. Set the swarm size to m and the maximum number of iterations to T; set the initial inertia weight to 0.9, decreasing linearly with the iteration count as ω(t) = 0.9 − 0.5t/T; set the maximum velocity Vmax; randomly initialize the position and velocity of each particle, namely at t = 0, Xi(0) = (γi(0), σi2(0)), i = 1, 2, …, m.
Step 2. Calculate the fitness of each particle. Substitute the values of γi and σi2 contained in each particle's current position into the LS-SVM model to obtain the model output ŷi, compare it with the target value yi, and compute each particle's fitness.
Step 3. For each particle, if the fitness of its current position (that is, the MSE between the LS-SVM target and output values) is smaller than that of its previous best position Pi, set Pi to its current position.
Step 4. For each particle, if the fitness of its best position Pi is smaller than that of the global best position Pg of all particles, replace Pg with Pi.
Step 5. Update each particle's velocity and position according to formulas (5) and (6) to generate new particles.
Step 6. Check whether the termination condition is reached; if not, return to Step 2.
Step 7. Obtain the optimal particle information and assign the optimal combination of γ and σ2 to the LS-SVM model.
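The procedure above, PSO searching over particles (γ, σ2) with the training MSE of an LS-SVM as the fitness, can be sketched end-to-end as follows. The data are synthetic stand-ins and the search ranges mirror those reported later in the paper, but everything else is an illustrative assumption rather than the authors' code:

```python
import numpy as np

def rbf(A, B, s2):
    """RBF kernel matrix K[i, j] = exp(-||A[i] - B[j]||^2 / (2 * s2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s2))

def lssvm(X, y, gamma, s2):
    """Train an LS-SVM and return its prediction function."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, s2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return lambda Z: rbf(Z, X, s2) @ alpha + b

def fitness(p, X, y):
    """Particle p = (gamma, sigma2); fitness = training MSE between the
    LS-SVM output and the target, as described in the text."""
    return float(np.mean((lssvm(X, y, p[0], p[1])(X) - y) ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (18, 3))        # synthetic stand-in for 18 training rows
y = np.sin(X.sum(axis=1))
lo, hi = np.array([1.0, 0.01]), np.array([4.0e4, 1.0e4])
m, T = 20, 60
P = rng.uniform(lo, hi, (m, 2))           # particle positions (gamma, sigma2)
V = np.zeros((m, 2))
best, bval = P.copy(), np.array([fitness(p, X, y) for p in P])
g = best[bval.argmin()].copy()
for t in range(T):
    w = 0.9 - 0.5 * t / T                 # inertia weight 0.9 -> 0.4
    V = w * V + 2.0 * rng.random((m, 2)) * (best - P) \
        + 2.0 * rng.random((m, 2)) * (g - P)
    V = np.clip(V, -(hi - lo), hi - lo)
    P = np.clip(P + V, lo, hi)
    val = np.array([fitness(p, X, y) for p in P])
    imp = val < bval
    best[imp], bval[imp] = P[imp], val[imp]
    g = best[bval.argmin()].copy()
# g now holds the best (gamma, sigma2) combination found
```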

4. Nonlinear Cointegration Test and NECM Modeling Based on LS-SVM Optimized by PSO
The nonlinear cointegration relationship of small-sample time series is tested by the LS-SVM optimized by PSO, and the NECM model is then established. The basic steps are as follows:
Step 1. Check whether the component sequences have a linear cointegration relationship. The method is to calculate the fractional integration order of each sequence; if the fractional orders differ, the component sequences do not have a linear cointegration relationship.
Step 2. Based on the LS-SVM optimized by PSO, test the nonlinear cointegration relationship among the small-sample time series, and estimate and test the nonlinear cointegration function f(·). The input of the LS-SVM is each small-sample sequence to be tested, and the target value is an SMM sequence with constant mean. The minimization of the mean square error (MSE) between the target value and the output value is the training objective, and the kernel parameters of the LS-SVM are optimized by the PSO algorithm to obtain the optimized model.
Step 3. Perform the memory test on the optimized output sequence. The memory test adopts the modified R/S statistic proposed by Lo [21], whose limiting distribution function is

F(v) = 1 + 2 Σ_{k=1}^{∞} (1 − 4k²v²) e^(−2k²v²), (8)

where v is the quantile and F(v) is the cumulative probability, that is, F(v) = P(Qn < v). The larger F(v) is, the greater the probability that the sequence is a long-memory sequence. When v is 3, F(v) ≈ 0.99999. When long memory is not significant, the output sequence of the LS-SVM is a short-memory time series, indicating that there is a nonlinear cointegration relationship among the sequences, and f(·) is the nonlinear cointegration function of the sequences under test.
Step 4. Substitute the obtained nonlinear cointegration function f(·) into formula (3) and use the least squares method to estimate the parameters with Eviews software.
Substitute the estimated parameters into the prediction model (4); the calculation result is the predicted value.
5. Empirical Analysis
The empirical analysis takes as samples the 22 groups of data for 2000∼2021 on the Ship Maintenance Price Index (SMPI), Consumer Price Index (CPI), Producer Price Index (PPI), and Materials, Fuel, and Power Purchasing Price Index (MPI), with each index set to 100 in 2000. The trend of each index is shown in Figure 2.

By using the LS-SVM optimized by PSO, we test the nonlinear cointegration relationship among SMPI, CPI, PPI, and MPI, then establish the NECM prediction model of SMPI according to the nonlinear cointegration function f().
5.1. Judgment of the Type of Cointegration Relationship
5.1.1. Sequence Memory Test and Calculation Method of Integer Order
The basic calculation steps for using the R/S test to judge sequence memory are as follows.
For a time series {xt, t = 1, 2, …, n}, define the following quantities.
The sample mean is

x̄n = (1/n) Σ_{t=1}^{n} xt. (9)

The sample variance is

Sn² = (1/n) Σ_{t=1}^{n} (xt − x̄n)². (10)

The accumulated deviation is

X(t) = Σ_{j=1}^{t} (xj − x̄n), t = 1, 2, …, n. (11)

The range is

Rn = max_{1≤t≤n} X(t) − min_{1≤t≤n} X(t). (12)

Then the estimated R/S statistic is

Qn = Rn/Sn. (13)
In the presence of short memory and heteroscedasticity, the R/S statistic is not robust. Lo [37] proposed a modified R/S statistic

Qn(q) = Rn/σ̂n(q), (14)

where

σ̂n²(q) = Sn² + (2/n) Σ_{j=1}^{q} ωj(q) Σ_{t=j+1}^{n} (xt − x̄n)(xt−j − x̄n), ωj(q) = 1 − j/(q + 1), q < n. (15)
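Both statistics can be computed directly from their definitions. The following sketch implements the classical and modified R/S statistics; the lag window q is left to the caller, and the test series is an illustrative white-noise sample:

```python
import numpy as np

def rs_classic(x):
    """Classical R/S statistic Q_n = R_n / S_n."""
    x = np.asarray(x, float)
    dev = np.cumsum(x - x.mean())          # accumulated deviations
    return (dev.max() - dev.min()) / x.std()

def rs_modified(x, q):
    """Lo's modified R/S: the range is scaled by a long-run standard
    deviation that adds Bartlett-weighted autocovariances up to lag q."""
    x = np.asarray(x, float)
    n = len(x)
    xc = x - x.mean()
    dev = np.cumsum(xc)
    R = dev.max() - dev.min()
    s2 = np.dot(xc, xc) / n                # sample variance
    for j in range(1, q + 1):
        w = 1.0 - j / (q + 1.0)            # Bartlett weight
        s2 += 2.0 * w * np.dot(xc[j:], xc[:-j]) / n
    return R / np.sqrt(s2)

# For i.i.d. noise the two statistics nearly coincide
rng = np.random.default_rng(3)
x = rng.standard_normal(1000)
q_classic, q_mod = rs_classic(x), rs_modified(x, 5)
```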
For a time series {xt, t = 1, 2, …, n}, take subseries of m observations each, dividing {xt} into l = [n/m] independent subseries of length m. Calculate the modified R/S test statistic Qmi (i = 1, 2, …, l) for each subseries, and then average the l statistics Qmi to obtain the modified R/S test statistic Qm of the time series at length m.
For different values of m, a sequence of modified R/S statistics is obtained, and Mandelbrot proved that

Qm ∼ C·m^H, (16)

where C is a constant and H is the Hurst exponent. Taking the logarithm of both sides gives

log Qm = log C + H log m. (17)
By least squares regression of log Qm on log m, H can be obtained. The relationship between d and H is H = d + 1/2; thus an estimate of the fractional order d is obtained.
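The aggregation-and-regression procedure for H (and hence d = H − 1/2) can be sketched as follows. The block lengths and the white-noise test series are illustrative choices, and the classical R/S statistic is used for simplicity:

```python
import numpy as np

def rs(x):
    """Classical R/S statistic of one block."""
    dev = np.cumsum(x - x.mean())
    return (dev.max() - dev.min()) / x.std()

def hurst_rs(x, block_lengths):
    """Estimate H by regressing log(average R/S over blocks of length m)
    on log m, as in the aggregation procedure described above."""
    x = np.asarray(x, float)
    log_q = []
    for m in block_lengths:
        l = len(x) // m                          # l = [n/m] blocks
        qs = [rs(x[i * m:(i + 1) * m]) for i in range(l)]
        log_q.append(np.log(np.mean(qs)))
    H = np.polyfit(np.log(block_lengths), log_q, 1)[0]
    return H, H - 0.5                            # Hurst exponent and d = H - 1/2

# White noise has no long memory, so H should be near 0.5 and d near 0
rng = np.random.default_rng(2)
H, d = hurst_rs(rng.standard_normal(4096), [16, 32, 64, 128, 256])
```

Note that the classical R/S estimate of H is known to be biased upward for short blocks, which is one motivation for Lo's modified statistic.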
5.1.2. Cointegration Relationship Types among SMPI, CPI, PPI, and MPI
According to the above calculation methods of sequence memory and fractional order, the memory of SMPI, CPI, PPI, and MPI in the sample interval is tested. The modified R/S statistic of each index is shown in Table 1.
From the test results in Table 1, the modified R/S statistics of SMPI, CPI, PPI, and MPI are all far greater than 3; these series therefore have significant long-memory characteristics and belong to the class of long-memory time series. Next, the fractional order of each index is calculated, taking SMPI as an example.
Setting m to 1, 2, 3, 4, and 5 in turn, the statistics corresponding to SMPI are shown in Table 2.
The modified R/S statistics and the corresponding m values of SMPI are substituted into Equation (17), and H = 0.804 is obtained by least squares regression. Therefore, the fractional order of SMPI is d = H − 0.5 = 0.304.
By using the same method and steps, the fractional order of CPI, PPI, and MPI shall be calculated separately. The results are shown in Table 3.
According to the statistic test results of SMPI, CPI, PPI, and MPI, and the calculation of the fractional order of each index, we find out that each index belongs to the long memory time series, and the fractional order of each index is not the same. Therefore, there is no linear cointegration relationship among SMPI, CPI, PPI, and MPI. Whether there is a nonlinear cointegration relationship among them requires further testing.
5.2. Estimation and Testing of Nonlinear Cointegration Relationships
According to the steps of the nonlinear cointegration test described in Section 4, this article selects the 18 groups of data (SMPI, CPI, PPI, and MPI) from 2000 to 2017 as training samples to estimate the nonlinear functional relationship and optimize the LS-SVM parameters. Then the 4 groups of data (SMPI, CPI, PPI, and MPI) from 2018 to 2021 are used as test samples and substituted into the optimized LS-SVM model by rolling verification. The modified R/S statistics are estimated for the output sequences of lengths 19, 20, 21, and 22 to test whether they are short-memory sequences.
In the training and verification of the LS-SVM model, the input variables are SMPI, CPI, PPI, and MPI, whose minimum value is 100 (the benchmark value in 2000). Among these variables, SMPI takes the largest values, but its maximum is less than 600. Allowing headroom for the range that predictions may cover, this article sets the maximum value of the input variables to 1000; therefore, the range of the input variables of the LS-SVM model is [100, 1000]. Since it is necessary to determine whether there is a nonlinear cointegration relationship among the input variables, the output variable of the LS-SVM model is set as an SMM sequence with constant mean, in accordance with the definition of nonlinear cointegration.
5.2.1. Model Parameter Setting
Define the key parameters of the PSO algorithm: the number of particles is 20; the particle length is 2, with the parameters to be optimized being γ and σ2; the search range of γ is (0, 40000) and that of σ2 is (0, 10000); the learning factors c1 and c2 are both 2; the maximum number of iterations is 5000; the initial inertia weight is 0.9, decreasing linearly to 0.4 as the iterations proceed; the maximum velocity Vmax is 10. The position and velocity of each particle are initialized randomly, namely at t = 0, Xi(0) = (γi(0), σi2(0)), i = 1, 2, …, 20. Since the PSO algorithm is a random search algorithm, this article runs the above configuration 30 times and takes the run with the smallest error as the final training result.
5.2.2. Model Optimization Results
According to the above settings, the optimal parameters are calculated as γ = 34369.670 and σ2 = 5.912. The final error with respect to the target value is −0.0251, and the average error is −0.0268. The iteration history and the particle distribution at the final iteration are shown in Figures 3 and 4, respectively.


5.2.3. Nonlinear Relationships Test
The test samples from 2018 to 2021 were substituted into the LS-SVM model optimized by the PSO algorithm, and the modified R/S statistic was used to perform the long-memory test on the 4 output sequences. The statistical test results are shown in Table 4.
It can be seen from Table 4 that the test results are not significant, which shows that the four output sequences are short-memory sequences and that the nonlinear function determined by the LS-SVM model is the nonlinear cointegration function of SMPI with CPI, PPI, and MPI.
5.3. Prediction of SMPI Based on NECM Model
Based on the test results of nonlinear cointegration relationship among SMPI, CPI, PPI, and MPI, this article establishes the NECM prediction model of SMPI.
5.3.1. Model Setting
Suppose that Xt = (SMPIt, CPIt, PPIt, MPIt)T and let f(Xt) be the nonlinear cointegration function among SMPI, CPI, PPI, and MPI. The model is defined as

ΔSMPIt = c + Σ_{j=1}^{k1} aj ΔSMPIt−j + Σ_{j=1}^{k2} bj ΔCPIt−j + Σ_{j=1}^{k3} cj ΔPPIt−j + Σ_{j=1}^{k4} dj ΔMPIt−j + λ f(Xt−k5) + εt. (18)
According to the AIC and SC criteria, the lag order in formula (18) is determined as k1 = 1, k2 = 3, k3 = k4 = 1. From the properties of the NECM model, it can be known that the estimation of the nonlinear function f(Xt) in equation (18) is directly related to the time of the system. As t is different, the estimation of the nonlinear cointegration function is also different. The change of SMPI at time t, ΔSMPIt, is affected by the estimation result of the system deviation from equilibrium value f(X(t-1)) at time t-1. Therefore, the lag order k5 in the nonlinear function estimation is 1.
5.3.2. Estimation of Model Parameters
To predict SMPI based on the NECM model, it is necessary to estimate f(Xt) at different times, substitute the estimate into equation (18), and then estimate the parameters of the equation. Therefore, this article again uses the data from 2000 to 2017 as training samples to estimate the nonlinear functional relationship with the LS-SVM model optimized by PSO, takes the output as the estimate of f(X(2017)), and applies the least squares method to estimate equation (18), obtaining ΔSMPI2018 for 2018. The predicted value of SMPI for 2018 is then obtained from SMPIt = SMPIt−1 + ΔSMPIt, and the error against the actual value is analyzed. Along the same lines, the SMPI for 2019–2021 is predicted and compared with the actual values using rolling forecast verification. In constructing each ΔSMPIt prediction model from 2018 to 2021, the LS-SVM model is optimized by the PSO algorithm, the nonlinear cointegration function among SMPI, CPI, PPI, and MPI is estimated, and the parameters γ and σ2 of the LS-SVM model are obtained. The results are shown in Table 5.
With the LS-SVM optimized by PSO for each group, we obtain the sequences f(X(t)) (t = 2017, 2018, 2019, 2020). The historical change values of each index and these f(X(t)) sequences were substituted into Equation (18), and least squares estimation of the model parameters was carried out using Eviews software. Taking ΔSMPI2018 as an example, the parameter estimation results are shown in Table 6.
5.3.3. Analysis and Comparison of Model Prediction Results
According to the parameter estimation results in Table 6, formula (18) gives ΔSMPI2018 = -7.469. Substituting this result and the 2017 ship maintenance cost index into SMPIt = SMPIt-1 + ΔSMPIt yields SMPI2018 = 575.235, an error of 0.26% relative to the actual 2018 ship maintenance cost index. In the same way, we obtain the predicted SMPI for 2019 to 2021 and the corresponding errors. The prediction results and errors of the NECM are shown in Table 7.
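The arithmetic of this step is the level recursion itself; the short sketch below reproduces it. The 2017 level is the one implied by the figures quoted above (575.235 - (-7.469)), and `rel_error` shows how the percentage errors in Table 7 are computed against the realized index values.

```python
d_smpi_2018 = -7.469                    # ΔSMPI_2018 from the fitted NECM
smpi_2017 = 582.704                     # 2017 level implied by 575.235 - (-7.469)
smpi_2018 = smpi_2017 + d_smpi_2018     # level recursion SMPI_t = SMPI_{t-1} + ΔSMPI_t

def rel_error(pred, actual):
    """Percentage error of a prediction against the realized index value."""
    return abs(pred - actual) / actual * 100
```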
To benchmark the SMPI predictions of the NECM, this article constructs a linear VAR model of SMPI, CPI, PPI, and MPI on samples from 2000 to 2021 and compares its predictions with those of the NECM. The VAR model is defined as follows:
The lag order of each variable in formula (19) is set in the same way as in formula (18). The samples from 2000 to 2017 are substituted into equation (19), and the model parameters are estimated by the least squares method. Based on these estimates, we predict the SMPI from 2018 to 2021. The prediction results and errors of the VAR model are shown in Table 7.
As can be seen from Table 7, compared with the SMPI predictions of the NECM, the VAR model performs poorly and fails to predict effectively, which also confirms that the linear relationship among SMPI, CPI, PPI, and MPI is not significant.
Next, based on the conclusion that there is a nonlinear relationship among SMPI, CPI, PPI, and MPI, we use the wavelet neural network algorithm proposed in [7] to study the nonlinear cointegration relationship, establish the corresponding NECM to predict SMPI, and obtain the prediction results for 2018 to 2021, which are shown in Table 7. In the course of this study, the computation of the wavelet neural network was found to be more complex than that of the LS-SVM. Moreover, because of the small number of samples in this paper (about 480 training samples were used in [7]), the wavelet neural network overfits, and its accuracy in predicting the future trend is weaker than that of the model proposed in this article.
6. Conclusion
In this article, the small sample nonlinear cointegration test and the NECM based on the PSO-optimized LS-SVM were studied, and the logical process of the method was designed. Empirical research was then carried out on the SMPI and several price indexes. Based on the judgment of the type of cointegration relationship, the nonlinear cointegration test for small samples was realized, and the NECM for predicting the SMPI was established and compared with a linear VAR model. The results show that the small sample nonlinear cointegration test and modeling method based on the PSO-optimized LS-SVM can describe the nonlinear cointegration relationship of a small sample system, and that the NECM performs better and can effectively predict small sample nonlinear systems. We also compared the prediction results with the wavelet neural network algorithm; the comparison shows that the PSO-optimized LS-SVM generalizes better and achieves higher prediction accuracy on small samples.
Data Availability
The data presented in this study are available upon reasonable request from the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This research was funded by the National Social Science Foundation of China with grant nos. 17BJY028 and 19CGL073.