Abstract

Nonlinear autoregressive models with normal innovations are commonly used for nonlinear time series analysis in various fields. However, using this class of models for skewed data leads to unreliable results, due to the inability of these models to capture skewness. In this setting, replacing the normality assumption with a more flexible distribution that can accommodate skewness provides more reliable results. In this article, we propose a partially linear autoregressive model with skew normal innovations, in both the independent and the dependent case. A semiparametric approach for estimating the nonlinear part of the regression function is proposed, based on the conditional least squares approach and the nonparametric kernel method. Then, the conditional maximum-likelihood approach is used to estimate the unknown parameters through the expectation-maximization (EM) algorithm. Some asymptotic properties of the semiparametric method are established. Finally, the performance of the proposed model is verified through simulation studies and the analysis of a real dataset.

1. Introduction

One of the most widely used classes of models for time series analysis is the class of autoregressive models. The normality of innovations is a common assumption for autoregressive models. However, such an assumption may be unrealistic in many empirical situations. In recent years, more attention has focused on nonnormal innovations rather than normal innovations. Tarami and Pourahmadi [1] considered multivariate autoregressive processes with the t distribution for modeling volatile time series data. Jacobs and Lewis [2] analyzed an autoregressive model with nonnormal innovations. Ghasemi et al. [3] considered autoregressive models with generalized hyperbolic innovations. The most important limitation of the normal distribution is that it cannot model skewness. In this article, we consider the skew normal (SN) distribution (Azzalini [4]) for modeling the uncertainty of the innovations in time series analysis. This skew-symmetric distribution is a generalization of the normal distribution, which enables it to model asymmetric as well as symmetric data. A non-Gaussian autoregressive model with epsilon-SN innovations was considered by Bondon [5]. Sharafi and Nematollahi [6] introduced an autoregressive model of order one with SN innovations and proposed some methods for parameter estimation.

Many researchers have recently studied nonlinear autoregressive models in various fields of science. For example, Tsay [7]; Farnoosh and Mortazavi [8]; Hajrajabi and Mortazavi [9]; Farnoosh et al. [10] and Ortega Contreras et al. [11] considered nonlinear time series models and analyzed various datasets. Farnoosh and Mortazavi [8] considered a Gaussian first-order nonlinear autoregressive model with dependent innovations to estimate the yearly amount of deposits in Iran's Tejarat-Bank. Hajrajabi and Mortazavi [9] proposed a nonlinear autoregressive model with SN innovations and presented asymptotic behaviors of the estimators. Tong [12] and Haggan and Ozaki [13] investigated nonlinear models for modeling sound vibrations. A class of nonlinear additive autoregressive models with exogenous variables for analyzing nonlinear time series was proposed by Chen and Tsay [14]. In addition, the estimation of the autoregression function in nonlinear autoregressive models through semiparametric methodology has attracted interest in the literature. Fan and Yao [15] discussed modern parametric and nonparametric approaches for estimating nonlinear models. Zhuoxi et al. [16] proposed a semiparametric approach for a nonlinear autoregressive model, assuming normally distributed innovations with mean zero and fixed variance. Hajrajabi and Fallah [17] followed Zhuoxi et al. [16] by assuming SN innovations. Farnoosh et al. [10] studied a partially linear autoregressive model with independent innovations with mean zero and fixed variance. To estimate the regression function, similarly to Zhuoxi et al. [16], they used a semiparametric approach.

This article aims at developing a partially linear autoregressive model with SN innovations for both independent and dependent innovations. This model is an extension of the model proposed by Farnoosh et al. [10]. The estimation of our proposed model consists of two parts. In the first part, we use a semiparametric approach consisting of a parametric estimation and a nonparametric adjustment, introduced by Zhuoxi et al. [16]. For the parametric estimation, the conditional least squares approach is used to estimate the model's nonlinear function. Also, the smooth kernel approach is used to estimate the nonparametric adjustment. The second part is to compute the conditional maximum-likelihood (CML) estimators of the parameters using the EM algorithm. We also derive closed iterative forms for the CML estimators of the parameters.

The plan of the article proceeds as follows: Section 2 briefly reviews the properties of the SN distribution. In Section 3, the SN partially linear autoregressive models with independent and dependent innovations are introduced. This section also shows how to estimate the nonlinear part of the models using the semiparametric approach. In Section 4, the CML estimation of the model parameters via the EM algorithm is discussed. The performance of the suggested methods is investigated by simulation in Section 5. A real dataset is also considered in this section to illustrate the applicability of the proposed models. Finally, conclusions are provided in Section 6. Some asymptotic behaviors of the estimators are given in the Appendix.

2. A Brief Introduction to the SN Distribution

Let $Y$ be a random variable with the univariate SN distribution, denoted by $Y \sim SN(\mu, \sigma^2, \lambda)$, where $\mu \in \mathbb{R}$, $\sigma > 0$, and $\lambda \in \mathbb{R}$ indicate the location, scale, and skewness parameters, respectively. Then, the density function of $Y$ is
$$f(y) = \frac{2}{\sigma}\,\phi\left(\frac{y-\mu}{\sigma}\right)\Phi\left(\lambda\,\frac{y-\mu}{\sigma}\right),\qquad y\in\mathbb{R},$$
where $\phi(\cdot)$ is the density function of the standard normal distribution and $\Phi(\cdot)$ is its cumulative distribution function. Throughout, we write $\delta = \lambda/\sqrt{1+\lambda^2}$ and $b = \sqrt{2/\pi}$.
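For reference, this density can be coded directly; the following R sketch (function and argument names are ours, not from the article) evaluates it and numerically checks that it integrates to one:

```r
# Density of SN(mu, sigma^2, lambda): (2/sigma) * phi(z) * Phi(lambda * z),
# with z = (y - mu) / sigma.
dsn <- function(y, mu = 0, sigma = 1, lambda = 0) {
  z <- (y - mu) / sigma
  (2 / sigma) * dnorm(z) * pnorm(lambda * z)
}

# Sanity check: the density should integrate to 1 for any lambda.
integrate(dsn, lower = -Inf, upper = Inf, mu = 1, sigma = 2, lambda = 3)$value
```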

Lemma 1. If $Y \sim SN(\mu, \sigma^2, \lambda)$, then
(a) $E(Y) = \mu + b\delta\sigma$,
(b) $\mathrm{Var}(Y) = \sigma^2\left(1 - b^2\delta^2\right)$,
(c) $\gamma_1 = \dfrac{4-\pi}{2}\,\dfrac{(b\delta)^3}{\left(1-b^2\delta^2\right)^{3/2}}$,
(d) $\gamma_2 = 2(\pi-3)\,\dfrac{(b\delta)^4}{\left(1-b^2\delta^2\right)^{2}}$,
where $b = \sqrt{2/\pi}$, $\delta = \lambda/\sqrt{1+\lambda^2}$, and $\gamma_1$ and $\gamma_2$ are the coefficients of skewness and kurtosis, respectively.
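These moments can be coded directly; a minimal R helper consistent with Lemma 1 under the parametrization above (naming ours):

```r
b <- sqrt(2 / pi)

# Moments of SN(mu, sigma^2, lambda) according to Lemma 1.
sn_moments <- function(mu, sigma, lambda) {
  delta <- lambda / sqrt(1 + lambda^2)
  c(mean = mu + sigma * b * delta,                                     # (a)
    var  = sigma^2 * (1 - b^2 * delta^2),                              # (b)
    skew = (4 - pi) / 2 * (b * delta)^3 / (1 - b^2 * delta^2)^(3 / 2), # (c)
    kurt = 2 * (pi - 3) * (b * delta)^4 / (1 - b^2 * delta^2)^2)       # (d)
}

sn_moments(0, 1, 3)
```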

Lemma 2. From [18] (Theorem 1) and [19] (p. 201), if $T$ is a standard half-normal random variable, $W \sim N(0,1)$, and $T$ and $W$ are independent, then
$$Y = \mu + \sigma\delta\,T + \sigma\sqrt{1-\delta^{2}}\,W$$
is distributed as $SN(\mu, \sigma^{2}, \lambda)$. Also, the joint density of $Y$ and $T$ is
$$f(y,t) = \frac{2}{\sigma\sqrt{1-\delta^{2}}}\,\phi\left(\frac{y-\mu-\sigma\delta t}{\sigma\sqrt{1-\delta^{2}}}\right)\phi(t),\qquad t>0.$$
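This stochastic representation gives a convenient way to simulate SN variates; an R sketch of a generator based on it (the name `rsn` is ours and unrelated to the `sn` package):

```r
# Draws from SN(mu, sigma^2, lambda) via the convolution representation of Lemma 2.
rsn <- function(n, mu = 0, sigma = 1, lambda = 0) {
  delta <- lambda / sqrt(1 + lambda^2)
  t0 <- abs(rnorm(n))   # half-normal latent variable T
  w  <- rnorm(n)        # independent standard normal W
  mu + sigma * (delta * t0 + sqrt(1 - delta^2) * w)
}

set.seed(1)
y <- rsn(1e5, mu = 0, sigma = 1, lambda = 3)
c(mean(y), var(y))      # compare with sn_moments(0, 1, 3)
```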

Lemma 3. Suppose $Y \sim SN(\mu, \sigma^{2}, \lambda)$, $T$ is the half-normal variable of Lemma 2, and $\eta$ is defined as $\eta = \delta(y-\mu)/\sigma$, with $\tau = \sqrt{1-\delta^{2}}$; then
$$E\left(T \mid Y = y\right) = \eta + \tau\,\frac{\phi(\eta/\tau)}{\Phi(\eta/\tau)},$$
and also
$$E\left(T^{2} \mid Y = y\right) = \eta^{2} + \tau^{2} + \eta\,\tau\,\frac{\phi(\eta/\tau)}{\Phi(\eta/\tau)}.$$
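Because $T \mid Y = y$ is a normal distribution truncated to $(0, \infty)$, both conditional moments reduce to inverse-Mills-ratio formulas. A small R sketch, reused in the E step of Section 4 (naming ours):

```r
# Conditional moments of T given Y = y (Lemma 3): T | Y = y is normal with
# mean eta and standard deviation tau, truncated to (0, Inf).
latent_moments <- function(y, mu, sigma, lambda) {
  delta <- lambda / sqrt(1 + lambda^2)
  eta <- delta * (y - mu) / sigma
  tau <- sqrt(1 - delta^2)
  r   <- dnorm(eta / tau) / pnorm(eta / tau)   # inverse Mills ratio
  list(t1 = eta + tau * r,                     # E(T   | Y = y)
       t2 = eta^2 + tau^2 + eta * tau * r)     # E(T^2 | Y = y)
}
```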

3. Semiparametric Approach in the Proposed Model

In this section, we consider partially linear autoregressive models of the following forms.

3.1. Model with Independent Innovations (Model I)

Consider the following model:
$$y_t = \alpha y_{t-1} + f(y_{t-1}) + \varepsilon_t,\qquad t = 1, \ldots, n,$$
where $\varepsilon_t \overset{iid}{\sim} SN(0, \sigma^{2}, \lambda)$, $f(\cdot)$ is a nonlinear autoregressive function, and $\alpha$ is an unknown parameter. Also, $\varepsilon_t$ and $y_{t-1}$ are independent for each $t$.

At first, we estimate $f$ by an initial guess $f_{\theta}$, a known function of $y_{t-1}$ and the parameter $\theta$. The parameters can be estimated by using the conditional nonlinear least squares errors (CNLSE) method based on the data $y_1, \ldots, y_n$ as follows:
$$\left(\hat\alpha, \hat\theta\right) = \operatorname*{arg\,min}_{\alpha,\,\theta}\ \sum_{t=2}^{n}\left(y_t - \alpha y_{t-1} - f_{\theta}(y_{t-1}) - \mu_{\varepsilon}\right)^{2},$$
where $\mu_{\varepsilon} = E(\varepsilon_t) = b\delta\sigma$ (see Lemma 1).
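A minimal R sketch of this CNLSE step, under the Model I form assumed above and treating the innovation mean as a free intercept; `f_theta` is a user-supplied parametric guess family, and all names are ours:

```r
# CNLSE for Model I (sketch): minimize the conditional sum of squares over
# alpha, the innovation mean mu_eps, and the guess parameters theta.
cnlse <- function(y, f_theta, start) {
  yl <- y[-length(y)]   # y_{t-1}
  yc <- y[-1]           # y_t
  obj <- function(p) {
    alpha <- p[1]; mu_eps <- p[2]; theta <- p[-(1:2)]
    sum((yc - alpha * yl - f_theta(yl, theta) - mu_eps)^2)
  }
  optim(start, obj)     # start = c(alpha0, mu0, theta0)
}

# Example of a (hypothetical) guess family:
# f_theta <- function(x, th) th[1] * exp(-th[2] * x^2)
```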

We use the semiparametric form $f(x) = f_{\hat\theta}(x)\,\xi(x)$ to adjust the initial approximation, where $\xi(x)$ denotes a nonparametric adjustment.

For estimating $\xi(x)$, we use the following local $L_2$-fitting criterion:
$$Q(\xi) = \sum_{t=2}^{n}\left(f(y_{t-1}) - f_{\hat\theta}(y_{t-1})\,\xi\right)^{2} K\left(\frac{x - y_{t-1}}{h}\right),$$
where $K(\cdot)$ and $h$ are the kernel function and bandwidth, respectively. We get the estimator of $\xi(x)$ by minimizing this criterion with respect to $\xi$ as follows:
$$\hat\xi(x) = \frac{\sum_{t=2}^{n} f(y_{t-1})\, f_{\hat\theta}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}{\sum_{t=2}^{n} f_{\hat\theta}^{2}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)},$$
and the estimator of the autoregression function of the model is $\hat f(x) = f_{\hat\theta}(x)\,\hat\xi(x)$.

However, the function $f$ in the above expression for $\hat\xi(x)$ is unknown. Therefore, by using
$$f(y_{t-1}) = y_t - \alpha y_{t-1} - \varepsilon_t$$
and regarding the fact that $\varepsilon_t - E(\varepsilon_t)$ are small values, one can obtain
$$\hat\xi(x) = \frac{\sum_{t=2}^{n}\left(y_t - \hat\alpha y_{t-1} - \hat\mu_{\varepsilon}\right) f_{\hat\theta}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}{\sum_{t=2}^{n} f_{\hat\theta}^{2}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}.$$

Finally, the estimator of $f(x)$ is $\hat f(x) = f_{\hat\theta}(x)\,\hat\xi(x)$.
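Putting the pieces together, the computable adjustment and the resulting semiparametric estimator can be written compactly; an R sketch under the assumptions above, with a Gaussian kernel as an illustrative choice (all names ours; here `f_theta` denotes the already-fitted guess $f_{\hat\theta}$):

```r
# Computable adjustment xi_hat(x) for Model I: kernel-weighted ratio.
xi_hat <- function(x, y, alpha_hat, f_theta, mu_eps, h) {
  yl <- y[-length(y)]                  # y_{t-1}
  yc <- y[-1]                          # y_t
  w  <- dnorm((x - yl) / h)            # K((x - y_{t-1}) / h), Gaussian kernel
  fg <- f_theta(yl)                    # fitted parametric guess at the lags
  sum((yc - alpha_hat * yl - mu_eps) * fg * w) / sum(fg^2 * w)
}

# Semiparametric estimate of the autoregression function at x.
f_hat <- function(x, y, alpha_hat, f_theta, mu_eps, h) {
  f_theta(x) * xi_hat(x, y, alpha_hat, f_theta, mu_eps, h)
}
```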

3.2. Model with Dependent Innovations (Model II)

By considering the partially linear autoregressive model of Model (I) with dependent innovations following a first-order autoregressive AR(1) structure, we have
$$y_t = \alpha y_{t-1} + f(y_{t-1}) + \varepsilon_t,\qquad \varepsilon_t = \rho\,\varepsilon_{t-1} + Z_t,$$
where $|\rho| < 1$, $Z_t \overset{iid}{\sim} SN(0, \sigma^{2}, \lambda)$, and $f(\cdot)$ is a nonlinear autoregression function similar to Model (I). Also, $Z_t$ and $y_{t-1}$ are independent for each $t$.

From the model above, we can write $\varepsilon_{t-1} = y_{t-1} - \alpha y_{t-2} - f(y_{t-2})$, and, therefore,
$$y_t = \alpha y_{t-1} + f(y_{t-1}) + \rho\left(y_{t-1} - \alpha y_{t-2} - f(y_{t-2})\right) + Z_t.$$

We want to estimate the unknown regression functions $f(y_{t-1})$ and $f(y_{t-2})$, which can be formed as $f_{\theta}(y_{t-1})\,\xi(y_{t-1})$ and $f_{\theta}(y_{t-2})\,\xi(y_{t-2})$, respectively. As shown in Section 3.1, to estimate the parameters, we can apply the CNLSE approach as follows:
$$\left(\hat\alpha, \hat\rho, \hat\theta\right) = \operatorname*{arg\,min}_{\alpha,\,\rho,\,\theta}\ \sum_{t=3}^{n}\left(y_t - \alpha y_{t-1} - f_{\theta}(y_{t-1}) - \rho\left(y_{t-1} - \alpha y_{t-2} - f_{\theta}(y_{t-2})\right) - \mu_{Z}\right)^{2},$$
where $\mu_{Z} = E(Z_t) = b\delta\sigma$ (see Lemma 1).

By applying the same idea as in Model (I), the local $L_2$-fitting criterion is
$$Q(\xi) = \sum_{t=3}^{n}\left(f(y_{t-1}) - f_{\hat\theta}(y_{t-1})\,\xi\right)^{2} K\left(\frac{x - y_{t-1}}{h}\right).$$

By minimizing this criterion with respect to $\xi$, one can write
$$\hat\xi(x) = \frac{\sum_{t=3}^{n} f(y_{t-1})\, f_{\hat\theta}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}{\sum_{t=3}^{n} f_{\hat\theta}^{2}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}.$$

Unfortunately, this expression includes the unknown function $f$. Therefore, by using
$$f(y_{t-1}) = y_t - \alpha y_{t-1} - \rho\left(y_{t-1} - \alpha y_{t-2} - f(y_{t-2})\right) - Z_t$$
and regarding the fact that $Z_t - E(Z_t)$ are small values, we have
$$\hat\xi(x) = \frac{\sum_{t=3}^{n}\left(y_t - \hat\alpha y_{t-1} - \hat\rho\left(y_{t-1} - \hat\alpha y_{t-2} - f_{\hat\theta}(y_{t-2})\right) - \hat\mu_{Z}\right) f_{\hat\theta}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}{\sum_{t=3}^{n} f_{\hat\theta}^{2}(y_{t-1})\, K\left(\frac{x-y_{t-1}}{h}\right)}.$$

Finally, the estimator of $f(x)$ is obtained as $\hat f(x) = f_{\hat\theta}(x)\,\hat\xi(x)$.
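The Model II analogue only changes the residual to the $\rho$-transformed one; an R sketch under the form assumed above (names ours; `f_theta` again denotes the fitted guess):

```r
# Adjustment xi_hat(x) for Model II: residuals from the rho-transformed equation.
xi_hat2 <- function(x, y, alpha_hat, rho_hat, f_theta, mu_z, h) {
  n  <- length(y)
  y0 <- y[3:n]; y1 <- y[2:(n - 1)]; y2 <- y[1:(n - 2)]  # y_t, y_{t-1}, y_{t-2}
  res <- y0 - alpha_hat * y1 -
         rho_hat * (y1 - alpha_hat * y2 - f_theta(y2)) - mu_z
  w  <- dnorm((x - y1) / h)          # Gaussian kernel weights
  fg <- f_theta(y1)
  sum(res * fg * w) / sum(fg^2 * w)
}
```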

4. Conditional Maximum-Likelihood Estimation

There are several methods for estimating the parameters in time series models. This article implements the CML estimation method. The conditional likelihood function of Model (II), given the observed data, is defined by
$$L(\Theta \mid y) = \prod_{t=3}^{n} \frac{2}{\sigma}\,\phi\left(\frac{y_t-\mu_t}{\sigma}\right)\Phi\left(\lambda\,\frac{y_t-\mu_t}{\sigma}\right),$$
where $\mu_t = \alpha y_{t-1} + f(y_{t-1}) + \rho\left(y_{t-1} - \alpha y_{t-2} - f(y_{t-2})\right)$ and $\Theta$ is the unknown parameters vector. Since this likelihood function is complicated, we need a computational approach to maximize it. Therefore, an EM algorithm is developed to calculate the ML estimates of the parameters. To do this, we consider the missing data problem. By defining latent half-normal variables $T_t$, $t = 3, \ldots, n$, as in the stochastic representation of Lemma 2, the conditional distribution of the observation $y_t$ in Model (II) is given by
$$y_t \mid T_t = t_t \sim N\left(\mu_t + \sigma\delta\, t_t,\; \sigma^{2}\left(1-\delta^{2}\right)\right).$$

Let $y = (y_3, \ldots, y_n)$ and $T = (T_3, \ldots, T_n)$ be the incomplete and missing data, respectively. Using Lemma 2, the joint density function of the complete data $(y, T)$ can be written as
$$f(y, T) = \prod_{t=3}^{n} \frac{2}{\sigma\sqrt{1-\delta^{2}}}\,\phi\left(\frac{y_t - \mu_t - \sigma\delta T_t}{\sigma\sqrt{1-\delta^{2}}}\right)\phi(T_t).$$

Therefore, the complete data likelihood and log-likelihood functions are, respectively,
$$L_c(\Theta \mid y, T) = \prod_{t=3}^{n} \frac{2}{\sigma\sqrt{1-\delta^{2}}}\,\phi\left(\frac{y_t - \mu_t - \sigma\delta T_t}{\sigma\sqrt{1-\delta^{2}}}\right)\phi(T_t),$$
$$\ell_c(\Theta \mid y, T) = \mathrm{const} - \frac{n-2}{2}\log\left(\sigma^{2}\left(1-\delta^{2}\right)\right) - \frac{1}{2\sigma^{2}\left(1-\delta^{2}\right)}\sum_{t=3}^{n}\left(y_t - \mu_t - \sigma\delta T_t\right)^{2} - \frac{1}{2}\sum_{t=3}^{n}T_t^{2}.$$

The EM algorithm iterates between the E and M steps. The E step obtains the conditional expectation of the complete-data log-likelihood given the observed data and the current parameter values $\Theta^{(k)}$. Based on Lemma 3, we have
$$\hat t_t^{\,(k)} = E\left(T_t \mid y_t, \Theta^{(k)}\right) = \eta_t^{(k)} + \tau^{(k)}\,\frac{\phi\left(\eta_t^{(k)}/\tau^{(k)}\right)}{\Phi\left(\eta_t^{(k)}/\tau^{(k)}\right)},\qquad
\widehat{t^2_t}^{\,(k)} = E\left(T_t^{2} \mid y_t, \Theta^{(k)}\right) = \left(\eta_t^{(k)}\right)^{2} + \left(\tau^{(k)}\right)^{2} + \eta_t^{(k)}\,\tau^{(k)}\,\frac{\phi\left(\eta_t^{(k)}/\tau^{(k)}\right)}{\Phi\left(\eta_t^{(k)}/\tau^{(k)}\right)},$$
where $\eta_t^{(k)} = \delta^{(k)}\left(y_t - \mu_t^{(k)}\right)/\sigma^{(k)}$ and $\tau^{(k)} = \sqrt{1 - \left(\delta^{(k)}\right)^{2}}$.

Calculating the conditional expectation of the complete-data log-likelihood yields the Q-function
$$Q\left(\Theta \mid \Theta^{(k)}\right) = \mathrm{const} - \frac{n-2}{2}\log\left(\sigma^{2}\left(1-\delta^{2}\right)\right) - \frac{1}{2\sigma^{2}\left(1-\delta^{2}\right)}\sum_{t=3}^{n}\left[\left(y_t-\mu_t\right)^{2} - 2\sigma\delta\,\hat t_t^{\,(k)}\left(y_t-\mu_t\right) + \sigma^{2}\delta^{2}\,\widehat{t^2_t}^{\,(k)}\right],$$
where $\hat t_t^{\,(k)}$ and $\widehat{t^2_t}^{\,(k)}$ are the conditional expectations given above and $\mu_t$ is the conditional mean of Model (II).

The M step of the algorithm maximizes this Q-function with respect to $\Theta$.

Given the values of the parameters in the $k$th iteration, equating the first-order derivatives of the Q-function to zero and solving the resulting system of equations, the maximum-likelihood estimates of the model parameters in the $(k+1)$th iteration of the algorithm are obtained in closed form.
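Since the article's full M step (which also refreshes $\alpha$, $\rho$, and $\theta$) is not reproduced here, the sketch below illustrates one EM cycle for the innovation parameters only, with the conditional means held fixed; it follows the standard skew normal EM updates and builds on `latent_moments` from Lemma 3 (all names ours):

```r
# One EM cycle for (sigma, lambda) with the conditional means mu_t fixed --
# a simplified sketch, not the article's full update.
em_step <- function(y, mu_t, sigma, lambda) {
  e  <- y - mu_t                                 # current residuals
  mm <- latent_moments(y, mu_t, sigma, lambda)   # E step (Lemma 3)
  b1 <- sum(e * mm$t1) / sum(mm$t2)              # update of sigma * delta
  b2 <- mean(e^2 - 2 * b1 * e * mm$t1 + b1^2 * mm$t2)  # sigma^2 * (1 - delta^2)
  sigma2 <- b2 + b1^2                            # recover sigma^2
  delta  <- b1 / sqrt(sigma2)                    # recover delta, then lambda
  c(sigma = sqrt(sigma2), lambda = delta / sqrt(1 - delta^2))
}
```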

Then, updating the parameter $\theta$ through the CNLSE step of Section 3.2, with the current parameter estimates plugged in, yields $\hat\theta^{(k+1)}$.

Therefore, the nonparametric adjustment is updated as $\hat\xi^{(k+1)}(x)$, obtained by evaluating the expression for $\hat\xi(x)$ at the updated estimates.

Finally, the semiparametric estimation of the autoregression function in the $(k+1)$th iteration is $\hat f^{\,(k+1)}(x) = f_{\hat\theta^{(k+1)}}(x)\,\hat\xi^{(k+1)}(x)$.

As can be seen, there is no closed iterative form for the ML estimator of the skewness parameter $\lambda$. Therefore, an iterative procedure similar to Newton–Raphson should be employed to calculate the corresponding value of this parameter.

5. Simulation Study

5.1. Simulation Study 1

We present a simulation study to examine the performance of the suggested methods using the R programming environment. Two partially linear autoregressive models are considered, with independent and dependent innovations, respectively. The data were simulated from Model (II) with sample sizes n = 100, 200, and 400 and 500 Monte Carlo replications, for a fixed choice of the nonlinear function $f$ and the model parameters. Also, for choosing the bandwidth $h$, we perform an opening window method, that is, considering several bandwidths [20].
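For concreteness, data of the Model II type can be generated as follows; the nonlinear function and parameter values are illustrative placeholders of our own, not the article's settings (`rsn` is the generator sketched in Section 2):

```r
# Simulate from a partially linear AR model with AR(1) SN innovations.
sim_model2 <- function(n, alpha = 0.3, rho = 0.5, sigma = 1, lambda = 3,
                       f = function(x) 0.5 * exp(-x^2), burn = 200) {
  N   <- n + burn
  z   <- rsn(N, 0, sigma, lambda)   # SN innovations Z_t
  eps <- numeric(N)                 # AR(1) innovations: eps_t = rho*eps_{t-1} + Z_t
  y   <- numeric(N)
  for (t in 2:N) {
    eps[t] <- rho * eps[t - 1] + z[t]
    y[t]   <- alpha * y[t - 1] + f(y[t - 1]) + eps[t]
  }
  y[(burn + 1):N]                   # drop burn-in
}

y <- sim_model2(400)
```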

Table 1 presents some descriptive statistics for the simulated data from Model (II), together with the Kolmogorov–Smirnov (K–S) test. The test statistic and p-value reject the normality of the data.

The values of $f(x)$ and their semiparametric estimates from Model (II) are displayed in Figure 1(a) for the selected bandwidth. This figure shows that the semiparametric estimator of the autoregression function with AR(1) innovations performs well.

As can be seen from Figure 1(b), the estimated values of the data are close to the exact values. Figure 1(c) presents the autocorrelation function (ACF) of the residuals of the model with AR(1) innovations. This figure illustrates that the residuals of the model are uncorrelated.

Finally, we calculate the root mean squared error (RMSE) for comparing the efficiency and accuracy of the suggested partially linear autoregressive models as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat y_t\right)^{2}}.$$

Table 2 reports the RMSE of Model (I) and Model (II) for three values of the skewness parameter and different sample sizes n. As we can see from Table 2, the simulation results show that the model with AR(1) innovations performs better than the model with independent innovations. Also, the results show that the RMSE decreases as the sample size increases, for all values of the skewness parameter.

Also, the values of $f(x)$ and their semiparametric estimates under the two models above, with the selected bandwidth, are displayed in Figure 2(a). The solid line is the regression function $f$, and the blue and red lines are the semiparametric fitted functions of Models I and II, respectively.

Figure 2(b) presents the exact and estimated values of the data under Models I and II. The solid line corresponds to the exact values, and the blue and red lines correspond to the semiparametric estimates from Models I and II, respectively. We can see that the estimates from Model II are very close to the exact values.

5.2. Simulation Study 2

This section presents a simulation study to compare the performance of the proposed model with the normal model for modeling time series data with SN innovations. We consider a partially linear autoregressive model of the same form as Model (I), and two different choices of the nonlinear function are examined.

The datasets are generated with different sample sizes and different values of the skewness parameter. Also, the bandwidth is chosen by the opening window technique [20].
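One simple way to implement the opening window idea is to scan a grid of bandwidths and keep the one minimizing a fitted-model RMSE; a sketch with our own helper names, where `fit_rmse` stands for any routine that fits the model at bandwidth `h` and returns its RMSE:

```r
# Opening-window bandwidth choice (sketch): evaluate several bandwidths and
# keep the one with the smallest RMSE of the fitted model.
choose_h <- function(y, fit_rmse, h_grid = seq(0.1, 2, by = 0.1)) {
  rmse <- sapply(h_grid, function(h) fit_rmse(y, h))
  h_grid[which.min(rmse)]
}
```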

The RMSE values of this model under normal and SN innovations are presented in Tables 3 and 4.

The RMSE results show that the SN model has a better performance than the normal model for positive and negative skewness parameters. Also, there are no significant differences between the RMSE values of the normal and SN models when the skewness parameter is zero.

Figures 3(a) and 4(a) show the values of $f(x)$ and its semiparametric estimator under the normal and SN models, with the selected bandwidth and a fixed sample size, for the two functions. Also, the exact and estimated values of the data are shown in Figures 3(b) and 4(b). The figures show that the SN model performs better than the normal model.

6. Empirical Application

This section analyzes an application of the partially linear autoregressive model with SN innovations to a real-world dataset: the annual ring width (ARW) measurements of three Pinus eldarica trees randomly selected from a plantation at the Garagpas-Kelardasht site, located in the western part of Mazandaran province in the north of Iran. Farnoosh et al. [10] and Hajrajabi and Mortazavi [9] studied this dataset using different autoregressive models. Considering the proposed semiparametric approach, we estimate the autoregression function in two models (i.e., partially linear autoregressive models with AR(1) and independent innovations) for the ARW data from 1974 to 2008 under SN innovations.

Table 5 shows descriptive statistics for the ARW data. We can see that the data are skewed.

Tables 6 and 7 report the ML estimates and RMSE values for the models with independent and AR(1) innovations, respectively.

Figures 5(a) and 6(a) show the exact values of the ARW data and their estimates in the partially linear autoregressive models with independent and AR(1) innovations, respectively. For diagnostic checking of the fitted models, the residuals of the models are analyzed using ACF plots. Figures 5(b) and 6(b) show the ACF of the residuals in the partially linear autoregressive models with independent and AR(1) innovations, respectively. The residuals of the models are almost uncorrelated. As we can see from these figures, the model with AR(1) innovations fits the ARW data better than the model with independent innovations. Also, the comparison of the RMSE for the two discussed models shows that the suggested semiparametric approach for a partially linear autoregressive model with AR(1) innovations is more efficient.

7. Conclusion

This article suggested a partially linear autoregressive model with SN innovations from a semiparametric point of view. Both independent and dependent innovations were considered. The CNLSE approach and the local $L_2$-fitting criterion were used to estimate the regression function. For parameter estimation, we applied the CML method using the EM algorithm. The findings of the simulation studies indicated that the proposed model is quite flexible for modeling skewed data. Furthermore, the proposed semiparametric method showed that the partially linear autoregressive model under SN innovations is an efficient model for the ARW data of the Kelardasht site. The results of the study verified the effectiveness of the proposed model.

Appendix

A. The Asymptotic Behaviors of Estimators

To investigate the asymptotic behaviors of the estimators, we consider the following assumptions (A1)–(A12) of Farnoosh et al. [21]:

(A1) The autoregression function is Lipschitz continuous, and all moments of the innovations are finite with a Lipschitz density function. Also, the observed sequence is a stationary ergodic sequence of integrable random variables.

(A2) The required partial derivatives exist and are continuous for all values of the parameter.

(A3) The relevant normalized sum converges a.s. to a constant limit.

(A4) The limiting matrices, defined separately for the independent and dependent innovation cases, together with the related matrices, are assumed throughout to be positive definite.

(A5) The relevant expectations are finite, where $r$ and its partial derivatives are evaluated at the true values.

(A6) There exist dominating functions such that the required bounds hold, and the corresponding expectations are assumed to be finite.

(A7) The sequence is mixing (see [22]).

(A8) The relevant random variables have the same distribution, such that the density exists and is bounded, continuous, and strictly positive in a neighborhood of the point of interest.

(A9) The functions involved are bounded and continuous with respect to their arguments in a neighborhood of the point of interest, away from 0.

(A10) The autoregression function has a continuous derivative, and its derivative is uniformly bounded in a neighborhood of the point of interest.

(A11) The kernel function $K$ is a compactly supported, symmetric, bounded function such that $K(u) > 0$ for $u$ in a set of positive Lebesgue measure.

(A12) The bandwidth satisfies the usual rate conditions.

Considering assumptions (A1)–(A12), we have the following lemma and theorems:

Lemma 4. Under assumptions (A1)–(A12), as $n \to \infty$, the convergences (a)–(d) hold, where the quantities involved are defined in (A8) and (A9), respectively.

Theorem 1. Consider the estimator $\hat\xi(x)$ of the nonparametric adjustment. Then, $\hat\xi(x) \longrightarrow \xi(x)$ a.s., as $n \to \infty$.

Proof. Using Lemma 4 and the strong consistency of the CNLSE estimators $\hat\alpha$ and $\hat\theta$, we can prove Theorem 1.

Theorem 2. Let $\hat f(x)$ be the autoregression function estimator for the model with independent innovations. Then, $\hat f(x) \longrightarrow f(x)$ a.s., as $n \to \infty$.

Proof. We can obtain a suitable decomposition of $\hat f(x) - f(x)$ for independent innovations and bound its terms separately. It is known that the kernel-weighted denominator converges a.s. as $n \to \infty$ (see [23]). Then, the first term is bounded by a quantity involving an upper bound of the kernel function and hence vanishes a.s. as $n \to \infty$. Since the remaining term is bounded by a constant multiple that tends to zero as $n \to \infty$, Lemma 4 and the strong consistency of $\hat\alpha$ and $\hat\theta$ give the result.

Theorem 3. Let $\hat f(x)$ be the estimator defined for the model with dependent innovations. Then, $\hat f(x) \longrightarrow f(x)$ a.s., as $n \to \infty$.

Proof. For the dependent innovations, we have an analogous decomposition, where $\varepsilon_t$ is a stationary AR(1) process given by $\varepsilon_t = \rho\,\varepsilon_{t-1} + Z_t$ with $|\rho| < 1$. To finish the proof, it is enough to show that the remainder terms vanish as $n \to \infty$. The last term can be treated as follows. Notice that the corresponding kernel-weighted average converges a.s. as $n \to \infty$ (see [8]). Then, the bound involves an upper bound of the kernel, so this term vanishes a.s. as $n \to \infty$. Since the next term is bounded by a constant multiple that tends to zero as $n \to \infty$, and using the stationarity of the AR(1) innovations, it is found that this term also vanishes as $n \to \infty$. Similarly, we can prove that the remaining term vanishes as $n \to \infty$. Thus, the result follows.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest in the publication of this article.

Authors’ Contributions

All authors contributed equally. All authors read and approved the final manuscript.