Abstract
The problem of finding optimal sampling schemes is resolved for two models. The novelty of this study lies in its cost efficiency, specifically for applied problems in which the sampling process is expensive. In the models discussed, we show that some observations neutralize others in the prediction mechanism. This result is caused by the autocovariance function of the underlying process. Our interesting result is that, although removing neutralizing observations converts the sampling scheme to a nonredundant one, it leads to worse prediction. A simulation study confirms this finding.
1. Introduction
Prediction is often the main goal in analyzing spatial processes and time series. It is widely used to support practical decisions in numerous scientific fields such as geology, biology, medicine, crime surveying, and natural disasters (see, for instance, Isaaks and Srivastava [1] and Akselsson et al. [2]). Finding an optimal sampling scheme is an important task for researchers, since in applied sciences collecting samples is often a laborious and expensive process. We refer interested readers to works such as McBratney and Webster [3]; Zio et al. [4]; Xiao et al. [5]; and Sward et al. [6].
In this work, we consider some theoretical points in one branch of stochastic processes. There are also in-depth studies of other classes of stochastic processes; among them, we refer to Xie et al. [7] and Cheng et al. [8]. The former addresses memory-based event-triggered asynchronous control for semi-Markov switching systems. The latter focuses on static output feedback quantized control for fuzzy Markovian switching singularly perturbed systems with deception attacks. For other works, see Zhou et al. [9] and Cheng et al. [10].
According to Cressie [11], kriging is the most important prediction method for geostatistical data. For prediction in lattice data or discrete time series, we need to fit a model to the data. Two predictors for a single missing value in a stationary autoregressive model of order one (AR(1)) have been compared by Hamaz and Ibazizen [12] and Saadatmand et al. [13]. They considered the following model: in which is a white noise process. Their comparison used Pitman's measure of closeness (PMC) criterion. They showed that the best predictor depends only on the two nearest observations. In the spatial context, Saber and Nematollahi [14] and Saber [15] studied the stationary first-order multiplicative spatial autoregressive (MSAR(1)) model on a lattice , that is,
Saber and Nematollahi [14] compared three predictors in the above model. They showed that the predictor which uses the quarter observations is better than the predictor based on observations in the first neighborhood. It is also better than the predictor based on observations in the nearest neighborhood wherever parameters and are near . However, for most values of and , the predictor which uses the nearest neighborhood observations is the best among the three recommended predictors. Saber [15] compared interpolation and extrapolation in the MSAR(1) model. Figure 1 shows the observation schemes used by these two predictors.

In Saber and Nematollahi [14], the predictors use all observations, while the two predictors in Saber [15] do not. In fact, the interpolator, which is constructed on eight observations, uses only two of them. Likewise, the extrapolator, which is based on ten observations, is determined by just one observation. Thus, one or two observations play the decisive role in the prediction process. It can be seen that prediction based on one basic, neutralizing observation is better than prediction based on a large number of nonessential observations. We show that both models (1) and (2) belong to this class.
This paper is organized as follows. In Section 2, the neutralizing observation for prediction in the MSAR(1) model is obtained for some sampling schemes. A comparison between prediction by one neutralizing observation and prediction by eight ordinary observations is also performed in that section. Section 3 parallels Section 2, with the AR(1) model in place of the MSAR(1) model. Finally, a discussion of redundant sampling is given in Section 4.
2. Prediction in MSAR(1) Model
Suppose for any fixed values of and we want to predict by a set of other observations () whose indices belong to as a subset of lattice . In other words, and . Also, the predictor of in terms of the components of set is denoted by .
Theorem 1. Let satisfy (2) with and . Then, the best linear predictor for w.r.t. mean square error (MSE) based on set depends only on and is given by if there exists a location such that one of the following conditions is satisfied for all .
Proof. Here, the best predictor is obtained from the projection theorem (see Brockwell and Davis [16] for details of this theorem). Consider this predictor as a linear form of the observations  where  and . By Saber [15],  is computed from the following equation: in which  and ; the first is the covariance matrix of the observations and the second is the covariance vector between the observations and the unobserved variable, respectively. Following Gaetan and Guyon [17],  whose constants  are removed from both sides of (12). Therefore, the general components of  and  are Equation (12) can be rewritten as follows: in which  are the columns of . Since one of the columns of  is the covariance between  and , the latter equation can be written in the following form: On the other hand, assumptions (4)–(11) along with (13) lead to which gives By substituting (17) in (15), we have and the solution is This completes the proof.
Figure 2 gives some examples that illustrate conditions (4)–(7). Conditions (4)–(11) can arise in a sampling process. In Saber [15],  has been defined as  which satisfies condition (11) for . Therefore, we see that the predictor  depends only on the  observation.
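The concentration of the weights on a single observation can be checked numerically. The following is a minimal sketch, written in Python rather than the R of the Appendix. It assumes that the autocovariance of the MSAR(1) model is proportional to a^|h| b^|k| (the form that the Appendix listings use through g(x, y)), takes the redundant scheme of Figure 3 with offsets measured from the prediction site, and uses hypothetical parameter values a = 1/2 and b = 1/3. The helper solve_exact and all variable names are illustrative and are not taken from the paper.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Hypothetical parameter values (assumption, chosen only for illustration).
a, b = Fraction(1, 2), Fraction(1, 3)


def cov(p, q):
    # Assumed separable autocovariance, up to a constant factor.
    return a ** abs(p[0] - q[0]) * b ** abs(p[1] - q[1])


target = (0, 0)
# Redundant scheme of Figure 3, as offsets from the prediction site:
# the corner (2, 2) plus seven further sites of the 3 x 3 block behind it.
obs = [(2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 2), (4, 3), (4, 4)]

gamma_matrix = [[cov(p, q) for q in obs] for p in obs]
gamma_vector = [cov(target, p) for p in obs]
weights = solve_exact(gamma_matrix, gamma_vector)

for site, weight in zip(obs, weights):
    print(site, weight)
```

Under these assumptions, the solution places the weight a^2 b^2 on the corner observation and zero on the other seven. Exact rational arithmetic is used so that the zero weights are not blurred by rounding.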

2.1. Comparison with Nonredundant Sampling
Some of the observations, namely those that satisfy the cases mentioned above, have no effect on prediction. Therefore, an important question is: with a fixed number of observations, which sampling scheme is better, sampling with a redundant observation or sampling without it?
In other words, is it logical to remove neutralizing observations in the prediction process? To answer this question, we compare the two methods in this section. To this end, two sampling schemes are considered in Figure 3.

By Theorem 1, the predictor based on redundant sampling is
By the same method as Saber [15], the other predictor, based on nonredundant sampling, is where and . After a cumbersome computation, we have
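The weights of the nonredundant predictor can also be obtained by solving the normal equations directly. The sketch below is in Python and makes the same assumptions as the Appendix covariance function g(x, y): an autocovariance proportional to a^|h| b^|k|, and the eight sites of the 3 x 3 block with its nearest corner removed, listed in the order of the vector Q1 in the Appendix. The values a = 1/2 and b = 1/3 and all names are hypothetical.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Hypothetical parameter values (assumption, chosen only for illustration).
a, b = Fraction(1, 2), Fraction(1, 3)


def cov(p, q):
    # Assumed separable autocovariance, up to a constant factor.
    return a ** abs(p[0] - q[0]) * b ** abs(p[1] - q[1])


target = (0, 0)
# Nonredundant scheme: the 3 x 3 block without its nearest corner (2, 2).
obs = [(2, 3), (2, 4), (3, 2), (3, 3), (3, 4), (4, 2), (4, 3), (4, 4)]

gamma_matrix = [[cov(p, q) for q in obs] for p in obs]
gamma_vector = [cov(target, p) for p in obs]
weights = solve_exact(gamma_matrix, gamma_vector)

# Closed form used by fZIR1 in the Appendix, written as weights per site.
closed_form = {
    (2, 3): a ** 2 * b ** 2 * b,
    (3, 2): a ** 2 * b ** 2 * a,
    (3, 3): -(a ** 2 * b ** 2) * a * b,
}
for site, weight in zip(obs, weights):
    print(site, weight, closed_form.get(site, Fraction(0)))
```

Under these assumptions, only the three sites adjacent to the removed corner receive nonzero weight, and the solved weights agree with the closed form coded as fZIR1 in the Appendix.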
A theoretical comparison of and is not possible, so to compare them, we use and estimators. Here is 1 for positive t and 0 otherwise.
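The two empirical criteria can be sketched as follows. The definitions mirror the quantities computed in the Appendix listings: the mean of the squared prediction errors, and the number of sites where the first predictor is strictly closer to the true value divided by the number of sites without a tie. The function names and the toy numbers are hypothetical and serve only as an illustration in Python.

```python
def msp_hat(predictions, truth):
    """Empirical mean square prediction error."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, truth)]
    return sum(squared) / len(squared)


def pmc_hat(first, second, truth):
    """Share of non-tied sites where `first` is strictly closer to the truth."""
    closer = 0
    untied = 0
    for p1, p2, t in zip(first, second, truth):
        e1, e2 = abs(p1 - t), abs(p2 - t)
        if e1 != e2:
            untied += 1
            if e1 < e2:
                closer += 1
    if untied == 0:
        raise ValueError("all absolute errors are tied")
    return closer / untied


# Toy data (assumption): four sites with true value zero.
truth = [0.0, 0.0, 0.0, 0.0]
pred_one = [1.0, 2.0, 0.5, 1.0]
pred_two = [2.0, 1.0, 1.0, 1.0]

msp_one = msp_hat(pred_one, truth)
msp_two = msp_hat(pred_two, truth)
pmc = pmc_hat(pred_one, pred_two, truth)
print(msp_one, msp_two, pmc)
```

A smaller estimated MSP and an estimated PMC above 0.5 both favour the first predictor; ties are excluded from the PMC denominator, as in the Appendix code.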
Two well-known error distributions, normal () and exponential (), are used. Following Saber and Nematollahi [14], in the case of errors, we use the mean-zero variables .
Now, we simulate random variables with normally distributed errors on a lattice and compute , , and . The findings are shown in Figures 4 and 5. In both figures, one can see that and for almost all values of parameters and . For some values of these parameters which are less than 0.25, . In these cases, , so they are negligible. In other words, we approximately have and for all values of parameters and . Both criteria are increasing functions of the parameters and . These findings indicate that the predictor is better than the predictor with respect to both the MSP and PMC criteria. Therefore, for prediction in model (2), using one observation leads to better results than using 8 observations in . When parameters and are near 1, the recommendation to use this one observation rather than the 8 observations in becomes stronger.


The same analysis has been carried out for exponentially distributed errors with parameter . For the exponential case, our findings differ slightly from the normal case. The results of this simulation are displayed in Figures 6 and 7. These figures show approximately and for all values of parameters and . These findings indicate that the predictor is better than the predictor with respect to the MSP criterion. Regarding the PMC criterion, we can see that the predictor is better than the predictor whenever parameters and are near 0.5. However, the latter result is not significant.


Finally, we conclude that, in almost all situations, removing the observation that makes the other observations redundant in prediction does not lead to better results.
3. Prediction in AR(1) Model
In this section, we show that the best linear predictor at time for stochastic process uses at most one observation on each side of time . First of all, for any fixed time , define vector where is a subset of natural numbers ().
In the following theorem, we demonstrate that the best linear predictor in this model based on samples uses at most 2 observations.
Theorem 2. Let come from model (1) with and and there exists a time such that for all observed times , or Then, the best linear predictor for w.r.t. criterion MSP based on variables depends only on and
Proof. The proof is similar to the proof of Theorem 1 in which  where , , , and 
This shows that a comparison corresponding to Section 2.1 is not required here. By Theorem 2 and the above paragraph, the best linear predictors based on observations  and  are  and , respectively. Here,  is the nearest time to time  among all times in . Clearly, , and hence the predictor  is better than the predictor .
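The screening effect in the time series case can be illustrated with a short numerical check. The sketch below is in Python and assumes that the AR(1) autocovariance is proportional to rho^|h|; the symbol rho, its value 1/2, and the observation times are hypothetical choices for illustration and are not taken from the paper.

```python
from fractions import Fraction


def solve_exact(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


# Hypothetical autoregressive parameter (assumption).
rho = Fraction(1, 2)


def cov(s, t):
    # Assumed AR(1) autocovariance, up to a constant factor.
    return rho ** abs(s - t)


def blp_weights(target, times):
    """Weights of the best linear predictor of X_target from X at `times`."""
    gamma_matrix = [[cov(s, t) for t in times] for s in times]
    gamma_vector = [cov(target, s) for s in times]
    return solve_exact(gamma_matrix, gamma_vector)


# All observations on one side of the target time 0.
one_sided = blp_weights(0, [3, 4, 6, 9])
# Observations on both sides of the target time 0.
two_sided = blp_weights(0, [-4, -2, 3, 5])
print(one_sided)
print(two_sided)
```

In the one-sided case the only nonzero weight, rho^3, falls on the nearest observed time; in the two-sided case only the nearest observation on each side receives a nonzero weight, in line with the statement at the start of this section.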
Finally, we give a theorem of which Theorems 1 and 2 are special cases.
Theorem 3. Let , be a stationary stochastic process with and . Then, the best linear predictor for based on with respect to MSE is if there exists a point such that
The proof is omitted, since it follows the proof of Theorem 1 step by step. Notice that both Theorems 1 and 2 are special cases of Theorem 3 when and , respectively.
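The mechanism shared by the three theorems can be sketched in a few lines. The notation here is assumed for illustration and may differ from the symbols of the paper: \Gamma is the (invertible) covariance matrix of the observations Z_S, \gamma is their covariance vector with the unobserved variable, \lambda is the weight vector, e_k is the k-th unit vector, C(0) is the common variance of the stationary process, and c is a proportionality constant.

```latex
\begin{aligned}
\Gamma \lambda &= \gamma, \qquad \gamma = c\,\Gamma e_k
  \;\Longrightarrow\; \lambda = \Gamma^{-1}\gamma = c\,e_k, \\
\hat{Z} &= \lambda^{\top} Z_S = c\,Z_k, \\
\mathrm{MSE}(\hat{Z}) &= C(0) - \gamma^{\top}\lambda
  = C(0) - c^{2}\,\Gamma_{kk} = C(0)\,\bigl(1 - c^{2}\bigr).
\end{aligned}
```

In words: whenever the covariance vector is a multiple of one column of the covariance matrix, the k-th observation carries the whole predictor and every other observation receives zero weight.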
4. Conclusion
In this paper, three theorems were presented which are useful for deriving an optimal sampling scheme. In Theorem 1, the design of an optimal sampling scheme is given for model (2). A similar result for model (1) is found in Theorem 2.
In fact, we have tried to answer the following question: for a fixed set of data collected through observations, if we would like to find the best linear predictor of a missing value, which of the following sampling schemes is logical?
(1) Should we use neutralizing observations?
(2) Should we remove neutralizing observations and use the remaining observations?
The results show that the first scenario is more efficient in the prediction process. Indeed, we have shown that some samples have no effect on prediction in the models discussed. This result will be useful, since in many practical situations and for several types of populations it might not be possible to access a list of elements, so the use of an element as a sampling unit is not applicable.
There are situations more complicated than those described in Theorem 1 that also lead to redundant sampling in the prediction mechanism. These are left as future work. Also, Saber and Khorshidian [18] presented the best predictor in the stationary first-order moving average model. So, another interesting study may be to search for neutralizing and neutralized samples in moving average models.
Appendix
Comparison between redundant and nonredundant sampling.
rm(list = ls())
m <- 200; n <- 200
z0 <- rnorm(m + n + 1)
epsilon <- matrix(rnorm((m + 1) * (n + 1)), m + 1, n + 1)
A <- seq(.05, .96, .05)
B <- seq(.05, .96, .05)
msezr <- msezir <- matrix(0, length(A), length(B))
pzrzir <- matrix(0, length(A), length(B))
for (a in A) {
  for (b in B) {
    # covariance function up to a constant factor
    g <- function(x, y) a^x * b^y
    zAR <- matrix(0, m + 1, n + 1)
    zAR[1, ] <- z0[1:(n + 1)]
    zAR[-1, 1] <- z0[(n + 2):(n + m + 1)]
    for (i in 2:(m + 1)) {
      for (j in 2:(n + 1)) {
        zAR[i, j] <- zAR[i - 1, j] * a + zAR[i, j - 1] * b + zAR[i - 1, j - 1] * -(a * b)
      }
    }
    z <- zAR + epsilon
    z <- z[-1, -1]
    # predictor based on redundant sampling
    fZR <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4))) y <- ((a * b)^2 / 2) * z[i + 2, j + 2]
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      return(y)
    }
    # predictor based on nonredundant sampling, by solving the 8 x 8 system
    fZIR <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4))) {
        A1 <- matrix(c(
          g(0,0), g(0,1), g(1,1), g(1,0), g(1,1), g(2,1), g(2,0), g(2,1),
          g(0,1), g(0,0), g(1,2), g(1,1), g(1,0), g(2,2), g(2,1), g(2,0),
          g(1,1), g(1,2), g(0,0), g(0,1), g(0,2), g(1,0), g(1,1), g(1,2),
          g(1,0), g(1,1), g(0,1), g(0,0), g(0,1), g(1,1), g(1,0), g(1,1),
          g(1,1), g(1,0), g(0,2), g(0,1), g(0,0), g(1,2), g(1,1), g(1,0),
          g(2,1), g(2,2), g(1,0), g(1,1), g(1,2), g(0,0), g(0,1), g(0,2),
          g(2,0), g(2,1), g(1,1), g(1,0), g(1,1), g(0,1), g(0,0), g(0,1),
          g(2,1), g(2,0), g(1,2), g(1,1), g(1,0), g(0,2), g(0,1), g(0,0)), 8, 8)
        B1 <- matrix(c(g(2,3), g(2,4), g(3,2), g(3,3), g(3,4), g(4,2), g(4,3), g(4,4)), 8, 1)
        Q1 <- c(z[i + 2, j + 3], z[i + 2, j + 4], z[i + 3, j + 2], z[i + 3, j + 3],
                z[i + 3, j + 4], z[i + 4, j + 2], z[i + 4, j + 3], z[i + 4, j + 4])
        y <- t(solve(A1, B1)) %*% Q1
      }
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      y <- as.vector(y)
      return(y)
    }
    # closed form of the nonredundant predictor
    fZIR1 <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4)))
        y <- (a^2 * b^2) * (b * z[i + 2, j + 3] + a * z[i + 3, j + 2] - a * b * z[i + 3, j + 3])
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      return(y)
    }
    zr <- matrix(0, m, n)
    zir <- matrix(0, m, n)
    for (i in 1:m) {
      for (j in 1:n) {
        zr[i, j] <- fZR(i, j)
        zir[i, j] <- fZIR1(i, j)
      }
    }
    # round() guards the index against floating-point error in a * 20
    ia <- round(abs(a) * 20); ib <- round(abs(b) * 20)
    msezr[ia, ib] <- mean((zr - z)^2)
    msezir[ia, ib] <- mean((zir - z)^2)
    pzrzir[ia, ib] <- sum((abs(zr - z)) < (abs(zir - z))) / sum((abs(zr - z)) != (abs(zir - z)))
  }
}
grid <- expand.grid(xj = A, yj = B)
AA <- grid$xj
BB <- grid$yj
persp(A, B, msezr, theta = 135, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "MSE(Z1)")
persp(A, B, msezir, theta = 155, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "MSE(Z2)")
persp(A, B, pzrzir, theta = 115, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "PMC(z1|z2)")
par(mfrow = c(2, 2))
# mean on b similar to b fixed
plot(A, rowMeans(pzrzir), "l", xlab = "a", ylab = "PMC(ZR|ZIR)", main = "Mean on b")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(rowMeans(msezr), rowMeans(msezir)))
maxx <- max(c(rowMeans(msezr), rowMeans(msezir)))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "a", ylab = "MSP")
lines(A, rowMeans(msezr), "l")
lines(A, rowMeans(msezir), "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
plot(A, colMeans(pzrzir), "l", xlab = "b", ylab = "PMC(ZR|ZIR)", main = "Mean on a")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(colMeans(msezr), colMeans(msezir)))
maxx <- max(c(colMeans(msezr), colMeans(msezir)))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "b", ylab = "MSP")
lines(A, colMeans(msezr), "l")
lines(A, colMeans(msezir), "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
# column index for b = 0.95, set as in the exponential-error listing
j <- 19; B[j]
par(mfrow = c(2, 2))
plot(A, pzrzir[, j], "l", xlab = "a", ylab = "PMC(ZR|ZIR)", main = "b = 0.95")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(msezr[, j], msezir[, j]))
maxx <- max(c(msezr[, j], msezir[, j]))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "a", ylab = "MSP")
lines(A, msezr[, j], "l")
lines(A, msezir[, j], "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
More complicated forms:
a <- .5
b <- .13
g <- function(i, j) a^i * b^j
A <- matrix(c(1, a, g(3,2), g(3,3),
              a, 1, g(2,2), g(2,3),
              g(3,2), g(2,2), 1, b,
              g(3,3), g(2,3), b, 1), 4, 4)
B <- c(g(2,1), g(1,1), g(1,1), g(1,2))
solve(A, B)
A <- matrix(c(1, a, g(2,2),
              a, 1, g(1,2),
              g(2,2), g(1,2), 1), 3, 3)
B <- c(g(2,1), g(1,1), g(1,1))
solve(A, B)
Figure 3:
par(mfrow = c(1, 2))
y <- c(1, 5)
x <- c(1, 5)
y1 <- c(1, 2, 3, 2, 3, 1, 2, 3) + 2
x1 <- c(1, 1, 1, 2, 2, 3, 3, 3) + 2
plot(x, y, xlab = "", ylab = "", "n", main = "Redundant Sampling", axes = FALSE)
points(x1, y1, pch = 15)
points(3, 3, pch = 3)
points(1, 1, pch = 2)
axis(1, 1:5, c(expression(i), "", expression(s[0]), "", ""))
axis(2, 1:5, c(expression(j), "", expression(t[0]), "", ""))
box()
y <- c(1, 5)
x <- c(1, 5)
y1 <- c(2, 3, 1, 2, 3, 1, 2, 3) + 2
x1 <- c(1, 1, 2, 2, 2, 3, 3, 3) + 2
plot(x, y, xlab = "", ylab = "", "n", main = "Irredundant sampling", axes = FALSE)
points(x1, y1, pch = 15)
points(1, 1, pch = 2)
axis(1, 1:5, c(expression(i), "", "", "", ""))
axis(2, 1:5, c(expression(j), "", "", "", ""))
box()
Comparison between redundant and nonredundant sampling with exponential error.
rm(list = ls())
m <- 200; n <- 200
z0 <- rnorm(m + n + 1)
epsilon <- matrix(rexp((m + 1) * (n + 1)), m + 1, n + 1)
A <- seq(.05, .96, .05)
B <- seq(.05, .96, .05)
msezr <- msezir <- matrix(0, length(A), length(B))
pzrzir <- matrix(0, length(A), length(B))
for (a in A) {
  for (b in B) {
    # covariance function up to a constant factor
    g <- function(x, y) a^x * b^y
    zAR <- matrix(0, m + 1, n + 1)
    zAR[1, ] <- z0[1:(n + 1)]
    zAR[-1, 1] <- z0[(n + 2):(n + m + 1)]
    for (i in 2:(m + 1)) {
      for (j in 2:(n + 1)) {
        zAR[i, j] <- zAR[i - 1, j] * a + zAR[i, j - 1] * b + zAR[i - 1, j - 1] * -(a * b)
      }
    }
    z <- zAR + epsilon
    z <- z[-1, -1]
    # center the exponential errors at zero
    z <- z - 1
    # predictor based on redundant sampling
    fZR <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4))) y <- ((a * b)^2 / 2) * z[i + 2, j + 2]
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      return(y)
    }
    # predictor based on nonredundant sampling, by solving the 8 x 8 system
    fZIR <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4))) {
        A1 <- matrix(c(
          g(0,0), g(0,1), g(1,1), g(1,0), g(1,1), g(2,1), g(2,0), g(2,1),
          g(0,1), g(0,0), g(1,2), g(1,1), g(1,0), g(2,2), g(2,1), g(2,0),
          g(1,1), g(1,2), g(0,0), g(0,1), g(0,2), g(1,0), g(1,1), g(1,2),
          g(1,0), g(1,1), g(0,1), g(0,0), g(0,1), g(1,1), g(1,0), g(1,1),
          g(1,1), g(1,0), g(0,2), g(0,1), g(0,0), g(1,2), g(1,1), g(1,0),
          g(2,1), g(2,2), g(1,0), g(1,1), g(1,2), g(0,0), g(0,1), g(0,2),
          g(2,0), g(2,1), g(1,1), g(1,0), g(1,1), g(0,1), g(0,0), g(0,1),
          g(2,1), g(2,0), g(1,2), g(1,1), g(1,0), g(0,2), g(0,1), g(0,0)), 8, 8)
        B1 <- matrix(c(g(2,3), g(2,4), g(3,2), g(3,3), g(3,4), g(4,2), g(4,3), g(4,4)), 8, 1)
        Q1 <- c(z[i + 2, j + 3], z[i + 2, j + 4], z[i + 3, j + 2], z[i + 3, j + 3],
                z[i + 3, j + 4], z[i + 4, j + 2], z[i + 4, j + 3], z[i + 4, j + 4])
        y <- t(solve(A1, B1)) %*% Q1
      }
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      y <- as.vector(y)
      return(y)
    }
    # closed form of the nonredundant predictor
    fZIR1 <- function(i, j) {
      if ((j <= (n - 4)) & (i <= (m - 4)))
        y <- (a^2 * b^2) * (b * z[i + 2, j + 3] + a * z[i + 3, j + 2] - a * b * z[i + 3, j + 3])
      if ((j > (n - 4)) | (i > (m - 4))) y <- z[i, j]
      return(y)
    }
    zr <- matrix(0, m, n)
    zir <- matrix(0, m, n)
    for (i in 1:m) {
      for (j in 1:n) {
        zr[i, j] <- fZR(i, j)
        zir[i, j] <- fZIR1(i, j)
      }
    }
    # round() guards the index against floating-point error in a * 20
    ia <- round(abs(a) * 20); ib <- round(abs(b) * 20)
    msezr[ia, ib] <- mean((zr - z)^2)
    msezir[ia, ib] <- mean((zir - z)^2)
    pzrzir[ia, ib] <- sum((abs(zr - z)) < (abs(zir - z))) / sum((abs(zr - z)) != (abs(zir - z)))
  }
}
grid <- expand.grid(xj = A, yj = B)
AA <- grid$xj
BB <- grid$yj
persp(A, B, msezr, theta = 135, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "MSE(Z1)")
persp(A, B, msezir, theta = 155, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "MSE(Z2)")
persp(A, B, pzrzir, theta = 115, phi = 5, scale = TRUE, expand = 1, col = "green",
      ltheta = 0, lphi = 0, box = T, ticktype = "simple", xlab = "a", ylab = "b", zlab = "PMC(z1|z2)")
par(mfrow = c(2, 2))
# mean on b similar to b fixed
plot(A, rowMeans(pzrzir), "l", xlab = "a", ylab = "PMC(ZR|ZIR)", main = "Mean on b")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(rowMeans(msezr), rowMeans(msezir)))
maxx <- max(c(rowMeans(msezr), rowMeans(msezir)))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "a", ylab = "MSP")
lines(A, rowMeans(msezr), "l")
lines(A, rowMeans(msezir), "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
plot(A, colMeans(pzrzir), "l", xlab = "b", ylab = "PMC(ZR|ZIR)", main = "Mean on a")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(colMeans(msezr), colMeans(msezir)))
maxx <- max(c(colMeans(msezr), colMeans(msezir)))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "b", ylab = "MSP")
lines(A, colMeans(msezr), "l")
lines(A, colMeans(msezir), "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
j <- 19; B[j]
par(mfrow = c(2, 2))
plot(A, pzrzir[, j], "l", xlab = "a", ylab = "PMC(ZR|ZIR)", main = "b = 0.95")
lines(A, rep(.5, length(A)), "l")
minn <- min(c(msezr[, j], msezir[, j]))
maxx <- max(c(msezr[, j], msezir[, j]))
AA <- seq(minn, maxx, length.out = length(A))
plot(A, AA, "n", xlab = "a", ylab = "MSP")
lines(A, msezr[, j], "l")
lines(A, msezir[, j], "l", lty = 2)
legend("topleft", legend = c("ZR", "ZIR"), lty = c(1, 2), merge = TRUE)
j <- 18; A[j]
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
The idea of the paper was conceived and planned by Mohammad Mehdi Saber and Zohreh Shishebor. M. M. Abd El Raouf, E.H. Hafez, and Ramy Aldallal took the lead in writing the manuscript. All authors provided critical feedback and helped shape the research, analysis, and manuscript.