Abstract
As in parametric regression, nonparametric kernel regression is essential for examining the relationship between response variables and covariates. In both settings, outliers may distort the estimators, so robustness is essential in practice. This paper proposes a family of robust nonparametric kernel estimators of the regression function with unknown scale parameters. In addition, we establish the asymptotic normality of the estimator under concentration properties on small balls of the probability measure of the functional explanatory variables. The advantages of the proposed methods are demonstrated through simulated and real data studies that compare the sensitivity to outliers of the classical and robust regression estimators (with fixed and with unknown scale parameters). The proposed method should prove useful for data analysis and decision making.
1. Introduction
Nonparametric kernel regression is a standard tool for exploring the underlying relationship between response variables and covariates. For functional data, such estimators were introduced by Ferraty and Vieu [1, 2]. As in parametric regression estimation, the kernel estimator may be affected by outliers; hence, robustness is an essential consideration.
In this context, let be a sequence of strictly stationary dependent random variables identically distributed as (X, Y), a random pair valued in , where is a semi-metric space and denotes the semi-metric. The purpose of this work is to study nonparametric estimation of the robust regression function when the scale parameter is unknown. In fact, for any , it is defined as the zero, with respect to the parameter , of the following equation:
where is a real-valued score function that satisfies some regularity conditions, to be stated later, and is a robust measure of conditional scale. In what follows, we assume, for all , that the robust regression function exists and is unique (see, for instance, Boente and Fraiman [3]).
We point out that robustification is a long-standing subject in statistics. Among the many papers dealing with robust nonparametric estimation of the regression function in the finite-dimensional case, one can refer, for example, to the key works of Robinson (1984) and of Collomb and Härdle [4]. We refer to Laïb and Ould Saïd [5] for results on multivariate time series (under mixing and ergodicity conditions), and to Roozbeh et al. [6–8] for multicollinearity in least trimmed squares regression and the effects of outliers.
Robust regression has been widely studied in nonparametric functional statistics. It was first introduced by Azzedine et al. [9], who proved the almost complete convergence of this model in the independent and identically distributed (i.i.d.) case. The asymptotic normality of this model was later established by Attouch et al. [10–12] in both the dependent and independent cases, including unknown scale parameters. However, all these results concern complete data. For incomplete data, we refer to Derrar et al. (2020) and the references therein.
The main objective of this paper is to generalize the results of Boente and Vahnovan [13]. Precisely, we prove the asymptotic normality of the constructed estimator by combining the ideas of robustness with those of unknown scale parameters. This result is obtained under standard conditions that allow us to explore the different structural axes of the subject, such as the robustness of the regression function and the correlation of the observations. We emphasize that, contrary to the usual case where the scale parameter is fixed, it must be estimated here, which makes establishing the asymptotic properties of the estimator more difficult. Although this difficulty is substantial in the context of this work, we have been able to overcome it.
The paper is organized as follows. Section 2 is dedicated to the presentation of the robust estimator with unknown scale parameters. The needed assumptions and notations are given in Section 3. We state our main results in Section 4, and the proofs of the main results are relegated to the appendix. Sections 5 and 6 are devoted to the simulation and real data applications of our proposed methods. The conclusion is stated in Section 7.
2. The Robust Equivariant Estimators and Their Related Functional
Let us consider a functional stationary ergodic process (see Laïb and Louani [14] for definitions and examples). When the scale parameter is unknown, the robust estimator is constructed in two steps. First, we estimate the scale parameter by the conditional median of the absolute deviation from the conditional median, that is,
where is the conditional distribution of given and is the median of that conditional distribution. Then, for , the kernel estimator , where is given by
where is a kernel function and is a sequence of positive real numbers tending to zero as the sample size tends to infinity. Next, the kernel estimator of the robust regression is the zero, with respect to , of the equation
where
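To make the two-step construction concrete, the following is a minimal numerical sketch in Python. It is an illustration under assumptions, not the paper's implementation: the score function is taken to be Huber's, the kernel is the quadratic kernel on [0, 1], curves enter only through a user-supplied semi-metric `dist`, and the helper names (`robust_estimate`, `weighted_median`) are invented for this sketch.

```python
import numpy as np
from scipy.optimize import brentq

def huber_psi(u, c=1.345):
    """Huber score function: identity near zero, clipped at +/- c."""
    return np.clip(u, -c, c)

def weighted_median(values, weights):
    """Median of `values` under nonnegative kernel weights."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cw = np.cumsum(w)
    return v[np.searchsorted(cw, 0.5 * cw[-1])]

def robust_estimate(x, X, Y, dist, h,
                    kernel=lambda u: np.maximum(1.0 - u**2, 0.0)):
    """Equivariant robust kernel estimate at the curve x.

    Step 1 estimates the conditional scale by the kernel-weighted median
    of absolute deviations from the weighted conditional median; Step 2
    solves the weighted M-estimating equation in t."""
    d = np.array([dist(x, Xi) for Xi in X])
    w = kernel(d / h)
    # Step 1: conditional MAD-type scale estimate.
    med = weighted_median(Y, w)
    s = weighted_median(np.abs(Y - med), w)
    # Step 2: root of t -> sum_i w_i * psi((Y_i - t) / s).
    f = lambda t: np.sum(w * huber_psi((Y - t) / s))
    return brentq(f, Y.min() - 1.0, Y.max() + 1.0)
```

Since psi is monotone and bounded, the estimating function changes sign on any interval containing the data range, so the root is well defined whenever some weights are positive and the scale estimate is nonzero.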
3. Notation, Conditions, and Comments
Throughout the paper, when no confusion is possible, we will denote by some strictly positive generic constants; stands for a fixed point in , denotes a fixed neighborhood of , and we set and , for . We state the following conditions:
A1. The function is continuously differentiable, strictly monotone and bounded w.r.t. the second component, and its derivative is bounded and continuous at , uniformly in .
A2. There exist a nonnegative differentiable function and a nonnegative function such that , where .
A3. The function satisfies a Lipschitz condition of order one; that is, there exists a strictly positive constant such that .
A4. The function satisfies a Lipschitz condition of order one; that is, there exists a strictly positive constant such that .
A5. The functions and are bounded on such that and . Moreover, is continuous in a neighborhood of .
A6. (i) is a continuous function of in a neighborhood of . Furthermore, it satisfies the following equicontinuity condition:
(ii) is symmetric around and a continuous function of for each fixed .
A7. The kernel is a differentiable function supported on . Its derivative exists, and there exist two constants and with for .
A8. The function is continuous in a neighborhood of .
A9. The bandwidth satisfies and as .
A10. The sequence is such that and .
Comments on conditions A1–A10:
(1) A1 keeps the same conditions on the function as given by Collomb and Härdle (1986) and by Boente and Rodriguez [15] in the multivariate case. We point out that the boundedness required of the score function can be dropped by using the classical truncation method (see Laïb and Ould Saïd [5] for more details).
(2) A2 keeps the same conditions as given by Attouch et al. [16].
(3) Conditions A3–A4 are regularity conditions that characterize the functional space of our model and are needed to evaluate the bias term in our asymptotic results.
(4) Conditions A5–A6 state regularity conditions on the marginal density of and on the conditional distribution function , which imply that, for any compact set , and that is a continuous function of .
(5) A7–A10 are technical conditions imposed for the brevity of the proofs. The function defined in A9 appears in all the asymptotic behavior, in particular in the asymptotic variance term. With simple algebra, it is possible to make this function explicit in the above examples as , where is the Dirac function.
4. Main Results
The following proposition ensures the uniform consistency of the regression estimator , when the smoothing is based on either local medians or local M-smoothers, while Theorem 1 establishes the asymptotic normality of the proposed estimator. The proofs of these asymptotic results are postponed to the appendix.
Proposition 1. Assume that to hold. Moreover, assume that holds for kernel weights and that and hold for nearest neighbor with kernel weights. Then, for any compact set :
(a) under and , we have that
(b) if, in addition, have a unique median at , we have that
We will now set
Lemma 1. Under A1 and A6(ii), if there exist a real constant and an increasing function such that with , then we have that for any sequence such that in probability.
Theorem 1. Under A1 to A4, A7, and A9, if in addition and for any , then we have
where with for , and denotes convergence in distribution.
4.1. Application to Conditional Confidence Interval
An important application of the asymptotic normality result is the construction of confidence intervals for the true value of given the curve . A plug-in estimate for the asymptotic standard deviation can be obtained using the estimators and of and , respectively. We get
Then, can be used to obtain the following approximate confidence interval for :
where denotes the quantile of the standard normal distribution. Here, we point out that the estimators and can be computed by
We estimate and by and .
This last estimation is justified by the fact that, under A2, A7, and A9, we have (see Ferraty and Vieu (2006, p. 44))
Then, by the asymptotic normality result in Section 4, we have
as
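The plug-in interval described above can be sketched numerically. Since the paper's variance expression involves quantities not reproduced here, this sketch substitutes a standard sandwich-type variance estimate for kernel-weighted M-estimators with a Huber score; the function `plugin_ci` and its normalization are illustrative assumptions, not the paper's formulas.

```python
import numpy as np
from scipy.stats import norm

def huber_psi(u, c=1.345):
    """Huber score function, clipped at +/- c."""
    return np.clip(u, -c, c)

def huber_dpsi(u, c=1.345):
    """Derivative of the Huber score: 1 inside [-c, c], 0 outside."""
    return (np.abs(u) <= c).astype(float)

def plugin_ci(theta, Y, w, s, alpha=0.05, c=1.345):
    """Sandwich-type plug-in confidence interval for the robust
    regression value theta at a fixed curve, given kernel weights w
    and a scale estimate s (computed as in Section 2)."""
    r = (Y - theta) / s
    num = np.sum(w**2 * huber_psi(r, c)**2) * s**2
    den = np.sum(w * huber_dpsi(r, c))**2
    sd = np.sqrt(num / den)
    z = norm.ppf(1 - alpha / 2)
    return theta - z * sd, theta + z * sd
```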
5. Simulation Study
The purpose of this section is to apply our method to simulated data. More precisely, our main aim is to compare the sensitivity to outliers of the classical kernel regression estimator (CKE: ; see Ferraty and Vieu [17]), the kernel robust estimator (KRE: ; see Azzedine et al. [9]), and the equivariant robust regression estimator (ERE: associated with ; see Boente and Vahnovan [13]). This score function is known in the literature as the function. In this example, we consider the functional variable on the interval , and we take
where and is a Bernoulli distributed random variable. We carried out the simulation with 200 samples of the curve , which are represented in Figure 1.

The scalar response variable is defined by , where is the nonlinear regression model and , where is . Bandwidth selection is a fundamental problem in all kernel smoothing techniques. In this simulation, for all three methods, the optimal bandwidths were chosen by cross-validation over the nearest neighbors, where is the bandwidth corresponding to the optimal number of neighbors obtained by a cross-validation procedure:
with
where is the leave-one-out-curve estimator of the CKE, KRE, and ERE (we refer the reader to Ferraty and Vieu [2] for more details). We choose the quadratic kernel:
Another essential point for ensuring good behavior of the method is the use of a semi-metric well adapted to the kind of data at hand. Here, we used the semi-metric defined by the -distance between the first derivatives of the curves (for further discussion, see Ferraty and Vieu [2]).
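Assuming the curves are observed on a common grid, the semi-metric based on first derivatives can be approximated by finite differences. The helper `deriv_semimetric` below is an illustrative sketch, not the authors' implementation; it uses `np.gradient` for the derivatives and a simple Riemann sum for the L2 norm.

```python
import numpy as np

def deriv_semimetric(x1, x2, t):
    """L2 distance between the first derivatives of two discretized
    curves observed on the common grid t (finite-difference sketch)."""
    dx1 = np.gradient(x1, t)
    dx2 = np.gradient(x2, t)
    step = t[1] - t[0]  # assumes an equally spaced grid
    return np.sqrt(np.sum((dx1 - dx2) ** 2) * step)
```

Note that curves differing only by a vertical shift are at distance zero, which is exactly why this semi-metric suits spectrometric data with baseline shifts.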
This choice is motivated by the regularity of the curves . To compare these methods, we split the data randomly into two subsets: a training sample for modeling and a test sample for measuring the forecasting accuracy of the responses.
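The nearest-neighbor cross-validation described above can be sketched as follows for the classical kernel estimate (in practice the same leave-one-out loop is run with the robust estimators as well). The function names and the grid of candidate k values are illustrative assumptions; `D` is the n x n matrix of semi-metric distances between the observed curves.

```python
import numpy as np

def quadratic_kernel(u):
    """Asymmetrical quadratic kernel supported on [0, 1]."""
    return np.where((u >= 0) & (u <= 1), 1.0 - u**2, 0.0)

def loo_knn_cv(D, Y, k_grid):
    """Choose the number of neighbors k by leave-one-out
    cross-validation for the classical kernel smoother."""
    n = len(Y)
    best_k, best_err = None, np.inf
    for k in k_grid:
        err = 0.0
        for i in range(n):
            d = D[i].copy()
            d[i] = np.inf                      # leave the i-th curve out
            h = np.sort(d)[k - 1]              # local k-NN bandwidth
            w = quadratic_kernel(d / (h + 1e-12))
            pred = np.sum(w * Y) / np.sum(w)   # classical kernel estimate
            err += (Y[i] - pred) ** 2
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```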
To verify the theoretical results, we visualize the histogram of the data and compare its shape with the standard normal density for a fixed (based on 100 independent replications).
The histogram of is almost symmetric around zero and closely follows the shape of the standard normal density. Thus, the simulation results indicate that approximately obeys the standard normal law when n is large (see Figure 2).

In order to construct confidence bands, we proceed using the following algorithm:
Step 1. For each in the test sample, we calculate the estimator using the training sample.
Step 2. For all , we define the confidence bands by
where is the quantile of the standard normal distribution.
Step 3. We present our results by plotting the extremities of the predicted values versus the true values and the confidence bands.
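Steps 1 to 3 can be summarized in a short helper; `confidence_bands` is an illustrative name, and the plug-in standard errors are assumed to have been computed beforehand as in Section 4.1.

```python
import numpy as np
from scipy.stats import norm

def confidence_bands(theta_hat, sd_hat, alpha=0.05):
    """Pointwise bands theta_hat_j +/- z_{1-alpha/2} * sd_hat_j over
    the test sample, given the estimates and their plug-in standard
    errors as numpy arrays."""
    z = norm.ppf(1 - alpha / 2)
    return theta_hat - z * sd_hat, theta_hat + z * sd_hat
```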
The results are presented in Figures 3 and 4, where the solid black curve connects the true values and the dashed blue curve connects the lower and upper bounds of the predicted confidence intervals (CI). Clearly, Figure 3 shows the excellent behavior of our functional forecasting procedure for the ERE method in the absence of outliers.
The performance of the estimators is described by the mean squared error (MSE):
where is the length of the testing sample and denotes the regression estimator (CKE, KRE, or ERE) calculated at the point . The main feature of our approach is illustrated in the second case, where we perturb the data by introducing some outliers as indicated below. In this part, following the same approach as in Sinova et al. [18], let be an independent sequence of Bernoulli random variables (with parameter ), and let the contamination size constant be either 5 or 25.
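The MSE criterion above, and the relative version used later for the real data, can be computed as below. The exact normalization of the relative error is not reproduced in the text, so `rmse_relative` uses one common definition (squared errors scaled by the squared responses) as an assumption.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared prediction error over the testing sample."""
    return np.mean((y_true - y_pred) ** 2)

def rmse_relative(y_true, y_pred):
    """Relative mean squared prediction error (one common definition;
    assumes no zero responses)."""
    return np.mean(((y_true - y_pred) / y_true) ** 2)
```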
The data generating model is given by
In all these cases, we arrived at the same conclusion: in the presence of outliers, the ERE regression behaves better than the CKE and KRE methods. Even though the MSE of the three methods increases substantially with and with the value of the contamination size constant M, it remains very low for the ERE method. The results are given in Table 1 and Figure 4, where we only present the case of , because the results for the other cases are very similar.
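The Bernoulli contamination scheme just described can be sketched as follows; `contaminate` is an illustrative helper, with `eps` the Bernoulli parameter and `M` the contamination size constant. The contrast between the mean and the median of the contaminated sample illustrates why the robust estimators resist this perturbation while the classical one does not.

```python
import numpy as np

def contaminate(Y, eps, M, rng):
    """Gross-error contamination: shift each response by M with
    probability eps (Bernoulli contamination of size M)."""
    B = rng.binomial(1, eps, size=Y.shape)
    return Y + M * B
```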


6. Real Data Analysis
We now compare our estimator ERE () with the CKE () and KRE () estimators on a spectroscopic dataset. We consider a sample of 108 soil samples measured by near-infrared reflectance (NIR). Each sample is illuminated by a light beam at 1050 equally spaced wavelengths in the range 400–2500 nanometres (nm) with a 2 nm resolution. The dataset is available at http://www.models.kvl.dk/NIRsoil. For each wavelength and each soil sample, the absorption of radiation is measured. The th discretized spectrometric curve is given by . Figure 5 displays the 108 NIR reflectance spectra curves, where the red ones correspond to the outlier curves.

Moreover, to determine the chemical and microbiological properties of the soil, the ergosterol concentration was determined through High-Performance Liquid Chromatography (HPLC); it is taken in what follows as the response variable . This variable is affected by the presence of some outliers, as shown in Figure 6.

In this application, we use the MAD-median rule (cf. Wilcox [19]) for detecting outliers. Recall that this method declares an outlier if
where is the sample median and MAD is the median absolute deviation given by
Moreover, is taken to be : the square root of the quantile of a chi-squared distribution with one degree of freedom. We applied this method to our sample, and the MAD-median rule identifies 21 outliers. The goal now is to predict the ergosterol concentration from the observation of a new spectrometric curve. To this end, one observes pairs , where (resp., ) is the th spectrometric curve (resp., response). The prediction problem can be formulated via the model for . We then split the observations into two parts: a training sample containing for modeling and a testing sample for measuring the forecasting accuracy of the responses. In the modeling procedure, we choose the asymmetrical quadratic kernel function and the semi-metric based on the first derivatives. As in all smoothing methods, the choice of the smoothing parameter plays a crucial role in the computations. In this illustration, we use the cross-validation procedure described in Rachdi and Vieu (2007), for which the bandwidth is chosen via the following rule:
where is the leave-one-out-curve estimator of the CKE, KRE, and ERE. Finally, we predict by the three methods above and assess the performance of the models by computing the mean squared prediction error (MSE) and the relative mean squared prediction error (RMSE):
where denotes the regression estimator (CKE, KRE, or ERE) calculated at the point . Table 2 and Figure 7 display these errors, and it appears clearly that the ERE outperforms the KRE and CKE. That is, the classical kernel method is sensitive to the presence of outliers.
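The MAD-median rule can be sketched as follows. This sketch assumes, as is standard for this rule, the 0.975 quantile of the chi-squared distribution with one degree of freedom (so K is about 2.24) and the consistency factor 1.4826 that makes the MAD consistent for the standard deviation at the normal distribution; the quantile level is not stated explicitly in the text above, so it is an assumption here.

```python
import numpy as np

def mad_median_outliers(y, K=2.2414):
    """Flag y_i as an outlier when |y_i - median| / (scaled MAD)
    exceeds K, with K = sqrt(chi2_{0.975,1}) ~ 2.2414 (Wilcox's
    MAD-median rule)."""
    med = np.median(y)
    mad = 1.4826 * np.median(np.abs(y - med))
    return np.abs(y - med) / mad > K
```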

(a)

(b)
Next, the obtained prediction results are shown in Figure 8, while Figure 9 gives the predictive intervals for the ergosterol concentration in the test sample. Figure 9 also clearly shows the good behavior of the ERE estimator compared with the CKE and KRE ones: the band of confidence intervals is very wide for the CKE (0 to 200) and the KRE (60 to 120) compared with the ERE (80 to 100). This conclusion confirms the good performance of our asymptotic normality result.


7. Conclusion
This work generalizes the results of Boente et al. [13] from the real random variable setting to the functional setting. Our results are applied to derive the asymptotic normality of the predictor estimate and to build conditional confidence intervals. The results were obtained under sufficient standard conditions that allow one to explore different structural axes of the subject, such as the functional nature of the model and of the data, as well as the robustness of the regression function.
Appendix
Proof of Proposition 1. We begin by fixing some notation. For any measurable , note that where
(a) Following Theorem 3.3 in Boente and Fraiman [20], we only need to show that Theorem 3.1 or 3.2 from Boente and Fraiman [20] entails the following: where with and being defined in (A1) and (A2), respectively. The weights Wi,n are the kernel weights given by (A.3) or the nearest neighbor with kernel weights. Note that (A.6) can be derived for kernel weights using Proposition 2 in Collomb [21]. Now, (6) follows from A5 and the inequality where and .
(b) The equicontinuity condition required in and the uniqueness of the conditional median imply that is a continuous function of , and thus, for any fixed , the function will also be continuous as a function of . Given , let be such that . Then, from the uniqueness of the conditional median and (A.9), we get that
Write and . The continuity of and , together with (A.10) and (A.11), implies that , and thus . Since (6) holds, let be such that and for any . Thus, for large enough, we have
Therefore, for we have
which implies that
Therefore, , which concludes the proof.
Proof of Lemma 1. For each fixed , we define
The proof will be complete if we show
and the analogous result with replaced by . Equation (A.17) follows easily from the dominated convergence theorem. In order to show (13), it is enough to prove that the sequence of random variables on the space of continuous functions on is tight. According to Theorem 12.3 of Billingsley [22], it suffices to verify the following:
(i) The sequence is tight.
(ii) There exist constants and and a nondecreasing continuous function on such that
holds for all and large enough. Since (i) follows:
As in Lemma A of Fraiman [23], we have that, for ,
In addition,
Then,
which implies that . Finally, a similar argument shows that the same result holds for .
Proof of Theorem 1. For , we consider the quantity , and using a Taylor expansion of order one, we get the following:
with
where is an intermediate point. It is enough to show the following:
(a) in probability as .
(b) , where .
To prove (a), we consider the following decomposition:
Concerning the first term, observe that and, because is continuous at uniformly in , the convergence in probability of to 1 shows that the first term of (15) converges in probability to 0.
However, the limit of the second term is obtained by evaluating the bias term of . By stationarity, we have
Under A4 and the boundedness of , we get
From this, it follows that the proof of (a) is complete.
(b) can be deduced from Lemma 1.
(c) follows from a Taylor expansion of order two. Effectively, denote , where . We have the following expansion, with and being intermediate points. Using the boundedness of , the Lipschitz continuity of , and the continuity of , together with the consistency of the scale estimator, we get that, for ,
On the other hand,
By conditioning with respect to the real variable , we get
Integration with respect to the distribution of the real variable shows that
It follows from the usual change of variables that
Then,
In order to prove (d), let . The main point is to calculate the following limit:
By definition of , we have . Accordingly,
It now suffices to show that
Indeed, a simple calculation gives
Moreover, by (with ), we have
Then,
Consequently, (20) holds.
On the other hand, for the first term of the right-hand side of (19), using arguments analogous to those used to derive (20), we show that
Indeed, by (A.1), we can easily get
Finally, again by (A.3) (with ), we get
Next, using the distribution of the real variable , we can show that, under (A.2) and (A.4) (see Ferraty and Vieu (2006, p. 44)),
It follows that
Then, using (20)–(22), we get
Data Availability
The real data that support the findings of this study are available at http://www.models.kvl.dk/NIRsoil.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors thank and extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Research Groups Program under grant no. R.G.P. 1/177/43.