Abstract

Robust regression is an important iterative procedure that seeks to analyze data sets contaminated with outliers and other unusual observations and to reduce the impact of these points on the regression coefficients. Robust estimation methods were introduced to deal with the problem of outliers and to provide efficient and stable estimates in their presence. Various robust estimators have been developed in the literature to restrict the unbounded influence of outliers or leverage points on the model estimates. Here, a new redescending M-estimator is proposed, based on a novel objective function, with the prime focus on obtaining highly robust and efficient estimates. The results show that, for normal and clean data, the proposed estimator is almost as efficient as the ordinary least squares (OLS) method, yet it becomes highly resistant to outliers when applied to contaminated datasets. A simulation study is carried out to assess the performance of the proposed redescending M-estimator under different data generation scenarios, including the normal, t-, and double exponential distributions, with different levels of outlier contamination, and the results are compared with existing M-estimators, e.g., the Huber, Tukey biweight, Hampel, and Andrews sine functions. The performance of the proposed estimator was also checked on real-life data applications, where it was found to give promising results compared with the existing estimators.

1. Introduction

Robust regression is an alternative to ordinary least squares (OLS) regression when the basic assumptions are violated by the nature of the data. The OLS method requires several assumptions to fit a regression line efficiently, but it produces very poor estimates of the regression coefficients when these assumptions are not fulfilled; the residuals then become very large, leading to inflated standard errors, which can seriously distort model predictions. The confidence intervals become wider, and the estimates of the regression coefficients are no longer asymptotically consistent. Observations that bias the estimates of the parameters are called bad leverage points, whereas outlying observations that still lie along the fitted line are called good leverage points.

If the OLS assumptions are violated during the estimation of the regression coefficients, the problem is often addressed with transformation techniques. Transformed variables sometimes eliminate the effect of influential outliers, but outliers can still affect the significance of the regression coefficients and lead to incorrect predictions. Under these circumstances, robust regression methods that are resistant to outliers become the best available choice for fitting the regression line and finding efficient estimates of the regression coefficients.

A regression model is generally defined as

$$Y = X\beta + \varepsilon,$$

where the dependent variable $Y$ and the vector of true residuals $\varepsilon$ are $n \times 1$, the design matrix $X$ is $n \times p$, and $\beta$ is the $p \times 1$ vector of regression coefficients. Write $\hat{\beta}$ for an estimate of $\beta$ and $r = Y - X\hat{\beta}$ for the corresponding vector of fitted residuals.

The least squares method is commonly used for parameter estimation of the regression model under certain assumptions; for example, the error terms are taken to follow a normal distribution with zero mean and constant variance, i.e., $e_i \sim N(0, \sigma^2)$. The basic notion of the least squares principle is to minimize the sum of squared residuals. For normal data, the least squares estimator is defined by

$$\hat{\beta}_{LS} = \arg\min_{\beta} \sum_{i=1}^{n} e_i^2 = (X^T X)^{-1} X^T Y,$$

where $e = Y - X\beta$ represents the vector of residuals. However, this estimator is very sensitive to bad observations and to even a small departure of the data points from normality. Therefore, statisticians and data scientists have been working to introduce new robust methods that are highly resistant to outliers.
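As a minimal illustration (a numpy sketch with simulated toy data, not from the paper), the closed-form solution and its sensitivity to a single bad observation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])  # intercept + one predictor
y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.5, size=30)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # closed-form least squares estimate
print(beta_hat)                               # close to the true (2, 1)

y[0] += 100                                   # a single gross outlier in y...
print(np.linalg.solve(X.T @ X, X.T @ y))      # ...visibly distorts both estimates
```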

Numerous robust methods that are resistant to outliers have been proposed in the literature. Some of the popular robust regression estimators are the L1-estimator, M-estimators, R-estimators, GM-estimators, RM-estimators, S-estimators, LMS, LTS, MM-estimators, τ-estimators, and redescending M-estimators [1, 2]. However, no single method outperforms all others in every outlying scenario; a method that performs well for one outlier scenario may perform poorly in another.

Keeping in view the weaknesses of the available robust estimators, an attempt is made in this paper to introduce a novel redescending M-estimator that outperforms the existing robust methods in terms of efficiency and robustness. The main contribution of this paper is a new redescending M-estimator that has the potential to perform well on almost all kinds of contaminated datasets without compromising efficiency.

2. Literature Review

In this section, we present and discuss the robust methods that have been introduced in the literature to cope with the problem of outliers, whether vertical outliers or leverage points. The details of these methods are as follows.

2.1. L-Estimators

L-estimators are computed from linear combinations of order statistics. The best-known member of this class, more resistant than classical ordinary least squares, is least absolute deviation (LAD) regression, also called L1-regression because it minimizes the sum of absolute deviations. Least absolute value regression is a very simple procedure for bounded-influence robust regression. The method of least squares, which minimizes the sum of squared residuals, also fits the definition of L-estimators and is sometimes called the L2-norm estimator. Other L-estimators include least median of squares (LMS) and least trimmed squares (LTS). A brief discussion of these L-estimators follows.

2.2. Least Absolute Deviation (LAD) Estimator

The LAD estimator, introduced by Edgeworth in 1887 [3], is more resistant to outliers, specifically when they appear in the direction of the dependent variable y. The estimates of the regression coefficients are obtained by minimizing the sum of the absolute values of the residuals:

$$\hat{\beta}_{LAD} = \arg\min_{\beta} \sum_{i=1}^{n} |y_i - x_i^T \beta|.$$

The LAD objective function is a special case of the more general quantile regression objective, which can be written as

$$\hat{\beta}_{\omega} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\omega}(y_i - x_i^T \beta),$$

where

$$\rho_{\omega}(u) = u\left(\omega - I(u < 0)\right),$$

and $\omega \in (0, 1)$ is the quantile to be estimated; LAD corresponds to the median, $\omega = 0.5$. More details on quantile regression and its applications are given in Koenker and Bassett [4] and Koenker and d'Orey [5].

Although LAD is less affected by outliers in the direction of the response variable y and performs better than OLS in their presence, when unusual values appear in the direction of the explanatory variable x (leverage points), LAD produces very poor estimates of the regression coefficients, as its breakdown point is zero, i.e., BDP = 0. LAD is also inefficient at the normal model: assuming $y_i \sim N(\mu, \sigma^2)$, the sampling variance of the OLS estimator of the mean is $\sigma^2/n$, whereas for LAD it is $\pi/2 \approx 1.57$ times greater, i.e., $\pi\sigma^2/(2n)$. The zero breakdown point against outliers in the x-direction and the low efficiency relative to OLS make LAD regression less attractive than other robust regression techniques [6].
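As a quick illustration of this behavior (a sketch with hypothetical toy data; statsmodels' QuantReg at q = 0.5 serves as a readily available LAD implementation):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 + 0.5 * x + rng.normal(0, 1, 50)
y[:2] += 30                                 # two vertical outliers in y

X = sm.add_constant(x)
print(sm.OLS(y, X).fit().params)            # pulled toward the outliers
print(sm.QuantReg(y, X).fit(q=0.5).params)  # median (LAD) fit stays near (2, 0.5)
```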

2.3. Least Median of Squares (LMS) Regression

Least median of squares (LMS) regression was introduced by Rousseeuw and Yohai [7] and is based on the median of the squared residuals instead of the sum of squared residuals used in ordinary least squares. The mathematical form of the LMS estimator is

$$\hat{\beta}_{LMS} = \arg\min_{\beta}\; r^2_{(h)},$$

where $r^2_{(1)} \le r^2_{(2)} \le \cdots \le r^2_{(n)}$ are the ordered squared residuals and the optimal coverage is chosen as $h = \lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$. Here, p is the number of regression coefficients to be estimated, including the intercept. The LMS procedure fits the majority of the data points well. One formal way of expressing and quantifying this feature is the breakdown point of an estimator: the smallest fraction of contaminated data points that can carry the estimate arbitrarily far away. The breakdown point of the sample mean as a location estimate is 0%, since even a single outlier out of n observations can produce arbitrarily large changes; the breakdown point of the median, by contrast, is 50%. The same figures carry over to regression: the breakdown point of OLS is 0%, and that of LMS regression is 50%. Although the breakdown point of the LMS estimator is 50%, it has certain disadvantages that limit its use: its maximum relative efficiency is 37% [8], and because of its slow $n^{1/3}$ convergence rate, its influence function is not well defined [7]. Despite these disadvantages, LMS estimates play an important role in the computation of more efficient estimators, the MM-estimators, by providing initial estimates of the residuals.
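Because the LMS criterion is nonconvex, it is typically approximated by a random p-subset search. The following is a minimal sketch of that idea (an illustration, not Rousseeuw's exact algorithm), using the coverage h defined above:

```python
import numpy as np

def lms_fit(X, y, n_subsets=3000, seed=0):
    # Random p-subset search: fit exactly through p points and keep the
    # candidate whose h-th smallest squared residual (the LMS criterion)
    # is minimal.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = n // 2 + (p + 1) // 2                       # optimal coverage
    best_crit, best_beta = np.inf, None
    for _ in range(n_subsets):
        idx = rng.choice(n, p, replace=False)
        try:
            beta = np.linalg.solve(X[idx], y[idx])  # exact fit to p points
        except np.linalg.LinAlgError:
            continue                                # singular subset; skip it
        crit = np.sort((y - X @ beta) ** 2)[h - 1]  # h-th ordered squared residual
        if crit < best_crit:
            best_crit, best_beta = crit, beta
    return best_beta
```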

2.4. Least Trimmed Square (LTS) Estimator

The least trimmed squares (LTS) estimator was proposed by Rousseeuw in 1984 and is based on trimming the sum of squared residuals, which allows some observations to have potentially large residuals. The principle of LTS estimation is to minimize the trimmed sum of squared residuals:

$$\hat{\beta}_{LTS} = \arg\min_{\beta} \sum_{i=1}^{h} r^2_{(i)},$$

where $r^2_{(1)} \le \cdots \le r^2_{(n)}$ are the ordered squared residuals, $h$ is the size of the subset of observations having the smallest residuals, and $\lambda = (n - h)/n$ is the proportion of trimmed values. This is equivalent to searching for the $h$ observations having the smallest residuals and applying an LS fit to those observations. For $h = \lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$, the breakdown point of the LTS estimator is 50%, i.e., BDP = 0.50 [9].

Even though the LTS estimator is highly resistant to outliers, it suffers from very low relative efficiency [10], which makes it undesirable as a final estimator. Its importance cannot be denied, however, because it plays a significant role in the computation of other robust estimators such as the GM-estimators introduced by Coakley and Hettmansperger [11], and the residuals obtained from the LTS method can be used successfully in outlier diagnostic plots.
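In practice, LTS is computed with concentration steps (the C-steps of Rousseeuw and Van Driessen's FAST-LTS algorithm). Below is a hedged sketch of the refinement step, assuming some starting estimate beta0 is available, for example from the LMS sketch above:

```python
import numpy as np

def lts_c_steps(X, y, beta0, h=None, n_steps=20):
    # C-step: keep the h observations with the smallest squared residuals
    # and refit OLS on them; each step cannot increase the LTS objective.
    n, p = X.shape
    h = h if h is not None else n // 2 + (p + 1) // 2
    beta = beta0
    for _ in range(n_steps):
        keep = np.argsort((y - X @ beta) ** 2)[:h]
        beta_new = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        if np.allclose(beta_new, beta):
            break                                   # converged
        beta = beta_new
    return beta
```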

3. Huber M-Estimators

In order to restrict the influence of outliers in a regression problem, Huber [12] introduced the M-estimator, which is obtained by minimizing a less rapidly increasing function of the residuals instead of the sum of squared residuals. The main objective behind its development was to strike a trade-off between the efficiency of OLS and the robustness of the LAD estimator; indeed, the M-estimator is a generalization of both. The method of M-estimation is based on the minimization of some function of the residuals, and its robustness depends on the choice of the weight function.

If the assumptions of the classical regression model, such as linearity, homoscedasticity, and no autocorrelation among the error terms, are satisfied, then the OLS estimator of β is obtained by minimizing the sum of squared errors, which coincides with the maximum likelihood estimate under normality:

$$\hat{\beta}_{OLS} = \arg\min_{\beta} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2.$$

For the M-estimator, the least squares criterion based on the sum of squared residuals is replaced with a less rapidly increasing function of the residuals, say $\rho$:

$$\hat{\beta}_{M} = \arg\min_{\beta} \sum_{i=1}^{n} \rho(r_i),$$

where the function $\rho$ is called the objective function and possesses the following properties:
(i) $\rho(r) \ge 0$
(ii) $\rho(0) = 0$
(iii) $\rho(r) = \rho(-r)$
(iv) $\rho(r_i) \ge \rho(r_j)$ for $|r_i| > |r_j|$
(v) $\rho$ is continuous ($\rho$ is differentiable)

The M-estimator is not necessarily scale invariant (i.e., if the residuals $r_i = y_i - x_i^T \hat{\beta}$ are multiplied by a constant, the solution of the minimization problem might not be the same as before); for this reason the residuals are usually standardized by a robust scale estimate such as the MAD. The ψ-function is obtained by taking the derivative of the objective function with respect to the coefficients; equating it to zero gives a system of $k + 1$ estimating equations for the coefficients:

$$\sum_{i=1}^{n} \psi(r_i)\, x_i = 0,$$

where $\psi = \rho'$ is the score function. The weight function is then obtained by dividing the ψ-function by the corresponding residual, that is, $w_i = \psi(r_i)/r_i$. Substituting $\psi(r_i) = w_i r_i$, the estimating equations can be written as

$$\sum_{i=1}^{n} w_i\, r_i\, x_i = 0,$$

which has the form of the normal equations of a weighted least squares problem.

For the computation of an M-estimator, an iterative method is required to solve the above system of nonlinear equations. The most commonly used optimization technique for this purpose is the iteratively reweighted least squares (IRLS) method.
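In practice, these estimating equations are rarely coded by hand; for instance, statsmodels solves them by IRLS. A minimal sketch with hypothetical toy data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.normal(size=60)
y = 1 + 3 * x + rng.normal(size=60)
y[:5] += 15                                   # contaminate the response

X = sm.add_constant(x)
fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimate via IRLS
print(fit.params)                             # near the true (1, 3)
print(fit.weights[:5])                        # the outliers receive small weights
```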

3.1. Redescending M-Estimators

The redescending M-estimator was proposed by Hampel in 1972 [13] and is also known as the three-part redescending M-estimator. The objective function $\rho(r_i)$ of a redescending estimator is bounded, and for large residuals its ψ-function redescends towards zero. Redescending M-estimators completely reject gross outliers and are thus more robust than monotone M-estimators. Equivalently, an estimator is said to be a redescending M-estimator if the derivative of ρ (the objective function) redescends to zero, i.e.,

$$\lim_{|r| \to \infty} \psi(r) = \lim_{|r| \to \infty} \rho'(r) = 0.$$

The breakdown point of redescending M-estimators is approximately 0.5, so they are considered high breakdown robust estimators. The score function of such high breakdown point estimators smoothly redescends to zero [14]. There are many types of redescending M-estimators, some of which are discussed below.

3.2. Hampel’s Three-Part Function

Hampel's three-part redescending M-estimator, studied by Andrews et al. [15], was the first attempt at a redescending M-estimator. Its ψ-function is a piecewise linear (three-part) function, which is why it is also known as Hampel's three-part redescending estimator. The objective, ψ, and weight functions of the Hampel estimator are as follows.

Hampel's objective function is defined by

$$\rho(r) = \begin{cases} \dfrac{r^2}{2}, & 0 \le |r| \le a, \\[4pt] a|r| - \dfrac{a^2}{2}, & a < |r| \le b, \\[4pt] ab - \dfrac{a^2}{2} + \dfrac{a(c-b)}{2}\left[1 - \left(\dfrac{c - |r|}{c - b}\right)^{2}\right], & b < |r| \le c, \\[4pt] ab - \dfrac{a^2}{2} + \dfrac{a(c-b)}{2}, & |r| > c. \end{cases}$$

The corresponding redescending ψ-function is

$$\psi(r) = \begin{cases} r, & 0 \le |r| \le a, \\ a\,\operatorname{sign}(r), & a < |r| \le b, \\ a\,\operatorname{sign}(r)\,\dfrac{c - |r|}{c - b}, & b < |r| \le c, \\ 0, & |r| > c. \end{cases}$$

The weight function is

$$w(r) = \begin{cases} 1, & 0 \le |r| \le a, \\ \dfrac{a}{|r|}, & a < |r| \le b, \\ \dfrac{a}{|r|}\,\dfrac{c - |r|}{c - b}, & b < |r| \le c, \\ 0, & |r| > c, \end{cases}$$

where the tuning constants satisfy $0 < a \le b < c < \infty$. The performance of the Hampel estimator is efficient, as shown in Figure 1. Its score function, however, is continuous but not differentiable at the corner points $a$, $b$, and $c$; a smoother ψ-function was desirable, which led to the development of Andrews' sine function.
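A direct numpy transcription of the ψ- and weight functions above may be useful; the tuning constants a = 2, b = 4, c = 8 are common textbook choices, not values fixed by this paper:

```python
import numpy as np

def hampel_psi(r, a=2.0, b=4.0, c=8.0):
    # Three-part score: linear, then constant, then redescending linearly to 0.
    x = np.abs(r)
    return np.where(x <= a, r,
           np.where(x <= b, a * np.sign(r),
           np.where(x <= c, a * np.sign(r) * (c - x) / (c - b), 0.0)))

def hampel_weight(r, a=2.0, b=4.0, c=8.0):
    # w(r) = psi(r)/r, with w(0) = 1 taken as the limiting value.
    r = np.asarray(r, dtype=float)
    w = np.ones_like(r)
    nz = r != 0
    w[nz] = hampel_psi(r[nz], a, b, c) / r[nz]
    return w
```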

3.3. Andrew’s Sine Function

Andrews et al. [15] introduced a new redescending M-estimator, popularly known as the Andrews sine function. This estimator is more resistant to outliers and redescends smoothly, as can be observed from Figure 2. The objective function of the Andrews sine estimator is

$$\rho(r) = \begin{cases} c^2\left[1 - \cos\left(\dfrac{r}{c}\right)\right], & |r| \le c\pi, \\ 2c^2, & |r| > c\pi. \end{cases}$$

The corresponding redescending ψ-function is

$$\psi(r) = \begin{cases} c \sin\left(\dfrac{r}{c}\right), & |r| \le c\pi, \\ 0, & |r| > c\pi. \end{cases}$$

The weight function is

$$w(r) = \begin{cases} \dfrac{\sin(r/c)}{r/c}, & |r| \le c\pi, \\ 0, & |r| > c\pi. \end{cases}$$

The ρ-function, ψ-function, and weight function are graphed in Figure 2.

3.4. Tukey’s Biweight Function

Beaton and Tukey [16] introduced the biweight (bisquare) function. Tukey's biweight function is smooth and has been used successfully and extensively in a wide variety of applications. Its objective function is

$$\rho(r) = \begin{cases} \dfrac{c^2}{6}\left\{1 - \left[1 - \left(\dfrac{r}{c}\right)^{2}\right]^{3}\right\}, & |r| \le c, \\[4pt] \dfrac{c^2}{6}, & |r| > c. \end{cases}$$

The corresponding ψ-function is

$$\psi(r) = \begin{cases} r\left[1 - \left(\dfrac{r}{c}\right)^{2}\right]^{2}, & |r| \le c, \\ 0, & |r| > c. \end{cases}$$

The weight function is

$$w(r) = \begin{cases} \left[1 - \left(\dfrac{r}{c}\right)^{2}\right]^{2}, & |r| \le c, \\ 0, & |r| > c. \end{cases}$$

The ρ-function, ψ-function, and weight function are graphed in Figure 3.
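For comparison with the Hampel sketch above, the two smooth redescenders can be coded the same way; the defaults c = 1.339 and c = 4.685 are the usual 95%-efficiency tuning constants:

```python
import numpy as np

def andrews_psi(r, c=1.339):
    # psi(r) = c*sin(r/c) inside [-c*pi, c*pi]; exactly zero beyond.
    return np.where(np.abs(r) <= c * np.pi, c * np.sin(r / c), 0.0)

def tukey_psi(r, c=4.685):
    # psi(r) = r*[1 - (r/c)^2]^2 inside [-c, c]; exactly zero beyond.
    u = r / c
    return np.where(np.abs(u) <= 1, r * (1 - u ** 2) ** 2, 0.0)
```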

3.5. Iteratively Reweighted Least Squares (IRLS)

Since robust estimators have no closed-form solution, the IRLS fitting algorithm is commonly used to estimate the regression coefficients. The model cannot be fitted in a single step: the residuals are needed to compute the weights, but the residuals cannot be found until the coefficients are estimated. The iteratively reweighted least squares method therefore proceeds as follows (a code sketch is given after the next paragraph):
(i) A regression line is fitted to the data by the OLS method, setting the iteration counter to $I = 0$, which gives initial estimates of the regression parameters.
(ii) Initial residuals are extracted from the fitted OLS regression.
(iii) A weight function is applied to these initial OLS residuals to create preliminary weights $w_i^{(0)}$.
(iv) Weighted least squares minimizes $\sum_i w_i^{(0)} r_i^2$ and thus obtains $\hat{\beta}^{(1)}$. In matrix form, with $W$ representing the $n \times n$ diagonal matrix of individual weights, the solution is
$$\hat{\beta}^{(1)} = (X^T W X)^{-1} X^T W Y.$$
(v) New weights are obtained from the residuals of this initial weighted least squares (WLS) fit.
(vi) The new weights are used in the next WLS iteration to estimate $\hat{\beta}^{(2)}$, and so on.
(vii) The iterations are stopped when the estimates $\hat{\beta}$ stabilize relative to the previous iteration.

In general, at the $q$th iteration the solution is

$$\hat{\beta}^{(q)} = (X^T W^{(q-1)} X)^{-1} X^T W^{(q-1)} Y,$$

where $W^{(q-1)} = \operatorname{diag}\{w_i^{(q-1)}\}$ is computed from the residuals of iteration $q - 1$. The iterations continue until the estimates converge; the solution is considered converged when the estimates deviate by no more than 0.01% from the preceding iteration.
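A self-contained numpy sketch of the algorithm above, using MAD-scaled residuals and Tukey's biweight weights (any of the weight functions defined earlier could be substituted):

```python
import numpy as np

def mad_scale(r):
    # MAD rescaled for consistency with the normal standard deviation.
    return np.median(np.abs(r - np.median(r))) / 0.6745

def tukey_weight(r, c=4.685):
    u = r / c
    return np.where(np.abs(u) <= 1, (1 - u ** 2) ** 2, 0.0)

def irls(X, y, weight_fn=tukey_weight, tol=1e-4, max_iter=100):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]         # step (i): OLS start
    for _ in range(max_iter):
        r = y - X @ beta
        w = weight_fn(r / mad_scale(r))                 # weights from scaled residuals
        sw = np.sqrt(w)[:, None]                        # solve the WLS problem
        beta_new = np.linalg.lstsq(sw * X, np.sqrt(w) * y, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol * (1 + np.max(np.abs(beta))):
            return beta_new                             # relative change below tol
        beta = beta_new
    return beta
```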

M-estimators are considered robust in situations where the error distribution is heavy-tailed, the error variance violates the homoscedasticity assumption, or outliers appear in the direction of the dependent variable y. M-estimators also assume that the design matrix X of the model is measured without error. Under these conditions, M-estimators are more efficient than OLS estimates [14].

The efficiency of M-estimates is about 95% relative to OLS under the assumptions of the classical regression model. Although M-estimators were developed to improve on OLS in terms of resistance to unusual data points appearing in the direction of y, much like least absolute value (LAV) estimators, they are not totally resistant to odd observations, because they offer no protection against outliers that are leverage points. It is to be noted that M-estimates of location are highly robust, having a breakdown point (BDP) of 0.5 as well as a bounded influence function, but M-estimates of regression have these characteristics only for outliers in y, not in x, and thus their breakdown point equals zero, i.e., BDP = 0 [17]. In other words, in some situations, M-estimators perform no better than OLS [2]. Nevertheless, the importance of M-estimators cannot be denied, because they play a significant role in computing more robust estimates.

4. Proposed Estimator

Redescending estimators aim to minimize $\sum_{i=1}^{n} \rho(r_i)$, where $\rho$ is an even, smooth, differentiable, and continuous function. The choice of $\rho$ depends mainly on the objective of the estimator. For example, the objective function should be able to resist extreme outliers and should have a bounded-influence ψ-function that is zero for large residuals. Similarly, a redescending estimator should ideally have a differentiable and smooth ρ-function to maintain high efficiency along with robustness.

In this study, an attempt has been made to remedy some of the drawbacks of the above-mentioned redescending estimators. For this purpose, the properties of the tangent hyperbolic function were assessed, and it was found that the tangent hyperbolic function satisfies all the properties associated with a good objective function, as listed below.

The proposed objective function is a tangent hyperbolic function of the residuals, where $c$ is the tuning constant and $r_i$ denotes the residuals scaled by the MAD. The proposed ρ-function satisfies all the properties associated with a good objective function, namely,
(i) $\rho(r) \ge 0$
(ii) $\rho(0) = 0$: the objective function is zero for the residual $r_i = 0$
(iii) $\rho(r) = \rho(-r)$: the objective function is symmetric
(iv) $\rho(r_i) \ge \rho(r_j)$ for $|r_i| > |r_j|$: the objective function is monotonic in $|r|$
(v) $\rho$ is continuous ($\rho$ is differentiable)

The corresponding score function ψ is obtained by taking the first derivative of the proposed ρ-function.

Similarly, the corresponding weight function is obtained by dividing the ψ-function by $r_i$.

The graphical representation of the objective function, score function, and weight function is presented in Figure 4.
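The exact formula of the proposed objective appears in the paper's equations and is not reproduced here. Purely for illustration, the sketch below uses one plausible tanh-type redescending objective, ρ(r) = (c²/2)·tanh²(r/c), which is bounded, even, monotone in |r|, and has a ψ-function that redescends smoothly to zero; it should not be read as the authors' exact function, and the tuning constant c = 1.5 is an arbitrary placeholder:

```python
import numpy as np

def rho_tanh(r, c=1.5):
    # Bounded, even objective: rho -> c^2/2 as |r| -> infinity.
    return 0.5 * c ** 2 * np.tanh(r / c) ** 2

def psi_tanh(r, c=1.5):
    # First derivative of rho: psi(r) = c*tanh(r/c)/cosh(r/c)^2, redescending to 0.
    return c * np.tanh(r / c) / np.cosh(r / c) ** 2

def w_tanh(r, c=1.5):
    # Weight w(r) = psi(r)/r; near r = 0, psi(r) ~ r, so w(0) = 1.
    r = np.asarray(r, dtype=float)
    w = np.ones_like(r)
    nz = r != 0
    w[nz] = psi_tanh(r[nz], c) / r[nz]
    return w
```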

Important characteristics of a robust estimator are its breakdown point (BDP) and its efficiency, which measure how stable the estimator remains against wild data points present in the dataset. The usual practice is a compromise between efficiency and resistance: choose an optimal estimator that offers maximum resistance with minimum loss of efficiency. In other words, one should neither adopt a robust estimator at the cost of a large efficiency loss nor choose a completely nonrobust estimator with maximum efficiency, but rather strike a balance between the two properties. The proposed ψ-function, shown in Figure 4(b), has a different mechanism from Andrews' sine and Tukey's biweight estimators. The score function ψ of the mean is a straight line, which is what makes the mean a highly efficient estimator provided that the normality assumptions are fulfilled. The longer linear central section of the proposed ψ-function behaves linearly for most of the central observations, where other smoothly redescending estimators fail to do so. This increased linearity improves efficiency, as most of the central observations still satisfy the normality assumption. The ψ-function then redescends slowly for large residuals and becomes zero for values lying outside a particular band.

The most interesting feature of the proposed estimator is its weight function. It assigns weights nearly equal to 1 from zero up to moderate residuals and then downweights larger residuals, so that outliers receive weights close to zero. Figure 4(c) shows the graph of the proposed weight function.

5. Simulation Study

To assess the overall performance of the proposed estimator and the existing methods, an extensive simulation study is carried out, with datasets simulated from different distributions and with different levels of contamination. The simulation considers several factors that can potentially influence the model fit and the outcomes of the competing methods. In total, 1000 datasets were generated, and the proposed estimator, called the Ali redescending M-estimator, is compared with the Huber, Tukey biweight, Andrews' sine, Hampel, and OLS estimators. For comparison purposes, the total mean square error over all replications is calculated as

$$\mathrm{TMSE} = \frac{1}{R} \sum_{r=1}^{R} \left(\hat{\beta}_r - \beta\right)^T \left(\hat{\beta}_r - \beta\right),$$

where $R = 1000$ is the number of replications, $\hat{\beta}_r$ is the estimate obtained in the $r$th replication, and $\beta$ is the true coefficient vector.

The simulated datasets were generated from well-known distributions, two of which are heavy-tailed. The following simulation scenarios were considered in this study (a code sketch of these scenarios is given below):
(i) generating both the predictors and the errors from the standard normal distribution, i.e., $N(0, 1)$;
(ii) generating the errors from Student's t-distribution;
(iii) generating the errors from the double exponential (Laplace) distribution.
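A hedged sketch of such a simulation follows; the contamination mechanism and the outlier shift N(20, 5) are illustrative assumptions rather than the paper's exact settings, and statsmodels' built-in norms stand in for the compared estimators:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def simulate(n=100, p=3, dist="normal", contam=0.0):
    # Predictors standard normal; errors from scenario (i), (ii), or (iii).
    X = sm.add_constant(rng.standard_normal((n, p)))
    beta = np.ones(p + 1)
    if dist == "normal":
        e = rng.standard_normal(n)
    elif dist == "t":
        e = rng.standard_t(df=3, size=n)
    else:                                   # double exponential (Laplace)
        e = rng.laplace(size=n)
    y = X @ beta + e
    m = int(contam * n)
    if m:                                   # shift a fraction of the responses
        y[rng.choice(n, m, replace=False)] += rng.normal(20, 5, m)
    return X, y, beta

def tmse(norm, reps=200, **kw):
    # Average squared distance of the estimates from the true coefficients.
    errs = []
    for _ in range(reps):
        X, y, beta = simulate(**kw)
        b = sm.RLM(y, X, M=norm).fit().params
        errs.append(np.sum((b - beta) ** 2))
    return np.mean(errs)

print(tmse(sm.robust.norms.TukeyBiweight(), dist="t", contam=0.10))
```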

All the datasets were generated from the above-mentioned distributions, either clean or with contamination added in the form of outliers, and the results of the competing methods are presented in the following tables along with the data generation scenarios and dimensions.

The three desirable properties of every robust method are low bias, high efficiency, and a high breakdown point. The results of the simulation study for clean datasets generated from the normal distribution with λ = 0% contamination are summarized in Table 1. It can be observed from Table 1 that the OLS estimator has the lowest total mean square error (TMSE). The proposed Ali redescending M-estimator is a close competitor of OLS in terms of small total mean square error and high efficiency.

Table 2 presents the results for the second scenario, where datasets of different sizes and numbers of predictor variables were generated from the t-distribution, and the total mean square errors were obtained for all the competing methods including the proposed estimator. It is evident from Table 2 that the proposed method outperforms all the competing methods in terms of the smallest mean square error. Similarly, the results in Table 3 are based on datasets generated from the standard Laplace distribution, a heavy-tailed distribution, with different sample sizes and numbers of predictors; again, the proposed estimator attains the lowest total mean square error among the methods considered. Table 4 presents the total mean square errors for datasets in which 90% of the observations are normal and 10% are outliers in the response as well as the predictors. From these results, it is evident that the proposed method performs better than the rest of the methods, having the lowest TMSE in almost all the data generation scenarios with different sample sizes and predictors. From Table 5, it is evident that the proposed method is more efficient, having the lowest TMSE among the competing methods, for datasets in which 80% of the observations were generated from the standard normal distribution and the remaining 20% were generated as outliers from the same distribution.

The results presented above show that the performance of the proposed method is promising in higher dimensions: as the sample size increases, the efficiency of the proposed method increases and its TMSE decreases, and as the number of predictors increases, its performance improves as well, which confirms its suitability for higher dimensional simulated datasets. From the simulation results given in Table 6, where 80% of the observations of the explanatory and response variables were simulated from the standard normal distribution and 20% from the same distribution with different location-scale parameters, it is evident that the proposed Ali estimator is more efficient than the other competing methods, with the smallest total mean square error. Moreover, the proposed estimator shows similar performance when the contamination rate increases from 20% to 40%, as shown in Tables 7–9. It is also clearly evident from these tables that the performance of the proposed estimator improves from smaller sample sizes to larger ones and from low-dimensional to high-dimensional datasets.

6. Applications on Real Datasets

In this section, the performance of the proposed estimator is assessed on real-world datasets and compared with the other competing methods.

6.1. Brownlee’s Stack Loss Data

The stack loss dataset is a well-known dataset taken from Brownlee [18]. It describes the operation of a plant for the oxidation of ammonia in the production of nitric acid. There are 21 observations and three explanatory variables, namely, air flow (X1), cooling water temperature (X2), and acid concentration (X3); the dependent variable is stack loss (Y).
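For a quick look at this dataset (statsmodels ships a copy; the norms shown are stand-ins for the comparison reported in Table 10, whose exact numbers are not reproduced here):

```python
import statsmodels.api as sm

data = sm.datasets.stackloss.load_pandas()
X = sm.add_constant(data.exog)               # air flow, water temp, acid conc.
y = data.endog                               # stack loss

for name, norm in [("Huber ", sm.robust.norms.HuberT()),
                   ("Tukey ", sm.robust.norms.TukeyBiweight()),
                   ("Hampel", sm.robust.norms.Hampel())]:
    fit = sm.RLM(y, X, M=norm).fit()
    print(name, fit.params.values.round(3))
print("OLS   ", sm.OLS(y, X).fit().params.values.round(3))
```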

The analysis results presented in Table 10 give the estimates of the regression coefficients and the corresponding residual sums of squares computed by OLS and the existing robust methods, including the proposed redescending M-estimator. The coefficient estimates and fitted values obtained by the proposed estimator are promising compared with the competing robust methods, as shown in the table. Similarly, the residuals from the model fitted by the proposed estimator are smaller than those from the models fitted by the other redescending M-estimators, leading to a smaller residual sum of squares. Hence, from the simulation study as well as the real data application, it can be concluded that the performance of the proposed estimator is better than that of the redescending M-estimators considered in this study.

7. Conclusion

The proposed redescending estimator is continuous, smooth, and linear in the middle, and therefore, in accordance with Winsor's principle, it satisfies almost all the properties of a good estimator in the presence of outliers. To assess its overall performance, a simulation study was conducted under different data generation scenarios, namely the normal, t-, and Laplace distributions, along with different contamination scenarios in the response as well as the explanatory variables. The simulation results show that for clean normal data the performance of the proposed estimator is almost the same as that of OLS in terms of total mean square error, while it outperforms the existing robust estimators. For contaminated datasets (that is, when outliers are present in the response as well as the explanatory variables), the proposed estimator was found to be the best among all the estimators for all sample sizes n and numbers of predictors p, at every level of contamination. Similarly, the simulation results show that the performance of the proposed estimator improves as the sample size increases while maintaining higher robustness than the competing estimators, and its efficiency was found to be higher than that of the rest of the estimators. The same conclusions were reached in the real data application. Finally, it was concluded from the simulation study as well as the real data application that all robust estimators perform well at all levels of contamination in the response, but their performance deteriorates in the presence of leverage points; the proposed estimator, however, was found to perform better than all the redescending estimators considered in this study.

Data Availability

The data used to replicate the results of this paper are available from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.