Abstract
Many authors have defined modified versions of the mean estimator that use two auxiliary variables. These estimators depend heavily on the estimated regression coefficients, and in the presence of outliers they do not give satisfactory results. In this study, we improve these estimators by obtaining the regression coefficients with several robust regression techniques. We compare the efficiencies of the suggested estimators with those of estimators in the literature, and we support the theoretical results with two numerical examples and a simulation study. Empirical results show that the modified ratio estimator performs well in the presence of outliers when robust regression techniques are adopted.
1. Introduction
The classical ratio estimator is the most commonly used estimator of the population mean $\bar{Y}$ when the correlation between the study variable and the auxiliary variables is highly positive. However, when the dataset contains outliers, classical estimators lose efficiency and perform poorly. Several studies on ratio estimators have therefore aimed to lessen the detrimental impact of outliers. Kadilar et al. [1] used the Huber-M estimate instead of least squares estimation (LSE) in ratio estimators. Noor-ul-Amin et al. [2] proposed using the Huber estimate instead of LSE under double sampling. Zaman and Bulut [3] improved ratio estimators by applying robust regression coefficients. Zaman [4] provided estimators that combine the ratio estimators given in Zaman and Bulut [3]. Subzar et al. [5] provided ratio estimators of the population mean based on various robust regression techniques. Shahzad et al. [6] presented ratio estimators of the population mean, built on the Zaman and Bulut [3] estimators, for several cases of missing observations. Zaman and Bulut [7] presented ratio estimators for stratified random sampling based on robust regression techniques and robust covariance estimation. Subzar et al. [8] presented various ratio estimators using robust regression tools. Ali et al. [9] provided robust regression-type estimators of the mean for simple random sampling. Grover and Kaur [10] improved various ratio-type estimators using robust regression tools. In this paper, we propose using several robust regression estimates, instead of LSE, in the improved estimator of $\bar{Y}$ that uses two auxiliary variables under simple random sampling.
In the next section, we review ratio-type estimators for simple random sampling and their MSEs. Section 3 describes the robust regression methods used in this study, and Section 4 derives the properties of the suggested estimators. Section 5 compares the efficiency of the estimator of Kadilar and Cingi [11] with that of the suggested estimators on the basis of their MSEs. An empirical study on two datasets and a simulation study, presented in Sections 6 and 7, respectively, give satisfactory results both theoretically and numerically. Section 8 discusses the findings, and Section 9 concludes.
2. Traditional Ratio Estimator
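Abu-Dayyeh et al. [12] provided the following estimator of $\bar{Y}$, which uses two auxiliary variables under simple random sampling and assumes that the population means $\bar{X}_1$ and $\bar{X}_2$ are known:

$$\bar{y}_{AD} = \bar{y}\left(\frac{\bar{X}_1}{\bar{x}_1}\right)^{\alpha_1}\left(\frac{\bar{X}_2}{\bar{x}_2}\right)^{\alpha_2}, \tag{1}$$

where $\bar{y}$, $\bar{x}_1$, and $\bar{x}_2$ are sample means, and $\alpha_1$ and $\alpha_2$ are real numbers.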
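The following minimal Python sketch evaluates estimator (1). It assumes that the sample means and the known population means have already been computed. The function name `abu_dayyeh_estimate` is hypothetical. Setting $\alpha_1 = 1$ and $\alpha_2 = 0$ reduces the estimator to the classical ratio estimator based on $x_1$ alone.

```python
def abu_dayyeh_estimate(y_bar, x1_bar, x2_bar, X1_bar, X2_bar, alpha1, alpha2):
    """Estimate the population mean of y with two auxiliary variables, as in (1).

    y_bar, x1_bar, x2_bar: sample means (hypothetical argument names)
    X1_bar, X2_bar:        known population means of the auxiliary variables
    alpha1, alpha2:        real exponents chosen by the analyst
    """
    # Real powers of non-positive ratios are not meaningful here, so reject them.
    if x1_bar <= 0 or x2_bar <= 0:
        raise ValueError("sample means of the auxiliary variables must be positive")
    return y_bar * (X1_bar / x1_bar) ** alpha1 * (X2_bar / x2_bar) ** alpha2


# alpha1 = 1, alpha2 = 0 gives the classical ratio estimator y_bar * X1_bar / x1_bar.
estimate = abu_dayyeh_estimate(10.0, 5.0, 4.0, 6.0, 4.0, 1.0, 0.0)
```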
Considering the ratio estimator presented in (1), Kadilar and Cingi [11] provided the following estimator:
The MSE equation of the estimator given in (2) was obtained as follows:where , , , and .
and are obtained by the LS estimate, , and are the variances of population of and , respectively, and and are the covariance of population between and and between and , respectively [11].
3. Robust Regression Methods
This section describes several important and widely used robust regression methods.
3.1. Huber-M Estimation
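Huber [13] proposed M-estimators, a class of estimators defined by the choice of a ρ function. Instead of minimizing the sum of squared errors $\sum_{i=1}^{n} e_i^{2}$, an M-estimate minimizes the sum of another function of the residuals.
The objective function of M-estimation is

$$\min_{\boldsymbol{\beta}}\sum_{i=1}^{n}\rho(e_i), \qquad e_i = y_i - \mathbf{x}_i^{\top}\boldsymbol{\beta}, \tag{4}$$

where $\rho$ is a symmetric function of the residuals.
Huber's function is defined as

$$\rho(e) = \begin{cases} \dfrac{1}{2}e^{2}, & |e| \le c,\\[4pt] c\,|e| - \dfrac{1}{2}c^{2}, & |e| > c. \end{cases} \tag{5}$$

The derivative of the function in (5) is

$$\psi(e) = \rho'(e) = \begin{cases} e, & |e| \le c,\\ c\,\operatorname{sgn}(e), & |e| > c, \end{cases} \tag{6}$$

where $\operatorname{sgn}(\cdot)$ is the sign function, with $\operatorname{sgn}(e) = -1$ for $e < 0$, $0$ for $e = 0$, and $1$ for $e > 0$, and $c$ is a positive tuning constant [14].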
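Functions (5) and (6) translate directly into vectorized code, as in the Python sketch below. The residuals are assumed to be already standardized by a scale estimate. The default $c = 1.345$ is a commonly used tuning value (about 95% efficiency under normal errors); it is an assumption, since the constant used in this study is not stated. The helper `huber_weight` gives the weight $\psi(e)/e$ used by iteratively reweighted least squares.

```python
import numpy as np


def huber_rho(e, c=1.345):
    """Huber's rho (5): quadratic for |e| <= c, linear beyond c."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= c, 0.5 * e**2, c * np.abs(e) - 0.5 * c**2)


def huber_psi(e, c=1.345):
    """Huber's psi (6), the derivative of rho: e inside [-c, c], c * sgn(e) outside."""
    e = np.asarray(e, dtype=float)
    return np.where(np.abs(e) <= c, e, c * np.sign(e))


def huber_weight(e, c=1.345):
    """IRLS weight psi(e) / e: equal to 1 inside [-c, c] and c / |e| outside."""
    return c / np.maximum(np.abs(np.asarray(e, dtype=float)), c)


# Large residuals are capped at +/- c by psi and down-weighted by huber_weight.
capped = huber_psi([-3.0, -0.5, 0.0, 0.5, 3.0])
weights = huber_weight([-3.0, -0.5, 0.0, 0.5, 3.0])
```

Because $\psi$ is bounded, a single large residual contributes at most $c$ to the estimating equations, whereas under LSE its contribution grows without limit.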
3.2. S Estimation
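Rousseeuw and Yohai [15] proposed S-estimation, in which the regression estimates are associated with M-scales. The S estimate is based on an M-estimate of the residual scale, and it uses the residual standard deviation to overcome the weaknesses of the median. The function to be minimized is

$$\hat{\boldsymbol{\beta}}_{S} = \arg\min_{\boldsymbol{\beta}}\ \hat{\sigma}_{S}\big(e_1(\boldsymbol{\beta}), \dots, e_n(\boldsymbol{\beta})\big), \tag{7}$$

where the M-scale $\hat{\sigma}_{S}$ solves $\frac{1}{n}\sum_{i=1}^{n}\rho\!\left(e_i/\hat{\sigma}_{S}\right) = K$, in which $\rho$ is a bounded function and $K$ is a constant [16].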
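The core of (7) is the M-scale $\hat{\sigma}_S$. The Python sketch below computes only that scale, for a given residual vector, using a fixed-point iteration. It assumes Tukey's bisquare $\rho$ scaled to a maximum of 1, with $c = 1.547$ and $K = 0.5$; this is a common choice that gives a 50% breakdown point, but the paper does not state which $\rho$ it used. A full S-estimate would additionally search over $\boldsymbol{\beta}$ (for example, from random elemental starts) to minimize this scale. That search is not shown here.

```python
import numpy as np

C_BISQUARE = 1.547  # assumed tuning constant for the bisquare rho
K_CONST = 0.5       # assumed right-hand side K of the M-scale equation


def bisquare_rho(u, c=C_BISQUARE):
    """Tukey's bisquare rho, scaled so that its maximum value is 1."""
    t = np.clip(np.abs(np.asarray(u, dtype=float)) / c, 0.0, 1.0)
    return 1.0 - (1.0 - t**2) ** 3


def m_scale(residuals, k=K_CONST, tol=1e-10, max_iter=500):
    """Solve mean(rho(e_i / s)) = k for s by fixed-point iteration."""
    e = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(e)) / 0.6745  # normalized MAD as the starting value
    if s == 0.0:
        raise ValueError("at least half of the residuals are zero")
    for _ in range(max_iter):
        # At the fixed point mean(rho(e / s)) == k, so the factor equals 1.
        s_new = s * np.sqrt(np.mean(bisquare_rho(e / s)) / k)
        if abs(s_new / s - 1.0) < tol:  # relative stopping rule (scale equivariant)
            return s_new
        s = s_new
    return s
```

Because $\rho$ is bounded, each gross outlier adds at most $1/n$ to the left-hand side of the scale equation. The scale therefore stays stable as long as fewer than half of the residuals are contaminated.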
3.3. Least Trimmed Squares Estimation (LTS)
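In the LTS estimate, the squared residuals are first sorted in ascending order. The sum of the first $h$ sorted squared residuals is then minimized:

$$\min_{\boldsymbol{\beta}}\sum_{i=1}^{h} e_{(i)}^{2}, \tag{8}$$

where $e_{(1)}^{2} \le e_{(2)}^{2} \le \dots \le e_{(n)}^{2}$ are the ordered squared residuals, $h \le n$ is the number of residuals retained, and $n$ is the number of observations.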
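Criterion (8) cannot be minimized in closed form. The Python sketch below approximates it with random elemental starts followed by concentration steps ("C-steps"): fit LS to the $h$ currently best-fitting points, then re-select those points, and repeat. This is a simplified version of the idea behind FAST-LTS, not the exact routine in any R package. The default $h = \lfloor (n + p + 1)/2 \rfloor$ is an assumed, commonly used choice, and the function names are hypothetical.

```python
import numpy as np


def lts_objective(beta, X, y, h):
    """Sum of the h smallest squared residuals: the LTS criterion (8)."""
    r2 = np.sort((y - X @ beta) ** 2)
    return r2[:h].sum()


def lts_fit(X, y, h=None, n_starts=100, n_csteps=20, seed=0):
    """Approximate LTS fit using random elemental starts and concentration steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # assumed default coverage
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Exact fit through p randomly chosen observations.
        subset = rng.choice(n, size=p, replace=False)
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        for _ in range(n_csteps):
            # C-step: refit LS on the h observations with the smallest squared residuals.
            keep = np.argsort((y - X @ beta) ** 2)[:h]
            beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        obj = lts_objective(beta, X, y, h)
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj
```

For example, on the line $y = 1 + 2x$ with 10 of 50 responses replaced by gross outliers, the sketch recovers intercept 1 and slope 2, whereas LSE does not.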
3.4. Least Median of Squares Estimation (LMS)
The LMS estimate is described by Rousseeuw and Leroy [17]. In LMS, the median of the squared residuals is minimized:

$$\min_{\boldsymbol{\beta}}\ \operatorname*{med}_{i}\ e_i^{2}. \tag{9}$$

The estimate is robust against unusual observations in the direction of both $y$ and $x$, and its breakdown point is 50% [17]. For more details on robust regression techniques, see Zaman and Bulut [3].
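For simple regression with one predictor, criterion (9) can be approximated by evaluating the line through every pair of observations and keeping the pair with the smallest median squared residual. This is the elemental-set idea. The Python sketch below is illustrative only: `lms_fit_simple` is a hypothetical name, and the exact LMS line need not pass through two data points.

```python
import itertools

import numpy as np


def lms_fit_simple(x, y):
    """Approximate LMS for y = a + b*x by trying the line through every pair of points.

    Returns (criterion, intercept, slope), where criterion = median of squared residuals.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = (np.inf, None, None)
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: slope undefined
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = np.median((y - a - b * x) ** 2)  # LMS criterion (9)
        if crit < best[0]:
            best = (crit, a, b)
    return best
```

The search is $O(n^2)$ pairs times an $O(n)$ median, which is fine for small samples. Larger problems typically use random subsets instead of all pairs.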
4. Suggested Estimators
In the presence of unusual observations, we propose the following four robust ratio estimators, based on LTS, S, LMS, and Huber-M estimation, in place of the ratio estimator in (2):
The MSE expressions of the suggested estimators are assumed to have the same form as the MSE in (3), except that $b_1$ and $b_2$ in (3) are replaced by $b_{1(k)}$ and $b_{2(k)}$, whose values are computed by LTS, S, LMS, and Huber-M estimation, respectively. The MSEs of the suggested robust regression-ratio-type estimators based on LTS, S, LMS, and Huber-M estimation are therefore obtained as follows:
where $b_{1(k)}$ and $b_{2(k)}$ are obtained from the LTS, S, LMS, and Huber-M estimates, respectively ($k$ = LTS, S, LMS, Huber-M).
5. Efficiency Comparisons
We compare the MSE of the traditional estimator, given in (3), with the MSEs of the suggested robust regression-ratio-type estimators, given in (14). This yields the conditions under which the suggested estimators perform better than the traditional estimator for two auxiliary variables under simple random sampling.
The suggested robust regression-ratio-type estimators in (10)–(13) perform better than the ratio estimator in (2) when condition (15) is satisfied.
6. Numerical Illustrations
In this section, we apply the estimators to two real datasets. The first dataset (Education) was collected to model expenditure on public education [18]. The second dataset (Crime) was used to predict violent crime in states [19]. Both datasets have previously been used in the literature to investigate robust techniques. The Education dataset is available in the robustbase package for R [20].
We employed four robust estimators (LTS, LMS, S, and Huber-M) and compared them with LSE, assessing performance through efficiencies relative to LSE. The analysis was implemented in the R programming language [21], and the MASS and robustreg packages were used for robust regression [20, 22]. We obtained the MSE value of each estimator for performance evaluation.
Table 1 defines the variables in the datasets. Each dataset contains one study variable and two auxiliary variables. Table 2 presents descriptive statistics for the real datasets.
Tables 3–6 give the covariance and correlation matrices of the variables for both datasets. The high positive correlations satisfy the applicability condition for ratio estimators.
We calculated Mahalanobis distances to check for potential outliers. For each observation $i$, the Mahalanobis distance is

$$MD_i = \sqrt{(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^{\top}\hat{\boldsymbol{\Sigma}}^{-1}(\mathbf{x}_i - \hat{\boldsymbol{\mu}})},$$

where $\hat{\boldsymbol{\mu}}$ is the location vector and $\hat{\boldsymbol{\Sigma}}$ is the covariance matrix. The cut-off value is $\sqrt{\chi^{2}_{p,\,1-\alpha}}$, where $p$ is the number of variables and $\alpha$ is the error level. An observation is considered a potential outlier when $MD_i$ exceeds the cut-off value.
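The distance and cut-off above can be computed for any choice of location and covariance estimates, as in the Python sketch below. Classical estimates would be `X.mean(axis=0)` and `np.cov(X, rowvar=False)`, and robust MCD estimates can be passed in the same way, as the paper does. The error level $\alpha = 0.025$ is an assumed default, since the value used in the study is not shown. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_distances(X, location, scatter):
    """Distances sqrt((x_i - mu)' Sigma^{-1} (x_i - mu)) for every row x_i of X."""
    diff = np.asarray(X, dtype=float) - np.asarray(location, dtype=float)
    # Solve Sigma z = (x_i - mu) rather than forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(scatter, dtype=float), diff.T).T
    # Clip tiny negative values caused by rounding before taking the square root.
    return np.sqrt(np.maximum(np.einsum("ij,ij->i", diff, z), 0.0))


def md_cutoff(p, alpha=0.025):
    """Cut-off sqrt(chi2_{p, 1 - alpha}) for p variables at error level alpha."""
    return float(np.sqrt(chi2.ppf(1.0 - alpha, df=p)))
```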
To avoid the masking effect, we used minimum covariance determinant (MCD) estimates of location and covariance [23]. Figures 1 and 2 show the Mahalanobis distance plots for the real datasets, with the cut-off value drawn as a straight line. Both the Education and Crime datasets clearly contain some potential outliers.
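The MCD idea can be sketched as follows: find the subset of $h$ observations whose covariance matrix has the smallest determinant, then compute distances from that subset's mean and covariance. The Python sketch below is a simplified concentration-step search in the spirit of FAST-MCD. It is not the `covMcd` routine from robustbase and omits its reweighting step. It assumes $h = \lfloor (n + p + 1)/2 \rfloor$, applies a commonly used consistency factor $(h/n)\,/\,F_{\chi^2_{p+2}}\big(\chi^2_{p,\,h/n}\big)$ to the covariance, and uses hypothetical function names.

```python
import numpy as np
from scipy.stats import chi2


def _sq_distances(X, mu, S):
    """Squared Mahalanobis distances of the rows of X from mu under covariance S."""
    diff = X - mu
    return np.einsum("ij,ij->i", diff, np.linalg.solve(S, diff.T).T)


def raw_mcd(X, n_starts=50, n_csteps=30, seed=0):
    """Simplified MCD search: random (p + 1)-subsets refined by concentration steps."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = (n + p + 1) // 2
    rng = np.random.default_rng(seed)
    best_det, best_mu, best_S = np.inf, None, None
    for _ in range(n_starts):
        subset = rng.choice(n, size=p + 1, replace=False)
        singular = False
        for _ in range(n_csteps):
            mu = X[subset].mean(axis=0)
            S = np.cov(X[subset], rowvar=False)
            cond = np.linalg.cond(S)
            if not np.isfinite(cond) or cond > 1e12:
                singular = True  # degenerate start; skip it
                break
            # C-step: keep the h points closest to the current fit.
            subset = np.argsort(_sq_distances(X, mu, S))[:h]
        if singular:
            continue
        mu = X[subset].mean(axis=0)
        S = np.cov(X[subset], rowvar=False)
        det = np.linalg.det(S)
        if det < best_det:
            best_det, best_mu, best_S = det, mu, S
    # Consistency factor so that S targets the covariance under normality.
    factor = (h / n) / chi2.cdf(chi2.ppf(h / n, df=p), df=p + 2)
    return best_mu, factor * best_S


def flag_outliers(X, alpha=0.025):
    """Flag rows whose robust distance exceeds sqrt(chi2_{p, 1 - alpha})."""
    X = np.asarray(X, dtype=float)
    mu, S = raw_mcd(X)
    d = np.sqrt(_sq_distances(X, mu, S))
    return d > np.sqrt(chi2.ppf(1.0 - alpha, df=X.shape[1]))
```

Because the location and covariance come from the most concentrated half of the data, a cluster of outliers cannot inflate the covariance and hide itself. This is the masking effect the MCD is meant to avoid.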


Table 7 gives the regression coefficients obtained with the robust estimates and with LSE. The robust coefficients differ considerably from the LSE coefficients. In the Education dataset, although the study variable is positively correlated with an auxiliary variable, the corresponding LSE coefficient is negative. This illustrates the distorting effect of outliers in the data. The robust estimates overcome this problem: all regression coefficients are positive for every robust estimate.
We use the MSE values of the conventional and suggested robust regression-ratio-type estimators, given in Sections 2 and 4, to calculate the relative efficiency of each suggested estimator in (10)–(13) with respect to the classical estimator in (2), using the formulae:
Table 8 presents the performance results of the estimators. According to the efficiencies, the suggested robust regression-ratio-type estimators are better than the estimator of Kadilar and Cingi [11]. In particular, the suggested estimators based on the S and LTS estimates produced the lowest MSE values for the Education and Crime datasets, respectively. In both datasets, the classical estimator has a higher MSE than all four robust regression-ratio-type estimators. These results are not surprising, because condition (15) is satisfied.
7. Simulation
A simulation study is performed to evaluate the efficiencies of the suggested robust regression-ratio-type estimators, using the epilepsy dataset [24]. The dataset includes one study variable and two auxiliary variables and contains 59 observations. These data are used to model the number of epileptic attacks. The epilepsy dataset is available in the robustbase package for R.
Table 9 describes the variables of the epilepsy dataset. The sum of the attack counts is the study variable, and the other variables are the auxiliary variables.
Figure 3 shows the Mahalanobis distance plot for the epilepsy dataset; the distances are obtained as in Section 6. The dataset clearly contains some possible outliers.

We randomly selected samples of size $n$ from the dataset 10,000 times and estimated the population mean with the traditional and proposed estimators. The MSE was computed as

$$MSE(\bar{y}_{k}) = \frac{1}{10000}\sum_{j=1}^{10000}\left(\bar{y}_{k,j} - \bar{Y}\right)^{2}, \tag{18}$$

where $\bar{y}_{k,j}$ is the estimate of the mean from the $j$th sample and $\bar{Y}$ is the known population mean of the study variable. Table 10 reports the MSE ratios of the suggested robust regression-ratio-type estimators with respect to the traditional estimator; these values are obtained using (18). Results were obtained for several sample sizes. All computations were performed in R.
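The Monte Carlo procedure behind (18) is outlined in the Python sketch below. It assumes simple random sampling without replacement and treats the full dataset as the population. Because estimator (2) is not reproduced here, the estimator is passed in as a function. The classical ratio estimator with a single auxiliary variable serves only as a stand-in, not as one of the estimators compared in Table 10. All names are hypothetical.

```python
import numpy as np


def simulated_mse(y, x, estimator, n, n_reps=10000, seed=0):
    """Empirical MSE (18) of a mean estimator over repeated samples without replacement.

    estimator(y_s, x_s, X_bar) must return an estimate of the population mean of y.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    N = len(y)
    Y_bar = y.mean()        # known population mean of the study variable
    X_bar = x.mean(axis=0)  # known population mean(s) of the auxiliary variable(s)
    errors = np.empty(n_reps)
    for j in range(n_reps):
        idx = rng.choice(N, size=n, replace=False)
        errors[j] = estimator(y[idx], x[idx], X_bar) - Y_bar
    return np.mean(errors**2)


def ratio_estimator(y_s, x_s, X_bar):
    """Classical ratio estimator with one auxiliary variable (stand-in only)."""
    return y_s.mean() / x_s.mean() * X_bar


def sample_mean(y_s, x_s, X_bar):
    """Plain sample mean; the auxiliary arguments are ignored."""
    return y_s.mean()
```

Using the same `seed` for every estimator draws identical samples for each one, so their MSE ratios are not distorted by different random draws.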
Table 10 presents the simulation results. The suggested robust regression-type estimators outperform the estimator of Kadilar and Cingi [11] at all sample sizes. In most cases, the suggested estimator based on the LTS estimate has the lowest MSE. As the sample size grows, all estimators produce lower MSE values. In general, the suggested estimators based on robust techniques perform better than the traditional estimator and overcome the outlier problem. These simulation findings support the results in Table 8.
8. Discussion
Tables 8 and 10 show that, under simple random sampling with data containing outliers, the proposed robust regression-ratio-type estimators of $\bar{Y}$ are more efficient. The estimators in (10)–(13) yield lower MSEs than the traditional ratio estimator, whose MSE is given in (3). That is, the estimators in (10)–(13) perform better than the estimator of Kadılar and Cingi [11]. This result was derived theoretically and is supported by both the empirical and the simulation results.
9. Conclusion
Traditional ratio estimators suffer from outliers because of their distorting effect. In this study, we proposed robust regression-ratio-type estimators, based on several robust estimates, to make the estimator of Kadılar and Cingi [11] robust to outliers. We aimed to improve the performance of the suggested estimators by adopting robust regression coefficients. We used two real datasets containing possible outliers to compare the suggested robust regression-ratio-type estimators with the traditional estimator. According to our findings, the suggested estimators based on all four robust techniques have lower MSE values than the traditional estimator. The numerical results demonstrate that the suggested robust regression-ratio-type estimators are more efficient than the traditional estimator. In future work, we plan to extend the estimators presented here to other sampling designs.
Data Availability
The data used to support this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.