Robust Regression-Ratio-Type Estimators of the Mean Utilizing Two Auxiliary Variables: A Simulation Study

Zaman, Tolga; Dünder, Emre; Audu, Ahmed; Alilah, David Anekeya; Shahzad, Usman; Hanif, Muhammad

doi:https://doi.org/10.1155/2021/6383927

Mathematical Problems in Engineering

Special Issue

Robust Estimation Methods in the Presence of Extreme Observations

Research Article | Open Access

Volume 2021 | Article ID 6383927 | https://doi.org/10.1155/2021/6383927

Tolga Zaman,¹ Emre Dünder,² Ahmed Audu,³ David Anekeya Alilah,⁴ Usman Shahzad,⁵ and Muhammad Hanif⁵

Academic Editor: Bekir Sahin

Received 22 May 2021

Accepted 28 Aug 2021

Published 06 Sept 2021

Abstract

Many authors have defined modified versions of the mean estimator using two auxiliary variables. These estimators depend heavily on the calculated regression coefficients, and in the presence of outliers they do not give satisfactory results. In this study, we improve these estimators by using several robust regression techniques to obtain the regression coefficients. We compare the efficiencies of the suggested estimators with those of the estimators presented in the literature. Two numerical examples and a simulation study support the theoretical results. Empirical results show that the modified ratio estimator performs well in the presence of outliers when robust regression techniques are adopted.

1. Introduction

The classical ratio estimator is the most commonly used estimator of the population mean when the correlation between the study variable and the auxiliary variables is high and positive. However, when there are outliers in the dataset, classical estimators lose efficiency and perform poorly. As a result, several studies on ratio estimators have sought to lessen the detrimental impact of outliers. Kadilar et al. [1] introduced the use of the Huber-M estimate instead of least squares estimation (LSE) in ratio estimators. Noor-ul-Amin et al. [2] proposed using the Huber estimate instead of LSE under double sampling. Zaman and Bulut [3] improved ratio estimators by applying robust regression coefficients. Zaman [4] provided estimators that combine the ratio estimators given in Zaman and Bulut [3]. Subzar et al. [5] provided ratio estimators of the population mean using various robust regression techniques. Shahzad et al. [6] presented ratio estimators of the population mean based on the Zaman and Bulut [3] ratio estimators for several cases of missing observations. Zaman and Bulut [7] presented ratio estimators that use robust regression techniques and robust covariance estimates for stratified random sampling. Subzar et al. [8] presented various ratio estimators using robust regression tools. Ali et al. [9] provided robust regression-type estimators of the mean for simple random sampling. Grover and Kaur [10] improved various ratio-type estimators using robust regression tools. In this paper, we suggest using robust regression estimates, instead of LSE, in the improved estimator of the population mean that uses two auxiliary variables under simple random sampling.

In the next section, we review ratio-type estimators of the population mean for simple random sampling, together with their MSEs. In Section 4, we derive the properties of the suggested estimators. Section 5 compares the efficiency of the estimators offered by Kadilar and Cingi [11] with that of the suggested estimators, based on their MSEs. An empirical study using two datasets and a simulation study are then conducted; the numerical results are presented in Sections 6 and 7, respectively, and agree with the theoretical findings. We conclude in the last section. Throughout, the study relies on robust estimates with two auxiliary variables under simple random sampling.

2. Traditional Ratio Estimator

Abu-Dayyeh et al. [12] provided the following estimator of utilizing two auxiliary variables for simple random sampling and assuming that and are known:where and are real numbers.

Considering the ratio estimator presented in (1), Kadilar and Cingi [11] provided the following estimator:

The MSE equation of the estimator given in (2) was obtained as follows:where , , , and .

and are obtained by the LS estimate, , and are the variances of population of and , respectively, and and are the covariance of population between and and between and , respectively [11].

3. Robust Regression Methods

Here, we describe some widely used robust regression methods.

3.1. Huber-M Estimation

Huber [13] proposed a class of estimators, known as M-estimators, defined through different ρ functions. An M-estimate is obtained by minimizing another function of the residuals instead of the sum of squared errors.

The objective function of M-estimation is presented as

$$\min_{\beta} \sum_{i=1}^{n} \rho(e_i), \tag{4}$$

where $\rho$ is a symmetric function of the residuals $e_i$.

Huber’s function is designed as

A derivative of the function presented in (5) is equal in the following equation:where is sign function and specified as . For Huber’s estimate, and a constant value [14].
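As a concrete reference, the standard textbook Huber function is quadratic for small residuals and linear for large ones, and its derivative clips the residual at a tuning constant. The Python sketch below assumes that standard form. The function names and the tuning constant `k = 1.5` are illustrative choices, not values taken from the equations in this section.

```python
def huber_rho(e, k):
    """Huber loss: quadratic for |e| <= k, linear beyond k."""
    a = abs(e)
    if a <= k:
        return 0.5 * e * e
    return k * a - 0.5 * k * k


def huber_psi(e, k):
    """Derivative of huber_rho: the residual clipped to the interval [-k, k]."""
    return max(-k, min(k, e))


# Illustrative tuning constant (an assumption, not the paper's value).
k = 1.5
small, large = 0.4, 3.0
rho_small, rho_large = huber_rho(small, k), huber_rho(large, k)
psi_small, psi_large = huber_psi(small, k), huber_psi(large, k)
```

Because `huber_psi` is bounded, a single very large residual contributes at most `k` to the estimating equation, which is why the resulting slope is less sensitive to outliers than the least squares slope.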

3.2. S Estimation

Rousseeuw and Yohai [15] proposed the regression estimates associated with M-scales to S estimation. S estimate is based upon the residual scale of M estimate. The estimate uses the residual standard deviation to tackle the weaknesses of the median. Function to minimize is given as follows:where in which and [16].

3.3. Least Trimmed Squares Estimation (LTS)

In LTS estimate, initially squared error terms are sorted. Then, the sum of the first of sorted error terms is taken, and function to minimize is given as follows:where and is the number of observation.
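The LTS criterion described above can be sketched in a few lines of Python. This is a minimal illustration of the objective only, not a fitting algorithm; the function name `lts_objective` is hypothetical, and the trimming count `h` is passed in as a parameter because its definition is not reproduced here.

```python
def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals (the LTS criterion)."""
    squared = sorted(r * r for r in residuals)
    return sum(squared[:h])


# Four residuals, one of them a gross outlier; keep the 3 smallest squares.
residuals = [0.5, -1.0, 0.2, 10.0]
trimmed = lts_objective(residuals, h=3)
untrimmed = lts_objective(residuals, h=len(residuals))  # ordinary sum of squares
```

With `h = 3` the outlying residual of 10.0 is dropped from the sum, whereas the untrimmed sum of squares is dominated by it.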

3.4. Least Median of Squares Estimation (LMS)

LMS estimate is improved by Rousseeuw and Leroy [17]. In LMS, the median of error squares is minimized. The following equation is minimized:

The estimate is robust against unusual observations in both the x and y directions, and its breakdown point is 50% [17]. For more detailed information about robust regression techniques, see Zaman and Bulut [3].
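For comparison with least squares, the LMS criterion can be evaluated directly from a set of residuals. The sketch below uses Python's standard `statistics.median`; the function name `lms_objective` is hypothetical, and the code evaluates the criterion only, without searching for the minimizing coefficients.

```python
from statistics import median


def lms_objective(residuals):
    """Median of the squared residuals (the LMS criterion)."""
    return median(r * r for r in residuals)


residuals = [1, -2, 3, 100]
lms_value = lms_objective(residuals)                       # median of 1, 4, 9, 10000
mean_square = sum(r * r for r in residuals) / len(residuals)
```

The median of the squared residuals is unaffected by the size of the largest residual, while the mean square grows with it.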

4. Suggested Estimators

In the presence of unusual observations, we propose utilizing the following 4 robust ratio estimators based on LTS, S, LMS, and Huber-M estimations instead of the ratio estimators stated in (2):

The MSE equations of the suggested estimators where are assumed to be the same as the expression for MSE in (3), but it is clear that and in (3) should be substituted by and , whose values as computed by LTS, S, LMS, and Huber-M estimates, respectively. The MSE equations for the suggested robust regression-ratio-type estimators belonging to LTS, S, LMS, and Huber-M estimates are obtained as follows:where and are obtained from LTS, S, LMS, and Huber-M estimates, respectively (k = LTS, S, LMS, and Huber-M).
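To make the substitution of robust slopes concrete, the following Python sketch assumes a regression-ratio-type form: a product of power-ratio terms in the auxiliary means plus a linear regression adjustment. That form, the function name `regression_ratio_estimate`, and the default exponents are assumptions made for illustration; the expressions (10)–(13) themselves are not reproduced here. The `slopes` argument would hold coefficients from LSE or from any of the LTS, S, LMS, or Huber-M fits.

```python
def regression_ratio_estimate(y_bar, x_bars, X_bars, slopes, alphas=(1.0, 1.0)):
    """Assumed form: y_bar * prod((X_j / x_j) ** a_j) + sum(b_j * (X_j - x_j)).

    y_bar  : sample mean of the study variable
    x_bars : sample means of the two auxiliary variables
    X_bars : known population means of the two auxiliary variables
    slopes : regression coefficients (least squares or robust)
    alphas : real exponents of the ratio terms (illustrative default)
    """
    ratio_part = y_bar
    for x_bar, X_bar, a in zip(x_bars, X_bars, alphas):
        ratio_part *= (X_bar / x_bar) ** a
    adjustment = sum(b * (X - x) for b, X, x in zip(slopes, X_bars, x_bars))
    return ratio_part + adjustment


# Same sample, two hypothetical sets of slopes (made-up numbers).
est_ls = regression_ratio_estimate(10.0, (2.0, 4.0), (4.0, 4.0), slopes=(1.0, 1.0))
est_robust = regression_ratio_estimate(10.0, (2.0, 4.0), (4.0, 4.0), slopes=(0.5, 1.0))
```

Only the `slopes` argument changes between the classical and robust versions, which mirrors the statement that the MSE expressions keep the same shape with the coefficients replaced.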

5. Efficiency Comparisons

We compare the MSE equation for the traditional estimator, presented in (3), with the MSE equations for the suggested robust regression-ratio-type estimators, given in (14), to derive the conditions for which the suggested estimators will perform better than traditional estimator for two auxiliary variables for simple random sampling.

The suggested robust regression-ratio-type estimators in (10)–(13) perform better than the ratio estimator in (2) when condition (15) is satisfied.

6. Numerical Illustrations

In this part, we present numerical examples on two real datasets. The first dataset (Education) was collected to model expenditure on public education [18]. The second dataset (Crime) was used to predict violent crimes in states [19]. Both datasets have previously been used in the literature to investigate robust techniques. The Education dataset is available in the robustbase package in R [20].

We employed four robust estimators: LTS, LMS, S, and Huber-M. We compared the efficiency of these robust techniques with that of LSE, assessing each estimator by its efficiency relative to LSE. We used the R programming language for the implementation [21] and the MASS and robustreg packages for robust regression analysis [20, 22]. We obtained the MSE value of each estimator for performance evaluation.

The definitions of the variables in the datasets are shown in Table 1. Each dataset contains two auxiliary variables and one study variable. Table 2 gives the descriptive statistics for the real datasets.

In Tables 3–6, the covariance and correlation matrices of the variables are given for both datasets. The high positive correlations satisfy the condition for applying ratio estimators.

We calculated the Mahalanobis distances for checking the existence of potential outliers. For each observation “i,” Mahalanobis distances are calculated as follows:where “” shows the location matrix and “” indicates the covariance matrix. The cut-off value is where p is the number of variables. The term represents the error level. The observation is considered as the potential outlier when exceeds the cut-off value.
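The distance computation can be sketched in plain Python as follows. This sketch assumes the usual definition, the square root of the quadratic form of the centered observation in the inverse covariance matrix, and it uses the classical sample mean and covariance rather than the MCD estimates used in the study. All function names are hypothetical, and the cut-off is left as a user-supplied parameter.

```python
from math import sqrt


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, mu = len(rows), mean_vector(rows)
    p = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis(rows, center, cov):
    """Distance of each row from center, measured in the metric of cov."""
    out = []
    for r in rows:
        d = [v - c for v, c in zip(r, center)]
        out.append(sqrt(sum(di * zi for di, zi in zip(d, solve(cov, d)))))
    return out


def flag_outliers(distances, cutoff):
    """Indices of observations whose distance exceeds the cut-off."""
    return [i for i, d in enumerate(distances) if d > cutoff]


rows = [(0, 0), (2, 0), (0, 2), (2, 2)]
distances = mahalanobis(rows, mean_vector(rows), covariance_matrix(rows))
```

To follow the robust version described in the text, `center` and `cov` would be replaced by MCD estimates of location and scatter, which this sketch does not compute.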

We used minimum covariance determinant (MCD) estimators to avoid the masking effect [23]. Figures 1 and 2 show the Mahalanobis distance plots for the real datasets. The cut-off value is drawn as a straight line in the plots. The Education and Crime datasets clearly include some potential outliers. These two datasets have also been used in the literature to evaluate robustness.

Table 7 indicates the regression coefficients which were obtained using the robust estimates and LSE. The coefficients are rather different from LSE for robust estimates. Although there is a positive correlation between and in Education dataset, is negative. This case shows the corruptive effect of outliers in dataset. Robust estimates adroitly overcome this problem, and all regression coefficients are positive for each robust estimate.

We use the MSE values of the conventional and suggested robust regression-ratio-type estimators, as specified in Sections 2 and 4, to calculate the relative efficiency of each suggested robust regression-ratio-type estimator in (10)–(13) in comparison to the classical estimators in (2), utilizing the formulae:

Table 8 presents the performance results of the estimators. According to the efficiencies, the suggested robust regression-ratio-type estimators are better than the estimator presented by Kadilar and Cingi [11]. In particular, the suggested estimators based on the S and LTS estimates produced the lowest MSE values for the Education and Crime datasets, respectively. The classical estimator has the highest MSE value among all the estimators compared in both datasets. These results are not surprising because condition (15) is satisfied.

7. Simulation

A simulation study is performed to evaluate the efficiencies of the suggested robust regression-ratio-type estimators. The Epilepsy dataset is used for the simulation [24]. The dataset includes two auxiliary variables and one study variable. There are 59 observations in the Epilepsy dataset. The purpose of this dataset is to predict epilepsy attacks. The Epilepsy dataset is available in the “robustbase” package of the R programming language.

The description of the variables for epilepsy dataset is shown in Table 9. Sum of is the study, and the other variables are auxiliary variables.

Figure 3 shows the Mahalanobis distance plot for the Epilepsy dataset. The distances are obtained in the same way as in the previous section. This dataset clearly contains some possible outliers.

We randomly selected sample from the datasets for 10000 times randomly and estimated the population means using the traditional and proposed estimators. We computed the MSE equation as follows:where indicates the estimation of mean for and represents the priorly known population mean of the study variable. In Table 10, we reported the MSE ratios of the suggested robust regression-ratio-type estimators with respect to the traditional estimator for each dataset. These values are obtained using (18). Numerical results were conducted for the sizes of sample . Computations were run in R software.
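The resampling scheme described above can be sketched in Python as follows, assuming the simulated MSE is the average squared deviation of the replicated estimates from the known population mean. The function `simulate_mse`, the toy population, and the use of the plain sample mean as the estimator are illustrative assumptions; the study itself was run in R on the Epilepsy data with the regression-ratio-type estimators.

```python
import random


def simulate_mse(population_y, estimator, n, replications=10000, seed=0):
    """Average squared error of estimator over repeated simple random samples.

    estimator receives the list of sampled indices, so it can also look up
    auxiliary variables stored alongside the study variable.
    """
    rng = random.Random(seed)
    N = len(population_y)
    true_mean = sum(population_y) / N
    total = 0.0
    for _ in range(replications):
        idx = rng.sample(range(N), n)          # sampling without replacement
        total += (estimator(idx) - true_mean) ** 2
    return total / replications


# Toy population with one large value (made-up numbers).
population_y = [1, 2, 3, 4, 10]


def sample_mean(idx):
    return sum(population_y[i] for i in idx) / len(idx)


mse_n2 = simulate_mse(population_y, sample_mean, n=2, replications=200)
mse_census = simulate_mse(population_y, sample_mean, n=5, replications=50)
```

Dividing the simulated MSE of one estimator by that of another gives the kind of MSE ratio reported in the simulation table; passing a different `estimator` function is all that is needed to compare them on the same random samples when the same `seed` is used.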

Table 10 presents the simulation results. According to the results, the suggested robust regression-type estimators clearly outperform the estimator presented in Kadilar and Cingi [11] at all sample sizes. In most cases, the suggested estimator based on the LTS estimate has the lowest MSE value. As the sample size grows, all estimators produce lower MSE values. In general, the suggested regression-type estimators based on robust techniques perform better than the traditional estimator and overcome the outlier problem. These simulation findings support the theoretical results in Table 8.

8. Discussion

Tables 8 and 10 clearly show that the proposed robust regression-ratio-type estimators of the population mean are more efficient for data containing outliers under simple random sampling. The estimators in (10)–(13) have lower MSEs than the traditional ratio estimator in (2), whose MSE is given in (3). This means that the estimators in (10)–(13) perform better than the estimator presented by Kadilar and Cingi [11]. These results have been demonstrated theoretically and are supported by both the empirical and the simulation results.

9. Conclusion

Traditional ratio estimators suffer from the distorting effect of outliers. In this study, we proposed robust regression-ratio-type estimators that use several robust estimates to make the estimator proposed by Kadilar and Cingi [11] robust. We aimed to improve the performance of the suggested estimators by adopting robust coefficients. We used two real datasets containing possible outliers to compare the suggested robust regression-ratio-type estimators with the traditional estimator. According to our findings, the suggested estimators based on all of the robust techniques have lower MSE values than the traditional estimator. The numerical results demonstrate that the suggested robust regression-ratio-type estimators are more efficient than the traditional estimator. In future work, we hope to extend the estimators presented here to other sampling designs.

Data Availability

The data used to support this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

[1] C. Kadılar, M. Candan, and H. Cıngı, “Ratio estimators using robust regression,” Hacettepe Journal of Mathematics and Statistics, vol. 36, no. 2, pp. 181–188, 2007.
[2] M. Noor-ul-Amin, M. Q. Shahbaz, and C. Kadılar, “Ratio estimators for population mean using robust regression in double sampling,” Gazi University Journal of Science, vol. 29, no. 4, pp. 793–798, 2016.
[3] T. Zaman and H. Bulut, “Modified ratio estimators using robust regression methods,” Communications in Statistics - Theory and Methods, vol. 48, no. 8, pp. 2039–2048, 2019.
[4] T. Zaman, “Improvement of modified ratio estimators using robust regression methods,” Applied Mathematics and Computation, vol. 348, pp. 627–631, 2019.
[5] M. Subzar, C. N. Bouza, and A. I. Al-Omari, “Utilization of different robust regression techniques for estimation of finite population mean in SRSWOR in case of presence of outliers through ratio method of estimation,” Investigación Operacional, vol. 40, no. 5, pp. 600–609, 2019.
[6] U. Shahzad, N. H. Al-Noor, M. Hanif, I. Sajjad, and M. Muhammad Anas, “Imputation based mean estimators in case of missing data utilizing robust regression and variance-covariance matrices,” Communications in Statistics - Simulation and Computation, vol. 44, pp. 1–20, 2020.
[7] T. Zaman and H. Bulut, “Modified regression estimators using robust regression methods and covariance matrices in stratified random sampling,” Communications in Statistics - Theory and Methods, vol. 49, no. 14, pp. 3407–3420, 2020.
[8] M. Subzar, A. Ibrahim Al-Omari, and A. R. A. Alanzi, “The Robust regression methods for estimating of finite population mean based on SRSWOR in case of outliers,” Computers, Materials & Continua, vol. 65, no. 1, pp. 125–138, 2020.
[9] N. Ali, I. Ahmad, M. Hanif, and U. Shahzad, “Robust-regression-type estimators for improving mean estimation of sensitive variables by using auxiliary information,” Communications in Statistics - Theory and Methods, vol. 50, no. 4, pp. 979–992, 2021.
[10] L. K. Grover and A. Kaur, “An improved regression type estimator of population mean with two auxiliary variables and its variant using robust regression method,” Journal of Computational and Applied Mathematics, vol. 382, pp. 1–18, 2021.
[11] C. Kadilar and H. Cingi, “A new estimator using two auxiliary variables,” Applied Mathematics and Computation, vol. 162, no. 2, pp. 901–908, 2005.
[12] W. A. Abu-Dayyeh, M. S. Ahmed, R. A. Ahmed, and H. A. Muttlak, “Some estimators of a finite population mean using auxiliary information,” Applied Mathematics and Computation, vol. 139, no. 2-3, pp. 287–298, 2003.
[13] P. J. Huber, “Robust regression: asymptotics, conjectures and monte carlo,” Annals of Statistics, vol. 1, no. 5, pp. 799–821, 1973.
[14] J. Fox, Robust Regression: Appendix to an R and S-PLUS Companion to Applied Regression, 2002.
[15] P. Rousseeuw and V. Yohai, “Robust regression by means of S-estimators,” Robust and Nonlinear Time Series Analysis, vol. 26, pp. 256–272, 1984.
[16] M. Salibian-Barrera and V. J. Yohai, “A fast algorithm for S-regression estimates,” Journal of Computational & Graphical Statistics, vol. 15, no. 2, pp. 414–427, 2006.
[17] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, John Wiley & Sons, Hoboken, NJ, USA, 2005.
[18] S. Chatterjee and A. S. Hadi, Regression Analysis by Example, John Wiley & Sons, Hoboken, NJ, USA, 2015.
[19] A. Agresti and B. Finlay, Statistical Methods for the Social Sciences, Pearson Education, London, UK, 4th edition, 2013.
[20] M. Maechler, P. Rousseeuw, C. Croux et al., Basic Robust Statistics R Package Version 0.92-7, 2016.
[21] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2017, https://www.R-project.org/.
[22] W. N. Venables and B. D. Ripley, “Tree-based methods,” Modern Applied Statistics with S, vol. 45, pp. 251–269, 2002.
[23] P. J. Rousseeuw and B. C. Van Zomeren, “Unmasking multivariate outliers and leverage points,” Journal of the American Statistical Association, vol. 85, no. 411, pp. 633–639, 1990.
[24] P. F. Thall and S. C. Vail, “Some covariance models for longitudinal count data with overdispersion,” Biometrics, vol. 46, no. 3, pp. 657–671, 1990.

Copyright

Copyright © 2021 Tolga Zaman et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
