Minimum Covariance Determinant-Based Quantile Robust Regression-Type Estimators for Mean Parameter

Shahzad, Usman; Al-Noor, Nadia H.; Afshan, Noureen; Alilah, David Anekeya; Hanif, Muhammad; Anas, Malik Muhammad

doi:https://doi.org/10.1155/2021/5255839

Mathematical Problems in Engineering


Special Issue

Robust Estimation Methods in the Presence of Extreme Observations


Research Article | Open Access

Volume 2021 | Article ID 5255839 | https://doi.org/10.1155/2021/5255839


Usman Shahzad,¹ Nadia H. Al-Noor,² Noureen Afshan,¹ David Anekeya Alilah,³ Muhammad Hanif,¹ and Malik Muhammad Anas¹

Academic Editor: Adnan Maqsood

Received 07 Jun 2021

Accepted 10 Jul 2021

Published 21 Jul 2021

Abstract

Robust regression tools are commonly used to develop regression-type ratio estimators with traditional measures of location whenever data are contaminated with outliers. Recently, researchers extended this idea and developed regression-type ratio estimators through robust minimum covariance determinant (MCD) estimation. In this study, quantile regression is combined with MCD-based measures of location, and a class of quantile regression-type mean estimators is proposed. The mean squared errors (MSEs) of the proposed estimators are also obtained. The proposed estimators are compared with the reviewed class of estimators through a simulation study and two real-life applications. The Dixon chi-squared test is used to assess the presence of outliers in these applications. The quantile regression estimators are found to perform better than some existing estimators.

1. Introduction

The use of auxiliary information in survey sampling is as old as survey sampling itself (Bulut and Zaman [1]). Neyman [2] mentioned the early works in which auxiliary information was used. The problem of improving the efficiency of parameter estimation through auxiliary information has received a lot of attention in sampling theory and practice. Common examples of such methods are the ratio, product, and regression estimators. Under the simple random sampling scheme, ratio and product estimation techniques are widely used. For more details about these two techniques, see Cochran [3], Murthy [4], Singh [5], and Shalabh and Tsai [6]. Furthermore, there is a wealth of literature on ratio estimators for the population mean; examples include Koyuncu [7], Abid et al. [8, 9], Irfan et al. [10], Shahzad et al. [11], Ali et al. [12], and Yadav and Zaman [13].

Both ratio and product techniques have benefits and drawbacks. For instance, a ratio estimator is appropriate when the study and auxiliary variables have a positive linear relationship (correlation), whereas a product estimator is appropriate when they have a negative linear relationship. The regression estimation technique removes this restriction, since it yields improved results for both positive and negative correlations. Note that the typical regression estimator is based on the linear least squares regression coefficient. For regression estimation, interested readers may refer to Ijaz et al. [14] and Tanış et al. [15].

The linear least squares or ordinary least squares (OLS) regression is the most conventional statistical method for parameter estimation because it is easy to implement. The method minimizes the sum of the squared residuals, that is, the squared differences between the observed values of the dependent variable and the values predicted by a linear function of the independent variable. OLS produces the best estimates for straight-line regression when its ideal assumptions hold. However, OLS-based parameter estimates are influenced by outliers or extreme values and, in their presence, do not produce efficient results. The breakdown point of the OLS fit is $1/n$ (where $n$ is the sample size), which tends to 0% as $n$ grows, indicating that a single outlier can have a significant impact (Rousseeuw and Leroy [16]). As a result, in the presence of outliers, mean estimation based on OLS is also affected (Zaman and Bulut [17]). To overcome this issue in mean estimation, several authors have used robust regression tools, robust quantile regression tools, and robust covariance matrices (see, among others, the recent works of Zaman and Toksoy [18], Zaman [19], and Zaman and Bulut [17, 20], who developed robust regression techniques to control the effect of extreme values). In this study, we use quantile regression with minimum covariance determinant estimator-based measures of location and propose a class of quantile regression-type mean estimators.
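
The sensitivity of an OLS fit to a single extreme value can be illustrated with a short sketch. The data are hypothetical and are not taken from this study, and `ols_fit` is an illustrative helper name rather than a function from any package.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.0 + 2.0 * xi for xi in x]
clean_intercept, clean_slope = ols_fit(x, y)

# Corrupt a single response value and refit.
y_bad = list(y)
y_bad[-1] = -50.0
bad_intercept, bad_slope = ols_fit(x, y_bad)

print("slope without outlier:", clean_slope)
print("slope with one outlier:", bad_slope)
```

In this toy example, one corrupted observation out of ten is enough to reverse the sign of the fitted slope, which is the behaviour a breakdown point of $1/n$ describes.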

The rest of this study is organized as follows. First, a review of robust regression tools and MCD-based ratio estimators is provided. The proposed class of quantile regression-ratio-type mean estimators is then introduced, along with its MSE. Next, numerical illustrations of the existing and proposed classes of estimators are presented. Finally, the conclusion is provided.

2. Review of MCD-Based Ratio Estimators

In the search for estimators of the location parameter that are efficient across a wide range of datasets, the robust approach has emerged as the main approach (Hogg [21]). The purpose of the robust technique is to find a single estimator that is efficient across a wide range of datasets, even if it is not exactly ideal for any one population. The following lines describe the robust regression tools and the MCD estimator that authors have used to overcome the issue of extreme values in mean estimation.

As robust regression tools, we considered the following:

LAD: the least absolute deviation estimator is based on minimizing the sum of the absolute errors.

LMS: the least median of squares estimator is based on minimizing the median of the squared errors (SE).

LTS: the least trimmed squares estimator is based on applying OLS to the $h$ smallest observations of the sorted SE, that is, on minimizing $\sum_{i=1}^{h} e_{(i)}^{2}$ with $e_{(1)}^{2} \le \dots \le e_{(n)}^{2}$; hence, its computation is not affected by extreme values.

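M-estimation: M-estimation (M stands for maximum likelihood type) is based on minimizing the objective function $\sum_{i=1}^{n} \rho(e_i)$, where $e_i$ denotes the $i$th residual. Some of the forms designed for the function $\rho$ of the residuals are as follows:

Huber M-estimator: Huber [22] investigated the objective function with tuning constant $k$ as

$$\rho(e) = \begin{cases} e^{2}/2, & |e| \le k, \\ k|e| - k^{2}/2, & |e| > k. \end{cases}$$

Hampel M-estimator: Hampel [23] investigated the objective function with tuning constants $a < b < c$ as

$$\rho(e) = \begin{cases} e^{2}/2, & |e| \le a, \\ a|e| - a^{2}/2, & a < |e| \le b, \\ ab - \dfrac{a^{2}}{2} + \dfrac{a\left[(c-b)^{2} - (c-|e|)^{2}\right]}{2(c-b)}, & b < |e| \le c, \\ \dfrac{a(b + c - a)}{2}, & |e| > c. \end{cases}$$

Tukey M-estimator: Tukey [24] investigated the objective function with tuning constant $k$ as

$$\rho(e) = \begin{cases} \dfrac{k^{2}}{6}\left\{1 - \left[1 - (e/k)^{2}\right]^{3}\right\}, & |e| \le k, \\ \dfrac{k^{2}}{6}, & |e| > k. \end{cases}$$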

MM-estimator: the MM-estimator (MM stands for modified M-estimator) is another robust regression tool developed for data that contain outliers. It combines the high resistance to outliers of S-estimators with the high efficiency of M-estimators. The MM-estimator is a regression M-estimator with a redescending function, in which the initial values of the regression coefficients and the scale estimate come from the S-estimator, which is itself based on minimizing a robust M-estimator of scale. For details, see Yohai [25].

MCD: the minimum covariance determinant estimator is one of the earliest highly robust estimators of multivariate location and spread. Although MCD was first introduced in 1984 (Rousseeuw [26, 27]), its widespread application began with the development of the computationally efficient fast MCD algorithm in 1999 (Rousseeuw and Van Driessen [28]). The MCD location estimate is the mean of the $h$ observations, with $n/2 \le h \le n$, for which the determinant of the sample covariance matrix is as small as possible. For more details, refer to Al-Noor and Mohammad [29], Hubert et al. [30], Bulut and Zaman [1], and Zaman and Bulut [17, 20].
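
The MCD location estimate described above can be sketched for a small bivariate dataset by exhaustively searching all subsets of $h$ observations. This is only an illustration under stated assumptions: the data and the choice $h = 8$ are hypothetical, the helper names `mean_and_cov` and `raw_mcd` are made up for this sketch, and the search is the brute-force definition rather than the fast MCD algorithm of [28]; reweighting and consistency corrections are omitted.

```python
from itertools import combinations


def mean_and_cov(points):
    """Sample mean and unbiased covariance entries (sxx, sxy, syy) of (x, y) pairs."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / (m - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (m - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (m - 1)
    return (mx, my), (sxx, sxy, syy)


def raw_mcd(points, h):
    """Exhaustive raw MCD: the h-subset with the smallest covariance determinant."""
    best = None
    for subset in combinations(points, h):
        center, (sxx, sxy, syy) = mean_and_cov(subset)
        det = sxx * syy - sxy ** 2
        if best is None or det < best[0]:
            best = (det, center, (sxx, sxy, syy))
    return best


# Hypothetical data: eight points close to the line y = x and two gross outliers.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1),
        (6.0, 6.0), (7.0, 6.8), (8.0, 8.1), (9.0, 40.0), (10.0, -30.0)]
h = 8  # number of observations retained, chosen so that n/2 <= h <= n

det, mcd_center, mcd_cov = raw_mcd(data, h)
classical_center, classical_cov = mean_and_cov(data)
print("classical mean:", classical_center)
print("raw MCD location:", mcd_center)
```

The exhaustive search is feasible only for very small $n$, because the number of subsets grows combinatorially; this is the practical reason the fast MCD algorithm matters.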

Zaman and Bulut [17] incorporated traditional measures of location such as the coefficient of variation, the coefficient of kurtosis, and the arithmetic mean. They calculated these characteristics through MCD estimation, which is designed for data that depart from normality and contain outliers. They defined a family of MCD-based ratio-type estimators that combines the population means and the sample means of the study and auxiliary variables, where a simple random sample of size $n$ is drawn from the population. The regression coefficient in this family is based on the robust regression tools discussed in the previous lines of this section, and the two quantities that distinguish the family members are either constants or some known population measures. The family members are provided in Table 1. The MSE of the family is expressed in terms of the unbiased variances of the study and auxiliary variables and the covariance between them. All these quantities are calculated through MCD estimation as in Bulut and Zaman [1].
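
For orientation, regression-type ratio estimators of this kind are commonly written in the general form below. The notation is an assumption made for illustration and is not a transcription of the authors' equations: $\bar{y}$ and $\bar{x}$ are sample means, $\bar{Y}$ and $\bar{X}$ are population means, $b$ is a regression coefficient with limiting value $B$, $F$ and $G$ are the two quantities that index the family members, and $\lambda = (1 - n/N)/n$. In an MCD-based family, the means, variances, and covariance would be replaced by their MCD counterparts.

```latex
\bar{y}_{\text{reg-ratio}}
  = \frac{\bar{y} + b\,(\bar{X} - \bar{x})}{F\bar{x} + G}\,\bigl(F\bar{X} + G\bigr),
\qquad
R = \frac{F\bar{Y}}{F\bar{X} + G},

\operatorname{MSE}\bigl(\bar{y}_{\text{reg-ratio}}\bigr)
  \approx \lambda\left[ S_y^{2} + (R + B)^{2} S_x^{2} - 2\,(R + B)\,S_{xy} \right]
  = \lambda\left[ R^{2}S_x^{2} + 2BR\,S_x^{2} + B^{2}S_x^{2}
      - 2R\,S_{xy} - 2B\,S_{xy} + S_y^{2} \right].
```

The MSE expression follows from a first-order Taylor expansion around the population means with the slope treated as fixed: the estimator behaves like $\bar{Y} + (\bar{y} - \bar{Y}) - (R + B)(\bar{x} - \bar{X})$, whose variance under simple random sampling gives the bracketed term.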

3. Proposed Class of Quantile Regression-Ratio-Type Estimators

Quantile regression is a variant of standard linear regression that estimates a conditional quantile of the response variable, such as the conditional median, and can be used when the assumptions of linear regression are not fulfilled. Quantiles are points in a distribution that correspond to the rank order of the distribution's values. The median is the value in the sorted sample that falls in the middle (the middle quantile, 0.5). Interested readers may refer to studies by Koenker and Bassett [31], Koenker and Hallock [32], Koenker [33], Hao et al. [34], and Korkmaz and Chesneau [35].
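
The way a quantile regression line is obtained can be sketched for a single regressor. The fit minimizes an asymmetric absolute (check) loss of the residuals, and for one regressor some minimizer passes through two data points, so a small dataset can be solved by searching over pairs of points. The data and the helper names `check_loss`, `total_loss`, and `quantile_line` are hypothetical; this is a brute-force illustration, not the algorithm used in this study or in any particular package.

```python
from itertools import combinations


def check_loss(residual, q):
    """Asymmetric absolute loss for quantile level q in (0, 1)."""
    return residual * (q - (1.0 if residual < 0 else 0.0))


def total_loss(x, y, intercept, slope, q):
    """Sum of check losses of the residuals from the line intercept + slope*x."""
    return sum(check_loss(yi - intercept - slope * xi, q) for xi, yi in zip(x, y))


def quantile_line(x, y, q):
    """Quantile regression line found by searching lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = total_loss(x, y, intercept, slope, q)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: eight points on y = x and two gross outliers.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 40.0, -30.0]

intercept, slope = quantile_line(x, y, 0.5)  # median regression
print("median regression line:", intercept, slope)
```

In this toy example the median regression line coincides with the line through the eight uncontaminated points, which shows why a quantile-based slope is attractive when outliers are present. The pairwise search costs on the order of $n^{3}$ operations, so practical software relies on linear programming instead.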

So, in the presence of outliers, and building on the family of estimators reviewed in Section 2, we propose a class of quantile regression-ratio-type estimators in which the means are based on MCD and the regression coefficient is obtained from quantile regression. The quantile regression coefficient minimizes the sum of $\rho_q(e_i)$ over the residuals $e_i$, where $\rho_q(e) = e\,\bigl(q - I(e < 0)\bigr)$ is a continuous piecewise linear function (or asymmetric absolute loss function) for quantile $q \in (0, 1)$, but nondifferentiable at $e = 0$. The MSE of the proposed family of estimators is also obtained.

For the purposes of the current study, it is worth mentioning that we use five quantiles. The numerical study in the next section shows that using quantile regression coefficients based on these quantiles considerably enhances the efficiency of the proposed estimators. Investigating these five quantiles leads to a class containing five members. For the sake of readability, the five members of the proposed class are given with their MSEs in a compact form, as follows:

4. Numerical Illustration

In this section, the performance of the proposed and existing estimators is examined through two real-life applications and a simulation study.

4.1. Real-Life Applications

Population 1. In this dataset, “amount of nonreal estate farm loans during 1977” is taken as the auxiliary variable $X$, while “amount of real estate farm loans during 1977” is taken as the study variable $Y$. Furthermore, a sample of size $n$ is selected from the population of size $N$. For the remaining characteristics of the population, interested readers may refer to Singh [36].

Population 2. We use the “UScereals” dataset, which describes 65 widely available breakfast cereals in the USA, based on the information available on the mandatory food label on the packet. The measurements are normalized here to a serving size of one American cup. The data come from the ASA Statistical Graphics Exposition and are used by Venables and Ripley [37]. Among the variables in the dataset, “grams of sodium in one portion” is taken as the auxiliary variable $X$, while “Number of calories” is taken as the study variable $Y$. Furthermore, a sample of size $n$ is selected from the population of size $N$. For the remaining characteristics of the population, interested readers may refer to Venables and Ripley [37].

Before applying robust regression techniques to the referenced datasets, we check for the presence of outliers in $Y$ and $X$ individually through the Dixon chi-squared test for outliers (Dixon [38, 39]). The R package “outliers” is used for this purpose. The results are provided in Table 2.

Table 2 provides significant results in terms of $p$ values, hence providing a clear indication of the presence of outliers in the considered datasets. In light of the Dixon test (DT), traditional OLS is not suitable for these datasets. So, there is a need for techniques that provide better results in the presence of outliers. Therefore, we apply robust and quantile regression with MCD estimation. The results associated with Populations 1 and 2 are available in Tables 3 and 4, respectively.
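
A chi-squared-type outlier check of the kind referred to above can be sketched as follows. This is an assumption about the general idea, not a reproduction of the procedure or software used for Table 2: the squared standardized deviation of the most extreme observation is compared with a chi-squared distribution with one degree of freedom. The data and the helper name `chisq_outlier_check` are hypothetical.

```python
import math


def chisq_outlier_check(values):
    """Return the most extreme value, its squared standardized deviation, and a p-value.

    The statistic is referred to a chi-squared distribution with one degree of
    freedom, whose upper tail probability equals erfc(sqrt(statistic / 2)).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    suspect = max(values, key=lambda v: abs(v - mean))
    statistic = (suspect - mean) ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return suspect, statistic, p_value


# Hypothetical measurements with one value far from the rest.
sample = [10.2, 9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.0, 10.1, 25.0]
suspect, statistic, p_value = chisq_outlier_check(sample)
print("suspect value:", suspect)
print("statistic:", statistic, "p-value:", p_value)
```

A small p-value flags the suspect observation as an outlier. Because the sample mean and variance in the statistic are themselves affected by the suspect value, a check of this kind is best read as a screening device.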

5. Simulation Study

In this section, the proposed and some existing estimators are assessed through a Monte Carlo simulation. The simulation design is organized as follows. A random variable and random variable is defined as . Here, it is assumed that , and has standard normal distribution with population of size . The simple random sampling is considered for . The sampling has been replicated times. We examine empirical MSEs of , and as . The results are available in Table 5.
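
The structure of such a Monte Carlo comparison can be sketched as follows. Every setting here is a hypothetical stand-in (the population model, the population size of 1000, the sample size of 50, the 2000 replications, and the seed), and the two estimators compared are the plain sample mean and the classical OLS regression estimator rather than the estimators of this study. The empirical MSE is the average squared deviation of the estimate from the true population mean over the replications.

```python
import random


def regression_estimate(sample, pop_mean_x):
    """Classical regression estimator of the mean of y from (x, y) pairs."""
    n = len(sample)
    mean_x = sum(x for x, _ in sample) / n
    mean_y = sum(y for _, y in sample) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in sample)
    sxx = sum((x - mean_x) ** 2 for x, _ in sample)
    slope = sxy / sxx
    return mean_y + slope * (pop_mean_x - mean_x)


def empirical_mse(population, n, replications, rng):
    """Empirical MSEs of the sample mean and the regression estimator under SRS."""
    big_n = len(population)
    true_mean_y = sum(y for _, y in population) / big_n
    pop_mean_x = sum(x for x, _ in population) / big_n
    se_mean = 0.0
    se_reg = 0.0
    for _ in range(replications):
        sample = rng.sample(population, n)  # simple random sampling without replacement
        mean_y = sum(y for _, y in sample) / n
        se_mean += (mean_y - true_mean_y) ** 2
        se_reg += (regression_estimate(sample, pop_mean_x) - true_mean_y) ** 2
    return se_mean / replications, se_reg / replications


rng = random.Random(2021)
# Hypothetical population: y depends linearly on x plus standard normal noise.
population = []
for _ in range(1000):
    x = rng.gauss(5.0, 2.0)
    population.append((x, 2.0 + 3.0 * x + rng.gauss(0.0, 1.0)))

mse_mean, mse_reg = empirical_mse(population, n=50, replications=2000, rng=rng)
print("empirical MSE, sample mean:", mse_mean)
print("empirical MSE, regression estimator:", mse_reg)
```

Swapping in other estimators, or contaminating the population with outliers, only requires changing the estimator function and the population generator; the replication loop stays the same.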

6. Discussion

Tables 3–5 report the MSE results of the real-life examples and the simulation study. Each existing estimator is based on a robust regression coefficient, while the corresponding proposed estimator is based on a quantile regression coefficient. Each row of these tables is structured as follows:

The determination of best estimators is related to minimum values of MSE associated with each estimator.

Regarding Table 3, the MSE of proposed and existing estimators for Pop-1 associated to can be ordered as follows:

On the other hand, within five values of , quantile estimators perform best with , LAD performs best with , and all other estimators perform best with .

Regarding Table 4, the proposed estimators record the best performance among all competitors’ estimators for Pop-2, where the MSE of proposed and existing estimators associated to can be ordered as follows:

Also, within five values of , quantile estimators perform best with , LAD and LMS estimators perform best with , and all other estimators perform best with .

Based on Table 5, again the proposed estimators record the best performance among all compared estimators in this simulation study, where the MSE of proposed and existing estimators associated to can be ordered as follows:

Furthermore, within five values of , all estimators show their best performance with .

Overall, across the real data and the simulation, the proposed estimators record the best or near-best performance compared with the other robust regression estimators. In view of these results, the proposed estimators can be considered robust and reasonable choices.

7. Conclusion

Bulut and Zaman used robust minimum covariance determinant (MCD) estimation to create a new class of robust regression-type ratio estimators. In this article, drawing inspiration from their work, we use quantile regression with MCD-based location measures under a simple random sampling scheme to introduce a class of quantile regression-type mean estimators for estimating the population mean in the presence of outliers. The MSEs of the proposed estimators are also obtained. The performance of the proposed and some existing robust regression estimators is assessed through a simulation and two real-life applications, where the Dixon chi-squared test is used to assess the existence of outliers in the real-life datasets. Based on the numerical comparisons, the proposed estimators outperform, or come close to, the existing robust estimators across the considered datasets. As a result, the proposed estimators may be of interest whenever outliers exist, since they are likely to yield more accurate estimates of the population mean. These estimators may also be developed under the ranked set sampling method, as given by Al-Omari [40], Al-Omari and Almanjahie [41], and Haq et al. [42].

Data Availability

The datasets used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] H. Bulut and T. Zaman, “An improved class of robust ratio estimators by using the minimum covariance determinant estimation,” Communications in Statistics - Simulation and Computation, pp. 1–7, 2019.
[2] J. Neyman, “Contribution to the theory of sampling human populations,” Journal of the American Statistical Association, vol. 33, no. 201, pp. 101–116, 1938.
[3] W. G. Cochran, “The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce,” The Journal of Agricultural Science, vol. 30, no. 2, pp. 262–275, 1940.
[4] M. N. Murthy, “Product method of estimation,” The Indian Journal of Statistics, Series A, vol. 26, no. 1, pp. 69–74, 1964.
[5] M. P. Singh, “Ratio cum product method of estimation,” Metrika, vol. 12, no. 1, pp. 34–42, 1967.
[6] S. Shalabh and J. R. Tsai, “Ratio and product methods of estimation of population mean in the presence of correlated measurement errors,” Communications in Statistics - Simulation and Computation, vol. 46, no. 7, pp. 5566–5593, 2017.
[7] N. Koyuncu, “Efficient estimators of population mean using auxiliary attributes,” Applied Mathematics and Computation, vol. 218, no. 22, pp. 10900–10905, 2012.
[8] M. Abid, N. Abbas, and M. Riaz, “Improved modified ratio estimators of population mean based on deciles,” Chiang Mai Journal of Science, vol. 43, no. 1, pp. 1311–1323, 2016.
[9] M. Abid, N. Abbas, H. Z. Nazir, and Z. Lin, “Enhancing the mean ratio estimators for estimating population mean using non-conventional location parameters,” Revista Colombiana de Estadística, vol. 39, no. 1, pp. 63–79, 2016.
[10] M. Irfan, M. Javed, M. Abid, and Z. Lin, “Improved ratio type estimators of population mean based on median of a study variable and an auxiliary variable,” Hacettepe Journal of Mathematics and Statistics, vol. 47, no. 3, pp. 659–673, 2018.
[11] U. Shahzad, P. F. Perri, and M. Hanif, “A new class of ratio-type estimators for improving mean estimation of nonsensitive and sensitive variables by using supplementary information,” Communications in Statistics - Simulation and Computation, vol. 48, no. 9, pp. 2566–2585, 2019.
[12] N. Ali, I. Ahmad, M. Hanif, and U. Shahzad, “Robust-regression-type estimators for improving mean estimation of sensitive variables by using auxiliary information,” Communications in Statistics - Theory and Methods, vol. 50, 2019.
[13] S. K. Yadav and T. Zaman, “Use of some conventional and non-conventional parameters for improving the efficiency of ratio-type estimators,” Journal of Statistics and Management Systems, pp. 1–24, 2021.
[14] M. Ijaz, T. Zaman, H. Bulut, A. Ullah, and S. M. Asim, “An improved class of regression estimators using the auxiliary information,” Journal of Science and Arts, vol. 20, no. 4, pp. 789–800, 2020.
[15] C. Tanış, B. Saraçoğlu, C. Kuş, A. Pekgör, and K. Karakaya, “Transmuted lower record type Fréchet distribution with lifetime regression analysis based on type I-censored data,” Journal of Statistical Theory and Applications, vol. 20, no. 1, pp. 86–96, 2021.
[16] P. J. Rousseeuw and A. M. Leroy, Robust Regression and Outlier Detection, Wiley Series in Probability and Mathematical Statistics, New York, NY, USA, 1987.
[17] T. Zaman and H. Bulut, “Modified regression estimators using robust regression methods and covariance matrices in stratified random sampling,” Communications in Statistics - Theory and Methods, vol. 49, no. 14, pp. 3407–3420, 2020.
[18] T. Zaman and E. Toksoy, “Improvement in estimating the population mean in simple random sampling using information on two auxiliary attributes and numerical application in agricultural engineering,” Fresenius Environmental Bulletin, vol. 28, no. 6, pp. 4584–4590, 2019.
[19] T. Zaman, “Improvement of modified ratio estimators using robust regression methods,” Applied Mathematics and Computation, vol. 348, pp. 627–631, 2019.
[20] T. Zaman and H. Bulut, “Modified ratio estimators using robust regression methods,” Communications in Statistics - Theory and Methods, vol. 48, no. 8, pp. 2039–2048, 2019.
[21] R. V. Hogg, “Adaptive robust procedures: a partial review and some suggestions for future applications and theory,” Journal of the American Statistical Association, vol. 69, no. 348, pp. 909–923, 1974.
[22] P. J. Huber, “Robust regression: asymptotics, conjectures and Monte Carlo,” The Annals of Statistics, vol. 1, no. 5, pp. 799–821, 1973.
[23] F. R. Hampel, “A general qualitative definition of robustness,” The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1887–1896, 1971.
[24] J. W. Tukey, Exploratory Data Analysis, Addison-Wesley, Boston, MA, USA, 1977.
[25] V. J. Yohai, “High breakdown-point and high efficiency robust estimates for regression,” The Annals of Statistics, vol. 15, no. 2, pp. 642–656, 1987.
[26] P. J. Rousseeuw, “Least median of squares regression,” Journal of the American Statistical Association, vol. 79, no. 388, pp. 871–880, 1984.
[27] P. Rousseeuw, “Multivariate estimation with high breakdown point,” in Mathematical Statistics and Applications, W. Grossmann, G. Pflug, I. Vincze, and W. Wertz, Eds., vol. B, pp. 283–297, Reidel Publishing Company, Dordrecht, Netherlands, 1985.
[28] P. J. Rousseeuw and K. Van Driessen, “A fast algorithm for the minimum covariance determinant estimator,” Technometrics, vol. 41, no. 3, pp. 212–223, 1999.
[29] N. H. Al-Noor and A. A. Mohammad, “Model of robust regression with parametric and non-parametric methods,” Mathematical Theory and Modeling, vol. 3, no. 5, pp. 27–39, 2013.
[30] M. Hubert, M. Debruyne, and P. J. Rousseeuw, “Minimum covariance determinant and extensions,” WIREs Computational Statistics, vol. 10, no. 3, pp. 1–11, 2018.
[31] R. Koenker and G. Bassett, “Regression quantiles,” Econometrica, vol. 46, no. 1, pp. 33–50, 1978.
[32] R. Koenker and K. F. Hallock, “Quantile regression,” Journal of Economic Perspectives, vol. 15, no. 4, pp. 143–156, 2001.
[33] R. Koenker, Quantile Regression, Cambridge University Press, New York, NY, USA, 2005.
[34] L. Hao and D. Q. Naiman, Quantile Regression, Sage Publications, Thousand Oaks, CA, USA, 2007.
[35] M. Ç. Korkmaz and C. Chesneau, “On the unit Burr-XII distribution with the quantile regression modeling and applications,” Computational and Applied Mathematics, vol. 40, no. 1, pp. 1–26, 2021.
[36] S. Singh, Advanced Sampling Theory with Applications. How Michael Selected Amy, Kluwer Academic Publishers, Dordrecht, Netherlands, 2003.
[37] W. N. Venables and B. D. Ripley, Modern Applied Statistics with S-PLUS, Springer, Berlin, Germany, 3rd edition, 1999.
[38] W. J. Dixon, “Analysis of extreme values,” The Annals of Mathematical Statistics, vol. 21, no. 4, pp. 488–506, 1950.
[39] W. J. Dixon, “Ratios involving extreme values,” The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 68–78, 1951.
[40] A. I. Al-Omari, “Maximum likelihood estimation in location-scale families using varied L ranked set sampling,” RAIRO - Operations Research, vol. 55, pp. S2759–S2771, 2021.
[41] A. I. Al-Omari and I. M. Almanjahie, “New improved ranked set sampling designs with an application to real data,” Computers, Materials and Continua, vol. 67, no. 2, pp. 1503–1522, 2021.
[42] A. Haq, J. Brown, E. Moltchanova, and A. I. Al-Omari, “Paired double-ranked set sampling,” Communications in Statistics - Theory and Methods, vol. 45, no. 1, pp. 2873–2889, 2016.

Copyright

Copyright © 2021 Usman Shahzad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
