Abstract

Robust regression tools are commonly used to develop regression-type ratio estimators with traditional measures of location whenever data are contaminated with outliers. Recently, researchers extended this idea and developed regression-type ratio estimators through robust minimum covariance determinant (MCD) estimation. In this study, quantile regression with MCD-based measures of location is utilized, and a class of quantile regression-type mean estimators is proposed. The mean squared errors (MSEs) of the proposed estimators are also obtained. The proposed estimators are compared with the reviewed class of estimators through a simulation study and two real-life applications. To assess the presence of outliers in these real-life applications, the Dixon chi-squared test is used. The quantile regression estimators are found to perform better than some existing estimators.

1. Introduction

The use of auxiliary information in survey sampling is as old as survey sampling itself (Bulut and Zaman [1]). Neyman’s work [2] mentioned the early works in which auxiliary information was used. The problem of improving the efficiency of parameter estimation by the use of auxiliary information has received a lot of attention in sampling theory and practice. Common examples of such methods are ratio, product, and regression estimators. Under the simple random sampling scheme, ratio and product estimation techniques are widely used. For more details about these two estimation techniques, see the studies by Cochran [3], Murthy [4], Singh [5], and Shalabh and Tsai [6]. Furthermore, there is a wealth of literature on ratio estimators for the population mean. Studies by Koyuncu [7], Abid et al. [8, 9], Irfan et al. [10], Shahzad et al. [11], Ali et al. [12], and Yadav and Zaman [13] are examples of such works.

Both ratio and product techniques have benefits and drawbacks. For instance, a ratio estimator is appropriate when the study and the auxiliary variables have a positive linear relationship (correlation), whereas a product estimator is appropriate when they have a negative linear relationship. This limitation is overcome by the regression estimation technique, which yields significantly improved results for both positive and negative correlations. It should be noted that the typical regression estimator is based on the linear least squares regression coefficient. For regression estimation, interested readers may refer to Ijaz et al. [14] and Tanış et al. [15].
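To make the distinction between the three classical techniques concrete, the following minimal Python sketch computes the textbook ratio, product, and regression estimators of a population mean from one simple random sample. The function names, the sample values, and the known auxiliary mean `X_bar` are hypothetical and chosen only for illustration; they are not taken from the study.

```python
# Minimal sketch of the classical ratio, product, and regression estimators.
# All data below are hypothetical illustration values.

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Least squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def ratio_estimator(x, y, X_bar):
    """Suited to a positive linear relationship between x and y."""
    return mean(y) * X_bar / mean(x)

def product_estimator(x, y, X_bar):
    """Suited to a negative linear relationship between x and y."""
    return mean(y) * mean(x) / X_bar

def regression_estimator(x, y, X_bar):
    """Usable for either sign of the correlation."""
    return mean(y) + ols_slope(x, y) * (X_bar - mean(x))

x = [1.0, 2.0, 3.0, 4.0]      # hypothetical auxiliary sample
y = [2.1, 3.9, 6.2, 7.8]      # hypothetical study sample
X_bar = 3.0                   # hypothetical known population mean of x

est_ratio = ratio_estimator(x, y, X_bar)
est_product = product_estimator(x, y, X_bar)
est_regression = regression_estimator(x, y, X_bar)
print(est_ratio, est_product, est_regression)
```

Because the sample above is positively correlated, the ratio and regression estimates move in the same direction when the sample mean of x falls below `X_bar`, while the product estimate moves the opposite way.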

The linear least squares or ordinary least squares (OLS) regression is the most conventional statistical method for parameter estimation because it is easy to implement. This method minimizes the sum of the squared differences, or residuals, between the observed dependent variable and the predictions made by a linear function of the independent variable. OLS produces the best estimation results for straight-line regression when its underlying assumptions hold. On the other hand, parameter estimates based on OLS are influenced by outliers or extreme values and, in that case, do not produce efficient results. The breakdown point of the OLS fit is $1/n$ ($n$, sample size), or 0% in the limit, indicating that a single outlier can have a significant impact (Rousseeuw and Leroy [16]). As a result, in the presence of outliers, the mean estimation based on OLS is also affected (Zaman and Bulut [17]). To overcome this issue in mean estimation, authors have used robust regression tools, robust quantile regression tools, and robust covariance matrices (see, among others, the recent works of Zaman and Toksoy [18], Zaman [19], and Zaman and Bulut [17, 20], who developed robust regression techniques to monitor the effect of extreme values). In this study, we utilize quantile regression with minimum covariance determinant estimator-based measures of location and propose a class of quantile regression-type mean estimators.
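The sensitivity of OLS to a single extreme value can be illustrated with a short Python sketch. The data are hypothetical: ten points lying exactly on a line with slope 2, after which one response is replaced by a gross outlier.

```python
# Sketch: one outlier is enough to move the OLS slope far from its clean value.
# The data and the outlier value are hypothetical.

def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

x = [float(i) for i in range(1, 11)]
y_clean = [2.0 * xi for xi in x]          # exact line, slope 2

y_contaminated = list(y_clean)
y_contaminated[-1] = 200.0                # single gross outlier at x = 10

slope_clean = ols_slope(x, y_clean)
slope_contaminated = ols_slope(x, y_contaminated)
print(slope_clean, slope_contaminated)
```

Any regression-type mean estimator that plugs in the OLS slope inherits this distortion, which is the motivation for replacing the slope with a robust or quantile regression coefficient.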

The main parts of this study are organized as follows. First, a review of robust regression tools and the MCD-based ratio estimators is provided. The proposed class of quantile regression-ratio-type mean estimators is then introduced, together with its MSE. In addition, numerical illustrations of the existing and proposed classes of estimators are included. Finally, the conclusion is provided.

2. Review of MCD-Based Ratio Estimators

In attempts to find estimators for the location parameter that are efficient across a wide range of datasets, the robust approach has been developed as the main approach (Hogg [21]). The purpose of the robust technique is to find a single estimator that is efficient across a wide range of datasets, even if it is not exactly ideal for any one population. For this article’s purposes, the following lines briefly describe some of the robust regression tools and the MCD estimator used by authors to overcome the issue of extreme values in mean estimation.

As robust regression tools, we considered the following:

LAD: the least absolute deviation estimator is based on minimizing the sum of the absolute errors.

LMS: the least median of squares estimator is based on minimizing the median of the squared errors (SE).

LTS: the least trimmed squares estimator is based on applying OLS to a specified number of the smallest sorted SE; hence, its computation is not affected by extreme values.

M-estimation: M-estimation, where M stands for maximum likelihood type, is based on minimizing an objective function of the residuals. Some of the formulae designed for this objective function are as follows:

Huber M-estimator: Huber [22] investigated the objective function with , or as

Hampel M-estimator: Hampel [23] investigated the objective function with , , and as

Tukey M-estimator: Tukey [24] investigated the objective function with , or as

MM-estimator: the MM-estimator, where MM stands for modified M-estimator, is another robust regression tool developed for data containing outliers. It combines the high resistance to outliers of S-estimators with the high efficiency of M-estimators. The MM-estimator is a regression M-estimator with a redescending function, in which the initial values of the regression coefficients and the scale estimate come from the S-estimator, which is based on the minimization of a robust M-estimator of scale. For details, see Yohai [25].

MCD: the minimum covariance determinant estimator is one of the earliest and most robust estimators of multivariate location and spread. Although MCD was first introduced in 1984 (Rousseeuw [26, 27]), the development of the computationally efficient fast MCD algorithm in 1999 (Rousseeuw and Van Driessen [28]) marked the beginning of its widespread application. The MCD location estimate is the mean of the subset of observations, of a prescribed size, for which the determinant of the sample covariance matrix is as small as possible. For more details, refer to the studies by Al-Noor and Mohammad [29], Hubert et al. [30], Bulut and Zaman [1], and Zaman and Bulut [17, 20].
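The M-estimator objective functions named above can be sketched in Python using their standard textbook forms. The tuning constants used in the study are not legible in this text, so the defaults below (1.345 for Huber and 4.685 for Tukey, the values commonly quoted for 95% efficiency under normality) are assumptions, and the Hampel constants `a`, `b`, `c` must be supplied by the caller.

```python
# Textbook forms of three M-estimation objective (rho) functions.
# Default tuning constants are common conventions, not values from the study.

def huber_rho(e, k=1.345):
    """Quadratic near zero, linear in the tails."""
    u = abs(e)
    if u <= k:
        return 0.5 * u * u
    return k * u - 0.5 * k * k

def hampel_rho(e, a, b, c):
    """Three-part redescending function with 0 < a <= b < c."""
    u = abs(e)
    if u <= a:
        return 0.5 * u * u
    if u <= b:
        return a * u - 0.5 * a * a
    plateau = a * b - 0.5 * a * a + 0.5 * a * (c - b)
    if u <= c:
        return plateau - 0.5 * a * (c - u) ** 2 / (c - b)
    return plateau

def tukey_rho(e, c=4.685):
    """Bisquare function: constant beyond c, so gross outliers stop adding loss."""
    u = abs(e)
    if u <= c:
        return (c * c / 6.0) * (1.0 - (1.0 - (u / c) ** 2) ** 3)
    return c * c / 6.0

for residual in (0.5, 2.0, 10.0):
    print(residual, huber_rho(residual), hampel_rho(residual, 2.0, 4.0, 8.0), tukey_rho(residual))
```

The Huber function keeps growing linearly for large residuals, whereas the Hampel and Tukey functions level off, which is what gives the redescending estimators their stronger resistance to gross outliers.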

Zaman and Bulut [17] incorporated traditional measures of location such as the coefficient of variation , coefficient of kurtosis , and arithmetic mean . Because these measures are highly sensitive to departures from normality and to the presence of outliers, they calculated them through MCD estimation. They defined the following family of MCD-based ratio-type estimators where are the population means and are the sample means when a simple random sample of size is gathered from the population. is based on robust regression tools as discussed earlier in this section. Furthermore, and are either or some known population measures. The family members of are provided in Table 1. The MSE of is given by where and for . Furthermore, and are the unbiased variances, and is the covariance of and . All these quantities are calculated through MCD estimation as in Bulut and Zaman [1].
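As a rough illustration of how an MCD location estimate differs from the ordinary mean, the sketch below computes a raw bivariate MCD by exhaustive search over all subsets of a given size. This brute-force search is only feasible for very small samples and is not the fast MCD algorithm of Rousseeuw and Van Driessen [28]; the data, the subset size `h`, and the function names are hypothetical, and no consistency correction or reweighting step is applied.

```python
# Sketch: raw bivariate MCD location by exhaustive subset search (tiny n only).
from itertools import combinations

def mean_and_cov_det(points):
    """Return the mean vector and the determinant of the 2x2 sample covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), sxx * syy - sxy ** 2

def mcd_location(points, h):
    """Mean of the h-subset whose covariance determinant is smallest."""
    candidates = (mean_and_cov_det(subset) for subset in combinations(points, h))
    best_mean, _ = min(candidates, key=lambda result: result[1])
    return best_mean

# Hypothetical data: seven points near a line plus one gross outlier.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8), (6, 12.2), (7, 13.9), (100, -40)]

classical_mean, _ = mean_and_cov_det(data)
robust_mean = mcd_location(data, h=6)
print(classical_mean, robust_mean)
```

The classical mean is dragged toward the outlier, while the MCD location stays with the bulk of the data; the same idea underlies replacing the usual means, variances, and covariance with their MCD counterparts.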

3. Proposed Class of Quantile Regression-Ratio-Type Estimators

Quantile regression is a variant of standard linear regression that estimates a conditional quantile of the response variable, such as the conditional median, and can be used when the assumptions of linear regression are not fulfilled. Quantiles are points in a distribution that correspond to the rank order of the distribution’s values. The median is the value in the sorted sample that falls in the middle (the middle, or 0.5, quantile). Interested readers may refer to studies by Koenker and Bassett [31], Koenker and Hallock [32], Koenker [33], Hao et al. [34], and Korkmaz and Chesneau [35].
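A small Python sketch may help fix ideas about how a quantile regression line is obtained. It uses the standard check (asymmetric absolute) loss and, as an illustrative shortcut, searches over lines through pairs of observations, relying on the known property that a simple quantile regression problem has a solution passing through at least two data points. Practical software solves the problem by linear programming instead; the data, the quantile `q = 0.5`, and the function names are hypothetical.

```python
# Sketch: simple quantile regression by brute-force search over point pairs.
from itertools import combinations

def check_loss(e, q):
    """Asymmetric absolute loss: q*e for e >= 0 and (q - 1)*e for e < 0."""
    return e * (q - (1.0 if e < 0 else 0.0))

def quantile_line(x, y, q):
    """Return (intercept, slope) minimizing the total check loss."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression fit
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, q) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Hypothetical data: exact line with slope 2 and one gross outlier.
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi for xi in x]
y[-1] = 200.0

intercept, slope = quantile_line(x, y, q=0.5)
print(intercept, slope)
```

On these data the median regression line ignores the outlier and recovers the slope of the clean points, in contrast to the least squares fit.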

So, in the presence of outliers and based on equation (4), and are , and we propose a class of quantile regression-ratio-type estimators as with where are based on MCD, is the quantile regression, and is a continuous piecewise linear function (or asymmetric absolute loss function), for quantile , but nondifferentiable at . Note that is the quantile regression coefficient for variables. The MSE of the proposed family of estimators is

For the purposes of the current study, we use the , and quantiles. The numerical study in the next section shows that utilizing the quantile regression coefficients based on these quantiles greatly enhances the efficiency of the proposed estimators. Investigating these five quantiles leads to a class containing five members. For the sake of readability, we provide the five members of the proposed class with their MSEs in compact form as follows:

4. Numerical Illustration

In this section, the performance of the proposed and existing estimators is assessed through two real-life applications and a simulation study.

4.1. Real-Life Applications

Population 1. In this dataset, “amount of nonreal estate farm loans during 1977” is taken as the auxiliary variable , while “amount of real estate farm loans during 1977” is taken as the study variable. Furthermore, is selected for . For the remaining characteristics of the population, interested readers may refer to Singh [36].

Population 2. We use the “UScereals” dataset, which describes 65 widely available breakfast cereals in the USA, based on the information available on the mandatory food label on the packet. The measurements are normalized here to a serving size of one American cup. The data come from the ASA Statistical Graphics Exposition and are used by Venables and Ripley [37]. The dataset contains a number of variables. Here, “grams of sodium in one portion” is taken as the auxiliary variable , while “Number of calories” is taken as the study variable . Furthermore, is selected for . For the remaining characteristics of the population, interested readers may refer to Venables and Ripley [37].

As a diagnostic check before applying robust regression techniques to the referenced datasets, we test for the presence of outliers in individually through the Dixon chi-squared test for outliers presented by Dixon [38, 39]. The R package “outliers” is used for this purpose. The results are provided in Table 2.

Table 2 provides significant results in terms of p values, hence giving a clear indication of the presence of outliers in the considered datasets. In light of the Dixon test, we can say that traditional OLS is not suitable for these datasets. There is therefore a need for techniques that provide better results in the presence of outliers, and so we apply robust and quantile regression with MCD estimation. The results associated with Populations 1 and 2 are available in Tables 3 and 4, respectively.
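For readers unfamiliar with Dixon-type outlier checks, the sketch below computes Dixon's basic gap-to-range ratio for the smallest and the largest observation. It is only an assumption-laden illustration of the idea: it does not reproduce the output of the “outliers” package used in the study, it returns no critical values or p values, and the sample is hypothetical.

```python
# Sketch: Dixon's gap-to-range ratio for the two extreme observations.
# This computes only the statistic, not a p value.

def dixon_ratios(values):
    """Return (ratio for the smallest value, ratio for the largest value)."""
    s = sorted(values)
    data_range = s[-1] - s[0]
    if data_range == 0:
        raise ValueError("all observations are equal")
    q_low = (s[1] - s[0]) / data_range
    q_high = (s[-1] - s[-2]) / data_range
    return q_low, q_high

sample = [10.1, 10.3, 9.9, 10.0, 10.2, 15.0]   # hypothetical sample
q_low, q_high = dixon_ratios(sample)
print(q_low, q_high)
```

A ratio close to 1 means that a single extreme value accounts for most of the range, which flags that value as a candidate outlier; a formal decision requires comparing the ratio with tabulated critical values.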

5. Simulation Study

In this section, the proposed and some existing estimators are assessed through Monte Carlo simulation. The simulation design is organized as follows. A random variable and random variable is defined as . Here, it is assumed that , and has the standard normal distribution, with a population of size . Simple random sampling is considered for . The sampling is replicated times. We examine the empirical MSEs of , and as . The results are available in Table 5.
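The general shape of such a Monte Carlo comparison can be sketched in Python as follows. The model, population size, sample size, number of replications, and the two estimators compared (the plain sample mean and the classical OLS regression estimator) are all hypothetical choices made for illustration, because the settings of the study are not legible in this text.

```python
# Sketch: empirical MSE of mean estimators under repeated simple random sampling.
# Every numeric setting below is a hypothetical illustration value.
import random

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def sample_mean(xs, ys, X_bar):
    return mean(ys)

def regression_estimator(xs, ys, X_bar):
    return mean(ys) + ols_slope(xs, ys) * (X_bar - mean(xs))

def empirical_mse(estimator, pop_x, pop_y, n, reps, rng):
    """Average squared error of the estimator over repeated samples."""
    X_bar, Y_bar = mean(pop_x), mean(pop_y)
    total = 0.0
    for _ in range(reps):
        idx = rng.sample(range(len(pop_y)), n)   # sampling without replacement
        xs = [pop_x[i] for i in idx]
        ys = [pop_y[i] for i in idx]
        total += (estimator(xs, ys, X_bar) - Y_bar) ** 2
    return total / reps

rng = random.Random(2024)
pop_x = [rng.gauss(0.0, 1.0) for _ in range(500)]
pop_y = [5.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in pop_x]

mse_mean = empirical_mse(sample_mean, pop_x, pop_y, n=30, reps=1000, rng=rng)
mse_reg = empirical_mse(regression_estimator, pop_x, pop_y, n=30, reps=1000, rng=rng)
print(mse_mean, mse_reg)
```

To compare robust or quantile regression-type estimators in the same framework, one would pass further estimator functions with the same three-argument signature to `empirical_mse` and contaminate the hypothetical population with outliers.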

6. Discussion

Tables 3–5 report the MSE results of the real-life examples and the simulation study, where each existing estimator is based on , while the corresponding proposed estimators are based on . Each row of these tables is structured as follows:

The best estimator is the one with the minimum MSE value.

Regarding Table 3, the MSEs of the proposed and existing estimators for Pop-1 associated with can be ordered as follows:

On the other hand, among the five values of , quantile estimators perform best with , LAD performs best with , and all other estimators perform best with .

Regarding Table 4, the proposed estimators record the best performance among all competing estimators for Pop-2, where the MSEs of the proposed and existing estimators associated with can be ordered as follows:

Also, among the five values of , quantile estimators perform best with , LAD and LMS estimators perform best with , and all other estimators perform best with .

Based on Table 5, the proposed estimators again record the best performance among all compared estimators in the simulation study, where the MSEs of the proposed and existing estimators associated with can be ordered as follows:

Furthermore, among the five values of , all estimators show their best performance with .

Overall, on both the real data and the simulated data, the proposed estimators record the best or near-best performance compared to the other robust regression estimators. In view of these results, the proposed estimators can be considered robust and reasonable estimators.

7. Conclusion

Bulut and Zaman used robust minimum covariance determinant (MCD) estimation to create a new class of robust regression-type ratio estimators. In this article, drawing inspiration from Bulut and Zaman’s work, we propose to use quantile regression with MCD estimator-based location measures under a simple random sampling scheme to introduce a class of quantile regression-type mean estimators for estimating the population mean in the presence of outliers. The MSEs of the proposed estimators are also obtained. The performance of the proposed and some existing robust regression estimators is assessed through simulation and two real-life applications, where the Dixon chi-squared test is used to assess the existence of outliers in the real-life datasets. Based on the numerical comparisons, the proposed estimators outperform, or nearly outperform, the existing robust estimators across the considered datasets. As a result, the proposed estimators may be of interest, and they are likely to yield more accurate estimates of the population mean when outliers exist. These estimators may also be developed under the ranked set sampling method, as given by Al-Omari [40], Al-Omari and Almanjahie [41], and Haq et al. [42].

Data Availability

The datasets used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.