Abstract

In some situations, the units of the population of interest differ considerably in size. For example, in a medical study, the number of patients having a specific disease and the size of health units may vary. Similarly, in a survey of household income, households may have different numbers of siblings. In such situations, we use probability proportional to size (PPS) sampling. In this article, we propose an improved class of estimators of the population mean under PPS sampling, using two auxiliary variables. Mathematical expressions for the bias and mean square error (MSE) are derived up to the first order of approximation. Four real datasets and a simulation study are used to assess the efficiency of the improved class of estimators. Both show that the proposed generalized class of estimators gives a smaller MSE and a higher percentage relative efficiency (PRE) than the other estimators considered. The theoretical comparison likewise shows that the proposed generalized class of estimators outperforms the existing estimators.

1. Introduction

In sample surveys, estimating the finite population mean is a frequent problem, and many attempts have been made to improve the precision of estimators. The survey sampling literature describes a wide range of strategies for incorporating auxiliary variables through ratio, product, and regression methods. In particular, when two or more auxiliary variables are available, an extensive range of estimators has been presented, each combining ratio, product, regression, and exponential-type estimators. Researchers have sought estimators with good statistical properties for population parameters including the mean, variance, coefficient of variation, and proportion. A representative sample from the population concerned is required. If the population of interest is homogeneous, units can be selected by simple random sampling without replacement (SRSWOR). The ratio, product, and regression estimation methods also require the population parameters of the auxiliary variable to be known in advance. By appropriately modifying the auxiliary variables, many authors have suggested numerous estimators. Kadilar and Cingi [1] suggested an improvement in estimating the population mean in simple random sampling using auxiliary information. Al-Omari [2] proposed ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling. Ozturk [3] provided an estimation of the population mean and total in a finite population setting using multiple auxiliary variables. Yadav et al. [4] suggested the use of auxiliary variables in searching for an efficient estimator of the population mean. Bhushan and Pandey [5], Kumar and Saini [6], and Singh and Nigam [7] provided a generalized class of estimators for the finite population mean using two auxiliary variables in sample surveys. Bhushan et al. [8] and Shahzad et al. [9] suggested some estimators using two auxiliary variables. Mahdizadeh and Zamanzade [10, 11] suggested some estimators of the population mean in ranked set sampling. Adichwal et al. [12] suggested the estimation of general parameters using auxiliary information in simple random sampling without replacement. Ullah et al. [13] suggested the estimation of a finite population mean in simple and stratified random sampling by utilizing the auxiliary variable, its ranks, and its square. Khalid and Singh [14] discussed some imputation methods for missing data due to random nonresponse in two-occasion successive sampling. Shahzad et al. [15] discussed quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information. Khalid and Singh [16] suggested general estimation procedures for the population mean in two-occasion successive sampling under random nonresponse. Shahzad et al. [17] suggested imputation-based mean estimators for missing data by utilizing robust regression and variance-covariance matrices. Ahmad et al. [18] suggested the estimation of the finite population mean using a dual auxiliary variable under nonresponse in simple random sampling. Singh and Khalid [19] proposed exponential chain dual-to-ratio and regression-type estimators of the population mean in two-phase sampling. Singh et al. [20] discussed some imputation methods for missing data in two-occasion successive sampling. Ahmad et al. [21] proposed using a dual auxiliary variable to estimate the population mean under stratified random sampling. Singh and Nigam [22] discussed the ratio-cum-product-type exponential estimator of the finite population mean in double sampling for stratification. Ahmad et al. [23] proposed an improved ratio-in-regression-type variance estimator based on the dual use of auxiliary variables under simple random sampling.

In many situations, the units of the population vary considerably in size. For example, in a medical study, the number of patients having a specific disease and the size of health units may vary. Similarly, in a survey of household income, households may have different numbers of siblings. In such situations, the selection probabilities of the units are unequal, and PPS sampling is used to assign them. Let Y be a study variable and X be an auxiliary variable. For example, consider the case where we need to assess the population of a province within a country. We would then choose as our auxiliary variable one on which we have information, e.g., (a) the population of all provinces within the country (correlation with the study variable = 0.95) or (b) the number of households in all communities within the province (correlation with the study variable = 0.99). On the basis of this information, we would choose the auxiliary variable that has the maximum correlation with the study variable. Thus, variable (b) may be the more useful auxiliary variable when selecting a sample using PPS with replacement (PPSWR) sampling. Many researchers have suggested estimators that make efficient use of auxiliary variables under PPS sampling. Akpanta [24] discussed the problems of PPS sampling in multicharacter surveys. Agarwal and Mannai [25] presented a linear combination of estimators in probability proportional to size sampling to estimate the population mean and examined its robustness to departures from the optimum value. Abdulla et al. [26] and Andersen et al. [27] proposed optimal PPS sampling with vanishing auxiliary variables with applications in microscopy. Alam et al. [28] proposed the selection of samples with probability proportional to size.
Patel and Bhatt [29] gave an estimation of the finite population total under PPS sampling in the presence of extra auxiliary information. Singh et al. [30], Makela et al. [31], and Ahmad and Shabbir [32] presented the use of an auxiliary variable to estimate the finite population mean under PPS sampling. Ozturk [33] suggested poststratified probability proportional to size sampling from stratified populations. Latpate et al. [34] discussed probability proportional to size sampling in detail. Sohil et al. [35] suggested optimum second call imputation in PPS sampling, and Sinha and Khanna [36] discussed the estimation of the population mean under probability proportional to size sampling with and without measurement errors. Makela and Gelman [31] suggested Bayesian inference under cluster sampling with probability proportional to size. Zangeneh and Little [37] suggested Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample. Abdulla et al. [26] discussed the selection of samples in probability proportional to size sampling by the cumulative relative frequency method. Sohail et al. [38] discussed homogeneous imputation under two-phase probability proportional to size sampling.
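The choice between candidate auxiliary variables described above (picking the one with the highest correlation with the study variable) can be sketched as follows. This is only an illustrative sketch: the function names `pearson` and `best_auxiliary` and the toy numbers are hypothetical and are not taken from the article or its datasets.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def best_auxiliary(y, candidates):
    """Return (name, r) for the candidate with the largest correlation with y."""
    scores = {name: pearson(y, x) for name, x in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical toy data: one study variable and two candidate auxiliary variables.
y = [12.0, 15.0, 21.0, 30.0, 41.0]
candidates = {
    "population": [100.0, 160.0, 150.0, 310.0, 380.0],
    "households": [24.0, 31.0, 43.0, 59.0, 83.0],
}
name, r = best_auxiliary(y, candidates)
print(name, round(r, 4))
```

In this toy example the "households" variable is almost exactly linear in the study variable, so it is the one selected as the size measure.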

2. Notations and Symbols

We consider a finite population of N identifiable units. Let Y denote the study variable, and let X and Z denote the auxiliary variables. We assume that a sample of size n is drawn by probability proportional to size sampling with replacement.

Let be the PPS sampling for obtaining the units, and we draw a sample of size n by adopting the PPS sampling with replacement.
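As a rough illustration of the sampling scheme, the sketch below draws a PPS with-replacement sample and computes the usual with-replacement estimator of the population mean (the Hansen-Hurwitz form). The function names, the size measure, and the data are assumptions made for illustration; the article's own notation for the selection probabilities is not reproduced here.

```python
import random

def pps_wr_sample(sizes, n, rng):
    """Draw n unit indices with replacement, with probability proportional to size."""
    total = sum(sizes)
    probs = [s / total for s in sizes]  # selection probability of each unit
    idx = rng.choices(range(len(sizes)), weights=probs, k=n)
    return idx, probs

def hansen_hurwitz_mean(y, probs, idx):
    """Estimate the population mean from a PPS with-replacement sample."""
    big_n = len(y)
    return sum(y[i] / (big_n * probs[i]) for i in idx) / len(idx)

rng = random.Random(42)
sizes = [10, 20, 30, 40]      # hypothetical size measure
y = [5.0, 10.0, 15.0, 20.0]   # hypothetical study values, exactly proportional to size
idx, probs = pps_wr_sample(sizes, n=3, rng=rng)
estimate = hansen_hurwitz_mean(y, probs, idx)
print(estimate)
```

Because the toy study values are exactly proportional to the size measure, every sampled unit contributes the same value and the estimate equals the population mean (12.5) whatever sample is drawn. This is the limiting case that motivates choosing a size measure highly correlated with the study variable.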

We define

Let and be the sample mean corresponding to the population mean and .

We consider the following error terms to obtain the properties of the estimators:

where , .

3. Existing Estimators

In this section, we review some existing estimators that are available in the literature.

(1) The traditional estimator is
The variance of is
(2) Cochran [39] suggested the following estimator:
(3) Murthy [40] suggested a product estimator, given by
(4) The regression-type estimator is
where and
(5) Bahl and Tuteja [41] suggested the following estimators:
The biases and MSEs of and are given by
(6) Haq and Shabbir [42] suggested the following three exponential-type estimators:
where
The MSE of at the optimal values is given by
where
The MSE of at the optimal values is given by
where
The MSE of at the optimal values is given by
(7) Ekpenyong and Enang [43] suggested the following estimator:
where

The MSE of , at the optimal values, is given by

4. Proposed Estimator

The use of auxiliary variables increases the efficiency of estimators at both the design and the estimation stages. Motivated by Shabbir and Gupta [44], we propose a new generalized class of estimators using two auxiliary variables under probability proportional to size sampling. Mathematical properties such as the bias and mean square error are obtained up to the first order of approximation. The main benefit of our generalized class of estimators under probability proportional to size sampling is that it is more flexible and efficient than the existing estimators. The proposed class is given by

where a and b are the known constants and .

Substituting different values of a and b in equation (21), we can generate some new estimators from our generalized class of estimators, which are shown in Table 1.

By simplifying (22) and keeping terms up to the first-order approximation, we can write the equation as

From (23), the bias and MSE of , are given by

The optimal values of and are obtained by simplifying (24) as follows:

By putting the optimal values of and in (24), we get the minimum mean square error of , which is given by

5. Efficiency Comparisons

In this section, we compare the proposed generalized class of estimators with the existing estimators considered here.

(i) By taking (2) and (26), we get
(ii) By taking (3) and (26), we get
(iii) By taking (4) and (26), we get
(iv) By taking (6) and (26), we get
(v) By taking (8) and (26), we get
(vi) From (12) and (26), we have
(vii) From (14) and (26), we have
(viii) From (16) and (26), we have
(ix) From (18) and (26), we have
(x) From (12) and (26), we have

6. Summary Statistics and Numerical Results

In this section, we consider different populations for numerical comparisons of the proposed and existing estimators. We consider four real datasets. Data descriptions of these datasets are given in Table 2. The performance of the considered estimators is compared in terms of percentage relative efficiency (PRE). The PRE of with respect to is expressed as

where , , , , , , , , , , , , , , , , , , and .
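The PRE comparison can be sketched in a few lines, assuming the conventional definition in which the MSE of a reference estimator is divided by the MSE of the estimator being assessed and multiplied by 100. The estimator labels and MSE values below are hypothetical placeholders, not results from Tables 3 to 6.

```python
def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency of an estimator against a reference estimator."""
    return 100.0 * mse_reference / mse_estimator

# Hypothetical MSE values for a reference estimator and two competitors.
mse_values = {"t_reference": 4.0, "t_ratio": 2.5, "t_proposed": 1.6}

pre_table = {
    name: pre(mse_values["t_reference"], mse)
    for name, mse in mse_values.items()
}
for name, value in pre_table.items():
    print(f"{name}: PRE = {value:.1f}")
```

Under this convention the reference estimator always has a PRE of 100, and a value above 100 indicates a smaller MSE than the reference.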

Population 1. (source: Punjab Bureau of Statistics (2021-2022)).
Y = total number of beds on 30th June 2021
X = total allocated beds for COVID-19
Z = beds used for COVID-19

Population 2. (source: Punjab Bureau of Statistics (2021-2022)).
Y = children below age 5 whose births are reported to be registered with a civil authority
X = children aged 5–17 years who were involved in child labor during the last week
Z = women aged 20–24 years who were first married before the age of 16

Population 3. (source: Punjab Bureau of Statistics (2021-2022)).
Y = probability of dying within the first year of life
X = probability of dying between the first and fifth birthdays
Z = probability of dying between birth and the fifth birthday

Population 4. (source: Punjab Bureau of Statistics (2021-2022)).
Y = age-specific fertility rate (ASFR) for women aged 15–19 years
X = women aged 20–24 years who have had a live birth before age 18
Z = currently married women aged 15–49 years who are using a contraceptive method

7. Simulation Study

We generated four populations of size 5,000 from a multivariate normal distribution with different covariance matrices. The population means and covariance matrices are as follows:

Population I:
Population II:
Population III:
Population IV:
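A simulation of this kind can be sketched as below. The mean vector, covariance matrix, sample size, number of replications, and the choice of Z as the size measure are all hypothetical placeholders, since the article's actual values are not reproduced here. The estimator evaluated is the plain PPS with-replacement mean, not the proposed class.

```python
import itertools
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def generate_population(mean, cov, size, rng):
    """Generate `size` rows from a multivariate normal with the given mean and covariance."""
    low = cholesky(cov)
    k = len(mean)
    rows = []
    for _ in range(size):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        rows.append([mean[i] + sum(low[i][j] * e[j] for j in range(i + 1))
                     for i in range(k)])
    return rows

def empirical_mse(y, sizes, n, reps, rng):
    """Monte Carlo MSE of the PPS with-replacement mean estimator."""
    big_n = len(y)
    total = sum(sizes)
    probs = [s / total for s in sizes]
    cum = list(itertools.accumulate(probs))
    true_mean = sum(y) / big_n
    sq_err = 0.0
    for _ in range(reps):
        idx = rng.choices(range(big_n), cum_weights=cum, k=n)
        est = sum(y[i] / (big_n * probs[i]) for i in idx) / n
        sq_err += (est - true_mean) ** 2
    return sq_err / reps

rng = random.Random(2024)
mean = [50.0, 50.0, 50.0]                      # hypothetical means of (Y, X, Z)
cov = [[25.0, 20.0, 18.0],
       [20.0, 25.0, 15.0],
       [18.0, 15.0, 25.0]]                     # hypothetical covariance matrix
population = generate_population(mean, cov, 5000, rng)
y = [row[0] for row in population]
z = [row[2] for row in population]             # assumed size measure
assert min(z) > 0, "size measure must be positive for PPS sampling"
mse = empirical_mse(y, z, n=50, reps=200, rng=rng)
print(round(mse, 4))
```

The means are placed far from zero relative to the standard deviations so that the normal size measure stays positive. Other estimators could be compared by replacing the line that computes `est`.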

8. Discussion

As mentioned above, four real datasets and a simulation study were used to assess the efficiency of the proposed generalized class of estimators relative to the existing estimators. We also considered different sample sizes from the populations. The proposed generalized class of estimators and the existing PPS estimators were compared with respect to their mean square error and percentage relative efficiency. Table 2 presents the summary statistics for the real populations. The MSE and PRE results based on the real datasets are shown in Tables 3 and 4. These numerical findings confirm that the proposed estimators are more precise, with smaller MSE and higher PRE, than the existing estimators. Tables 5 and 6 give the MSE and PRE results for the simulated datasets. The results from both the real datasets and the simulation study show that the PRE of the proposed class of estimators is higher than that of the existing estimators considered in this study. We therefore recommend the proposed class of estimators over the other estimators considered.

9. Conclusion

This article uses two auxiliary variables to propose a new generalized class of estimators of the finite population mean under probability proportional to size sampling. We generate ten new estimators from the proposed class. The bias and mean square error are derived up to the first order of approximation. The results for four real datasets show that the proposed class of estimators performs better than the existing estimators. A simulation study is also carried out to evaluate the strength and generalizability of the proposed class of estimators. Both the theoretical and the numerical findings show that the proposed class of estimators is more efficient than the usual estimators. The current work can be extended to estimate the population mean using auxiliary variables in the presence of measurement error and nonresponse, and to the estimation of a population proportion.

Data Availability

All data supporting the current study are available in the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.