Enhanced Estimation of the Population Mean Using Two Auxiliary Variables under Probability Proportional to Size Sampling

Ahmad, Sohaib; Zahid, Erum; Shabbir, Javid; Aamir, Muhammad; Onyango, Ronald

doi:https://doi.org/10.1155/2023/5564360

Mathematical Problems in Engineering


Research Article | Open Access

Volume 2023 | Article ID 5564360 | https://doi.org/10.1155/2023/5564360


Academic Editor: Isabella Torcicollo

Received 28 Dec 2022

Revised 02 Mar 2023

Accepted 09 Mar 2023

Published 17 Apr 2023

Abstract

In some situations, the units of the population of interest differ considerably in size. For example, in a medical study, the number of patients with a specific disease and the size of health units may vary. Similarly, in a survey of household income, households may have different numbers of siblings. In such situations, probability proportional to size (PPS) sampling is used. In this article, we propose an improved class of estimators of the population mean under PPS sampling using two auxiliary variables. Expressions for the bias and mean square error (MSE) are derived up to the first order of approximation. Four real datasets and a simulation study are used to assess the efficiency of the improved class of estimators. The results from the real datasets and the simulation study show that the proposed generalized class of estimators attains a smaller MSE and a higher percentage relative efficiency (PRE) than the other estimators considered. The theoretical comparisons likewise demonstrate that the proposed generalized class of estimators outperforms the existing estimators, and the empirical study supports these theoretical results.

1. Introduction

In sample surveys, estimating the finite population mean is a frequent problem, and many attempts have been made to improve the precision of estimators. The survey sampling literature describes a wide range of strategies for incorporating auxiliary variables through ratio, product, and regression methods. In particular, when two or more auxiliary variables are available, an extensive range of estimators has been proposed, each combining ratio, product, regression, and exponential-type components. Researchers have sought estimators with the best statistical properties for population parameters such as the mean, variance, coefficient of variation, and proportion. Such estimation requires a representative sample from the population of interest. If the population is homogeneous, units can be selected by simple random sampling without replacement (SRSWOR). The ratio, product, and regression methods also require the relevant population parameters of the auxiliary variable to be known in advance. Many authors have suggested estimators that make suitable use of auxiliary variables. Kadilar and Cingi [1] suggested an improvement in estimating the population mean in simple random sampling using auxiliary information. Al-Omari [2] proposed ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling. Ozturk [3] estimated the population mean and total in a finite population setting using multiple auxiliary variables. Yadav et al. [4] used auxiliary variables to search for an efficient estimator of the population mean. Bhushan and Pandey [5], Kumar and Saini [6], and Singh and Nigam [7] provided generalized classes of estimators for the finite population mean using two auxiliary variables in sample surveys. Bhushan et al. [8] and Shahzad et al. [9] suggested estimators using two auxiliary variables. Mahdizadeh and Zamanzade [10, 11] suggested estimators of the population mean in ranked set sampling. Adichwal et al. [12] estimated general parameters using auxiliary information in simple random sampling without replacement. Ullah et al. [13] estimated the finite population mean in simple and stratified random sampling by utilizing the auxiliary variable, its ranks, and its square. Khalid and Singh [14] discussed imputation methods for missing data due to random nonresponse in two-occasion successive sampling. Shahzad et al. [15] discussed quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information. Khalid and Singh [16] suggested general estimation procedures for the population mean in two-occasion successive sampling under random nonresponse. Shahzad et al. [17] suggested imputation-based mean estimators for missing data utilizing robust regression and variance-covariance matrices. Ahmad et al. [18] estimated the finite population mean using a dual auxiliary variable under nonresponse in simple random sampling. Singh and Khalid [19] proposed exponential chain dual-to-ratio and regression-type estimators of the population mean in two-phase sampling. Singh et al. [20] discussed imputation methods for missing data in two-occasion successive sampling. Ahmad et al. [21] used a dual ancillary variable to estimate the population mean under stratified random sampling. Singh and Nigam [22] discussed a ratio-cum-product-type exponential estimator of the finite population mean in double sampling for stratification. Ahmad et al. [23] proposed an improved ratio-in-regression-type variance estimator based on dual use of auxiliary variables under simple random sampling.

In many situations, the population units vary considerably in size. For example, in a medical study, the number of patients with a specific disease and the size of health units may vary. Similarly, in a survey of household income, households may have different numbers of siblings. In such situations, units should be selected with unequal probabilities, and PPS sampling is used to allocate these probabilities. Let Y be the study variable and X an auxiliary variable associated with it. For example, suppose we need to assess the population of a province within a country. We would then choose as the auxiliary variable one for which information is available, e.g., (a) the population of all provinces within the country (correlation with the study variable = 0.95) or (b) the number of households in all communities within the province (correlation with the study variable = 0.99). We would choose the auxiliary variable with the highest correlation with the study variable; thus, variable (b) may be the more useful auxiliary variable when selecting a sample by PPS sampling with replacement (PPSWR). Many researchers have suggested estimators that make efficient use of auxiliary variables under PPS sampling. Akpanta [24] discussed the problems of PPS sampling in multicharacter surveys. Agarwal and Al Mannai [25] presented a linear combination of estimators in PPS sampling to estimate the population mean and studied its robustness to the optimum value. Abdulla et al. [26] and Andersen et al. [27] proposed optimal PPS sampling with vanishing auxiliary variables, with applications in microscopy. Alam et al. [28] proposed methods for selecting samples with probability proportional to size.
Patel and Bhatt [29] estimated the finite population total under PPS sampling in the presence of extra auxiliary information; Singh et al. [30], Makela et al. [31], and Ahmad and Shabbir [32] used an auxiliary variable to estimate finite population means under PPS sampling. Ozturk [33] suggested post-stratified probability proportional to size sampling from stratified populations. Latpate et al. [34] discussed probability proportional to size sampling in detail. Sohil et al. [35] suggested optimum second call imputation in PPS sampling, and Sinha and Khanna [36] discussed the estimation of the population mean under probability proportional to size sampling with and without measurement errors. Makela et al. [31] suggested Bayesian inference under cluster sampling with probability proportional to size. Zangeneh and Little [37] suggested Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample. Abdulla et al. [26] discussed the selection of samples in probability proportional to size sampling by the cumulative relative frequency method. Sohail et al. [38] discussed homogeneous imputation under two-phase probability proportional to size sampling.

2. Notations and Symbols

We consider a finite population of N identifiable units. Let Y denote the study variable, and let X and Z denote the two auxiliary variables. Each unit is selected with a probability proportional to its size, and a sample of size n is drawn by PPS sampling with replacement.

We define the sample means corresponding to the population means of the variables, together with error terms relative to those population means; these are used to derive the properties of the estimators.
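A minimal sketch of the sampling design, assuming the usual PPSWR notation: unit i is drawn with probability p_i = s_i / (s_1 + ... + s_N) for a positive size measure s (a hypothetical input here), and the mean of a variable is estimated by averaging the transformed values y_i/(N p_i) over the n draws (the Hansen–Hurwitz form). Each draw has expectation p_1 · y_1/(N p_1) + ... + p_N · y_N/(N p_N) = Ȳ, so the estimator is unbiased, and the same transformation applied to X and Z gives unbiased estimates of X̄ and Z̄. The function names `ppswr_sample` and `hh_mean` are illustrative and do not come from the article.

```python
import random

def ppswr_sample(sizes, n, rng):
    """Draw n unit indices with replacement, unit i having probability
    p_i = sizes[i] / sum(sizes). Returns the drawn indices and all p_i."""
    total = float(sum(sizes))
    probs = [s / total for s in sizes]
    idx = rng.choices(range(len(sizes)), weights=probs, k=n)
    return idx, probs

def hh_mean(values, idx, probs):
    """Hansen-Hurwitz-type estimate of the population mean:
    the average over the n draws of values[i] / (N * p_i)."""
    N = len(values)
    return sum(values[i] / (N * probs[i]) for i in idx) / len(idx)

# Hypothetical population of 8 units; the size measure must be positive.
size = [12, 30, 7, 51, 25, 40, 18, 9]
y = [2.0 * s for s in size]          # study variable exactly proportional to size
rng = random.Random(2023)
idx, p = ppswr_sample(size, n=4, rng=rng)
estimate = hh_mean(y, idx, p)
true_mean = sum(y) / len(y)          # 2 * mean(size) = 48.0
```

Because Y is exactly proportional to the size measure in this toy population, every transformed value equals Ȳ and the estimate is exact for any sample; in practice, the closer Y is to proportional to the size measure, the smaller the variance of this estimator.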

3. Existing Estimators

In this section, we review some existing estimators available in the literature.
(1) The traditional estimator, together with its variance.
(2) The ratio estimator suggested by Cochran [39].
(3) The product estimator suggested by Murthy [40].
(4) The regression-type estimator.
(5) The exponential ratio- and product-type estimators suggested by Bahl and Tuteja [41], together with their biases and MSEs.
(6) The three exponential-type estimators suggested by Haq and Shabbir [42], each with its MSE at the optimal values.
(7) The estimator suggested by Ekpenyong and Enang [43], with its MSE at the optimal values.
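For reference, the textbook single-auxiliary forms of the ratio (Cochran), product (Murthy), regression, and Bahl–Tuteja exponential ratio and product estimators can be written as below. This is a sketch, not the article's exact PPS expressions: it assumes the caller passes PPS-based estimates `y_bar` and `x_bar` (for example, the Hansen–Hurwitz-type means of the transformed variables), the known population mean `X_bar`, and a regression coefficient `b`; all of these names are hypothetical.

```python
import math

def classical_estimators(y_bar, x_bar, X_bar, b):
    """Textbook single-auxiliary estimators of the population mean.

    y_bar, x_bar : estimated means of the study and auxiliary variables
                   (assumed here to be PPS-based, e.g. Hansen-Hurwitz-type)
    X_bar        : known population mean of the auxiliary variable
    b            : regression coefficient (hypothetical input)
    """
    return {
        "ratio": y_bar * X_bar / x_bar,                                      # Cochran-type ratio
        "product": y_bar * x_bar / X_bar,                                    # Murthy-type product
        "regression": y_bar + b * (X_bar - x_bar),                           # linear regression
        "exp_ratio": y_bar * math.exp((X_bar - x_bar) / (X_bar + x_bar)),    # Bahl-Tuteja ratio
        "exp_product": y_bar * math.exp((x_bar - X_bar) / (x_bar + X_bar)),  # Bahl-Tuteja product
    }

# Hypothetical inputs: the sample underestimates X_bar (4.0 < 5.0).
results = classical_estimators(y_bar=8.0, x_bar=4.0, X_bar=5.0, b=1.5)
```

Every form reduces to `y_bar` when `x_bar` equals `X_bar`. When the sample underestimates X̄, the ratio-type forms adjust the estimate upward and the product-type forms adjust it downward, which is why ratio forms suit a positive correlation between Y and X and product forms suit a negative one.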

4. Proposed Estimator

The use of auxiliary variables increases the efficiency of estimators at both the design and estimation stages. Motivated by Shabbir and Gupta [44], we propose a new generalized class of estimators of the population mean that uses two auxiliary variables under probability proportional to size sampling, and we obtain its bias and mean square error up to the first order of approximation. The main benefit of the proposed class is that it is more flexible and more efficient than the existing estimators. The proposed class of estimators is defined in (21), where a and b are known constants.

Substituting different values of a and b into (21) generates new estimators from the proposed generalized class; these are shown in Table 1.

Simplifying (22) and keeping terms up to the first order of approximation, we obtain (23).

From (23), we obtain the bias and MSE of the proposed class of estimators.

The optimal values of the two unknown constants are obtained by minimizing the MSE in (24) with respect to them.

Substituting these optimal values into (24) gives the minimum MSE of the proposed class of estimators, given in (26).
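For estimators with two unknown constants, the first-order MSE is typically a quadratic function of those constants, so the optimum comes from a 2×2 linear system. The sketch below assumes the MSE in (24) has been written as MSE(k1, k2) = c − 2(b1·k1 + b2·k2) + a11·k1² + 2·a12·k1·k2 + a22·k2²; the coefficients a11, a12, a22, b1, b2, c are hypothetical placeholders for the population moments that appear in (24), and `mse_at` and `optimal_weights` are illustrative names.

```python
def mse_at(k1, k2, a11, a12, a22, b1, b2, c):
    """First-order MSE written as a quadratic form in the constants (k1, k2)."""
    return (c - 2.0 * (b1 * k1 + b2 * k2)
            + a11 * k1 ** 2 + 2.0 * a12 * k1 * k2 + a22 * k2 ** 2)

def optimal_weights(a11, a12, a22, b1, b2, c):
    """Minimise mse_at over (k1, k2).

    Setting both partial derivatives to zero gives A k = b with
    A = [[a11, a12], [a12, a22]] and b = [b1, b2]; A must be positive definite.
    Returns (k1*, k2*, minimum MSE), where the minimum equals c - b . k*.
    """
    det = a11 * a22 - a12 ** 2
    if a11 <= 0 or det <= 0:
        raise ValueError("quadratic form must be positive definite")
    k1 = (b1 * a22 - b2 * a12) / det     # Cramer's rule, first component
    k2 = (a11 * b2 - a12 * b1) / det     # Cramer's rule, second component
    return k1, k2, c - (b1 * k1 + b2 * k2)

# Hypothetical coefficients chosen so that the optimum is easy to check by hand.
k1_opt, k2_opt, mse_min = optimal_weights(2.0, 0.0, 4.0, 2.0, 4.0, 10.0)
# k1_opt == 1.0, k2_opt == 1.0, mse_min == 4.0
```

Under this quadratic form, MSE(k) − MSE_min = (k − k*)ᵀA(k − k*) ≥ 0 when A is positive definite, so any member of the class with fixed constants can at best match the minimum MSE at the first order of approximation.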

5. Efficiency Comparisons

In this section, we compare the proposed generalized class of estimators with the existing estimators considered here. In each case, the proposed class is more efficient than the existing estimator whenever the MSE of the existing estimator exceeds the minimum MSE of the proposed class in (26).
(i) From (2) and (26), we get
(ii) From (3) and (26), we get
(iii) From (4) and (26), we get
(iv) From (6) and (26), we get
(v) From (8) and (26), we get
(vi) From (12) and (26), we get
(vii) From (14) and (26), we get
(viii) From (16) and (26), we get
(ix) From (18) and (26), we get
(x) From (12) and (26), we get

6. Summary Statistics and Numerical Results

In this section, we use four real populations for numerical comparisons of the proposed and existing estimators. Data descriptions of these datasets are given in Table 2. The performance of the considered estimators is compared in terms of percentage relative efficiency (PRE), defined as the MSE of the traditional estimator divided by the MSE of the estimator under comparison, multiplied by 100.
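The PRE calculation itself is a single ratio. A minimal sketch, assuming the traditional estimator is the reference; the function name `pre` and the MSE values below are hypothetical and are not the article's results.

```python
def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency: 100 * MSE(reference) / MSE(estimator)."""
    if mse_estimator <= 0:
        raise ValueError("MSE of the estimator must be positive")
    return 100.0 * mse_reference / mse_estimator

# Hypothetical MSEs, used only to illustrate the calculation.
mses = {"traditional": 4.0, "ratio": 2.5, "proposed": 1.6}
pres = {name: pre(mses["traditional"], m) for name, m in mses.items()}
```

A PRE above 100 means the estimator has a smaller MSE than the reference, so ranking estimators by PRE is equivalent to ranking them by MSE.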

Population 1 (source: Punjab Bureau of Statistics, 2021-2022).
Y = total number of beds on 30 June 2021
X = total number of beds allocated for COVID-19
Z = number of beds used for COVID-19

Population 2 (source: Punjab Bureau of Statistics, 2021-2022).
Y = children under age 5 whose births are reported to be registered with a civil authority
X = children aged 5–17 years who were engaged in child labor during the last week
Z = women aged 20–24 years who were first married before the age of 16

Population 3 (source: Punjab Bureau of Statistics, 2021-2022).
Y = probability of dying before the first birthday
X = probability of dying between the first and fifth birthdays
Z = probability of dying between birth and the fifth birthday

Population 4 (source: Punjab Bureau of Statistics, 2021-2022).
Y = age-specific fertility rate (ASFR) for women aged 15–19 years
X = women aged 20–24 years who had a live birth before age 18
Z = currently married women aged 15–49 years who are using a contraceptive method

7. Simulation Study

We generated four populations (Populations I–IV), each of size 5,000, from multivariate normal distributions with different population means and covariance matrices.
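A Monte Carlo study of this kind can be sketched as follows. The means and covariance matrix are hypothetical (they are not the parameters of Populations I–IV), the column order (Y, X, Z) and the use of Z as the PPS size measure are assumptions, and only a Hansen–Hurwitz-type and a ratio-type estimator are compared; the proposed class is not implemented here. The empirical MSE of each estimator is the average squared deviation from the population mean over repeated PPSWR samples. Z is used as the size measure so that X stays informative: if X itself were the size measure, x_i/(N p_i) would equal X̄ for every unit and the ratio adjustment would have no effect.

```python
import numpy as np

def simulate_pps(mean, cov, n, reps=2000, N=5000, seed=1):
    """Empirical MSE of two estimators of the mean of Y under PPSWR sampling.

    Columns of mean/cov are assumed to be (Y, X, Z); Z is used as the size
    measure and X as the auxiliary variable (both are assumptions of this sketch).
    """
    rng = np.random.default_rng(seed)
    pop = rng.multivariate_normal(mean, cov, size=N)
    y, x, z = pop[:, 0], pop[:, 1], pop[:, 2]
    if np.any(z <= 0):
        raise ValueError("size measure must be positive; increase the mean of Z")
    p = z / z.sum()                                        # PPS selection probabilities
    Y_bar, X_bar = y.mean(), x.mean()

    idx = rng.choice(N, size=(reps, n), replace=True, p=p)  # one row per sample
    w = 1.0 / (N * p[idx])                                 # expansion factors 1/(N p_i)
    u_bar = (y[idx] * w).mean(axis=1)                      # Hansen-Hurwitz-type mean of Y
    v_bar = (x[idx] * w).mean(axis=1)                      # same transformation for X
    ratio = u_bar * X_bar / v_bar                          # ratio-type estimator using X

    mse = {"hh": np.mean((u_bar - Y_bar) ** 2),
           "ratio": np.mean((ratio - Y_bar) ** 2)}
    pre_vals = {name: 100.0 * mse["hh"] / m for name, m in mse.items()}
    return mse, pre_vals

mean = [50.0, 40.0, 45.0]                  # hypothetical means of (Y, X, Z)
cov = [[100.0, 60.0, 50.0],
       [60.0, 64.0, 30.0],
       [50.0, 30.0, 49.0]]                 # hypothetical, positive definite
mse, pre_vals = simulate_pps(mean, cov, n=50, reps=500)
```

Drawing all samples at once as a (reps, n) index array keeps the sketch vectorized; repeating the call with different sample sizes n mirrors the comparison across sample sizes described in the Discussion.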

8. Discussion

To assess the efficiency of the proposed generalized class of estimators relative to the existing estimators, we analyzed four real datasets and conducted a simulation study, considering different sample sizes from each population. The proposed generalized class of estimators and the existing PPS estimators were compared in terms of MSE and PRE. Table 2 presents the summary statistics for the populations. The MSE and PRE results based on the real datasets are shown in Tables 3 and 4; they confirm that the proposed estimators attain a smaller MSE and a higher PRE than the existing estimators. Tables 5 and 6 give the MSE and PRE results for the simulated datasets. The results based on both the real datasets and the simulation study show that the PRE of the proposed class of estimators is higher than that of the existing estimators considered in this study. We therefore recommend the proposed class of estimators over the existing estimators considered here.

9. Conclusion

This article proposes a new generalized class of estimators of the finite population mean under probability proportional to size sampling using two auxiliary variables. Ten new estimators are generated from the proposed class. The bias and mean square error are derived up to the first order of approximation. The results on four real datasets show that the proposed class of estimators performs better than the existing estimators, and a simulation study was also carried out to evaluate the strength and generalizability of the proposed class. Both the theoretical and numerical findings demonstrate that the proposed estimator is more efficient than the usual estimators. The current work can be extended to estimate the population mean using auxiliary variables under measurement error or nonresponse, and to estimate the population proportion.

Data Availability

All data supporting the current study are available in the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] C. Kadilar and H. Cingi, “Improvement in estimating the population mean in simple random sampling,” Applied Mathematics Letters, vol. 19, no. 1, pp. 75–79, 2006.
[2] A. I. Al-Omari, “Ratio estimation of the population mean using auxiliary information in simple random sampling and median ranked set sampling,” Statistics and Probability Letters, vol. 82, no. 11, pp. 1883–1890, 2012.
[3] O. Ozturk, “Estimation of population mean and total in a finite population setting using multiple auxiliary variables,” Journal of Agricultural, Biological, and Environmental Statistics, vol. 19, no. 2, pp. 161–184, 2014.
[4] S. K. Yadav, D. K. Sharma, S. S. Mishra, and A. K. Shukla, “Use of auxiliary variables in searching efficient estimator of population mean,” International Journal of Multivariate Data Analysis, vol. 1, no. 3, pp. 230–244, 2018.
[5] S. Bhushan and A. P. Pandey, “Optimality of ratio-type imputation methods for estimation of population mean using higher order moment of an auxiliary variable,” Journal of Statistical Theory and Practice, vol. 15, no. 2, pp. 48–35, 2021.
[6] A. Kumar and M. Saini, “A predictive approach for finite population mean when auxiliary variables are attributes,” Thailand Statistician, vol. 20, no. 3, pp. 575–584, 2022.
[7] H. P. Singh and P. Nigam, “A generalized class of estimators for finite population mean using two auxiliary variables in sample surveys,” Journal of Reliability and Statistical Studies, pp. 61–104, 2022.
[8] S. Bhushan, A. Kumar, R. Onyango, and S. Singh, “Some improved classes of estimators in stratified sampling using bivariate auxiliary information,” Journal of Probability and Statistics, vol. 2022, Article ID 2660114, 23 pages, 2022.
[9] U. Shahzad, M. Hanif, I. Sajjad, and M. M. Anas, “Quantile regression-ratio-type estimators for mean estimation under complete and partial auxiliary information,” Scientia Iranica, vol. 10, no. 1, pp. 0–1715, 2020.
[10] M. Mahdizadeh and E. Zamanzade, “On interval estimation of the population mean in ranked set sampling,” Communications in Statistics - Simulation and Computation, vol. 51, no. 5, pp. 2747–2768, 2022.
[11] M. Mahdizadeh and E. Zamanzade, “Estimation of a symmetric distribution function in multistage ranked set sampling,” Statistical Papers, vol. 61, no. 2, pp. 851–867, 2020.
[12] N. Kumar Adichwal, A. Ali H Ahmadini, Y. Singh Raghav, R. Singh, and I. Ali, “Estimation of general parameters using auxiliary information in simple random sampling without replacement,” Journal of King Saud University Science, vol. 34, no. 2, Article ID 101754, 2022.
[13] K. Ullah, Z. Hussain, I. Hussain, S. A. Cheema, Z. Almaspoor, and M. El-Morshedy, “Estimation of finite population mean in simple and stratified random sampling by utilizing the auxiliary, ranks, and square of the auxiliary information,” Mathematical Problems in Engineering, vol. 2022, Article ID 5263492, 12 pages, 2022.
[14] M. Khalid and G. N. Singh, “Some imputation methods to deal with the issue of missing data problems due to random non-response in two-occasion successive sampling,” Communications in Statistics - Simulation and Computation, vol. 51, no. 12, pp. 7266–7286, 2022.
[15] U. Shahzad, N. H. Al-Noor, M. Hanif, I. Sajjad, and M. Muhammad Anas, “Imputation based mean estimators in case of missing data utilizing robust regression and variance–covariance matrices,” Communications in Statistics - Simulation and Computation, vol. 51, no. 8, pp. 4276–4295, 2020.
[16] M. Khalid and G. N. Singh, “General estimation procedures of population mean in two-occasion successive sampling under random non-response,” Proceedings of the National Academy of Sciences, India, Section A: Physical Sciences, vol. 92, no. 2, pp. 205–215, 2022.
[17] U. Shahzad, I. Ahmad, I. M. Almanjahie, M. Hanif, and N. H. Al-Noor, “L-Moments and calibration based variance estimators under double stratified random sampling scheme: an application of covid-19 pandemic,” Scientia Iranica, vol. 23, no. 3, 2022.
[18] S. Ahmad, S. Hussain, M. Aamir, F. Khan, M. N. Alshahrani, and M. Alqawba, “Estimation of finite population mean using dual auxiliary variable for non-response using simple random sampling,” AIMS Mathematics, vol. 7, no. 3, pp. 4592–4613, 2022.
[19] G. N. Singh and M. Khalid, “Exponential chain dual to ratio and regression type estimators of population mean in two-phase sampling,” Statistica, vol. 75, no. 4, pp. 379–389, 2015.
[20] G. N. Singh, M. Khalid, and J. M. Kim, “Some imputation methods to deal with the problems of missing data in two-occasion successive sampling,” Communications in Statistics - Simulation and Computation, vol. 50, no. 2, pp. 557–580, 2021.
[21] S. Ahmad, S. Hussain, U. Yasmeen et al., “A simulation study: using dual ancillary variable to estimate population mean under stratified random sampling,” PLoS One, vol. 17, no. 11, Article ID e0275875, 2022.
[22] H. P. Singh and P. Nigam, “Ratio cum product-type exponential estimator in double sampling for stratification of finite population mean,” Afrika Matematika, vol. 33, no. 3, p. 79, 2022.
[23] S. Ahmad, S. Hussain, K. Ullah et al., “A simulation study: improved ratio-in-regression type variance estimator based on dual use of auxiliary variable under simple random sampling,” PLoS One, vol. 17, no. 11, Article ID e0276540, 2022.
[24] A. C. Akpanta, “On the problems of PPS sampling in multi-character surveys,” Global Journal of Mathematical Sciences, vol. 8, no. 1, pp. 31–42, 2010.
[25] S. K. Agarwal and M. Al Mannai, “Linear combination of estimators in probability proportional to sizes sampling to estimate the population mean and its robustness to optimum value,” Statistica, vol. 69, no. 1, 2009.
[26] F. Abdulla, M. Hossain, and M. Rahman, “On the selection of samples in probability proportional to size sampling: cumulative relative frequency method,” Mathematical Theory and Modeling, vol. 4, no. 6, Article ID 102Á7, 2014.
[27] I. T. Andersen, U. Hahn, and E. B. Vedel Jensen, “Optimal PPS sampling with vanishing auxiliary variables–with applications in microscopy,” Scandinavian Journal of Statistics, vol. 42, no. 4, pp. 1136–1148, 2015.
[28] M. Alam, S. A. Sumy, and Y. A. Parh, “Selection of the samples with probability proportional to size,” Science Journal of Applied Mathematics and Statistics, vol. 3, no. 5, pp. 230–233, 2015.
[29] P. A. Patel and S. Bhatt, “Estimation of finite population total under PPS sampling in presence of extra auxiliary information,” International Journal of Statistics and Analysis, vol. 6, no. 1, pp. 9–16, 2016.
[30] H. P. Singh, A. C. Mishra, and S. K. Pal, “Improved estimator of population total in PPS sampling,” Communications in Statistics - Theory and Methods, vol. 47, no. 4, pp. 912–934, 2018.
[31] S. Makela, Y. Si, and A. Gelman, “Bayesian inference under cluster sampling with probability proportional to size,” Statistics in Medicine, vol. 37, no. 26, pp. 3849–3868, 2018.
[32] S. Ahmad and J. Shabbir, “Use of extreme values to estimate finite population mean under pps sampling scheme,” Journal of Reliability and Statistical Studies, vol. 12, pp. 99–112, 2018.
[33] O. Ozturk, “Post-stratified probability-proportional-to-size sampling from stratified populations,” Journal of Agricultural, Biological, and Environmental Statistics, vol. 24, no. 4, pp. 693–718, 2019.
[34] R. Latpate, J. Kshirsagar, V. Kumar Gupta, and G. Chandra, “Probability proportional to size sampling,” in Advanced Sampling Methods, Springer, Singapore, 2021.
[35] F. Sohil, M. U. Sohail, and J. Shabbir, “Optimum second call imputation in PPS sampling,” PLoS One, vol. 17, no. 1, Article ID e0261834, 2022.
[36] R. R. Sinha and B. Khanna, “Estimation of population mean under probability proportional to size sampling with and without measurement errors,” Concurrency and Computation: Practice and Experience, vol. 21, no. 1, Article ID e7023, 2022.
[37] S. Z. Zangeneh and R. J. A. Little, “Bayesian inference for the finite population total from a heteroscedastic probability proportional to size sample,” Journal of Survey Statistics and Methodology, vol. 3, no. 2, pp. 162–192, 2015.
[38] U. Sohail, J. Shabbir, and C. Kadilar, “Homogeneous imputation under two phase probability proportional to size sampling,” Hacettepe Journal of Mathematics and Statistics, vol. 48, no. 5, pp. 1–23, 2019.
[39] W. G. Cochran, “The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce,” The Journal of Agricultural Science, vol. 30, no. 2, pp. 262–275, 1940.
[40] M. N. Murthy, “Product method of estimation,” Sankhya: The Indian Journal of Statistics, Series A, vol. 26, no. 1, pp. 69–74, 1964.
[41] S. Bahl and R. Tuteja, “Ratio and Product type exponential estimators,” Journal of Information and Optimization Sciences, vol. 12, no. 1, pp. 159–164, 1991.
[42] A. Haq and J. Shabbir, “Improved exponential type estimators of finite population mean under complete and partial auxiliary information,” Hacettepe Journal of Mathematics and Statistics, vol. 43, no. 3, pp. 1–1093, 2014.
[43] E. J. Ekpenyong and E. I. Enang, “Efficient exponential ratio estimator for estimating the population mean in simple random sampling,” Hacettepe Journal of Mathematics and Statistics, vol. 44, no. 19, pp. 1–705, 2014.
[44] J. Shabbir and S. Gupta, “Using rank of the auxiliary variable in estimating variance of the stratified sample mean,” International Journal of Computational and Theoretical Statistics, vol. 6, no. 2, 2019.

Copyright

Copyright © 2023 Sohaib Ahmad et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
