Abstract

The purpose of any statistical analysis is to uncover trends in real situations and to provide an accurate model for future policies. To fulfill this aim, it is important to collect and use the data carefully and to minimize the possibility of various types of errors. Nonresponse (NR) and measurement error (ME) are the two major types of nonsampling error that occur in almost every survey. We propose an estimator of the population mean and study its efficiency under the separate and combined effects of NR and ME. Some existing estimators are also considered for comparison. A simulation study is performed to assess its efficiency numerically.

1. Introduction

We make inferences about a population by using samples. Because a sample inevitably leaves a major part of the population unrepresented, there is a possibility of error; this error is called sampling error. Any error other than sampling error is called nonsampling error. Sampling error tends to decrease as the sample size increases, whereas nonsampling error tends to increase with the sample size. The two main factors that contribute to nonsampling error are nonresponse (NR) and measurement error (ME). The unavailability of data due to NR is a common problem in surveys. At present, most surveys are conducted online (through e-mail, social media, etc.). Online surveys are cost-effective, but the chance of NR is also high. The likelihood of NR is lowest when the data are collected by directly interviewing the respondents. Hansen and Hurwitz [1] present a method to deal with NR: the questionnaire is first sent to the selected units in the sample through online mode, and subsequently a subsample is taken from the group of nonrespondents and interviewed directly. Cochran [2] applies the method of Hansen and Hurwitz [1] to the ratio and regression estimators. Moreover, the use of auxiliary information is expected to increase the efficiency of the estimators. Khare and Srivastava [3] study the ratio estimator when complete information is available on the auxiliary variable and NR is present only on the study variable. Okafor and Lee [4] consider the ratio and regression estimators under NR in double sampling. Some other notable works on estimation under NR are Singh and Kumar [5], Kreuter et al. [6], Unal and Kadilar [7], Bii et al. [8], and Pandey et al. [9].

Another nonsampling error is measurement error (ME), which is defined as the difference between the observed value and the true value. Many sources can lead to ME, such as equipment error, error in data entry, bias in the survey questionnaire, biased processing, false information, and respondent error. Hansen et al. [10] present a study on ME in the census and show that ignoring ME would generate unreliable results. Chandhok [11] points out various perspectives on ME in his thesis. Fuller [12] presents various models for ME. Shalabh [13] studies the classical ratio estimator in the presence of ME. Further works on ME include Shukla et al. [14], Shalabh and Tsai [15], and Kumar et al. [16].

Most of this work studies either NR or ME. In practice, however, the two usually occur together. Azeem [17] works on the simultaneous presence of NR and ME. In his thesis, he studies various well-known estimators and also suggests some new estimators under NR and ME. Further, Kumar et al. [18] propose a transformation-based exponential type estimator of the population mean in the presence of NR and ME. Some recent contributions on the topic are Singh and Sharma [19], Azeem and Hanif [20], Kumar, Trehan, and Joorel [21], Singh, Bhattacharyya, and Bandyopadhyay [22], and Audu et al. [23].

Continuing this line of work, we propose an estimator of the population mean and study it in four situations: when NR and ME are absent or negligible, when only NR is present, when only ME is present, and when both NR and ME are present. It will be interesting to see the results of all four cases together and to compare their effects.

2. Sampling Strategy and Notations

Let be a finite population of size . Let be a sample of size taken from using the simple random sampling without replacement (SRSWOR) scheme. Let and be the study and the auxiliary variables, respectively. With this initial information, we use the following sampling strategies.

The usual notations are as follows: , are the sample means and , are the population means of the study variable and the auxiliary variable , respectively; and are the population variances of and ; is the correlation coefficient between and ; , .

2.1. Sampling Strategy

First, we consider the default sampling strategy, in which NR and ME are either absent or small enough to be ignored. Let and be the observed values of and at units in the sample. We denote this sampling strategy by . , , and .

2.2. Sampling Strategy

Let measurement error (ME) be present in the sample. That is, the sample is selected, recorded, or processed with some unknown ME. Let be the ME on and be the ME on . We use to indicate the presence of ME. So, in the presence of ME, the observed units for the study variable and the auxiliary variable in the sample are and , . and , where and are the true values of and at unit in the sample and and are the corresponding MEs. Since the MEs on and are independent of each other, and are uncorrelated. Also, we assume that the means of and are zero, as ME is caused by both under-reporting and over-reporting. and are the population variances of and , respectively. We denote this sampling strategy by . , , .
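As a sketch of the additive error model described above, the relations can be written as follows. The symbols used here ($Y_i$, $X_i$ for true values, $U_i$, $V_i$ for errors, $y_i$, $x_i$ for observed values, and the variance symbols) are placeholders chosen for illustration and may differ from the article's own notation. The last line additionally assumes that the errors are uncorrelated with the true values, which is a common assumption in this literature but is not stated explicitly in the text.

```latex
% Assumed notation: (Y_i, X_i) true values, (U_i, V_i) measurement errors,
% (y_i, x_i) observed values at unit i.
\begin{align*}
  y_i &= Y_i + U_i, & x_i &= X_i + V_i, \\
  E(U_i) &= E(V_i) = 0, & \operatorname{Cov}(U_i, V_i) &= 0 .
\end{align*}
% Further assumption (not stated in the text): errors uncorrelated with true values.
\[
  \operatorname{Var}(y_i) = S_Y^2 + S_U^2, \qquad
  \operatorname{Var}(x_i) = S_X^2 + S_V^2 .
\]
```

Under these assumptions, the error variance adds to the variance of the observed values, so zero-mean ME inflates the variability of a sample mean without shifting its expectation.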

2.3. Sampling Strategy

Suppose nonresponse (NR) is present. Let and be the groups of respondents and nonrespondents in the population such that and and be the groups of respondents and nonrespondents in the sample such that . We use Hansen and Hurwitz’s [1] technique to handle the problem of NR. In this method, we take a sample of units using SRSWOR and mail a questionnaire. Among units, units respond to the questionnaire, while from the nonresponding units we select a subsample of size and interview them directly, assuming that they respond this time. With this method, Hansen and Hurwitz [1] redefined the sample mean per unit estimator in the presence of NR as ; where , , , and . The variance of is ; where , . is the population variance of for the group of nonrespondent units.

Similarly, ; where and . The variance of is ; is the population variance of for the group of nonrespondent units. We denote this sampling strategy by . , , and ; is the correlation between and for the group of nonrespondents.
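The Hansen and Hurwitz [1] procedure described above can be illustrated with a minimal sketch. It assumes the standard textbook form of the estimator, in which the callback subsample mean stands in for all nonrespondents in the sample, and the standard first-order variance expression. The function and argument names are hypothetical and are not taken from the article.

```python
def hansen_hurwitz_mean(respondent_values, callback_values, n_nonrespondents):
    """Hansen-Hurwitz mean per unit (standard form, assumed here).

    respondent_values : values from the n1 units that answered the first contact
    callback_values   : values from the r nonrespondents interviewed directly
    n_nonrespondents  : n2, the number of nonrespondents in the sample
    """
    n1 = len(respondent_values)
    n2 = n_nonrespondents
    n = n1 + n2
    if n == 0:
        raise ValueError("the sample is empty")
    mean_resp = sum(respondent_values) / n1 if n1 else 0.0
    if n2 > 0:
        if not callback_values:
            raise ValueError("at least one callback value is needed when n2 > 0")
        mean_callback = sum(callback_values) / len(callback_values)
    else:
        mean_callback = 0.0
    # Weighted mean: (n1 * ybar_1 + n2 * ybar_2r) / n
    return (n1 * mean_resp + n2 * mean_callback) / n


def hansen_hurwitz_variance(N, n, s2_y, W2, k, s2_y_nonresp):
    """Standard variance expression: lambda * S_y^2 + theta * S_y(2)^2.

    lambda = 1/n - 1/N, theta = W2 * (k - 1) / n, with W2 = N2 / N the
    population nonresponse rate and k = n2 / r the inverse callback fraction.
    """
    lam = 1.0 / n - 1.0 / N
    theta = W2 * (k - 1.0) / n
    return lam * s2_y + theta * s2_y_nonresp
```

For instance, `hansen_hurwitz_mean([10, 12, 14], [20], 2)` weights the respondent mean by 3/5 and the callback mean by 2/5. The second term of the variance vanishes when every nonrespondent is followed up (k = 1).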

2.4. Sampling Strategy

Suppose both NR and ME are present. That is, the sampling strategies and are combined. We use to indicate the presence of NR and ME. In the presence of NR and ME, the observed units for the study variable and the auxiliary variable in the sample are and , , respectively. and , where and are the true values of and at unit in the sample and and are the corresponding MEs. , , , and . and are the population variances of and for the group of nonrespondents. We denote this sampling strategy by . , , and .

3. Some Existing Estimators

The usual unbiased estimator is the most basic estimator of the population mean. Hansen and Hurwitz [1] modify it for the situation of NR. Grover and Kaur [24] propose an estimator under sampling strategy as , where are suitable constants. They study various properties of the estimator and show that it performs better than the classical ratio, product, and regression estimators, Bahl and Tuteja’s [25] exponential estimator, and some others. To increase the efficiency of the mean estimators, Ekpenyong and Enang [26] suggest an estimator under sampling strategy as , where are suitable constants. This estimator outperforms the Gupta and Shabbir [27] and Singh and Solanki [28] estimators. Singh and Pal [29] propose an exponential ratio type estimator as . Their study shows that performs more efficiently than the usual unbiased estimator, the classical ratio estimator, and the Kadilar and Cingi [30] and Bahl and Tuteja [25] estimators. Gupta and Yadav [31] propose an estimator as , where is a suitable constant. For the optimum value of , performs similarly to the regression estimator.

These estimators are defined under sampling strategy . We can redefine them in the sampling strategy as
where .

The mean squared error (MSE) of the estimators , , , , and up to the first order of approximation under the sampling strategy ; are

4. Proposed Class of Estimators and Their MSE

We propose the following class of estimators to estimate the population mean of the study variable under sampling strategy .
where , , , and are either constants or known parameters of the auxiliary variable.

Under sampling strategy , the proposed estimator is presented as

The MSE of the proposed estimator in various defined sampling strategies is presented in the following theorem.

Theorem 1. The bias and minimum MSE of up to the first order of approximation for the optimum values of and under the sampling strategy ; are
where , ; , , , , ; and .

Proof. The proof is provided in the appendix section.

5. Theoretical Comparison

If and are the estimators of , then is said to be more efficient than whenever .

Here, we find the conditions under which the proposed estimator is more efficient than the existing estimators. Under the sampling strategy ; , the proposed estimator is more efficient than , , , , and whenever the corresponding condition below is satisfied.
(1) , when
(2) , when
(3) , when
(4) , when
(5) , when

6. Simulation Study

To study the performance of the estimators numerically, we have generated an artificial population using R software. The bivariate normal population is derived by input data , the population mean vector , and the variance-covariance matrix as

When required, we impose an ME on the study variable and on the auxiliary variable . and are drawn from a normal distribution with mean zero and standard deviation 5. For the situation of NR, we assume there is 20 percent nonresponse, i.e., , and among the nonrespondent units in the sample, 40 percent are selected in the subsample for callback, i.e., . The MSEs of the estimators are calculated using equations (1)–(5) and (9). MSEs are calculated for three different sample sizes, , , and . We have repeated the process 10000 times to get stable results. Here, we present the proposed estimator for limited values of . We have taken , where can be any real number. For , the member estimator will be . For any real number , the estimator will perform similarly. That is, , , etc., will give the same result. Similarly, , , etc., will give the same result as for any real number . Infinitely many combinations of can be made that will give efficient results.
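The simulation design described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' R code: it uses Python, it evaluates only the mean per unit estimator (with the Hansen and Hurwitz [1] adjustment under NR) rather than the proposed class, it reports empirical MSEs rather than the analytic expressions, and the population size, mean, standard deviation, and sample size are hypothetical values chosen here. Only the ME standard deviation (5), the nonresponse rate (20 percent), and the callback fraction (40 percent) come from the text.

```python
import random


def hh_mean(resp, callback, n2):
    """Hansen-Hurwitz mean: the callback subsample stands in for all n2 nonrespondents."""
    n = len(resp) + n2
    total = sum(resp)
    if n2 > 0:
        total += n2 * sum(callback) / len(callback)
    return total / n


def simulate_mse(pop_size=5000, n=200, reps=2000, nr_rate=0.20,
                 callback_frac=0.40, me_sd=5.0, seed=1):
    """Empirical MSE of the mean estimator under four sampling strategies."""
    rng = random.Random(seed)
    # Hypothetical population (mean 50, sd 10); not the article's parameters.
    y = [rng.gauss(50.0, 10.0) for _ in range(pop_size)]
    true_mean = sum(y) / pop_size
    # Mark a fixed share of population units as nonrespondents.
    nonresp = set(rng.sample(range(pop_size), int(nr_rate * pop_size)))
    sq_err = {"none": 0.0, "ME": 0.0, "NR": 0.0, "NR+ME": 0.0}

    for _ in range(reps):
        idx = rng.sample(range(pop_size), n)  # SRSWOR
        true_vals = {i: y[i] for i in idx}
        # Observed value = true value + zero-mean normal measurement error.
        obs_vals = {i: y[i] + rng.gauss(0.0, me_sd) for i in idx}
        resp_idx = [i for i in idx if i not in nonresp]
        nr_idx = [i for i in idx if i in nonresp]
        # Callback subsample: 40 percent of the sampled nonrespondents (at least one).
        r = max(1, round(callback_frac * len(nr_idx))) if nr_idx else 0
        cb_idx = rng.sample(nr_idx, r) if nr_idx else []

        estimates = {
            "none": sum(true_vals.values()) / n,
            "ME": sum(obs_vals.values()) / n,
            "NR": hh_mean([true_vals[i] for i in resp_idx],
                          [true_vals[i] for i in cb_idx], len(nr_idx)),
            "NR+ME": hh_mean([obs_vals[i] for i in resp_idx],
                             [obs_vals[i] for i in cb_idx], len(nr_idx)),
        }
        for key, est in estimates.items():
            sq_err[key] += (est - true_mean) ** 2

    return {key: total / reps for key, total in sq_err.items()}


if __name__ == "__main__":
    for strategy, mse in simulate_mse().items():
        print(f"{strategy:>6}: {mse:.4f}")
```

Calling `simulate_mse()` returns a dictionary of empirical MSEs keyed by strategy. Under these assumed settings, one would expect the error-free strategy to give the smallest value and the combined NR and ME strategy the largest.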

The percent relative efficiency (PRE) of the estimator is calculated with respect to using the following equation:
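PRE is conventionally defined as the ratio of the reference estimator's MSE to the MSE of the estimator under study, multiplied by 100. A minimal sketch assuming that convention is given below; the function and argument names are hypothetical.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE of an estimator relative to a reference estimator (conventional definition).

    Values above 100 indicate that the estimator has a smaller MSE than the reference.
    """
    if mse_estimator <= 0:
        raise ValueError("mse_estimator must be positive")
    return 100.0 * mse_reference / mse_estimator
```

For example, an estimator whose MSE is one quarter of the reference MSE has a PRE of 400.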

6.1. Interpretations from the Table

From Tables 1–3, we can conclude that
(1) The MSEs of the proposed estimators are considerably smaller than the MSEs of the existing estimators in every situation. This indicates that the proposed class of estimators is more efficient than the existing estimators.
(2) MSEs of all the estimators increase when we approach from sampling strategy .
(3) All the estimators have a minimum MSE under the sampling strategy and a maximum MSE under the sampling strategy . That is, the MSE is smallest when NR and ME are absent and largest when both NR and ME are present.
(4) The MSEs in the sampling strategy are significantly larger than the MSEs in the sampling strategy . This shows that ignoring NR and ME can lead to unreliable results.
(5) MSEs of all the estimators decrease as the sample size increases in all the sampling strategies. For in Table 1, all the estimators have the highest MSE, while in Table 3, it is the lowest, and the MSEs for lie in between them.
(6) It is interesting to see that the MSEs of the proposed estimators in Table 1 are similar to or better than the MSEs of the existing estimators in Table 3. That is, the proposed estimator with sample size performs similarly to or better than the existing estimators with sample size . Hence, the proposed estimator is both more efficient and more cost-effective than the existing estimators, as it gives better results with a small sample than the existing estimators give with a large sample.

7. Conclusion

The present article discusses the problem of estimation to reduce the effect of nonsampling error. An estimator of the population mean is proposed, and its properties are discussed. The proposed estimator is studied for the separate and combined effects of NR and ME. A simulation study shows that the proposed estimator is more efficient than the existing estimators of Grover and Kaur [24], Ekpenyong and Enang [26], Singh and Pal [29], and Gupta and Yadav [31]. The study also shows that the presence of NR and ME has a substantial impact on the results by inflating the MSE. Hence, ignoring the presence of NR and ME can lead to misleading conclusions.

Appendix

To prove Theorem 1, let us define the error terms in various sampling strategies.

A. Under Sampling Strategy

, and the expected values of errors are , , , and .

B. Under Sampling Strategy

Let , , , and . We add and and divide both sides by . We have . So . Similarly, . And the expected values are , , and .

C. Under Sampling Strategy

, and the expected values of errors are and , and .

D. Under Sampling Strategy

Let , , , . We add and and divide both sides by . We have . So . Similarly, . And the expected values are , , and .

In general, let , , , and , and . . Here represents just blank space i.e., , and . , ; and ; , ; , and .

Proof. The proposed estimator is
Expressing equation (A.1) in terms of errors and , we get
or
where .
Expanding using Taylor’s series and neglecting the terms having ’s of degree greater than two, we have
After simplification, we have
where and .
Taking expectation on both sides of equation (A.5), we get the bias of as
Squaring equation (A.5) and taking expectation on both sides, we get the MSE of as
or
where , , , , and .
Now, we minimize the for and . The optimum values of and are and . Let us put and in equation (A.9). We get the minimum MSE as
That is, for sampling strategy , ; the bias and the minimum MSE of are
where , , , , and .

Data Availability

No data were used for the study.

Conflicts of Interest

The authors declare that there are no conflicts of interest in this study.

Acknowledgments

The authors are thankful to Dr. Anil Kumar Tewari, School of Philosophy and Culture, Shri Mata Vaishno Devi University, Katra, India and Dr. Bhavneet Singh, Department of Mathematics, Chandigarh University, India for improving the language of the article.