Abstract
The Poisson regression model is the most frequently used method for modeling count responses. Under simple random sampling, this paper proposes a Poisson regression-based mean estimator and derives its mean square error (MSE). The MSE of the proposed estimator is compared theoretically with the MSEs of traditional ratio estimators, and the proposed estimator is shown to be more efficient than the traditional estimators when the stated efficiency condition holds. Numerical results based on two real datasets corroborate the theoretical findings.
1. Introduction
The emphasis on using supplemental (auxiliary) data to improve the precision of estimates is arguably what distinguishes sample survey theory from other statistical theories. Auxiliary data are used in almost every important phase of a sample survey, including stratification, the construction of selection probabilities, and the formulation of the estimator for the population parameter of interest. Estimating the total or mean of a population is a common goal of most surveys. The use of auxiliary data to improve the estimation of the population mean has been considered by numerous authors, such as Upadhyaya et al. [1], Upadhyaya and Singh [2], Koyuncu [3], Shahzad [4], Abid et al. [5], Shahzad et al. [6–8], Zaman and Bulut [9, 10], Ali et al. [11], and Zaman [12].
When the auxiliary variable is positively correlated with the study variable, ratio-type and regression estimators are good choices for mean estimation. In sampling theory, population information on the auxiliary variable, such as its coefficient of variation or kurtosis, is frequently employed to improve the efficiency of ratio estimators of the population mean. On the other hand, the presence of outliers or extreme values in the data degrades the efficiency of traditional estimation methods. To minimize the impact of outliers on ratio-type mean estimators, Kadilar et al. [13] presented a robust regression strategy based on the Huber-M approach. For mean estimation under simple random sampling (SRS), Oral and Kadilar [14, 15] used two approaches: modified maximum likelihood and its integrated method. By combining the ratio estimators given in Zaman and Bulut [9], Zaman [12] created a new class of robust ratio-type estimators. Using robust regression estimates together with robust covariance matrices under stratified random sampling, Zaman and Bulut [10] proposed novel regression-type estimators. More recently, for sensitive research under SRS, Ali et al. [11] proposed a class of robust regression-type estimators.
Moreover, for count data, applying a linear regression model is problematic even when the mean is large enough for the Poisson distribution to be well approximated by the normal distribution. Negative predicted values are possible because the linear model expresses the predicted value as an unrestricted linear function of the auxiliary (explanatory or independent) variables. Also, in linear regression, the validity of hypothesis tests depends on the assumption of constant variance of the study (response or dependent) variable. For count data, these assumptions do not hold, since counts are nonnegative and their variance typically changes with the mean. As a result, the Poisson regression model is the most commonly employed approach for modeling count data in the applied sciences.
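The point about negative predictions can be illustrated with a minimal sketch. The six data points below are made up for illustration and are not taken from the datasets in this article. The exponentiated values reuse the least squares coefficients only to show that a log link always yields positive means; they are not a fitted Poisson model.

```python
import math

# Hypothetical count data (illustrative only, not from the article).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 2, 5, 9]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares fit of y on x.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Fitted values on the linear scale: nothing prevents them from being negative.
linear_fitted = [intercept + slope * xi for xi in x]

# With a log link, the mean is exp(linear predictor), which is always positive.
log_link_means = [math.exp(intercept + slope * xi) for xi in x]

print(linear_fitted[0])  # negative fitted count at x = 0
```

Here the least squares line predicts a negative count at the smallest value of x, which is not a meaningful prediction for a count response.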
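In Poisson regression, the study variable $y_i$ is the number of events that occur in a given period, with a Poisson distribution given by
$$P(y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots, \tag{1}$$
and its mean and variance are both equal to $\mu_i$.
The natural log-likelihood function is defined as follows:
$$\ell(\mu; y) = \sum_{i=1}^{n} \left[ y_i \ln \mu_i - \mu_i - \ln (y_i!) \right]. \tag{2}$$
Let $X$ be the explanatory variable matrix of order $n \times (p+1)$. Then, the relationship between $\mu_i$ and the $i$th row of $X$, denoted $x_i$ and associated with $y_i$, is
$$\mu_i = \exp\left(x_i^{\top}\beta\right), \tag{3}$$
where $\beta$ is the vector of regression coefficients. Such a model is well known as the Poisson regression model. Let $\hat{\beta}$ denote the maximum likelihood estimator of $\beta$. It is obtained by substituting (3) into the log-likelihood, differentiating with respect to $\beta$, and setting the result equal to zero:
$$\frac{\partial \ell}{\partial \beta} = \sum_{i=1}^{n} \left[ y_i - \exp\left(x_i^{\top}\beta\right) \right] x_i = 0. \tag{4}$$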
Iterative approaches such as the Fisher scoring and Newton–Raphson algorithms are employed to solve these equations (see Cameron and Trivedi [16], Montgomery et al. [17], and Koç [18]).
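As a rough illustration of such an iterative solution, the following sketch applies Newton-Raphson to a Poisson regression with an intercept and a single explanatory variable. For the log link, Newton-Raphson and Fisher scoring coincide. The function name `fit_poisson_regression`, the starting values, the tolerance, and the example data are all assumptions made for this sketch and do not come from the article.

```python
import math


def fit_poisson_regression(x, y, tol=1e-10, max_iter=100):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i; returns (b0, b1).

    Assumes sum(y) > 0 and that x is not constant.
    """
    # Start from the intercept-only fit: mu_i = mean(y), slope = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood w.r.t. (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X, written out for the 2 x 2 case.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 linear system H * d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical counts that grow with x (illustrative only).
x_demo = [0, 1, 2, 3, 4, 5]
y_demo = [1, 2, 2, 5, 8, 14]
b0_hat, b1_hat = fit_poisson_regression(x_demo, y_demo)
print(b0_hat, b1_hat)
```

At convergence the score equations hold, so the fitted means sum to the observed total. In practice a statistical package would normally be used instead of a hand-written solver; this sketch only shows the mechanics of the iteration.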
This article focuses on using Poisson regression to estimate the population mean of the study variable under SRS. The rest of the article is organized as follows. First, we introduce the notation and review some existing ratio-type mean estimators together with their MSEs. Second, we define a novel mean estimator based on Poisson regression and give its MSE. The efficiency condition of the proposed estimator is then examined theoretically. Next, using numerical illustrations based on two real datasets, we examine the relative efficiency of the proposed estimator with respect to the reviewed estimators. Finally, some concluding remarks are given.
2. Some Existing Ratio-Type Estimators of Mean with Their MSE
This section outlines some existing population mean estimators that are used under SRS and rely on known conventional parameters of the auxiliary variable to improve efficiency. Before describing these estimators, we introduce the following notation.
$N$: population size; $n$: sample size.
$f = n/N$: sample ratio.
$\bar{Y}, \bar{X}$: population means of the study and auxiliary variables $(y, x)$, respectively.
$\bar{y}, \bar{x}$: sample means of $y$ and $x$, respectively.
$C_y, C_x$: population coefficients of variation of $y$ and $x$, respectively.
$\beta_2(x)$: population coefficient of kurtosis of $x$.
$S_y^2, S_x^2$: population variances of $y$ and $x$, respectively.
$s_y^2, s_x^2$: sample variances of $y$ and $x$, respectively.
$S_{xy}, s_{xy}$: population and sample covariances between $y$ and $x$, respectively.
$\rho$: correlation coefficient between $y$ and $x$.
$b = s_{xy}/s_x^2$: slope coefficient obtained by the least squares method.
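The descriptive quantities listed above can be computed along the following lines. This is a sketch under stated assumptions: variances and the covariance use the $N-1$ divisor, and the kurtosis is taken as the fourth central moment divided by the squared second central moment (both with divisor $N$). The article does not spell out these conventions, and the function name `population_summaries` is hypothetical.

```python
import math


def population_summaries(x, y):
    """Return descriptive quantities for paired values of x (auxiliary) and y (study)."""
    N = len(x)
    x_bar = sum(x) / N
    y_bar = sum(y) / N
    # Variances and covariance with divisor N - 1 (assumed convention).
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (N - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (N - 1)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (N - 1)
    # Kurtosis of x as m4 / m2^2 with central moments using divisor N (assumed).
    m2 = sum((xi - x_bar) ** 2 for xi in x) / N
    m4 = sum((xi - x_bar) ** 4 for xi in x) / N
    return {
        "mean_x": x_bar,
        "mean_y": y_bar,
        "cv_x": math.sqrt(var_x) / x_bar,
        "cv_y": math.sqrt(var_y) / y_bar,
        "kurtosis_x": m4 / m2 ** 2,
        "rho": cov_xy / math.sqrt(var_x * var_y),
        "slope": cov_xy / var_x,  # least squares slope of y on x
    }


# Hypothetical toy values (illustrative only).
summary = population_summaries([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(summary)
```

The same function applies to a sample if the sample values are passed in, in which case the outputs correspond to the sample counterparts of these quantities.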
As one of the important estimators for the population mean in SRS, with assumed known, Kadilar and Cingi [19] proposed the following ratio estimators inspired by Sisodia and Dwivedi [20] and Upadhyaya et al. [1]:
Kadilar and Cingi [19] also provided the following formula for the MSE associated with their estimators:where the population ratios can be gained as follows:
Recently, Koç [18] improved on the aforementioned estimators by introducing new ratio estimators in which the slope coefficient is obtained by Poisson regression instead of the least squares method.
Koç [18] also showed that the MSE of these estimators has the same form as the MSE in (6), with the least squares slope replaced by the slope coefficient obtained from the Poisson regression model.
In addition, Koç [18] demonstrated that his estimators are more efficient than the Kadilar and Cingi [19] estimators if any of the following conditions are satisfied:
2.1. Proposed Estimator Based on Poisson Regression and Its MSE
The newly constructed estimator of the population mean follows the frameworks of Zaman and Bulut [10] and Ali et al. [11]. Here, however, we implement their frameworks using Poisson regression as
Furthermore, using established results and some basic algebra, and omitting the tedious intermediate steps, we obtain the MSE of the proposed mean estimator as
2.2. Efficiency Comparisons
The efficiency condition for the proposed mean estimator may be found by comparing the MSE of the proposed estimator in (12) to the MSE of Koç's [18] estimators in (9):
Now
The proposed Poisson mean estimator is more efficient than Koç's [18] estimators if the above condition is satisfied.
2.3. Numerical Illustrations
Here, we use two real datasets to evaluate the performance of proposed and existing estimators.
Population I (Pop-I). We consider the dataset collected between 2006 and 2010 from the Afyon Respiratory Disease Hospital and the Afyon Environmental Department Air Pollution Unit, which was used in [18]. The number of patients admitted to the hospital each week was taken as the dependent variable $y$, and PM10 was taken as the explanatory variable $x$.
Population II (Pop-II). We consider the dataset obtained from TUIK for 81 provinces in 2019, which was used in [18]. The number of people who died in traffic accidents was taken as the dependent variable $y$, and the number of motor vehicles was taken as the explanatory variable $x$.
For these datasets, using SRS, we consider and . The characteristics of the two populations, as well as the values of population ratios, are given in Tables 1 and 2.
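Based on the MSE values of the proposed estimator and the reviewed estimators, we calculate the relative efficiency (RE) of the proposed estimator with respect to each reviewed estimator as follows:
$$\mathrm{RE} = \frac{\mathrm{MSE}(\text{reviewed estimator})}{\mathrm{MSE}(\text{proposed estimator})} \times 100.$$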
Tables 3 and 4 give the outcomes for the relative efficiencies.
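The relative efficiency calculation can be sketched as follows, assuming RE is the ratio of the reviewed estimator's MSE to the proposed estimator's MSE, multiplied by 100, so that values above 100 favor the proposed estimator. The function name and the MSE values are hypothetical and are not the values reported in the tables.

```python
def relative_efficiency(mse_reviewed, mse_proposed):
    """RE (in percent) of the proposed estimator relative to a reviewed estimator."""
    return 100.0 * mse_reviewed / mse_proposed


# Hypothetical MSE values (illustrative only, not from the article's tables).
mse_proposed = 4.0
mse_reviewed = {"reviewed_A": 10.0, "reviewed_B": 6.0}

re_values = {
    name: relative_efficiency(mse, mse_proposed) for name, mse in mse_reviewed.items()
}
print(re_values)  # {'reviewed_A': 250.0, 'reviewed_B': 150.0}
```

Because every RE shares the same denominator, a larger RE for one reviewed estimator than for another means that the first has the larger MSE.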
After assessing the relative efficiency values in Tables 3 and 4, it is observed that all values exceed 100. As a result, the proposed Poisson regression estimator is more efficient than the reviewed estimators. Moreover, all relative efficiency values of the proposed estimator with respect to the Kadilar and Cingi [19] estimators are greater than the corresponding values with respect to Koç's [18] estimators. Thus, for each corresponding pair, the MSE of the proposed estimator is smaller than that of Koç's estimator, which in turn is smaller than that of the Kadilar and Cingi estimator.
The proposed estimator and Koç's [18] estimators outperform the Kadilar and Cingi [19] estimators, which is an expected result given that count data are used in this analysis, and the proposed estimator has the best performance. The relative efficiency values are also presented graphically in Figures 1–3.



Furthermore, we investigated the efficiency condition of the proposed estimator as follows.
For Pop-I:
Therefore, the condition is fulfilled.
For Pop-II:
Therefore, the condition is fulfilled.
3. Concluding Remarks
In this article, a mean estimator based on the Poisson regression model has been proposed under simple random sampling. The proposed estimator is found to be more efficient than the ratio estimators proposed in Koç [18] and Kadilar and Cingi [19]. In the numerical investigation, the performances of these estimators are compared using two real populations, and the proposed estimator is more efficient than the reviewed estimators, as all relative efficiency values exceed 100 (see Figures 1–3). As a result, we recommend using the proposed mean estimator over the other estimators considered in this study for the analysis of such count data. Furthermore, the estimator developed here can be used to construct new estimators for other count models. The proposed estimator can also be extended in future studies using the concepts of Zaman and Bulut [10] and Ali et al. [11].
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that there are no conflicts of interest.