Abstract
In introductory statistics texts, the power of the test of a one-sample mean when the variance is known is widely discussed. However, when the variance is unknown, the power of the Student's $t$-test is seldom mentioned. In this note, a general methodology for obtaining inference concerning a scalar parameter of interest of any exponential family model is proposed. The method is then applied to the one-sample mean problem with unknown variance to obtain a $100(1-\gamma)\%$ confidence interval for the power of the Student's $t$-test that detects the difference $\mu - \mu_0$. The calculations require only the density and the cumulative distribution functions of the standard normal distribution. In addition, the methodology presented can also be applied to determine the required sample size when the effect size and the power of a size $\alpha$ test of the mean are given.
1. Introduction
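Let $x_1, \ldots, x_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. As presented in any introductory statistics text, such as Mendenhall et al. [1, page 425], a $100(1-\gamma)\%$ confidence interval for $\sigma^2$ is
$$\left( \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\gamma/2}},\ \frac{(n-1)s^2}{\chi^2_{n-1,\,\gamma/2}} \right), \tag{1.1}$$
where $\bar{x} = \sum x_i / n$, $s^2 = \sum (x_i - \bar{x})^2 / (n-1)$, and $\chi^2_{n-1,\,p}$ is the $100p$th percentile of the $\chi^2$ distribution with $n-1$ degrees of freedom. Moreover, for testing
$$H_0: \mu = \mu_0 \quad \text{versus} \quad H_a: \mu > \mu_0, \tag{1.2}$$
the null hypothesis will be rejected at significance level $\alpha$ if
$$\frac{\sqrt{n}\,(\bar{x} - \mu_0)}{s} > t_{n-1,\,1-\alpha}, \tag{1.3}$$
where $t_{n-1,\,1-\alpha}$ is the $100(1-\alpha)$th percentile of the Student's $t$ distribution with $n-1$ degrees of freedom. Although the power of this test is rarely discussed in introductory statistics texts, Lehmann [2] proved that the probability of committing a Type II error of a size $\alpha$ test with the hypotheses stated in (1.2) is
$$\beta = G\left(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,d\right), \tag{1.4}$$
where $d = (\mu - \mu_0)/\sigma$ is the effect size and $G(\cdot;\ n-1,\ \sqrt{n}\,d)$ is the cumulative distribution function of the noncentral $t$ distribution with $n-1$ degrees of freedom and noncentrality $\sqrt{n}\,d$. Note that the calculation of $\beta$ involves the unknown $\sigma$. A naive point estimate of $d$ is $\hat{d} = (\mu - \mu_0)/s$, where $s = \sqrt{s^2}$. Thus, the corresponding point estimate of the power of the size $\alpha$ test that detects the difference $\mu - \mu_0$ is $1 - G(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,\hat{d})$.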
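The naive point estimate of the power can be evaluated numerically. The following is a minimal sketch and not the approximation developed in this note: it assumes the one-sided alternative $\mu > \mu_0$, evaluates the noncentral $t$ cumulative distribution function by direct numerical integration (Simpson's rule), and uses hypothetical helper names (`nct_cdf`, `t_quantile`, `power_estimate`) and illustrative inputs that do not come from the paper.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi_pdf(u, nu):
    """Density of U = sqrt(V), where V is chi-square with nu degrees of freedom."""
    if u <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_c = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * u * u)

def simpson(f, a, b, m=1000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def nct_cdf(t, nu, delta):
    """P(T <= t) for T = (Z + delta) / sqrt(V / nu), by numerical integration."""
    f = lambda u: norm_cdf(t * u / math.sqrt(nu) - delta) * chi_pdf(u, nu)
    return simpson(f, 0.0, math.sqrt(nu) + 10.0)

def t_quantile(p, nu):
    """Percentile of the central t distribution, found by bisection.
    Intended for moderate p (for example 0.90 to 0.99), not extreme tails."""
    lo, hi = -1000.0, 1000.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, nu, 0.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_estimate(diff, s, n, alpha):
    """Naive power estimate with d_hat = diff / s (one-sided test assumed)."""
    d_hat = diff / s
    t_crit = t_quantile(1.0 - alpha, n - 1)
    return 1.0 - nct_cdf(t_crit, n - 1, math.sqrt(n) * d_hat)

# Illustrative inputs only: mu - mu0 = 1, s = 2, n = 10, alpha = 0.05.
print(power_estimate(diff=1.0, s=2.0, n=10, alpha=0.05))
```

Setting `diff=0.0` returns the size of the test, which gives a quick sanity check on the integration.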
In Section 2, a general methodology is proposed for obtaining inference concerning a scalar parameter of interest of an exponential family model. Applying the general methodology to the one-sample mean problem with unknown variance, a $100(1-\gamma)\%$ confidence interval for the power of the test is derived. This interval estimate depends only on the evaluation of the density and the cumulative distribution functions of the standard normal distribution. The methodology can also be used to determine the required sample size when the effect size and the power of a size $\alpha$ test are fixed. Numerical examples are presented in Section 3 to illustrate the accuracy of the proposed method. Finally, some concluding remarks are given in Section 4.
2. Confidence Interval for the Power of the Test and Sample Size Calculation
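From (1.1), for a given value $\mu > \mu_0$, a $100(1-\gamma)\%$ confidence interval for the effect size $d = (\mu - \mu_0)/\sigma$ is
$$(d_L,\ d_U) = \left( (\mu - \mu_0)\sqrt{\frac{\chi^2_{n-1,\,\gamma/2}}{(n-1)s^2}},\ (\mu - \mu_0)\sqrt{\frac{\chi^2_{n-1,\,1-\gamma/2}}{(n-1)s^2}} \right). \tag{2.1}$$
Let $G(\cdot;\ n-1,\ \delta)$ denote the cumulative distribution function of the noncentral $t$ distribution with $n-1$ degrees of freedom and noncentrality $\delta$. Since $G$ is decreasing in $\delta$, it follows from (1.4) that the corresponding $100(1-\gamma)\%$ confidence interval for $\beta$ is
$$\left( G\left(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,d_U\right),\ G\left(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,d_L\right) \right). \tag{2.2}$$
Finally, a $100(1-\gamma)\%$ confidence interval for the power of a size $\alpha$ test that detects the difference $\mu - \mu_0$ is
$$\left( 1 - G\left(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,d_L\right),\ 1 - G\left(t_{n-1,\,1-\alpha};\ n-1,\ \sqrt{n}\,d_U\right) \right). \tag{2.3}$$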
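The chain from the variance interval to the power interval can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the proposed method: the alternative is taken to be one-sided ($\mu > \mu_0$), the noncentral $t$ and $\chi^2$ distribution functions are obtained by plain numerical integration and bisection, and all function names and inputs are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi_pdf(u, nu):
    """Density of U = sqrt(V), where V is chi-square with nu degrees of freedom."""
    if u <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_c = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * u * u)

def simpson(f, a, b, m=1000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def nct_cdf(t, nu, delta):
    """Noncentral t cdf, P((Z + delta) / sqrt(V / nu) <= t)."""
    f = lambda u: norm_cdf(t * u / math.sqrt(nu) - delta) * chi_pdf(u, nu)
    return simpson(f, 0.0, math.sqrt(nu) + 10.0)

def bisect(cdf, p, lo, hi, iters=60):
    """Solve cdf(x) = p for an increasing cdf on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_quantile(p, nu):
    """Percentile of the central t distribution (moderate p only)."""
    return bisect(lambda t: nct_cdf(t, nu, 0.0), p, -1000.0, 1000.0)

def chi2_quantile(p, nu):
    """Percentile of the chi-square distribution with nu degrees of freedom."""
    cdf = lambda x: simpson(lambda u: chi_pdf(u, nu), 0.0, math.sqrt(x))
    return bisect(cdf, p, 0.0, (math.sqrt(nu) + 10.0) ** 2)

def power_interval(diff, s, n, alpha, gamma):
    """Return (lower, point estimate, upper) for the power; assumes diff > 0."""
    nu = n - 1
    t_crit = t_quantile(1.0 - alpha, nu)
    power = lambda d: 1.0 - nct_cdf(t_crit, nu, math.sqrt(n) * d)
    d_lo = diff * math.sqrt(chi2_quantile(gamma / 2.0, nu) / (nu * s * s))
    d_hi = diff * math.sqrt(chi2_quantile(1.0 - gamma / 2.0, nu) / (nu * s * s))
    return power(d_lo), power(diff / s), power(d_hi)

# Illustrative inputs only: mu - mu0 = 1, s = 2, n = 10, alpha = gamma = 0.05.
print(power_interval(diff=1.0, s=2.0, n=10, alpha=0.05, gamma=0.05))
```

Because the power is increasing in the effect size, the lower and upper limits for the effect size map directly to the lower and upper limits for the power.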
Evaluating (2.3) requires the cumulative distribution function $G$ of the noncentral $t$ distribution, which is generally not discussed in introductory statistics texts. In the statistics literature, various approximations of $G$ have been proposed. For the rest of this section, a simple and accurate approximation of $G$ will be derived.
Let $X_1, \ldots, X_n$ be independent and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$. It is well known that $\bar{X}$ and $(n-1)S^2/\sigma^2$ are independently distributed as normal with mean $\mu$ and variance $\sigma^2/n$ and as $\chi^2$ with $n-1$ degrees of freedom, respectively. Let where denotes the percentile of the standard normal distribution, then follows a noncentral $t$ distribution with degrees of freedom and noncentrality .
Now, consider a sample $(x_1, \ldots, x_n)$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Let the parameter of interest be where and , then the log-likelihood function can be written as where . Denote that
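Write $\theta = (\psi, \lambda)$, where $\psi$ is the scalar parameter of interest and $\lambda$ is the nuisance parameter, and let $\ell(\theta)$ be the log-likelihood function. The overall maximum likelihood estimate (MLE) of $\theta$, $\hat{\theta}$, is obtained by solving $\partial \ell(\theta)/\partial \theta = 0$, and the determinant of the observed information matrix evaluated at the overall MLE is denoted by $|j_{\theta\theta}(\hat{\theta})|$. The constrained MLE of $\theta$ at a fixed $\psi$ is $\hat{\theta}_\psi = (\psi, \hat{\lambda}_\psi)$, where $\hat{\lambda}_\psi$ is obtained by solving $\partial \ell(\theta)/\partial \lambda = 0$. Moreover, the determinant of the observed nuisance information matrix evaluated at the constrained MLE is denoted by $|j_{\lambda\lambda}(\hat{\theta}_\psi)|$. Hence, the signed log-likelihood ratio statistic is
$$r = \operatorname{sgn}(\hat{\psi} - \psi)\left\{ 2\left[ \ell(\hat{\theta}) - \ell(\hat{\theta}_\psi) \right] \right\}^{1/2}. \tag{2.11}$$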
It is well known that $r$ is asymptotically distributed as the standard normal distribution with rate of convergence $O(n^{-1/2})$. Hence, the required tail probability can be approximated by $\Phi(r)$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. It is important to note that $r$ is reparameterization invariant.
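In the statistics literature, various likelihood-based small-sample asymptotic methods have been proposed. In particular, if the model is a canonical exponential family model and the canonical parameter is $\theta = (\psi, \lambda)$, Lugannani and Rice [3] derive the approximation
$$\Phi(r) + \phi(r)\left( \frac{1}{r} - \frac{1}{q} \right), \tag{2.12}$$
where $\phi(\cdot)$ is the density function of the standard normal distribution, $r$ is defined in (2.11), and $q$ takes the form
$$q = (\hat{\psi} - \psi)\left\{ \frac{|j_{\theta\theta}(\hat{\theta})|}{|j_{\lambda\lambda}(\hat{\theta}_\psi)|} \right\}^{1/2}. \tag{2.13}$$
Here $|j_{\theta\theta}(\hat{\theta})|$ is the determinant of the observed information matrix at the overall MLE $\hat{\theta}$ and $|j_{\lambda\lambda}(\hat{\theta}_\psi)|$ is the determinant of the observed nuisance information matrix at the constrained MLE $\hat{\theta}_\psi$. This approximation has a rate of convergence $O(n^{-3/2})$. It is important to note that $r$ is reparameterization invariant whereas $q$ is not.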
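To make the Lugannani and Rice formula concrete, the sketch below applies it to a deliberately simple scalar case that is not taken from this note: $n$ independent exponential observations with rate $\theta$ and observed sum `s_obs`, for which the log-likelihood is $n \log\theta - \theta s$ and there is no nuisance parameter. The exact tail probability is available in closed form for integer $n$, so the first-order value $\Phi(r)$ and the third-order value can both be compared with it. All function names are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lugannani_rice(r, q):
    """Phi(r) + phi(r) * (1/r - 1/q); undefined when r = q = 0."""
    return norm_cdf(r) + norm_pdf(r) * (1.0 / r - 1.0 / q)

def exp_rate_tail(theta, n, s_obs):
    """Approximate P(theta_hat <= observed theta_hat; theta) for exponential data.

    Returns (first-order value Phi(r), Lugannani-Rice value).
    Do not call with theta equal to the MLE n / s_obs, where r = q = 0.
    """
    theta_hat = n / s_obs                                   # overall MLE
    loglik = lambda th: n * math.log(th) - th * s_obs
    r = math.copysign(
        math.sqrt(2.0 * (loglik(theta_hat) - loglik(theta))),
        theta_hat - theta,
    )
    # Standardized MLE departure: observed information is n / theta_hat**2.
    q = (theta_hat - theta) * math.sqrt(n) / theta_hat
    return norm_cdf(r), lugannani_rice(r, q)

def exact_tail(theta, n, s_obs):
    """Exact P(S >= s_obs) for S ~ Gamma(n, rate theta), integer n."""
    x = theta * s_obs
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

# Illustrative check with a single observation (n = 1).
first_order, third_order = exp_rate_tail(theta=2.0, n=1, s_obs=1.0)
print(first_order, third_order, exact_tail(theta=2.0, n=1, s_obs=1.0))
```

Even with a single observation the third-order value is much closer to the exact probability than $\Phi(r)$, which mirrors the comparison between the two methods made later in the numerical example.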
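For a general exponential family model with canonical parameter $\varphi(\theta)$ and a scalar parameter of interest $\psi(\theta)$, to obtain inference concerning $\psi$ based on the Lugannani and Rice (1980) [3] method, $r$ remains unchanged as in (2.11) because it is reparameterization invariant, but $q$ has to be re-expressed in the canonical parameter scale, the $\varphi(\theta)$ scale. To achieve this, let $\varphi_\theta(\theta)$ and $\psi_\theta(\theta)$ be the derivatives of $\varphi(\theta)$ and $\psi(\theta)$ with respect to $\theta$, respectively. Denote by $\psi_\varphi(\theta)$ the row of $\varphi_\theta^{-1}(\theta)$ that corresponds to $\psi$, and let $\|\psi_\varphi(\theta)\|^2$ be the squared length of the vector $\psi_\varphi(\theta)$. Let $\chi(\theta)$ be a rotated coordinate of $\varphi(\theta)$ that agrees with $\psi(\theta)$ at $\hat{\theta}_\psi$. Then $\chi(\theta)$ can be viewed operationally as the scalar parameter of interest in the $\varphi(\theta)$ scale.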
Since , by the chain rule in differentiation, we have Hence, an estimated variance for in the $\varphi(\theta)$ scale is Thus, $q$, as defined in (2.13) and expressed in the $\varphi(\theta)$ scale, is Therefore, the approximation can be obtained from (2.12) with $r$ and $q$ being defined in (2.11) and (2.17), respectively.
Note that the model being considered is an exponential family model with canonical parameter From (2.17), we have where Moreover, by obtaining the inverse of , we have Hence, from (2.14), we can obtain Thus, from (2.16), we have Finally, the cumulative distribution function of the noncentral $t$ distribution can be approximated from (2.12) with rate of convergence $O(n^{-3/2})$.
By reindexing all the necessary equations, we have where $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and cumulative distribution functions of the standard normal distribution, and Finally, with a predetermined effect size and power of a size $\alpha$ test, the sample size $n$ can be obtained by iteration.
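The iteration for the sample size can be sketched as a simple search over $n$. The sketch below is an illustration only: it assumes a one-sided alternative, and it evaluates the noncentral $t$ cumulative distribution function by numerical integration instead of the approximation derived in this section, so that it stays self-contained. The function names and the search limit `n_max` are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def chi_pdf(u, nu):
    """Density of U = sqrt(V), where V is chi-square with nu degrees of freedom."""
    if u <= 0.0:
        return math.sqrt(2.0 / math.pi) if nu == 1 else 0.0
    log_c = (1.0 - nu / 2.0) * math.log(2.0) - math.lgamma(nu / 2.0)
    return math.exp(log_c + (nu - 1.0) * math.log(u) - 0.5 * u * u)

def nct_cdf(t, nu, delta, m=1000):
    """Noncentral t cdf by composite Simpson's rule over the chi density."""
    a, b = 0.0, math.sqrt(nu) + 10.0
    h = (b - a) / m
    f = lambda u: norm_cdf(t * u / math.sqrt(nu) - delta) * chi_pdf(u, nu)
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def t_quantile(p, nu):
    """Percentile of the central t distribution by bisection (moderate p only)."""
    lo, hi = -1000.0, 1000.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, nu, 0.0) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def one_sided_power(n, d, alpha):
    """Power of the size-alpha one-sided t-test at effect size d."""
    t_crit = t_quantile(1.0 - alpha, n - 1)
    return 1.0 - nct_cdf(t_crit, n - 1, math.sqrt(n) * d)

def required_sample_size(d, alpha, target_power, n_max=200):
    """Smallest n >= 2 whose power reaches target_power, with that power."""
    for n in range(2, n_max + 1):
        power = one_sided_power(n, d, alpha)
        if power >= target_power:
            return n, power
    raise ValueError("target power not reached for n <= n_max")

# Illustrative call: effect size 0.8, alpha = 0.05, power of at least 0.9.
print(required_sample_size(d=0.8, alpha=0.05, target_power=0.9))
```

A linear search is used because the power is increasing in $n$ for a fixed positive effect size; a bisection over $n$ would reduce the number of evaluations when the required sample size is large.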
Note that DiCiccio and Martin [4] derived an asymptotic approximation of marginal tail probabilities for a real-valued function of a random vector, where the function has a continuous gradient that does not vanish at the mode of the joint density of the random vector. Applied to the noncentral $t$ distribution problem, the results are identical. Nevertheless, the approach of DiCiccio and Martin [4] is quite different from the proposed method. More specifically, DiCiccio and Martin [4] worked directly from the log density and treated the parameters as fixed, whereas the proposed method works from the log-likelihood function, where the data are observed.
3. Numerical Example
Figure 1 plots the power function of a one-sample $t$-test against the effect size for $n = 2, 3$ and $\alpha = 0.05, 0.01$. The exact method is obtained from the built-in cumulative distribution function of the noncentral $t$ distribution in the statistical software used. From the plot, it is clear that the signed log-likelihood ratio method does not provide satisfactory results. The proposed method and the built-in function are very close even when the sample size is 2. It is interesting to note that the built-in function has a discontinuity point in one of the cases considered.
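Figure 1: Power function of the one-sample $t$-test plotted against the effect size. Panels: (a) $n = 2$, $\alpha = 0.05$; (b) $n = 3$, $\alpha = 0.05$; (c) $n = 2$, $\alpha = 0.01$; (d) $n = 3$, $\alpha = 0.01$.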
Now, consider the data set recorded in Mendenhall et al. [1, page 103]
For testing the hypothesis the power function of a size test and the corresponding 95% confidence bands are plotted in Figure 2. From Figure 2, the approximated power at is 0.5764. Furthermore, the 95% confidence interval for the power of the above test when is . At first, the confidence interval seems too wide. However, by examining (2.3), the result is not too surprising because (2.3) depends on (1.1). Since the $\chi^2$ distribution is skewed, defining the confidence interval for $\sigma^2$ to have equal tail coverage makes (1.1) a wide interval, and hence (2.3) is a wide interval as well.
Finally, to illustrate the determination of the sample size, let the effect size be 0.8, and at , let the power be at least 0.9, then the proposed method gives with power .
4. Summary and Conclusion
A confidence interval for the power of the size $\alpha$ Student's $t$-test detecting the difference $\mu - \mu_0$ is presented. The major advantages of the presented confidence interval are that it depends only on evaluations of the density and cumulative distribution functions of the standard normal distribution and that, in the numerical example of Section 3, the underlying approximation agrees closely with the built-in noncentral $t$ computation even when the sample size is 2. The source code is available from the author upon request.
As a final note, the proposed method can be applied to any distribution that belongs to the exponential family model with known canonical parameters. Although the method depends on the correct specification of the underlying distribution, Fraser et al. [5] examined a special case when the error distribution of the regression model is misspecified and the likelihood-based method still gives results that are more accurate than the existing Central Limit Theorem-based approximations.