Abstract

A large part of the published literature in nuclear image analysis does not evaluate its findings on an independent data set. Consequently, when many features are evaluated on a limited data set, over-optimistic results are easily obtained. To find features that separate the outcome classes of interest, the nuclear features must be evaluated statistically. Furthermore, to classify an unknown sample using image analysis, a classification rule must be designed and evaluated. Unfortunately, the statistical evaluation methods used in the nuclear image analysis literature are often inappropriate. The present article discusses some of the difficulties in the statistical evaluation of nuclear image analysis and presents a study of cervical cancer to illustrate the problems. In conclusion, some of the most severe errors in nuclear image analysis occur when a large feature set is evaluated on few patients without confirming the results on an independent data set. To select features, Bonferroni correction for multiple testing is recommended, together with a standard feature set selection method. Furthermore, we consider confirmation of the results on an independent data set to be the minimum requirement for statistical evaluation in nuclear image analysis. We suggest that a consensus on how to evaluate diagnostic and prognostic features is necessary in order to develop reliable tools for clinical use based on nuclear image analysis.
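
To make the multiple-testing problem concrete, the following minimal Python sketch shows how quickly false positives accumulate as the number of tested features grows, and how a Bonferroni threshold is applied. It is an illustration only, not the procedure used in the study: the function names, feature names, and p-values are hypothetical, and the family-wise error rate formula assumes independent tests in which every null hypothesis is true.

```python
from __future__ import annotations


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests tests.

    Assumes the tests are independent and that every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_select(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the features whose p-value is below the Bonferroni threshold alpha / m."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)  # m = number of features tested
    return [name for name, p in p_values.items() if p < threshold]


if __name__ == "__main__":
    # With 100 noise features tested at alpha = 0.05, at least one
    # "significant" feature is almost guaranteed (roughly 0.994).
    print(familywise_error_rate(0.05, 100))

    # Hypothetical p-values for illustration only; not taken from the study.
    p_values = {"area": 0.0004, "mean_od": 0.03, "entropy": 0.012, "perimeter": 0.20}
    # Four tests, so the threshold is 0.05 / 4 = 0.0125.
    print(bonferroni_select(p_values))
```

Features that survive such a correction would still need to be confirmed on an independent data set, since the correction controls false positives in feature selection but does not validate a classification rule.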