Abstract

The finite mixture model (FMM) is increasingly used for unsupervised image segmentation. In this paper, a new finite mixture model based on a combination of generalized Gamma and Gaussian distributions with a trimmed likelihood estimator (GGMM-TLE) is proposed. GGMM-TLE combines the effectiveness of the Gaussian distribution with the asymmetric capability of the generalized Gamma distribution, providing superior flexibility for describing observed data of different shapes. Another advantage is that we account for the spatial information among neighbouring pixels by introducing a Markov random field (MRF); thus, the proposed mixture model remains robust with respect to different types and levels of noise. Moreover, this paper presents a new component-based confidence level ordering trimmed likelihood estimator, with a simple form, allowing GGMM-TLE to estimate the parameters after discarding the outliers; the proposed algorithm thus effectively eliminates the disturbance of outliers. Furthermore, the paper proves the identifiability of the proposed mixture model to guarantee that the parameter estimation procedure is well defined. Finally, an expectation maximization (EM) algorithm is used to estimate the parameters of GGMM-TLE by maximizing the log-likelihood function. Experiments on multiple public datasets demonstrate that GGMM-TLE achieves superior performance compared with several existing methods on image segmentation tasks.

1. Introduction

Segmenting an object from its background plays an important role in machine learning and computer vision [1]. In recent years, several unsupervised image segmentation algorithms have been presented [2, 3]. Statistical approaches, particularly the finite mixture model (FMM), are among the most widely known [4]; an FMM can model arbitrary univariate or multivariate observed data. In particular, modelling the probability density function of pixel attributes with the Gaussian mixture model (GMM) has proved successful in segmentation tasks [5], mainly because the parameters of GMM can be easily estimated by maximizing the likelihood (ML) of the observed data using the expectation maximization (EM) algorithm. However, limitations remain that prevent GMM from achieving improved performance. The first challenge is sensitivity to noise, caused by ignoring the spatial relationships among pixels during parameter learning. The second challenge derives from the difficulty of fitting asymmetric observed data. In addition, GMM is sensitive to outliers: a small number of aberrant data points can disproportionately affect the parameter estimates.

Recent studies have attempted to overcome the above disadvantages. The existing schemes can be categorised as follows.

(1) Schemes Based on Markov Random Field (MRF). A wide variety of approaches, especially Markov random fields, have been introduced for resisting noise. These schemes utilize an MRF smoothness prior to model the joint prior distribution of pixel labels; hence, the spatial information of the pixels is considered via the contextual constraints of the neighbouring pixels [6, 7]. The MRF-based mixture model therefore has a stronger ability to resist noise. However, parameter estimation in MRF models is difficult and incurs high computational complexity. Recently, a mean template was employed along with a spatially varying mixture model to alleviate the influence of noise in image segmentation [8]. This is a natural approach to suppressing noise because it automatically filters the image with a mean filter.

(2) Schemes Based on Asymmetric Probability Distribution. In general, GMM does not fit well when the shape of the observed data is asymmetric [9]. Indeed, in many real applications, the intensity distribution of the observed data is not symmetric. FMMs with asymmetric distributions, such as the Gamma distribution [3], Weibull distribution [10], and Rayleigh distribution [11], can overcome this limitation. Another typical approach obtains an asymmetric distribution via a linear weighted combination of two or more symmetric probability distributions. One example is the nonsymmetric Student's t mixture model (NSMM) [12], where each component density is modelled with multiple Student's t-distributions. Another example is the Bayesian bounded asymmetric mixture model (BAMM) [13], developed by a subset of the authors of [12] and other coauthors for unsupervised image segmentation; each component of their approach can model observed data of different shapes with different bounded support regions. A close relative of this framework is the bounded asymmetrical Student's t mixture model [14]. Notably, mixtures of two or more different distributions have attracted considerable attention and developed rapidly in recent years. Typical algorithms include Zhou et al.'s statistical model [15], a mixture of Student's t and lognormal distributions, and the method of De Angelis et al. [16], who offered a robust time interval measurement approach based on a Gaussian-uniform mixture model. Browne et al. [17] combined multivariate Gaussian and uniform distributions as the component density, which allowed for superior mixture modelling. These methods demonstrate competitive performance in fitting observed data of different shapes.

(3) Schemes Based on the Trimming Method. In general, for GMM-based algorithms, the parameters are estimated by the ML estimator through the EM algorithm. However, the ML estimator is overly sensitive to outliers, and GMM cannot address outliers properly; outliers therefore seriously deteriorate the performance of Gaussian-based clustering algorithms. To overcome this shortcoming, a common approach is to consider a mixture model with Student's t-distributions (SMM), which provides a longer-tailed alternative to the Gaussian distribution [18]; owing to its heavier tails, SMM is more robust to outliers than GMM. Another model-based method, which offers a theoretically well-founded segmentation criterion in the presence of outliers, is the trimming method [19]. The main principle of trimming is to locate the outliers and discard them from the likelihood function, to the benefit of the segmentation results. Müller and Neykov proposed the fast trimmed likelihood estimator (FAST-TLE) [20], and Galimzianova et al. developed the confidence level ordering trimmed likelihood estimator (CLO-TLE) [21]. However, these estimators do not function effectively on noisy samples, especially when the groups contain different numbers of observations.

Motivated by the aforementioned considerations, in this paper we present a two-step procedure (GGMM-TLE), beginning with a component-based confidence level ordering trimmed likelihood estimator. Because the observed data may contain many outliers, it is necessary to discard them before robustly estimating the parameters. Unlike previous trimming schemes, the proposed technique orders observations within each component separately, which avoids eliminating as outliers the samples belonging to components with a small number of observations. We then propose a novel finite mixture model based on a mixture of generalized Gamma and Gaussian distributions (GGMM). The proposed GGMM with Markov random fields is highly flexible and can fit asymmetric data owing to the introduction of the asymmetric generalized Gamma distribution. Moreover, we theoretically prove the identifiability of the GGMM using the strategy presented by Atienza et al. [22, 23], which indicates that the GGMM's mixture representation is unique; this property is crucial to ensure that the parameter estimation problem is well posed. Therefore, the proposed algorithm can be applied effectively to segmenting images. Furthermore, by imposing spatial smoothness constraints among neighbouring pixels using MRF, the model encourages neighbouring pixels to share the same label, which reduces the segmentation sensitivity to noise in a still image. We demonstrate through a simulation study that the proposed framework is superior to related methods in terms of the misclassification ratio and the Dice similarity coefficient.

The remainder of this paper is organized as follows. Section 2 introduces the proposed mixture model in detail. In Section 3, we prove the identifiability of the proposed mixture model. The process of parameter learning is described in Section 4. The ordering method for likelihood trimming is reported in Section 5. Section 6 provides the experimental results and analysis. Finally, we conclude with a discussion in Section 7.

2. Model Formulation

Assume a set of data $X = \{x_1, x_2, \ldots, x_N\}$, where each $x_i$ denotes an observation at the $i$th pixel of an image and $N$ is the total number of pixels in the image. The proposed mixture model assumes that the density function at pixel $x_i$ is given by
$$f(x_i \mid \Theta) = \sum_{j=1}^{K} \pi_{ij}\, p(x_i \mid \theta_j), \tag{1}$$
where $\Theta$ is the complete parameter set of the proposed mixture model and $K$ denotes the number of mixture components. The prior $\pi_{ij}$ represents the probability that the observation $x_i$ belongs to the $j$th label; the priors satisfy the following constraints:
$$0 \le \pi_{ij} \le 1, \qquad \sum_{j=1}^{K} \pi_{ij} = 1. \tag{2}$$
In this paper, we set each component density to the weighted combination
$$p(x_i \mid \theta_j) = \eta_j\, \mathcal{G}(x_i \mid \alpha_j, \beta_j, \lambda_j) + (1 - \eta_j)\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j),$$
where $\eta_j \in [0, 1]$ is called the weighting factor; $\mathcal{G}$ is the generalized Gamma distribution defined by
$$\mathcal{G}(x \mid \alpha, \beta, \lambda) = \frac{\lambda\, x^{\lambda \beta - 1}}{\alpha^{\lambda \beta}\, \Gamma(\beta)} \exp\!\left( -\left( \frac{x}{\alpha} \right)^{\lambda} \right), \tag{3}$$
where $\{\alpha, \beta, \lambda\}$ is the parameter set of the generalized Gamma distribution, $\lambda$ is the power parameter, $\beta$ is the shape parameter, $\alpha$ is the scale parameter, and $\Gamma(\cdot)$ denotes the Gamma function. The probability density function of the Gaussian distribution is defined as
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\, |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu) \right), \tag{4}$$
where $\{\mu, \Sigma\}$ is the parameter set of the Gaussian distribution, $\mu$ is the mean, $\Sigma$ denotes the covariance, and $D$ is the data dimension.
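To make the model concrete, the following minimal Python sketch evaluates the densities (1), (3), and (4) in the univariate case. It is illustrative only: the function and variable names are ours, the generalized Gamma follows the Stacy-type parameterization given above, and the paper's own experiments were implemented in MATLAB.

```python
import numpy as np
from scipy.special import gamma as gamma_fn

def gen_gamma_pdf(x, alpha, beta, lam):
    """Generalized Gamma density (3): scale alpha, shape beta, power lam."""
    return (lam * x ** (lam * beta - 1.0)
            / (alpha ** (lam * beta) * gamma_fn(beta))
            * np.exp(-((x / alpha) ** lam)))

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density (4) with mean mu and variance sigma2."""
    return np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)

def component_pdf(x, eta, alpha, beta, lam, mu, sigma2):
    """GGMM component density: convex combination of the two branches."""
    return (eta * gen_gamma_pdf(x, alpha, beta, lam)
            + (1.0 - eta) * gaussian_pdf(x, mu, sigma2))

def mixture_pdf(x, weights, comp_params):
    """Mixture density (1); comp_params holds one parameter tuple per component."""
    return sum(w * component_pdf(x, *theta)
               for w, theta in zip(weights, comp_params))
```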

According to Bayesian rules, we express the posterior probability density function of the proposed model as
$$p(\Theta \mid X) \propto f(X \mid \Theta)\, p(\Theta). \tag{5}$$
To train the proposed mixture model, based on the above formulation, we define the following maximum a posteriori log-likelihood function:
$$L(\Theta \mid X) = \sum_{i=1}^{N} \log\!\left( \sum_{j=1}^{K} \pi_{ij}\, p(x_i \mid \theta_j) \right) + \log p(\Theta). \tag{6}$$
The Markov random field based on the Gibbs distribution can be characterized by
$$p(\Theta) = \frac{1}{Z} \exp\!\left( -\frac{1}{T}\, U(\Theta) \right), \tag{7}$$
where $T$ and $Z$ are temperature and normalizing constants, respectively. In the proposed approach, a new energy function of the following form is chosen to enforce spatial smoothness:
$$U(\Theta) = -\sum_{i=1}^{N} \sum_{j=1}^{K} G_{ij} \log \pi_{ij}, \tag{8}$$
where
$$G_{ij} = \frac{1}{|\partial_i|} \sum_{m \in \partial_i} z_{mj}, \tag{9}$$
where $\partial_i$ is the neighbourhood of the $i$th pixel including the $i$th pixel itself (for example, a $3 \times 3$ or $5 \times 5$ window) and $z_{mj}$ denotes the posterior probability defined in (22) below. Eventually, we can formulate the segmentation problem as a maximum a posteriori problem using the log-likelihood function
$$L(\Theta \mid X) = \sum_{i=1}^{N} \log\!\left( \sum_{j=1}^{K} \pi_{ij}\, p(x_i \mid \theta_j) \right) + \frac{1}{T} \sum_{i=1}^{N} \sum_{j=1}^{K} G_{ij} \log \pi_{ij}. \tag{10}$$

The above objective contains two parts: the first term corresponds to the proposed mixture model and the second to the Markov smoothness prior. In general, the EM algorithm is an efficient framework for estimating the mixture model parameters.
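Computing the spatial weights $G_{ij}$ of (9) amounts to mean-filtering each label's posterior map over the chosen window. A brief Python sketch under that reading, with per-pixel posteriors stored as an (H, W, K) array; the helper name is ours:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighbourhood_weights(z, window=3):
    """G_ij of (9): local average of the posteriors around each pixel.

    z: array of shape (H, W, K) holding the posterior probabilities z_mj."""
    return np.stack(
        [uniform_filter(z[..., j], size=window, mode="nearest")
         for j in range(z.shape[-1])],
        axis=-1)
```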

3. Identifiability of the Proposed Mixture Model

This section discusses the identifiability of the GGMM. This property implies that a GGMM admits a unique representation in terms of its components. It is important for a finite mixture model because it guarantees that the estimation procedure for the parameter set is well defined [3, 22]. The property is described as follows.

We define the family of proposed component densities
$$\mathcal{F} = \left\{ p(x \mid \theta) = \eta\, \mathcal{G}(x \mid \alpha, \beta, \lambda) + (1 - \eta)\, \mathcal{N}(x \mid \mu, \sigma^2) \right\}, \tag{11}$$
where $\eta \in [0, 1]$, $\alpha, \beta, \lambda > 0$, $\mu \in \mathbb{R}$, and $\sigma^2 > 0$. In this study, $\mathcal{G}$ is the generalized Gamma distribution and $\mathcal{N}$ is the Gaussian distribution; the parameters of the proposed distribution are mutually independent. The set of proposed mixture models with weights satisfying (2) is
$$\mathcal{H} = \left\{ f(x) = \sum_{j=1}^{K} \pi_j\, p_j(x) \,:\, p_j \in \mathcal{F},\ K \in \mathbb{N} \right\}. \tag{12}$$

Theorem 1. The family $\mathcal{H}$ is identifiable; that is, for any two mixture models $f, f' \in \mathcal{H}$ with
$$f = \sum_{j=1}^{K} \pi_j\, p_j, \qquad f' = \sum_{j=1}^{K'} \pi'_j\, p'_j, \tag{13}$$
if $f = f'$, then $K = K'$ and, after a suitable permutation of the indices, $\pi_j = \pi'_j$ and $p_j = p'_j$ for all $j$.

Proof. According to [22], it suffices to exhibit a linear transform $M$, with domain $S(p)$, and a point $p_0$ such that the relation
$$p_1 \prec p_2 \iff \lim_{t \to p_0} \frac{M[p_1](t)}{M[p_2](t)} = 0 \tag{14}$$
defines a total order on $\mathcal{F}$; here we take $p_0 = +\infty$. The expression of the linear transform is as follows:
$$M[p](t) = \int_{0}^{\infty} x^{t}\, p(x)\, \mathrm{d}x, \tag{15}$$
where $p$ is the proposed density function $p(x) = \eta\, \mathcal{G}(x) + (1 - \eta)\, \mathcal{N}(x)$ and the integral is taken over the support of the image data. Let $M_{\mathcal{G}}(t)$ and $M_{\mathcal{N}}(t)$ denote the transforms of the two branches; then $M[p](t) = \eta\, M_{\mathcal{G}}(t) + (1 - \eta)\, M_{\mathcal{N}}(t)$. Obviously, if $p_1 \prec p_2$ and $p_2 \prec p_3$, we can obtain $p_1 \prec p_3$. According to (15), we have
$$M_{\mathcal{G}}(t) = \frac{\alpha^{t}\, \Gamma(\beta + t/\lambda)}{\Gamma(\beta)}, \tag{16}$$
where $\alpha$, $\beta$, and $\lambda$ are the scale, shape, and power parameters of the generalized Gamma branch. To facilitate the proof procedure, this study utilizes Stirling's formula as follows:
$$\Gamma(z) \simeq \sqrt{2\pi}\, e^{-z}\, z^{z - 1/2}, \qquad z \to \infty. \tag{17}$$
Thus, we have
$$\log M_{\mathcal{G}}(t) \simeq \frac{t}{\lambda} \log t + t \left( \log \alpha - \frac{1 + \log \lambda}{\lambda} \right) + \left( \beta - \frac{1}{2} \right) \log \frac{t}{\lambda}. \tag{18}$$
The sign "$\simeq$" indicates that the expressions on both sides are equivalent up to a constant term when $t \to \infty$. Hence, for two generalized Gamma branches $\mathcal{G}_1$ and $\mathcal{G}_2$, we have
$$\log \frac{M_{\mathcal{G}_1}(t)}{M_{\mathcal{G}_2}(t)} \simeq \left( \frac{1}{\lambda_1} - \frac{1}{\lambda_2} \right) t \log t + C_1\, t + C_2 \log t, \tag{19}$$
where $C_1$ and $C_2$ are constants depending on the parameters. From (19), we can derive $\mathcal{G}_1 \prec \mathcal{G}_2$ if $\lambda_1 > \lambda_2$, or $\lambda_1 = \lambda_2$ and $\alpha_1 < \alpha_2$, or $\lambda_1 = \lambda_2$, $\alpha_1 = \alpha_2$, and $\beta_1 < \beta_2$, which is apparently a total order. Analogously, we have
$$\log M_{\mathcal{N}}(t) \simeq \frac{t}{2} \log t + t \left( \log \sigma - \frac{1}{2} \right) + \frac{\mu}{\sigma} \sqrt{t}, \tag{20}$$
where $\mu$ and $\sigma$ are the mean and standard deviation of the Gaussian branch. For two Gaussian branches $\mathcal{N}_1$ and $\mathcal{N}_2$, we have
$$\log \frac{M_{\mathcal{N}_1}(t)}{M_{\mathcal{N}_2}(t)} \simeq t \log \frac{\sigma_1}{\sigma_2} + \left( \frac{\mu_1}{\sigma_1} - \frac{\mu_2}{\sigma_2} \right) \sqrt{t}. \tag{21}$$
From (21), we can determine that $\mathcal{N}_1 \prec \mathcal{N}_2$ if $\sigma_1 < \sigma_2$, or $\sigma_1 = \sigma_2$ and $\mu_1 < \mu_2$, which is clearly a total order. Overall, since the growth rates $(t/\lambda) \log t$ and $(t/2) \log t$ also order any pair consisting of a generalized Gamma branch and a Gaussian branch, there exists a total order on $\mathcal{F}$. Hence, we can draw the conclusion that the GGMMs are identifiable.

4. Parameter Learning

The main task of this section is to estimate the complete parameter set $\Theta$. In general, the EM algorithm provides an efficient scheme for unsupervised segmentation using iterative updating and guarantees that the log-likelihood function converges to a local maximum. Considering the complexity of (10), it is difficult to apply the EM algorithm directly to maximize it. Therefore, we employ Jensen's inequality by defining the two hidden variables $z_{ij}$ and $u_{ij}$, which are, respectively,
$$z_{ij} = \frac{\pi_{ij}\, p(x_i \mid \theta_j)}{\sum_{k=1}^{K} \pi_{ik}\, p(x_i \mid \theta_k)}, \tag{22}$$
$$u_{ij} = \frac{\eta_j\, \mathcal{G}(x_i \mid \alpha_j, \beta_j, \lambda_j)}{\eta_j\, \mathcal{G}(x_i \mid \alpha_j, \beta_j, \lambda_j) + (1 - \eta_j)\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}. \tag{23}$$
Clearly, $z_{ij}$ and $u_{ij}$ satisfy the constraints $\sum_{j=1}^{K} z_{ij} = 1$ and $0 \le u_{ij} \le 1$. Using Jensen's inequality, one has $\log \sum_j q_j (a_j / q_j) \ge \sum_j q_j \log (a_j / q_j)$; thus, the log-likelihood function (10) can be bounded from below as
$$L(\Theta \mid X) \ge \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \log \frac{\pi_{ij}\, p(x_i \mid \theta_j)}{z_{ij}} + \frac{1}{T} \sum_{i=1}^{N} \sum_{j=1}^{K} G_{ij} \log \pi_{ij}. \tag{24}$$
Thus, discarding the terms independent of $\Theta$, we can define the following new objective function in terms of Jensen's inequality:
$$\begin{aligned} Q(\Theta) = \sum_{i=1}^{N} \sum_{j=1}^{K} z_{ij} \Big[ & \log \pi_{ij} + u_{ij} \log\!\big( \eta_j\, \mathcal{G}(x_i \mid \alpha_j, \beta_j, \lambda_j) \big) \\ & + (1 - u_{ij}) \log\!\big( (1 - \eta_j)\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j) \big) \Big] + \frac{1}{T} \sum_{i=1}^{N} \sum_{j=1}^{K} G_{ij} \log \pi_{ij}. \end{aligned} \tag{25}$$
To realize clustering, we must maximize the log-likelihood function in (10), which is equivalent to maximizing the objective function in (25). In particular, to estimate the prior probability $\pi_{ij}$, we take the partial derivative of the objective function in (25) with respect to $\pi_{ij}$, yielding
$$\frac{\partial}{\partial \pi_{ij}} \left[ Q(\Theta) + \ell_i \left( \sum_{k=1}^{K} \pi_{ik} - 1 \right) \right] = \frac{z_{ij} + G_{ij}/T}{\pi_{ij}} + \ell_i = 0, \tag{26}$$
where $\ell_i$ is the Lagrange multiplier in consideration of the constraint $\sum_{j=1}^{K} \pi_{ij} = 1$; we have
$$\pi_{ij} = \frac{z_{ij} + G_{ij}/T}{\sum_{k=1}^{K} \left( z_{ik} + G_{ik}/T \right)}. \tag{27}$$
Similarly, to estimate the weighting factor $\eta_j$, we take the partial derivative of the objective function in (25) with respect to $\eta_j$,
$$\frac{\partial Q}{\partial \eta_j} = \sum_{i=1}^{N} z_{ij} \left( \frac{u_{ij}}{\eta_j} - \frac{1 - u_{ij}}{1 - \eta_j} \right) = 0, \tag{28}$$
in consideration of the constraint $0 \le \eta_j \le 1$; we have
$$\eta_j = \frac{\sum_{i=1}^{N} z_{ij}\, u_{ij}}{\sum_{i=1}^{N} z_{ij}}. \tag{29}$$
In the following, we estimate the power parameter $\lambda_j$. We calculate the partial derivative of the objective function (25) with respect to this power parameter as follows:
$$\frac{\partial Q}{\partial \lambda_j} = \sum_{i=1}^{N} z_{ij}\, u_{ij} \left[ \frac{1}{\lambda_j} + \beta_j \log \frac{x_i}{\alpha_j} - \left( \frac{x_i}{\alpha_j} \right)^{\lambda_j} \log \frac{x_i}{\alpha_j} \right]. \tag{30}$$
The solution of $\partial Q / \partial \lambda_j = 0$ yields the estimate of $\lambda_j$ as the root of
$$\frac{1}{\lambda_j} = \frac{\sum_{i=1}^{N} w_{ij} \left[ \left( x_i / \alpha_j \right)^{\lambda_j} - \beta_j \right] \log \left( x_i / \alpha_j \right)}{\sum_{i=1}^{N} w_{ij}}, \tag{31}$$
where $w_{ij} = z_{ij}\, u_{ij}$. Then, to derive the solution of the shape parameter $\beta_j$, we calculate the corresponding partial derivative. We have
$$\frac{\partial Q}{\partial \beta_j} = \sum_{i=1}^{N} w_{ij} \left[ \lambda_j \log \frac{x_i}{\alpha_j} - \Psi(\beta_j) \right]. \tag{32}$$
It is clear that setting this derivative to zero yields the update for the shape parameter via
$$\Psi(\beta_j) = \frac{\sum_{i=1}^{N} w_{ij}\, \lambda_j \log \left( x_i / \alpha_j \right)}{\sum_{i=1}^{N} w_{ij}}, \tag{33}$$
where $\Psi(\cdot)$ is the digamma function; $\beta_j$ can be calculated by solving (33) via the bisection method [23]. In the same fashion, to obtain the estimate of the scale parameter $\alpha_j$, we derive the partial derivative of $Q$ over it:
$$\frac{\partial Q}{\partial \alpha_j} = \sum_{i=1}^{N} w_{ij}\, \frac{\lambda_j}{\alpha_j} \left[ \left( \frac{x_i}{\alpha_j} \right)^{\lambda_j} - \beta_j \right]. \tag{34}$$
Equating this to zero, we obtain the update formula for the scale parameter:
$$\alpha_j = \left( \frac{\sum_{i=1}^{N} w_{ij}\, x_i^{\lambda_j}}{\beta_j \sum_{i=1}^{N} w_{ij}} \right)^{1/\lambda_j}. \tag{35}$$
By calculating the partial derivative of the objective function in (25) with respect to the parameter set $\{\mu_j, \Sigma_j\}$, we obtain the estimates of the mean $\mu_j$ and covariance $\Sigma_j$:
$$\frac{\partial Q}{\partial \mu_j} = \sum_{i=1}^{N} z_{ij} (1 - u_{ij})\, \Sigma_j^{-1} (x_i - \mu_j). \tag{36}$$
Eventually, the final updates for these two parameters can be obtained by
$$\mu_j = \frac{\sum_{i=1}^{N} z_{ij} (1 - u_{ij})\, x_i}{\sum_{i=1}^{N} z_{ij} (1 - u_{ij})}, \tag{37}$$
$$\Sigma_j = \frac{\sum_{i=1}^{N} z_{ij} (1 - u_{ij}) (x_i - \mu_j)(x_i - \mu_j)^{\mathsf{T}}}{\sum_{i=1}^{N} z_{ij} (1 - u_{ij})}. \tag{38}$$
At this point, the parameter learning procedure is complete.
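Numerically, the updates (29), (33), (35), (37), and (38) are weighted averages plus one root-finding step for the shape parameter. The following univariate Python sketch implements one component's M-step under the formulas above; the root-finding bracket is arbitrary, the power-parameter update (31), which needs its own one-dimensional search, is omitted for brevity, and all names are illustrative.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def m_step_component(x, z, u, lam, alpha_old):
    """One M-step for a single GGMM component (univariate sketch).

    x: observations (N,); z: posteriors z_ij for this component (N,);
    u: branch responsibilities u_ij (N,); lam: current power parameter;
    alpha_old: current scale parameter (the updates are coupled)."""
    w = z * u              # weights of the generalized Gamma branch
    v = z * (1.0 - u)      # weights of the Gaussian branch

    eta = w.sum() / z.sum()                                    # Eq. (29)

    # Shape from Eq. (33): digamma(beta) equals a weighted mean of
    # lam * log(x / alpha); solved by bracketed root finding (the paper
    # uses bisection). The bracket may need widening in practice.
    rhs = (w * lam * np.log(x / alpha_old)).sum() / w.sum()
    beta = brentq(lambda b: digamma(b) - rhs, 1e-6, 1e6)

    # Scale from Eq. (35): closed form given beta and lam.
    alpha = ((w * x ** lam).sum() / (beta * w.sum())) ** (1.0 / lam)

    # Gaussian branch, Eqs. (37)-(38): weighted mean and variance.
    mu = (v * x).sum() / v.sum()
    sigma2 = (v * (x - mu) ** 2).sum() / v.sum()
    return eta, lam, beta, alpha, mu, sigma2
```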

5. Ordering Method for Likelihood Trimming

For observed data with heavy outliers, it is preferable to discard the outliers and estimate the parameters of the proposed mixture model using the remaining data. Assume that $X = \{x_1, \ldots, x_N\}$ is a sample with $N$ observations, $h$ is the trimming fraction, and $X^{\ast} \subset X$ is the subsample with size $\lfloor (1 - h) N \rfloor$. Theoretically, the trimming fraction should be higher than the real outlier fraction. After cutting the outliers, we estimate the model parameters by maximizing the objective function (25) on the subsample $X^{\ast}$. The most important step is to discard the outliers and select the subsample, which requires a specific ordering of all observations in the sample. Typically, the number of outliers is unpredictable; thus, it is important for the proposed model to avoid treating observations that belong to labels with a small number of observations as outliers. This study presents an effective component-based confidence level ordering method. In the proposed GGMM-TLE, we do not order by the density value of every single observation as in FAST-TLE [20]; rather, we use the confidence level of the observations, which eliminates the effects of mixture weights and sample scales. Combined with the posterior probability in (22), we can order the observations that belong to the same group separately, which makes the ordering of the observations more reasonable. Specifically, we derive the following increasing inequality based on component-based confidence level ordering:
$$c_{j, \tau_j(1)} \le c_{j, \tau_j(2)} \le \cdots \le c_{j, \tau_j(N_j)}, \tag{39}$$
where $c_{j,i}$ is the confidence level of observation $x_i$ under the $j$th component, $j = 1, \ldots, K$; the component of an observation is determined by the posterior probability (22). $N_j$ is the number of observations belonging to the $j$th component; clearly, the $N_j$ satisfy $\sum_{j=1}^{K} N_j = N$. $\tau_j$ is the ordering of the sample indices of the $j$th component. By sorting and discarding each component individually with the same trimming fraction $h$, we obtain the subsample of each component, $X_j^{\ast} = \{ x_{\tau_j(i)} : i > \lfloor h N_j \rfloor \}$. Hence, the total subsample can be expressed as the union of the per-component subsamples:
$$X^{\ast} = \bigcup_{j=1}^{K} X_j^{\ast}. \tag{40}$$
Finally, the parameters of the proposed mixture model can be estimated with the subsample $X^{\ast}$ and the objective function (25). In the proposed GGMM-TLE, by evaluating the interval integral rather than the log-likelihood value of the observations, we obtain superior performance compared with the classical FAST-TLE, because ordering the observations within every individual label retains a consistent trimming proportion for each label. Therefore, regardless of the mixture weights and sample scales, all observations of each label are considered equally; a minimal sketch of this per-component trimming is given below.
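As a hedged illustration, the per-component trimming of (39)-(40) reduces to sorting each component's observations by confidence and dropping the lowest fraction $h$ within each component. The sketch below assumes hard assignments taken from the posterior (22) and a precomputed per-observation confidence score; all names are ours.

```python
import numpy as np

def trim_by_component(conf, labels, num_components, h):
    """Component-based confidence level ordering, Eqs. (39)-(40).

    conf: confidence of each observation under its assigned component (N,);
    labels: hard assignments argmax_j z_ij (N,); h: trimming fraction.
    Returns the indices of the retained subsample X*."""
    keep = []
    for j in range(num_components):
        idx = np.flatnonzero(labels == j)       # observations of component j
        order = idx[np.argsort(conf[idx])]      # increasing confidence, Eq. (39)
        n_trim = int(np.floor(h * len(idx)))    # same fraction for every component
        keep.append(order[n_trim:])             # drop the least-confident points
    return np.concatenate(keep)                 # union over components, Eq. (40)
```

Combined with the steps of the GGMM in Section 2, we summarize the procedure of GGMM-TLE as follows.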

Step 1. Input the trimming fraction $h$ and initialize the parameter set $\Theta^{(0)}$ (in our experiments, with $k$-means); set the iteration index $t = 0$.

Step 2. Based on the current parameter set $\Theta^{(t)}$, evaluate the posterior probability $z_{ij}$ using (22), compute the variable $u_{ij}$ using (23), and classify the observations.

Step 3. Perform component-based confidence level ordering using (39) to obtain the subsample $X^{\ast}$.

Step 4. Compute the objective function $Q(\Theta^{(t)})$ on $X^{\ast}$ in terms of (25). If the objective function does not decrease, continue to Step 5; else, increase the value of the trimming fraction (keeping it below the predefined threshold) and obtain a new subsample $X^{\ast}$ until the objective function no longer decreases, and update $h$ accordingly. If the condition cannot be satisfied, terminate the procedure.

Step 5. Update the prior probability $\pi_{ij}$ and the weighting factor $\eta_j$ using (27) and (29), respectively. Compute the power parameter $\lambda_j$, shape parameter $\beta_j$, scale parameter $\alpha_j$, mean $\mu_j$, and covariance $\Sigma_j$ by solving (31), (33), (35), (37), and (38), respectively.

Step 6. Maximize the objective function (25) to obtain the new parameter set $\Theta^{(t+1)}$. If the termination condition is satisfied, end the iterations; otherwise, set $t \leftarrow t + 1$ and return to Step 2.
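Steps 1-6 combine into a short trimming-EM loop. The Python skeleton below shows the control flow only; `initialise_with_kmeans`, `e_step`, `objective`, and `m_step` are hypothetical helpers standing in for (22)-(38), and `trim_by_component` is the sketch from earlier in this section.

```python
import numpy as np

def ggmm_tle(x, num_components, h, max_iter=100, tol=1e-6):
    """Control-flow sketch of the GGMM-TLE procedure (Steps 1-6)."""
    theta = initialise_with_kmeans(x, num_components)              # Step 1
    obj_old = None
    for _ in range(max_iter):
        z, u, labels, conf = e_step(x, theta)                      # Step 2: Eqs. (22)-(23)
        keep = trim_by_component(conf, labels, num_components, h)  # Step 3: Eq. (39)
        obj = objective(x[keep], z[keep], u[keep], theta)          # Step 4: Eq. (25)
        theta = m_step(x[keep], z[keep], u[keep], theta)           # Step 5: Eqs. (27)-(38)
        if obj_old is not None and abs(obj - obj_old) <= tol * max(1.0, abs(obj_old)):
            break                                                  # Step 6: convergence test
        obj_old = obj
    return theta, labels
```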

6. Experimental Results

This section experimentally evaluates the proposed GGMM-TLE on real-world image segmentation problems and compares it with related algorithms. All algorithms are initialized using $k$-means. The experiments were developed in MATLAB R2012b and executed on a personal computer with an Intel(R) Core(TM) i7-6500U CPU @ 2.5 GHz, 8 GB RAM, and a 64-bit operating system. To evaluate the proposed method objectively, this paper uses two criteria: the misclassification ratio (MCR) [24] and the Dice similarity coefficient (DSC) [25]. The former has the following form:
$$\mathrm{MCR} = \frac{\text{number of misclassified pixels}}{\text{total number of pixels}}. \tag{41}$$
MCR is widely used in the literature to evaluate segmentation performance; the smaller the MCR, the higher the segmentation accuracy. The popular overlap-based metric DSC is also employed to evaluate the proposed mixture model:
$$\mathrm{DSC} = \frac{2\, |S_a \cap S_m|}{|S_a| + |S_m|}, \tag{42}$$
where $S_a$ denotes the shape of the automatic segmentation obtained from the algorithm output and $S_m$ indicates the shape of the manual segmentation. The range of DSC is from zero to one, with one denoting ideal segmentation and zero indicating poor segmentation.
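Both criteria take only a few lines of NumPy. A sketch under the definitions (41)-(42); it assumes the label-correspondence problem (matching predicted labels to ground-truth labels) has already been resolved, and note that the paper's own experiments were run in MATLAB.

```python
import numpy as np

def mcr(pred, truth):
    """Misclassification ratio (41): fraction of wrongly labelled pixels."""
    return float(np.mean(pred != truth))

def dsc(pred_mask, truth_mask):
    """Dice similarity coefficient (42) between two binary masks."""
    inter = np.logical_and(pred_mask, truth_mask).sum()
    return 2.0 * inter / (pred_mask.sum() + truth_mask.sum())
```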

6.1. Test of the Proposed Trimming Approach

The first experiment validates the behaviour of the proposed GGMM-TLE. For this purpose, we generated three labels of inlier observations and one label of outliers. The inlier observations consisted of 10,000 points drawn from a three-component bivariate GMM with prior probabilities 0.25, 0.35, and 0.40; Labels 1, 2, and 3 contain 2,500, 3,500, and 4,000 points, respectively. Random noise of 1,000 points, drawn uniformly over a rectangle enclosing the clusters, was added and is considered as outliers. For comparison, apart from the proposed approach, we also evaluated FAST-TLE [20] and CLO-TLE [21]; the trimming fraction was varied over a range of values. The segmentation results of the different methods are presented in Figure 1. From Figure 1, we can observe that the classical GMM was sensitive to outliers, resulting in poor clustering performance in terms of visual interpretation. The performance of the FAST-TLE method was less influenced by the outliers; however, it was also unstable, especially when the trimming fraction was high. The CLO-TLE ordering strategy exhibited higher stability than the previous two algorithms owing to its use of confidence level ordering; however, misclassified outliers remained, demonstrating that the fit achieved by CLO-TLE was not ideal. Conversely, the best performance was obtained with the proposed GGMM-TLE, because every observation of each individual group is considered. The figure indicates that GGMM-TLE can extract clear point accumulations from noisy data.
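For reference, a dataset of the same design can be generated along the following lines. The component means and covariances below are placeholders, since the paper's exact values were not recoverable from the source; the label sizes and the uniform rectangular outliers follow the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters; the paper's exact means/covariances differ.
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([8.0, 0.0])]
covs = [0.5 * np.eye(2), 0.8 * np.eye(2), 0.6 * np.eye(2)]
sizes = [2500, 3500, 4000]              # label sizes from the experiment

inliers = np.vstack([rng.multivariate_normal(m, c, size=n)
                     for m, c, n in zip(means, covs, sizes)])
outliers = rng.uniform(low=[-4.0, -4.0], high=[12.0, 8.0], size=(1000, 2))
data = np.vstack([inliers, outliers])   # 11,000 points in total
```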

6.2. Segmentation of Noise-Degraded Images

To demonstrate the feasibility of GGMM-TLE, the following experiment used four real-world images ("Boat," "Cow," "House," and "Man") from the semantic boundaries dataset (SBD) [26] for comparison. These images were segmented into three labels. All images were contaminated by Salt and Pepper noise. Figure 2 presents the visualization of the segmentation task with trimming fraction 0.2, where the second, third, and fourth columns correspond to the FAST-TLE, CLO-TLE, and GGMM-TLE algorithms, respectively, together with detailed parts of the corresponding segmentation results. Note that GGMM-TLE clearly eliminates the noise, as expected. Figure 3 shows the log-likelihood function versus the number of iterations under trimming fraction 0.2 for the different test images. The log-likelihood functions of FAST-TLE and CLO-TLE behave similarly to that of the proposed method; however, a closer inspection of the iteration ranges indicates that GGMM-TLE moderately improves the convergence rate. In the early iterations, GGMM-TLE exhibits the fastest convergence, and in the general case it converges within about five iterations. In Figure 4, the MCR of each test image is plotted against different trimming fractions. The figure implies that the proposed scheme achieved the best segmentation accuracy, as the MCR of GGMM-TLE was the lowest on all test images.

The proposed algorithm was also assessed on a clinical MR image for labelling white matter (WM) and grey matter (GM). For this purpose, a real MR image, slice 42 of IBSR2 from the IBSR dataset [27], was randomly selected to evaluate the performance of the proposed GGMM-TLE against FAST-TLE and CLO-TLE. Salt and Pepper noise at two intensity levels was considered in our experiment. Figure 5 presents the performance of these methods under Salt and Pepper noise and different trimming fractions. It is clear that FAST-TLE did not produce improved results under heavier outliers in the segmentation task. CLO-TLE tended to achieve superior performance as the trimming fraction increased and could maintain its stability and effectiveness. A closer inspection of Figure 5 indicates that the segmentation accuracy of GGMM-TLE was visually higher than that of the other methods. This is because the proposed GGMM-TLE exploits the confidence level of the observations, eliminating the effects of mixture weights and sample scales. Therefore, as the trimming fraction increases, GGMM-TLE exhibits better stability to outliers than CLO-TLE; as shown in Figure 5, this is especially apparent at high trimming fractions. Figure 6 displays the evaluation results using the MCR metric. GGMM-TLE had the lowest MCR value; thus, its segmentation results were superior to those of FAST-TLE and CLO-TLE.

Further, we executed these algorithms 20 times, each time with a different initialization, and computed the average performance in terms of the number of correctly classified data points and the DSC for this MR image, including white matter and grey matter. Table 1 lists the mean values and standard deviations of the DSC obtained from the 20 executions. The results demonstrate that the proposed method moderately improved the accuracy compared with the other methods.

To assess the robustness of the proposed GGMM-TLE at different levels of noise, a set of real-world images from the Berkeley image dataset [28] was considered to compare its performance with GMM [29], SMM [30], another GMM variant [31], NSMM [12], and ACAP [8]. The freely available ground-truth information of the dataset was used for the performance evaluation. The experiment was performed on noisy versions of the images, obtained by adding Gaussian noise (zero mean, 0.01 variance) and Salt and Pepper noise, as indicated in the first row of Figures 7 and 8. The evaluated algorithms were initialized using the $k$-means algorithm, and the number of labels was set according to human visual inspection. Figures 7 and 8 exhibit the results of image segmentation using the different methods. Owing to the application of a mean filter, the performance of ACAP was superior to GMM, SMM, the GMM variant [31], and NSMM. The results generated by ACAP were similar to those of GGMM-TLE; however, its performance was impaired in images with abundant rich details, for example, test image 241004 (the sixth row of Figure 7). GGMM-TLE provided a moderately improved performance under the different noise conditions and eliminated the influence of widely spread noise data; this robustness stems from the MRF prior and the trimmed likelihood estimator. The resulting DSC values are reported in Tables 2 and 3, providing a quantitative comparison among the algorithms. The DSC means and standard deviations indicate that the proposed method outperformed the other methods, preserving the highest DSC.

To further demonstrate the robustness of GGMM-TLE against different types of noise, Figure 9 displays the mean values and standard deviations of the MCR obtained from twenty runs on two Berkeley test images (24063 and 35010) under different noise environments. Considering the MCR, on average, ACAP effectively eliminated the effects of noise during segmentation and demonstrated acceptable results. The classical GMM, SMM, and the GMM variant [31] were severely influenced by Gaussian noise and could not accurately separate a region from the background. In the majority of cases, the NSMM approach was superior to SMM and GMM, yet it continued to be influenced to varying degrees by Salt and Pepper and Gaussian noise. As expected, compared with the other algorithms, GGMM-TLE was stable and achieved the best segmentation results according to the quantitative criterion.

7. Concluding Remarks

In this paper, GGMM-TLE, a finite mixture model of generalized Gamma and Gaussian distributions combined with MRF and estimated robustly using a trimmed likelihood estimator, was proposed for real-world image segmentation.

The main contribution of this paper is an asymmetric finite mixture model, GGMM-TLE, based on MRF. With this model, we have high flexibility to fit observed data of different shapes. Further, this study discussed the identifiability of the proposed mixture model, guaranteeing that the estimation procedure for the parameters is well defined. Then, to make GGMM-TLE robust against heavy outliers, the paper offered an effective method to discard the outliers in advance; therefore, GGMM-TLE demonstrates superior performance when modelling samples contaminated by unknown outliers. Finally, combined with MRF, GGMM-TLE considers the spatial relationship between neighbouring pixels and demonstrates a stronger ability to resist different types of noise. The segmentation results on synthetic data and real-world images confirmed that the proposed method is highly competitive. The main limitation of the algorithm is that the segmentation task requires component-based confidence level ordering, which increases the computational cost.

As future work, one direction is to obtain other finite mixture models by testing different probability density functions. Another possible direction is to extend the presented method to higher-dimensional data, such as fMRI time-series clustering. We plan to address these topics in a separate paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant no. 61371150.