Abstract

Methods of estimating statistical distributions have attracted many researchers when it comes to fitting a specific distribution to data. However, when the data arise from more than one component, a single standard distribution cannot be fitted to them. To tackle this issue, mixture models are fitted by choosing the correct number of components to represent the data. This situation is evident in lifetime processes, which arise in a wide range of engineering applications as well as biological systems. In this paper, we estimate a finite mixture of inverse Rayleigh distributions within a Bayesian framework using Markov chain Monte Carlo (MCMC) methods, employing the Gibbs sampler and Metropolis–Hastings algorithms. The proposed techniques are applied to simulated data under several scenarios. The accuracy of estimation is examined via the average mean square error (AMSE) and the average classification success rate (ACSR). The results show that the methods performed well in all simulation scenarios with respect to different sample sizes.

1. Introduction

Applications of the finite mixture model have increasingly been considered in many fields where data do not follow a single specific distribution. This can result from mixing several subpopulations in unknown proportions to form one large population; consequently, the commonly available distributions cannot be fitted to the data under study. Mixture models can be fitted to data from many fields, such as engineering, the physical sciences, the biological sciences, and the chemical sciences.

A mixture of normal and Laplace distributions has been used to model wind shear data [1, 2]. Crime and justice data have also featured among the applications of the mixture model [3]. Failure times can be modeled by a mixture of exponential distributions, in which the whole population is divided into several subpopulations [4]. In studies of image compression and pattern recognition, the Gaussian mixture model is used to improve the likelihood of the model by split-and-merge operations, depending of course on the nature of the data and on which of its characteristics are to be inferred [5]. A mixture of t-distributions has been proposed to model data containing sets of observations with longer-than-normal tails [6]. In the blood sciences, the mixture model has also played an important role in classifying individuals based on density functions estimated for each individual; in this way, a classifier was built to assign new unlabeled histograms to normal and iron-deficient classes of red blood cells (RBCs) [7].

A mixture model can be built from densities belonging to different families of distributions or to the same family. Either way, it is useful as a predictive or confirmatory tool. The classical method for inferring the model is maximum likelihood (ML), which is used in a wide range of cases [8, 9]. The problematic issues, however, are the unknown number of components and the unknown observation memberships, which undermine the accuracy of estimation. These issues have been tackled by many techniques proposed over the years. Testing the number of components to determine the exact structure of the mixture model is one vital tool for dealing with this problem [10, 11]. In much the same direction, the most widely used way of handling these issues is to add a latent variable, which yields a complete data log-likelihood rather than the incomplete one [12, 13], and then to estimate the model parameters by the expectation-maximization (EM) algorithm [14–16].

Moreover, component indicator labels can be adopted to analyze dependent data by employing the hidden Markov model, which is considered an extension of the mixture model. This can clearly be seen in a study proposing a mixture of autoregressive models for dependent data [17] and in the modeling of nonlinear time series [18].

Bayesian inference has been conducted in many studies, in which the model is inferred by assuming prior information about the parameters and then using posterior simulation via Markov chain Monte Carlo (MCMC) methods [19, 20]. This framework of estimation can avoid many of the difficulties that arise in ML estimation [21–23].

In the lifetimes of certain engineering processes, some methods have been proposed that introduce mixtures of the inverse Rayleigh distribution. One of them models the heterogeneity existing in lifetime processes by Bayesian inference of the mixture model using noninformative (the Jeffreys and the uniform) and informative (gamma) priors, with censored data as a working example [24]. In the same direction, another method fits a 3-component mixture of inverse Rayleigh distributions from a Bayesian perspective under a censored sampling scheme, using noninformative and informative priors under different loss functions [25]. A generalized form of the inverse Rayleigh distribution, known as the exponentiated inverse Rayleigh distribution (EIRD), has also been introduced, providing a more flexible distribution for modeling life data [26].

In this paper, we derive the posterior distributions of a mixture of inverse Rayleigh distributions. We then estimate the model parameters using the Gibbs sampler and Metropolis–Hastings algorithms. The two algorithms are applied to this model with noncensored data for the first time. Classifying observations into their groups is also considered in this article, which had not been addressed in the literature at the date of this work. Thereafter, the accuracy of the model inference is examined by simulation based on different scenarios and numbers of components. The success rate of clustering is also calculated accordingly. The rest of the paper is organized as follows: Sections 2 and 3 give the theoretical details; in Section 4, the method is applied to simulated data using several scenarios; Section 5 presents an application to real data; and Section 6 gives the discussion and conclusion.

2. Methods

A mixture of inverse Rayleigh distributions is used to model a variety of data that come from two or more components. The data can be failure times or working times of equipment; therefore, inference about the model parameters is vital as far as maintenance is concerned. The probability density function of the inverse Rayleigh distribution is written as follows:

f(x; \theta) = \frac{2\theta}{x^{3}} \exp\left(-\frac{\theta}{x^{2}}\right), \quad x > 0, \; \theta > 0. \quad (1)

In many lifetime scenarios, the data can come from more than one component, in which case the known distributions become inaccurate for fitting the data. The mixture model can then be used to handle the problem. When it comes to lifetime data, a finite mixture of the inverse Rayleigh distribution is used to model the data [27]. The probability density function (p.d.f.) of the mixture of inverse Rayleigh distributions with k components can be written as follows:

f(x; \Theta) = \sum_{j=1}^{k} p_{j}\, f(x; \theta_{j}), \quad (2)

where \sum_{j=1}^{k} p_{j} = 1 and 0 < p_{j} < 1.
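As a concrete illustration of equations (1) and (2), the following minimal sketch (in Python with NumPy; the function names and the inverse-CDF sampling route are our own choices, not part of the paper) evaluates the component density and draws labeled observations from the mixture. It relies on the closed-form CDF F(x; \theta) = \exp(-\theta/x^{2}), so that x = \sqrt{-\theta/\ln u} for u \sim U(0, 1).

```python
import numpy as np

def inv_rayleigh_pdf(x, theta):
    """Density of equation (1): f(x; theta) = (2*theta/x**3) * exp(-theta/x**2)."""
    x = np.asarray(x, dtype=float)
    return (2.0 * theta / x**3) * np.exp(-theta / x**2)

def sample_mixture(n, weights, thetas, rng):
    """Draw n observations (and their true labels) from the k-component
    mixture of equation (2), using inverse-CDF sampling per component."""
    weights, thetas = np.asarray(weights), np.asarray(thetas)
    z = rng.choice(len(weights), size=n, p=weights)  # component labels
    u = rng.uniform(size=n)
    return np.sqrt(-thetas[z] / np.log(u)), z
```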

3. Inferential Method

The traditional way of estimating the parameters of a mixture model is by maximizing the likelihood function of the model. The incomplete data likelihood function of the k-component mixture model is defined as follows:

L(\Theta \mid x) = \prod_{i=1}^{n} \sum_{j=1}^{k} p_{j}\, f(x_{i}; \theta_{j}), \quad (3)

where \Theta = (p_{1}, \ldots, p_{k}, \theta_{1}, \ldots, \theta_{k}).

To this end, the unknown membership of each observation is considered as a hidden state, defined by a latent vector z_{i} = (z_{i1}, \ldots, z_{ik}), where i = 1, \ldots, n. Here, z_{ij} is defined as follows:

z_{ij} = \begin{cases} 1, & \text{if observation } x_{i} \text{ belongs to component } j, \\ 0, & \text{otherwise.} \end{cases}

Hence, the complete data likelihood function is written as follows:

L_{c}(\Theta \mid x, z) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[ p_{j}\, f(x_{i}; \theta_{j}) \right]^{z_{ij}}. \quad (4)

Let A_{j} represent the set of all items of component j, where A_{j} = \{ i : z_{ij} = 1 \} and n_{j} = \sum_{i=1}^{n} z_{ij} denotes its size.

Simplifying equation (4) leads to

L_{c}(\Theta \mid x, z) = \prod_{j=1}^{k} p_{j}^{n_{j}}\, (2\theta_{j})^{n_{j}} \left( \prod_{i \in A_{j}} x_{i}^{-3} \right) \exp\left( -\theta_{j} \sum_{i \in A_{j}} x_{i}^{-2} \right), \quad (5)

where n_{j} = |A_{j}|.
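To make the simplification concrete, here is a small helper (Python with NumPy; the function name and vectorization are our own) that evaluates the logarithm of equation (5) for a given allocation z:

```python
import numpy as np

def complete_data_loglik(x, z, p, theta):
    """Log of equation (5): sum over j of
    n_j*log(p_j) + n_j*log(2*theta_j) - 3*sum_{A_j} log(x_i)
    - theta_j * sum_{A_j} x_i^{-2}."""
    ll = 0.0
    for j in range(len(p)):
        xj = x[z == j]                     # items allocated to component j
        n_j = len(xj)
        ll += (n_j * np.log(p[j]) + n_j * np.log(2.0 * theta[j])
               - 3.0 * np.log(xj).sum() - theta[j] * np.sum(1.0 / xj**2))
    return ll
```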

Many methods have been developed to estimate the parameters in equation (5). The most popular is the expectation-maximization algorithm, which is used with some mixture models [28]. In this paper, a Bayesian framework is developed to estimate the model parameters; proper priors are used to derive the posterior distributions of the parameters. The next subsection gives the details of this method.

3.1. Bayesian Framework

To apply Bayesian techniques, we have to consider proper priors for the parameters. To do so, let \theta_{j} \sim \text{Gamma}(a_{j}, b_{j}), whose probability density function is written as follows:

\pi(\theta_{j}) = \frac{b_{j}^{a_{j}}}{\Gamma(a_{j})}\, \theta_{j}^{a_{j}-1} e^{-b_{j}\theta_{j}}, \quad \theta_{j} > 0. \quad (6)

We also let p = (p_{1}, \ldots, p_{k}) \sim \text{Dirichlet}(e_{1}, \ldots, e_{k}), whose probability density function is written as follows:

\pi(p) = \frac{\Gamma\left(\sum_{j=1}^{k} e_{j}\right)}{\prod_{j=1}^{k} \Gamma(e_{j})} \prod_{j=1}^{k} p_{j}^{e_{j}-1}. \quad (10)

The joint probability density function is formulated by multiplying equations (5), (6), and (10), which results in

\pi(\Theta, z \mid x) \propto L_{c}(\Theta \mid x, z) \left[ \prod_{j=1}^{k} \pi(\theta_{j}) \right] \pi(p). \quad (11)

Simplifying the above equation leads to

\pi(\Theta, z \mid x) \propto \prod_{j=1}^{k} p_{j}^{e_{j}+n_{j}-1}\, \theta_{j}^{a_{j}+n_{j}-1} \exp\left[ -\theta_{j} \left( b_{j} + \sum_{i \in A_{j}} x_{i}^{-2} \right) \right]. \quad (12)

Hence, the posterior distribution of \theta_{j} is

\theta_{j} \mid x, z \sim \text{Gamma}\left( a_{j} + n_{j},\; b_{j} + \sum_{i \in A_{j}} x_{i}^{-2} \right). \quad (13)

In the same way, the posterior distribution of p is as follows:

p \mid z \sim \text{Dirichlet}(e_{1} + n_{1}, \ldots, e_{k} + n_{k}). \quad (14)

The probability that item i belongs to component j can be determined by the formula below:

\tau_{ij} = \frac{p_{j}\, f(x_{i}; \theta_{j})}{\sum_{l=1}^{k} p_{l}\, f(x_{i}; \theta_{l})}, \quad (15)

where p_{j} and \theta_{j} can be calculated by the mathematical expectations of the posterior distributions. The observation membership z_{i} is then generated from the multinomial distribution according to \tau_{i} = (\tau_{i1}, \ldots, \tau_{ik}).

The count n_{j} is also updated according to the estimated z; a sketch of this computation in code is given below.
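The following minimal sketch (Python with NumPy; the function names are ours, and the density is the parameterization assumed in equation (1)) computes the classification probabilities of equation (15) and draws the memberships z_{i}:

```python
import numpy as np

def responsibilities(x, p, theta):
    """tau[i, j] = p_j * f(x_i; theta_j) / sum_l p_l * f(x_i; theta_l),
    with f the inverse Rayleigh density of equation (1)."""
    x = np.asarray(x, dtype=float)[:, None]               # shape (n, 1)
    dens = (2.0 * theta / x**3) * np.exp(-theta / x**2)   # shape (n, k)
    tau = p * dens
    return tau / tau.sum(axis=1, keepdims=True)

def draw_memberships(tau, rng):
    """Sample z_i from Mult(1; tau_i1, ..., tau_ik) for every observation."""
    u = rng.uniform(size=(tau.shape[0], 1))
    return (tau.cumsum(axis=1) > u).argmax(axis=1)
```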

Note that the above framework is based on a known allocation z. Therefore, we use one of the well-known clustering algorithms to obtain a possible number of mixture components and their item memberships. For this framework, we used the K-means algorithm [29] to determine the initial allocation for each candidate k = 1, 2, \ldots, K. Afterward, we assessed the appropriate number of components by calculating the Bayesian information criterion [13] according to the formula

\text{BIC} = -2 \ln \hat{L} + d \ln n, \quad (16)

where \hat{L} is the maximized likelihood of the fitted model, d is the number of free parameters, and n is the sample size.
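A minimal sketch of this model-selection step, assuming scikit-learn's KMeans for the initial allocation and simple plug-in estimates (the per-cluster maximum likelihood estimate \hat{\theta}_{j} = n_{j} / \sum_{i \in A_{j}} x_{i}^{-2} under equation (1), and \hat{p}_{j} = n_{j}/n) rather than the paper's exact procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

def bic_for_k(x, k):
    """BIC = -2*loglik + d*log(n) for a k-component inverse Rayleigh mixture,
    with allocations from K-means and plug-in parameter estimates."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    labels = KMeans(n_clusters=k, n_init=10).fit(x.reshape(-1, 1)).labels_
    n_j = np.bincount(labels, minlength=k)
    p_hat = n_j / n
    S_j = np.bincount(labels, weights=1.0 / x**2, minlength=k)
    theta = n_j / S_j                       # per-cluster MLE under equation (1)
    dens = p_hat * (2.0 * theta / x[:, None]**3) * np.exp(-np.outer(1.0 / x**2, theta))
    loglik = np.log(dens.sum(axis=1)).sum()
    d = 2 * k - 1                           # (k - 1) free weights + k thetas
    return -2.0 * loglik + d * np.log(n)

# Choose the number of components as the k minimizing the BIC, e.g.:
# k_hat = min(range(1, 6), key=lambda k: bic_for_k(x, k))
```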

Observation memberships can be updated after obtaining the Bayesian estimators by using \tau_{ij} as mentioned above. Having updated the observation memberships and their components, the model parameters can be evaluated by the mathematical expectations of the posterior distributions in equations (13) and (14). Of course, this one-pass framework can be inaccurate in terms of finding the exact memberships of observations on which it depends. Therefore, we solved the problem by employing two iterative algorithms.

3.2. Gibbs Sampler

The Bayesian framework for estimating the parameters of a mixture of inverse Rayleigh distributions is implemented by employing the Gibbs sampler, a Markov chain Monte Carlo (MCMC) scheme. It estimates the augmented parameter (\Theta, z) by sampling from the complete data posterior distribution \pi(\Theta, z \mid x), which is proportional to the expression in equation (12). The algorithm starts with some initial classification z^{(0)} and fixed hyperparameters (a_{j}, b_{j}, e_{j}), where j = 1, \ldots, k, and sets m = 1. Then, the following steps are repeated for m = 1, 2, \ldots, M (a sketch in code follows the list):

(a) Conditional on the classification z^{(m-1)}, sample the weight parameters from their posterior, the Dirichlet distribution of equation (14), where n_{j} = \sum_{i=1}^{n} z_{ij}^{(m-1)}.

(b) Sample the component parameters from their posterior, \theta_{j}^{(m)} \sim \text{Gamma}(a_{j} + n_{j},\; b_{j} + \sum_{i \in A_{j}} x_{i}^{-2}), as in equation (13).

(c) Calculate \tau_{ij} by equation (15).

(d) Update the classification of each observation conditional on \tau by sampling z_{i}^{(m)} independently for each i from the conditional distribution \text{Mult}(1; \tau_{i1}, \ldots, \tau_{ik}), which leads to the new classification z^{(m)}.

Increase m by one and return to step (a).
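A minimal, self-contained sketch of this sampler (Python with NumPy; the hyperparameter values a, b, e, the number of iterations, and the crude random initialization are placeholder choices, not the paper's settings):

```python
import numpy as np

def gibbs_sampler(x, k, n_iter=5000, a=1.0, b=1.0, e=1.0, seed=0):
    """Gibbs sampler for a k-component inverse Rayleigh mixture, assuming
    f(x; theta) = (2*theta/x**3)*exp(-theta/x**2), Gamma(a, b) priors (rate b)
    on each theta_j, and a symmetric Dirichlet(e, ..., e) prior on weights."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = 1.0 / x**2                          # sufficient statistics x_i^{-2}
    z = rng.integers(k, size=n)             # crude initial classification
    p_draws, theta_draws = [], []
    for _ in range(n_iter):
        n_j = np.bincount(z, minlength=k)
        # (a) weights | z ~ Dirichlet(e + n_1, ..., e + n_k)    [equation (14)]
        p = rng.dirichlet(e + n_j)
        # (b) theta_j | z, x ~ Gamma(a + n_j, rate b + sum_{A_j} x_i^{-2})  [(13)]
        rate = b + np.bincount(z, weights=s, minlength=k)
        theta = rng.gamma(a + n_j, 1.0 / rate)   # NumPy uses a scale parameter
        # (c) tau_ij proportional to p_j * f(x_i; theta_j)      [equation (15)]
        tau = p * (2.0 * theta / x[:, None]**3) * np.exp(-np.outer(s, theta))
        tau /= tau.sum(axis=1, keepdims=True)
        # (d) z_i ~ Mult(1; tau_i1, ..., tau_ik), drawn independently per i
        z = (tau.cumsum(axis=1) > rng.uniform(size=(n, 1))).argmax(axis=1)
        p_draws.append(p)
        theta_draws.append(theta)
    return np.array(p_draws), np.array(theta_draws)
```

Posterior point estimates can then be taken as the means of the retained draws, after discarding a burn-in portion of the chain.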

3.3. Metropolis–Hastings

A Metropolis–Hastings algorithm is an important tool for generating a sample from the mixture posterior distribution (Celeux et al. [22]; Brooks [23]). Here, the posterior distributions of \theta_{j} and p shown in equations (13) and (14) will be used as targets. The algorithm starts with some initial classification z^{(0)}, some initial values \theta^{(0)} and p^{(0)}, and fixed hyperparameters (a_{j}, b_{j}, e_{j}), where j = 1, \ldots, k, and sets m = 1. We then repeat the following steps for m = 1, 2, \ldots, M (a sketch of the \theta update in code follows the list):

(a) (i) Propose a new parameter \theta_{j}^{*} by sampling from a proposal density q_{1}(\theta_{j}^{*} \mid \theta_{j}^{(m-1)}); here, q_{1} is a normal density centered at the current value. (ii) Propose a new parameter p^{*} = (p_{1}^{*}, \ldots, p_{k}^{*}), with \sum_{j=1}^{k} p_{j}^{*} = 1, by sampling from a proposal density q_{2}(p^{*} \mid p^{(m-1)}).

(b) (i) Move the sampler to \theta_{j}^{*} with probability

\alpha_{1} = \min\left\{ 1,\; \frac{\pi(\theta_{j}^{*} \mid x, z)\, q_{1}(\theta_{j}^{(m-1)} \mid \theta_{j}^{*})}{\pi(\theta_{j}^{(m-1)} \mid x, z)\, q_{1}(\theta_{j}^{*} \mid \theta_{j}^{(m-1)})} \right\}.

If u_{1} \le \alpha_{1}, where u_{1} is a random number from the uniform distribution U(0, 1), then accept and set \theta_{j}^{(m)} = \theta_{j}^{*}; otherwise, reject and set \theta_{j}^{(m)} = \theta_{j}^{(m-1)}. Note that \pi(\theta_{j} \mid x, z) is the density of the posterior distribution shown in equation (13). (ii) Move the sampler to p^{*} with probability

\alpha_{2} = \min\left\{ 1,\; \frac{\pi(p^{*} \mid z)\, q_{2}(p^{(m-1)} \mid p^{*})}{\pi(p^{(m-1)} \mid z)\, q_{2}(p^{*} \mid p^{(m-1)})} \right\}.

If u_{2} \le \alpha_{2}, where u_{2} is a random number from the uniform distribution U(0, 1), then accept and set p^{(m)} = p^{*}; otherwise, reject and set p^{(m)} = p^{(m-1)}. Note that \pi(p \mid z) is the density of the posterior distribution shown in equation (14).

(c) Calculate \tau_{ij} by equation (15).

(d) Update the classification of each observation conditional on \tau by sampling z_{i}^{(m)} independently for each i from the multinomial distribution \text{Mult}(1; \tau_{i1}, \ldots, \tau_{ik}), which leads to the new classification z^{(m)}.

Increase m by one and return to step (a).
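A minimal sketch of one random-walk update of a component parameter (Python with NumPy and SciPy; the function name and the proposal scale are placeholder choices). Because the normal proposal is symmetric, the q_{1} terms cancel in \alpha_{1}:

```python
import numpy as np
from scipy.stats import gamma

def mh_update_theta(theta_j, n_j, S_j, a=1.0, b=1.0, step=0.5, rng=None):
    """One random-walk Metropolis-Hastings update of theta_j, targeting its
    conditional posterior Gamma(a + n_j, rate b + S_j) of equation (13),
    where S_j = sum_{i in A_j} x_i^{-2}; `step` is a tuning constant."""
    if rng is None:
        rng = np.random.default_rng()
    prop = rng.normal(theta_j, step)          # symmetric normal proposal q1
    if prop <= 0:                             # outside the support:
        return theta_j                        # posterior density is zero
    log_target = lambda t: gamma.logpdf(t, a + n_j, scale=1.0 / (b + S_j))
    log_alpha = log_target(prop) - log_target(theta_j)   # q1 terms cancel
    if np.log(rng.uniform()) < log_alpha:     # accept with probability alpha_1
        return prop
    return theta_j
```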

4. Simulations

To apply the proposed methods, data were generated under various cases. To save time and space, we generated the data according to two-component and three-component mixtures of inverse Rayleigh distributions. Several scenarios were used, each generating data from predefined parameter values (p_{j}, \theta_{j}) for the two-component model and for the three-component model, over a range of sample sizes.
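For instance, reusing sample_mixture from the sketch in Section 2, one replication per scenario could be generated as follows. The weights and component parameters below are illustrative placeholders, not the paper's scenario values; the sample sizes 90, 120, and 200 are those mentioned later in Section 6:

```python
import numpy as np

rng = np.random.default_rng(42)
for n in (90, 120, 200):
    # two-component scenario (placeholder weights and thetas)
    x2, z2 = sample_mixture(n, weights=[0.4, 0.6], thetas=[1.0, 5.0], rng=rng)
    # three-component scenario (placeholder weights and thetas)
    x3, z3 = sample_mixture(n, weights=[0.3, 0.3, 0.4], thetas=[0.5, 2.0, 8.0], rng=rng)
```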

A comparison was made by calculating the average mean square error (AMSE) of the estimated model and the average classification success rate (ACSR) over R replications, using the following formulas.

The average of the MSE over all replications is computed as

\text{AMSE} = \frac{1}{R} \sum_{r=1}^{R} \text{MSE}_{r},

where \text{MSE}_{r} is the MSE of the model on replication r. For calculating the ACSR, we let \text{CSR}_{r} be the number of items with correctly estimated membership divided by the sample size on replication r, so that

\text{ACSR} = \frac{1}{R} \sum_{r=1}^{R} \text{CSR}_{r},

where R is the number of replications.
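A minimal sketch of these two metrics (Python with NumPy; function names are ours, and any label-switching correction is assumed to have been applied before computing the CSR):

```python
import numpy as np

def amse(estimates, truth):
    """Average over R replications of the per-replication mean square error
    of the parameter vector; `estimates` has shape (R, n_params)."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(np.mean((estimates - truth)**2, axis=1)))

def acsr(z_hat_list, z_true_list):
    """Average classification success rate over R replications; CSR_r is the
    fraction of items whose estimated membership matches the true one."""
    rates = [np.mean(np.asarray(zh) == np.asarray(zt))
             for zh, zt in zip(z_hat_list, z_true_list)]
    return float(np.mean(rates))
```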

The results in Table 1 show that the two methods perform well in terms of the AMSE. When it comes to the ACSR, however, the Gibbs sampler shows high accuracy in classifying items to their true components, especially when the difference between the component parameters is large enough, whereas the MH method somewhat outperforms it when the component parameters are close in value. The graphs of the p.d.f. of the model with respect to the true parameters and the estimators are shown in Figure 1. It can be seen that as the sample size increased, the accuracy of the estimation increased, as is obvious from the graphs, where the estimated curve becomes close to the true one.

The results in Table 2 show that all methods perform well in terms of the AMSE. When it comes to the ACSR, however, the Gibbs sampler shows high accuracy in classifying items to their true components, especially when the difference between the component parameters is large enough, whereas the MH method somewhat outperforms it when the component parameters are close in value. The graphs of the p.d.f. of the model with respect to the true parameters and the estimators are shown in Figure 2. It can be seen that as the sample size increased, the accuracy of the estimation increased, as is obvious from the graphs, where the estimated curve becomes close to the true one.

5. Example on Real Data

The two methods proposed in this study were applied to one type of elevator, known as Arkel. The data represent the failure times, in days, of two elevators of this type in two 8-story buildings in Baghdad. The data of each elevator were labeled so that one can recognize which elevator each observation belongs to. We then mixed the observations and assumed that the true allocation of the model is unknown. We assessed the number of components by applying the BIC. It can be seen from Table 3 that the lowest BIC value is 1168.299, which means that the observations belong to two groups. Afterward, we inferred the mixture model by applying the two methods and calculated the classification error rate for both. Table 4 shows the estimators and the classification error rate of each method. The results were close to each other, with an acceptable classification error rate. The graph of the estimated p.d.f. is also shown in Figure 3.

6. Conclusion and Discussion

We employed two iterative algorithms to infer the mixture of inverse Rayleigh distributions using conjugate priors. The novelty of this work lies in deriving the estimator formulas under the Bayesian framework for a finite mixture of inverse Rayleigh distributions, which had not previously been done in the literature. The technique can be applied to any number of components of the finite mixture of inverse Rayleigh distributions. From this study, we conclude that using proper priors in Bayesian methods can result in accurate estimators and lead to good classification of observations into their correct clusters.

The main aim of this work was to derive the mathematical formulas of the model estimators for k components. We demonstrated these formulas in the simulation part by taking two and three components as examples. Of course, the method can be applied for any value of k, but as k increases, larger sample sizes are needed, which greatly increases the running time of the code and would require more tables, figures, and space.

In each scenario, we simulated a completely different data set and replicated this 1000 times. Therefore, the AMSE is not required to decrease relative to the previous, smaller sample size, because the algorithm is run on different data sets; nevertheless, this was achieved in the results, which means that the applications were precise as far as the estimators are concerned. Regarding the figures, in comparing the estimated p.d.f. curve to the one with the true parameter values, we consider the most important criterion to be obtaining similar shapes. Obtaining closer curves is also difficult because we simulated different data sets for the different sample sizes, which we believe is more justified than generating one large sample and drawing subsamples of sizes 90, 120, and 200 from it; the latter would of course yield closer curves as the sample size increased. Note that the choice of sample sizes is arbitrary. When the sample size is small, for example n = 30, the accuracy of the estimators decreases; for instance, if k = 3 and n = 30, one of the components may contain only a few observations, which makes estimating the component parameters inaccurate.

Data Availability

The author has no objection to the real data analyzed or generated during the study being publicly archived.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research did not receive specific funding but was performed as part of the author's employment at the College of Administration and Economics, University of Baghdad.