Abstract

This article applies a 4-component Rayleigh mixture model (4-CRMM), in a Bayesian framework, to wind speed data from four coastal areas of Baluchistan: Gawadar, Jiwani, Ormara, and Pasni. A type I right censoring scheme is used because it is popular in reliability theory and survival analysis. To accomplish this objective, the Bayes estimates (BEs) of the parameters of the mixture model, along with their posterior risks (PRs), are obtained using an informative prior (IP) and a noninformative prior (NIP). The hyperparameters are elicited using the prior predictive method. The BEs are calculated under two distinct loss functions: the squared error loss function (SELF) and the modified squared error loss function (MSELF). The statistical properties and performance of the BEs under these loss functions are also evaluated through a simulation study for different sample sizes.

1. Introduction

Wind plays an important role in shaping climate and weather on our planet. It arises from the uneven heating of the Earth's surface by the sun. Wind is an abundant and clean resource that can be used to generate electricity. Because of growing power demand and the depletion of fossil fuels, wind plays a vital role in power supply; consequently, the use of wind energy is expanding, and it is increasingly becoming a new source of power generation. Wind is widely available all over the world. China, the USA, India, Spain, and Germany are among the leading producers of wind energy, and many other countries have also adopted new and resourceful policies for wind power, Azhar et al. [1]. Pakistan is also playing a role in advancing wind energy, as it has the capability to generate electricity from wind. According to World Energy Statistics, published by the IEA, Pakistan's per capita electricity consumption is one-sixth of the world average, and 40% of Pakistanis still have no access to electricity [2]. Using wind power effectively requires strong winds, and developing wind energy requires clear knowledge of the wind resource, including the site location, turbine performance and condition, the physical impacts of turbulence, and energy extraction. Coastal belts are normally considered favorable for consistent wind resources, and the geological structure, climate, and geographical position of Pakistan favor a large wind potential, Azhar et al. [1]. Harijan et al. [3] estimated the theoretical and technical potential of the wind resource along the coastal belt of Baluchistan from existing wind data, taking into account the population density in high-wind areas.
The Pakistan Meteorological Department plays a vital role in maintaining a nationwide network of regular meteorological stations, which record wind data used for research.

Because wind varies in space and time, a regional statistical study is needed to model wind speed data. Many probability distributions have been fitted to wind data, and the Rayleigh distribution has proven to be an important candidate for modeling wind speed. Kiss and Jánosi [4] applied the Rayleigh, Weibull, and gamma distributions to model wind speeds over both land and sea. Abbas et al. [5] fitted the two-parameter gamma, Weibull, lognormal, and Rayleigh distributions and the three-parameter Burr and Fréchet distributions to wind data and compared them using different goodness-of-fit tests. Azhar et al. [1] analyzed wind data from the coastal region of Baluchistan using the Weibull and Rayleigh models.

A combination of different probability distributions is called a mixture model; it is used to represent a statistical population consisting of subpopulations. In recent years, owing to rapidly growing computational techniques, applications of mixture models in various fields have been spreading. Mixture models formed by using the same functional form for each component are known as type I mixture models, while those consisting of different probability distributions are known as type II mixture models. For convenience and simplicity, type I mixture models are used more frequently. Several authors have applied mixture modeling in various practical situations. Harris [6] applied mixture distributions to model crime and justice data. Jones and McLachlan [7] fitted a mixture of normal and Laplace distributions to wind shear data. Applications of mixture models in many fields have inspired researchers to perform classical and Bayesian analyses of two- and three-component mixtures. Noor and Aslam [8] presented Bayesian inference for the inverse Weibull mixture distribution using type I censoring. Noor et al. [9] analyzed a mixture model formed by mixing the Rayleigh and Burr XII distributions in a Bayesian setup. Aslam et al. [10] discussed the 3-component mixture of Rayleigh distributions, its properties, and its estimation in the Bayesian framework. Because these applications mainly focus on two- or three-component mixture models, we extend the field by presenting a four-component mixture model of the Rayleigh distribution. Although many authors have modeled wind data using different probability models, such data have not been analyzed using mixture models, which are more flexible than single probability distributions.
Accordingly, the proposed mixture model is applied to wind speed data from four coastal areas of Baluchistan (Gawadar, Jiwani, Ormara, and Pasni), because coastal areas are a good, competitive source of wind that can be converted into energy. The parameters of the component distributions are assumed to be unknown, and two different priors and two distinct loss functions are used for the Bayesian analysis.

The rest of the paper is organized as follows. Section 2 derives the 4-CRMM, its likelihood function, and the posterior distributions under both the NIP and the IP; it also presents the elicitation of hyperparameters and the expressions for the Bayes estimators and posterior risks. Section 3 (Results and Discussions) presents the simulation study and the real data application. Finally, Section 4 concludes the study.

2. Materials and Methods

2.1. Four-Component Mixture of Rayleigh Distribution

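If Y is a Rayleigh distributed random variable with parameter $\theta$, its probability density function (PDF) and cumulative distribution function (CDF) are, respectively,

$$f(y;\theta)=\frac{y}{\theta^{2}}\exp\left(-\frac{y^{2}}{2\theta^{2}}\right)\quad\text{and}\quad F(y;\theta)=1-\exp\left(-\frac{y^{2}}{2\theta^{2}}\right),$$

where $y\ge 0$ and $\theta>0$ is the scale parameter of the distribution.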

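A finite 4-CRMM with unknown mixing proportions $p_1$, $p_2$, $p_3$, and $p_4=1-p_1-p_2-p_3$ is defined as

$$f(y;\boldsymbol\theta,\mathbf p)=\sum_{j=1}^{4}p_j f_j(y;\theta_j)=\sum_{j=1}^{4}p_j\,\frac{y}{\theta_j^{2}}\exp\left(-\frac{y^{2}}{2\theta_j^{2}}\right),\qquad y\ge 0,$$

where $0<p_j<1$ and $f_j$ is the Rayleigh PDF with scale parameter $\theta_j$.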

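The CDF of the 4-CRMM is given by

$$F(y;\boldsymbol\theta,\mathbf p)=\sum_{j=1}^{4}p_jF_j(y;\theta_j)=1-\sum_{j=1}^{4}p_j\exp\left(-\frac{y^{2}}{2\theta_j^{2}}\right).$$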
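A minimal Python sketch of the 4-CRMM density and CDF, assuming the scale parameterization $f(y;\theta)=(y/\theta^2)\exp\{-y^2/(2\theta^2)\}$ written above. The function names are hypothetical, and any component scales or proportions passed to them in an example are placeholders rather than values from the paper.

```python
import math

def rayleigh_pdf(y, theta):
    """Rayleigh density with scale theta: (y / theta^2) * exp(-y^2 / (2 theta^2))."""
    return (y / theta**2) * math.exp(-y * y / (2 * theta**2)) if y >= 0 else 0.0

def rayleigh_cdf(y, theta):
    """Rayleigh CDF: 1 - exp(-y^2 / (2 theta^2)) for y >= 0."""
    return 1.0 - math.exp(-y * y / (2 * theta**2)) if y >= 0 else 0.0

def crmm_pdf(y, theta, p):
    """4-CRMM density sum_j p_j f(y; theta_j); p holds all four proportions (sum 1)."""
    return sum(pj * rayleigh_pdf(y, tj) for pj, tj in zip(p, theta))

def crmm_cdf(y, theta, p):
    """4-CRMM CDF sum_j p_j F(y; theta_j)."""
    return sum(pj * rayleigh_cdf(y, tj) for pj, tj in zip(p, theta))
```

Integrating `crmm_pdf` numerically over $[0, y]$ should reproduce `crmm_cdf(y, ...)`, which is a quick way to check an implementation of the mixture.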

Figures 1 and 2 depict the PDF and CDF of the 4-CRMM for different values of the component parameters.

Figure 1 shows that the PDF of the 4-CRMM is positively skewed for different parameter values, so the model can be considered suitable for phenomena that follow this pattern, such as wind speed and flood extremes.

2.2. The Posterior Distribution Using Noninformative and Informative Priors

We use the uniform prior (UP) as a noninformative prior (NIP) and the square root inverted gamma (SRIG) prior as an informative prior (IP) for determining the posterior distribution. The SRIG prior has a functional form compatible with the Rayleigh likelihood and is commonly used as an IP; see, for example, [11].

2.3. The Likelihood Function

Suppose a life-testing experiment is performed on the 4-CRMM with n units, and the test is terminated at a predetermined time t. Let k of the n units fail before t, while the remaining n − k units are still working at t and are therefore censored. Of the k failures, $r_1$, $r_2$, $r_3$, and $r_4$ belong to subpopulations I, II, III, and IV, respectively, so the uncensored observations arise from four different failure causes and $k=r_1+r_2+r_3+r_4$. Let $y_{ji}$, with $0<y_{ji}\le t$, denote the observed failure time of the $i$th unit belonging to the $j$th subpopulation, where $j=1,2,3,4$ and $i=1,2,\ldots,r_j$.

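The likelihood function of the 4-CRMM for type I right censored data (see Everitt and Hand [12]) is

$$L(\boldsymbol\theta,\mathbf p\mid\mathbf y)\propto\left\{\prod_{j=1}^{4}\prod_{i=1}^{r_j}p_j\,f_j(y_{ji};\theta_j)\right\}\{1-F(t)\}^{n-k},$$

where each of the $n-k$ censored units contributes the survival probability $1-F(t)$ at the termination time.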

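Since $1-F(t)=\sum_{j=1}^{4}p_j\exp\{-t^{2}/(2\theta_j^{2})\}$, expanding $\{1-F(t)\}^{n-k}$ by the multinomial theorem and simplifying, the likelihood function of the 4-CRMM becomes

$$L(\boldsymbol\theta,\mathbf p\mid\mathbf y)\propto\sum_{m_1+\cdots+m_4=n-k}\binom{n-k}{m_1,\ldots,m_4}\prod_{j=1}^{4}p_j^{\,r_j+m_j}\,\theta_j^{-2r_j}\exp\left(-\frac{S_j+m_jt^{2}}{2\theta_j^{2}}\right),$$

where $y_{ji}$ are the recorded failure times for the uncensored observations, $S_j=\sum_{i=1}^{r_j}y_{ji}^{2}$, $k=r_1+r_2+r_3+r_4$, and the sum runs over all nonnegative integers $m_1,\ldots,m_4$ with $m_1+\cdots+m_4=n-k$.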
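The following sketch evaluates the type I censored log-likelihood directly and, for a small number of censored units, checks it against the multinomial-expanded form (which omits the constant factor $\prod y_{ji}$). It assumes the Rayleigh scale parameterization used above; the helper names and any toy failure times used with them are hypothetical.

```python
import math

def crmm_loglik(theta, p, y_by_comp, n, t):
    """Full log-likelihood (including log y_ji terms) of a type I censored 4-CRMM sample.

    y_by_comp[j] holds the failure times y_ji <= t of subpopulation j;
    n is the number of units on test and t the fixed termination time.
    """
    k = sum(len(ys) for ys in y_by_comp)                  # total number of failures
    ll = 0.0
    for pj, tj, ys in zip(p, theta, y_by_comp):
        for y in ys:
            # log of p_j * (y / theta_j^2) * exp(-y^2 / (2 theta_j^2))
            ll += math.log(pj) + math.log(y) - 2 * math.log(tj) - y * y / (2 * tj * tj)
    # each of the n - k censored units contributes log{1 - F(t)}
    surv = sum(pj * math.exp(-t * t / (2 * tj * tj)) for pj, tj in zip(p, theta))
    return ll + (n - k) * math.log(surv)

def crmm_lik_expanded(theta, p, y_by_comp, n, t):
    """Same likelihood without the prod(y_ji) factor, via the multinomial expansion."""
    r = [len(ys) for ys in y_by_comp]
    S = [sum(y * y for y in ys) for ys in y_by_comp]
    N = n - sum(r)                                        # number of censored units
    total = 0.0
    for m1 in range(N + 1):
        for m2 in range(N - m1 + 1):
            for m3 in range(N - m1 - m2 + 1):
                m = (m1, m2, m3, N - m1 - m2 - m3)
                coef = math.factorial(N)
                for mj in m:
                    coef //= math.factorial(mj)           # multinomial coefficient
                term = float(coef)
                for j in range(4):
                    term *= p[j] ** (r[j] + m[j]) * theta[j] ** (-2 * r[j])
                    term *= math.exp(-(S[j] + m[j] * t * t) / (2 * theta[j] ** 2))
                total += term
    return total
```

The expanded form is what makes the posterior tractable, because each term factorizes over the components.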

2.4. The Posterior Distribution Using Uniform Prior (UP)

The UP is a natural choice of NIP when little prior information is available. Laplace [13] and Geisser [14] proposed using a UP for unknown parameters. A UP for a parameter θ is denoted $p(\theta)\propto 1$. UPs over the intervals $(0,\infty)$ and $(0,1)$ are taken for the parameters $\theta_j$ of the Rayleigh components and for the mixing proportions $p_j$, respectively. Assuming the parameters to be independent, the joint prior distribution is

$$p(\boldsymbol\theta,\mathbf p)\propto 1,\qquad \theta_j>0\ (j=1,\ldots,4),\quad 0<p_j<1\ (j=1,2,3),\quad p_1+p_2+p_3<1.$$

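The joint posterior distribution of $(\boldsymbol\theta,\mathbf p)$ given the data y under the UP is

$$p(\boldsymbol\theta,\mathbf p\mid\mathbf y)=\frac{1}{\Omega}\sum_{m_1+\cdots+m_4=n-k}\binom{n-k}{m_1,\ldots,m_4}\prod_{j=1}^{4}p_j^{\,r_j+m_j}\,\theta_j^{-2r_j}\exp\left(-\frac{A_j}{\theta_j^{2}}\right),$$

where $A_j=(S_j+m_jt^{2})/2$, $S_j=\sum_{i=1}^{r_j}y_{ji}^{2}$, the sum runs over nonnegative integers with $m_1+\cdots+m_4=n-k$, and

$$\Omega=\sum_{m_1+\cdots+m_4=n-k}\binom{n-k}{m_1,\ldots,m_4}\frac{\prod_{j=1}^{4}\Gamma(r_j+m_j+1)}{\Gamma(n+4)}\prod_{j=1}^{4}\frac{\Gamma(r_j-\tfrac12)}{2A_j^{\,r_j-1/2}}.$$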
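Under the UP, every term of the expanded posterior factorizes into a Dirichlet kernel in $(p_1,\ldots,p_4)$ and independent kernels $\theta_j^{-2r_j}\exp(-A_j/\theta_j^2)$, each of which integrates in closed form via $\int_0^\infty\theta^{-2c-1}\exp(-A/\theta^2)\,d\theta=\Gamma(c)/(2A^c)$. The sketch below uses this identity to compute the posterior moments needed by the Bayes estimators. It is an illustration under these assumptions, not the authors' Mathematica code: it enumerates every $(m_1,\ldots,m_4)$, so it is only practical when $n-k$ is moderate, and it assumes $r_j\ge 2$ so that the first two moments of $\theta_j$ exist.

```python
import math

def up_posterior_moments(r, S, n, t):
    """Posterior E(theta_j), E(theta_j^2), E(p_j) under the uniform prior (sketch).

    r: four failure counts r_j (each assumed >= 2); S: four sums of squared
    failure times S_j; n: units on test; t: censoring (termination) time.
    """
    N = n - sum(r)                                        # censored units
    logw, terms = [], []
    for m1 in range(N + 1):
        for m2 in range(N - m1 + 1):
            for m3 in range(N - m1 - m2 + 1):
                m = (m1, m2, m3, N - m1 - m2 - m3)
                A = [(S[j] + m[j] * t * t) / 2 for j in range(4)]
                alpha = [r[j] + m[j] + 1 for j in range(4)]   # Dirichlet parameters
                lw = math.lgamma(N + 1) - sum(math.lgamma(mj + 1) for mj in m)
                lw += sum(math.lgamma(a) for a in alpha) - math.lgamma(n + 4)
                lw += sum(math.lgamma(r[j] - 0.5) - math.log(2)
                          - (r[j] - 0.5) * math.log(A[j]) for j in range(4))
                logw.append(lw)
                terms.append((A, alpha))
    top = max(logw)                                       # log-sum-exp stabilization
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    out = {"theta": [], "theta2": [], "p": []}
    for j in range(4):
        c1 = math.exp(math.lgamma(r[j] - 1.0) - math.lgamma(r[j] - 0.5))
        c2 = math.exp(math.lgamma(r[j] - 1.5) - math.lgamma(r[j] - 0.5))
        e1 = e2 = ep = 0.0
        for wi, (A, alpha) in zip(w, terms):
            e1 += wi * c1 * math.sqrt(A[j])               # E(theta_j | term)
            e2 += wi * c2 * A[j]                          # E(theta_j^2 | term)
            ep += wi * alpha[j] / (n + 4)                 # Dirichlet mean of p_j
        out["theta"].append(e1 / total)
        out["theta2"].append(e2 / total)
        out["p"].append(ep / total)
    return out
```

With these two moments, the SELF and MSELF estimators and risks follow directly.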

2.5. The Posterior Distribution Using SRIGP

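Bayesian estimation requires specifying prior distributions for the model parameters, and an informative prior may provide more efficient Bayes estimates with lower posterior risks. We use the square root inverted gamma prior (SRIGP) as an informative prior for the component parameters $\theta_1,\ldots,\theta_4$ of the 4-CRMM and a bivariate beta prior for the proportion parameters. Assuming independence of all parameters, the joint prior distribution of $(\boldsymbol\theta,\mathbf p)$ is

$$p(\boldsymbol\theta,\mathbf p)=\pi(\mathbf p)\prod_{j=1}^{4}\frac{2b_j^{a_j}}{\Gamma(a_j)}\,\theta_j^{-(2a_j+1)}\exp\left(-\frac{b_j}{\theta_j^{2}}\right),\qquad \theta_j>0,$$

where $a_j,b_j>0$ are the hyperparameters of the $j$th SRIG prior and $\pi(\mathbf p)$ denotes the prior density of the mixing proportions.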

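The joint posterior distribution of $(\boldsymbol\theta,\mathbf p)$ given the data y under the SRIGP is

$$p(\boldsymbol\theta,\mathbf p\mid\mathbf y)\propto L(\boldsymbol\theta,\mathbf p\mid\mathbf y)\,p(\boldsymbol\theta,\mathbf p).$$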

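Substituting the expanded likelihood, simplification leads to

$$p(\boldsymbol\theta,\mathbf p\mid\mathbf y)\propto\pi(\mathbf p)\sum_{m_1+\cdots+m_4=n-k}\binom{n-k}{m_1,\ldots,m_4}\prod_{j=1}^{4}p_j^{\,r_j+m_j}\,\theta_j^{-\{2(a_j+r_j)+1\}}\exp\left(-\frac{b_j+A_j}{\theta_j^{2}}\right),$$

where $A_j=(S_j+m_jt^{2})/2$ and $S_j=\sum_{i=1}^{r_j}y_{ji}^{2}$. Each term is therefore a product of SRIG kernels with updated hyperparameters $a_j+r_j$ and $b_j+A_j$.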
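A small sketch of the SRIG$(a,b)$ prior with density $2b^a\Gamma(a)^{-1}\theta^{-(2a+1)}\exp(-b/\theta^2)$. For a complete (uncensored) Rayleigh sample, the update $(a,b)\to(a+r,\,b+\sum y^2/2)$ is exactly conjugate; this is the same update that appears termwise in the censored posterior, with $S_j+m_jt^2$ in place of $\sum y^2$. The function names are hypothetical.

```python
import math

def srig_pdf(theta, a, b):
    """SRIG(a, b) density 2 b^a / Gamma(a) * theta^-(2a+1) * exp(-b / theta^2), on log scale."""
    return math.exp(math.log(2) + a * math.log(b) - math.lgamma(a)
                    - (2 * a + 1) * math.log(theta) - b / theta**2)

def srig_update(a, b, y):
    """Conjugate update for a complete Rayleigh sample y (no censoring)."""
    return a + len(y), b + sum(v * v for v in y) / 2

def srig_mean(a, b):
    """E(theta) = sqrt(b) * Gamma(a - 1/2) / Gamma(a), requires a > 1/2."""
    return math.sqrt(b) * math.exp(math.lgamma(a - 0.5) - math.lgamma(a))

def srig_second_moment(a, b):
    """E(theta^2) = b / (a - 1), requires a > 1."""
    return b / (a - 1)
```

These two moments are exactly what the SELF and MSELF formulas need for a single SRIG-distributed component.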

2.6. Elicitation of Hyperparameters

Elicitation is an important step in subjective Bayesian analysis: it is the process of specifying the prior distribution of the random parameters, that is, of quantifying prior information about them. Aslam [15] and Hanh [16] suggested different elicitation procedures based on the prior predictive distribution.

2.6.1. The Prior Predictive Distribution

The prior predictive distribution is the distribution of an unobserved data point. It is obtained by averaging the sampling density over the prior, so that the uncertainty in the parameters is integrated out:

$$p(y)=\int f(y\mid\boldsymbol\Theta)\,p(\boldsymbol\Theta)\,d\boldsymbol\Theta.$$

2.6.2. Elicitation of Hyperparameters Using SRIG Prior

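Under the SRIGP, the prior predictive distribution of a random variable Y is

$$p(y)=\int\!\!\int f(y\mid\boldsymbol\theta,\mathbf p)\,p(\boldsymbol\theta,\mathbf p)\,d\boldsymbol\theta\,d\mathbf p.$$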

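Substituting the 4-CRMM density and the SRIG prior into this expression and integrating each $\theta_j$ out, we get

$$p(y)=\sum_{j=1}^{4}E(p_j)\,\frac{a_j\,b_j^{a_j}\,y}{\left(b_j+y^{2}/2\right)^{a_j+1}},\qquad y>0,$$

where $a_j$ and $b_j$ are the SRIG hyperparameters of the $j$th component and $E(p_j)$ is the prior mean of the $j$th mixing proportion.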
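For a single Rayleigh component, integrating $\theta$ out against SRIG$(a,b)$ gives the closed-form CDF $P(Y\le y)=1-\{b/(b+y^2/2)\}^a$, which makes the interval probabilities used in elicitation easy to evaluate. The sketch below assumes the prior means $E(p_j)$ are supplied directly; how they depend on the proportion hyperparameters is not modeled here, and the function names are hypothetical.

```python
def srig_prior_predictive_pdf(y, a, b):
    """p(y) = a b^a y / (b + y^2/2)^(a+1): Rayleigh(theta) averaged over theta ~ SRIG(a, b)."""
    return a * b**a * y / (b + y * y / 2) ** (a + 1)

def srig_prior_predictive_cdf(y, a, b):
    """Closed-form CDF: P(Y <= y) = 1 - {b / (b + y^2/2)}^a."""
    return 1.0 - (b / (b + y * y / 2)) ** a

def mixture_prior_predictive_cdf(y, a, b, w):
    """4-CRMM version; w[j] is the prior mean E(p_j), assumed known here."""
    return sum(wj * srig_prior_predictive_cdf(y, aj, bj) for wj, aj, bj in zip(w, a, b))

def interval_probs(edges, cdf):
    """Probabilities of the consecutive intervals (edges[l-1], edges[l])."""
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]
```

Because the intervals are disjoint, the resulting probabilities always sum to at most one.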

Using this prior predictive distribution, we considered twelve intervals (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), and (11, 12) with probabilities 0.12, 0.26, 0.24, 0.15, 0.10, 0.05, 0.03, 0.02, 0.01, 0.06, 0.20, and 0.35, respectively, which express an expert's point of view. The following twelve equations are solved simultaneously in Mathematica to elicit the hyperparameters:

$$\int_{l-1}^{l}p(y)\,dy=\pi_l,\qquad l=1,2,\ldots,12,$$

where $\pi_l$ is the probability assigned to the interval $(l-1,l)$.

The elicited values of the twelve hyperparameters are 2.795, 2.641, 2.397, 2.346, 2.179, 2.026, 1.872, 1.718, 1.564, 1.410, 1.256, and 1.103, respectively.
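The paper solves the elicitation equations in Mathematica. As a rough, hypothetical stand-in, the sketch below fits $(a,b)$ for a single SRIG component by least squares over the interval probabilities. It uses a coarse-to-fine grid search in $(\log a,\log(b/a))$; the second coordinate mainly controls the scale, which reduces the correlation between the two search directions. This is not the authors' procedure and it handles one component only; targets for trying it should be generated from known values rather than taken from the paper.

```python
import math

def pp_cdf(y, a, b):
    """Prior predictive CDF of one Rayleigh component under SRIG(a, b)."""
    return 1.0 - (b / (b + y * y / 2)) ** a

def sq_error(a, b, edges, targets):
    """Sum of squared differences between implied and target interval probabilities."""
    probs = [pp_cdf(hi, a, b) - pp_cdf(lo, a, b) for lo, hi in zip(edges[:-1], edges[1:])]
    return sum((pr - tg) ** 2 for pr, tg in zip(probs, targets))

def elicit_srig(edges, targets, iters=30):
    """Least-squares (a, b) via a shrinking 21 x 21 grid over u = log a, v = log(b/a)."""
    u, v, span = 0.0, 0.0, 8.0
    for _ in range(iters):
        best = None
        for i in range(21):
            for k in range(21):
                cu = u - span + span * i / 10
                cv = v - span + span * k / 10
                a = math.exp(cu)
                err = sq_error(a, a * math.exp(cv), edges, targets)
                if best is None or err < best[0]:
                    best = (err, cu, cv)
        _, u, v = best        # recenter on the best grid point
        span /= 2             # then halve the search window
    a = math.exp(u)
    return a, a * math.exp(v)
```

A gradient-based or simplex solver would converge faster; the grid search is used only because it is short and needs no third-party library.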

2.7. Bayes Estimators and Posterior Risks Using the UP and SRIGP under SELF and MSELF

If $\hat\theta$ is a Bayes estimator, then $\rho(\hat\theta)=E_{\theta\mid\mathbf y}\{L(\theta,\hat\theta)\}$ is called its posterior risk. The purpose of this study is to find efficient Bayes estimators of the different parameters of the mixture model used to analyze wind speed. There is no definitive rule for choosing a loss function, so two different loss functions, namely SELF and MSELF, are used to obtain the Bayes estimators and their posterior risks, and a suitable loss function is chosen on the basis of the resulting posterior risks. The SELF, defined as $L(\theta,\hat\theta)=(\theta-\hat\theta)^{2}$, was introduced by Legendre [17] to develop the theory of least squares. MSELF was presented by DeGroot [18] and is defined as $L(\theta,\hat\theta)=\{(\theta-\hat\theta)/\hat\theta\}^{2}$. For a given prior, the Bayes estimator and posterior risk under SELF are $\hat\theta=E(\theta\mid\mathbf y)$ and $\rho(\hat\theta)=E(\theta^{2}\mid\mathbf y)-\{E(\theta\mid\mathbf y)\}^{2}$, respectively. Similarly, the Bayes estimator and posterior risk under MSELF are $\hat\theta=E(\theta^{2}\mid\mathbf y)/E(\theta\mid\mathbf y)$ and $\rho(\hat\theta)=1-\{E(\theta\mid\mathbf y)\}^{2}/E(\theta^{2}\mid\mathbf y)$, respectively.
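Both estimators depend only on the first two posterior moments. The sketch below computes them from $E(\theta\mid y)$ and $E(\theta^2\mid y)$, taking MSELF as $\{(\theta-\hat\theta)/\hat\theta\}^2$ as stated above. As a sanity check, it also evaluates the posterior expected loss of an arbitrary decision over a set of posterior draws. Any draws used with it in an example are illustrative only.

```python
def self_estimate(m1, m2):
    """SELF: Bayes estimator E(theta|y) and posterior risk Var(theta|y)."""
    return m1, m2 - m1 * m1

def mself_estimate(m1, m2):
    """MSELF: estimator E(theta^2|y)/E(theta|y), risk 1 - E(theta|y)^2 / E(theta^2|y)."""
    return m2 / m1, 1.0 - m1 * m1 / m2

def self_loss(theta, d):
    return (theta - d) ** 2

def mself_loss(theta, d):
    return ((theta - d) / d) ** 2

def expected_loss(draws, d, loss):
    """Monte Carlo posterior expected loss of decision d over posterior draws."""
    return sum(loss(th, d) for th in draws) / len(draws)
```

Because $E(\theta^2\mid y)/E(\theta\mid y)=E(\theta\mid y)+\mathrm{Var}(\theta\mid y)/E(\theta\mid y)$, the MSELF estimate always exceeds the SELF estimate, and its risk is a dimensionless ratio. This explains why the two sets of PRs are on very different scales.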

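The Bayes estimators and posterior risks under SELF, using the UP and the SRIGP, are given for $j=1,2,3,4$ by

$$\hat\theta_j=E(\theta_j\mid\mathbf y),\qquad \rho(\hat\theta_j)=E(\theta_j^{2}\mid\mathbf y)-\{E(\theta_j\mid\mathbf y)\}^{2},$$

and analogously for the mixing proportions $p_j$, $j=1,2,3$, where the posterior expectations are taken under the UP or the SRIGP posterior distribution, respectively.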

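The Bayes estimators and posterior risks under MSELF, using the UP and the SRIGP, are

$$\hat\theta_j=\frac{E(\theta_j^{2}\mid\mathbf y)}{E(\theta_j\mid\mathbf y)},\qquad \rho(\hat\theta_j)=1-\frac{\{E(\theta_j\mid\mathbf y)\}^{2}}{E(\theta_j^{2}\mid\mathbf y)},$$

with the same expressions for the mixing proportions $p_j$.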

3. Results and Discussions

3.1. Simulation Study

To investigate the behavior of the estimators under the NIP and the IP, a simulation study with 500 repetitions is conducted, and the results are averaged. Random samples from the 4-CRMM are generated for the chosen parameter values. Observations greater than the fixed test termination time t = 8 are declared censored. Each failed observation is classified as a member of subpopulation I, II, III, or IV of the 4-CRMM. The BEs of the 4-CRMM parameters are computed for different sample sizes: results for two sets of parameter values are obtained for n = 200, 300, 400, and 500 under SELF and MSELF. The BEs and PRs have been computed using Mathematica 11.0, and the results are provided in Tables 1–4.
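One replication of the simulation design described above can be sketched as follows. Each unit is assigned a latent subpopulation with probabilities $p_j$, its lifetime is drawn from the Rayleigh inverse CDF $y=\theta_j\sqrt{-2\log(1-U)}$, and lifetimes beyond t are censored. The parameter values used with it are placeholders, not the paper's simulation settings, and the function name is hypothetical.

```python
import math
import random

def simulate_crmm_type1(n, theta, p, t, rng):
    """Draw n units from the 4-CRMM and apply type I right censoring at time t.

    Returns (y_by_comp, n_censored): failure times y <= t grouped by the
    subpopulation that generated them, and the number of censored units.
    """
    y_by_comp = [[] for _ in range(4)]
    n_censored = 0
    for _ in range(n):
        j = rng.choices(range(4), weights=p)[0]               # latent subpopulation
        u = rng.random()
        y = theta[j] * math.sqrt(-2.0 * math.log(1.0 - u))    # Rayleigh inverse CDF
        if y <= t:
            y_by_comp[j].append(y)
        else:
            n_censored += 1
    return y_by_comp, n_censored
```

Repeating this 500 times, computing the BEs for each replicate (for example with posterior moments such as those sketched for the UP), and averaging would reproduce the structure of the study.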

Table 1 reports the BEs and PRs under the UP. As the sample size increases, the BEs move closer to the true parameter values. Under both SELF and MSELF, some parameters are overestimated and others are underestimated. The estimates under SELF are closer to the true values than those under MSELF, whereas MSELF yields smaller PRs than SELF. For the same parameter values, the BEs and PRs under the SRIGP are given in Table 3. The estimates obtained under SELF show underestimation, while those obtained under MSELF show slight overestimation for some parameters and underestimation for others. Again, increasing the sample size moves the estimates toward the true values, and MSELF yields the smallest risks.

Table 2 considers the second set of parameter values under the UP. Under SELF, slight overestimation is noticed for two of the parameters, and underestimation is seen for another at n = 200 and 300, whereas at n = 400 and 500 its estimates equal the true value under both loss functions. The estimates of other parameters fluctuate, and again MSELF provides smaller PRs than SELF. For the same parameter values, the BEs and PRs under the SRIGP are given in Table 4. The simulation results indicate that, for both sets of parameter values, the UP performs better because it yields the smallest risks.

3.2. Real Data Application

Because wind varies in space and time, a regional statistical study is needed to model wind speed data. The Rayleigh model has been found to be flexible and suitable for analyzing wind speed data compared with other distributions; therefore, we use the 4-CRMM, a type I mixture whose components share the Rayleigh functional form, to conduct a Bayesian analysis of wind speed data. Daily wind speed data (in knots) for four locations on the coastal belt of Baluchistan (Gawadar, Jiwani, Ormara, and Pasni) over the sixteen years from 2003 to 2018 were obtained from the Pakistan Meteorological Department and converted to km/h. These data are used to analyze the proposed four-component mixture model under a type I censoring scheme.

In type I censoring, the experiment is stopped at a predetermined time or when the test equipment becomes unavailable. Because wind speeds above 54 km/h damage the turbine, the turbine is stopped immediately when this threshold is reached, so 54 km/h is fixed as the censoring point. Furthermore, because different seasons show different wind patterns, the data for each year are divided into four quarters, which form the four components of the mixture model. For every month, the maximum wind speed is selected, so over the 16 years each quarter has 48 values (16 years × 3 months), giving 192 values for each region.
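The data arrangement described above can be sketched as follows. The sketch assumes the daily records are available as a mapping from (year, month) to daily speeds in knots; that input format and the function name are hypothetical. The conversion factor 1 knot = 1.852 km/h is exact.

```python
KNOT_TO_KMH = 1.852     # 1 knot = 1.852 km/h (exact)
CENSOR_KMH = 54.0       # turbine damage threshold used as the censoring point

def arrange_quarterly(daily_knots):
    """Group monthly maxima (km/h) into four quarters and split at the censoring point.

    daily_knots: dict mapping (year, month) -> list of daily wind speeds in knots.
    Returns (failures, n_censored): failures[q] lists the uncensored monthly maxima
    of quarter q (1..4); n_censored counts maxima above CENSOR_KMH.
    """
    quarters = {1: [], 2: [], 3: [], 4: []}
    for (_, month), speeds in sorted(daily_knots.items()):
        quarter = (month - 1) // 3 + 1
        quarters[quarter].append(max(speeds) * KNOT_TO_KMH)
    failures = {q: [v for v in vals if v <= CENSOR_KMH] for q, vals in quarters.items()}
    n_censored = sum(v > CENSOR_KMH for vals in quarters.values() for v in vals)
    return failures, n_censored
```

With 16 complete years of data, each quarter would contain 48 monthly maxima before censoring.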
The goodness of fit of the data to the Rayleigh model was checked in Mathematica using the Cramér–von Mises test, which returned a test statistic of 0.11 against a value of 0.278. Hence, the null hypothesis that the data follow the Rayleigh distribution is not rejected at the 5 percent level.
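For reference, the Cramér–von Mises statistic for a Rayleigh fit can be computed directly, as sketched below. This is not the Mathematica routine the authors used. Here θ is estimated by maximum likelihood, so the usual tabulated critical values, which assume a fully specified distribution, are only approximate. The sampling helper is hypothetical and exists only for trying the function.

```python
import math
import random

def cramer_von_mises_rayleigh(x):
    """W^2 = 1/(12n) + sum_i {F(x_(i)) - (2i - 1)/(2n)}^2 for a fitted Rayleigh CDF.

    theta^2 is estimated by maximum likelihood as sum(x^2) / (2n).
    Returns (W^2, theta_hat).
    """
    n = len(x)
    theta2 = sum(v * v for v in x) / (2 * n)      # ML estimate of theta^2
    w2 = 1.0 / (12 * n)
    for i, v in enumerate(sorted(x), start=1):
        F = 1.0 - math.exp(-v * v / (2 * theta2))
        w2 += (F - (2 * i - 1) / (2 * n)) ** 2
    return w2, math.sqrt(theta2)

def rayleigh_sample(n, theta, seed=0):
    """Hypothetical helper: Rayleigh(theta) draws via the inverse CDF."""
    rng = random.Random(seed)
    return [theta * math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]
```

Small values of W² indicate close agreement between the empirical and fitted Rayleigh CDFs.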

Summary information from the real data (Gawadar) for the 4-CRMM is

Summary information from the real data (Jiwani) for the 4-CRMM is

Summary information from the real data (Ormara) for the 4-CRMM is

Summary information from the real data (Pasni) for the 4-CRMM is

The BEs and the PRs using the UP and the SRIGP under SELF and MSELF are presented in Tables 5 and 6.

The BEs represent the average wind speeds of the four quarters. The censoring point is fixed at 54 km/h because wind speeds above this limit damage the wind turbine. The estimate of $\theta_2$ corresponds to the months of April, May, and June and is higher than the estimates of $\theta_1$, $\theta_3$, and $\theta_4$. Hence, wind speeds in these months are high in all four regions, which is favorable for power generation. It is also observed that, under both priors (UP and SRIGP), MSELF is the better choice.

The real data results under the UP show that, under both loss functions, the average wind speed is highest in the second quarter (April, May, and June) and lowest in the fourth quarter (October, November, and December). Under SELF, the average wind speed in the second quarter is 25.174 km/h, 21.391 km/h, 36.902 km/h, and 31.292 km/h, and in the fourth quarter it is 18.375 km/h, 15.848 km/h, 23.757 km/h, and 24.003 km/h for Gawadar, Jiwani, Ormara, and Pasni, respectively. The PRs under SELF are comparatively higher for all four regions. Under the UP with MSELF, the average wind speed in the second quarter is 25.335 km/h, 21.545 km/h, 37.255 km/h, and 31.496 km/h, and in the fourth quarter it is 18.495 km/h, 15.947 km/h, 23.933 km/h, and 24.170 km/h for the respective four areas. The posterior risks for Gawadar and Pasni are almost equal and smaller than those for Jiwani and Ormara.

The real data results under the SRIGP show that, under both loss functions, the average wind speed is highest in the second quarter (April, May, and June) and lowest in the fourth quarter (October, November, and December). Under SELF, the average wind speed in the second quarter is 24.399 km/h, 20.633 km/h, 27.308 km/h, and 28.532 km/h, and in the fourth quarter it is 17.787 km/h, 15.528 km/h, 22.621 km/h, and 22.405 km/h for Gawadar, Jiwani, Ormara, and Pasni, respectively. The PRs are again higher for all four regions under SELF. Under the SRIGP with MSELF, the average wind speed in the second quarter is 24.643 km/h, 20.897 km/h, 27.544 km/h, and 28.771 km/h, and in the fourth quarter it is 17.994 km/h, 15.735 km/h, 22.829 km/h, and 22.614 km/h for the respective four areas. The posterior risks for Ormara and Pasni are almost equal and differ slightly from those of the other regions.

4. Conclusion

Motivated by the widespread applications of mixture models, a 4-CRMM is presented and applied to wind speed data from four coastal areas of the province of Baluchistan, Pakistan. Wind speed data are usually modeled by a single probability distribution, but because wind speed reflects different seasonal patterns, mixture models are a natural alternative. To accomplish this objective, the Bayes estimates (BEs) of the parameters of the mixture model, along with their posterior risks (PRs), are obtained using an informative prior (IP) and a noninformative prior (NIP). The BEs represent the average wind speeds of the four quarters. The analysis shows that winds are strongest in the second quarter (April, May, and June), so these months would be better for power generation and should be utilized effectively. It is also concluded that, under both priors, the PRs under MSELF are almost equal, with only slight differences, and that overall MSELF is the better loss function and is therefore recommended for such analyses.

Data Availability

The data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.