Abstract
In this article, we focus on predictive modeling of real data by means of a new statistical model and by applying different machine learning algorithms. A key role of statistical methods in various research fields is to model real data and predict their future behavior. For modeling and predicting real-life data, a series of statistical models have been introduced and successfully implemented. This study introduces another novel method, namely, a new generalized exponential-X family for generating new distributions. The method is obtained by combining the T-X approach with the exponential model. A special case of the new method, namely, a new generalized exponential Weibull model, is introduced. The applicability of the new method is illustrated by means of a real application related to the alumina (Al2O3) data set. Acceptance sampling plans are developed for this distribution using percentiles when the life test is truncated at a pre-assigned time. The minimum sample size needed to ensure the required lifetime percentile is determined so that a specified consumer's risk and producer's risk are satisfied simultaneously. The operating characteristic values of the sampling plans are also provided. The plan methodology is illustrated using the Al2O3 fracture toughness data. Using the same data set, we implement various machine learning approaches, including support vector regression (SVR), the group method of data handling (GMDH), and random forest (RF). To evaluate their forecasting performance, three statistical measures of accuracy, namely, root-mean-square error (RMSE), mean absolute error (MAE), and the Akaike information criterion (AIC), are computed.
1. Introduction
Among the traditional/classical distributions, the Weibull model is of particular interest. It has been frequently implemented for modeling data in different sectors. The DF (cumulative distribution function) of the two-parameter Weibull model is
$$F(x;\alpha,\gamma)=1-e^{-\gamma x^{\alpha}},\qquad x\ge 0, \tag{1}$$
where $\alpha>0$ is a shape parameter and $\gamma>0$ is a scale parameter.
The Weibull model and its various generalized/modified variants have been used by researchers for modeling data in numerous sectors. For example, (i) Ghorbani et al. [1] and Moreau [2] applied it to phenomena in the medical sciences; (ii) Zaindin and Sarhan [3], Lai [4], Almalki and Yuan [5], and Singh [6] used it for reliability engineering applications; and (iii) Ahmad et al. [7] studied its applications in the finance sector.
The probability density function (PDF) of the Weibull model is
$$f(x;\alpha,\gamma)=\alpha\gamma x^{\alpha-1}e^{-\gamma x^{\alpha}},\qquad x>0, \tag{2}$$
with hazard function (HF) given by
$$h(x;\alpha,\gamma)=\alpha\gamma x^{\alpha-1}. \tag{3}$$
From equation (3), we can see that the HF of the Weibull model can be (i) constant for $\alpha=1$ (in this case, the Weibull captures the properties of the exponential model), (ii) increasing for $\alpha>1$ (in particular, for $\alpha=2$ the Weibull captures the properties of the Rayleigh model), and (iii) decreasing for $\alpha<1$.
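As a quick numerical check of these three regimes, the following minimal Python sketch evaluates the hazard function in equation (3) for representative shape values (the specific values of $\alpha$ and $\gamma$ are illustrative choices, not taken from the paper):

```python
import numpy as np

def weibull_hazard(x, alpha, gamma):
    # Hazard function of the two-parameter Weibull model, equation (3):
    # h(x) = alpha * gamma * x^(alpha - 1)
    return alpha * gamma * x ** (alpha - 1.0)

x = np.linspace(0.5, 3.0, 6)
print(weibull_hazard(x, alpha=1.0, gamma=0.5))  # constant hazard (exponential case)
print(weibull_hazard(x, alpha=2.0, gamma=0.5))  # increasing hazard (Rayleigh case)
print(weibull_hazard(x, alpha=0.5, gamma=0.5))  # decreasing hazard
```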
The Weibull distribution provides impressive results when the failure behavior of the data is increasing, decreasing, or constant (i.e., monotonic). However, in many cases, the HF of the data behaves nonmonotonically [8]. In such scenarios, the Weibull model is not a suitable choice. In the literature, numerous authors have addressed applications in various fields by developing flexible versions of the Weibull distribution with additional parameters (see Pham and Lai [9]; Nadarajah et al. [10]; and Wais [11]).
Here, we introduce a new method, namely, a new generalized exponential-X (for short, "NGExp-X") family, to obtain modified versions of existing distributions. The NGExp-X family is constructed by combining the PDF of the exponential distribution with the T-X approach [12].
If X has the NGExp-X family, then its DF $F(x;\theta,\xi)$ is given by equation (4), where $\theta>0$ and $G(x;\xi)$ is a baseline DF with parameter vector $\xi$.
In order to show that $F(x;\theta,\xi)$ is a valid DF, we have the following two propositions.
Proposition 1. For the expression in equation (4), we must prove that $\lim_{x\to-\infty}F(x;\theta,\xi)=0$ and $\lim_{x\to\infty}F(x;\theta,\xi)=1$.
Proof. Since $G(x;\xi)$ is a valid DF, $G(x;\xi)\to 0$ as $x\to-\infty$, and hence $F(x;\theta,\xi)\to 0$; likewise, $G(x;\xi)\to 1$ as $x\to\infty$, and hence $F(x;\theta,\xi)\to 1$.
Proposition 2. The DF $F(x;\theta,\xi)$ is differentiable and RC (right continuous).
Proof. Differentiating equation (4) yields the PDF given in equation (5). From the proofs of Propositions 1 and 2, it is obvious that the function provided in equation (4) is a valid DF.
For $\theta>0$ and a baseline DF $G(x;\xi)$, the PDF $f(x;\theta,\xi)$ and HF $h(x;\theta,\xi)$ of the NGExp-X family are given in equations (5) and (6), respectively.
2. A New Generalized Exponential-Weibull Distribution
Let X follow the proposed NGExp-Weibull distribution with parameters $\theta$, $\alpha$, and $\gamma$, obtained by taking the Weibull DF in equation (1) as the baseline $G(x;\xi)$; its DF and PDF are given in equations (7) and (8), respectively.
Corresponding to equations (7) and (8), the survival function (SF) $S(x;\theta,\alpha,\gamma)$, HF $h(x;\theta,\alpha,\gamma)$, and cumulative hazard function (CHF) $H(x;\theta,\alpha,\gamma)$ are given in equations (9)–(11), respectively.
Different visual behaviors of the PDF of the NGExp-Weibull distribution for four selected parameter combinations, shown as (i) a gold curve, (ii) a blue curve, (iii) a red curve, and (iv) a green curve, are presented in Figure 1.

From the visual illustration in Figure 1, we can see that the PDF possesses different shapes. For example, it takes (i) a left-skewed form (gold curve), (ii) a right-skewed form (blue curve), (iii) a symmetrical shape (red curve), and (iv) a reverse-J shape (green curve).
3. Modeling the Al2O3 Data Set
This section offers a practical illustration of the NGExp-Weibull distribution by analyzing data from the engineering sector. We implement the NGExp-Weibull distribution to analyze the fracture toughness of Al2O3 data (in the units of MPa m1/2) (see Nadarajah and Kotz [13]). The data set is given by 5.5, 5, 4.9, 6.4, 5.1, 5.2, 5.2, 5, 4.7, 4, 4.5, 4.2, 4.1, 4.56, 5.01, 4.7, 3.13, 3.12, 2.68, 2.77, 2.7, 2.36, 4.38, 5.73, 4.35, 6.81, 1.91, 2.66, 2.61, 1.68, 2.04, 2.08, 2.13, 3.8, 3.73, 3.71, 3.28, 3.9, 4, 3.8, 4.1, 3.9, 4.05, 4, 3.95, 4, 4.5, 4.5, 4.2, 4.55, 4.65, 4.1, 4.25, 4.3, 4.5, 4.7, 5.15, 4.3, 4.5, 4.9, 5, 5.35, 5.15, 5.25, 5.8, 5.85, 5.9, 5.75, 6.25, 6.05, 5.9, 3.6, 4.1, 4.5, 5.3, 4.85, 5.3, 5.45, 5.1, 5.3, 5.2, 5.3, 5.25, 4.75, 4.5, 4.2, 4, 4.15, 4.25, 4.3, 3.75, 3.95, 3.51, 4.13, 5.4, 5, 2.1, 4.6, 3.2, 2.5, 4.1, 3.5, 3.2, 3.3, 4.6, 4.3, 4.3, 4.5, 5.5, 4.6, 4.9, 4.3, 3, 3.4, 3.7, 4.4, 4.9, 4.9, and 5.
Corresponding to the data, the summary measures are as follows: minimum = 1.680, first quartile = 3.850, median = 4.380, mean = 4.325, third quartile = 5.000, maximum = 6.810, variance = 1.037332, range = 5.13, standard deviation = 1.018495, skewness = −0.4167136, and kurtosis = 5.13.
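For reproducibility, the sketch below computes these summary measures from the listed observations; note that skewness and kurtosis conventions (bias correction, excess versus raw kurtosis) differ across software packages, so small deviations from the reported values are possible:

```python
import numpy as np
from scipy import stats

# Fracture toughness of Al2O3 (MPa m^(1/2)); the 119 observations listed above.
data = np.array([
    5.5, 5, 4.9, 6.4, 5.1, 5.2, 5.2, 5, 4.7, 4, 4.5, 4.2, 4.1, 4.56, 5.01,
    4.7, 3.13, 3.12, 2.68, 2.77, 2.7, 2.36, 4.38, 5.73, 4.35, 6.81, 1.91,
    2.66, 2.61, 1.68, 2.04, 2.08, 2.13, 3.8, 3.73, 3.71, 3.28, 3.9, 4, 3.8,
    4.1, 3.9, 4.05, 4, 3.95, 4, 4.5, 4.5, 4.2, 4.55, 4.65, 4.1, 4.25, 4.3,
    4.5, 4.7, 5.15, 4.3, 4.5, 4.9, 5, 5.35, 5.15, 5.25, 5.8, 5.85, 5.9,
    5.75, 6.25, 6.05, 5.9, 3.6, 4.1, 4.5, 5.3, 4.85, 5.3, 5.45, 5.1, 5.3,
    5.2, 5.3, 5.25, 4.75, 4.5, 4.2, 4, 4.15, 4.25, 4.3, 3.75, 3.95, 3.51,
    4.13, 5.4, 5, 2.1, 4.6, 3.2, 2.5, 4.1, 3.5, 3.2, 3.3, 4.6, 4.3, 4.3,
    4.5, 5.5, 4.6, 4.9, 4.3, 3, 3.4, 3.7, 4.4, 4.9, 4.9, 5,
])

print("n         =", data.size)
print("min, max  =", data.min(), data.max())
print("quartiles =", np.percentile(data, [25, 50, 75]))
print("mean      =", data.mean())
print("variance  =", data.var(ddof=1))                    # sample variance
print("std. dev. =", data.std(ddof=1))
print("skewness  =", stats.skew(data))
print("kurtosis  =", stats.kurtosis(data, fisher=False))  # raw (non-excess) kurtosis
```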
The box plot and histogram of the data are sketched in Figure 2. Additionally, the corresponding TTT (total time on test) curve is also displayed in Figure 2.

The NGExp-Weibull model is applied to the data, and its results are compared with (i) the exponentiated Weibull (Exp-Weibull) model with DF given by
$$F(x)=\left(1-e^{-\gamma x^{\alpha}}\right)^{a},\qquad a>0, \tag{12}$$
and (ii) the Kumaraswamy Weibull (Kum-Weibull) model with DF given by
$$F(x)=1-\left[1-\left(1-e^{-\gamma x^{\alpha}}\right)^{a}\right]^{b},\qquad a,b>0. \tag{13}$$
Furthermore, to identify a suitable model for the data, three statistical tests, together with the p-value, are considered: (i) the AD (Anderson–Darling) test, given by
$$A^{2}=-n-\frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log F\left(x_{(i)}\right)+\log\left(1-F\left(x_{(n-i+1)}\right)\right)\right];$$
(ii) the CM (Cramér–von Mises) test, expressed by
$$W^{2}=\frac{1}{12n}+\sum_{i=1}^{n}\left[F\left(x_{(i)}\right)-\frac{2i-1}{2n}\right]^{2};$$
and (iii) the Kolmogorov–Smirnov (KS) test, obtained as
$$D=\sup_{x}\left|F_{n}(x)-F(x)\right|,$$
where $F$ is the fitted DF, $F_{n}$ is the empirical DF, and $x_{(i)}$ denotes the $i$th order statistic.
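The following sketch shows how these three statistics can be computed for any fitted DF, reusing the `data` array defined in the earlier sketch; as an assumption for illustration, the usage example fits a plain Weibull with scipy in place of the NGExp-Weibull DF, which standard libraries do not provide:

```python
import numpy as np
from scipy import stats

def gof_statistics(data, cdf):
    # KS, Cramer-von Mises, and Anderson-Darling statistics for a fitted CDF.
    x = np.sort(np.asarray(data))
    n = x.size
    u = cdf(x)                                   # probability-integral transform
    i = np.arange(1, n + 1)
    ks = stats.kstest(x, cdf).statistic          # D = sup |F_n(x) - F(x)|
    cm = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2 * n)) ** 2)
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    return ks, cm, ad

# Illustration with an ordinary Weibull fit as a stand-in baseline:
shape, loc, scale = stats.weibull_min.fit(data, floc=0)
ks, cm, ad = gof_statistics(data, lambda t: stats.weibull_min.cdf(t, shape, loc, scale))
print(ks, cm, ad)
```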
Corresponding to the data, the maximum-likelihood estimates of the parameters of the fitted models are provided in Table 1. In the same table, the standard errors (SEs) of the estimates (numerical values in parentheses) are also presented.
Corresponding to the data, the p-value along with the CM, AD, and KS tests of the fitted models is provided in Table 2. Based on the results in Table 2, it is clear that the NGExp-Weibull model has the smallest values of CM, AD, and KS and the largest p-value. These statistics show that the NGExp-Weibull model is the best competitor. In addition to the numerical illustration, a visual display of the performance of the NGExp-Weibull model is provided in Figure 3.

4. New Acceptance Sampling Plans
In the usual practice of a life testing experiment, the test is terminated at a pre-assigned time, and the number of failures is recorded. The aim of the experiment is to establish a lower confidence limit on the mean life or a percentile lifetime. To protect the consumer, the test has to establish the specified mean life with a certain probability. There are various methods for such tests in the acceptance sampling literature. Epstein [14] first considered truncated life tests for the exponential distribution. Several authors have described truncated life tests for various distributions, for example, Sobel and Tischendorf [15]; Gupta and Groll [16]; Gupta [17]; Baklizi and El-Masri [18]; Tsai and Wu [19]; Balakrishnan et al. [20]; and Kantam et al. [21].
In fact, for life distributions, percentiles furnish more information than the mean life does. When the life distribution is symmetric, the mean life coincides with the median, that is, the 50th percentile. Thus, developing acceptance sampling plans based on the mean life is a special case of developing acceptance sampling plans based on percentiles of the life distribution. Balakrishnan et al. [20] suggested that acceptance sampling plans could be used for quantiles and derived the corresponding formulae, whereas Lio et al. [22, 23] established acceptance sampling plans for other percentiles of the Birnbaum–Saunders (BS) and Burr type XII models. They developed the acceptance sampling plans for percentiles by replacing the scale parameter with the 100qth percentile.
Rao and Kantam [24] developed acceptance sampling plans from truncated life tests based on percentiles of the log-logistic and inverse Rayleigh distributions. Balamurali et al. [25] developed acceptance sampling plans based on the median life of the exponentiated half logistic distribution. Rao et al. [26] developed new acceptance sampling plans based on percentiles for the odds exponential log-logistic distribution. This motivates us to develop acceptance sampling plans (ASP) based on percentiles for the NGExp-Weibull distribution, which is a skewed distribution.
This section deals with the study of new acceptance sampling plans (ASP) based on the distribution proposed in Section 2. The 100qth percentile $t_{q}$ of the NGExp-Weibull distribution is given in equation (14).
The expression in equation (14) can also be written in the form given in equation (15), with the auxiliary quantity defined in equation (16).
Hence, at specified values of $\theta$, $\alpha$, and $q$, the quantile given in equation (16) is a function of the scale parameter $\gamma$; for a prespecified value of $t_{q}$, say $t_{q}^{0}$, we may obtain the corresponding value of $\gamma$, say $\gamma_{0}$. It is notable that $\gamma_{0}$ depends on $\theta$, $\alpha$, and $t_{q}^{0}$. To develop ASP for the NGExp-Weibull distribution, we must determine whether $t_{q}$ exceeds $t_{q}^{0}$, or equivalently, whether $\gamma$ exceeds $\gamma_{0}$.
The aim here is to obtain the minimum sample size needed to assure the percentile life when the life test ends at a predetermined time $t_{0}$ and the number of nonconformities observed does not exceed a given acceptance number $c$. The decision procedure is to accept a lot only if the specified percentile of the lifetime is established with a pre-assigned high probability, which provides protection to the consumer. The life test ends at the time at which the $(c+1)$th failure is observed or at the scheduled time $t_{0}$, whichever occurs first.
The chance of accepting a lot based on the number of failures is given by
$$L(p)=\sum_{i=0}^{c}\binom{n}{i}p^{i}(1-p)^{n-i}, \tag{17}$$
where $n$ is the sample size, $c$ is the acceptance number, and $p$ is the probability of observing a failure within the scheduled life test time $t_{0}$. If the product lifetime follows an NGExp-Weibull distribution, then $p=F(t_{0};\theta,\alpha,\gamma)$. Frequently, it is convenient to define the test termination time as $t_{0}=\delta t_{q}^{0}$ for a constant $\delta$ and the targeted 100qth lifetime percentile $t_{q}^{0}$. Suppose $t_{q}$ is the true 100qth lifetime percentile. Then, $p$ can be rewritten as in equation (18).
In this study, we adopt the two-points-on-the-OC-curve methodology, taking into account both the consumer's and the producer's risks, to obtain the design parameters of the proposed ASP. In this plan, the ratio of the true percentile lifetime to the specified percentile lifetime, $d=t_{q}/t_{q}^{0}$, is taken as the quality level of the product. From the producer's point of view, the probability of lot acceptance must be at least $1-\alpha^{*}$ at the acceptable reliability level (ARL). Hence, the producer requires that a lot be accepted at quality levels $d>1$ in equation (18). On the other hand, from the consumer's point of view, the probability of accepting a lot must be at most $\beta$ at the lot tolerance reliability level (LTRL), which corresponds to $d=1$. Thus, the consumer requires that a lot be rejected when $t_{q}\le t_{q}^{0}$.
Therefore, from equation (18), we get
$$L\left(p_{1}\right)=\sum_{i=0}^{c}\binom{n}{i}p_{1}^{i}\left(1-p_{1}\right)^{n-i}\ge 1-\alpha^{*} \tag{19}$$
and
$$L\left(p_{2}\right)=\sum_{i=0}^{c}\binom{n}{i}p_{2}^{i}\left(1-p_{2}\right)^{n-i}\le\beta, \tag{20}$$
where $p_{1}$ and $p_{2}$ are the probabilities of a failure by time $t_{0}=\delta t_{q}^{0}$ under the producer's quality level ($d>1$) and the consumer's quality level ($d=1$), respectively, as given in equations (21) and (22).
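A minimal search satisfying inequalities (19) and (20) can be sketched as below; the helper takes the two failure probabilities $p_1$ and $p_2$ as inputs, since computing them requires the NGExp-Weibull DF, which standard libraries do not provide, and the risk levels shown are illustrative assumptions:

```python
from scipy.stats import binom

def oc(n, c, p):
    # Lot acceptance probability L(p) = P(at most c failures among n items).
    return binom.cdf(c, n, p)

def find_plan(p1, p2, alpha=0.05, beta=0.25, n_max=500):
    # Smallest (n, c) with L(p1) >= 1 - alpha (producer, inequality (19))
    # and L(p2) <= beta (consumer, inequality (20)).
    for n in range(1, n_max + 1):
        for c in range(n):
            if oc(n, c, p2) <= beta and oc(n, c, p1) >= 1 - alpha:
                return n, c, oc(n, c, p1)
    return None  # no feasible plan within n_max

# Hypothetical failure probabilities p1 < p2, for demonstration only:
print(find_plan(p1=0.05, p2=0.40))
```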
The design parameters for different values of $\theta$, $\alpha$, and $\gamma$ are constructed. The parameters of the suggested sampling design under the truncated life test at the pre-assigned time are obtained for the given producer's risk and test termination time according to the consumer's confidence levels for the percentile, the OC values are determined, and the results are given in Tables 3–5. The design parameter values are tabulated in Tables 3 and 4 for the 50th percentile, while Table 5 displays the design parameters for the maximum-likelihood estimates from the fracture toughness data set. It is noticed from Tables 3–5 that the sample size decreases as the parameter values increase and also as the percentile ratio increases.
4.1. Description of the Proposed Plan
Suppose the producer wants to enforce a single sampling plan to ensure that the 50th percentile life of the products under inspection is at least 1000 hours at a given percentile ratio, so the researcher needs to run the life test for 1000 hours. Suppose further that the lifetime of the product follows the NGExp-Weibull distribution with the parameter values observed from previous data. The best plan for the stated requirements is then read from Table 3, with the corresponding probability of acceptance being 0.9651. Most earlier studies considered a single point on the operating characteristic curve for assuring the mean or percentile lifetime under various life distributions. The present study deals with sampling plans based on the two-point approach on the operating characteristic curve for ensuring the percentile lifetime of products under the NGExp-Weibull distribution.
4.2. Real Data Illustration
Here, we consider an application of the suggested ASP for the NGExp-Weibull distribution using the fracture toughness (in the units of MPa m1/2) data set. The goodness of fit of the model is shown in Table 2, and to emphasize the goodness of fit, we have provided the visuals in Figure 3. The MLEs of the parameters of the NGExp-Weibull distribution for the fracture toughness data set are reported in Table 1, and by the Kolmogorov–Smirnov test, we found that the maximum distance between the data and the fitted NGExp-Weibull distribution is 0.07204 with p-value 0.56730. This shows that the fracture toughness data set is well fitted by the NGExp-Weibull distribution with the estimated parameters, and the plan parameters for these estimates are given in Table 3.
Let us fix that the consumer’s risk is at when the true percentile is fracture toughness 2 units of MPa m1/2 and the producer’s risk is when the true percentile is fracture toughness 4 units of MPa m1/2. For , the consumer’s risk is , , and . The minimum sample size and acceptance number are given from Table 3. Thus, the design can be implemented as follows: we select a sample of 5 fracture toughness units, and we will accept the lot when no more than 1 fracture toughness 2 units. Hence, by applying the proposed sampling plan, the fracture toughness lot has been rejected because there is more than one unit before the termination fracture toughness 4 units.
5. Data and Modeling Procedures
Forecasting is crucial in many applied fields, and knowing the future trajectory of a process is beneficial for practitioners and policymakers. Therefore, this section provides the multistep-ahead forecast of the fracture toughness data by applying different machine learning algorithms, namely, support vector regression (SVR), the group method of data handling (GMDH) neural network, and random forest (RF).
5.1. Support Vector Regression
The SVR method was first introduced by Cortes and Vapnik [27] and, to date, is frequently used for regression and classification problems. The SVR is based on statistical learning theory and the structured risk minimization principle, due to which it circumvents the problem of overfitting and produces accurate forecasts. In practice, it can precisely approximate linear and nonlinear real-world problems [28]. The performance of SVR depends on the kernel function utilized. Kernel functions carry out operations in the input space instead of the higher-dimensional feature space. Various kernel functions have appeared in past studies, namely, linear, polynomial, spline, sigmoid, and radial basis functions [29]. The RBF has received great attention due to its outstanding performance in capturing nonlinear associations [30, 31]. The SVR makes explicit the margin of error that is acceptable in a model [32, 33]. Suppose we have a set of training data $\{(x_{i},y_{i})\}_{i=1}^{N}$ with input vectors $x_{i}$ and output values $y_{i}$. Mathematically, the SVR seeks a function
$$f(x)=w\,\varphi(x)+b,$$
where $w$ is a weight parameter, $\varphi$ represents the nonlinear mapping, and $b$ is a constant parameter, that fits the data as appropriately as possible. In other words, it minimizes the error between the output variable and the predicted values through the regularized risk
$$R=\frac{1}{2}\lVert w\rVert^{2}+\frac{C}{N}\sum_{i=1}^{N}L_{\varepsilon}\left(y_{i},f\left(x_{i}\right)\right). \tag{25}$$
In equation (25), the loss function demonstrates the $\varepsilon$-insensitive model, where the loss tends to 0 if the difference between the actual and predicted values is less than $\varepsilon$, as illustrated by Vapnik's $\varepsilon$-insensitive loss function
$$L_{\varepsilon}\left(y,f(x)\right)=\begin{cases}0, & \left|y-f(x)\right|\le\varepsilon,\\ \left|y-f(x)\right|-\varepsilon, & \text{otherwise.}\end{cases} \tag{26}$$
The SVR problem is then constructed as the following optimization problem:
$$\min_{w,b,\zeta,\zeta^{*}}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{N}\left(\zeta_{i}+\zeta_{i}^{*}\right) \tag{27}$$
subject to $y_{i}-w\,\varphi(x_{i})-b\le\varepsilon+\zeta_{i}$, $w\,\varphi(x_{i})+b-y_{i}\le\varepsilon+\zeta_{i}^{*}$, and $\zeta_{i},\zeta_{i}^{*}\ge 0$, where $\zeta_{i}$ and $\zeta_{i}^{*}$ represent the slack variables describing the lower and upper training errors subject to the error tolerance $\varepsilon$, and $C$ is a positive constant that determines the extent of the penalized loss when a training error occurs. In our case, we set the values of $C$ and $\varepsilon$ to 1 and 0.01, respectively.
The set of Lagrange multipliers $a_{i}$ and $a_{i}^{*}$ is utilized to solve the optimization problem. Thus, the approximating function is illustrated as follows:
$$f(x)=\sum_{i=1}^{h}\left(a_{i}-a_{i}^{*}\right)K\left(x_{i},x\right)+b, \tag{28}$$
where $x_{i}$ represents a support vector, $h$ represents the number of support vectors, $K(\cdot,\cdot)$ represents the kernel function, and $b$ represents the threshold value. Herein, the radial basis function (RBF) with a width parameter $\sigma$ is expressed as follows:
$$K\left(x_{i},x_{j}\right)=\exp\left(-\frac{\lVert x_{i}-x_{j}\rVert^{2}}{2\sigma^{2}}\right), \tag{29}$$
where $\lVert x_{i}-x_{j}\rVert^{2}$ indicates the squared Euclidean distance between the two predictors, and $\sigma$ indicates the width of the RBF [34]. Hence, in this study, we focus on the RBF kernel function for SVR.
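The sketch below shows one way to set up such an RBF-kernel SVR for one-step-ahead forecasting with scikit-learn, reusing the `data` array defined earlier; the C and epsilon values match those stated above, while the lagged-feature construction (three lags) is an assumption for illustration, since the paper does not specify its feature design:

```python
import numpy as np
from sklearn.svm import SVR

def make_lagged(series, n_lags=3):
    # Turn a univariate series into a supervised design matrix of lagged values.
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

train = data[:95]                               # estimation part (observations 1-95)
X_tr, y_tr = make_lagged(train)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.01)    # C = 1 and epsilon = 0.01 as in the text
svr.fit(X_tr, y_tr)
print(svr.predict(train[-3:].reshape(1, -1)))   # one-step-ahead forecast of observation 96
```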
5.2. Random Forest
Breiman [35] developed a nonparametric approach known as random forest (RF). The RF approach is based on decision tree algorithms and is a modified form of classification and regression trees (CART). The RF can therefore be utilized for both classification and regression problems. In general, RF belongs to the supervised learning class, and its output is based on a forest of decision trees. The forecast of the RF approach is obtained by averaging over the trees. A large number of trees in the forest helps to improve the forecasting accuracy and prevents the problem of overfitting. The RF algorithm utilizes the popular procedure of bagging, also known as bootstrap aggregation, to train the tree learners. A random sample is drawn repeatedly (M times) with replacement from the training set, and a tree is fitted to each resample [32]. To estimate the RF, we utilize three hyperparameters: the number of trees, the number of nodes, and the number of sample repetitions. The numbers of trees and nodes are set to 500 and 3, respectively.
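A corresponding scikit-learn sketch is given below, reusing the lagged design matrix from the SVR example; 500 trees follows the text, while reading the "number of nodes = 3" as a depth-style constraint (max_depth=3) is an assumption, since scikit-learn controls tree size through depth or leaf parameters:

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=500,   # 500 trees, as stated in the text
    max_depth=3,        # assumed reading of "number of nodes = 3"
    bootstrap=True,     # bagging: resample the training set with replacement
    random_state=0,
)
rf.fit(X_tr, y_tr)
print(rf.predict(train[-3:].reshape(1, -1)))   # one-step-ahead forecast
```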
5.3. Group Method of Data Handling (GMDH)
The GMDH neural network (GMDH-NN) was initially introduced by Ivakhnenko [36] for analyzing complex systems incorporating an output and a set of inputs. The core aim of the GMDH-NN is to formulate a function in a feed-forward network based on a second-degree transfer function. The productive input variables, the numbers of neurons and hidden layers, and the optimal model framework are established automatically in the GMDH algorithm [37]. In our study, we select these parameters through a trial-and-error approach, following Peng et al. [38]. The mapping between the target and input variables is carried out via the GMDH-NN through a nonlinear function, the so-called Volterra series, given in equation (30) as follows:
$$\hat{y}=a_{0}+\sum_{i=1}^{m}a_{i}x_{i}+\sum_{i=1}^{m}\sum_{j=1}^{m}a_{ij}x_{i}x_{j}+\sum_{i=1}^{m}\sum_{j=1}^{m}\sum_{k=1}^{m}a_{ijk}x_{i}x_{j}x_{k}+\cdots. \tag{30}$$
For two variables, the Volterra series can be described in terms of the second-degree polynomial in equation (31) as
$$\hat{y}=a_{0}+a_{1}x_{i}+a_{2}x_{j}+a_{3}x_{i}^{2}+a_{4}x_{j}^{2}+a_{5}x_{i}x_{j}, \tag{31}$$
where $\hat{y}$ denotes an output of the model, $x_{i}$ and $x_{j}$ denote the input variables, and the corresponding weights are denoted by $a_{0},\ldots,a_{5}$.
The network neurons are recursively connected to each other through the partial quadratic polynomial in equation (31), which reveals the relationship between inputs and output while the unknown parameters are estimated using the training data. The key aim of GMDH is to find the unknown coefficients in equation (31) that minimize the difference between the actual data and the forecasted values. The unknown parameters are computed using regression tools [39, 40]. Thus, under the principle of least squared error, the parameters of each quadratic polynomial are optimized in the following way.
We seek to minimize the squared difference between the predicted and actual values; in order to achieve an accurate forecast, we have
$$E=\frac{1}{M}\sum_{i=1}^{M}\left(y_{i}-\hat{y}_{i}\right)^{2}\longrightarrow\min, \tag{32}$$
where $y_{i}$ is the response variable and $M$ is the number of training observations. The vector of unknown parameters $a$, estimated from the data at hand by least squares, is computed as follows:
$$\hat{a}=\left(A^{T}A\right)^{-1}A^{T}Y, \tag{33}$$
where $Y$ is the vector of observed responses and $A$ is the design matrix whose rows contain the terms $\left(1,x_{i},x_{j},x_{i}^{2},x_{j}^{2},x_{i}x_{j}\right)$.
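The least-squares step for a single GMDH neuron can be sketched as follows; this fits only one partial description of equation (31), the full layer-wise selection of the network being omitted, and the paired inputs in the usage line (two lags from the SVR design matrix) are an illustrative assumption:

```python
import numpy as np

def fit_gmdh_neuron(x1, x2, y):
    # Least-squares fit of one GMDH partial description, equation (31):
    # y_hat = a0 + a1*x1 + a2*x2 + a3*x1^2 + a4*x2^2 + a5*x1*x2
    A = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # solves equation (33)
    return coef

# Example: pair two lagged inputs from the design matrix built for the SVR sketch.
coef = fit_gmdh_neuron(X_tr[:, 0], X_tr[:, 1], y_tr)
print(coef)
```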
5.4. Out-of-Sample Al2O3 Forecasting
The fracture toughness data are utilized in this work to assess the predictive capability of different ML algorithms. The data consist of 119 observations. For estimation and prediction, the data set is decomposed into two parts: observations 1 to 95 are used for model estimation, and observations 96 to 119 are used for evaluating the models' multistep-ahead out-of-sample predictive accuracy using the expanding window method. This study adopts three popular statistical measures, namely, root-mean-square error (RMSE), mean absolute error (MAE), and the Akaike information criterion (AIC), to evaluate the forecasting accuracy of the ML algorithms. The smaller the values of RMSE, MAE, and AIC, the better the forecast. Mathematically, they can be, respectively, illustrated as
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}},\qquad \mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_{i}-\hat{y}_{i}\right|,\qquad \mathrm{AIC}=n\log\left(\frac{\mathrm{sse}}{n}\right)+2k,$$
where $n$ shows the number of observations, sse indicates the sum of squared errors, $k$ indicates the number of parameters, $y_{i}$ represents the actual value, and $\hat{y}_{i}$ represents the predicted value.
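These three measures can be computed as in the short sketch below; the AIC form follows the sum-of-squared-errors version written above, which is one common convention among several used in software:

```python
import numpy as np

def forecast_accuracy(y_true, y_pred, k):
    # RMSE, MAE, and AIC = n*log(sse/n) + 2k over the out-of-sample window.
    e = np.asarray(y_true) - np.asarray(y_pred)
    n = e.size
    sse = np.sum(e ** 2)
    rmse = np.sqrt(sse / n)
    mae = np.mean(np.abs(e))
    aic = n * np.log(sse / n) + 2 * k
    return rmse, mae, aic

# Example on the 24-point holdout (observations 96-119), with hypothetical forecasts:
# rmse, mae, aic = forecast_accuracy(data[95:], preds, k=3)
```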
Figure 4 indicates that the data do not follow any particular pattern and are highly uncertain. The data are plotted in Figure 4, where the vertical blue dotted line separates the estimation part (80 percent) from the out-of-sample forecasting part (20 percent).

The accuracy measures for the data are reported in Table 6, which gives the RMSE and MAE values for the three ML algorithms: GMDH, SVR, and RF. It can be observed that SVR beats its rival counterparts. The RMSE and MAE values associated with SVR are 0.07 and 0.031, those of RF are 0.09 and 0.064, and those of GMDH are 0.233 and 0.159, respectively. In addition, according to the AIC, SVR is the best model among all.
6. Concluding Remarks
In this study, a new generalized exponential-X family of distributions is proposed as a model for real-life data. The effort in this paper leads to another approach to developing new statistical models. Employing the proposed family, a new modification of the Weibull model, called the new generalized exponential Weibull model, is studied. The effectiveness of the new generalized exponential Weibull model is shown by considering the fracture toughness data set. A statistical product control application of the developed model is also studied by developing a new single acceptance sampling plan based on the NGExp-Weibull distribution. The plan parameters are determined such that both the consumer's risk and the producer's risk are satisfied simultaneously. Some tables are given for industrial application purposes. The developed plan is exemplified by the fracture toughness data set, which is well fitted by the proposed NGExp-Weibull model. Furthermore, using the same data set, we implemented several ML algorithms, including SVR, GMDH, and RF. The out-of-sample forecast accuracy was assessed using three statistical measures of accuracy, namely, the RMSE, MAE, and AIC. After the analysis, we found that the SVR produces a more efficient forecast than its competitors. The findings clearly reveal that, among the considered class of ML algorithms, the predictive power of the SVR method is superior for this data set.
Data Availability
The data set is given in the main body of the paper.
Conflicts of Interest
The authors declare that there are no conflicts of interest.