Abstract

This paper introduces and studies a novel probability distribution. A number of financial risk indicators, including the value-at-risk, tail-value-at-risk, tail variance, tail mean-variance, and mean excess loss function, are estimated using the maximum likelihood, ordinary least squares, weighted least squares, and Anderson–Darling estimation methods. These four approaches are compared in a simulation study and in an application to insurance claims data for the actuarial evaluation. For distributional validation under complete data, the well-known Nikulin–Rao–Robson statistic is considered; it is evaluated with a simulation study and three complete real datasets. For distributional validation under censoring, a modified version of the Nikulin–Rao–Robson statistic is considered and evaluated with a thorough simulation analysis and three censored real datasets.

1. Introduction

A novel continuous distribution will be introduced and studied in this work, but we will approach it from new viewpoints that depart from those often covered by specialists. We will not focus on numerous theoretical findings and algebraic derivations, not because they are unimportant, but to give us the chance to highlight more practical features of risk analysis, distributional validation, and their associated applications to both complete and censored data. We will discuss certain theoretical facets of the new distribution; however, we will pay particular attention to features that are applicable and useful in the following areas:
(1) A set of frequently used financial indicators, such as the value-at-risk (VAR), tail-value-at-risk (TVAR) (also known as the conditional tail expectation or conditional-value-at-risk), tail variance (TV), tail mean-variance (TMV), and the mean excess loss (MEL) function, is studied when examining and evaluating the risks that insurance companies face. The maximum likelihood estimation (MLE), ordinary least squares (OLS), weighted least squares estimation (WLSE), and Anderson–Darling estimation (ADE) methods are all described as estimation strategies for the main key risk indicators (KRIs). These four methodologies are applied in two distinct ways for financial and actuarial evaluation: in a simulation with three confidence levels (CLs) under different sample sizes, and in an application to insurance claims data.
(2) We present a simulation experiment to compare the performance of the estimators of the VAR based on insurance data in order to satisfy the requirements of the actuarial analysis of risks.
(3) The well-known Nikulin–Rao–Robson (NRR) statistic, which is based on the uncensored maximum likelihood estimators (UMLEs) computed from the initial non-grouped data, is considered under the new Rayleigh generalized gamma (RGG) model in the framework of distributional validation and statistical hypothesis tests for complete data. The statistic is evaluated with three real datasets and a simulation study.
(4) A modified NRR statistic, which is based on the censored maximum likelihood estimators (CMLEs) computed from the original non-grouped data, is considered under the RGG model. This statistic is used in the framework of distributional validation and statistical hypothesis testing for censored data, and it is evaluated using three real datasets and a thorough simulation analysis.
(5) It is worth noting that risk indicators can also be applied in engineering, especially structural engineering, with the aim of developing mathematical measurement and statistical modelling processes in these fields. In engineering, maximum likelihood estimation can be used to estimate the parameters of a distribution for the failure time of a product or system, which can in turn be used to assess reliability and inform maintenance decisions. Moreover, right censored maximum likelihood estimation can be used to estimate the reliability function of a system, such as a machine or a bridge. For example, if the failure time of a machine follows a Weibull distribution, the parameters of the distribution can be estimated by right censored maximum likelihood estimation, and the reliability of the machine can be assessed from the estimated distribution. Generally, it is important to monitor the system over time and review the risk assessment periodically to ensure that any changes or updates to the design or operating conditions are taken into account. This helps to ensure that the system remains safe and reliable over its lifetime. For more details, see Amini et al. [1] and El-Morshedy et al. [2].

The cumulative distribution function (CDF) of the generalized gamma model [3] is flexible enough to accommodate both monotonic and non-monotonic failure rates. Following Yousof et al. [4], the CDF of the RGG model has the form given in (2), and the probability density function (PDF) corresponding to (2) follows by differentiation.

When comparing probability distributions for applications in insurance, several criteria are commonly considered. These criteria help insurers select the most appropriate distribution to model the claims data accurately. The main criteria are as follows:
1. Goodness of fit: this criterion assesses how well a probability distribution fits the observed historical claims data. Insurers typically use statistical tests, such as the Kolmogorov–Smirnov test or the chi-square test, to evaluate the goodness of fit. A distribution that closely matches the data is preferred, as it provides more reliable estimates for future claims.
2. Skewness and kurtosis: skewness measures the asymmetry of a distribution, while kurtosis measures its tail heaviness. In insurance, it is important to consider the skewness and kurtosis of the claims data to capture any non-normal characteristics. Some distributions, such as the lognormal or Pareto distributions, are better suited for modeling the skewed and heavy-tailed data commonly observed in insurance claims.
3. Parameter estimation: probability distributions often have parameters that need to be estimated from the data. Insurers consider the ease and accuracy of parameter estimation methods for a given distribution. Maximum likelihood estimation is a common approach, but other methods such as the method of moments or Bayesian estimation may also be applicable. A distribution with easily estimable parameters is desirable, as it simplifies the modeling process.
4. Interpretability: the interpretability of a probability distribution is crucial for insurance applications. Insurers and actuaries need to understand the underlying assumptions and characteristics of the selected distribution. Distributions such as the normal (Gaussian) distribution or the gamma distribution are well known and have established interpretations, making them popular choices. However, it is important to balance interpretability with the goodness of fit to ensure accurate modeling.
5. Tail behavior: the tail behavior of a distribution is important for capturing extreme events or catastrophic losses in insurance. Insurers need to assess whether a distribution adequately represents the tail risks inherent in the claims data. Heavy-tailed distributions, such as the Pareto or generalized Pareto distributions, are often considered for modeling extreme events.
6. Historical credibility: the historical credibility of a probability distribution is based on its track record and applicability to similar insurance portfolios or lines of business. Insurers often rely on industry experience, expert judgment, and historical data from similar risks to assess the suitability of a particular distribution. Distributions that have been used successfully in the past for similar insurance applications may carry more weight in the selection process.
It is important to note that the choice of probability distribution should be based on a comprehensive analysis of the specific insurance context and the characteristics of the claims data. Insurance companies often employ experienced actuaries and statistical experts to evaluate and compare various distributions against these criteria in order to make informed decisions about the most appropriate probability distribution for their insurance applications. The NRR test statistic depends on the MLEs computed from the initial non-grouped real datasets and is of particular importance among all goodness-of-fit tests. The statistic, pioneered by Nikulin [5, 6] and Rao and Robson [7], has a chi-square limiting distribution and recovers the information lost during data grouping.
However, censoring renders most widely used goodness-of-fit tests inapplicable and leads to a variety of practical issues. As a result, researchers have proposed a wide range of improvements to the available tests. Bagdonavicius and Nikulin [8] developed a modified NRR statistic for statistical distributions with right censoring and unknown parameters. Since it recovers all information lost during data regrouping, this variant of the NRR statistic may be used to fit data from fields where observations are frequently censored, such as survival analysis, reliability, and others. Following Nikulin [5, 6] and Rao and Robson [7], we will present modified NRR goodness-of-fit tests for adjusting the proposed model to complete and right censored data.

In the case of complete data, the NRR statistic is a well-known alternative to the conventional tests. It is based on the differences between two estimates of the probability of falling into the grouping intervals. One estimate comes from the empirical distribution function, while the other uses maximum likelihood estimates of the unknown parameters of the tested model computed from the ungrouped initial data. For further information, see Nikulin [5, 6] and Rao and Robson [7].

In general, the development of statistical methods for testing hypotheses and validating parametric distributions under censoring is accelerating, although the presence of censoring remains a major challenge. The statistical literature contains numerous applied contributions on validation tests for censored data.

The NRR test has been the subject of several research studies in the statistical literature; because such studies are relatively rare, we list only the most recent ones here. In this work, the RGG distribution is applied to risk analysis and distributional validation, and the uncensored and right censored scenarios are used to validate goodness-of-fit procedures based on the NRR statistic and the modified NRR statistic, respectively. The statistic is adopted for testing the null hypothesis according to which a complete sample follows the RGG model. It is assessed through a comprehensive simulation study using the Barzilai–Borwein (BB) algorithm for complete data (see Ravi and Gilbert [9]), and the modified statistic is then assessed through a comprehensive simulation study using the BB algorithm for censored data. To examine the performance of the tests as the sample size increases, we rely on the mean square error (MSE) in all simulated experiments, accounting for varied sample sizes.

Three uncensored real datasets (times between failures for repairable items, reliability data, and strength data) are used in statistical testing under the NRR statistic for distributional validation. The uncensored real data included in the analysis are well known in the statistical literature and have been analyzed extensively. In this work, we focus on an important aspect of statistical analysis, namely, hypothesis testing with these data; this aspect is emphasized in the introduction because most statistical works have not studied and analyzed it, despite the importance of hypothesis tests in statistical theory. On the other hand, three right censored real datasets are used to assess the modified statistic for distributional validation under the RGG model. The censored real data considered in the analysis are the censored bone marrow transplant data, the censored times to infection of kidney dialysis patients data, and the censored strength of a certain type of braided cord data.

The new NRR statistical test showed that the new model can successfully be used for analyzing right censored datasets. In this context, we outline several recent studies that added to or modified the NRR test. It is worth noting that a search of the statistical literature on this topic (the NRR goodness-of-fit test) will not find many new NRR goodness-of-fit extensions, and only a few research studies have applied this test, because the NRR goodness-of-fit test has specific requirements and strict procedures and, in its modified form, demands censored data. As is generally known, obtaining novel censored data with which to apply and emphasize the significance of the new test is a difficult task. In the next few sections, we discuss a few recent research studies that applied this test to real right censored datasets, along with a description of the conclusions of each study.

In this introduction, we also review the limitations of the current study; two basic limitations can be stated:
(i) The datasets used must be positive valued only, because the new probability distribution is defined under this constraint.
(ii) The modified NRR test introduced here can only be applied to right censored data.

The main novelties of this work can be highlighted as follows:
(1) Employing the new probability distribution in the analysis and evaluation of actuarial risks through a set of actuarial measures (indicators) that have been carefully selected for the quality of their results and their popularity in application.
(2) Using the new probability distribution in modelling processes and statistical hypothesis tests based on the NRR test.
(3) Developing the theory of statistical hypothesis testing for censored data by presenting a modified NRR test and applying it to real data.
(4) Applying the modified NRR test in distributional validation using the new distribution.
(5) Evaluating the performance of several different estimation methods in risk analysis. This approach to risk assessment using different estimation methods is relatively recent, and few such studies are available.

This paper is distinguished from competing papers by the following points:
(i) An actuarial application to insurance data.
(ii) The use of different estimation methods in evaluating and analyzing risks.
(iii) The combination of the original test and the modified test.
(iv) Various applications of the original statistical test and the modified test.

2. Risk Indicators

The key risk indicators (KRIs) play a crucial role in the actuarial analysis of insurance risks. These indicators provide valuable insights into the level and nature of the risks associated with insurance portfolios, allowing actuaries to quantify, measure, and manage these risks effectively. Some key points highlighting the importance of risk indicators in actuarial analysis are as follows:
1. KRIs help actuaries quantify the level of risk in insurance portfolios. By using various metrics, such as loss ratios, claim frequencies, severity measures, or aggregate reserves, actuaries can assess the potential financial impact of risks. These indicators provide a numerical representation of the risk exposure and assist in decision-making processes related to pricing, reserving, and capital allocation.
2. KRIs enable actuaries to measure and assess the magnitude and likelihood of potential losses. Actuarial models and techniques, such as probability distributions, statistical methods, and simulation studies, are used in conjunction with risk indicators to estimate the probability and severity of adverse events. This helps insurers evaluate the financial implications and potential solvency risks associated with the insurance business.
3. KRIs serve as monitoring tools, allowing actuaries to track changes in risk levels over time. By regularly analyzing and comparing risk indicators against predefined thresholds or benchmarks, actuaries can identify emerging risks or deviations from expected risk levels. These indicators act as early warning signals, enabling insurers to take proactive measures to mitigate risks, adjust pricing, or revise underwriting strategies.
4. KRIs assist in segmenting insurance portfolios based on the level of risk. Actuaries can use risk indicators to identify high-risk segments or policyholders and allocate appropriate resources for risk mitigation. This segmentation helps in portfolio management, allowing insurers to optimize their risk exposure, diversify risks, and balance their overall risk portfolio.
5. KRIs play a crucial role in regulatory compliance and financial reporting for insurers. Regulators often require insurers to report and disclose risk-related information to ensure solvency and protect policyholders. Risk indicators, such as risk-based capital (RBC) ratios, economic capital models, or stress testing results, provide insights into the financial stability and resilience of insurance companies, ensuring compliance with regulatory requirements.
6. KRIs support informed decision-making and strategic planning for insurers. Actuaries rely on risk indicators to assess the profitability of insurance products, evaluate potential risks associated with new business ventures, and determine appropriate pricing strategies. These indicators help in setting the risk appetite, formulating risk management policies, and aligning business strategies with the risk tolerance and objectives of the insurance company.
The first indicator considered in this work is the value-at-risk (VAR). This indicator is frequently used to calculate the amount of capital needed to deal with probable adverse events. The VAR of the RGG distribution at the level q, say VAR_q(X), is the q-th quantile (or percentile). Then, we can simply write VAR_q(X) = F^{-1}(q), where F^{-1} can be derived by inverting (2). For a one-year horizon with q = 0.99, the interpretation is that there is only a very small chance (0.01) that the insurance company will be bankrupted by an adverse outcome over the next year (see Wirch [10] for more details).
Generally speaking, if the distribution of gains (or losses) is restricted to the normal distribution, it is acknowledged that VAR(X) meets all coherence requirements. However, insurance datasets such as insurance claims and reinsurance revenues are typically skewed to the right or to the left, so the normal distribution is not suitable for describing reinsurance revenues and insurance claims. The TVAR of X at the confidence level q is the expected loss given that the loss exceeds the VAR of its distribution; that is, the TVAR of X is the conditional expectation of X given that X exceeds VAR_q(X).

The quantity TVAR_q(X), which gives further details about the tail of the RGG distribution, is therefore the average of all VAR values above the confidence level q. Moreover, TVAR_q(X) can also be expressed as TVAR_q(X) = VAR_q(X) + MEL(VAR_q(X)), where MEL(·) is the mean excess loss function evaluated at the q-th quantile (see Wirch [10]; Acerbi and Tasche [11]; and Tasche [12]). If the MEL vanishes, then TVAR_q(X) = VAR_q(X), and for very small values of the MEL, the value of TVAR_q(X) will be very close to VAR_q(X). The TV risk indicator, which Furman and Landsman [13] developed, measures the deviation of the loss from its average along the tail; explicit expressions for the TV risk indicator under the multivariate normal distribution were also developed by Furman and Landsman [13]. The TV risk indicator, TV_q(X), can then be expressed as the conditional variance of X given that X exceeds VAR_q(X).

As a statistic for the best portfolio choice, Furman and Landsman [13] developed the TMV risk indicator, which is based on the TV risk indicator. Thus, the measure TMV has the following form:
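In standard actuarial notation, the classical forms of these indicators, to which the RGG-specific expressions reduce on substituting the CDF in (2), can be collected as
\[
\mathrm{VAR}_q(X)=F^{-1}(q),\qquad
\mathrm{TVAR}_q(X)=E\!\left[X\mid X>\mathrm{VAR}_q(X)\right]=\frac{1}{1-q}\int_q^{1}\mathrm{VAR}_u(X)\,du,
\]
\[
\mathrm{TV}_q(X)=\operatorname{Var}\!\left[X\mid X>\mathrm{VAR}_q(X)\right],\qquad
\mathrm{TMV}_q(X;\lambda)=\mathrm{TVAR}_q(X)+\lambda\,\mathrm{TV}_q(X),
\]
\[
\mathrm{MEL}\!\left(\mathrm{VAR}_q(X)\right)=E\!\left[X-\mathrm{VAR}_q(X)\mid X>\mathrm{VAR}_q(X)\right]=\mathrm{TVAR}_q(X)-\mathrm{VAR}_q(X),
\]
where \(\lambda\ge 0\) is a risk-aversion weight.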

Then, for any random variable, the TMV is determined by the TVAR and the TV, and for λ = 0, the TMV reduces to the TVAR. Since the quantile function of the RGG distribution is not known in closed form, we will use methodologies that provide numerical solutions to this complex function and rely on ready-made programs such as R and MATHCAD to facilitate the numerical operations. The use of numerical methods has recently become popular, mainly because of the availability of ready-made statistical software, as illustrated in the sketch below.
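As an illustration of this numerical strategy, the following minimal R sketch evaluates the indicators by Monte Carlo from any quantile function. Here, qfun is a placeholder for a numerical quantile routine of the fitted model (for instance, the RGG quantile obtained by inverting (2) numerically), par its parameter vector, and lambda the TMV weight; none of these names are taken from the paper's code.

kri <- function(qfun, par, q = 0.95, lambda = 1, n.sim = 1e6) {
  u <- runif(n.sim)
  x <- qfun(u, par)                  # simulated losses from the fitted model
  var.q  <- qfun(q, par)             # VAR: the q-th quantile
  tail.x <- x[x > var.q]             # losses beyond the VAR threshold
  tvar.q <- mean(tail.x)             # TVAR: tail expectation
  tv.q   <- var(tail.x)              # TV: tail variance
  tmv.q  <- tvar.q + lambda * tv.q   # TMV: tail mean-variance
  mel.q  <- tvar.q - var.q           # MEL evaluated at the q-th quantile
  c(VAR = var.q, TVAR = tvar.q, TV = tv.q, TMV = tmv.q, MEL = mel.q)
}

Any of the four sets of parameter estimates (MLE, OLSE, WLSE, or ADE) can be passed as par, so the same routine serves all the comparisons reported below.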

Numerical approaches were applied in this paper’s risk analysis and evaluation procedure (see Section 4), as well as in the issue of distributional validation under the NRR and its updated equivalent version (see Section 7). For more details, see Alanzi et al. [14], Zhou and Gao [15], Hamed et al. [16], and Yousof et al. [17].

3. Risk Analysis Using Different Estimation Methods

3.1. Risk Assessment under Artificial Data

For the purpose of computing the abovementioned KRIs, four estimation methods are discussed in this section: MLE, OLS, WLSE, and ADE. Three CLs and N = 1,000 replications are considered for each sample size. All results are reported in Table 1 (KRIs under artificial data for n = 50), Table 2 (KRIs under artificial data for n = 150), Table 3 (KRIs under artificial data for n = 300), and Table 4 (KRIs under artificial data for n = 500). The main goal of the simulations is to assess the effectiveness of the four risk analysis techniques in order to choose the most suitable and effective ones. Tables 1–4 support the following conclusions (a model-agnostic sketch of the four estimation criteria is given after this list):
(1) The measures VAR, TVAR, and TMV increase when q increases for all estimation methods.
(2) The measures TV and MEL decrease when q increases for all estimation methods.
(3) The VAR estimates produced by the four methods are close to one another for most values of q.
(4) The numbers in the four tables confirm that all methods are satisfactory and that no single approach can be clearly recommended over another. Based on these findings, we provide an application based on real data that can favor one method over another and determine the best and most appropriate methods. In other words, the simulation study did not rank the four methods decisively because they produced similar results in risk assessment; these convergent results assure us that all methods perform acceptably in modelling actuarial data and assessing risk.
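For completeness, the sketch below writes down the four fitting criteria in a model-agnostic way; pmod and dmod denote hypothetical CDF and PDF routines of the candidate model (they are not the paper's actual code), and each criterion is minimised over par (the negative log-likelihood in the case of the MLE).

fit.criteria <- function(par, x, pmod, dmod) {
  x  <- sort(x); n <- length(x); i <- 1:n
  Fx <- pmod(x, par)                                  # fitted CDF at the order statistics
  list(
    negloglik = -sum(log(dmod(x, par))),              # MLE criterion
    ols = sum((Fx - i / (n + 1))^2),                  # ordinary least squares
    wls = sum((n + 2) * (n + 1)^2 / (i * (n + 1 - i)) *
              (Fx - i / (n + 1))^2),                  # weighted least squares
    ad  = -n - mean((2 * i - 1) *
              (log(Fx) + log(1 - rev(Fx))))           # Anderson-Darling
  )
}
# e.g. optim(start, function(p) fit.criteria(p, x, pmod, dmod)$ols)

Each criterion yields its own parameter estimates, which are then plugged into the KRI routine above to produce the entries of Tables 1-4.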

3.2. Risk Assessment under the Insurance Payment Claims Data

Analyzing historical insurance claims data using probability distributions is essential for several reasons:
1. Probability distributions provide a mathematical framework to model and analyze the frequency and severity of insurance claims. By fitting historical claims data to appropriate probability distributions, insurers can estimate the likelihood of different claim amounts and frequencies occurring in the future. This information is crucial for assessing the overall risk exposure of the insurance company.
2. Probability distributions help insurers determine appropriate premiums for insurance policies. By understanding the distribution of claim amounts and frequencies, insurers can calculate the expected value of claims and incorporate it into the pricing structure. This ensures that premiums charged to policyholders align with the potential risks faced by the insurer, maintaining a fair and sustainable pricing model.
3. Accurate estimation of potential claim costs is vital for insurers to set aside adequate reserves and plan their financial stability. By analyzing historical claims data using probability distributions, insurers can estimate the potential range of future claim payments and allocate sufficient funds to cover these liabilities. This enables insurers to make informed decisions about capital management, investment strategies, and financial reserves.
4. Probability distributions provide insights into the potential volatility and tail risks associated with insurance claims. Insurers can analyze the shape of the distribution and its parameters to identify extreme events and tail risks that may have a significant impact on the financial health of the company. This knowledge helps insurers manage risks effectively, develop appropriate underwriting guidelines, and implement risk mitigation strategies.
5. Probability distributions allow insurers to simulate various scenarios and evaluate the potential outcomes of different policy design choices. By incorporating historical claims data into probabilistic models, insurers can assess the impact of different policy terms, coverage limits, deductibles, and other factors on claim frequencies and amounts. This information helps insurers make data-driven decisions when designing insurance policies and developing risk management strategies.
Overall, analyzing historical insurance claims data using probability distributions enables insurers to gain insights into the nature of the risks they face, make informed decisions, and effectively manage their operations. It supports pricing, reserving, risk assessment, and underwriting activities, contributing to the financial stability and success of insurance companies. In this work, we consider and analyze claims data collected from 2007 to 2013. Certain KRI quantities, including the VAR, TVAR, TV, and TMV, have previously been proposed for left-skewed insurance claims data under the EEC distribution [18]. One of the finest techniques for heavy-tailed distributions is based on the t-Hill approach, an upper order statistic modification of the t-estimator.

Table 5 reports the KRIs under the insurance claims data and the MLE method for the RGG and GG models. Table 6 gives the KRIs under the insurance claims data and the OLSE method for the RGG and GG models. Table 7 provides the KRIs under the insurance claims data and the WLSE method for the RGG and GG models. Table 8 presents the KRIs under the insurance claims data and the ADE method for the RGG and GG models. Based on these tables, the following results can be highlighted:
(1) For all risk assessment methods:
(2) For all risk assessment methods:
(3) For all risk assessment methods:
(4) For all risk assessment methods:
(5) For all risk assessment methods:
(6) For the RGG model: nearly for all q values, the OLSE method is recommended since it provides the most acceptable risk exposure analysis, and the MLE method is recommended second; the other two methods also perform well. For the GG model: nearly for all q values, the OLSE method is likewise recommended, followed by the MLE method; the other two methods also perform well.
(7) For all q values and under all risk methods, the RGG model is better than the GG model. It is worth noting that the two distributions have the same number of parameters, but the new distribution is the best for modelling insurance claims reimbursement data and assessing actuarial risk. We hope that the proposed distribution will gain a great deal of interest from actuaries and practitioners in future actuarial and applied studies.

4. Distributional Validity

4.1. Distributional Validity under the UMLE Method

Here, the UMLE method is used to estimate the RGG distribution's parameters. Let x_1, ..., x_n be a random sample distributed according to the RGG model. The uncensored likelihood function is the product of the RGG density evaluated at the observations, so the uncensored log-likelihood function reduces to the corresponding sum of log densities. The MLEs can be obtained by solving the score equations, that is, by setting the partial derivatives of the log-likelihood with respect to the parameters to zero, as illustrated in the sketch below.
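A minimal sketch of this step, assuming a generic density routine dmod (a hypothetical placeholder into which the RGG density would be plugged), is

uncensored.mle <- function(x, start, dmod) {
  negll <- function(par) -sum(log(dmod(x, par)))   # negative log-likelihood
  optim(start, negll)$par                          # numerical maximisation
}

In practice the score equations can equivalently be solved with BBsolve from the BB package, which is the route taken in the Appendix.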

4.2. Distributional Validity Utilizing the Right CMLE Method

Let the data be a right censored sample with fixed censoring time from the RGG distribution with unknown parameter vector; each observation can be written as a pair consisting of the recorded time and a failure indicator. The right censored log-likelihood function takes the usual form in which each observed failure contributes its log density and each censored observation contributes the logarithm of the survival function of the RGG model, and, based on (15), we obtain its explicit expression for the RGG model.

To this end, we use (16) to obtain the non-linear score equations.

Similar to the complete data scenario, we employ numerical techniques like the Newton–Raphson method, the Monte Carlo method, or the BB-solve package to compute the MLEs.
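A minimal sketch of the censored objective, again assuming hypothetical dmod/pmod routines for the density and CDF and a failure indicator delta (1 for an observed failure, 0 for a right censored time), is

censored.negll <- function(par, x, delta, dmod, pmod) {
  # observed failures contribute the log density, censored times the log survival
  -sum(delta * log(dmod(x, par)) + (1 - delta) * log(1 - pmod(x, par)))
}
# e.g. optim(start, censored.negll, x = x, delta = delta, dmod = dmod, pmod = pmod)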

5. Testing Procedures

5.1. Testing Procedures from the Statistic

For testing the null hypothesis H_0 according to which a sample belongs to (2), consider r equiprobable grouping intervals I_1, ..., I_r whose limits are determined from the fitted model so that each interval has the same estimated probability content. If U_j denotes the number of observations falling into the j-th interval, the NRR statistic due to Nikulin [5] and Rao and Robson [7] is defined as a quadratic form in the standardized differences between the observed counts U_j and their expected values.

Here, the estimated information matrices on the non-grouped and the grouped data, respectively, enter the statistic together with the vector of the MLEs computed on the initial data; the elements of the auxiliary vector involve the partial derivatives of the log-density with respect to the parameters, and the number of model parameters also enters the construction. The limiting distribution of the statistic is chi-square with r − 1 degrees of freedom. To construct the test statistic corresponding to the RGG model with its parameter vector, we first calculate the MLEs and the interval limits; secondly, the required derivatives are obtained.

Finally, we obtain the statistic, which allows one to verify whether the data belong to the RGG distribution; the grouping step is sketched below.
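The grouping step can be sketched as follows; qmod stands for a hypothetical quantile routine of the fitted model, and only the Pearson component of the statistic is shown, since the full NRR statistic adds a quadratic correction term built from the estimated information matrices.

nrr.pearson <- function(x, par, qmod, r) {
  a  <- qmod((1:(r - 1)) / r, par)               # interior equiprobable limits
  Uj <- table(cut(x, breaks = c(-Inf, a, Inf)))  # observed counts U_j
  ej <- length(x) / r                            # expected count per interval
  sum((Uj - ej)^2 / ej)                          # Pearson chi-square component
}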

5.2. Testing Procedures for the Test Statistic with Right Censorship

The NRR statistic described above was adjusted by Bagdonavičius and Nikulin [8] to the case of right censoring. Generally, the modified NRR statistic is established from the vector of differences between the observed numbers of failures and the expected numbers of failures falling into the grouping intervals, and the statistic is defined as a quadratic form based on the generalized inverse of the corresponding covariance matrix. To facilitate the calculation process, this modified NRR statistical test can be decomposed into a Pearson-type part and an additional quadratic form involving the hazard rate function of the RGG model. Under the null hypothesis H_0, the limiting distribution of the statistic is a chi-square distribution. For more details on modified chi-square tests, one can see the book of Voinov et al. [19]. For testing the null hypothesis that a right censored sample is described by the RGG distribution, we develop the statistic corresponding to this distribution. To this end, we have to compute the MLEs on the initial data (see Section 3), the estimated information matrix (which can be deduced from the score functions), and the estimated interval limits. To apply this test statistic, the expected numbers of failures falling into the grouping intervals must be the same for every interval, so the estimated interval limits are chosen through the cumulative hazard function of the RGG model evaluated at the fitted parameters. The observed and expected numbers of failures per interval can then be obtained, the components of the estimated matrices can be derived and calculated, and the test statistic follows easily; a sketch of the grouping computation is given below.
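A minimal sketch of the observed and expected numbers of failures per interval, assuming a hypothetical cumulative hazard routine Hmod of the fitted model, recorded times x, failure indicators delta, and estimated interval limits a (with the last limit equal to the largest recorded time), is

group.counts <- function(x, delta, a, par, Hmod) {
  r  <- length(a)
  Uj <- ej <- numeric(r)
  lower <- c(0, a[-r])                                   # left endpoints of the intervals
  for (j in 1:r) {
    Uj[j] <- sum(delta == 1 & x > lower[j] & x <= a[j])  # observed failures in interval j
    ej[j] <- sum(Hmod(pmin(x, a[j]), par) -
                 Hmod(pmin(x, lower[j]), par))           # expected failures in interval j
  }
  list(observed = Uj, expected = ej)
}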

6. Assessing and with Some Applications

We performed a significant investigation using numerical simulations in this section to demonstrate the flexibility and effectiveness of the tests suggested in this work. We then used actual data from reliability and survival analysis to run these tests.

6.1. Simulating under the UMLE Method

For simulating under the UMLE, the data were simulated N times under several sample sizes. Using the BB algorithm and the R software, the MLEs and their mean square errors (MSEs) are calculated and presented in Table 9. For testing the null hypothesis according to which the data follow the RGG distribution, we calculate the test statistic and compare the empirical levels of rejection of the null hypothesis with the corresponding theoretical significance levels. Table 10 gives the theoretical risk and the empirical risk for the complete case. After accounting for simulation errors, the levels simulated for the statistic agree with the corresponding theoretical levels of the chi-square distribution with r − 1 degrees of freedom. In light of this, we can state that the test suggested in this study can appropriately fit data obtained from a RGG model. A sketch of the simulation experiment is given below.
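The simulation experiment can be sketched as follows; rmod is a hypothetical generator of samples from the null model and nrr.stat stands in for the full computation of the test statistic, so the function returns the empirical rejection level to be compared with the nominal one.

reject.rate <- function(n, par, r, rmod, nrr.stat, N = 1000, level = 0.05) {
  crit <- qchisq(1 - level, df = r - 1)   # chi-square critical value
  rej  <- replicate(N, {
    x <- rmod(n, par)                     # sample simulated under H0
    nrr.stat(x, r) > crit                 # TRUE if H0 is rejected
  })
  mean(rej)                               # empirical rejection level
}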

6.2. Simulating under the CMLE Method

For simulating under the censored maximum likelihood method, the data were simulated N times under several sample sizes. Using the BB algorithm and the R software, the MLEs and their mean square errors (MSEs) are calculated and presented in Table 11. For testing the null hypothesis according to which the data follow the RGG distribution, we calculate the test statistic and compare the empirical levels of rejection of the null hypothesis with the corresponding theoretical significance levels. Table 12 gives the theoretical risk and the empirical risk for the censored case. After accounting for simulation errors, the levels simulated for the statistic agree with the corresponding theoretical levels of the chi-square distribution. In light of this, we can state that the test suggested in this study can appropriately fit data obtained from a RGG model.

7. Data Analysis

Several examples from various fields are used to demonstrate the applicability of the proposed paradigm. The censored statistic is utilized to fit the censored survival data to the hypothesized distribution, while the complete-data statistic is built to check whether the suggested model can accurately represent the remaining, uncensored datasets (for more relevant datasets, see Emam et al. [20]).

7.1. Real Applications for Uncensored Data
7.1.1. Times between Failures for Repairable Items Data

The first dataset is given by Murthy et al. [21]. These data have been analyzed extensively, as many researchers and scholars have modelled them and drawn many conclusions from them. The data refer to the times between failures for repairable items (see Ibrahim et al. [22] for more details). The data are as follows: 1.43, 0.11, 0.71, 0.77, 2.63, 1.49, 3.46, 2.46, 0.59, 0.74, 1.23, 0.94, 4.36, 0.40, 1.74, 4.73, 2.23, 0.45, 0.70, 1.06, 1.46, 0.30, 1.82, 2.37, 0.63, 1.23, 1.24, 1.97, 1.86, 1.17. Using the R software and the BB algorithm, we obtain the MLEs. Then, taking, for example, 6 grouping intervals, we compute the interval limits and the Fisher information matrix (FIMx) on the initial data.

Then, by calculating the NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the times between failures for repairable items data follow the RGG distribution.

7.1.2. Reliability Data

The second dataset includes the reliability data given by Cabarbaye et al. [23]. These data have undergone extensive examination and investigation, as several researchers and academics have modelled and examined them and drawn conclusions from them. The data are as follows: 0, 313, 360, 231, 286, 340, 212, 287, 243, 170, 141, 150, 593, 328, 234, 206, 108, 134, 231, 218, 281, 192, 457, 269, 201, 181, 277, 479, 272, 223, 272, 163, 370, 217, 182, 202, 451, 303. Using the R software and the BB algorithm, we obtain the MLEs. Then, taking, for example, 7 grouping intervals, we compute the interval limits and the FIMx on the initial data.

Then, by calculating the NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the reliability data follow the RGG distribution.

7.1.3. Strength of Glass Fiber Data

The third dataset includes the strength of glass fiber data given by Smith and Naylor [24]. As many researchers and academics have modelled and examined the strength of glass fiber data and drawn several inferences from them, these data have received considerable attention in statistical modelling. The data are as follows: 1.014, 1.081, 1.082, 1.185, 1.223, 1.248, 1.267, 1.271, 1.272, 1.275, 1.276, 1.278, 1.286, 1.288, 1.292, 1.304, 1.306, 1.355, 1.361, 1.364, 1.379, 1.409, 1.426, 1.459, 1.46, 1.476, 1.481, 1.484, 1.501, 1.506, 1.524, 1.526, 1.535, 1.541, 1.568, 1.579, 1.581, 1.591, 1.593, 1.602, 1.666, 1.67, 1.684, 1.691, 1.704, 1.731, 1.735, 1.747, 1.748, 1.757, 1.800, 1.806, 1.867, 1.876, 1.878, 1.91, 1.916, 1.972, 2.012, 2.456, 2.592, 3.197, 4.121. Using the R software and the BB algorithm, we obtain the MLEs. Then, taking, for example, 7 grouping intervals, we compute the interval limits and the FIMx on the initial data.

Then, by calculating the NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the glass fiber data follow the RGG distribution.

7.2. Real Applications for Censored Data
7.2.1. Times to Infection of Kidney Dialysis Patients

Consider the data of times to infection of kidney dialysis patients (see Klein and Moeschberger [25]). Infection times: 1.5, 3.5, 4.5, 4.5, 5.5, 8.5, 8.5, 9.5, 10.5, 11.5, 15.5, 16.5, 18.5, 23.5, 26.5. Censored observations: 2.5, 2.5, 3.5, 3.5, 3.5, 4.5, 5.5, 6.5, 6.5, 7.5, 7.5, 7.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 12.5, 13.5, 14.5, 14.5, 21.5, 21.5, 22.5, 22.5, 25.5, 27.5. Using the R software and the BB algorithm, we obtain the CMLEs. Table 5 gives the values of the corresponding estimated quantities, and Table 13 gives the values computed for the data of times to infection of kidney dialysis patients.

Then, by calculating the modified NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the data of times to infection of kidney dialysis patients follow the RGG distribution.

7.2.2. The Bone Marrow Transplant Data

Consider the bone marrow transplant data for 38 patients (see Klein and Moeschberger [25]). Times to death: 1, 86, 107, 110, 122, 156, 162, 172, 243, 262, 262, 269, 276, 371, 417, 418, 466, 487, 526, 716, 781, 1111, 1182, 1199, 1279, 1377, 1433, 1496. Censored observations: 194, 226, 350, 530, 996, 1167, 1330, 1330, 1462, 1602, 2081. Using the R software and the BB algorithm, we obtain the CMLEs. Table 6 gives the values of the corresponding estimated quantities, and Table 14 lists the values computed for the bone marrow transplant data.

Then, by calculating the modified NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the bone marrow transplant data follow the RGG distribution.

7.2.3. Strength of Certain Type of Braided Cord Data

We apply the findings of this analysis to real data derived from reliable sources (Crowder et al. [26]) for the third dataset. The strengths of 48 pieces of cord that had withstood weathering for a certain amount of time were examined as part of an experiment to learn more about the strength of a specific type of braided cord after weathering. Strength: 41.7, 43.9, 49.9, 50.1, 50.8, 51.9, 52.1, 52.3, 52.3, 52.4, 52.6, 52.7, 53.1, 53.6, 53.6, 53.9, 53.9, 54.1, 54.6, 54.8, 54.8, 55.1, 55.4, 55.9, 56, 56.1, 56.5, 56.9, 57.1, 57.1, 57.3, 57.7, 57.8, 58.1, 58.9, 59, 59.1, 59.6, 60.4, 60.7. Censored observations: 26.8, 29.6, 33.4, 35, 36.3, 40, 41.9, 42.5. Using the R software and the BB algorithm, we obtain the CMLEs. Table 7 gives the values of the corresponding estimated quantities, and Table 15 gives the values computed for the strength of a certain type of braided cord data.

Then, by calculating the modified NRR test statistic, we find that its value does not exceed the corresponding chi-square critical value; therefore, we can accept the null hypothesis that the strength of a certain type of braided cord data follow the RGG distribution.

8. Concluding Remarks

A novel continuous probability distribution called the Rayleigh generalized gamma (RGG) distribution is introduced and studied in this work, approached from fresh angles that depart from those often covered by scholars. In order to highlight more practical aspects in the areas of risk assessment and analysis, distributional validation, and the related applications to complete and censored data, we chose not to dwell on many theoretical results and algebraic derivations, which is not to say that they are unimportant. Nevertheless, we covered some theoretical aspects of the RGG distribution by presenting and discussing some novel characterizations, such as characterizations based on two truncated moments, characterizations in terms of the hazard function, and characterizations based on the conditional expectation of a function of the random variable. By analyzing a collection of commonly used financial indicators, such as the value-at-risk (VAR), tail-value-at-risk (TVAR), tail variance (TV), tail mean-variance (TMV), and mean excess loss (MEL) function, it is possible to analyze and evaluate the risks that insurance firms face. The maximum likelihood estimation approach, the ordinary least squares method, the weighted least squares estimation method, and the Anderson–Darling estimation method are all described as estimation strategies for the major key risk indicators. These four methods were applied to the actuarial evaluation, and a comparison is presented for determining the best method under a simulation study (for artificial assessment) and under an application to insurance claims data. The simulation is performed under three confidence levels, considering various sample sizes. With regard to the application to insurance claims data, the following results can be highlighted:
(1) For all risk assessment methods:
(2) Under the RGG model and the MLE method: the VAR is monotonically increasing, starting at 3496.50391 and ending at 3496.50391, and the TVAR is monotonically increasing, starting at 4558.57622 and ending at 6919.75575. Under the GG model and the MLE method: the VAR is monotonically increasing, starting at 3278.09435 and ending at 8410.23646, and the TVAR is monotonically increasing, starting at 859.59031 and ending at 9790.19896.
(3) Under the RGG model and the OLSE method: the VAR is monotonically increasing, starting at 3632.99105 and ending at 7484.76402, and the TVAR is monotonically increasing, starting at 4996.48683 and ending at 4996.48683. Under the GG model and the OLSE method: the VAR is monotonically increasing, starting at 3594.53397 and ending at 3594.53397, and the TVAR is monotonically increasing, starting at 5576.34069 and ending at 11789.14593.
(4) Under the RGG model and the WLSE method: the VAR is monotonically increasing, starting at 3526.87158 and ending at 6849.20863, and the TVAR is monotonically increasing, starting at 4718.69291 and ending at 7414.04414. Under the GG model and the WLSE method: the VAR is monotonically increasing, starting at 3496.50391 and ending at 6431.29618, and the TVAR is monotonically increasing, starting at 4558.57622 and ending at 6919.75575.
(5) Under the RGG model and the ADE method: the VAR is monotonically increasing, starting at 3540.62976 and ending at 6859.01979, and the TVAR is monotonically increasing, starting at 4731.51551 and ending at 7422.636. Under the GG model and the ADE method: the VAR is monotonically increasing, starting at 3458.40688 and ending at 9237.13337, and the TVAR is monotonically increasing, starting at 5236.20289 and ending at 10792.44936.
(6) For the RGG model: nearly for all q values, the OLSE method is recommended since it provides the most acceptable risk exposure analysis, and the MLE method is recommended second. For the GG model: nearly for all q values, the OLSE method is likewise recommended, followed by the MLE method.
(7) In comparing the various q values and risk methods, the RGG (Rayleigh generalized gamma) model outperforms the GG (generalized gamma) model. The RGG distribution demonstrates superior performance when used to model insurance claims reimbursement data and assess actuarial risk, even though both probability distributions have the same number of parameters. Given this observation, it is expected that actuaries and practitioners will show significant interest in adopting the RGG distribution in future actuarial and applied research.
(8) For all risk estimation methods, the TV, TMV, and MEL are monotonically decreasing.

In the framework of distributional validation and statistical hypothesis tests for complete data, the well-known Nikulin–Rao–Robson (NRR) statistic, which is based on the uncensored maximum likelihood estimators computed from the initial non-grouped data, is considered under the RGG model. The statistic is assessed under a simulation study and under three real datasets as well, and the following results can be highlighted:
(i) For the uncensored times between failures for repairable items data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the times between failures for repairable items data follow the RGG distribution.
(ii) For the uncensored reliability data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the reliability data follow the RGG distribution.
(iii) For the uncensored strength data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the strength data follow the RGG distribution.

In the framework of distributional validation and statistical hypothesis tests for censored data, a modified NRR statistic, which is based on the censored maximum likelihood estimators computed from the initial non-grouped data, is considered under the RGG model. The statistic is assessed under a comprehensive simulation study and under three real datasets, and the following results can be highlighted:
(i) For the censored times to infection of kidney dialysis patients data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the data of times to infection of kidney dialysis patients follow the RGG distribution.
(ii) For the censored bone marrow transplant data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the bone marrow transplant data follow the RGG distribution.
(iii) For the censored strength of a certain type of braided cord data, the computed statistic does not exceed the critical value; therefore, we can accept the null hypothesis that the strength of a certain type of braided cord data follow the RGG distribution.

Appendix

# p represents the parameters; dd represents the score functions
library(BB)
library(nleqslv)

score <- function(p) {
  n <- 30
  # baseline CDF, Rayleigh transform, and model CDF
  G <- 1 - (1 + P[1] * z) * exp(-P[1] * z)
  b <- (G^P[2]) / (1 - G^P[2])
  F <- 1 - exp(-b^2)
  dd <- rep(NA, length(p))
  dd[1] <- sum(delta * ((2 / P[1]) - z -
           ((P[1] * (2 * P[2] - 1) * exp(-P[1] * z)) / G) -
           ((2 * P[1] * P[2] * z^2 * exp(-P[1] * z) * b^2) / (G * (1 - G^P[2]))))) -
           sum((1 - delta) * ((2 * P[2] * P[1] * z^2 * exp(-P[1] * z) * b^2 * exp(-b^2)) /
           (G * (1 - G^P[2]) * (1 - F))))
  dd[2] <- sum(delta * ((1 / P[2]) + 2 * log(G) - 2 * G^P[2] * log(G) * (1 - b))) -
           sum((1 - delta) * ((2 * P[1] * G^P[2] * log(G) * (1 + b) * exp(-b^2)) / (1 - F)))
  dd
}

p0 <- c(0.8, 1.2)  # starting values; can be changed
score(p0)
BBsolve(par = p0, fn = score)
BBsolve(par = p0, fn = score)$par
nleqslv(x = p0, fn = score)

# grouping intervals and observed counts
r <- round(1 + 2.303 * log(n, 10))
a_j <- numeric(r - 1); a_0 <- -1e40; a_r <- 1e40
E_j <- numeric(r)
for (j in 1:r) { E_j[j] <- (j / r) * sum(H) }
U_j <- numeric(r)
for (j in 1:r) { U_j[j] <- 0 }
for (i in 1:n) { if (x[i] < a_j[1]) U_j[1] <- U_j[1] + 1 }
for (i in 1:n) { if (x[i] >= a_j[r - 1]) U_j[r] <- U_j[r] + 1 }
for (j in 2:(r - 1)) {
  for (i in 1:n) {
    if ((x[i] < a_j[j]) & (x[i] >= a_j[j - 1])) { U_j[j] <- U_j[j] + 1 }
  }
}
e_j <- sum(H) / r

# Pearson component, correction term, and the test statistic
X_  <- ((U_j - e_j)^2) / U_j
X2  <- sum(X_)
Z_j <- (U_j - e_j) / sqrt(n)
C1j <- (1 / n) * sum(delta * ((1 / P[1]) - (si / (1 - exp(-P[1] * si)))))
C2j <- (1 / n) * sum(delta * ((P[3] / P[2]) - ((P[1] * P[3] * P[2]^(P[3] - 1) * zi) / (1 - exp(-P[1] * si)))))
Aj  <- U_j / n
Q   <- sum(Cij * Aj^(-1) * Z_j)
Y2n <- X2 + Q
ca  <- qchisq(0.95, r - 1)
ca
if (Y2n < ca) { print("H0 is accepted") } else { print("H0 is rejected") }
Y2n

Data Availability

The dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was funded by Researchers Supporting Project (RSP2023R488), King Saud University, Riyadh, Saudi Arabia.