Abstract
This research study proposes the inclusion of a random error term in a modified Lee–Carter model, improving on the traditional Lee–Carter model that has been used for modeling and forecasting mortality risk in actuarial science for many years. While the modified Lee–Carter model addresses some of the traditional model's common shortcomings, no distributional assumption has previously been placed on its error (disturbance) term. A Gaussian distributional assumption on the error term is proposed, and a deep learning technique is then used to obtain the parameter estimates; this is a departure from the traditional singular value decomposition technique for estimating the model parameters. Finally, the Bühlmann credibility approach is incorporated into the model to assess its forecasting precision against the classical Lee–Carter model before the model is applied in actuarial valuations.
1. Introduction
The Lee–Carter model has become the benchmark stochastic mortality model applied in many fields when modeling mortality rates, hedging longevity risk, forecasting, and predicting mortality risk. Many actuaries, statisticians, and demographers use the model in stochastic risk modeling for various actuarial applications. The conventional model of [1] is defined as follows:
$$\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}, \qquad (1)$$
where $m_{x,t}$ and $t$ are the systematic mortality rate and time, respectively, with the two constraints $\sum_x b_x = 1$ and $\sum_t k_t = 0$.
The random errors are $\varepsilon_{x,t}$, and the $k_t$, known as mortality indices, describe the period effect. The stated two constraints ensure that the LC model is identifiable in its application. It is not possible to observe the values of $k_t$ directly during computation; hence, the singular value decomposition method can be used to estimate the unknown quantities ($a_x$, $b_x$, and all values of $k_t$ and $\varepsilon_{x,t}$).
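To make this estimation step concrete, the following is a minimal Python sketch (not the authors' code) of the classical singular value decomposition fit of $a_x$, $b_x$, and $k_t$, assuming a hypothetical matrix log_m of log central death rates with one row per age and one column per year.

import numpy as np

def fit_lee_carter_svd(log_m):
    """Classical SVD fit of the Lee-Carter model.

    log_m: array of shape (ages, years) holding ln m_{x,t}.
    Returns a_x, b_x, k_t satisfying sum(b_x) = 1 and sum(k_t) = 0.
    """
    a_x = log_m.mean(axis=1)                      # age effect: row averages
    centered = log_m - a_x[:, None]               # centering makes sum_t k_t = 0
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    b_x = U[:, 0] / U[:, 0].sum()                 # normalise so that sum_x b_x = 1
    k_t = s[0] * Vt[0, :] * U[:, 0].sum()         # rescale k_t so b_x * k_t is unchanged
    return a_x, b_x, k_t

# Illustration with synthetic data (for demonstration only).
rng = np.random.default_rng(0)
log_m = -3.0 + 0.05 * np.arange(9)[:, None] - 0.02 * np.arange(30)[None, :] \
        + 0.01 * rng.standard_normal((9, 30))
a_x, b_x, k_t = fit_lee_carter_svd(log_m)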
More interestingly, some mortality modeling papers have misunderstood the model in its operation, for instance, denoting by $m_{x,t}$, which is commonly defined as the central death rate for a life aged exactly $x$ over a time period of $t$ years, the quantity shown by [2–5] as
$$m_{x,t} = \exp\left(a_x + b_x k_t + \varepsilon_{x,t}\right). \qquad (2)$$
The model in equation (2) can be quite confusing, since it states that the general mortality rate is the death rate up to the randomness of $\varepsilon_{x,t}$. This treatment appears in several papers, such as [6], Naryongo [7], and [8], where the model of [1] is treated as having no distribution on the random error, as shown in equation (1). Furthermore, this can be a serious problem, as it implies that the central death rates are completely dependent across all age groups when they are calculated from the same random variable, that is, under the assumption that the rates at different ages are dependent. This assumption becomes important when the rates are determined by the same mortality trend, since the rates are in fact independent across all ages [9–11] and [12].
Today, several extensions and applications have been made, making the actuarial science literature rich in mortality modeling techniques. The freeware statistical package R also offers several dedicated packages (for example, “demography”) that help in forecasting future mortality rates and in fitting time-series models to the predicted mortality index; see [2, 4, 13–17]. Deep learning mortality modeling has been conducted in [18], making it one of the novel methodologies for dealing with data paucity in developing countries.
In this research study, the concept of placing a distributional assumption on the error (disturbance) term is proposed. It is suggested that a normal distributional assumption be placed on the error term and that deep neural network techniques then be used to train the model and obtain the parameter estimates. In addition, the Bühlmann credibility approach is introduced to obtain the mortality forecasts. The credibility technique, following the approach proposed in [19], is found to enhance the model's forecasting precision.
2. Modeling of the Modified Lee–Carter Model
From equation (1), it is proposed to define the distribution of the random error term, $\varepsilon_{x,t}$, as a normal density with mean $0$ and variance $\sigma_x^2$, respectively, at every age $x$. Since the constraint on the unobserved random mortality index $k_t$ still applies, the proposed model remains identifiable. The next step is to estimate the parameters in an environment of limited data availability, such as the Kenyan population.
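For reference, the modified model can be stated compactly as follows (a restatement of equation (1) together with the proposed Gaussian assumption; the zero mean of the error is the assumption adopted here):
$$\ln m_{x,t} = a_x + b_x k_t + \varepsilon_{x,t}, \qquad \varepsilon_{x,t} \sim N\!\left(0, \sigma_x^{2}\right), \qquad \sum_x b_x = 1, \qquad \sum_t k_t = 0.$$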
2.1. Deep Neural Network Estimation
2.1.1. Model Parameters
It is suggested that the deep neural network technique be used to train and obtain the parameter estimates of the model. The mean-squared error is the loss function used in the deep neural network, given by
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,$$
where $y_i$ and $\hat{y}_i$ are the actual and predicted values, respectively, from the deep neural network over $n$ observations.
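As a small illustration (a sketch, not the authors' code), this loss can be written directly in Python with NumPy:

import numpy as np

def mse(y_true, y_pred):
    """Mean-squared error between actual and predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Example with hypothetical rates: mse([0.010, 0.012], [0.011, 0.013]) gives 1e-06.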
The network has one input layer with two nodes, one hidden layer with three nodes, and a single output node [20]. During training, each node in the hidden layer takes the corresponding values from the input layer, multiplies them by a weight $w^{(1)}_{j}$, and adds a corresponding bias $b^{(1)}_{j}$, with the superscript $1$ representing the layer and $j$ representing its corresponding node [21].
Using this two-layer artificial neural network, the model in equation (1) is trained in order to estimate its parameters. The ANN architecture is presented in Figure 1.
The above feedforward and backpropagation processes are repeated many times until the errors are negligible when estimating the parameters from the ANN architecture shown in Figure 1. Using Python, we code and train the network to obtain the parameter estimates.
Figure 1: ANN architecture used for parameter estimation.
Deep neural networks build on this ANN component. They work well and improve a model because each node in the hidden layer both forms associations and grades the importance of each input in determining the nature of the output. A deep network therefore has multiple hidden layers; "deep" refers to the model being multiple layers deep.
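The following is a minimal Python sketch (under the assumption of synthetic inputs, not the authors' actual pipeline) of the 2-3-1 feedforward network described above, trained by backpropagation with the mean-squared-error loss; for illustration, the inputs are taken here as a scaled age and calendar year and the output as a log central death rate.

import numpy as np

rng = np.random.default_rng(42)

# Synthetic training data: inputs are (scaled age, scaled year),
# target is a hypothetical log central death rate (for illustration only).
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = (-5.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]).reshape(-1, 1)

# 2-3-1 architecture: 2 inputs, 3 hidden nodes (ReLU), 1 linear output node.
W1 = rng.normal(0.0, 0.5, size=(2, 3)); b1 = np.zeros((1, 3))
W2 = rng.normal(0.0, 0.5, size=(3, 1)); b2 = np.zeros((1, 1))

lr = 0.05
for epoch in range(5000):
    # Feedforward pass.
    z1 = X @ W1 + b1
    h1 = np.maximum(0.0, z1)               # ReLU activation in the hidden layer
    y_hat = h1 @ W2 + b2                   # linear output node

    # Mean-squared-error loss and its gradients (backpropagation).
    err = y_hat - y
    loss = np.mean(err ** 2)
    grad_out = 2.0 * err / len(X)          # dLoss/dy_hat
    dW2 = h1.T @ grad_out; db2 = grad_out.sum(axis=0, keepdims=True)
    grad_h1 = grad_out @ W2.T * (z1 > 0)   # backpropagate through ReLU
    dW1 = X.T @ grad_h1; db1 = grad_h1.sum(axis=0, keepdims=True)

    # Gradient-descent update of weights and biases.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final training MSE:", loss)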
2.1.2. Activation Functions
Definition 1. Let $f$ be the rectified linear unit (ReLU) activation function given by
$$f(x) = \max(0, x),$$
where $x \in \mathbb{R}$, with values ranging from $0$ to $\infty$.
The ReLU function defines the output functions of deep neural networks.
Definition 2. Let $\sigma$ be the sigmoid activation function within a deep neural network, defined by
$$\sigma(x) = \frac{1}{1 + e^{-x}},$$
for $x \in \mathbb{R}$, with values ranging from 0 to 1.
This defines the output functions of deep neural networks.
These two activation functions are used to provide nonlinearity, without which deep neural networks cannot model the nonlinear relationships in systematic mortality risk.
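A short sketch of the two activation functions in Python (NumPy), matching the definitions above:

import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), with range [0, infinity)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)), with range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Example: relu(np.array([-2.0, 3.0])) gives [0., 3.]; sigmoid(0.0) gives 0.5.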
2.1.3. How the above Activation Functions Work
An activation function is a mathematical function applied to the output of a neuron in a neural network. Its purpose is to introduce nonlinearity into the network, allowing it to learn more complex patterns. Two activation functions are used here, namely, the sigmoid function and the ReLU (rectified linear unit) function. Each has a slightly different form and characteristics, but both serve to squash the output of a neuron into a range of values that is more useful for the learning process.
In our case, the sigmoid function maps any input value to a value between 0 and 1, making it useful for modeling probabilities. The ReLU function maps any negative input value to 0 and any positive input value to itself, making it useful for introducing nonlinearity into the network without introducing additional complexity.
In summary, the activation function is an important part of the architecture of a neural network, and it plays a key role in the network's ability to learn and generalize to new data. The output data are the force of mortality, $\mu_{x,t}$, calculated from the input data of survival probabilities, $p_x$, derived from the life-table values used for the artificial neural network.
2.2. Incorporation of the Bühlmann Credibility Approach
2.2.1. Mathematical Preliminaries
From a classical approach, the latest prediction is defined as
$$\hat{\mu} = Z\bar{X} + (1 - Z)\mu,$$
where $\bar{X}$ is defined as the data sample mean, whereas $\mu$ is its general prior mean.
The Bühlmann credibility factor, $Z = n/(n + k)$, depends on the size of the sample $n$ and on the ratio $k$ of the expected value of the process variance (EPV) to the variance of the hypothetical means (VHM); this factor can be used in mortality modeling, see [22–24].
To be more precise, the value of $Z$ varies with both $n$ and $k$ in such a way that $Z$ increases as the sample size of the data increases. The best estimate of the future value is then $Z\bar{X} + (1 - Z)\mu$, where $0 \le Z \le 1$.
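As a brief illustration of the credibility-weighted prediction (a sketch under assumed EPV and VHM values, not the paper's calibration):

def buhlmann_prediction(sample, prior_mean, epv, vhm):
    """Bühlmann credibility prediction Z * sample_mean + (1 - Z) * prior_mean."""
    n = len(sample)
    k = epv / vhm                    # ratio of process variance to variance of hypothetical means
    z = n / (n + k)                  # credibility factor, between 0 and 1
    sample_mean = sum(sample) / n
    return z * sample_mean + (1.0 - z) * prior_mean

# Example with hypothetical numbers: 10 observations, prior mean 0.02.
print(buhlmann_prediction([0.018] * 10, prior_mean=0.02, epv=4e-4, vhm=1e-4))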
To apply the above concept of the Bühlmann credibility approach, the proposed model is fitted to the Kenyan mortality data. Figure 2 shows the curve of the expected mortality value against time for the Kenyan data, for both males and females at different ages.
Figure 2: Expected mortality values against time for the Kenyan data: (a) males; (b) females.
To eliminate the downward trend, the differenced mortality values are used for each age and year during forecasting. The Bühlmann credibility approach is then used to compare the resulting values with those of the conventional model of [1]. It is assumed, first, that the observations for different ages are independent; secondly, that, for each age $x$, the distribution of the observations depends on a risk parameter $\Theta_x$; and, finally, that the risk parameters $\Theta_x$ are independent across ages.
Now, the Bühlmann credibility estimate is applied as follows:
$$\hat{Y}_x = Z\bar{Y}_x + (1 - Z)\mu,$$
for each age $x$, where $Z = n/(n + k)$ and $k$ is the ratio of the expected value of the process variance (EPV) to the variance of the hypothetical means (VHM). In terms of parametric estimation, $\mathrm{EPV} = E\!\left[\mathrm{Var}(Y \mid \Theta)\right]$ and $\mathrm{VHM} = \mathrm{Var}\!\left(E[Y \mid \Theta]\right)$.
2.2.2. Incorporating the Bühlmann Credibility into the Modified LC Model
Following the classical paper, the overall mortality trend is assumed to follow a simple random walk with drift for the prediction of mortality, $k_t = k_{t-1} + \theta + e_t$, where the trend errors $e_t$ are Gaussian and independently and identically distributed (i.i.d.) such that $E[e_t] = 0$ and $\mathrm{Var}(e_t) = \sigma_e^2$, see [8]. The error terms in our proposal are i.i.d. white noises satisfying the martingale structure in the following equation:
$$E\!\left[e_{t+1} \mid \mathcal{F}_t\right] = 0,$$
where $\mathcal{F}_t$ provides the information about the process up to time $t$, and the covariances of the two random errors are zero.
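A minimal sketch (assuming hypothetical drift and volatility values, not the fitted Kenyan estimates) of forecasting the mortality index with a random walk with drift:

import numpy as np

def forecast_random_walk(k_last, drift, sigma_e, horizon, n_paths=1000, seed=0):
    """Simulate k_{t+1} = k_t + drift + e_t with e_t ~ N(0, sigma_e^2)."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma_e, size=(n_paths, horizon))
    paths = k_last + drift * np.arange(1, horizon + 1) + np.cumsum(shocks, axis=1)
    return paths

# Hypothetical values: last fitted index -20, annual drift -0.8, volatility 1.2.
paths = forecast_random_walk(k_last=-20.0, drift=-0.8, sigma_e=1.2, horizon=30)
central_forecast = paths.mean(axis=0)   # point forecast of k_t for each future year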
Consider a random variable $Y_{x,t}$ that denotes the change in the central death rates between times $t-1$ and $t$. Substituting from equation (1), together with the random walk with drift above, $Y_{x,t}$ is a sum of independent Gaussian random variables. From the conditional expectation and the conditional variance of $Y_{x,t}$, it is straightforward to apply the Bühlmann credibility approach, using $E[Y_{x,t} \mid \Theta_x]$ and $\mathrm{Var}(Y_{x,t} \mid \Theta_x)$ in the same order. The expectation of the stated hypothetical mean, $\mu$, is then estimated by the overall sample mean of the observed $Y_{x,t}$, denoted $\hat{\mu}$.
From equation (10), the expected value of the process variance, denoted by $v$, can be estimated from the data, while the variance of the hypothetical means, denoted by $a$, can be estimated in a similar manner.
By writing the equation in the form of a credibility-weighted average, it is easy to estimate the forecast as $Z\bar{Y}_x + (1 - Z)\hat{\mu}$, where $Z$ is the credibility factor. Greater data availability from the Kenyan population means that more weight is placed on the observed values in the credibility estimate.
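As an illustration of how the quantities above can be estimated from a panel of differenced rates, the following sketch uses the standard nonparametric Bühlmann estimators (an assumption made here for illustration; the paper's own estimators may differ), with a hypothetical array Y of shape (ages, years):

import numpy as np

def buhlmann_from_panel(Y):
    """Nonparametric Bühlmann estimates from Y with one row per age, one column per year."""
    n_ages, T = Y.shape
    age_means = Y.mean(axis=1)                       # sample mean for each age
    overall_mean = age_means.mean()                  # estimate of the collective mean mu
    epv = Y.var(axis=1, ddof=1).mean()               # expected value of the process variance
    vhm = age_means.var(ddof=1) - epv / T            # variance of the hypothetical means
    vhm = max(vhm, 0.0)                              # truncate at zero if the estimate is negative
    z = T / (T + epv / vhm) if vhm > 0 else 0.0      # credibility factor (same for every age here)
    credibility_forecast = z * age_means + (1.0 - z) * overall_mean
    return z, credibility_forecast

# Hypothetical differenced log death rates for 9 age groups over 30 years.
rng = np.random.default_rng(1)
Y = -0.02 + 0.01 * rng.standard_normal((9, 30))
z, forecast = buhlmann_from_panel(Y)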
3. Data Analysis and Interpretation
To illustrate how the proposed model works, we apply it to Kenyan mortality data before making statistical inferences/deductions, while noting the variations from the classical model of [1]. To reach a firm conclusion, the population is studied through central death rates for males and females. The consolidated population covers ages between 20 and 100 years, projected from 2020 to 2050 and grouped into ten-year age bands, together with the chosen values of the lag parameter L.
The statistical R package known as "demography" is used on the model to obtain estimates of the values of $a_x$, $b_x$, and $k_t$ after data training. The estimated parameter values are tabulated in Tables 1 and 2 for the male and female rates and the consolidated mortality rates, respectively.
The proposed inference is applied when fitting the models to the male, female, and combined mortality rates. Moreover, the deep learning package is used to obtain the proposed deep-learning-generated estimates, which are reported in Tables 1 and 2 for the male and female rates and the consolidated mortality rates, respectively. Although some of the parameter estimates obtained from the novel method are similar to the values derived from the traditional LC method, the remaining estimates differ between the two methodologies, since the new method does not assume a fixed value for the error term.
The proposed unit root test is also applied to the male, female, and combined mortality rates, using the stated parameter values illustrated in equation (11). It is important to note that equation (11) is taken as $T \to \infty$.
All of the obtained estimates of the variance, test statistics, and p values are recorded in Tables 3–6 for the male, female, and consolidated mortality rates, respectively. According to these results, it is easy to note that the quantities are satisfactory for the chosen values of L. Besides, the novel unit root test rejects the unit root hypothesis for the female and consolidated mortality rates but fails to reject the unit root hypothesis for the male mortality rates.
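For readers who want to reproduce a comparable check, a standard augmented Dickey–Fuller unit root test (used here purely as an illustration; it is not the novel test proposed in this paper) can be run on an estimated mortality index series with statsmodels:

import numpy as np
from statsmodels.tsa.stattools import adfuller

# Hypothetical mortality index series: a downward-drifting random walk.
rng = np.random.default_rng(3)
k_t = np.cumsum(-0.8 + 1.2 * rng.standard_normal(60))

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(k_t, regression="ct")
print(f"ADF statistic = {stat:.3f}, p value = {pvalue:.3f}")
# A large p value means the unit root hypothesis cannot be rejected for this series.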
Ultimately, we examine the robustness of the stated conclusion on the unit root hypothesis for the mortality index by rerunning the above unit root test for the female, male, and consolidated populations aged between 20 and 100 years.
4. Numerical Results and Forecasts
To examine the finite-sample performance of the unit root test and the estimators, the model is considered for the parameter estimates obtained from the mortality rates, as tabulated in Table 7 for the traditional model. These estimates are much higher than those obtained under the deep learning methods.
Similarly, the finite-sample performance is examined for the parameter estimates obtained from the mortality rates under the proposed model, as tabulated in Tables 1 and 2. The values of these estimates are independent of one another and are tabulated in Tables 1 and 2. With 10,000 samples randomly drawn from the model at sample sizes of up to 100, the reported values are obtained.
Tables 1 and 2 show the parameter estimates determined under the deep learning technique after fitting the model, together with the standard error (SE) of the model, for the male and female rates and the consolidated rates, respectively. In addition, Table 2 shows the consolidated mortality rates for ages between 20 and 100 under deep learning, which are lower than those of the traditional method.
By computing the estimators of the parameters within the settings illustrated above and reporting the mean as well as the standard deviation of these estimators, Tables 1 and 2 show that the estimators of all parameters are accurate. The behavior of the unit root test has also been investigated under the stated settings.
We also investigate the test for the stated parameter values together with the corresponding true values denoted in the relevant equation. The variances of the estimators and the empirical sizes of the novel unit root test, illustrated in the lower panel, are tabulated in Tables 3–6; the empirical size is larger than the nominal level, and the selection of the limit L has an effect on the test. In addition, the size becomes more accurate as the value of T becomes large.
Table 3 shows the variance estimates, Z values, and test statistics for the stated values of L and T.
Table 4 shows the variance estimates, Z values, and test statistics for the stated values of L and T.
In summary, Table 3 shows the consolidated rates of mortality for ages between 20 and 100 under deep learning, which are lower than those of the traditional method.
Table 5 shows the variance estimates, Z values, and test statistics for L = 1 and 2 at the stated value of T.
Table 6 shows the variance estimates, Z values, and test statistics for L = 1 and 2 at the stated value of T. In summary, Table 5 shows the consolidated rates of mortality for ages between 20 and 100 under deep learning, which are lower than those of the traditional method.
5. Conclusions and Recommendations
The above results have shown that using deep neural network techniques to estimate the parameter values improves the model, as indicated by the low levels of standard error (a measure of model volatility). Incorporating the Bühlmann credibility approach into the model has also helped to improve the model's accuracy.
This method is vital when forecasting mortality for future dates with higher levels of precision, and the use of adequate data in training deep neural networks is essential for mortality modeling and forecasting, because the choice of parameter estimation method plays a vital role in the model's accuracy.
For policymakers, when modeling events of concern, such as the death rates of a population or infectious diseases, the choice of the parameter estimation method is critical to the model's accuracy. Moreover, adequate data help improve the quality of prediction/forecasting under the incorporated Bühlmann credibility approach, since data are king in systematic mortality risk modeling and actuarial valuations.
Data Availability
Data are available on request.
Conflicts of Interest
The author declares that there are no conflicts of interest concerning the publication of this paper.