Abstract
The cost of long-term care (LTC) is one of the major financial risks faced by the elderly and a significant challenge to the social security system. This article establishes a piecewise constant Markov model to estimate dynamic health transition probabilities and calculates the LTC cost based on actuarial theory, in contrast to the static or nontransferable state hypotheses of traditional models. Using the Chinese Longitudinal Healthy Longevity Survey, this article finds that the average LTC cost for the elderly varies greatly with gender and health condition: the cost for women may be up to 75% higher than that for men, and the cost for the unhealthy elderly may be more than double that for the healthy elderly. Furthermore, if LTC is included in the medical insurance system, in theory, women's average premium will be more than twice that of men.
1. Introduction
Longevity risk and health shocks are intertwined and impose significant challenges on the financial budgets of the aged [1]. China is the most populous country in the world, and the population aged 65 and over will account for 25% of the total population by 2030, of which about 68 million will experience some degree of disability; approximately 18.6% may require LTC for daily living activities (World Population Prospects, 2019) [2]. At the same time, family miniaturization and empty nests have weakened the ability and willingness of traditional families to care for the elderly at home [3, 4]. The average family size is estimated at 2.62 persons, 0.48 fewer than the 3.10 recorded in 2010. The empty-nest family has gradually become the main form of elderly family: in 2000, the proportion of empty-nest families was 1/5, and in 2020, it was close to 1/2 (7th national census). In this context, the intervention of social elderly services is inevitable, and the LTC system is the top priority. The LTC system, which according to the definition of the Health Insurance Association of America (HIAA) provides LTC for the disabled and semidisabled, has played a significant role in alleviating the care burden of families with disabled elderly members and improving the quality of life of the elderly in developed countries, and it has also become an important part of the social security system in China. Since being listed in the “Healthy China 2030 planning outline” in 2016, the LTC system has been piloted in 29 cities across the country.
Urgent practical needs and a series of solutions have created favorable conditions for exploring the LTC system, and related academic research has gradually increased. However, existing research focuses more on system design, whereas quantitative research using Chinese population health data, especially on LTC cost, remains insufficient. LTC is long-lasting, and the care costs incurred are large. The growth of the disabled elderly population will inevitably increase the pressure on care funding. Without more reliable and diversified sources of funds, it will be difficult for families or society to bear the financial burden of elderly care on their own. The Chinese LTC pilot program has advanced rapidly and achieved initial results; however, the sustainability and adequacy of LTC funds have yet to be tested. The amount of LTC funds depends on the LTC cost, and the key to estimating that cost is the health transition probability. The health transition probability is the probability of moving between health states, such as moving from healthy to unhealthy or recovering from unhealthy to healthy [5]. This article attempts to build an alternative method to estimate health transition probability matrices and, on this basis, to calculate the LTC cost and cost allocation schemes, which not only offers information on the health of the elderly but also provides a reference for the LTC system.
LTC cost estimation methods are usually divided into static and dynamic methods according to whether health status can be transferred. The static method assumes that health states are irreversible and absorbing, namely, a one-way or one-step transition without considering functional improvement or rehabilitation [6–9]. Irreversibility allows the cost to be estimated as a kind of annuity, and the method was widely used in the early stages because its calculation is simple and convenient, which is also its weakness [10, 11]. In fact, recent survey data show that the health of a proportion of disabled or semidisabled people can improve with the development of medical technology, especially among the young elderly. For example, according to the Chinese Longitudinal Healthy Longevity Survey (CLHLS) in 2017, the recovery ratio for the disabled or semidisabled aged 65–70 reaches 0.20, and the corresponding ratio for women is 0.19. Therefore, a static method that ignores state transferability may produce a large deviation. Another weakness is that the result is an average that does not consider initial health status, which implies that healthy and unhealthy people have the same trajectory of health changes and the same care cost. Such results fail to meet actuarial requirements [12–14]. Dynamic methods have therefore been proposed to resolve these problems.
Dynamic methods allow health states to be transferred, are theoretically superior, and are widely used in demography, medicine, and sociology [15–17]. They mainly include generalized regression and the Markov process. Generalized regression uses multiple population characteristics and different models to fit the transition probability; it is flexible in form and can better fit and smooth the solution process. However, variable selection and model setting may be inconsistent and subjective, as with the Tweedie model [18], the Gaussian model [19], and the ordinal multicategorical model [20, 21]. More importantly, because detailed individual information is limited, the regression model has commonly been used to estimate a one-period transition probability, after which the health states are assumed to remain unchanged in order to obtain the LTC cost. In this way, apart from the one-period transition probability, the rest of the calculation is still similar to the static method, which reduces forecasting accuracy.
The Markov process, compared with traditional methods, has a theoretical advantage in handling the transition probabilities of multiple states [22, 23]. However, due to model complexity and insufficient data, its development has been limited, or it has relied on the inappropriate assumption of time homogeneity [24–26]. Time homogeneity means that the instantaneous change in health status is constant, so the transition probability depends only on the length of the time interval and not on the initial age; that is, whatever the starting age, the health status changes in the same way over the same time interval. In fact, the non-time-homogeneous character of health transitions has been verified, so this assumption is more suitable for severe illnesses whose rehabilitation may not be related to age [27–29], for cases with no recovery, for short-term forecasts, or for rough estimates [30, 31].
In this study, considering the rationality of assumptions and the feasibility of calculation, we propose an alternative approach based on a multistate transition model to estimate LTC cost. Specifically, we calculate three-year transition probabilities by tracking changes in the health of the sample between two survey waves, then estimate multiperiod transition matrices based on the piecewise constant Markov process, and finally calculate the LTC cost and cost allocation schemes based on actuarial theory. Piecewise constant probability and age cohort dislocation multiplication are used to develop a non-time-homogeneous method with limited data. Our research design is shown in Figure 1 and consists of three parts: transition probability matrices, cost estimation, and cost allocation. Sections 2.1 and 2.2 provide details about the data source and state definition. Section 2.3 describes our approach to estimating the probability matrices, and Sections 2.4 and 2.5 describe the cost and cost allocation. The results are presented in Section 3 and the discussion in Section 4.

2. Data and Methods
2.1. Data Sources
Two types of data are used in the study. The first type comes from the sixth and seventh waves of the CLHLS, conducted in 2014 and 2017. The CLHLS follows strict random sampling and covers 23 of the 31 provinces in China; its data quality is generally recognized in international and domestic academic circles, and it has become one of the projects with the largest sample, rich data, and great research potential (for more information on the CLHLS, refer to https://opendata.pku.edu.cn/). According to the Markov property, the transition probability is only related to the current health state, not to earlier states, so only two waves of data are needed, with the 2014 survey as the base period. Since the elderly are the main population with LTC needs, this article excluded respondents younger than 65 years old and 820 respondents who were not tracked, leaving 8945 samples, of which 2879 had died before 2017 and 6066 survived. The average age is 79.26 years, women account for 52% of the sample, and the rural elderly account for 51%; other characteristics are shown in Table 1. The second type of data comes from the China Life Insurance Mortality Table (China Association of Actuaries, 2016), which is used to calculate the survival rate of each age group when the annual level premium is obtained.
2.2. Health States
LTC eligibility and payment amounts may be determined according to health states. Following the LTC definition of the Health Insurance Association of America and the classification standards widely used in academia [9, 20, 32, 33], we use three indexes to define health states: the Activities of Daily Living Scale (ADLs), the Instrumental Activities of Daily Living Scale (IADLs), and cognitive ability (CA). ADLs contain six items: bathing, dressing, toileting, indoor transferring, continence, and feeding; IADLs include eight items; cognitive ability includes 27 items. With reference to common international evaluation standards, and considering generality and computational complexity, the health state is divided into four levels based on the three indexes: health (denoted as state 1), health impairment (state 2), disability (state 3), and death (state 4). State 1 means no obstacle in any of the three indexes; state 3 is defined as three or more obstacles in daily activities or a cognitive function score below 16; and the remaining cases are defined as state 2. The health states and definitions are shown in Table 2.
2.3. Strategy for Transition Probability Matrices
2.3.1. Markov Process and Transition Probability
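A Markov process, a random process with the Markov property, is used to study the dynamic state space of discrete events. Let $\{X(t), t \in T\}$ be a random process with state space $S$. If, for any $t_1 < t_2 < \cdots < t_n < t$ and $s_1, s_2, \ldots, s_n, s \in S$, the distribution function of $X(t)$ under $X(t_1) = s_1, \ldots, X(t_n) = s_n$ is only related to $X(t_n) = s_n$, but not to $X(t_1), \ldots, X(t_{n-1})$ (that is, $P(X(t) \le s \mid X(t_1) = s_1, \ldots, X(t_n) = s_n) = P(X(t) \le s \mid X(t_n) = s_n)$, which is known as the Markov property or memorylessness), then $\{X(t), t \in T\}$ is called a Markov process.
The transition probability ${}_t p_x^{ij}$ indicates the probability that state $i$ at the age of $x$ turns to state $j$ at the age of $x + t$, namely,
$${}_t p_x^{ij} = P(S_{x+t} = j \mid S_x = i), \quad i, j \in \{1, 2, 3, 4\}, \quad 0 \le t \le \omega - x,$$
where $x$ refers to age, $S_x$ refers to the state at the age of $x$, $S_{x+t}$ refers to the state at the age of $x + t$, and $\omega$ refers to the highest age. States 1, 2, and 3 are transferable, and state 4 is an absorbing state. All state transitions are shown in Figure 2.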

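Generally, it is assumed that the state $S_{x+t}$ at the age of $x + t$ is only related to the state at the age of $x$ rather than to the previous states, so that $\{S_x\}$ can be regarded as a four-state discrete Markov process.
The transition probabilities from the different initial states form a transition probability matrix as follows:
$${}_t P_x = \begin{pmatrix} {}_t p_x^{11} & {}_t p_x^{12} & {}_t p_x^{13} & {}_t p_x^{14} \\ {}_t p_x^{21} & {}_t p_x^{22} & {}_t p_x^{23} & {}_t p_x^{24} \\ {}_t p_x^{31} & {}_t p_x^{32} & {}_t p_x^{33} & {}_t p_x^{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
The first row represents the probabilities of transiting from state 1 to each state after $t$ years; the second and third rows are the transition probabilities from states 2 and 3. Since state 4 (death) is an irreversible state, the last row is always $(0, 0, 0, 1)$, and the sum of each row is 1. The matrix can also be written in terms of its columns, ${}_t P_x = (\mathbf{p}^{1}, \mathbf{p}^{2}, \mathbf{p}^{3}, \mathbf{p}^{4})$, where $\mathbf{p}^{1}$, $\mathbf{p}^{2}$, $\mathbf{p}^{3}$, and $\mathbf{p}^{4}$ are column vectors.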
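As a minimal illustration of the structure just described, the following Python sketch assembles a four-state matrix with an absorbing death row and checks that every row is a probability distribution. The first row uses the Table 3 example for healthy men aged 65–70 quoted in Section 3.1.1; the second and third rows, and the helper name `make_transition_matrix`, are hypothetical placeholders rather than survey estimates.

```python
# Sketch: a four-state transition matrix with an absorbing death state.
# Row 1 is the Table 3 example (healthy men aged 65-70); rows 2 and 3 are
# hypothetical placeholders, not estimates from the survey.

def make_transition_matrix(rows_123):
    """Append the absorbing death row (0, 0, 0, 1) and validate every row."""
    matrix = [list(row) for row in rows_123] + [[0.0, 0.0, 0.0, 1.0]]
    for row in matrix:
        if len(row) != 4 or any(p < 0 or p > 1 for p in row):
            raise ValueError("each row needs four probabilities in [0, 1]")
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row must sum to 1")
    return matrix

P3 = make_transition_matrix([
    [0.675, 0.240, 0.024, 0.061],  # from state 1 (Table 3 example)
    [0.200, 0.550, 0.120, 0.130],  # from state 2 (hypothetical)
    [0.050, 0.150, 0.500, 0.300],  # from state 3 (hypothetical)
])
```

The validation step reflects the two properties stated above: the last row is fixed at (0, 0, 0, 1), and each row sums to 1.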
2.3.2. Time-Homogeneous Markov Process
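The transition intensity $\mu^{ij}$ indicates the instantaneous rate of transition from state $i$ to state $j$, $i \ne j$, with $\mu^{ii} = -\sum_{j \ne i} \mu^{ij}$, and the transition intensity matrix $Q = (\mu^{ij})$ indicates the transfer intensities among states.
When there are only two states and the transition is irreversible,
$${}_t p^{11} = e^{-\mu^{12} t}, \quad {}_t p^{12} = 1 - e^{-\mu^{12} t}.$$
When the states are multiple and transferable, multiple possible paths exist between states $i$ and $j$, and the transition probabilities are expressed by the following matrix:
$$P(t) = e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^k}{k!},$$
where $Q$ is the transition intensity matrix. This indicates that the transition probability is a function of the time interval $t$ and is not related to the initial age, which is called time homogeneity.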
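To make the time-homogeneous case concrete, the sketch below computes the matrix exponential of an intensity matrix multiplied by the interval length, using a truncated Taylor series with scaling and squaring in plain Python. The intensity values in `Q` are hypothetical, and `transition_matrix` is an illustrative helper, not the routine used in this article.

```python
import math

def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=20):
    """Approximate exp(q * t): Taylor series on q * t / 2**squarings,
    followed by repeated squaring of the result."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

# Hypothetical annual intensities: each row sums to zero, state 4 is absorbing.
Q = [
    [-0.30, 0.20, 0.04, 0.06],
    [0.15, -0.45, 0.18, 0.12],
    [0.03, 0.10, -0.43, 0.30],
    [0.00, 0.00, 0.00, 0.00],
]
P1 = transition_matrix(Q, 1.0)  # one-year transition probabilities
P2 = transition_matrix(Q, 2.0)  # two-year transition probabilities
```

Under time homogeneity the two-year matrix equals the product of two one-year matrices whatever the starting age, and in the two-state irreversible case the probability of remaining in the initial state reduces to the exponential of minus the intensity times the interval.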
2.3.3. Piecewise Constant Markov Process
The time homogeneity assumption is convenient for expressing the probability function, but in most cases it is not suitable for real applications. In this article, considering calculation feasibility and the reasonableness of assumptions, we propose a piecewise constant transition intensity; i.e., the transition intensity is constant within a year but differs across years, which can be expressed as follows:
$$\mu_{x+s}^{ij} = \mu_x^{ij}, \quad 0 \le s < 1, \text{ for integer ages } x.$$
For example, when calculating ${}_2 p_x^{13}$ in ${}_2 P_x$, the possible transfer paths and probabilities are shown in Figure 3.

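This means that
$${}_2 p_x^{13} = p_x^{11} p_{x+1}^{13} + p_x^{12} p_{x+1}^{23} + p_x^{13} p_{x+1}^{33},$$
where $p_x^{ij}$ denotes the one-year transition probability from state $i$ to state $j$ at age $x$ and $P_x$ the corresponding one-year matrix. Moreover, the other elements in ${}_2 P_x$ are calculated in the same way, so that ${}_2 P_x = P_x P_{x+1}$.
In the same way, the $n$-year probability matrix can be expressed as follows:
$${}_n P_x = P_x P_{x+1} \cdots P_{x+n-1}. \tag{8}$$
It should be noted that the one-year transition probability matrices in (8) are usually difficult to obtain because long-span longitudinal data are insufficient. We therefore use two-wave cross-sectional data and age cohort dislocation multiplication to solve this problem: $P_x$ represents the transition probability matrix at age $x$, $P_{x+1}$ represents the transition probability matrix at age $x + 1$ in the same cross section, and by analogy ${}_n P_x$ can be estimated.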
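The age cohort dislocation multiplication described above amounts to chaining age-specific one-year matrices taken from the same cross section. A minimal Python sketch follows; the one-year matrices for ages 65 and 66 are hypothetical, and `n_year_matrix` is an illustrative helper name.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_year_matrix(one_year_by_age, start_age, n):
    """Multiply the one-year matrices for ages start_age, start_age + 1, ...,
    start_age + n - 1 in age order (n = 0 returns the identity)."""
    size = len(one_year_by_age[start_age])
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for age in range(start_age, start_age + n):
        result = matmul(result, one_year_by_age[age])
    return result

# Hypothetical one-year matrices from a single cross section.
one_year_by_age = {
    65: [[0.880, 0.080, 0.010, 0.030],
         [0.070, 0.800, 0.070, 0.060],
         [0.020, 0.050, 0.800, 0.130],
         [0.000, 0.000, 0.000, 1.000]],
    66: [[0.870, 0.085, 0.012, 0.033],
         [0.065, 0.795, 0.075, 0.065],
         [0.018, 0.047, 0.795, 0.140],
         [0.000, 0.000, 0.000, 1.000]],
}
two_year = n_year_matrix(one_year_by_age, 65, 2)
```

The entry `two_year[0][2]` is the two-year probability of moving from state 1 to state 3, which equals the sum over the three paths through states 1, 2, and 3 at the end of the first year.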
2.4. LTC Total Cost
For ease of understanding, we first calculate the two-year cost as an example. $C_{x:2}^{13}$ denotes the cost for initial health state 1, payment condition of state 3, payment amount $b$, and a two-year period. As shown on the left of Figure 3, the probabilities of initial state 1 transiting to each state at the end of the first year are $p_x^{11}$, $p_x^{12}$, $p_x^{13}$, and $p_x^{14}$, so the present value of the cost in the first year is $b\,v\,p_x^{13}$, where $v$ is the discount factor. The states at the end of the first year may transit to state 3 in the second year, as shown on the right of Figure 3, with transition probabilities $p_{x+1}^{13}$, $p_{x+1}^{23}$, and $p_{x+1}^{33}$. The present value of the cost in the second year is then $b\,v^2\,(p_x^{11} p_{x+1}^{13} + p_x^{12} p_{x+1}^{23} + p_x^{13} p_{x+1}^{33})$, where the sum in parentheses is exactly the element in the first row and third column of ${}_2 P_x$, i.e., ${}_2 p_x^{13}$. Therefore, we obtain
$$C_{x:2}^{13} = b\,v\,p_x^{13} + b\,v^2\,{}_2 p_x^{13}.$$
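Then, we deduce the $n$-year cost $C_{x:n}^{ij}$, where the initial state is $i$ at the age of $x$, the payment amount is $b$ when the state is $j$ at the age of $x + k$, and the period is $n$ years:
$$C_{x:n}^{ij} = b \sum_{k=1}^{n} v^k \, {}_k p_x^{ij}, \tag{10}$$
where ${}_k p_x^{ij}$ is the element in the $i$th row and the $j$th column of the probability matrix ${}_k P_x$.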
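The cost calculation can be sketched in Python as a running product of one-year matrices, adding the discounted probability of being in the payment state at the end of each year. The one-year matrices and the discount factor `v = 0.97` below are hypothetical assumptions for illustration, as is the helper name `ltc_cost`; states are indexed from 0 in the code, so state 1 is index 0 and state 3 is index 2.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ltc_cost(one_year_by_age, start_age, n, init_state, pay_state=2,
             benefit=1.0, v=0.97):
    """Expected present value of paying `benefit` at the end of each of the
    next n years in which the person is in `pay_state` (0-based indices)."""
    size = len(one_year_by_age[start_age])
    cumulative = [[float(i == j) for j in range(size)] for i in range(size)]
    total = 0.0
    for k in range(1, n + 1):
        # cumulative becomes the k-year matrix starting at start_age
        cumulative = matmul(cumulative, one_year_by_age[start_age + k - 1])
        total += benefit * v ** k * cumulative[init_state][pay_state]
    return total

# Hypothetical one-year matrices (not survey estimates).
one_year_by_age = {
    65: [[0.880, 0.080, 0.010, 0.030],
         [0.070, 0.800, 0.070, 0.060],
         [0.020, 0.050, 0.800, 0.130],
         [0.000, 0.000, 0.000, 1.000]],
    66: [[0.870, 0.085, 0.012, 0.033],
         [0.065, 0.795, 0.075, 0.065],
         [0.018, 0.047, 0.795, 0.140],
         [0.000, 0.000, 0.000, 1.000]],
}
two_year_cost = ltc_cost(one_year_by_age, 65, 2, init_state=0)
```

For a two-year horizon this reproduces the worked example: the first-year term uses the one-year probability of reaching state 3, and the second-year term uses the corresponding element of the two-year matrix.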
2.5. Cost Allocation
Further, if the LTC cost is allocated through LTC insurance, then the calculation formulas can be derived according to actuarial theory. The single premium and the annual level premium are two common premium payment methods, both of which have an actuarial present value equal to the total cost. The difference is that the single premium refers to paying all the premiums at one time, and the annual level premium refers to paying the same amount of premium each year for a certain period. The specific steps are as follows:
(1) The $n$-year single premium $A_{x:n}^{ij}$, where the initial state is $i$ at the age of $x$, the payment amount is $b$ when the state is $j$ at the age of $x + k$, and the insurance period is $n$ years:
$$A_{x:n}^{ij} = b \sum_{k=1}^{n} v^k \, {}_k p_x^{ij},$$
where ${}_k p_x^{ij}$ is the element in the $i$th row and the $j$th column of the probability matrix ${}_k P_x$.
(2) The $n$-year single premium with the payment postponed for $m$ years, ${}_{m|}A_{x:n}^{ij}$: the other assumptions are the same as those of $A_{x:n}^{ij}$. Since the probability of moving from state $i$ at the age of $x$ to health state $l$ $m$ years later is ${}_m p_x^{il}$, $l = 1, 2, 3$, and the expected payment for each state can be calculated as $A_{x+m:n}^{lj}$ (from Equation (10)), then
$${}_{m|}A_{x:n}^{ij} = v^m \sum_{l=1}^{3} {}_m p_x^{il} \, A_{x+m:n}^{lj}.$$
(3) The $m$-year level premium $\pi_{x:m}^{ij}$, i.e., the premium paid in equal amounts each year as long as the insured survives: the other assumptions are the same as those of ${}_{m|}A_{x:n}^{ij}$.
$$\pi_{x:m}^{ij} = \frac{{}_{m|}A_{x:n}^{ij}}{\sum_{t=0}^{m-1} v^t \left(1 - {}_t p_x^{i4}\right)},$$
where ${}_t p_x^{i4}$ refers to the probability that the state is 4 (i.e., death) at the age of $x + t$ when the state is $i$ at the age of $x$, and $1 - {}_t p_x^{i4}$ refers to the survival probability; i.e., the insured only makes payments while alive, and the data can be acquired from the life table.
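The three steps can be sketched in Python as simple arithmetic on quantities that are assumed to be already available. In the sketch below, the cost by state at the benefit start age uses the men's values reported in Table 5 (1.208, 1.526, 2.512); the state probabilities after the deferral period, the survival probabilities, the discount factor, and the assumption that level premiums are paid at the start of each year are all hypothetical choices made for illustration.

```python
def deferred_single_premium(state_probs_after_m, cost_by_state, v, m):
    """Discount the expected cost at the benefit start age back m years,
    weighting each live state by the probability of being in it."""
    expected_cost = sum(p * c for p, c in zip(state_probs_after_m, cost_by_state))
    return v ** m * expected_cost

def level_premium(single_premium, survival_probs, v):
    """Spread a single premium over yearly payments made while alive.
    survival_probs[t] is the probability of being alive t years after
    issue (t = 0, 1, ..., m - 1); payments are assumed at the start of each year."""
    annuity = sum(v ** t * s for t, s in enumerate(survival_probs))
    return single_premium / annuity

cost_at_65 = [1.208, 1.526, 2.512]       # men's costs by initial state (Table 5)
probs_at_65 = [0.80, 0.05, 0.01]         # hypothetical: states 1-3 after m years
v, m = 0.97, 5                           # hypothetical discount factor and deferral
survival = [1.0, 0.998, 0.996, 0.993, 0.990]  # hypothetical life-table values

single = deferred_single_premium(probs_at_65, cost_at_65, v, m)
annual = level_premium(single, survival, v)
```

By construction, the actuarial present value of the level premiums (the annual amount multiplied by the survival-weighted annuity factor) equals the single premium, which is the equivalence stated at the start of this section.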
3. Results
Following the research flow above, we first calculate the health transition probability matrices and then estimate the LTC cost and cost allocation schemes. The calculation involves two kinds of data. One is the CLHLS, the main data source, which is used to calculate the health transition probability matrices and the total cost; the other is the China Life Insurance Mortality Table (CLIM), which is used to calculate the annual level premium for cost allocation. Two software packages are used: the health transition probability matrices are computed in Mathematica, a scientific computing package, and all other processing is done in Statistical Product and Service Solutions (SPSS), a commonly used statistical analysis package.
3.1. Health Transition Probability Matrices
3.1.1. Three-Year Transition Probability Matrices
Since the CLHLS provides longitudinal data at three-year intervals, we first calculate the three-year transition probabilities. We first grouped the sample by age and gender, following usual practice and the Chinese life table; then, according to the health state definition, we tracked the change in each individual's health state; finally, we calculated the proportion of the population in each state and used these proportions as the transition probabilities between the two survey waves. Because probabilities may be unstable at very advanced ages (for example, the number of people who transfer from unhealthy to healthy is very small), we combine those aged 95 and above into one group. The three-year probability matrices are thus obtained from the 2014 and 2017 surveys. As seen in Table 3, for healthy (i.e., state 1) men aged 65–70, the probabilities of remaining in state 1 and of transitioning to state 2, state 3, and state 4 are 0.675, 0.240, 0.024, and 0.061, respectively. The other entries are read in the same way.
Figures 4–6 show the differences in health by age and gender. For intuitive comparison, the range and scale of the coordinate axes in these figures are the same. Figure 4 shows that there is a possibility of returning from unhealthy to healthy; that is, state 2 may transition to state 1, and state 3 may transition to state 1. In the younger age groups in particular, the probability of recovery is not small, so ignoring the reversibility of states will bias the results. Figures 5 and 6 show how the probabilities of disability and death change with age.
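The proportion-based estimate described above can be sketched as follows. The function takes pairs of states observed in the two waves for one age and gender group and returns, for each initial state, the share of individuals ending in each state. The toy records and the helper name `empirical_transition_rows` are hypothetical.

```python
from collections import Counter

def empirical_transition_rows(pairs, n_states=4):
    """Estimate transition probabilities as observed proportions.
    `pairs` holds (state in first wave, state in second wave), coded 1..4.
    Death (state 4) cannot be an initial state, so only rows 1..3 are built;
    initial states with no observations are omitted."""
    counts = Counter(pairs)
    rows = {}
    for i in range(1, n_states):
        total = sum(counts[(i, j)] for j in range(1, n_states + 1))
        if total:
            rows[i] = [counts[(i, j)] / total for j in range(1, n_states + 1)]
    return rows

# Hypothetical records for one age and gender group.
records = [(1, 1), (1, 1), (1, 1), (1, 2), (2, 1), (2, 4)]
rows = empirical_transition_rows(records)
```

With very few observations in a cell, as at the oldest ages, these proportions become unstable, which is the reason given above for pooling ages 95 and over.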

Figure 4 (panels (a)–(c))
Figure 5 (panels (a)–(c))
Figure 6 (panels (a)–(c))
The probabilities of disability over 90 years old in Figure 5(b) and over 85 years old in Figure 5(c) decline because of the sharp increase in death risk at these ages. From the perspective of gender, the disability probability of men is always significantly lower than that of women under the same conditions, especially at ages 75–80 in Figure 5(b) and 70–85 in Figure 5(c), whereas the overall death probability of men is greater than that of women, as seen in Figure 6. The differences at ages 85–90 in Figure 6(b) and 70–80 in Figure 6(c) are quite marked, which coincides with the corresponding disability rates in Figure 5. Our results show that (1) men have relatively low disability and high mortality, (2) women have a survival advantage, and (3) men have a health advantage. These results are consistent with demographic theory. The trend and the complementary characteristics of disability and mortality are also consistent with the results of Qiao and Hu [34].
3.1.2. One-Year Transition Probability Matrices
Based on the above three-year transition probability matrices, we assume that the transition probability is constant within three years and estimate the one-year transition matrices according to (5) (as seen in Table 4). This method is widely used in discrete-time Markov processes, especially in commercial LTC, public LTC security plans, and insurance supervision [35, 36].
Next, we estimate 30-year transition probability matrices to obtain the total cost; the horizon is set to 30 years because we assume that the maximum age is 95, as explained above. Due to space limitations, the matrices are omitted and are available on request.
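One way to illustrate the step from a three-year matrix to a one-year matrix under a constant intensity is to take the matrix logarithm, divide it by three, and exponentiate. The plain-Python sketch below does this with truncated power series; it is an assumption-laden illustration, not the Mathematica procedure used in this article. The logarithm series only converges when the matrix is close enough to the identity, and a matrix root obtained this way is not guaranteed to have nonnegative entries. The first row of `P3` is the Table 3 example; the other rows are hypothetical.

```python
def matmul(a, b):
    """Plain-Python product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def mat_log(p, terms=200):
    """log(P) from the series sum_k (-1)**(k+1) * (P - I)**k / k."""
    n = len(p)
    a = [[p[i][j] - float(i == j) for j in range(n)] for i in range(n)]
    power, total = identity(n), [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        power = matmul(power, a)
        sign = 1.0 if k % 2 else -1.0
        total = [[total[i][j] + sign * power[i][j] / k for j in range(n)]
                 for i in range(n)]
    return total

def mat_exp(m, terms=40):
    """exp(M) from its Taylor series (adequate for small matrices)."""
    n = len(m)
    term, total = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, m)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def one_year_from_three_year(p3):
    """Return exp(log(P3) / 3), whose cube reproduces P3."""
    q3 = mat_log(p3)  # three times the annual intensity matrix
    return mat_exp([[x / 3.0 for x in row] for row in q3])

P3 = [
    [0.675, 0.240, 0.024, 0.061],  # Table 3 example, healthy men aged 65-70
    [0.150, 0.600, 0.130, 0.120],  # hypothetical
    [0.040, 0.110, 0.600, 0.250],  # hypothetical
    [0.000, 0.000, 0.000, 1.000],
]
P1 = one_year_from_three_year(P3)
```

Multiplying `P1` by itself three times recovers `P3` up to rounding, which is the sense in which the one-year matrix is consistent with the observed three-year transitions.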
3.2. LTC Total Cost
Since this article focuses on the calculation method, the LTC cost estimation relies on several simplifying hypotheses for the sake of analytical tractability; the results can be used as a reference, but their absolute magnitude should not be taken as definitive. Assuming that a 65-year-old person may need LTC over the next 30 years and that the annual cost is 1, the total LTC cost calculated according to formula (10) is shown in Table 5.
From Table 5, the men's LTC cost in states 1, 2, and 3 is 1.208, 1.526, and 2.512, respectively, and the women's cost is 2.502, 2.735, and 3.746. The women's values are relatively high, up to about twice those of men. The costs also differ greatly across initial states: the men's cost in state 3 is about twice that in state 1, and the women's cost in state 3 is about three times the men's cost in state 1. Compared with related research results [4, 25], although the costs differ because of different research assumptions, the quantitative relationship of cost with gender and initial state is consistent.
3.3. The Single Premium
The single premiums for different ages and genders are shown in Table 6. It should be noted that, following insurance practice, we take 60 years old as the end of the payment period, whereas the cost calculation above starts at 65 years old because of insufficient data in the 60–65-year-old sample; we therefore first discount the total cost to 60 years old according to (11) and then calculate the premium according to (12). As seen in Table 6, the single premium at the age of 25 is 0.351 for men and 0.752 for women. In other words, if a healthy man aged 25 pays a premium of 0.3513 now, or a healthy woman pays 0.7522, he or she obtains disability security after reaching 65 years old, i.e., compensation of one unit per year in case of disability.
3.4. The Annual Level Premium
The annual level premium during the working period is shown in Table 7; the corresponding calculation formula is (13), where the survival probability is calculated from the China Life Insurance Mortality Table in 2014. Similarly, the results show that the women's premium is greater than the men's and remains at about twice the men's level. What deserves more attention is the trend of the premium with age: the price curves have an obvious inflection point around the age of 43, showing a steady and slight upward trend before it and a gradually accelerating increase after it. In particular, the premium difference between men and women becomes obvious after the age of 43, which is mainly due to the health difference between men and women. The gender differences and changing trends of cost are consistent with the fact that men have health advantages and women have survival advantages.
4. Discussion and Suggestions
This article proposes a multistate transition probability matrix model based on the Markov process and derives and estimates the total LTC cost and its allocation by gender and age based on actuarial theory, using the CLHLS and the China Life Insurance Mortality Table. Compared with existing models, this article first relaxes the assumption that health states cannot be transferred and that all states are absorbing, in order to construct dynamic transition probability matrices, and then uses age cohort dislocation multiplication to relax the Markov time homogeneity assumption. Our method provides new ideas for research on dynamic changes in elderly health under limited data and enriches and supplements the existing literature. We did not find detailed and comparable solutions against which to test the effectiveness of the model, owing to the lack of quantitative research and to differences in state definitions. Nevertheless, our conclusions are in line with theoretical expectations and are supported by the characteristics of elderly health change in demography, and we hope that our work can serve as a benchmark for future comparison.
Our findings show that LTC costs are significantly related to age, gender, and initial health status. Compared with the healthy elderly, the cost for the unhealthy elderly may more than double; compared with men, the cost for women may be up to 75% higher, and the difference widens with age. If the cost is shared during the working period through LTC insurance, the annual premium increases with age; the age of 43 may be an inflection point, beyond which the premium rises at an accelerating rate and the gender difference also widens. Accordingly, considering that the LTC system in China is still at an initial stage, we propose the following suggestions. (1) LTC cost estimation should be classified by age, gender, and initial health status because health differs greatly across groups. For example, the maximum cost for unhealthy women is as much as three times that for healthy men of the same age; Chinese women in particular have a long life expectancy and a long unhealthy life expectancy, so estimation without such classification may affect the sustainability of the LTC account. (2) Because premiums are lower at a young age, it is reasonable to smooth long-term care risks over the whole life cycle, especially for women with poor health; and if LTC is run as social insurance, enrollment should be arranged at an early age, when there is no large gender difference. (3) Since the LTC cost for the unhealthy elderly is much greater than that for the healthy elderly, it is necessary to provide comprehensive, full-cycle health services focused on prevention to improve the health of the elderly and reduce future long-term care pressure.
Finally, despite these promising benefits, we acknowledge that our proposed approach has limitations. First, the model does not consider the impact of future changes in medical technology on morbidity or mortality, which may lead to overestimating or underestimating future costs. When we use the dislocation multiplication of age cohorts based on cross-sectional data to address the lack of long-term longitudinal tracking data, the health pattern of the older cohort is used to simulate the future health status of the younger cohort, whereas in reality health changes follow a trend over time, such as increasing survival rates. Second, the follow-up survey data are subject to sample selection bias. Because tracking data are used, participants who are in poor health or at higher risk are more likely to be excluded from the sample, and those included are more likely to be relatively healthy, which may cause future costs to be underestimated. Both problems stem from the limitations of elderly health data and are difficult to solve; however, the representativeness and size of the sample in this article remain advantageous for the study of similar problems, and the two effects act in opposite directions and may partly offset each other. In addition, for the sake of data availability and analytical tractability, our modeling setup is highly stylized. Incorporating more realistic terms, such as different payment levels, a maximum payment period, and a payment deadline consistent with the claim age, and using a fully inhomogeneous Markov process or a semi-Markov method may be more effective [13, 37], which is a possible future direction. In particular, the calculation results are mainly at the individual level; they not only offer information for the LTC security system but also provide a reference for commercial LTC insurance rates.
But if a government wants to implement an LTC social insurance system in which the premium is equal for everyone regardless of age and sex, then how the premium should be calculated will be an interesting topic. In fact, although LTC has been included in China's plans, there are still many disputes over whether it should take the form of social insurance or commercial insurance. For the questions at hand, we are convinced that our setup is a reasonable compromise between realism and tractability.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Social Science Foundation of China under Grant 21BRK018, by the Ministry of Science and Technology under Grant MOST 106-2410-H-182-004, and by the Chang Gung Medical Foundation under Grant BMRPA79.