Abstract

Maintenance hemodialysis is the main treatment for end-stage renal disease in China. The Kt/V value is the gold standard of hemodialysis adequacy. However, Kt/V requires repeated blood drawing and evaluation, so it is hard to monitor dialysis adequacy frequently. To meet the need for repeated clinical assessment of dialysis adequacy, we seek a noninvasive way to assess it. We therefore collect clinically relevant data and develop a machine learning- (ML-) based model to predict dialysis adequacy for clinical hemodialysis patients. We collect data from 250 patients, including gender, age, ultrafiltration (UF), predialysis body weight (preBW), postdialysis body weight (postBW), blood pressure (BP), heart rate (HR), and blood flow (BF). An efficient graph-based Takagi-Sugeno-Kang Fuzzy System (G-TSK-FS) model is proposed to predict the dialysis adequacy of hemodialysis patients. The root mean square error (RMSE) of our model is 0.1578. Our G-TSK-FS model could be used as a feasible method to predict dialysis adequacy, providing a new way for clinical practice.

1. Introduction

Maintenance hemodialysis is the main treatment for end-stage renal disease in China. Adequate hemodialysis not only prolongs survival time [1–3] but also reduces dialysis complications, improves quality of life, and reduces mortality. Kt/V is the most commonly used indicator to assess the adequacy of hemodialysis. The British Society of Nephrology and the Kidney Disease Outcome Quality Initiative (K/DOQI) recommend a minimum Kt/V of 1.2. Computing the Kt/V value requires measuring the blood urea nitrogen (BUN) level before and after dialysis and applying the Daugirdas formula. This method requires repeated blood draws and evaluations, so it is difficult to monitor the adequacy of dialysis frequently. Some clinical researchers have used body composition monitor (BCM) measurement to calculate the Kt/V value. However, the BCM technology requires special equipment, and its operating procedure has not yet been standardized, so it cannot be widely adopted. Therefore, it is especially important to find a more convenient, simple, and effective method to assess the adequacy of dialysis.

In recent years, machine learning (ML) has been widely used in the medical field and has achieved good results. For example, neural networks [4] and the support vector machine (SVM) [5, 6] have been used to predict the dry weight (DW) of hemodialysis patients. In the field of bioinformatics, many ML techniques have been applied successfully to drug discovery [7–9], protein function prediction [10, 11], and disease analysis [12].

ML-based predictive models can also be used to quickly estimate the adequacy of dialysis, and such an estimate can provide a reference for clinical practice. Takagi-Sugeno-Kang Fuzzy Systems (TSK-FS) [13–15] are well known for good interpretability [16] and approximation accuracy [17, 18]. In this study, we developed an effective graph-based Takagi-Sugeno-Kang Fuzzy System (G-TSK-FS) model to predict the adequacy of dialysis.

2. Methods

2.1. Patients

From January 2018 to December 2020, this study collected data from 250 patients at Wuxi People’s Hospital, China. The inclusion criteria are (1) age over 18 years, (2) no severe infection or heart failure within 30 days, (3) maintenance hemodialysis for more than three months, (4) no history of mental illness, and (5) informed and voluntary participation in this study. The exclusion criteria are (1) withdrawal midway and (2) incomplete data.

All patients received hemodialysis (HD) or hemodiafiltration (HDF) on Fresenius machines. They were all dialyzed for four hours. The dialysate flow was fixed at 500 ml/min. Table 1 shows the gender distribution, average age, mean predialysis body weight (preBW), average ultrafiltration (UF) (the difference between body weight before and after dialysis), average blood pressure, average heart rate, and average blood flow.

2.2. Blood Sampling

Two blood samples are collected from each patient. (1) Before dialysis, a sample is collected from the vascular access without anticoagulant; for patients who use hemodialysis catheters as vascular access, 10 milliliters of blood is drawn first. (2) The other sample is obtained from the inlet of the extracorporeal circulation before the end of dialysis. When this sample is taken, the blood flow rate is slowed to 50 ml/min and the dialysate flow is stopped; blood is collected after 15 seconds.

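The single-pool Kt/V (spKt/V) is used as the “gold standard” for dialysis adequacy. With the postdialysis and predialysis BUN, spKt/V is calculated by the second-generation Daugirdas formula as

$$\mathrm{spKt/V} = -\ln(R - 0.008\,t) + (4 - 3.5R)\,\frac{\mathrm{UF}}{W}, \tag{1}$$

where $R$ is the ratio of postdialysis to predialysis BUN, $\mathrm{UF}$ is ultrafiltration (in liters), $W$ is postdialysis body weight (in kilograms), and $t$ is the duration of the dialysis session in hours.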
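The Kt/V calculation from the two blood samples can be sketched in a few lines of Python. This is a minimal illustration that assumes the second-generation Daugirdas formula; the function name `sp_ktv`, its argument names, and the example values are hypothetical and are not taken from the study data.

```python
import math

def sp_ktv(pre_bun, post_bun, hours, uf_liters, post_weight_kg):
    """Single-pool Kt/V, assuming the second-generation Daugirdas formula.

    pre_bun, post_bun : BUN before and after dialysis (same unit)
    hours             : session length in hours
    uf_liters         : ultrafiltration volume in liters
    post_weight_kg    : postdialysis body weight in kilograms
    """
    if pre_bun <= 0 or post_bun <= 0 or post_weight_kg <= 0:
        raise ValueError("BUN values and body weight must be positive")
    r = post_bun / pre_bun                 # post/pre BUN ratio
    inner = r - 0.008 * hours              # correction for urea generation
    if inner <= 0:
        raise ValueError("R - 0.008 * t must be positive")
    return -math.log(inner) + (4.0 - 3.5 * r) * uf_liters / post_weight_kg

# Illustrative (made-up) values for a four-hour session.
example = sp_ktv(pre_bun=25.0, post_bun=8.0, hours=4.0,
                 uf_liters=2.5, post_weight_kg=60.0)
print(round(example, 4))
```

With these made-up inputs the result is above the recommended minimum of 1.2.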

2.3. Graph-Based TSK Fuzzy System

In this work, we use TSK-FS to predict the Kt/V of a hemodialysis patient. For a classic first-order TSK fuzzy system, the fuzzy inference rules are defined as follows.

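The $k$th TSK fuzzy rule is as follows:

$$\text{IF } x_1 \text{ is } A_1^k \wedge x_2 \text{ is } A_2^k \wedge \cdots \wedge x_d \text{ is } A_d^k, \text{ THEN } f^k(\mathbf{x}) = p_0^k + p_1^k x_1 + \cdots + p_d^k x_d, \quad k = 1, \ldots, K,$$

where $A_i^k$ is a fuzzy subset of the $k$th rule for the $i$th input variable $x_i$, and $K$ denotes the number of fuzzy rules. Each fuzzy rule is premised on the feature space $\mathbb{R}^d$, and TSK-FS maps the fuzzy sets to a single output variable by $f^k(\mathbf{x})$. The output of the TSK-FS can be formulated as follows:

$$y = \sum_{k=1}^{K} \frac{\mu^k(\mathbf{x})}{\sum_{k'=1}^{K} \mu^{k'}(\mathbf{x})} f^k(\mathbf{x}) = \sum_{k=1}^{K} \tilde{\mu}^k(\mathbf{x}) f^k(\mathbf{x}), \tag{2}$$

where $\mu^k(\mathbf{x})$ and $\tilde{\mu}^k(\mathbf{x})$ are the fuzzy membership function and the normalized fuzzy membership function of fuzzy set $A^k$. $\mu^k(\mathbf{x})$ can be calculated by

$$\mu^k(\mathbf{x}) = \prod_{i=1}^{d} \mu_{A_i^k}(x_i), \tag{3}$$

where $\mu_{A_i^k}(x_i)$ is the fuzzy membership function of the $k$th rule under the $i$th input variable. In general, TSK-FS uses the Gaussian membership function:

$$\mu_{A_i^k}(x_i) = \exp\left(-\frac{(x_i - c_i^k)^2}{2\delta_i^k}\right), \tag{4}$$

where $c_i^k$ and $\delta_i^k$ are the two parameters (center and width) of the $i$th variable of the fuzzy set $A^k$. Fuzzy C-means (FCM) is employed to estimate these two parameters:

$$c_i^k = \frac{\sum_{j=1}^{N} u_{jk} x_{ji}}{\sum_{j=1}^{N} u_{jk}}, \qquad \delta_i^k = \frac{h \sum_{j=1}^{N} u_{jk} (x_{ji} - c_i^k)^2}{\sum_{j=1}^{N} u_{jk}}, \tag{5}$$

where $u_{jk}$ is the fuzzy membership of the $j$th sample under the $k$th fuzzy set obtained by FCM clustering, and $h$ denotes the scale parameter. When the premise (if-parts) of the TSK-FS is determined, let

$$\mathbf{x}_e = (1, \mathbf{x}^T)^T, \quad \tilde{\mathbf{x}}^k = \tilde{\mu}^k(\mathbf{x})\,\mathbf{x}_e, \quad \mathbf{x}_g = \big((\tilde{\mathbf{x}}^1)^T, \ldots, (\tilde{\mathbf{x}}^K)^T\big)^T, \quad \mathbf{p}^k = (p_0^k, p_1^k, \ldots, p_d^k)^T, \quad \mathbf{p}_g = \big((\mathbf{p}^1)^T, \ldots, (\mathbf{p}^K)^T\big)^T. \tag{6}$$

Then equation (2) (then-parts) can be formulated as

$$y = \mathbf{p}_g^T \mathbf{x}_g. \tag{7}$$

So, the problem of TSK-FS training can be regarded as solving a linear regression:

$$\min_{\mathbf{p}_g} \|\mathbf{y} - \mathbf{X}_g \mathbf{p}_g\|_2^2, \tag{8}$$

where $\mathbf{y} \in \mathbb{R}^{N}$ and $\mathbf{X}_g \in \mathbb{R}^{N \times M}$ are the true value to be approximated and the feature after fuzzy rule mapping, respectively. $N$ denotes the number of training samples, and $M = K(d+1)$ is the dimension after fuzzy rule mapping. To improve the generalization performance of the model, we add the Laplace regularization term to equation (8):

$$\min_{\mathbf{p}_g} \|\mathbf{y} - \mathbf{X}_g \mathbf{p}_g\|_2^2 + \lambda \|\mathbf{p}_g\|_2^2 + \beta\, (\mathbf{X}_g \mathbf{p}_g)^T \mathbf{L}\, (\mathbf{X}_g \mathbf{p}_g), \tag{9}$$

where $\lambda$ and $\beta$ are the coefficients of the two regularization terms. Setting the derivative of formula (9) with respect to $\mathbf{p}_g$ to zero gives the solution

$$\mathbf{p}_g = \big(\mathbf{X}_g^T \mathbf{X}_g + \lambda \mathbf{I} + \beta\, \mathbf{X}_g^T \mathbf{L} \mathbf{X}_g\big)^{-1} \mathbf{X}_g^T \mathbf{y}, \tag{10}$$

where $\mathbf{L} \in \mathbb{R}^{N \times N}$ is the Laplacian matrix, which can be calculated as

$$\mathbf{L} = \mathbf{D} - \mathbf{W}, \tag{11}$$

where $\mathbf{D}$ is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$. The similarity matrix $\mathbf{W}$ is built from the cosine similarity of each pair of feature vectors. We call this model the graph-based TSK-FS (G-TSK-FS), and the frame diagram of TSK-FS is shown in Figure 1. The least squares method is employed to solve the optimization problem of G-TSK-FS.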
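The training pipeline described in this subsection (FCM premise estimation, fuzzy rule mapping, and the Laplacian-regularized least squares solution) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, the FCM fuzzifier is taken to be 2, negative cosine similarities are clipped to zero so that the Laplacian stays positive semidefinite, the graph penalty is assumed to act on the predicted outputs, and the data are synthetic.

```python
import numpy as np

def fcm_memberships(X, K, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means; returns the (N, K) membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], K))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def premise_params(X, U, h=1.0):
    """Membership-weighted centers and widths of each rule; h is the scale parameter."""
    w = U / U.sum(axis=0, keepdims=True)                 # (N, K), columns sum to 1
    centers = w.T @ X                                    # (K, d)
    sq = (X[:, None, :] - centers[None, :, :]) ** 2      # (N, K, d)
    widths = h * np.einsum("nk,nkd->kd", w, sq) + 1e-12  # (K, d)
    return centers, widths

def fuzzy_map(X, centers, widths):
    """Map (N, d) inputs to (N, K*(d+1)) features via normalized Gaussian firing strengths."""
    diff2 = (X[:, None, :] - centers[None, :, :]) ** 2
    mu = np.exp(-(diff2 / (2.0 * widths[None, :, :])).sum(axis=2))  # product over features
    mu_norm = mu / (mu.sum(axis=1, keepdims=True) + 1e-12)
    X_e = np.hstack([np.ones((X.shape[0], 1)), X])                   # prepend bias column
    return (mu_norm[:, :, None] * X_e[:, None, :]).reshape(X.shape[0], -1)

def cosine_laplacian(X):
    """Graph Laplacian L = D - W with cosine-similarity weights (negatives clipped: assumption)."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    W = np.clip(Xn @ Xn.T, 0.0, None)
    return np.diag(W.sum(axis=1)) - W

def fit_gtsk(X_g, y, L, lam, beta):
    """Closed-form solution of the ridge plus graph-regularized least squares problem."""
    A = X_g.T @ X_g + lam * np.eye(X_g.shape[1]) + beta * (X_g.T @ L @ X_g)
    return np.linalg.solve(A, X_g.T @ y)

# Synthetic demonstration data (not the study data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.2]) + 0.01 * rng.normal(size=60)

K = 2
U = fcm_memberships(X, K)
centers, widths = premise_params(X, U, h=1.0)
X_g = fuzzy_map(X, centers, widths)
L = cosine_laplacian(X)
p_g = fit_gtsk(X_g, y, L, lam=2.0 ** -6, beta=2.0 ** -10)
y_hat = X_g @ p_g
train_rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
print(X_g.shape, train_rmse)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse; the ridge term keeps the system matrix positive definite even when some mapped features are nearly collinear.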

3. Results

In this work, we test G-TSK-FS and other predictors on the dataset. Each model is evaluated with the root mean square error (RMSE) [5, 19], R-squared, and adjusted R-squared under 10-fold cross-validation (10-CV) [20, 21]. In addition, Bland-Altman analysis is used to evaluate the agreement between the two kinds of methods (the clinical method and the predictive models).
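The three evaluation metrics and the 10-fold protocol can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, the metrics are computed on the pooled out-of-fold predictions (whether the study pools predictions or averages per-fold scores is not stated), and ordinary least squares on synthetic data stands in for the actual predictors.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def adjusted_r_squared(y_true, y_pred, n_features):
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return float(1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1))

def kfold_predict(X, y, fit, predict, k=10, seed=0):
    """Return out-of-fold predictions for every sample under k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_pred = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        model = fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = predict(model, X[test_idx])
    return y_pred

# Placeholder model: ordinary least squares with an intercept.
def fit_ols(X, y):
    X1 = np.hstack([np.ones((len(X), 1)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

def predict_ols(coef, X):
    return np.hstack([np.ones((len(X), 1)), X]) @ coef

# Synthetic data (not the study data).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 1.2 + X @ np.array([0.3, -0.2, 0.1, 0.0]) + 0.05 * rng.normal(size=100)

y_cv = kfold_predict(X, y, fit_ols, predict_ols, k=10)
scores = (rmse(y, y_cv), r_squared(y, y_cv), adjusted_r_squared(y, y_cv, X.shape[1]))
print(scores)
```

Any predictor that exposes a `fit`/`predict` pair of functions can be dropped into `kfold_predict` in place of the least squares placeholder.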

3.1. Selection of Parameters for the Model

To obtain the best prediction performance, we use grid search to select the parameters of the model. G-TSK-FS has three tunable parameters, and a search range is set for each of them. First, we fix one parameter and search for the best values of the other two. The search results are shown in Figure 2. The RMSE reaches its minimum (0.1950) when these two parameters are 2 and $2^{-6}$, respectively. Then, with these two parameters set to 2 and $2^{-6}$, the remaining parameter is varied from $2^{-10}$ to $2^{0}$ with steps of 2 (Figure 3), and the best RMSE is obtained. In addition, the kernel width of the Gaussian membership function is controlled by an adjustable parameter.

3.2. Comparison to Other Predictive Models

To evaluate the performance of our model, other predictive models are also tested on our dataset. They are linear regression (LR) [22, 23], support vector regression (SVR) [24], an artificial neural network based on the back propagation algorithm (ANN) [25], and standard TSK-FS. Table 2 shows the results for RMSE, R-squared, and adjusted R-squared. In general, a smaller RMSE (close to 0) and larger R-squared and adjusted R-squared (close to 1) indicate better prediction performance. The table shows that our method (G-TSK-FS) obtains the smallest RMSE (0.1578) and the largest R-squared (0.7523) and adjusted R-squared (0.7222). In addition, G-TSK-FS improves on TSK-FS by 0.0181 (R-squared) and 0.0204 (adjusted R-squared). This shows that the model has better generalization performance after Laplace regularization. Figure 4 shows the distribution of the predicted values (all models) and the true Kt/V. From the 150th to the 160th sample, each model fluctuates severely, which may be caused by noise in the data collection process.

3.3. Bland-Altman Analysis

The Bland-Altman plot is a useful tool for evaluating the agreement between the predictive methods and the clinical method. Table 3 and Figure 5 show the results of the five models under Bland-Altman analysis. In general, the closer the average difference is to 0 and the narrower the limits of agreement (the 95% zone between −1.96 SD and +1.96 SD), the better the agreement between the model and the clinical method. The table shows that all methods have low average differences; among them, LR has the lowest value (−0.07312). Apart from LR and ANN, SVR (−18.1914 to 16.0155), TSK-FS (−18.1955 to 16.7179), and G-TSK-FS (−17.9686 to 16.3001) obtain the narrower ranges of agreement. Figure 5 shows that the errors of LR and ANN are very large for some points, with differences beyond ±50%. For LR, ANN, SVR, TSK-FS, and G-TSK-FS, the ratios of points in the disagreement interval are all close to 5%, which means that the prediction methods are comparable to the clinical method. Generally, when this ratio is less than 5%, the prediction model can be regarded as equivalent to the clinical method. The results of the evaluation show that G-TSK-FS has the potential to support clinical evaluation of Kt/V at low cost.
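The quantities reported in a Bland-Altman analysis (average difference, limits of agreement, and the share of points outside the limits) can be computed as in the sketch below. The function name `bland_altman` is hypothetical, and two conventions are assumptions: differences are taken as predicted minus clinical, and the percentage form divides each difference by the mean of the two methods. The example values are made up.

```python
import numpy as np

def bland_altman(reference, predicted, percent=True):
    """Bias, 95% limits of agreement, and fraction of points outside the limits."""
    reference = np.asarray(reference, float)
    predicted = np.asarray(predicted, float)
    diff = predicted - reference
    if percent:
        # Express each difference as a percentage of the mean of the two methods.
        diff = 100.0 * diff / ((predicted + reference) / 2.0)
    bias = diff.mean()
    sd = diff.std(ddof=1)                      # sample standard deviation
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    outside = np.mean((diff < lower) | (diff > upper))
    return {"bias": float(bias), "lower": float(lower),
            "upper": float(upper), "outside_ratio": float(outside)}

# Made-up Kt/V values for illustration only.
ref = [1.0, 1.2, 1.4, 1.6]
pred = [1.1, 1.1, 1.5, 1.5]
result = bland_altman(ref, pred, percent=False)
print(result)
```

Setting `percent=True` yields limits on a percentage scale, which matches how ranges such as −17.9686 to 16.3001 are usually read.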

4. Discussion

The kinetics of urea removal is very complicated [26], and blood is usually drawn to calculate Kt/V. Moreover, strict blood collection procedures should be followed during dialysis. The measurement is affected by many factors, which directly affect the accuracy of the calculated Kt/V value [27]. In our research, we found that adequate dialysis is related to age, gender [28], ultrafiltration [29], dry weight, dialyzer surface area, blood flow [30], DBP, SBP, and heart rate before and after dialysis. This is consistent with a previous study [31] and indicates that these clinical features can be used to assess the adequacy of dialysis.

LR, ANN, and SVR are regression methods that have been widely used in many fields. In our work, the TSK-FS method achieves better results and is more suitable for our task. The results show that the value of Kt/V predicted by G-TSK-FS is close to that of the clinical approach. G-TSK-FS obtains the smallest RMSE (0.1578) and the largest R-squared (0.7523) and adjusted R-squared (0.7222). In addition, the narrower range of agreement (−17.9686 to 16.3001) and the ratio of the disagreement interval (close to 5%) suggest that it is a potential computational alternative to the clinical method.

Although clinical attention has been paid to the value of Kt/V in patients, few scholars have used G-TSK-FS and patients’ clinical characteristics to predict dialysis adequacy. In the field of precision medicine, more scholars are paying attention to clinical prediction models [32–36]. Assessing the adequacy of dialysis requires repeated blood tests, which increases patient costs. In addition, the results of the adequacy test are affected by many factors, such as the quality of blood sample collection, the time of blood sample submission, and the reliability of test results. Our prediction model is built on the clinical characteristics of patients. These data are convenient to collect, noninvasive to obtain, and do not increase the patient’s costs, and we use them with machine learning to calculate Kt/V.

5. Conclusions

Our method has made some progress in predicting Kt/V. However, we do not take noisy samples or the characteristics of the noise into account. In addition, the number of samples collected is still limited. In future work, we will introduce other machine learning techniques such as sample filtering and feature selection [37, 38] to deal with various types of noise. We will also further expand the patient sample size.

Data Availability

The data used to support the findings of this study are available from the corresponding authors upon request.

Ethical Approval

This study has been approved by the ethics committee (KY21002).

Written informed consent has been signed by all participants.

Conflicts of Interest

The authors declare that they do not have any conflict of interest.

Authors’ Contributions

Aiyan Du, Xiaofen Shi, and Xiaoyi Guo are joint first authors.

Acknowledgments

Thanks are due to the Hemodialysis Center of Wuxi People’s Hospital for collecting data in our study.