Abstract
Diabetes is a chronic disease characterized by insufficient production or ineffective use of insulin and a consequent rise in blood sugar. Diagnosing diabetes is a complex process that requires a high level of expertise. The disease is characterized by a set of signs and symptoms, some of which are obtained through laboratory analysis. Building a knowledge base and automating disease diagnosis are important because they allow fast detection and treatment. Various techniques have been used to develop high-accuracy systems for the diagnosis of diabetes. Fuzzy logic is one of the appropriate methodologies for developing such medical diagnostic systems, and several studies have used fuzzy models to diagnose medical diseases because of the imprecision and uncertainty associated with medical data. Moreover, the high level of uncertainty in medical data calls for a type-2 fuzzy system to handle these uncertainties when diagnosing diabetes. This paper proposes the integration of a type-2 fuzzy system and neural networks for the diagnosis of diabetes. The diagnostic system is designed using the structure of a type-2 fuzzy neural network (T2FNN) and statistical data. A number of simulations were performed to evaluate the performance of the designed system, and the comparative results demonstrate the efficiency of the T2FNN system in diagnosing diabetes. Physicians can use the system to support the diagnosis of diabetes.
1. Introduction
Diabetes is one of the most important and widespread disorders. It results from elevated blood glucose. The pancreas produces insulin, which enables the cells of the body to take up glucose from the blood. When the pancreas does not produce sufficient insulin, or the body can no longer use insulin properly, the concentration of glucose in the blood increases. This persistently high blood glucose concentration is called diabetes; it can cause serious health problems and damage the organs. The disorder has several categories, including type-1 diabetes, type-2 diabetes, gestational diabetes, impaired glucose tolerance (IGT), and impaired fasting glycaemia (IFG) [1]. Type-1 diabetes is caused by the autoimmune destruction of the pancreatic beta cells. As a result, the body produces no insulin or an insufficient quantity of insulin. The disease usually occurs in children or young adults, but it can affect people of any age. Patients with this disease depend on insulin for their whole life. Type-2 diabetes is progressive owing to the loss of beta-cell function. It is the most common type of diabetes, usually appearing in adults over 40 as a result of a poor lifestyle or diet, and its incidence is also increasing in children. In this type, the body is unable to use insulin effectively. Gestational diabetes arises during pregnancy because of hormonal changes. It usually disappears after pregnancy, but it makes women more prone to type-2 diabetes. IFG and IGT are transitional states between normal glucose regulation and diabetes.
Diagnosing diabetes is a difficult process and requires a high level of expertise. Because of late diagnosis, many people with diabetes suffer organ damage or amputation, and many also lose their vision [2]. In addition, heart and vascular disorders are frequently observed in diabetic patients. Therefore, early diagnosis of diabetes is as important as early diagnosis of cancer. Early diagnosis also reduces the costs of health systems and treatment [3]. Keeping blood sugar within the normal range is essential in managing diabetes. The most important hormone regulating blood sugar in the body is insulin, which is secreted by the beta cells of the pancreas [4].
Diabetes can cause serious damage to patients, including heart disease, damage to the kidneys, eyes, and nervous system, and amputation of limbs. Early diagnosis can prevent further complications by keeping the disease under control. Depending on the type of the disease, the physician prescribes drugs along with lifestyle changes, such as increased physical activity, quitting smoking, reduced alcohol consumption, and dietary changes. Existing methods cannot cure diabetes, but if the disease is detected at an early stage it can be controlled, whereas late diagnosis complicates treatment. For this reason, early diagnosis is essential for the timely treatment of diabetes.
Many studies have been devoted to diabetes diagnosis. A number of machine learning models, such as neural networks (NNs) [5], hybrid systems [6], support vector machines (SVMs) [7, 8], and NN, decision tree, and random forest-based models [9, 10], have been developed. Different algorithms have been applied to train these models using statistical data about diabetes [11]. Erkaymaz et al. [11] used a small-world NN model, Christopher et al. [12] used wind-driven optimization to design a rule base and tested it on the diagnosis of diabetes, and Kannadasan et al. [13] used a deep NN based on stacked autoencoders for feature extraction and classification of diabetes. These models are built from statistical data. Diagnostic systems use various input factors and symptoms to detect diseases, which is a difficult process. Each disease is characterized by a set of input symptoms, and it is sometimes difficult to specify the exact intervals of these symptoms that affect a person's health. Doctors make decisions about the health of patients by partitioning input signals into different intervals and analyzing them. These input intervals are often characterized by uncertainty, and some of the data characterizing the symptoms are noisy. Because of this complexity and vagueness, diagnosis may result in undesirable errors; for example, different patients may react to a disease to different degrees. Diagnosis is therefore always carried out under uncertainty: incomplete patient data and the complex character of diagnosis make diseases vague and decisions uncertain. One of the most appropriate approaches to such problems is the design of systems based on fuzzy logic, which is one of the best methodologies for describing uncertainties and representing appropriate associations between input and output variables.
Fuzzy logic-based systems are well suited to designing diagnostic systems that can deal with uncertainty in medical diagnosis. Fuzzy logic uses linguistic terms with excellent numerical approximation to describe imprecise knowledge [14, 15]. A number of studies have applied it to the diagnosis of medical diseases. Bressan et al. [16] used fuzzy rule-based inference for the classification of type 2 diabetes mellitus. Ghazavi and Liao [17] used a fuzzy classification algorithm and an ANFIS model, Feng et al. [18] integrated a fuzzy inference system and DNA coding with supervised learning, Beloufa and Chikh [19] employed a fuzzy classifier with a modified bee colony algorithm, and Ramezani et al. [20] used logistic regression and an ANFIS model for the classification of diabetes. Mansourypoor and Asadi [21] utilized reinforcement learning to design a fuzzy rule-based system for the diagnosis of diabetes. El-Sappagh et al. [22] used fuzzy ontology-based semantic case-based reasoning for the diagnosis of diabetes, and the authors of [23] employed grey wolf optimization to design fuzzy rules for diabetes diagnosis.
When the information in the knowledge base is uncertain, a type-1 fuzzy system cannot fully handle the effect of such uncertainties [24, 25]. One efficient approach for handling them is to use type-2 fuzzy sets in the system design. Type-2 fuzzy sets were proposed by Zadeh as an extension of type-1 fuzzy sets. Because the membership function of a type-2 fuzzy set is three-dimensional, it provides a good framework for managing these uncertainties. Type-2 fuzzy sets were further developed by Mendel and his students [24, 25]. In this paper, interval type-2 fuzzy sets are used to develop a medical system for the diagnosis of diabetes.
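For reference, the standard definition of an interval type-2 fuzzy set in Mendel's formulation can be written as below. The symbols $J_x$, $\underline{\mu}_{\tilde{A}}$ (lower membership function), $\overline{\mu}_{\tilde{A}}$ (upper membership function), and FOU (footprint of uncertainty) are our notation for illustration, not symbols taken from this paper.

```latex
\tilde{A} = \left\{ \big((x, u),\, 1\big) \;\middle|\; x \in X,\; u \in J_x = \big[\underline{\mu}_{\tilde{A}}(x),\, \overline{\mu}_{\tilde{A}}(x)\big] \subseteq [0, 1] \right\},
\qquad
\mathrm{FOU}(\tilde{A}) = \bigcup_{x \in X} J_x
```

Every secondary grade equals 1, so an interval type-2 set is completely described by its lower and upper membership functions; when the two coincide, the set reduces to an ordinary type-1 fuzzy set.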
In the literature, type-2 fuzzy systems have been employed to solve engineering problems [26–36] and to diagnose various medical diseases [37–40]. Shafaei Bajestani et al. [37] presented a type-2 fuzzy regression model for predicting retinopathy in diabetic patients. Mohammed and Hagras [38] presented a diabetic diet recommendation system based on a type-2 fuzzy system, which helps patients achieve a healthy lifestyle in order to control the disease. Another study presented an ontology model that uses interval type-2 fuzzy sets for knowledge representation and diabetic diet recommendations [39]. In [40], a type-2 fuzzy system was developed to control blood glucose levels in patients. An important problem in designing a type-2 fuzzy system is the design of the antecedent and consequent parts of the fuzzy rules. One efficient way to address it is to use a neural network structure to design the fuzzy system, since integrating an NN structure with fuzzy logic allows the design of a high-accuracy system. In this paper, these two methodologies are integrated to design a type-2 fuzzy neural network (T2FNN) for the diagnosis of diabetes.
As shown above, a number of studies have addressed the accurate identification of diabetes, and the main problem has been the design of a diagnostic system with high accuracy. In this paper, a T2FNN is proposed for this purpose. The contributions of the paper are as follows: a T2FNN structure that integrates interval type-2 fuzzy sets and neural networks is proposed; a learning algorithm for the system is designed using cross-validation and gradient descent; and a medical diagnostic system for diabetes is designed using statistical data and the T2FNN structure.
The paper is organized as follows. Section 2 presents the T2FNN system developed for the diagnosis of diabetes. Section 3 presents simulations of the identification system. Section 4 gives the conclusions of the paper.
2. T2FNN Model for Diagnosis of Diabetes
Diabetes is characterized by a number of signs and symptoms, some of which are determined by laboratory analysis. In this paper, we used an extended version of the Pima dataset with 2000 samples and eight input symptoms: number of pregnancies, blood pressure, 2-hour plasma glucose concentration in an oral glucose tolerance test, 2-hour serum insulin, triceps skinfold thickness, diabetes pedigree function, body mass index, and age. The system output is the diagnosis: diabetic or healthy. The diagnostic system is developed from datasets containing statistical values of the input and output variables, and the methodology is based on type-2 fuzzy sets. Mamdani and TSK (Takagi–Sugeno–Kang) fuzzy systems are widely used for system design, and TSK fuzzy systems have been shown to achieve high accuracy in identification and classification problems [33]. In this paper, a type-2 TSK system is used for diabetes identification.
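A minimal sketch of reading the eight inputs and the binary output, assuming the CSV layout of the Kaggle copies listed under Data Availability (header columns Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome). The helper names `load_pima` and `one_hot`, and the encoding of the class as two output signals, are hypothetical choices for illustration.

```python
import csv
import io
from typing import List, Tuple

# Assumed column names (Kaggle CSV layout); the paper itself does not list them.
INPUT_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
OUTPUT_COLUMN = "Outcome"  # assumed coding: 1 = diabetic, 0 = healthy


def load_pima(text: str) -> Tuple[List[List[float]], List[int]]:
    """Parse Pima-style CSV text into an input matrix X and class labels y."""
    reader = csv.DictReader(io.StringIO(text))
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in INPUT_COLUMNS])
        y.append(int(row[OUTPUT_COLUMN]))
    return X, y


def one_hot(label: int) -> List[float]:
    """Hypothetical two-signal target: [healthy, diabetic]."""
    return [1.0, 0.0] if label == 0 else [0.0, 1.0]
```

With a local copy of either file, `load_pima(open(path).read())` returns eight-column rows and their labels; `path` is whatever the downloaded file is called.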
The main problem in type-2 TSK system design is the construction of IF-THEN rules that contain type-2 fuzzy values in the antecedent parts and linear functions in the consequent parts. In this paper, multi-input multi-output type-2 TSK fuzzy rules are used. They are presented as follows:

$$R_j:\ \text{IF } x_1 \text{ is } \tilde{A}_{1j} \text{ and } x_2 \text{ is } \tilde{A}_{2j} \text{ and } \dots \text{ and } x_m \text{ is } \tilde{A}_{mj} \text{ THEN } y_{jk} = \sum_{i=1}^{m} w_{ijk}\, x_i + b_{jk} \tag{1}$$

where x1, x2, …, xm are the input variables, $\tilde{A}_{ij}$ are the interval type-2 fuzzy sets associated with the i-th input signal and the j-th rule, represented by Gaussian functions, $y_{jk}$ are the linear functions of rule j (j = 1, ..., r), and $w_{ijk}$ and $b_{jk}$ are the coefficients of the linear functions, i = 1, ..., m, j = 1, ..., r, k = 1, …, n, where m is the number of input signals, r is the number of rules, and n is the number of output signals.
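One rule of (1) could be stored as below. This sketch assumes Gaussian antecedents with an uncertain mean, parameterized by c1, c2, and σ for each input, and one linear function per output, y_jk = Σ_i w_ijk x_i + b_jk; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IT2TSKRule:
    """One rule R_j: interval type-2 antecedents, linear consequents.

    c1, c2, sigma: per-input Gaussian parameters (mean uncertain in [c1, c2]).
    w[i][k], b[k]: assumed layout of the linear-function coefficients for output k.
    """
    c1: List[float]
    c2: List[float]
    sigma: List[float]
    w: List[List[float]]  # shape m x n (inputs x outputs)
    b: List[float]        # length n

    def consequent(self, x: List[float]) -> List[float]:
        """Evaluate y_jk = sum_i w_ijk * x_i + b_jk for every output k."""
        n = len(self.b)
        return [sum(self.w[i][k] * x[i] for i in range(len(x))) + self.b[k]
                for k in range(n)]
```

Keeping antecedent and consequent parameters together per rule mirrors the IF-THEN form; a rule base is then simply a list of r such objects.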
Determining the antecedent and consequent parts of the rules is therefore the central design task. In this paper, a neural network architecture and learning algorithm are used to design the type-2 TSK fuzzy system. Figure 1 presents the T2FNN structure that integrates these two approaches and is employed for the diagnosis of diabetes.

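The T2FNN uses eight inputs to predict two output variables. The input layer distributes the input signals. The second layer contains the interval type-2 membership functions that represent the unknown parameters of the antecedent parts of rules (1). The membership functions are Gaussian:

$$\mu_{ij}(x_i) = \exp\!\left(-\frac{(x_i - c_{ij})^2}{\sigma_{ij}^2}\right) \tag{2}$$

where x_i are the input signals and c_ij and σ_ij are the center and width of the membership functions. Uncertainty can be associated with the mean, $c_{ij} \in [c_{1ij}, c_{2ij}]$, or with the width, $\sigma_{ij} \in [\sigma_{1ij}, \sigma_{2ij}]$, of the membership functions. Figures 2(a) and 2(b) depict Gaussian membership functions with uncertain mean and uncertain width, respectively. We use interval type-2 membership functions with an uncertain mean, as shown in Figure 2(a). Each point of such a membership function is characterized by an upper and a lower membership value, calculated from (2) as

$$\overline{\mu}_{ij}(x_i) = \begin{cases} \mu_{ij}(x_i; c_{1ij}, \sigma_{ij}), & x_i < c_{1ij} \\ 1, & c_{1ij} \le x_i \le c_{2ij} \\ \mu_{ij}(x_i; c_{2ij}, \sigma_{ij}), & x_i > c_{2ij} \end{cases} \qquad \underline{\mu}_{ij}(x_i) = \begin{cases} \mu_{ij}(x_i; c_{2ij}, \sigma_{ij}), & x_i \le \frac{c_{1ij}+c_{2ij}}{2} \\ \mu_{ij}(x_i; c_{1ij}, \sigma_{ij}), & x_i > \frac{c_{1ij}+c_{2ij}}{2} \end{cases} \tag{3}$$

where $\mu_{ij}(x_i; c, \sigma)$ denotes (2) evaluated with center c and width σ.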
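A sketch of the lower and upper membership grades of a Gaussian interval type-2 set with an uncertain mean, assuming the Gaussian form exp(−(x − c)²/σ²) and the standard construction of Mendel and coworkers; the function names are hypothetical.

```python
import math


def gaussian(x: float, c: float, sigma: float) -> float:
    """Type-1 Gaussian membership exp(-(x - c)^2 / sigma^2)."""
    return math.exp(-((x - c) ** 2) / sigma ** 2)


def it2_uncertain_mean(x: float, c1: float, c2: float, sigma: float):
    """Return (lower, upper) grades of a Gaussian IT2 set with mean in [c1, c2]."""
    # Upper MF: left Gaussian tail, a plateau of 1 between the two means, right tail.
    if x < c1:
        upper = gaussian(x, c1, sigma)
    elif x <= c2:
        upper = 1.0
    else:
        upper = gaussian(x, c2, sigma)
    # Lower MF: the smaller of the two extreme Gaussians at x, which equals the
    # Gaussian centred at the mean farther from x (split at (c1 + c2) / 2).
    lower = min(gaussian(x, c1, sigma), gaussian(x, c2, sigma))
    return lower, upper
```

The interval [lower, upper] at each x is the vertical slice of the footprint of uncertainty; training moves c1, c2, and σ and thereby reshapes that footprint.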

Figure 2: Gaussian interval type-2 membership functions with (a) uncertain mean and (b) uncertain width.
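Next, the firing strength of each rule is calculated in the rule layer using the t-norm "min" operation. The lower and upper firing strengths of rule j are

$$\underline{f}_j = \min_{i=1,\dots,m} \underline{\mu}_{ij}(x_i), \qquad \overline{f}_j = \min_{i=1,\dots,m} \overline{\mu}_{ij}(x_i), \qquad j = 1,\dots,r \tag{4}$$

After the firing strengths are found, the type-2 fuzzy outputs of the rules are determined; these operations are implemented in the fourth and fifth layers, and the fifth and sixth layers perform type reduction and defuzzification. The inference engine presented in [32, 33] is used to determine the crisp output of the system:

$$u_k = p\,\frac{\sum_{j=1}^{r} \underline{f}_j\, y_{jk}}{\sum_{j=1}^{r} \underline{f}_j} + q\,\frac{\sum_{j=1}^{r} \overline{f}_j\, y_{jk}}{\sum_{j=1}^{r} \overline{f}_j}, \qquad y_{jk} = \sum_{i=1}^{m} w_{ijk}\, x_i + b_{jk}, \qquad k = 1,\dots,n \tag{5}$$

where $\underline{f}_j$ and $\overline{f}_j$ are computed using (4), x_i are the input signals, y_jk are the outputs of the linear functions, w_ijk and b_jk are the coefficients of the linear functions, and the parameters p and q adjust the contributions of the lower and upper portions to the final output.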
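A minimal sketch of the rule and output layers, assuming the min t-norm for the firing strengths and a p/q-weighted combination of the lower- and upper-strength weighted averages of the rule consequents (the inference engine of [32, 33] may differ in detail). The inputs are precomputed membership grades and consequent values; the function names are hypothetical.

```python
from typing import List, Sequence


def firing_strengths(lower_mu: Sequence[Sequence[float]],
                     upper_mu: Sequence[Sequence[float]]):
    """Rule layer: min t-norm over inputs, separately for lower and upper grades.

    lower_mu[j][i], upper_mu[j][i]: grade of input i in the antecedent of rule j.
    """
    f_lower = [min(row) for row in lower_mu]
    f_upper = [min(row) for row in upper_mu]
    return f_lower, f_upper


def t2_output(f_lower: Sequence[float], f_upper: Sequence[float],
              y: Sequence[Sequence[float]], p: float, q: float) -> List[float]:
    """Output layer: u_k = p * (lower-weighted mean) + q * (upper-weighted mean).

    y[j][k] is the linear consequent of rule j for output k. Gaussian grades are
    never exactly zero, so the denominators are positive in practice.
    """
    s_lower, s_upper = sum(f_lower), sum(f_upper)
    n = len(y[0])
    return [p * sum(fl * yj[k] for fl, yj in zip(f_lower, y)) / s_lower
            + q * sum(fu * yj[k] for fu, yj in zip(f_upper, y)) / s_upper
            for k in range(n)]
```

For two rules whose consequents are [1, 0] and [0, 1], lower strengths [0.2, 0.4], upper strengths [0.8, 0.6], and p = q = 0.5, the outputs are 19/42 and 23/42.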
After the output signal of the system is found, training of the parameters begins. Training updates the centers c1ij, c2ij and widths σij of the membership functions and the coefficients of the linear functions and of the output layer. In this paper, the gradient descent algorithm is applied to correct the values of these unknown coefficients. Readers can refer to [33] for details of the learning algorithm.
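The antecedent-parameter updates are given in [33] and are not reproduced here. As a hedged illustration of the gradient step itself, the sketch below updates only the consequent coefficients, assuming a squared-error loss E = ½ Σ_k (u_k − d_k)², fixed firing strengths, fixed p and q, and an output u_k = Σ_j g_j y_jk with g_j = p f̲_j/Σf̲ + q f̄_j/Σf̄ and y_jk = Σ_i w_ijk x_i + b_jk. Because u_k is linear in these coefficients, ∂E/∂w_ijk = (u_k − d_k) g_j x_i and ∂E/∂b_jk = (u_k − d_k) g_j. The function name and the w[j][i][k] layout are hypothetical.

```python
from typing import List, Sequence


def train_step(x: Sequence[float], f_lower: Sequence[float],
               f_upper: Sequence[float], w: List[List[List[float]]],
               b: List[List[float]], target: Sequence[float],
               p: float, q: float, lr: float) -> float:
    """One gradient-descent step on consequent coefficients w[j][i][k], b[j][k].

    Updates w and b in place and returns the loss E before the update.
    """
    sl, su = sum(f_lower), sum(f_upper)
    # Normalised weight of each rule in the output layer (depends on p and q).
    g = [p * fl / sl + q * fu / su for fl, fu in zip(f_lower, f_upper)]
    r, m, n = len(w), len(x), len(target)
    y = [[sum(w[j][i][k] * x[i] for i in range(m)) + b[j][k] for k in range(n)]
         for j in range(r)]
    u = [sum(g[j] * y[j][k] for j in range(r)) for k in range(n)]
    err = [u[k] - target[k] for k in range(n)]
    for j in range(r):
        for k in range(n):
            d_y = err[k] * g[j]                # dE/dy_jk
            b[j][k] -= lr * d_y                # dy_jk/db_jk = 1
            for i in range(m):
                w[j][i][k] -= lr * d_y * x[i]  # dy_jk/dw_ijk = x_i
    return 0.5 * sum(e * e for e in err)
```

Repeated calls on a single sample drive the loss toward zero, because with the antecedents fixed the problem is linear least squares in w and b.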
3. Implementation of T2FNN for Diagnosis of Diabetes
Here we consider the design of a T2FNN-based identification system for diabetes using two datasets. The system is designed using an extended version of the Pima dataset that includes 2000 data samples [41]. The original version of the Pima diabetes dataset consists of 768 patients, and the extended version includes 2000 patients. Both datasets have eight inputs. The first input is the number of pregnancies; the risk of diabetes increases in women who have had more than three pregnancies. The second input is the 2-hour plasma glucose concentration; a value above 140 mg/dL indicates a risk of diabetes. The third input is the diastolic blood pressure; if it is greater than 90 mm Hg, the risk of diabetes increases. The fourth input is the triceps skinfold thickness, which also carries information about diabetes: skinfold thickness has been observed to be greater in diabetic patients than in healthy people, averaging about 15 mm under normal conditions and generally exceeding 23 mm in diabetic patients. The fifth input is the 2-hour serum insulin value; a value greater than 166 μU/mL may be a sign of diabetes mellitus, and an increased 2-hour serum insulin value can increase the risk of type 2 diabetes. The sixth input is the body mass index; if it exceeds 30, the risk of diabetes increases. The seventh input is the diabetes pedigree function, which is generally above 0.5 in diabetic patients; the higher this value, the higher the risk of diabetes. The eighth and last input is the age of the patient. In response to these inputs, the datasets have two outputs: with or without diabetes. All inputs used in the system are indicators of diabetes.
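The thresholds quoted above can be encoded as crisp indicator flags. This is only an illustration of the crisp interval partitioning discussed in the Introduction, not part of the T2FNN; the Kaggle column names and the function name are assumptions, and age is omitted because no threshold is given for it.

```python
from typing import Dict

# Rule-of-thumb thresholds quoted in the text (column names assumed, Kaggle layout).
RISK_THRESHOLDS: Dict[str, float] = {
    "Pregnancies": 3,                  # more than three pregnancies
    "Glucose": 140,                    # 2-hour plasma glucose, mg/dL
    "BloodPressure": 90,               # diastolic, mm Hg
    "SkinThickness": 23,               # triceps skinfold, mm
    "Insulin": 166,                    # 2-hour serum insulin, uU/mL
    "BMI": 30,
    "DiabetesPedigreeFunction": 0.5,
}


def crisp_risk_flags(record: Dict[str, float]) -> Dict[str, bool]:
    """True for every attribute that strictly exceeds its threshold."""
    return {name: record[name] > limit for name, limit in RISK_THRESHOLDS.items()}
```

With these flags, glucose values of 140 and 141 fall on opposite sides of the boundary although they are clinically almost identical; such sharp boundaries are the kind of uncertainty that fuzzy membership functions are intended to soften.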
In the extended dataset, 1316 of the 2000 patients are healthy and 684 are diabetic. Table 1 presents a fragment of the statistical data taken from the extended Pima dataset, and Table 2 gives its statistical measurements: the mean, standard deviation, and maximum value of each attribute. The relationship between the input and output signals of the dataset is highly nonlinear. Feature importance scores can be used to determine the input features that most affect the output; they rank input features by how useful each one is for predicting the target variable. We used statistical correlation scores to determine the importance of the input features. Table 3 presents the correlation scores of the input features used in the paper. As shown, there are no large differences between the scores of the input features; only the last two parameters have lower scores than the others.
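The paper does not state which correlation coefficient was used. A plausible choice for a binary outcome is Pearson's r, which for a 0/1 target equals the point-biserial correlation; the sketch below computes it for each input column. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def correlation_scores(X: List[List[float]], y: Sequence[int],
                       names: Sequence[str]) -> Dict[str, float]:
    """Correlation of each input column of X with the binary outcome y."""
    return {name: pearson([row[c] for row in X], y) for c, name in enumerate(names)}
```

Features can then be ranked by the absolute value of their scores, since a strongly negative correlation is as informative as a strongly positive one.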
The T2FNN is used to design the diagnostic system. The basic design problem is finding appropriate values of the parameters of the antecedent and consequent parts of the type-2 fuzzy rules in (1): the centers c1ij, c2ij and widths σij of the membership functions, and the coefficients of the linear functions and of the output layer. In this paper, a cross-validation approach with a gradient descent algorithm is applied to adjust these parameters. The simulations were run using 10-fold cross-validation for 2000 epochs: the whole dataset is divided into 10 equal groups, nine of which are used for training and one for testing, and the test group changes from epoch to epoch. Different numbers of rules were used for the system design: 16, 32, 40, 48, 64, 80, and 100 fuzzy rules. Root mean square error (RMSE), accuracy, sensitivity, specificity, and precision were evaluated to measure system performance. The learning curves of the type-2 fuzzy TSK system with 80 and 100 rules are shown in Figures 3(a) and 3(b), respectively. Figures 4(a) and 4(b) show the type-2 membership functions of the T2FNN system before training (randomly initialized) and after training, respectively; for clarity, only four membership functions are shown. The learned membership functions describe the antecedent parts of the type-2 fuzzy rules, while the consequent parts use linear functions characterized by weight coefficients. The T2FNN with the trained values of c1, c2, σ, and the linear-function coefficients is used for the classification of diabetes in online mode. Table 4 gives the simulation results of the T2FNN system for the diagnosis of diabetes with different numbers of rules. As shown, increasing the number of fuzzy rules increases the accuracy of the system, and the best results were obtained with 100 fuzzy rules.
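The fold handling described above can be sketched as follows, reading "the test group changes from epoch to epoch" as rotating the held-out group with the epoch index (epoch mod 10). The shuffling, the seed, and the function names are assumptions; the paper does not specify how samples are assigned to groups.

```python
import random
from typing import List, Tuple


def make_folds(n_samples: int, n_folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into n_folds near-equal groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def split_for_epoch(folds: List[List[int]], epoch: int) -> Tuple[List[int], List[int]]:
    """Rotate the test group with the epoch: all other groups train, one tests."""
    test_id = epoch % len(folds)
    test = folds[test_id]
    train = [i for f, fold in enumerate(folds) if f != test_id for i in fold]
    return train, test
```

For the 2000-sample dataset this yields ten groups of 200 samples, so each epoch trains on 1800 samples and tests on the remaining 200.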
With 100 rules, the training, validation, and testing errors were 0.185, 0.219, and 0.217, respectively, and the accuracy, sensitivity, specificity, and precision were 99.75%, 100%, 99.6%, and 99.27%, respectively.
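The four classification metrics follow from the confusion matrix with diabetic as the positive class. Assuming the reported figures are pooled over all 2000 samples (684 diabetic, 1316 healthy), they are reproduced by TP = 684, FN = 0, TN = 1311, FP = 5; this confusion matrix is inferred from the reported percentages and class counts, not taken from the paper. An RMSE helper is included for the error values, although whether the paper computes RMSE over the two output signals in exactly this way is an assumption.

```python
import math
from typing import Dict, Sequence


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and precision, in percent."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # recall on the diabetic class
        "specificity": 100.0 * tn / (tn + fp),  # recall on the healthy class
        "precision": 100.0 * tp / (tp + fp),
    }


def rmse(outputs: Sequence[float], targets: Sequence[float]) -> float:
    """Root mean square error between network outputs and target values."""
    sq = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(sq) / len(sq))
```

For example, `classification_metrics(684, 0, 1311, 5)` gives an accuracy of 99.75%, a sensitivity of 100%, a specificity of about 99.62%, and a precision of about 99.27%.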

Figure 3: Learning of the type-2 fuzzy TSK system with (a) 80 rules and (b) 100 rules.

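Figure 4: Type-2 membership functions of the T2FNN (a) before training (randomly initialized) and (b) after training.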
To demonstrate the effectiveness of the designed system, the simulation results of the T2FNN system were compared with those of other systems used for the diagnosis of diabetes. Since the existing studies used the original version of the Pima dataset [42], we used the same version for a fair comparison and ran simulations with different numbers of rules, again using the T2FNN structure and cross-validation with a gradient descent algorithm. Table 5 gives the simulation results of the T2FNN system on the original Pima dataset, and Table 6 presents the comparative results of different models. Results are presented for four T2FNN configurations, with 16, 32, 80, and 100 rules. The T2FNN systems with 32, 80, and 100 rules achieve higher accuracy than the other models. These comparative results demonstrate the efficiency of the T2FNN system in the diagnosis of diabetes.
4. Conclusions
Analysis of existing studies shows that various models have been designed for the diagnosis of diabetes, with the basic aim of achieving a high accuracy rate. In this paper, a T2FNN is proposed for the diagnosis of diabetes. The type-2 fuzzy inference scheme and the neural network structure are integrated to construct the T2FNN model, which is designed using the extended Pima diabetes dataset. The T2FNN is trained using a gradient descent algorithm combined with cross-validation. Simulations with different numbers of fuzzy rules demonstrated that increasing the number of fuzzy rules increases the accuracy of the T2FNN model, and the best results were obtained with 100 type-2 fuzzy rules. On the extended Pima dataset (2000 samples), the accuracy, sensitivity, specificity, and precision of the system were 99.75%, 100%, 99.6%, and 99.3%, respectively. For comparative analysis, simulations were also carried out on the original Pima dataset (768 samples), giving an accuracy, sensitivity, specificity, and precision of 99.1%, 99.25%, 99%, and 98.1%, respectively. The performance of the T2FNN was compared with that of other systems used for the diagnosis of diabetes, and the comparative results demonstrate the efficiency of the T2FNN system in the diagnosis of diabetes.
Data Availability
The Pima data used to support the findings of this study have been deposited in Kaggle (https://www.kaggle.com/uciml/pima-indians-diabetes-database; https://www.kaggle.com/johndasilva/diabetes) and UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/).
Conflicts of Interest
The authors declare that they have no conflicts of interest.