Abstract
Parkinson’s disease (PD) is a central nervous system disorder that causes tremors and impairs mobility. Unfortunately, identifying PD at an early stage is a challenging task. According to earlier studies, around 90 percent of persons with PD have some form of vocal abnormality, and a number of voice measures are available to help identify the disease. As a result, voice measurements can be used to assess the state of affected persons. This research presents a novel method called the ensemble stacking learning algorithm (ESLA), a classifier for identifying Parkinson’s disease from a dataset collected from PD patients and healthy people. To evaluate its performance, the proposed ensemble method is compared with other existing techniques, and the comparison demonstrates the improved classification ability of the proposed method. It is shown that the proposed method produces the most reliable outcomes for PD patients and achieves the highest accuracy. The ensemble approach is implemented as different stacked models using various classifiers such as random forest (RF), XGBoost (extreme gradient boosting), logistic regression, AdaBoost (AB), and multilayer perceptron (MLP), and the level of prediction accuracy achieved by each is analyzed. Finally, the best-performing model among the chosen algorithms is recommended for the prediction of the disease.
1. Introduction
Parkinson’s disease is characterized by slower movement, tremor, and difficulty with gait and balance. The death of nerve cells in the substantia nigra, a portion of the central nervous system, is the main cause of PD. Dopamine and other neurotransmitters are secreted by cells in the substantia nigra, which communicate with other brain areas that control movement. When cells in the substantia nigra die, the secretion of dopamine stops, and movement becomes uncontrolled. The basic symptoms of Parkinson’s disease therefore originate from a disruption in the movement control areas of the brain. A study carried out by Dorsey et al. [1] using statistical data from 2016 reveals that India has roughly 0.58 million people living with Parkinson’s disease, with a considerable increase in prevalence projected in the following years.
A study by Schrag et al. [2] explored the large number of persons affected by Parkinson’s disease and concluded that underlying genetic and environmental risk factors specific to the Indian population are the main causes. Parkinson’s disease is a major healthcare issue. Since there is no specific diagnostic test for PD, detecting the disease can be difficult, and patients are diagnosed on the basis of clinical symptoms. According to studies, males are 1.5 times more likely to develop PD than females [3], necessitating the development of a gender-aware identification method for improved screening. Parkinson’s disease involves a variety of symptoms that become more noticeable as the illness progresses through its phases. Changes in the quality of speech and difficulty in speaking are two symptoms that persons with Parkinson’s disease may encounter in more severe stages [4]. Subsequently, the patient’s verbal communication becomes increasingly difficult, and others who converse with them must frequently ask them to repeat statements, as described by Skodda et al. [5].
Tremor is another important symptom of Parkinson’s disease [4]. In a study titled “Amplitude fluctuations in essential tremor” [6], Mostile et al. explained that these symptoms are frequently misinterpreted as essential tremor (ET). A tremor is an involuntary movement affecting one or more body parts. Like voice deterioration, tremor makes patients’ daily lives difficult, since the uncontrolled movement of their limbs prevents them from easily accomplishing routine tasks. Unfortunately, there is still no cure for this condition; however, medication can be used to manage the symptoms [7]. Jankovic et al. [8] noted that early identification of PD is critical, since therapies such as levodopa/carbidopa are more successful when given early in the illness. Parkinson’s disease and Parkinson’s-like symptoms are treated with a combination of levodopa and carbidopa.
Mamun et al. [9] described that carbidopa/levodopa is still the most effective treatment for Parkinson’s disease. Nonpharmacologic therapies, such as increased exercise, can also be introduced in the early stages of PD and, according to research, may help to reduce the severity of the disease. This type of early diagnosis, however, is not always practicable, especially for rural populations in developing countries where trained neurologists are scarce. Although PD is presently incurable, existing medications can greatly lessen symptoms, especially in the early stages of the illness, as reported by Singh et al. [10]. Gupte and Gadewar [11] noted that, according to prior research, roughly 90% of persons with PD show vocal problems of some kind. As a result, voice measures are used to identify and follow the evolution of symptoms associated with Parkinson’s disease. This research presents an ensemble-based approach for predicting PD based on voice measurements.
The remainder of this paper is organized as follows. Section 2 reviews the related works. Section 3 describes the materials and methods used for the prediction of Parkinson’s disease. Section 4 presents the results and discusses them in the context of the requirements of this work. Finally, Section 5 concludes the research work.
2. Related Works
Many researchers have worked on the prediction and analysis of Parkinson’s disease. Some of these research works are reviewed here and their results analyzed. Voice analysis reveals significant changes as PD progresses. As a result, voice-related characteristics play a significant role in the automated diagnosis of Parkinson’s disease, and patients and healthy people can be automatically distinguished by acoustic characteristics [12]. Several studies on speech analysis in Parkinson’s disease patients have been undertaken with the goal of detecting the disease and its severity. Little et al. [13] described one of the well-received efforts, which resulted in the creation of a database containing 23 voice characteristics; the same dataset is also discussed in [14]. This dataset has been extensively used in many research works [15–17] for PD classification, and this research also uses the same dataset.
While a definitive diagnosis of Parkinson’s disease is difficult, the viability of voice-based diagnostic systems has driven researchers to develop classification algorithms that distinguish PD patients from healthy people. In [18], the authors compared the effectiveness of multiple machine learning classifiers in detecting Parkinson’s disease in patients with dysphonia, a vocal disorder. To test the robustness of the detection method, three distinct classifier techniques were used to discriminate between PD patients and healthy people, and the results were compared. The classifiers used were random tree, feedforward back propagation, and support vector machines. Perez et al. [19] proposed data aggregation to reduce data dependency and Bayesian logistic regression to handle repeated measurements.
Benba et al. [20] extracted the mean value of the voiceprint from the frames of voice samples and classified PD patients from healthy subjects using a leave-one-subject-out validation scheme and SVMs. In [21], the authors investigated how well novel dysphonia measures can distinguish PD patients from healthy persons. Using feature selection algorithms, they selected four subsets of dysphonia measures and performed binary classification with random forests and SVMs. They found that several of the recently suggested dysphonia metrics complement existing methods and enhance the ability of classifiers to distinguish healthy persons from PD patients. Hariharan et al. [22] suggested a hybrid model for feature reduction, feature preprocessing, and classification. For classification, they combined the probabilistic neural network, the general regression neural network, and the least-squares SVM. The results indicate that using a combination of feature reduction, feature preprocessing, and classification, the Parkinson’s dataset can be classified with maximum accuracy.
Behroozi and Sami [23] proposed a multiple-classifier framework for detecting PD. In this approach, an independent classifier is used for each voice test, and a majority vote across all of the classifiers determines the final classification. Su and Chuang [24] suggested a dynamic attribute selection method based on fuzzy entropy measurements for PD classification. Linear discriminant analysis (LDA) was used to discriminate speech samples of PD patients from those of healthy persons in order to explore the influence of feature selection. The findings reveal that different speech models necessitate distinct feature selections. When they used dynamic feature selection, they obtained a higher classification accuracy than when they used all of the characteristics.
In [25], the authors proposed automated PD recognition using the optimum-path forest (OPF), a pattern recognition approach that makes no assumptions about the feature space. They found that OPF outperformed SVMs, ANNs, and other frequently used supervised classification algorithms for PD detection. Froelich et al. [26] diagnosed Parkinson’s disease based on a person’s voice. As an initial step, they categorized the voice samples into sick or healthy classes using decision trees. Second, a person’s final diagnosis was determined from the previously categorized voice samples by applying a threshold-based technique.
Furthermore, ensemble models, as opposed to single classifiers, have also been used [27]. Sheibani et al. [28] proposed an ensemble technique for recognizing sick and healthy samples using voice frequency features to predict class labels. Their idea was to combine the samples’ primary feature vectors with the anticipated class labels, and the approach achieved a classification accuracy of 90.6%. A classification method that combines the multiedit nearest neighbour (MENN) algorithm with a combined learning approach was recommended in [29]. The suggested technique improved PD classification using voice data and may be used in future studies to further enhance PD classification accuracy.
3. Materials and Methods
A few approaches are available for the analysis of PD in the information technology research literature. This section discusses the techniques used in this work, several of which are established methods adopted by most researchers. The existing methods RF, XGBoost, logistic regression, AB, and MLP are used in this work to analyze the Parkinson’s data. Motivated by previous ensemble works, this study proposes an ensemble PD classifier that combines the random forest and extreme gradient boosting (XGBoost) algorithms with logistic regression and AdaBoost (AB). Four stacking models are used to improve the prediction precision.
3.1. Description of Dataset
The dataset used in this work was created by Sakar et al. [30] and was obtained from the UCI machine learning repository. The data were collected from 188 patients with PD (107 male and 81 female), aged 33 to 87, at the Department of Neurology of the Cerrahpaşa Faculty of Medicine, Istanbul University [31–33]. The control group comprises 64 healthy people (23 men and 41 women) aged between 41 and 82. Time-frequency features, wavelet transform-based features, mel frequency cepstral coefficients (MFCCs), vocal fold features, and TQWT (tunable Q-factor wavelet transform) features have all been extracted from the voice recordings of the Parkinson’s disease (PD) patients in order to capture clinically significant information for PD evaluation. In total, the dataset consists of 756 instances with 754 attributes. Each column denotes an attribute, and columns 3 to 23 correspond to the baseline features. The last column specifies the class label, which was used as the target variable against which the accuracy value is calculated. Table 1 summarizes the various features and parameters of the dataset.
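As a minimal sketch of the data preparation assumed in this work, the following Python code loads a local copy of the UCI speech-feature file, separates the voice features from the class label in the last column, and applies the 80/20 train/test split used in the experiments of Section 4. The file name and the id/class column names are assumptions for illustration, not details given in the paper. Sample coding:

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical local copy of the Sakar et al. UCI file; the name is an assumption
df = pd.read_csv("pd_speech_features.csv")

# Assumed layout: a subject "id" column, the voice features, and a final "class" label
x = df.drop(columns=["id", "class"])
y = df["class"]                      # 1 = PD patient, 0 = healthy control

# 80% training / 20% testing split, as described in Section 4
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y)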
Initially, two base classifiers, random forest and XGBoost, are used in this research. For the random forest classifier, the main hyperparameters used in this study are max_depth, min_samples_leaf, and n_estimators. In a random forest, the longest path between the root and a leaf node defines the tree’s max_depth. Using this parameter, the depth of each tree can be restricted as required. When a tree grows to a greater depth, it performs more splits and captures more information from the data. Max_depth is therefore one of the decision tree settings that limits the tree’s growth at a macro level. For this research, the max_depth of each decision tree is 10.
Min_samples_leaf is a parameter that defines the minimum number of samples that must be present in a leaf node after a node is split. If a candidate split would leave a leaf with fewer than the specified number of samples, the split is not performed, and the parent node remains a leaf. In this study, the min_samples_leaf parameter is fixed at 5. By defining a minimum sample requirement for terminal nodes, the tree’s growth can be controlled, so increasing this hyperparameter also helps to prevent overfitting. The number of trees in the random forest is given by the n_estimators parameter. With more trees, the forest captures the data better; however, increasing the number of trees also increases the model’s time complexity. In this research, the n_estimators parameter value is fixed at 50. A configuration of the base random forest with these values is sketched below.
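The following sketch instantiates the base random forest classifier with the hyperparameter values stated above; the random_state setting is an added assumption for reproducibility, and all other settings are scikit-learn defaults. Sample coding:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=50,      # 50 trees in the forest
    max_depth=10,         # longest allowed root-to-leaf path
    min_samples_leaf=5,   # minimum number of samples in a leaf node
    random_state=42)      # assumed, for reproducibility

rf.fit(x_train, y_train)
print("RF test accuracy =", rf.score(x_test, y_test))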
For the XGBoost classifier, the important hyperparameters used are max_depth, learning_rate, n_estimators, and min_child_weight. The learning_rate parameter determines how much each tree affects the final result. Gradient boosting starts with a baseline estimate that is updated based on each tree’s output, and the magnitude of the change in the estimates is controlled by the learning rate. Lower values are often recommended because they make the model more resistant to the peculiarities of individual trees and so allow it to generalize better. However, lower values also require a larger number of trees to model all of the relationships, which is more computationally costly. For this research, the learning_rate is fixed at 0.2.
The max_depth parameter defines the depth to which a tree is allowed to grow. Limiting the maximum depth prevents the model from learning relations that are specific to a single sample, which helps to prevent overfitting. The value 5 is used for the max_depth parameter. Min_child_weight specifies the minimum weighted sum of observations required in a child node; overfitting is controlled with this parameter, since higher values do not allow the model to learn relationships that are particular to the tree’s sample. The default value for this parameter is 1, and this value is used in this study. The n_estimators parameter refers to the total number of successive trees that will be modelled. Even though gradient boosting is fairly resilient when managing a large number of trees, it may still overfit at some point. As a result, cross-validation should be used to adjust this parameter for a given learning rate. This research fixed the value of the n_estimators parameter at 100. A configuration of the base XGBoost classifier with these values is sketched below.
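The following sketch instantiates the base XGBoost classifier with the values stated above; random_state is again an added assumption, and the remaining settings are xgboost defaults. Sample coding:

from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=100,     # number of successive boosted trees
    learning_rate=0.2,    # shrinkage applied to each tree's contribution
    max_depth=5,          # maximum depth of each tree
    min_child_weight=1,   # minimum weighted sum of observations in a child
    random_state=42)      # assumed, for reproducibility

xgb.fit(x_train, y_train)
print("XGBoost test accuracy =", xgb.score(x_test, y_test))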
3.2. Ensemble Stacking Learning Algorithm (ESLA)—Proposed Method
This work proposes an ensemble stacking learning algorithm (ESLA) for class label prediction based on voice features to identify sick and healthy samples. Figure 1 shows the architecture of the proposed ensemble technique. As a first step, basic classification models are developed using the RF and XGBoost algorithms. The random forest algorithm is frequently referred to as the random forest classifier in machine learning. The approach is versatile in that it may be used for both regression and classification problems. The RF method generates numerous decision trees on bootstrap samples of the data and combines them to achieve a more stable and accurate classification model. Generally, the greater the number of trees, the more robust the resulting RF model and the higher its accuracy. In addition, missing values are handled by the random forest classifier, which preserves the accuracy for a major portion of the data. Even with a larger number of trees, the RF classifier is resistant to overfitting, and it can handle large data collections with high dimensionality.

The second base classifier used in this research is the XGBoost algorithm. XGBoost is an advanced machine learning algorithm that excels in terms of speed and accuracy. It is simple to create a model with XGBoost; however, it is harder to improve the model. When developing an XGBoost model, a variety of parameters and their values must be examined, and parameter tuning is required to fully exploit the advantages of the XGBoost model over other algorithms. The regularization features of the XGBoost algorithm reduce the overfitting of a model. When compared to the plain gradient boosting method, XGBoost uses parallel processing and is much quicker. Due to its high flexibility, users can define their own optimization objectives and evaluation criteria in XGBoost, which gives the model a whole new dimension. XGBoost also permits the user to perform cross-validation at every iteration of the boosting process, making it simple to obtain the correct number of boosting iterations in a single run.
The output of each prediction model developed in the first step is then computed and used as the input for the next stages. In the second stage, the best training and testing accuracies are achieved for both the RF and XGBoost algorithms by tuning some parameters. In the RF algorithm, the tuned parameters are max_depth, min_samples_leaf, and n_estimators; in the XGBoost algorithm, the tuned parameters are learning_rate, max_depth, and min_child_weight. After parameter tuning, 120 instances are trained with the RF classifier, and 144 instances are trained with the XGBoost classifier. Finally, the best model is selected from each of the RF and XGBoost classifiers. The accuracy improved from 84.21% to 84.86% for the RF classifier model, and the accuracy of the XGBoost classifier model improved from 88.15% to 88.85%.
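The paper does not name the tuning procedure used to obtain the best RF and XGBoost models (rf_best and xgb_best in the later sample coding); a cross-validated grid search over the parameters listed above is one plausible realization, sketched here with illustrative, assumed grids. Sample coding:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Assumed grids covering the tuned RF parameters
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100],
                "max_depth": [5, 10],
                "min_samples_leaf": [1, 5]},
    cv=5, scoring="accuracy").fit(x_train, y_train)
rf_best = rf_search.best_estimator_

# Assumed grids covering the tuned XGBoost parameters
xgb_search = GridSearchCV(
    XGBClassifier(random_state=42),
    param_grid={"learning_rate": [0.1, 0.2],
                "max_depth": [3, 5],
                "min_child_weight": [1, 3]},
    cv=5, scoring="accuracy").fit(x_train, y_train)
xgb_best = xgb_search.best_estimator_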
The concept of ensemble learning is used in the next step. It is not always enough to depend on the outcome of a single machine learning model. Ensemble learning combines the predictive abilities of numerous learners in a systematic way to improve predictive performance; the end result is a single model that combines the outputs of numerous models. The base learners, or models that make up the ensemble, may come from the same or from distinct learning algorithms. Stacking, bagging, and boosting are the three major groups of ensemble learning methods. This study uses the stacking technique to improve prediction accuracy. Stacking is an ensemble learning model that builds a new model using the predictions of numerous base models. The ensemble stacking learning algorithm (ESLA) is developed to create four stacked models. Along with the base classifiers RF and XGBoost, logistic regression, AdaBoost, and the multilayer perceptron (MLP) are ensembled to obtain the highest accuracy in PD prediction. The stacked models are explained in the next section.
4. Results and Discussion
This research work proposes four stacking models for the classification of PD on the chosen data set. These models are named Stackedmodel1, Stackedmodel2, Stackedmodel3, and Stackedmodel4. They are meta models used for prediction on the medical dataset. Out of these four models, the one that yields the best results is selected for the prediction of the disease. The architecture of these models is shown in Figure 2, and the obtained results are analyzed according to the evaluation criteria of these models.

Python software was used to apply the classification methods to the dataset in this investigation. For each classifier model, 80% of the data set was used to train the model, and 20% was used to test it. For this work, four stacked models are designed for Parkinson’s disease prediction. Stacking is an ensemble machine learning algorithm that learns how to integrate the predictions of two or more basic machine learning algorithms using a meta-learning method. A stacking model architecture includes two or more base models, also called level-0 models, and a meta model that combines the predictions of the base models, also known as the level-1 model. Figure 2 shows the architecture of the proposed stacking technique.
Table 2 shows the base and stacked models used in the proposed work. The table shows the two levels of the ensemble models, level-0 and level-1, and lists the models and ensemble algorithms used in this work.
In the proposed stacking, four models are trained on 188 samples of data. The models’ results are merged to obtain the final prediction for any instance $x_i$. For learning the weights $\beta_j$ of the level-0 predictors, stacking adds a level-1 method called the meta-learner. That is, for the level-1 learner, the prediction $\hat{y}(x_i)$ of each training instance $x_i$ is used as training data, which can be defined by the following formula:

$$\hat{y}(x_i) = \sum_{j=1}^{M} \beta_j\, h_j(x_i),$$

where $x_i$ is a training instance, $\beta_j$ are the optimal weights of the level-0 predictors, and $h_j$, $j = 1, \dots, M$, are the base models.
Although any machine learning approach can be employed, logistic regression is commonly utilized to solve the optimization problem. It is depicted using the following formula:

$$\hat{\beta} = \operatorname*{arg\,min}_{\beta} \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{M} \beta_j\, h_j^{(-i)}(x_i) \Big)^2.$$
The leave-one-out prediction obtained by training $h_j$ on the subset of $n-1$ instances with the $i$th sample left aside is denoted by $h_j^{(-i)}$. Here, $n$ refers to the total number of samples; in this case, $n = 188$. The base models $h_j$ are then retrained on the whole dataset and used to assess previously unseen examples, and the appropriate weights are calculated.
Stackedmodel1 is a hybrid model that stacks the best models of both the RF and XGBoost algorithms with logistic regression, and Stackedmodel2 is a combination of the RF, XGBoost, and AdaBoost classifiers. Stackedmodel3 is an ensemble of Stackedmodel1 with the multilayer perceptron (MLP), while Stackedmodel4 is a combination of Stackedmodel2 with the MLP. The primary motivation for employing this ensemble strategy was to reduce the error rate: compared to a single contributing model, an ensemble can generate better predictions and produce better results. The four stacked models provide various training and testing accuracies, and among them, Stackedmodel3 gives the best prediction accuracy of 90%. A sample from the coding is given as follows (the imports and the MLP settings shown are additions assumed to make the listing self-contained).

Sample coding:

# Stacked model 3: best RF and XGBoost models plus an MLP as level-0 learners,
# combined by a logistic regression meta-learner
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(max_iter=1000, random_state=42)   # assumed settings

estimator = [('rf', rf_best), ('xgb', xgb_best), ('mlp', mlp)]
stack_model = StackingClassifier(estimators=estimator,
                                 final_estimator=LogisticRegression())
stack_model_3 = stack_model.fit(x_train, y_train)
print("Training accuracy = ", accuracy_score(y_train, stack_model_3.predict(x_train)))
print("Test accuracy = ", accuracy_score(y_test, stack_model_3.predict(x_test)))

Training accuracy = 1.0
Test accuracy = 0.9013157894736842
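The other three stacked models can be assembled in the same way. The following is a hedged sketch based on the model compositions stated above, under the assumptions that AdaBoost joins the level-0 estimators with scikit-learn default settings and that logistic regression remains the level-1 meta-learner throughout; it is not code taken from the paper. Sample coding:

from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

ada = AdaBoostClassifier(random_state=42)   # assumed settings

# Stackedmodel1: best RF + best XGBoost, logistic regression as meta-learner
stack_model_1 = StackingClassifier(
    estimators=[('rf', rf_best), ('xgb', xgb_best)],
    final_estimator=LogisticRegression()).fit(x_train, y_train)

# Stackedmodel2: best RF + best XGBoost + AdaBoost
stack_model_2 = StackingClassifier(
    estimators=[('rf', rf_best), ('xgb', xgb_best), ('ada', ada)],
    final_estimator=LogisticRegression()).fit(x_train, y_train)

# Stackedmodel4: Stackedmodel2 components plus the MLP
stack_model_4 = StackingClassifier(
    estimators=[('rf', rf_best), ('xgb', xgb_best), ('ada', ada), ('mlp', mlp)],
    final_estimator=LogisticRegression()).fit(x_train, y_train)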
Logistic regression is a statistical strategy that uses past data to predict the outcome of a categorical dependent variable. It is a typical approach for tackling binary classification problems and is one form of regression analysis. Boosting is an ensemble approach that aims to construct a strong classifier from a collection of weak classifiers. This is accomplished by first building a model from the training data and then attempting to correct the errors of the first model with a second model. AdaBoost is a machine learning method that may be used to improve the performance of any other machine learning technique. The multilayer perceptron (MLP) is a category of feed-forward neural network. It has three layers: an input layer, a hidden layer, and an output layer. The input signals are received by the input layer, the hidden layer carries out the inner processing of the network, and the output layer performs the classification task.
4.1. Performance Evaluation
The performance of the algorithms is analyzed after executing the source code written in the Python programming language. The metrics used to evaluate the approaches applied in this research work are explained as follows. Performance evaluation is critical in classification to properly justify the accuracy of the study’s findings. Many performance evaluation approaches have been used in classification for a long time and have become standard evaluation metrics in related areas. Accuracy, sensitivity, and specificity are the metrics used, and the formulae are listed as follows. Precision is the proportion of predicted positive cases that are actually positive:

$$\text{Precision} = \frac{TP}{TP + FP},$$

where $TP$ is the number of true positives and $FP$ the number of false positives.
Recall, also called sensitivity or the true positive rate (TPR), is the proportion of positive cases that were accurately recognized, as determined using the following equation:

$$\text{Recall} = \frac{TP}{TP + FN},$$

where $FN$ is the number of false negatives.
Accuracy is the proportion of the total number of predictions that were correct. It is determined using the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$

where $TN$ is the number of true negatives.
Sensitivity is the percentage of positive records that are correctly classified out of all positive records.
Specificity is the percentage of negative records that are correctly classified out of all negative records.
The F-measure computes the harmonic mean of the information retrieval precision and recall metrics.
True positive (TP) and true negative (TN) describe accurate classifications, whereas false positive (FP) and false negative (FN) describe inaccurate classifications. TP refers to the number of PD cases correctly identified, while TN indicates the number of cases correctly identified as healthy. FP refers to the number of healthy subjects incorrectly identified as PD patients, and FN indicates the number of PD cases incorrectly identified as healthy. The capacity of a test to accurately distinguish between sick and healthy instances determines its accuracy. A test’s sensitivity refers to its capacity to correctly identify patient instances, and its specificity refers to its capacity to correctly identify healthy instances.
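As a small illustrative sketch, these metrics can be computed for any of the fitted models from the confusion matrix of its test-set predictions; the helper function below is an assumption introduced for illustration, not part of the paper’s code. Sample coding:

from sklearn.metrics import confusion_matrix

def report_metrics(model, x_test, y_test):
    # Confusion matrix with the healthy class (0) as negative and PD (1) as positive
    tn, fp, fn, tp = confusion_matrix(y_test, model.predict(x_test)).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)        # recall / true positive rate
    specificity = tn / (tn + fp)        # true negative rate
    return accuracy, sensitivity, specificity

print(report_metrics(stack_model_3, x_test, y_test))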
From all of these base classifiers and stacked models, the accuracy of prediction is assessed. Table 3 shows the metric values of the basic classifiers and stacked models, where the metrics are the training accuracy, testing accuracy, sensitivity, and specificity of the RF model, the XGBoost model, Stackedmodel1, Stackedmodel2, Stackedmodel3, and Stackedmodel4. Among all of these base classifiers and stacked models, the proposed Stackedmodel3 (ensemble stacking learning algorithm) has very high training accuracy, testing accuracy, sensitivity, and specificity compared with the other base classifier models and stacked models. The graphical representation of Table 3 is shown in Figure 3, illustrating the metrics of the proposed model.

According to the obtained results, the ESLA algorithm outperformed the other single classifiers, achieving 90.13% classification accuracy, 92.61% sensitivity, and 91.24% specificity. By using an ensemble-based strategy, the suggested method was thus able to attain acceptable results. In addition, this work is compared with existing ensemble-based approaches; the comparison is shown in Table 4, and its graphical representation is shown in Figure 5. The training time for the random forest is 2 to 5 minutes and for XGBoost 10 to 15 minutes. Although the stacked model needs more steps than an individual model, the training time of the stacked models is 5 to 10 minutes; since the stacked model is built upon the already trained basic classifiers, random forest and XGBoost, its additional cost is small. However, it cannot be claimed that this model alone yields better results for disease prediction on medical data; it is one of the methods normally utilized for classification and prediction. Different results have been obtained from different perspectives by different authors, whose implementations also vary. This approach, however, takes classification methods and combines them with a regression-based meta-learner to achieve high prediction accuracy.
Figure 4 shows the accuracy on the training data set for the chosen models, from which it can be seen that 100% training accuracy is achieved by XGBoost, Stackedmodel1, and Stackedmodel3, while the lowest training accuracy is obtained by the RF model.

To evaluate the performance, the proposed ensemble method is compared with other existing methods, and the comparison shows that the proposed method performs better. The results obtained from executing these methods at various times, as given in the tables and figures, show that the accuracy is highest for the proposed ESLA method. It is also noted from the reported tables and figures that the ESLA method obtains an accuracy of 90.13%, while the other methods are less accurate. From Table 4 and Figure 5, it is observed that the proposed ESLA algorithm yields better results than all of the other existing methods across the performance metrics.

5. Conclusions
Nowadays, medical data analysis offers many approaches and uses a number of different methods for the diagnosis of diseases, and most available approaches combine multiple methods. The method proposed in this research work addresses the prediction and classification of Parkinson’s disease from a publicly available data set, using voice-based parameters alone for the prediction of the disease. While voice-based Parkinson’s disease categorization has been proven to be effective, current approaches are limited in their capacity to evaluate speech samples, which is essential for enhancing Parkinson’s disease classification. Due to the difficulty of medical diagnosis and the prevalence of Parkinson’s disease, it is critical to propose a simple and affordable solution for its accurate and fast identification. Comparing the voice frequency characteristics of persons under controlled settings is an efficient technique to diagnose Parkinson’s disease. This research work utilized a novel ensemble stacking learning algorithm (ESLA), a classifier for the identification of Parkinson’s disease. The obtained results were compared with those of existing methods such as RF, XGBoost, logistic regression, AB, and MLP. The proposed ESLA method yields the best results when compared to the existing methods in terms of accuracy and the other metrics. The same method may be extended to identify other diseases and combined with other existing methods in the future.
Data Availability
The data taken for this work are from the publicly available data set.
Conflicts of Interest
The authors declare that they have no conflicts of interest.