Research Article

A Probabilistic Approach for Missing Data Imputation

Table 1

Literature review of ML algorithms and highlights.

| ML category | Method | Highlights |
|---|---|---|
| Clustering | Best fit missing value imputation (BFMVI) | Applied to IoT datasets. The paper compares BFMVI with existing algorithms and shows that BFMVI outperforms them in accuracy and efficiency [7] |
| | Cluster-directed framework for neighbor-based imputation (CFNI) | The authors propose a new framework that uses data clustering to guide the identification of nearest neighbors, yielding more precise imputed values [8] |
| | C-means | Imputes missing values from similar entries in the complete dataset [9]; also applied to distributed datasets [10] |
| | K-means | Used by Patil et al. to impute missing values [11] |
| Deep learning | Deep neural network | Fits the data closely and can accurately predict new data points [12] |
| | Long short-term memory (LSTM) | Performs well on missing values in time series [13] |
| Ensemble | AdaBoost | In [14], the authors showed that the method is resilient to missing data and can identify hemodynamic instability in ICU patients early |
| | eXtreme gradient boosting (XGBoost) | Employs feature selection and achieves superior accuracy [15, 16] |
| | Random forest | In [17], the authors used random forest to estimate categories for similarity measurement when imputing missing data |
| Neural network | Multilayer perceptron (MLP) | In [18], the authors reported good results on categorical variables using an MLP |
| Instance based | k-nearest neighbors (kNN) | Pan et al. [19] took feature relevance into account, measuring it with their modified kNN |
| | Support vector machine (SVM) | Used to impute missing data for an activity-based transportation model [20] |