Complexity

Research Article

A Probabilistic Approach for Missing Data Imputation

Table 1

Literature review of ML algorithms and highlights.


ML category	Method	Highlights

Clustering	Best fit missing value imputation (BFMVI)	Applied to IoT datasets. The paper compares BFMVI with existing algorithms and reports that it outperforms them in accuracy and efficiency [7]
	Cluster-directed framework for neighbor-based imputation (CFNI)	The authors propose a new framework, CFNI, which uses data clustering alone to guide the identification of nearest neighbors and thereby obtain a more precise imputed value [8]
	C-means	Imputes missing values from similar entries in the complete dataset [9]. Also applied to distributed datasets [10]
	K-means	Patil et al. used it to impute missing values [11]

Deep learning	Deep neural network	Fits the data closely and can accurately predict new data points [12]
	Long short-term memory (LSTM)	Demonstrates good performance on missing values in time series [13]

Ensemble	AdaBoost	In [14], the authors showed that the method is sufficiently resilient to missing data to identify hemodynamic instability in ICU patients early
	eXtreme gradient boosting	Employs feature selection and achieves superior accuracy [15, 16]
	Random forest	In [17], the authors used random forest to estimate categories for similarity measurement when imputing missing data

Neural network	Multilayer perceptron (MLP)	In [18], the authors reported good results for categorical variables using MLP

Instance based	k-nearest neighbors (kNN)	Pan et al. [19] accounted for feature relevance, as measured by their modified kNN
	Support vector machine (SVM)	Used to impute missing data for an activity-based transportation model [20]
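
To make the instance-based row of Table 1 concrete, the following is a minimal sketch of plain kNN imputation. It is not the modified kNN of Pan et al. [19] and does not weight features by relevance. The function name `knn_impute`, the use of `None` to mark missing entries, and the distance (root mean squared difference over the features both rows observe) are assumptions made for illustration only.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper for illustration. Distance is the root mean squared
    difference over features observed in both rows, so rows that share
    different numbers of features remain comparable.
    """
    imputed = [list(row) for row in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A donor row must be a different row that observes feature j.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:  # leave the entry as None if no donor exists
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 9.0],
    [1.0, 2.0, None],  # third feature is missing
]
filled = knn_impute(data, k=2)
print(filled[3])
```

In this toy example the last row is closest to the first two rows, so its missing entry is filled with the average of their third feature rather than being pulled toward the distant third row.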