Research Article

COVID-19 Infodemic in Malaysia: Conceptualizing Fake News for Detection

Table 3

Summarized procedures for data preparation.

Index | Procedure
1 | The columns required for further analysis are extracted
2 | The extracted data are checked for null or missing values
3 | Columns “Tajuk” (title) and “Keterangan” (description) are combined for further analysis
4 | The news distribution is plotted by news category
5 | A “label” column is added, and each news item is labelled “real” or “fake”
6 | The class balance of the data is checked
7 | The data are preprocessed using natural language processing (NLP) techniques
8 | Frequency distributions of the 30 most common word tokens and word clouds are generated for COVID-19-related fake news
9 | Features are extracted using term frequency-inverse document frequency (TF-IDF) with bigrams
10 | The data are split into 70% training data and 30% test data
11 | The SMOTE oversampling technique is applied to address the data imbalance problem
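
Steps 9 to 11 of Table 3 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the authors' implementation, and the table does not state which software was used. The function names (ngrams, tfidf_matrix, stratified_split, smote) and the toy corpus are hypothetical. The sketch also assumes unigrams plus bigrams, a smoothed IDF with L2 normalization, a stratified 70/30 split, two classes, and SMOTE applied to the training portion only with k = 2 neighbours. In practice, library implementations such as TfidfVectorizer in scikit-learn and SMOTE in imbalanced-learn would normally be preferred.

```python
import math
import random
from collections import Counter


def ngrams(text):
    """Return unigrams plus bigrams of a lowercased, whitespace-split text."""
    words = text.lower().split()
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def tfidf_matrix(docs):
    """Build L2-normalized TF-IDF rows using a smoothed IDF (an assumption)."""
    tokenized = [ngrams(d) for d in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for term, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            row[vocab[term]] = count * idf
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return rows, vocab


def stratified_split(X, y, train_frac=0.7, seed=0):
    """Split each class separately so both classes reach train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]

    def pick(indices):
        return [X[i] for i in indices], [y[i] for i in indices]

    return pick(train_idx), pick(test_idx)


def smote(X, y, minority, k=2, seed=0):
    """Add synthetic minority rows until both classes have equal counts.

    Each synthetic row lies on the line segment between a random minority
    row and one of its k nearest minority neighbours (binary labels assumed).
    """
    rng = random.Random(seed)
    minority_rows = [x for x, lab in zip(X, y) if lab == minority]
    n_new = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    for _ in range(max(0, n_new)):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
        neighbour = rng.choice(neighbours) if neighbours else base
        gap = rng.random()
        X_out.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


# Invented toy corpus (NOT data from the study): 7 "real" and 3 "fake" items.
texts = [
    "health ministry reports daily case numbers",
    "ministry announces new vaccination centre",
    "state government extends clinic opening hours",
    "hospital adds beds for covid patients",
    "ministry publishes travel guidelines",
    "schools reopen with safety guidelines",
    "vaccination centre opens in the city",
    "drinking hot water cures the virus",
    "garlic cures the virus overnight",
    "viral message claims secret lockdown tonight",
]
labels = ["real"] * 7 + ["fake"] * 3

X, vocab = tfidf_matrix(texts)                                   # step 9
(X_train, y_train), (X_test, y_test) = stratified_split(X, labels)  # step 10
X_bal, y_bal = smote(X_train, y_train, minority="fake")          # step 11
print(Counter(y_train), Counter(y_test), Counter(y_bal))
```

Here the features are computed before the split to mirror the order of the steps in the table. Fitting the TF-IDF vocabulary and weights on the training portion only is a common alternative that keeps test-set information out of the features.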