Research Article
COVID-19 Infodemic in Malaysia: Conceptualizing Fake News for Detection
Table 3
Summarized procedures for data preparation.
| Index | Procedure |
|---|---|
| 1 | The columns required for further analysis are extracted |
| 2 | The extracted data are examined for null or missing values |
| 3 | Columns “Tajuk” (title) and “Keterangan” (description) are combined for further analysis |
| 4 | The news distribution is plotted by news category |
| 5 | A “label” column is added, and each news item is labelled “real” or “fake” |
| 6 | The class balance of the data is checked |
| 7 | The data are preprocessed using natural language processing (NLP) techniques |
| 8 | The frequency distribution of the 30 most common word tokens and word clouds for COVID-19-related fake news are generated |
| 9 | Features are extracted using term frequency-inverse document frequency (TF-IDF) with bigrams |
| 10 | The data are split into 70% training data and 30% test data |
| 11 | The SMOTE oversampling technique is applied to address the data imbalance problem |
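Procedures 9 to 11 can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the table does not say which software was used, and in practice these steps are usually done with library tools such as scikit-learn's `TfidfVectorizer` and imbalanced-learn's `SMOTE`. All function names, the smoothed IDF formula, the choice to keep unigrams alongside bigrams, the neighbour count `k=5`, and the toy documents are assumptions made for illustration. The order follows the table: features are extracted first, the data are then split 70/30, and oversampling is applied to the training portion only.

```python
import math
import random
from collections import Counter


def word_bigram_tokens(text):
    """Return lowercase unigrams followed by word bigrams (assumed n-gram range 1-2)."""
    words = text.lower().split()
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def tfidf_fit_transform(docs):
    """Return (vocabulary, L2-normalised dense TF-IDF rows) using a smoothed IDF."""
    tokenized = [word_bigram_tokens(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        row = [0.0] * len(vocab)
        for token, count in Counter(toks).items():
            row[index[token]] = count * idf[token]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


def train_test_split_70_30(X, y, seed=0):
    """Shuffle indices and split into 70% training and 30% test data."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 7 // 10
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def smote(X, y, minority, k=5, seed=0):
    """Add synthetic minority rows until both classes are equal in size.

    Assumes binary labels. Each synthetic row is a random point on the
    segment between a minority row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_out, y_out = list(X), list(y)
    if len(minority_rows) < 2 or n_needed <= 0:
        return X_out, y_out  # nothing to interpolate or already balanced
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority)
    return X_out, y_out


if __name__ == "__main__":
    # Toy placeholder documents, not taken from the study's dataset.
    docs = ["vaccine schedule announced", "clinic hours updated",
            "ministry issues travel advice", "new testing centre opens",
            "school reopening dates confirmed", "mask guidance revised",
            "hospital capacity report released", "garlic water cures virus",
            "hot weather kills virus", "secret cure hidden by doctors"]
    labels = ["real"] * 7 + ["fake"] * 3
    vocab, X = tfidf_fit_transform(docs)
    X_train, y_train, X_test, y_test = train_test_split_70_30(X, labels)
    X_bal, y_bal = smote(X_train, y_train, minority="fake")
    print(len(vocab), Counter(y_train), Counter(y_bal), len(X_test))
```

Oversampling only the training portion keeps synthetic rows out of the test data, so evaluation still reflects the original class distribution.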