Research Article
COVID-19 Infodemic in Malaysia: Conceptualizing Fake News for Detection
Table 3
Summarized procedures for data preparation.
| Index | Procedure |
| --- | --- |
| 1 | The columns required for further analysis are extracted |
| 2 | The extracted data are examined for null or missing values |
| 3 | Columns “Tajuk” (title) and “Keterangan” (description) are combined for further analysis |
| 4 | The news distribution is plotted by news category |
| 5 | Column “label” is added, and each news item is labelled as “real” or “fake” |
| 6 | The balance of the data is checked |
| 7 | The data are preprocessed using natural language processing (NLP) |
| 8 | The frequency distribution of the 30 most common word tokens and word clouds for COVID-19-related fake news are generated |
| 9 | Features are extracted using term frequency-inverse document frequency (TF-IDF) with bigrams |
| 10 | The data are split into 70% training data and 30% test data |
| 11 | The SMOTE oversampling technique is employed to address the data imbalance problem |
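Procedures 9 to 11 can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation, and the table does not state which software was used. Several details are assumptions made for illustration: the toy documents are invented, "TF-IDF with bigram" is read as unigrams plus bigrams, the IDF is smoothed, the 70/30 split is stratified by class, and SMOTE is reduced to its core idea (interpolating between a minority sample and one of its nearest minority neighbours) and applied to the training portion only. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def terms(text):
    """Assumption: features are unigrams plus bigrams of whitespace tokens."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]

def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF rows) using a smoothed IDF."""
    doc_terms = [terms(d) for d in docs]
    df = Counter(t for ts in doc_terms for t in set(ts))  # document frequency
    vocab = sorted(df)
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for ts in doc_terms:
        counts = Counter(ts)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

def stratified_split(rows, labels, train_frac=0.7, seed=0):
    """Split each class separately so both parts keep the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    pick = lambda src, idx: [src[i] for i in idx]
    return (pick(rows, train_idx), pick(labels, train_idx),
            pick(rows, test_idx), pick(labels, test_idx))

def smote(X, y, k=2, seed=0):
    """Simplified SMOTE for two classes: add synthetic minority rows
    until the minority class is as large as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    X_new, y_new = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        neighbours = sorted((r for r in minority_rows if r is not base),
                            key=lambda r: math.dist(base, r))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment between base and neighbour
        X_new.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

# Invented toy corpus: 7 "real" and 3 "fake" items (imbalanced on purpose).
docs = [
    "ministry confirms new vaccination centre", "health ministry reports daily cases",
    "schools reopen under new guidelines", "hospital adds more intensive care beds",
    "government extends movement control order", "vaccine shipment arrives on schedule",
    "ministry publishes travel guidelines", "garlic water cures the virus",
    "lockdown starts tonight says viral message", "virus spreads through phone signal",
]
labels = ["real"] * 7 + ["fake"] * 3

vocab, X = tfidf(docs)                                        # procedure 9
X_train, y_train, X_test, y_test = stratified_split(X, labels)  # procedure 10
X_bal, y_bal = smote(X_train, y_train)                        # procedure 11
print(Counter(y_train), Counter(y_bal))
```

On this toy corpus the training portion holds five "real" and two "fake" items before oversampling and five of each afterwards, while the test portion is left untouched. In practice a maintained library implementation would normally be preferred over this hand-written version.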