Assessing the Influence Level of Food Safety Public Opinion with Unbalanced Samples Using Ensemble Machine Learning
Table 1. Model settings.
| Model | Settings |
|---|---|
| NB | Uniform prior probability over classes; other parameters follow the defaults of the sklearn MultinomialNB model. |
| SVM | Parameters follow the defaults of the sklearn LinearSVC model. |
| XGBoost | Early stopping rounds = 10; eval_metric = "logloss"; other parameters follow the defaults of the Python package XGBoost. |
| FastText | Minimum number of word occurrences = 2; other parameters follow the defaults of the Python package FastText. |
| TextCNN | Keras implementation of a CNN similar to TextCNN [11]: a dropout layer (rate = 0.2) after the embedding layer; a 1D convolutional layer with 250 filters (kernel length = 3); a 3-max pooling layer; a flatten layer; a 50-unit dense layer; and a 3-unit softmax output layer. The convolutional and dense layers use ReLU activation. Input length = 1000, batch size = 256, epochs = 5. |
| LSTM | Keras implementation of LSTM: the embedding layer feeds an LSTM layer with 200 units, with a dropout rate of 0.2 applied to both the inputs and the recurrent state; this is followed by a dropout layer (rate = 0.2), a 64-unit dense layer (ReLU activation), and a 3-unit softmax output layer. Input length = 1000, batch size = 128, epochs = 5, Adam optimizer, learning rate = 0.01. |
| BERT | Chinese pretrained model with L = 12 layers, hidden size H = 768, and A = 12 attention heads; batch size = 32, epochs = 5, learning rate = 2e-5, input length = 128. |
| KNN | Parameters follow the defaults of the k-nearest-neighbors classifier in sklearn.neighbors. |
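The NB row in Table 1 fixes a uniform class prior, which is relevant for unbalanced samples: the prior then no longer favors the majority class, and only the word likelihoods decide. The following is a minimal pure-Python sketch of multinomial naive Bayes under that setting, assuming additive smoothing with alpha = 1 (the sklearn MultinomialNB default). In sklearn, the same prior is obtained with MultinomialNB(fit_prior=False). The function names, documents, and labels are hypothetical.

```python
import math
from collections import Counter

def train_multinomial_nb(docs, labels, alpha=1.0, uniform_prior=True):
    """Fit multinomial naive Bayes on tokenized documents.

    uniform_prior=True gives every class the same prior 1/K;
    uniform_prior=False uses the empirical class frequencies.
    alpha is the additive (Laplace) smoothing constant.
    """
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc}
    counts = {c: Counter() for c in classes}
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    log_prior = {
        c: -math.log(len(classes)) if uniform_prior
        else math.log(labels.count(c) / len(labels))
        for c in classes
    }
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[c][w] + alpha) / denom) for w in vocab}
    return classes, log_prior, log_lik

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are skipped."""
    classes, log_prior, log_lik = model
    def log_posterior(c):
        return log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
    return max(classes, key=log_posterior)

# Toy unbalanced data: nine "low" documents, one "high" document.
docs = [["a"]] * 9 + [["b"]]
labels = ["low"] * 9 + ["high"]
print(predict_nb(train_multinomial_nb(docs, labels), ["a", "b"]))                       # high
print(predict_nb(train_multinomial_nb(docs, labels, uniform_prior=False), ["a", "b"]))  # low
```

On this toy data, the uniform prior lets the minority class win on likelihood. With frequency-based priors, the same document is assigned to the majority class.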
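The XGBoost row uses early stopping with 10 rounds: boosting halts once the validation loss has failed to improve for 10 consecutive rounds. The following sketch simulates that patience rule on a sequence of per-round validation losses, independently of XGBoost. In the XGBoost package itself, the rule is enabled through the early_stopping_rounds parameter together with a validation set. The function name and the loss curve are hypothetical.

```python
def early_stopping(val_losses, patience=10):
    """Simulate patience-based early stopping on per-round validation losses.

    Returns (best_round, stop_round), both 0-indexed. Training halts at the
    first round that is `patience` rounds past the last improvement; the
    model from best_round is the one kept.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            return best_round, i
    return best_round, len(val_losses) - 1  # budget exhausted without triggering

# Hypothetical logloss curve: improves for five rounds, then plateaus.
losses = [1.0, 0.9, 0.8, 0.7, 0.6] + [0.65] * 20
print(early_stopping(losses))  # (4, 14)
```

A loss equal to the current best does not count as an improvement here. Newer XGBoost releases also expose a tolerance for this comparison, which the sketch omits.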
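The FastText row raises the minimum word count to 2. When fastText builds its dictionary, it drops any word seen fewer times than this threshold, so words that occur only once in the training corpus get no vector of their own. With the official fasttext Python package, the setting would be passed as fasttext.train_supervised(input=..., minCount=2), where the supervised default is 1. The following pure-Python sketch shows only the vocabulary filter. The function name and the segmented posts are hypothetical.

```python
from collections import Counter

def build_vocab(tokenized_docs, min_count=2):
    """Return the words that occur at least min_count times in the corpus."""
    freq = Counter(w for doc in tokenized_docs for w in doc)
    return {w for w, n in freq.items() if n >= min_count}

# Hypothetical segmented posts (Chinese text already split into words).
posts = [["食品", "安全", "问题"], ["食品", "安全"], ["谣言"]]
print(sorted(build_vocab(posts)))  # ['安全', '食品']
```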
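The phrase "3-max pooling" in the TextCNN row admits two readings, which feed very different vector sizes into the 50-unit dense layer. Reading A is window-3 max pooling, as in Keras MaxPooling1D(3). Reading B is k-max pooling with k = 3. The arithmetic below assumes Keras defaults for both layers: 'valid' padding and stride 1 for the convolution, and a pooling stride equal to the pool size. Table 1 does not say which reading the authors used, so both are shown. All function names are hypothetical.

```python
def conv1d_len(n, kernel):
    """Output length of a 'valid' (unpadded), stride-1 1D convolution."""
    return n - kernel + 1

def maxpool1d_len(n, pool):
    """Output length of 'valid' max pooling whose stride equals the pool size."""
    return (n - pool) // pool + 1

def flatten_size_pooling(input_len=1000, kernel=3, filters=250, pool=3):
    """Reading A: window-3 max pooling over the convolution output."""
    return maxpool1d_len(conv1d_len(input_len, kernel), pool) * filters

def flatten_size_kmax(k=3, filters=250):
    """Reading B: k-max pooling keeps the top k activations per filter."""
    return k * filters

print(conv1d_len(1000, 3))      # 998
print(flatten_size_pooling())   # 83000
print(flatten_size_kmax())      # 750
```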
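Suppose the sklearn.neighbors model in the KNN row is KNeighborsClassifier. Its defaults are then n_neighbors=5, weights='uniform', and the Minkowski metric with p=2, which is Euclidean distance. The following pure-Python sketch implements that default rule: an unweighted majority vote among the 5 nearest training points, with no reweighting by class frequency. The feature vectors and labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=5):
    """Predict by uniform majority vote among the k nearest points (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with an unbalanced label split.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # a
print(knn_predict(train_X, train_y, (5, 5)))      # b
```

In the second query, two of the five neighbors belong to the majority class "a" even though the point lies inside the "b" cluster. This illustrates how an unweighted vote can drift toward the more frequent class.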