A Novel Sentiment Analysis Model of Museum User Experience Evaluation Data Based on Unbalanced Data Analysis Technology
Algorithm 2
An oversampling method.
Input: SV area sample set Xsv, denoised MNSV area sample set XNSV-maj, denoised FNSV area sample set XNSV-min, test data D;
Output: training sample set X’tr, real support vector sample set X’sv.
Step 1: Initialize the sample set: let X4 = Xsv.
Step 2: Find all the minority-class samples in the sample set Xsv and divide them into correctly classified samples X1 and misclassified samples X2.
Step 3: The first δ samples in X1 closest to the decision boundary, together with the samples in X2, form a new sample set Xj; for each sample in Xj, use the SMOTE algorithm to oversample within the same class, generating X3, and add X3 to X4.
Step 4: Combine X4, XNSV-maj, and XNSV-min into a new training set X’tr, train the SVM classifier on this set, and find the corresponding support vector set SSV.
Step 5: Input the test data D, obtain the classification accuracy ACC1 on the minority class and ACC2 on the majority class, and calculate the corresponding G-mean.
Step 6: Compare ACC1 with ACC2: if ACC1 is greater than or equal to ACC2, terminate; otherwise, continue with the following steps.
Step 7: Repeat Steps 2 to 6 until ACC1 is greater than or equal to ACC2, and select the training set X’tr corresponding to the largest G-mean obtained during the iterations; this X’tr is the required optimal training data, and its corresponding support vector sample set X’sv is the closest support vector set.
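Two pieces of the procedure above can be made concrete in code: the within-class SMOTE interpolation used in Step 3, and the G-mean computed from ACC1 and ACC2 in Step 5 (G-mean = √(ACC1 × ACC2)). The sketch below is illustrative only; the function names, the neighbour count k, and the use of Euclidean distance are assumptions, not taken from the paper.

```python
import math
import random

def smote_oversample(samples, k=2, n_new=4, rng=None):
    """Minimal SMOTE sketch (Step 3): each synthetic point is a random
    interpolation between a minority-class sample and one of its k
    nearest same-class neighbours. Parameter names are hypothetical."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(samples)
        # k nearest same-class neighbours of x by squared Euclidean distance
        neighbours = sorted(
            (s for s in samples if s is not x),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def g_mean(acc1, acc2):
    """G-mean of minority accuracy ACC1 and majority accuracy ACC2 (Step 5)."""
    return math.sqrt(acc1 * acc2)

# Toy minority-class samples in 2-D
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_oversample(minority, k=2, n_new=4)
print(len(new_points))            # → 4
print(round(g_mean(0.9, 0.8), 4))  # → 0.8485
```

Because each synthetic point lies on the segment between two existing minority samples, oversampling stays inside the minority-class region near the boundary, which is what Step 3 relies on when it enlarges X4.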