Research Article

Improved Pre-miRNA Classification by Reducing the Effect of Class Imbalance

Algorithm 1

Algorithm for classifying real/pseudo pre-miRNAs based on cost-sensitive ensemble learning.
Input: a dataset, V, containing all the positive and negative samples, in which the negative samples outnumber the positive samples.
Output: an ensemble classifier that integrates multiple classification instances.
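(1) For each sample in V
(2)  If the sample is a positive sample
(3)   Set the weight of the sample to its initial value for positive samples
(4)  else
(5)   Set the weight of the sample to its initial value for negative samples
(6)  End If
(7) End For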
(8) t records the current iteration number and is initialized to 1
(9) While (t does not exceed the maximum number of iterations)
(10)  The negative training set is emptied (set to Null)
(11) If t equals 1
(12)  All negative samples are grouped into k clusters by the k-means method. Set P is composed of all the
   positive samples, and the parameter k is set to the number of samples in P
(13)  For each cluster, the sample closest to the cluster center is selected and added to the negative training set,
   so that the number of selected negative samples equals the number of positive samples
(14) else
(15)  Based on the weights of the negative samples, k negative samples are selected in proportion to their weights.
   These samples are added to the negative training set
(16) End If
(17) The training dataset is composed of P and the negative training set. A new SVM-based classification instance is
  constructed from the training dataset, taking the weight distribution of its samples into account
(18) The new classification instance is used to classify all the samples in V; its classification performance is
  evaluated and its classification error rate is computed
(19) The adjustment weight is calculated, and the weight of each positive and negative sample
  is updated by using the rule
(20) t = t + 1
(21) End While
(22) An integrated classifier is constructed by combining the classification instances through a voting mechanism,
  which yields the final classification result.
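
As a rough illustration of the sampling and voting steps in Algorithm 1, the following Python sketch covers steps (13), (15), and (22) only. It is not the authors' implementation. All function names are hypothetical. The sketch assumes that k-means has already been run elsewhere (cluster labels and centers are passed in), that weighted selection is done without replacement, and that voting is an unweighted majority over +1/-1 labels with ties resolved to +1. The SVM training and the weight update rule are deliberately left out.

```python
import random


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def closest_to_centers(samples, labels, centers):
    """Step (13): for each cluster, pick the member nearest to its center.

    `labels[i]` is the cluster index of `samples[i]`; both `labels` and
    `centers` are assumed to come from a k-means run done elsewhere.
    """
    picked = []
    for c, center in enumerate(centers):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes no sample
        picked.append(min(members, key=lambda i: sq_dist(samples[i], center)))
    return picked


def weighted_pick(weights, k, rng):
    """Step (15): draw up to k distinct indices, favoring larger weights.

    Assumption: selection is without replacement. Each index gets the key
    u ** (1 / w) with u uniform in [0, 1); the k largest keys are kept
    (Efraimidis-Spirakis weighted sampling). Zero-weight samples are skipped.
    """
    keys = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


def majority_vote(votes):
    """Step (22): combine +1/-1 votes; ties go to +1 (an assumption)."""
    return 1 if sum(votes) >= 0 else -1


# Toy usage with made-up 2-D negative samples and two clusters.
negatives = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8)]
labels = [0, 0, 1, 1]
centers = [(0.05, 0.0), (5.0, 4.9)]
first_round = closest_to_centers(negatives, labels, centers)

rng = random.Random(0)
later_round = weighted_pick([0.0, 1.0, 1.0, 0.0], k=2, rng=rng)

final_label = majority_vote([1, -1, 1])
```

In a fuller version, the indices returned by these helpers would select the negative training set for each iteration, and the votes would come from the classification instances built in step (17).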