Research Article

A Classification and Novel Class Detection Algorithm for Concept Drift Data Stream Based on the Cohesiveness and Separation Index of Mahalanobis Distance

Table 1

Representative research achievements on concept drift data streams, 2000–2016.

| Type | Algorithm | Year | Characteristics | Reference |
|---|---|---|---|---|
| Incremental learning | VFDT | 2000 | A leaf node is replaced with a split node; the algorithm uses less memory and time. | [21] |
| | HAT | 2009 | Hoeffding trees are combined with techniques based on a sliding time window; there is no need to predict when concept drift occurs in the data stream. | [22] |
| | OHT | 2014 | The misclassification rate is used to control node splitting, and concept drift is handled based on misclassification classes and false alarm rates. | [23] |
| | Hoeffding-ID | 2016 | Bayes' theorem is combined with traditional Hoeffding trees. During classification, a new spanning tree continuously replaces the old one so that the classifier maintains high accuracy and adapts to concept drift in the data stream. | [24] |
| Cluster-based | CluStream | 2003 | Extends the traditional clustering algorithm BIRCH to the data stream scenario; it is flexible and scalable but sensitive to outliers. | [25] |
| | DenStream | 2006 | Microclusters capture summary information about the data stream; the algorithm can find clusters of arbitrary shape and can handle noise objects. | [26] |
| | IEBC | 2014 | The clustering framework is integrated with the classified data stream using sliding window and data marking techniques; it performs well in clustering and in detecting concept drift but can only process classified data. | [27] |
| | MuDi-Stream | 2016 | The multidensity clustering problem in concept drift data streams is solved by a hybrid method based on grids and microclusters, but the method is not suitable for high-dimensional data streams. | [28] |
| Integrated (ensemble) learning | AWE | 2003 | A fixed number K of classifiers is maintained, and a new classifier is trained in batch mode on the newly arrived data objects. The K most accurate classifiers are then selected to form the classifier set, and each classifier is weighted according to its accuracy. | [29] |
| | AE | 2011 | Mainly addresses noise in data stream mining by combining horizontal and vertical integration frameworks. The time complexity is high. | [30] |
| | EM | 2013 | Concept drift and novel classes in the data stream can be detected automatically, but only concept drift under dynamic feature sets can be handled. | [31] |
| | CLAM | 2016 | Uses a class-based integrated classifier to efficiently classify recurring classes and novel classes in the data stream, but it cannot classify multiclass data. | [32] |
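
To make the AWE entry in Table 1 concrete, the following is a minimal Python sketch of the idea as the table describes it: train one new classifier per incoming batch, keep the K most accurate classifiers, and weight each vote by accuracy. It is an illustration under stated assumptions, not the implementation from [29]. The class names `NearestCentroid` and `AccuracyWeightedEnsemble`, the nearest-centroid base learner, and the use of raw accuracy on the newest batch as the weight are all assumptions made for this example.

```python
import math
from collections import defaultdict


class NearestCentroid:
    """Toy base learner (an assumption): one mean vector per class."""

    def __init__(self, chunk):
        sums, counts = {}, defaultdict(int)
        for x, y in chunk:
            if y not in sums:
                sums[y] = [0.0] * len(x)
            for i, v in enumerate(x):
                sums[y][i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(self, x):
        # Return the label whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def accuracy(clf, chunk):
    """Fraction of (x, y) pairs in the chunk that clf labels correctly."""
    return sum(clf.predict(x) == y for x, y in chunk) / len(chunk)


class AccuracyWeightedEnsemble:
    """Keeps at most k classifiers, each weighted by accuracy on the newest chunk."""

    def __init__(self, k):
        self.k = k
        self.members = []  # list of (classifier, weight) pairs

    def update(self, chunk):
        # Train a new classifier on the newly arrived batch.
        candidates = [clf for clf, _ in self.members] + [NearestCentroid(chunk)]
        # Score every candidate on the newest batch. Simplification: the new
        # classifier is scored on its own training data.
        scored = [(clf, accuracy(clf, chunk)) for clf in candidates]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        self.members = scored[: self.k]

    def predict(self, x):
        # Weighted vote; call update() at least once before predicting.
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)
```

In this sketch, a classifier trained before a concept drift scores poorly on the first batch after the drift, so its vote carries little weight and it is the first to be dropped once more than K classifiers exist.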