Journal of Electrical and Computer Engineering

Research Article

A Classification and Novel Class Detection Algorithm for Concept Drift Data Stream Based on the Cohesiveness and Separation Index of Mahalanobis Distance

Table 1

Representative research achievements on concept drift data streams, 2000–2016.


Type	Algorithm	Year	Characteristics	Reference

Incremental learning	VFDT	2000	Leaf nodes are replaced with split nodes as the tree grows, and the algorithm uses little memory and time.	[21]
	HAT	2009	Hoeffding trees are combined with sliding-time-window techniques, so there is no need to predict when concept drift occurs in the data stream.	[22]
	OHT	2014	The misclassification rate is used to control node splitting, and the concept drift is solved based on misclassification classes and false alarm rates.	[23]
	Hoeffding-ID	2016	Bayes’ theorem is combined with traditional Hoeffding trees. During classification, the newly generated tree continuously replaces the old tree so that the classifier maintains high accuracy and adapts to concept drift in the data stream.	[24]

Cluster-based	CluStream	2003	Extends the traditional clustering algorithm BIRCH to the data stream scenario; it is flexible and scalable but sensitive to outliers.	[25]
	DenStream	2006	Microclusters are used to capture summary information about a data stream; the algorithm can find clusters of arbitrary shape and can handle noise objects.	[26]
	IEBC	2014	A clustering framework is integrated with the classified data stream using sliding window and data marking techniques; it gives excellent clustering results and detects concept drift well but can only process classified data.	[27]
	MuDi-Stream	2016	The multidensity classification problem in the concept drift data stream is solved by a hybrid method based on grids and microclusters, but it is not suitable for high-dimensional data streams.	[28]

Integrated learning	AWE	2003	A fixed number K of classifiers is maintained, and a new classifier is trained in batch mode on each batch of newly arrived data objects. The K most accurate classifiers are then selected to form the classifier set, and each classifier is weighted according to its accuracy.	[29]
	AE	2011	It mainly addresses noise in data stream mining by combining horizontal and vertical integration frameworks, but its time complexity is high.	[30]
	EM	2013	Concept drift and novel classes in the data stream can be detected automatically, but only concept drift under dynamic feature sets can be handled.	[31]
	CLAM	2016	It uses a class-based integrated classifier to efficiently classify recurring classes and novel classes in the data stream, but it cannot classify multiclass data.	[32]
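
The AWE entry in Table 1 describes a chunk-based update: train a new classifier on each arriving batch, keep the K most accurate classifiers, and weight each by its accuracy. The following is a minimal Python sketch of that update as summarized in the table, not the implementation of [29]. The names `MajorityClassifier`, `update_ensemble`, and `predict` are hypothetical, the base learner is a toy stand-in, and accuracy is measured directly on the newest batch, which is a simplifying assumption.

```python
from collections import Counter


class MajorityClassifier:
    """Toy base learner (hypothetical): predicts the most common label of its batch."""

    def __init__(self, batch):
        labels = [y for _, y in batch]
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, x):
        return self.label


def accuracy(clf, batch):
    """Fraction of (x, y) pairs in the batch that clf labels correctly."""
    return sum(clf.predict(x) == y for x, y in batch) / len(batch)


def update_ensemble(ensemble, batch, k, make_classifier=MajorityClassifier):
    """Train one classifier on the new batch, then keep the k most accurate.

    Returns a list of (classifier, weight) pairs. The weight is the accuracy
    on the newest batch; scoring the new classifier on its own training batch
    is a simplification made for this sketch.
    """
    candidates = [clf for clf, _ in ensemble] + [make_classifier(batch)]
    scored = [(clf, accuracy(clf, batch)) for clf in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def predict(ensemble, x):
    """Accuracy-weighted vote over the ensemble members."""
    votes = Counter()
    for clf, weight in ensemble:
        votes[clf.predict(x)] += weight
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    ensemble = []
    stream = [
        [(0, "a"), (1, "a"), (2, "b")],            # batch dominated by class a
        [(0, "b"), (1, "b"), (2, "b"), (3, "a")],  # concept drifts toward class b
    ]
    for batch in stream:
        ensemble = update_ensemble(ensemble, batch, k=2)
    print(predict(ensemble, 0))
```

After the second batch, the classifier trained on the drifted data has the larger weight, so the weighted vote follows the newer concept while the older classifier remains in the set with a reduced weight.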