Abstract
Change-point detection (CPD) aims to find abrupt changes in time-series data. Various computational algorithms have been developed for CPD applications, and many performance metrics have been introduced to compare them, each measuring a different aspect of a method's behavior. Building on the existing weighted error distance (WED) method for single change-point (CP) detection, a novel WED metrics (WEDM) is proposed to evaluate the overall performance of a CPD model across not only repetitive tests on single CP detection but also successive tests on multiple change-point (MCP) detection on synthetic time series under the random slide window (RSW) and fixed slide window (FSW) frameworks. In the proposed WEDM method, a concept of normalized error distance is introduced that allows comparisons of the distance between the estimated change-point (eCP) position and the target change point (tCP) in the synthetic time series. In successive MCP detection, the proposed WEDM first divides the original time-series sample into a series of data segments according to the assigned tCP set and then calculates a normalized error distance (NED) value for each segment. Next, the WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments in the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and the MWTD (1 − MWED) are obtained and treated as important performance evaluation indexes. Based on synthetic datasets generated in the Matlab platform, repetitive tests on single CP detection were executed by using different CPD models, including the ternary search tree (TST), binary search tree (BST), Kolmogorov–Smirnov (KS) test, t-test (T), and singular spectrum analysis (SSA) algorithms.
Meanwhile, successive tests on MCP detection were implemented under the fixed slide window (FSW) and random slide window (RSW) frameworks. The CPD models mentioned above were evaluated in terms of our WED metrics, together with supplementary indexes for evaluating the convergence of different CPD models, including the hit, miss, and error rates and the computing time. The experimental results demonstrated the value of the proposed WEDM.
1. Introduction
Change-point (CP) detection applies core techniques to detect abrupt changes in the properties of time-series data. It has been widely studied in many real-world problems, such as atmospheric and financial analyses [1], fault detection in engineering systems [2, 3], detection of variance changes in oceanographic time series [4], genetic time-series analyses [5], and online detection of steady-state operation [6]. For example, using this method to detect abnormal patterns in ECG and EEG signals may also be beneficial [4, 7–15]: such an application would allow appropriate staff to be alerted to abrupt changes in a patient's medical situation and to provide on-time treatment [16, 17]. In addition, CPD models can be tightly combined with some nonlinear modeling approaches and their applications, such as classification of human hand movements [18], degradation-signal-based prognostic improvement [19], real-life hand prosthetic control [20], and single-channel surface electromyography (sEMG)-based control [21]. CPD models utilize algorithms from the fields of data mining, statistics, and computer science, including parametric and nonparametric methods [8, 22–27]. Each CPD algorithm can be assessed in terms of detection accuracy, computational cost, or suitability for real-time detection.
Many performance metrics have been introduced to evaluate CPD algorithms based on the type of decisions they make [28]. Aminikhanghahi and Cook [29] reviewed the performance evaluation methods commonly used for CPD models. The evaluation can be based on a yes/no decision on whether the resultant change point was detected within a certain distance from the actual change point. In this case, the CPD model can be treated as a binary classifier and evaluated with the usual measures, such as accuracy, sensitivity, specificity, or the ROC curve [30, 31]. For real applications, for example, clinical decision-making, the cut-offs applied to the model outcomes can be adjusted to achieve different sensitivities and specificities [32]. However, when the difference in time between the resultant eCP and the actual tCP is the measure of CPD performance, the evaluation of these algorithms is not as straightforward as for binary classification: there is no single label against which the performance of the algorithm can be measured. A few useful metrics consider the distance between the eCP and the tCP to measure CPD performance, including the mean absolute error (MAE), mean squared error (MSE), mean signed difference (MSD), root mean squared error (RMSE), and normalized root mean squared error (NRMSE). Of these, only NRMSE normalizes the unit size of the predicted value and thus facilitates a more direct comparison of errors between different datasets; the other methods measure only the absolute distances between the eCP and the tCP. However, even NRMSE does not distinguish whether the eCP falls before or after the actual tCP, and it also fails to consider the relative position of the tCP within the total length of the time-series sample.
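The limitation discussed above is easy to see on the signed errors e_i = eCP_i − tCP_i. The following is our own minimal Python illustration, not part of the paper: MAE and RMSE discard the direction of each error, while MSD keeps direction but lets early and late detections cancel.

```python
def mae(errors):
    """Mean absolute error of signed eCP - tCP differences."""
    return sum(abs(e) for e in errors) / len(errors)

def msd(errors):
    """Mean signed difference: keeps direction, but opposite errors cancel."""
    return sum(errors) / len(errors)

def rmse(errors):
    """Root mean squared error: also blind to the error's direction."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5
```

For errors = [−10, 10] (one estimate ten points early, one ten points late), MAE and RMSE both give 10 while MSD gives 0, so none of the three separates early from late detections the way a signed, normalized distance can.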
In our previous studies [33], a preliminary WED method was proposed for evaluating a CPD model on single change-point detection. In that method, a concept of weighted error distance (WED) was introduced to compute a normalized error distance between each pair of resultant eCPs and actual tCPs, and the performance of different CPD models was then ranked by the averaged WED [33]. In this study, a novel WEDM method is proposed to compare the overall performance of CPD models for MCPs detection on multiple data segments of a time series with different data features. Based on the previous WED measure, a concept of normalized error distance was introduced in this WEDM method that allows comparisons of the distance between the estimated change-point (eCP) position and the target change point (tCP). During successive MCPs detection, the proposed WEDM method first divides the original sample into a series of data segments in terms of the assigned tCPs and then calculates a normalized error distance (NED) value for each segment. Then, our WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments in the normalized positive-error distance (NPED) and the normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and the MWTD (1 − MWED) were calculated and treated as important performance indexes. Based on the synthetic datasets in the Matlab platform, both repetitive tests on single CP detection and successive tests on MCPs detection were executed by using different CPD models, including the ternary search tree (TST) [8, 34], binary search tree (BST) [15, 24], Kolmogorov–Smirnov (KS) test [22, 25], t-test (T) [23, 35], and singular spectrum analysis (SSA) [36] algorithms described in our previous studies [22, 37].
Meanwhile, these CPD models were evaluated under the random slide window (RSW) [8, 38, 39] and fixed slide window (FSW) [40–44] frameworks in terms of our WEDM and supplementary indexes, including the hit, miss, and error rates and the computing time. The experimental results showed the value of this WEDM method.
2. Methods
In this section, the proposed WEDM is described in the following steps. First, the diagnosed sample is divided into a series of data segments according to the assigned target MCPs. Second, a normalized error distance (NED) is calculated by comparing the distance between the resultant eCP position and the actual tCP within each data segment. Third, the frequency and WED distribution of the resultant eCPs detected from all segments are presented across the normalized positive-error distance (NPED) and the normalized negative-error distance (NNED) intervals in the same coordinates. Last, the metrics of mean WED (MWED) and mean WTD (MWTD) are given to efficiently evaluate a CPD model for MCPs detection over a series of data fluctuations in an identical time series.
2.1. Data Segmentation
Suppose a time-series signal can be observed as a trajectory of a multiple-data-distribution process, in which the segment Xi is defined by the following equation:

x_t = f(t) + ε_t, t ∈ {t_{i−1}+1, ..., t_i}, 0 < i ≤ M,

where f is a deterministic, piecewise function of the one-dimensional signal with change points η_i (satisfying f(η_i) ≠ f(η_i + 1) for i = 1, ..., M−1, ensuring that abrupt changes occur), M ∈ {1, 2, ..., n} is the number of data-segment regimes, so that M−1 is the number of abrupt changes, and 0 = t_0 < t_1 < ··· < t_i < ··· < t_M = n. The number M−1 and the locations η_1, ..., η_{M−1} of the change points in the process are supposed to be unknown. The sequence (ε_i)_{i∈N} is assumed to be random white noise such that E(ε_i) is exactly or approximately zero. In the simplest case, (ε_i)_{i∈N} is modeled as i.i.d. noise, but it can also follow more complex time-series distributions.
Consider an observed time-series signal with the M−1 change points mentioned above, from which a one-part time series X = {x_1, x_2, ..., x_N} of size N is selected. Suppose a set of target MCPs, {tCP_1, tCP_2, ..., tCP_n}, is contained within X. In the proposed WEDM method, the diagnosed data sample is first divided into a series of data segments according to the different target CP positions in X. The process of data segmentation is described below (Figure 1):
(1) For each tCP_i to be diagnosed in X, the data segment Seg_i can be denoted as Seg_i = {x_t : s_i ≤ t ≤ e_i}, where the two endpoints s_i and e_i of Seg_i are determined from the positions of the neighboring target change points tCP_{i−1} and tCP_{i+1}.
(2) In particular, the first segment Seg_1 and the last segment Seg_n are bounded according to tCP_1 and tCP_n by the two endpoints x_1 and x_N of X, respectively.
(3) Then, the time series X can be divided into a set of data segments {Seg_1, Seg_2, ..., Seg_n}; that is, X = Seg_1 ∪ Seg_2 ∪ ··· ∪ Seg_n, and N = Σ_{i=1}^{n} N_si holds, where N is the total length of X, and N_si refers to the length of Seg_i.
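The segmentation step above can be sketched in a few lines. The following is a minimal illustration in Python rather than Matlab; taking the segment boundaries at the midpoints between adjacent tCPs is our own assumption about the endpoint formulas, not a detail confirmed by the text.

```python
def segment_by_tcps(n_total, tcps):
    """Split indices [0, n_total) into one segment per target change point.

    Segment boundaries are taken at the midpoints between adjacent tCPs
    (an assumption); the first and last segments extend to the ends of
    the series, as in the paper's step (2).
    """
    bounds = [0]
    for a, b in zip(tcps, tcps[1:]):
        bounds.append((a + b) // 2)
    bounds.append(n_total)
    return [(bounds[i], bounds[i + 1]) for i in range(len(tcps))]

segs = segment_by_tcps(100, [20, 50, 80])
# each tCP falls inside its own segment and the segments tile the series
```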

2.2. NED Evaluation on Single CP Detection
In the scheme of error distance (ED) measurement on single CP detection (Figure 2), each segment Seg_i in the time series is divided into the former (left) part and the latter (right) part by the actual tCP_i, located at the data point X_c of Seg_i.

From a statistical point of view, we refer to the former (left) part as a positive area and the latter (right) part as a negative one. When applying a CPD model to detect the actual tCP_i in the data segment Seg_i, a resultant eCP_i might be estimated in either the positive or the negative area. A few concepts are introduced here to measure CPD model performance: true-positive distance (tPD), positive-error distance (pED), true-negative distance (tND), and negative-error distance (nED). If the resultant eCP_i is detected on the left side of the tCP_i (positive area), then pED_i and tPD_i can be calculated, that is, the distances from the eCP_i to the tCP_i and to the start point, respectively; meanwhile, nED_i and tND_i are not applicable. Conversely, when the eCP_i is estimated on the right side of the tCP_i (negative area), nED_i equals the distance from the eCP_i to the tCP_i, and tND_i is the distance from the eCP_i to the end of the data segment Seg_i; at the same time, pED_i and tPD_i do not exist (Figure 2). These definitions can be represented in formulas (6)–(9) as follows:

pED_i = X_c − X_d, (6)
tPD_i = X_d − X_a, (7)
nED_i = X_e − X_c, (8)
tND_i = X_b − X_e, (9)

where X_a and X_b represent the start and end points of the time-series segment Seg_i, respectively, X_c is the position of the actual tCP_i in Seg_i, and X_d and X_e refer to the positions of a resultant eCP on the left or right side of the tCP_i, respectively.
Basically, for a current data segment Seg_i in the scheme of NED evaluation on single CP detection (Figure 3), the distance between the start point and the tCP_i and the distance from the tCP_i to the end of each segment are both normalized to 1, so that the normalized tCP positions of all segments coincide at the same point. In formulas (10)–(13), NtPD_i, NpED_i, NtND_i, and NnED_i denote the normalized true-positive distance, normalized positive-error distance, normalized true-negative distance, and normalized negative-error distance, respectively:

NtPD_i = tPD_i / (X_c − X_a), (10)
NpED_i = pED_i / (X_c − X_a), (11)
NtND_i = tND_i / (X_b − X_c), (12)
NnED_i = nED_i / (X_b − X_c). (13)

Thereafter, the normalized error distance in formula (14) is presented as a piecewise function of NpED_i and NnED_i, according to whether the resultant eCP_i is located in the positive or the negative area:

NED_i = −NpED_i if eCP_i lies in the positive area, and NED_i = NnED_i if eCP_i lies in the negative area. (14)
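The NED computation for one segment can be sketched as follows. This is a Python stand-in for formulas (10)–(14); the sign convention, mapping positive-area errors to [−1, 0] and negative-area errors to [0, 1], is our reading of the coordinate layout used for the eCP distributions.

```python
def normalized_error_distance(ecp, tcp, start, end):
    """Signed NED in [-1, 1] for one data segment [start, end].

    Estimates before the tCP (positive area) map to [-1, 0]; estimates
    after it (negative area) map to [0, 1]. An exact hit gives 0.
    """
    if ecp <= tcp:
        return -(tcp - ecp) / (tcp - start)   # -NpED side
    return (ecp - tcp) / (end - tcp)          # NnED side
```

For a segment [0, 100] with tCP = 50, an estimate at 40 gives −0.2, an estimate at 60 gives 0.2, and an exact hit gives 0.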
2.3. WED Evaluation on MCPs Detection
Given the series of data segments in a diagnosed time series above, we can assemble all the resultant eCPs into an identical coordinate system and present their NED values ranging from the positive area [−1, 0] to the negative area [0, 1] on the x-axis (Figure 4). Then, the frequency f_i can be defined over all the resultant eCPs as

f_i = N_i / N_t,

where N_i is the number of resultant eCPs whose NED values equal NED_i, and N_t is the total number of resultant eCPs, N_t = Σ_i N_i.
Then, the weighted error distance WED_i is introduced according to the frequency f_i and the NED_i of the resultant eCPs (Figure 5). For each NED_i in the scattered distribution of resultant eCPs, its corresponding WED_i is equal to WpED_i or WnED_i, depending on whether the eCP_i is located in the positive (NpED) or the negative (NnED) area ranging from −1 to 1 on the x-axis. The definitions of WED_i, WpED_i, and WnED_i are formulated as follows:

WpED_i = f_i · NpED_i, WnED_i = f_i · NnED_i,

so that WED_i takes the value WpED_i in the positive area and WnED_i in the negative area.
Thereafter, a mean weighted error distance (MWED) is defined as

MWED = (Σ_{i=1}^{l} WpED_i + Σ_{j=1}^{r} WnED_j) / (l + r), (17)

where l and r refer to the numbers of eCPs located before and after the actual tCPs (the positive-NpED and negative-NnED areas), respectively. In most CPD models, when the search algorithm reaches the start or end of the time series without finding a change point, the resultant eCP is set to the start or the end. Therefore, the sum of l and r equals N, the total number of actual tCPs to be diagnosed in the time series X, and formula (17) can be simplified as

MWED = (1/N) (Σ_{i=1}^{l} WpED_i + Σ_{j=1}^{r} WnED_j).
Furthermore, following the MWED, 1 − MWED can be referred to as the mean weighted true distance (MWTD) and used as a measure of the overall performance of a CPD model for MCPs detection on a time series with a series of data fluctuations.
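One plausible reading of the MWED and MWTD definitions can be sketched as follows. In this Python illustration (ours, not the paper's), each distinct NED value is weighted by its relative frequency f_i = N_i / N_t, which algebraically reduces to the plain mean of |NED| over all estimates.

```python
from collections import Counter

def mwed(neds):
    """Frequency-weighted mean of |NED| over all resultant eCPs.

    Weighting each distinct NED value v by f_v = N_v / N_t and summing
    f_v * |v| equals the mean absolute NED over all estimates.
    """
    counts = Counter(neds)
    n_total = len(neds)
    return sum((n / n_total) * abs(v) for v, n in counts.items())

def mwtd(neds):
    """Mean weighted true distance, defined in the text as 1 - MWED."""
    return 1.0 - mwed(neds)
```

A model whose estimates cluster near the normalized target (NED close to 0) gets a small MWED and an MWTD close to 1.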
3. Results and Discussion
To evaluate different CPD models accurately, other related indexes were introduced besides our WEDM. In the synthetic experiments, time-series datasets were generated and assembled by using the Gaussian distribution function in the Matlab platform, and then repetitive tests on single CP detection were executed by using the TST, BST, KS, T, and SSA models. Meanwhile, the performance of the CPD models was evaluated with successive tests on MCPs detection implemented under the RSW and FSW frameworks, respectively.
3.1. Related Evaluation Indexes
In the synthetic tests, some other indexes are used for evaluating the convergence of different CPD models, including the hit, miss, and error rates and the computing time. Given a data segment Seg_i in the time series mentioned above, the related definitions are introduced in terms of the error distance between the resultant eCPs and the actual tCP_i as follows (Figure 6):
(1) Error distance: given an actual tCP_i assigned in the current data segment Seg_i, the error distance between each pair of the estimated eCP_j and the tCP_i is defined by ED_ij = |eCP_j − tCP_i|.
(2) Hit area: for the actual tCP_i, the hit area, named HA_i, is formulated by HA_i = [tCP_i − E_th, tCP_i + E_th], where E_th is the threshold value of the error distance between tCP_i and eCP_j.
(3) Hit: given the error distance mentioned above, if ED_ij ≤ E_th holds, then the tCP_i is hit by eCP_j and recorded as a hit. In this case, the corresponding NED value defined above equals 0.
(4) Error: on the other hand, if ED_ij > E_th holds, then eCP_j is treated as an error result. In this circumstance, the corresponding NED value lies within the range (0, 1].
(5) Miss: in addition, if no change point is detected from Seg_i, then the target tCP_i is missed. Accordingly, the corresponding NED value is set to 1 because of the missing tCP_i. Thereafter, the hit rate, miss rate, and error rate are formulated as R_hit = N_hit / N, R_miss = N_miss / N, and R_err = N_err / N_r, where N_hit is the number of actual tCPs hit by the resultant eCPs, N_miss is the number of actual tCPs that are missed, N_err stands for the number of resultant MCPs for which ED_ij > E_th holds, and N_r is the total number of resultant MCPs, which is usually larger than N, the number of actual tCPs within the time series. Generally, N_hit + N_err ≤ N_r holds over all the resultant eCPs.
(6) Computing time: in addition, for a certain CPD model k, the computing time is mainly spent on tCP detection over the multiple data segments in X, and it can be denoted as T_k = Σ_{i} t_i, where t_i refers to the computing time cost in Seg_i and the sum runs over the total data segments. Then, the normalized time is defined as

NT_k = T_k / Σ_{j=1}^{n} T_j,

where T_k stands for the computing time of model k, and n is the total number of models to be compared. NT_k represents the time ratio of model k against all methods and thus reflects its search efficiency relative to the others. Generally, both the TST and BST models in our previous studies have a near-logarithmic time complexity [8, 10, 13]; therefore, they should be faster and more efficient than some traditional algorithms with roughly linear time complexity, such as the KS, CUSUM, t-test, or SSA methods.
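The hit/miss/error bookkeeping above can be sketched as follows. This is an illustrative Python version with names of our own; for simplicity, all three rates here are taken over the number of targets, whereas the paper's error rate is taken over all resultant MCPs.

```python
def convergence_rates(ecps, tcps, e_th):
    """Hit/miss/error rates for one diagnosed series (illustrative only).

    `ecps[i]` is the estimate for `tcps[i]`, or None when the model
    reports no change point in that segment. `e_th` is the error-distance
    threshold that defines the hit area around each target.
    """
    hit = miss = err = 0
    for tcp, ecp in zip(tcps, ecps):
        if ecp is None:
            miss += 1          # missed target: NED treated as 1
        elif abs(ecp - tcp) <= e_th:
            hit += 1           # within the hit area: NED treated as 0
        else:
            err += 1           # outside the hit area: error state
    n = len(tcps)
    return hit / n, miss / n, err / n
```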
3.2. Repetitive Tests on Single CP Detection
In the first experiment, repetitive tests on single CP detection were executed on a synthetic dataset, Dataset1, generated by the Gaussian function in the Matlab R2016 platform. Each time series with a single target CP is composed of a positive area and a negative area before and after the assigned target tCP. The former and the latter were generated by the normal distribution N(μ = 0, σ = 1) of size m (the m time points included in the positive area) and N(μ = V, σ = 1) of size N − m (the N − m time points in the negative area), respectively, where V is a constant mean value, and N is the total length of the time series.
Here, we first present the results from Dataset1, which was composed of 20 data groups with different length N, mean value V, and tCP, and each group contains 100 time-series samples. Therefore, Dataset1 included 2000 time series in total, and this experiment, named Exp1, was performed by using the TST, BST, KS, T, and SSA models, respectively. In our simulations, the time-series samples in each group were generated by selecting random values of the sample length N from 2^10 to 2^15, the mean value V from 1.0 to 3.7, and the position of the actual tCP from 1 to N.
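A Dataset1-style sample can be generated along the lines described above. The following is a pure-Python stand-in for the Matlab Gaussian generator; the seed handling is our own addition for reproducibility.

```python
import random

def make_single_cp_series(n, tcp, v, seed=0):
    """One Dataset1-style sample: N(0, 1) before the target CP and
    N(V, 1) after it, so the mean shifts by V at index `tcp`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) if t < tcp else rng.gauss(v, 1.0)
            for t in range(n)]
```

A detector run on such a sample is then expected to place its eCP near index tcp, where the mean shift occurs.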
In the 20 groups of Exp1, the repetitive tests were executed by using the different CPD models, including TST, BST, KS, T, and SSA, respectively (Figure 7). Over the total of 2000 time-series samples in Dataset1, the frequency and WED distribution of the resultant eCPs are illustrated from the positive NpED range of [−1, 0] to the negative NnED range of [0, 1] on the x-axis. From these results, we can see that the closer a resultant eCP lies to the central axis x = 0, the smaller its WED value generally is, tending to 0, and vice versa. Among all five models, TST and KS obtain eCPs that are mostly located near the central field of x = 0 and thus have narrower WED distributions and smaller WED values than the other models, except that TST has a few eCPs falling into the positive NpED field. As for the BST, T, and SSA models, the eCPs are mainly scattered over a wide range from the NpED to the NnED areas; therefore, their WED distributions are wider and larger, especially for T and SSA.

Meanwhile, these simulation results also illustrate that both TST and KS have better convergency than the others; in particular, TST has the highest hit level and takes the shortest convergence time of all five models. Among the remaining models, BST seems much better than the others, while T has the worst convergency because of the lowest hit rate, the biggest error rate, and the longest convergence time of all five models. Furthermore, the mean analyses (Table 1) indicate that TST takes the shortest computing time and has the highest hit rate, the smallest MWED, and the biggest MWTD of all five models. For the T and SSA models, many eCPs are scattered across the whole field from NPED to NNED; in particular, T has the biggest error rate and MWED and needs the longest time of all five models.
In addition, the efficiencies of the five models are evaluated using random parameter values in a total of 20 tests. The dynamic tracks of hit rate, miss rate, error rate, and MWED are illustrated versus the test number from 1 to 20 (Figure 8). Also, the mean analyses of hit rate, miss rate, error rate, and MWED are presented in the histograms, in which "1," "2," "3," "4," and "5" on the x-axis refer to the TST, BST, KS, T, and SSA models, respectively. Over the whole process of simulation tests, the TST model has a relatively higher hit rate with some fluctuations and keeps more stable and lower levels of miss rate, error rate, and MWED than the others. Although KS has a smaller hit rate than TST and BST, it keeps lower tracks of miss and error rates than BST, T, and SSA. Although BST has a bigger hit rate and lower values of error rate and MWED than T and SSA, it seems unstable due to the drastic oscillations in its tracks of hit and miss rates. As for T and SSA, both models have smaller hit rates and show dramatic fluctuations in the tracks of error rate and MWED, despite lower miss rates than BST.

Furthermore, taking one representative test as an example, the simulations of single CP detection were repetitively executed by using 100 time-series samples with the random parameter values N = 2^14, tCP = 12267, and V = 1.9. For the TST, BST, KS, T, and SSA models, the resultant eCPs are illustrated by their locations, distributions, frequency, and WED, against the test number, time-series positions, NPED, and NNED on the x-axis, respectively (Figure 9). For both the TST and KS models, it is easy to see that most of the eCPs are located within a small range near the actual tCP = 12267, and similar results can be found in the distribution, frequency, and WED analyses of the resultant eCPs. On the contrary, for the BST, T, and SSA models, many of the eCPs are randomly scattered across the fields from NPED to NNED, and only small parts of the eCPs are gathered near the actual tCP.

Then, the mean analyses for this representative test are summarized in terms of MWTD, hit rate, miss rate, error rate, MWED, and time (Table 2). The results show that the TST model has much smaller values of MWED, miss and error rates, and computing time, as well as the biggest values of hit rate and MWTD, compared with the others. Despite a longer time and a smaller hit rate than TST, KS kept similar levels of MWTD and of hit, miss, and error rates. As for BST, T, and SSA, although the three models had similar performance, BST had the biggest miss rate, while T had the smallest MWTD and hit rate and the biggest values of time, error rate, and MWED.
3.3. Successive MCPs Detection under the RSW Framework
In the second experiment, successive tests on MCPs detection were implemented by using another synthetic dataset, Dataset2, which was composed of W time-series samples, each assembled from n data segments with different features and distributions. For a given sample X_i, each target change point tCP_j is assigned between two adjacent segments Seg_j and Seg_{j+1}, 1 ≤ j < n. Then, the sample can be denoted as X_i = {Seg_1, tCP_1, Seg_2, tCP_2, ..., tCP_{n−1}, Seg_n}, where N_sj is the size of segment Seg_j in X_i. In the successive tests on MCPs detection, two experiments named Exp2 and Exp3 were implemented based on Dataset2 under the RSW and FSW frameworks, respectively. For each experiment, a series of tests for MCPs detection was executed by using the TST, BST, KS, T, and SSA models, respectively.
In Exp2, the number of segments n within each sample X_i was stochastically chosen from 15 to 30, and each data segment Seg_j was randomly generated by the Gaussian distribution N(U_j, V_j) of length N_sj from 2^12 to 2^15, with mean U_j from 1.0 to 0.1 × NMCPs and variance V_j from 1 to 2.0 × NMCPs, respectively. Here, we present the results of the successive tests on MCPs detection under the RSW framework. First, the frequency and WED distribution of the resultant MCPs (Figure 10) are displayed over the whole range from the NPED field [−1, 0] to the NNED field [0, 1] on the x-axis. Generally, for a certain CPD model, the closer the resultant MCPs are to the central axis x = 0, the smaller their WED values; correspondingly, the bigger the MWTD, the better the efficiency, and vice versa. Among all five models, the results (Figure 10) and the mean analyses (Table 3) show that most of the resultant MCPs detected by TST are located near the central axis x = 0, and TST has the biggest hit rate and the smallest values of miss and error rates; therefore, it has the highest MWTD of all. For the BST model, although many of the resultant MCPs are scattered away from the central axis x = 0, it has a smaller error rate and MWED, as well as a bigger hit rate and MWTD, than the remaining models. For KS, T, and SSA, the common feature is that most of the resultant MCPs are spread through the whole field ranging from −1 to 1 on the x-axis. KS has a bigger MWTD than the other two, T has the smallest MWTD, and SSA has the biggest values of error rate and computing time of all five models.
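A Dataset2-style sample for Exp2 can be sketched as follows. This is a Python illustration using the parameter ranges quoted above; the exact Matlab generator settings are not reproduced, and the seed handling is our own addition.

```python
import random

def make_mcp_series(n_segments, seed=0):
    """One Dataset2-style sample: n segments with random lengths, means,
    and variances drawn from the ranges quoted for Exp2.

    Returns the assembled series and the target change-point positions
    (the boundaries between adjacent segments).
    """
    rng = random.Random(seed)
    series, tcps = [], []
    for j in range(n_segments):
        ns = rng.randrange(2**12, 2**15 + 1)          # segment length N_sj
        u = rng.uniform(1.0, 0.1 * n_segments)        # segment mean U_j
        v = rng.uniform(1.0, 2.0 * n_segments)        # segment variance V_j
        series += [rng.gauss(u, v ** 0.5) for _ in range(ns)]
        if j < n_segments - 1:
            tcps.append(len(series))                  # boundary with next segment
    return series, tcps
```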

Meanwhile, these simulations illustrate that TST has the best convergency because it has the highest hit level, the lowest error rate, and the shortest convergence time of all five models. Among the others, the BST model has much better convergency due to its higher hit rate, lower error rate, and shorter time. SSA seems the worst of all five models because of its lowest hit rate and its biggest error rate and convergence time.
Second, the performance of the five CPD models is demonstrated by a series of 10 tests in total, in which the parameters of the sample size N, the number of MCPs NMCPs, the mean U, and the variance V are randomly taken from 2^12–2^15, 15–30, 1–0.1 × NMCPs, and 1–2 × NMCPs, respectively. The results of the dynamic tracks and mean analyses (Figure 11) indicate that the TST model still keeps a better grade, with a higher and more stable level of hit rate, as well as lower levels of error rate and MWED, than the other four models. Although BST looks more efficient than KS, T, and SSA, its dynamic tracks in all four items present stronger fluctuations, especially for the miss rate. This probably means that BST has unstable performance during the process of MCPs detection. As for the remaining models, they all have similar tracks with lower hit rates and bigger error rates. KS presents instability due to its fluctuating tracks of miss rate and MWED, and so does the T model because of its fluctuating miss rate across the 10 random tests. Also, the models' performance can be intuitively evaluated and distinguished in terms of the mean analyses in the histograms (Figures 11(e)–11(h)).

Last, one representative test is selected from Exp2 above, and the simulations of MCPs detection are demonstrated by using a time series with NMCPs = 25 (Figure 12). For the diagnosed data sample (Figure 12(f)), the distributions of the resultant MCPs are illustrated for the TST, BST, KS, T, and SSA models, respectively (Figures 12(a)–12(e)). The results of the frequency and WED distribution of the resultant MCPs (Figure 13) and the mean analyses (Table 4) reveal that TST is the superior one among all five models, because most of its resultant MCPs hit the target MCP positions and few of them are treated as miss or error states. The BST model takes second place due to a smaller hit rate and a bigger error rate than TST. The remaining models, KS, T, and SSA, become progressively worse as more of their resultant MCPs fall into the error state; as a result, the hit rate gets lower and the MWED becomes bigger as well.

3.4. Successive MCPs Detection under the FSW Framework
In Exp3 under the FSW framework, a total of 30 data segments was arranged within each sample X_i, and each data segment Seg_j was randomly generated by the Gaussian distribution N(U_j, V_j) of length N_sj from 2^12 to 2^15, with mean U_j from 1.0 to 0.1 × 30 and variance V_j from 1 to 2.0 × 30, as well as with the size of the fixed slide window N_fsw ranging from 2^6 to 2^15, respectively.
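The FSW framework itself can be sketched independently of any particular detector: run a single-window CPD routine over consecutive fixed-size windows and map its local estimates back to global positions. The sketch below is schematic; `detect` stands for any of the TST/BST/KS/T/SSA detectors, none of which is reproduced here.

```python
def fsw_scan(series, n_fsw, detect):
    """Fixed-slide-window framework (schematic).

    `detect` takes one window of data and returns the local index of a
    change point, or None when no change point is found in that window.
    """
    ecps = []
    for start in range(0, len(series) - n_fsw + 1, n_fsw):
        local = detect(series[start:start + n_fsw])
        if local is not None:
            ecps.append(start + local)   # map back to the global index
    return ecps
```

As discussed below for Exp3, the choice of n_fsw matters: windows that are too small or too large degrade the detector's chances of isolating each change point.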
In our simulations, we executed a total of 10 successive tests on MCPs detection under the FSW framework. First, the frequency and WED distribution of the resultant MCPs (Figure 14) are displayed from the NPED field to the NNED field on the x-axis. Generally, for a certain CPD model, the closer the resultant MCPs are to the central axis x = 0, the smaller their WED values are. The results (Figure 14 and Table 5) indicate that for the TST model, most of the resultant MCPs are located near the central axis x = 0, and it has the biggest hit rate and the smallest values of error rate, MWED, and computing time; therefore, it has the highest MWTD of all five CPD models. As for the BST, KS, T, and SSA models, the common feature is that most of the resultant MCPs are randomly scattered through the whole field ranging from −1 to 1 on the x-axis. KS has a smaller miss rate and MWED and a bigger MWTD than the others. Although BST has a bigger hit rate and a shorter time, it has a bigger MWED and a smaller MWTD than TST and KS. T and SSA have much bigger values of MWED and error rate and smaller MWTD; in particular, SSA has the smallest MWTD and the biggest values of error rate and time of all five models.

Meanwhile, these simulations illustrate that TST has the best convergency, in terms of the highest hit rate, the lowest error rate, and the shortest time of all five models. Among the other four models, BST is much better than the rest because it has a relatively higher hit level, a lower error rate, and a much shorter time. Unfortunately, SSA has the worst convergency of all five models, due to the lowest hit level, the biggest error rate, and the longest convergence time.
Second, the performance evaluation of the five CPD models is demonstrated by a series of successive MCPs detection tests in Exp3. Generally, the dynamic tracks and histogram analyses (Figure 15) show that all five CPD models present some instability in response to the size of the fixed slide window, N_fsw, ranging from 2^6 to 2^15, especially the TST, BST, and KS models. Despite having the biggest miss rate with drastic fluctuations, the TST model still keeps a better efficiency due to the highest hit rate and the lowest levels of error rate and MWED of all five models. As for the rest, BST seems better than KS, T, and SSA because of its higher hit rate and slightly decreasing error rate. Although KS shows a decreasing hit rate and an increasing error rate with big fluctuations, it seems better than T and SSA on account of its lower levels of miss rate and MWED. Both T and SSA present inefficiency and insensitivity in response to the increasing N_fsw; in particular, the SSA model has the lowest hit rate and the highest levels of error rate and MWED of all.

Last, taking the TST model as an example, five representative simulations are selected from the total of 10 tests in the FSW framework of Exp3 (Figures 16(a)–16(e)), and the performance evaluation is listed for N_fsw = 2^6, 2^8, 2^12, 2^14, and 2^15, respectively (Table 6). Given one data sample with NMCPs = 30 (Figure 16(f)), the results of MCPs detection show that the TST model presents the best performance at N_fsw = 2^12, in terms of the biggest values of hit rate and MWTD and the smallest values of miss and error rates and MWED in all five tests. However, the efficiency of TST tends to worsen when the value of N_fsw becomes too large or too small. Therefore, the size of the fixed slide window is a key factor for the FSW framework in MCPs detection.

[Figure 16: panels (a)–(f)]
In summary, the results of the two experiments above suggest that the proposed WED method can visually present the distribution of the resultant eCPs by their error state and their normalized distance from the target position (zero on the x-axis). The simulation results also suggest that the mean analysis of MWED summarizes the mean error ratio over all tests and thereby measures the efficiency of a given model in successive MCP detection. The performances of different CPD models can thus be evaluated, and the better ones discerned from the others.
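The NED, MWED, and MWTD indexes described above can be sketched in a few lines, assuming the NED of a segment is the signed distance between the estimated and target change points normalized by the segment length (positive values fall in the NPED interval, negative values in the NNED interval); all names here are hypothetical, not the authors' code:

```python
def normalized_error_distance(ecp, tcp, seg_len):
    """Signed error distance of one segment, normalized by its length.

    Positive results lie in the NPED interval (eCP after tCP),
    negative results in the NNED interval (eCP before tCP).
    """
    return (ecp - tcp) / seg_len

def mwed(ned_values):
    """Mean weighted error distance over all segments/tests."""
    return sum(abs(d) for d in ned_values) / len(ned_values)

# Three segments of length 100, each with its tCP at index 50,
# and eCPs detected at 55, 48, and 60
neds = [normalized_error_distance(e, 50, 100) for e in (55, 48, 60)]
m = mwed(neds)      # mean absolute normalized error
mwtd = 1.0 - m      # complementary MWTD index
```

Under this reading, a model that places every eCP exactly on its tCP scores MWED = 0 and MWTD = 1, so larger MWTD indicates better localization, consistent with how the indexes are used in the tables above.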
4. Conclusions and Discussion
In this study, a novel WEDM method is proposed for evaluating the overall performance of a CPD model across not only repetitive tests on single CP detection but also successive tests on multiple change-point (MCP) detection on synthetic time series under the RSW and FSW frameworks. In this WEDM method, a concept of normalized error distance was introduced that allows comparison of the distance between the estimated change-point (eCP) position and the target change point (tCP) in the synthetic time series. In particular, both positive- and negative-error distances between the resultant eCPs and the actual tCPs are weighted, or normalized, to create the WED metrics.
As opposed to previous methods, our WEDM allows comparison when CPD is applied across multiple time-series samples with different lengths and variances, and especially across multiple data segments of an identical time series with different patterns, such as data distributions, segment sizes, and numbers and positions of target tCPs. In successive MCP detection, our WEDM method first divides the original sample into a series of data segments according to the assigned target change points and then calculates a normalized error distance (NED) value for each segment. Next, WEDM presents the frequency and WED distribution of the resultant eCPs from all data segments over the normalized positive-error distance (NPED) and normalized negative-error distance (NNED) intervals in the same coordinates. Last, the mean WED (MWED) and the MWTD (1-MWED) are obtained and treated as important performance indexes.
In our simulations, a series of MCP detection tests was executed on synthetic time-series datasets in the Matlab platform, and the proposed method was applied to evaluate CPD using the TST, BST, KS, T, and SSA models under repetitive single CP detection in Exp1, successive MCP detection under the RSW framework in Exp2, and under the FSW framework in Exp3. The results showed the method's ability to compare CPD models across a series of synthetic tests on multiple time-series samples. The WED metrics offer a new way of evaluating CPD performance: they give a better visualization of the distribution of the resultant eCPs when the CPD models work on multiple time series with different data features, as well as on multiple data segments of a time-series sample with different data patterns. Meanwhile, the convergence of the different CPD models was analyzed in terms of the dynamic tracks and mean analyses of the WED values, together with other measurements, including the hit, error, and miss rates and the computational cost. Our WEDM method not only offers a visualizable, overall measure but also helps users decide which CPD model to use for a given application.
Data Availability
The synthetic time-series datasets were generated in the Matlab simulation platform; no real-world datasets were used in the experimental validations of this study.
Conflicts of Interest
All authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The authors would like to thank Prof. Qing Zhang and Prof. Mohan Karunanithi of the Australia e-Health Research Centre, CSIRO Computation Informatics, for their assistance, support, and advice for this paper. Also, the authors appreciate the editors and referees for their very helpful comments that led to a substantial improvement of this manuscript. This paper is supported by the National Natural Science Foundation of China (no. 61104154) and the Specialized Research Fund for Natural Science Foundation of Shanghai (nos. 16ZR1401300 and 16ZR1401200).