Abstract
Automated modal identification plays an important role in online structural damage detection and condition assessment. This paper proposes an improved hierarchical clustering method that identifies precise modal parameters by automatically interpreting the stabilization diagram. Two major improvements are made to the clustering process. First, the modal uncertainty is introduced in the first stage to eliminate as many mathematical modal data as possible, which yields a more precise clustering threshold and, in turn, more precise clustering results. Second, the boxplot is introduced in the last stage to assess the precision of the clustering results from a statistical perspective. Through an iterative boxplot analysis, the outliers of the clustering results are detected and eliminated, and precise modal results are finally produced. The Z24 benchmark data are used to validate the feasibility of the proposed method, and a comparison between the previous method and the improved method is also provided. The results show that the modal uncertainty is more effective than the other modal validation criteria in distinguishing the mathematical modal data. They also show that the modal results produced by clustering alone are not statistically precise and that the boxplot can detect the outliers of the clustering results and produce more precise modal results. The improved automated modal identification method can automatically extract the physical modal data and produce more precise modal parameters.
1. Introduction
During the last couple of decades, structural health monitoring has developed rapidly in civil engineering [1–3]. As basic parameters of a structure, the modal parameters reflect its damage condition and serviceability [4–10]. Long-term modal parameters can also reveal the evolution of the structural service condition and are usually obtained by analyzing continuous monitoring data. However, continuous analysis requires a great deal of manual work. Thus, to obtain large amounts of modal data, automated modal identification techniques have attracted growing interest [11–15]. Many methods, whether in the time domain, such as the stochastic subspace identification (SSI) technique [11, 12, 16], or in the frequency domain [13], have been proposed to automatically identify the modal parameters. Among them, automated modal identification based on automated interpretation of the stabilization diagram plays an important role because the stabilization diagram gives a more specific indication of the true physical modes.
The main task in automated stabilization diagram interpretation is to automatically eliminate the spurious modal data, which originate from the overestimation of the system order, and pick out the truly physical modal results. This is accomplished by clustering methods [11, 14, 16–21], especially hierarchical clustering [11, 16, 19, 20]. Two critical problems of the clustering process need to be dealt with carefully, namely, the choice of a proper clustering threshold and the precision assessment of the clustering results. Two kinds of thresholds are commonly used: a static one and an automatically calculated one. The static threshold is normally determined from engineering experience, but, from the perspective of full automation, such static values are not universally suitable. Reynders et al. [16] proposed a method to automatically calculate the clustering threshold, making the modal identification process fully automated, and this method was later validated in many cases [19, 20]. However, the precision of the automatically calculated threshold cannot be guaranteed. Neu et al. [19] later improved the whole clustering process, argued that the parameters used to discriminate the mathematical modal data should be carefully selected, and pointed out that the precision of the automatically calculated clustering threshold in [16] is questionable. Sun et al. [20] also proposed an approach to determine the clustering threshold and applied it to a cable-stayed bridge to automatically identify the modal parameters, but the approach lacks a physical explanation. Another limitation of the existing automated modal identification methods is that the precision of the clustering results cannot be assessed. Even though an outlier detection method was proposed in [19] to detect the outliers of clustering results, it relies on a prior distribution assumption, which does not match the reality of automated modal identification.
In this paper, an improved hierarchical clustering process is proposed to automatically identify the modal parameters and estimate the precision of the clustering results. The modal uncertainty is firstly introduced to better eliminate the mathematical modal data, which improves the precision of the automatically calculated clustering threshold. The boxplot is introduced to detect the outliers of the clustering results and provide precise modal parameters.
This paper is organized as follows. First, the background of the covariance-driven SSI (SSI-Cov) method and the stabilization diagram is provided. Then, the proposed improved clustering process is explained and the key problems of the clustering process are discussed. Finally, the Z24 benchmark data are used to demonstrate the applicability of the proposed method, and the results of the improved method are compared with those of the previous method.
2. Background of Basic Theory in Automated Modal Identification
2.1. Review of Stochastic Subspace Identification and Stabilization Diagram
The SSI methods are based on the classical state-space form of the discrete-time equation of motion of a linear, time-invariant N-DOF system under white noise excitation:
$$x_{k+1} = A x_k + w_k, \qquad y_k = C x_k + v_k,$$
where $k$ is the sampling instant; $A$ is the state matrix; $C$ is the output matrix, selecting the measured signals from the corresponding internal states collected in the discrete-time state vector $x_k$; $y_k$ is the measurements vector; and $w_k$ and $v_k$ represent the effect of unknown inputs, modelling inaccuracies, and measurement noise.
These last vectors are assumed to be zero-mean realizations of stationary stochastic processes and independent of the actual state. Based on the basic assumption, the modal parameters can be extracted by relying only on the output response.
Two types of SSI method, namely, the data-driven SSI (SSI-Data) and the covariance-driven SSI (SSI-Cov) [22], are commonly used to identify the modal parameters. In this study, the SSI-Cov algorithm is used to extract modal parameters from the output data of structures. The SSI-Cov algorithm consists of the following steps [20]: (1) computation of the output covariances; (2) construction of the block Toeplitz matrix; (3) decomposition of the Toeplitz matrix; (4) estimation of the controllability and observability matrices; and (5) extraction of the modal parameters.
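The ambient vibration measurements matrix is defined as
$$Y = \begin{bmatrix} y_0 & y_1 & \cdots & y_{N-1} \end{bmatrix} \in \mathbb{R}^{L \times N},$$
where $L$ is the total number of sensors and $N$ is the number of time steps in each set of sensor measurements.
Then the Hankel matrix, with $2i$ block rows and $j$ columns, is established as
$$H = \frac{1}{\sqrt{j}} \begin{bmatrix} y_0 & y_1 & \cdots & y_{j-1} \\ y_1 & y_2 & \cdots & y_j \\ \vdots & \vdots & & \vdots \\ y_{2i-1} & y_{2i} & \cdots & y_{2i+j-2} \end{bmatrix} = \begin{bmatrix} Y_p \\ Y_f \end{bmatrix},$$
where $Y_p$ and $Y_f$ collect the first $i$ (past) and the last $i$ (future) block rows, respectively.
The output correlations are then calculated according to
$$R_l = E\left[ y_{k+l}\, y_k^{T} \right] \approx \frac{1}{j} \sum_{k=0}^{j-1} y_{k+l}\, y_k^{T}.$$
The calculated output correlations at different time lags are then combined to form a block Toeplitz matrix as
$$T_{1|i} = Y_f Y_p^{T} = \begin{bmatrix} R_i & R_{i-1} & \cdots & R_1 \\ R_{i+1} & R_i & \cdots & R_2 \\ \vdots & \vdots & \ddots & \vdots \\ R_{2i-1} & R_{2i-2} & \cdots & R_i \end{bmatrix}.$$
Then the Toeplitz matrix is decomposed via singular value decomposition as
$$T_{1|i} = U \Sigma V^{T} = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^{T} \\ V_2^{T} \end{bmatrix} = U_1 \Sigma_1 V_1^{T},$$
where $U$ and $V$ denote orthonormal matrices and $\Sigma$ denotes a diagonal matrix which contains the positive singular values in descending order. The number of nonzero singular values in $\Sigma$ indicates the rank of the Toeplitz matrix.
Based on the decomposition result, the observability matrix $O_i$ and the controllability matrix $\Gamma_i$ are formed as follows:
$$O_i = U_1 \Sigma_1^{1/2}, \qquad \Gamma_i = \Sigma_1^{1/2} V_1^{T},$$
where $U_1$, $\Sigma_1$, and $V_1$ are the parts of $U$, $\Sigma$, and $V$ associated with the nonzero singular values.
The system matrices $A$ and $C$ can be obtained by
$$A = O_i^{\dagger}\, T_{2|i+1}\, \Gamma_i^{\dagger}, \qquad C = O_i(1{:}L, :),$$
where $(\cdot)^{\dagger}$ denotes the pseudo-inverse, $C$ is the first $L$ rows of $O_i$, and $T_{2|i+1}$ is composed of covariances from lag 2 to $2i$ as
$$T_{2|i+1} = \begin{bmatrix} R_{i+1} & R_i & \cdots & R_2 \\ R_{i+2} & R_{i+1} & \cdots & R_3 \\ \vdots & \vdots & \ddots & \vdots \\ R_{2i} & R_{2i-1} & \cdots & R_{i+1} \end{bmatrix}.$$
In the end, the modal parameters of the system can be extracted from the identified system matrices $A$ and $C$ as
$$A = \Psi \Lambda \Psi^{-1}, \quad \lambda_r = \frac{\ln \mu_r}{\Delta t} = a_r + \mathrm{i}\, b_r, \quad f_r = \frac{\sqrt{a_r^2 + b_r^2}}{2\pi}, \quad \xi_r = \frac{-a_r}{\sqrt{a_r^2 + b_r^2}}, \quad \phi_r = C \psi_r,$$
where $\Delta t$ denotes the time step; $\mu_r$ denotes the $r$th diagonal component of the eigenvalue matrix $\Lambda$; $\psi_r$ denotes the $r$th column of $\Psi$; $a_r$ and $b_r$ denote the real and imaginary components of $\lambda_r$, respectively; and $f_r$, $\xi_r$, and $\phi_r$ denote the frequency, damping ratio, and modal shape for the $r$th mode, respectively.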
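As a hedged illustration of this last step, the sketch below converts one discrete-time eigenvalue of the state matrix into a frequency and a damping ratio using only the Python standard library. The function name `modal_from_discrete_pole` and the numerical values are hypothetical and chosen only for the round-trip check; they are not taken from the Z24 data.

```python
import cmath
import math


def modal_from_discrete_pole(mu, dt):
    """Return (frequency in Hz, damping ratio) for one discrete-time eigenvalue mu."""
    lam = cmath.log(mu) / dt      # continuous-time eigenvalue, lam = a + i*b
    a, b = lam.real, lam.imag
    omega = math.hypot(a, b)      # |lam| in rad/s
    return omega / (2.0 * math.pi), -a / omega


# Round trip on a hypothetical mode: 3.87 Hz, 1 % damping, sampled at 100 Hz.
f_true, xi_true, dt = 3.87, 0.01, 0.01
omega_true = 2.0 * math.pi * f_true
lam_true = complex(-xi_true * omega_true,
                   omega_true * math.sqrt(1.0 - xi_true ** 2))
f_est, xi_est = modal_from_discrete_pole(cmath.exp(lam_true * dt), dt)
```

The principal branch of the complex logarithm is used, so the conversion is only valid for modes below the Nyquist frequency of the sampled signal.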
In order to reduce the influence of noise on the modal results, the reference-based SSI-Cov method (SSI-Cov/ref) was later proposed [22]. The main difference between SSI-Cov/ref and SSI-Cov is the Hankel matrix. For the SSI-Cov/ref method, the Hankel data matrix is modified as
$$H^{\mathrm{ref}} = \begin{bmatrix} Y_p^{\mathrm{ref}} \\ Y_f \end{bmatrix},$$
where the past block rows $Y_p^{\mathrm{ref}}$ are built from $Y^{\mathrm{ref}}$, the ambient vibration measurement matrix of the specifically selected channels, which are called reference channels, and $Y_f$ collects the future block rows of all channels. Monitoring data from reference channels usually have a high signal-to-noise ratio, which reduces the influence of noise on the modal results.
2.2. Stabilization Diagram
For the SSI method, one input, i.e., the system order, has to be set before the identification process. However, for a real structure, the system order is not known and cannot be estimated precisely beforehand. When the system order is set too low, true modes may fail to be identified, and when it is set too high, spurious modal parameters are calculated. The traditional way of choosing the system order, i.e., looking for a gap in the singular value plot, fails because no obvious gap can be found for real structures, especially those under operational conditions.
The stabilization diagram is a graph containing the modal frequencies identified at different system orders. The calculated frequencies are plotted with frequency as abscissa and system order as ordinate. In a stabilization diagram, the data spots representing the physical modal frequencies at different system orders look like several vertical lines because physical modes stabilize across system orders, while the spurious modal frequencies look scattered. The stabilization diagram thus indicates that the modal data forming those vertical lines are true modal results and should be picked out. In this way, the modal identification process is transformed into the process of extracting the vertical lines in a stabilization diagram.
3. Automated Modal Identification
3.1. Overall Process of Automated Modal Identification
The aim of stabilization diagram based automated modal identification is to pick out the stabilization axes automatically and precisely. Considering the limitations of the existing methods, an improved automated modal identification process is proposed. The proposed process includes four major steps, namely, the automated elimination of mathematical data, the clustering-based automated stabilization diagram interpretation, the automated selection of physical modal clusters, and the boxplot-based outlier detection of the clustering results. The modal uncertainty is introduced in the first step to better distinguish the mathematical modal data and produce a clearer stabilization diagram for later clustering. In the fourth step, the boxplot is introduced to estimate the precision of the clustering results, so that the outliers are eliminated and precise modal results are provided. The whole process is shown in the flowchart in Figure 1.

3.2. Modal Validation Criteria
To obtain a precise clustering threshold, as many of the identified mathematical modal results as possible must be eliminated in the first stage. Reynders et al. [16] proposed a k-means clustering method, with k equal to 2, to separate the calculated modal results into certainly mathematical and probably physical ones by using a vector of modal validation criteria. The authors summarized the validation criteria and classified them into two categories, namely, the hard validation criteria and the soft validation criteria. The hard validation criteria are (1) the identified damping ratio must lie between 0 and 0.1 and (2) the identified mode must have a complex conjugate pair. The soft validation criteria are the remaining criteria, which cannot be enforced with static values or specific physical principles. They are summarized in Table 1; detailed information can be found in [16] and is not repeated here.
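$d(\lambda_i, \lambda_j)$ is a distance between the continuous-time eigenvalues $\lambda_i$ and $\lambda_j$ of modes $i$ and $j$. The eigenvalue is a combination of eigenfrequency and damping ratio. $d(f_i, f_j)$, $d(\xi_i, \xi_j)$, and $d(\mathrm{MTN}_i, \mathrm{MTN}_j)$ are dimensionless distance measures of modal frequency, damping ratio, and modal transfer norm between modes $i$ and $j$. All these relative difference criteria can be calculated as
$$d(x_i, x_j) = \frac{|x_i - x_j|}{\max\left(|x_i|, |x_j|\right)},$$
where $x$ denotes $\lambda$, $f$, $\xi$, and $\mathrm{MTN}$, respectively.
MAC and MPC are two criteria related to modal shape. $\mathrm{MAC}(\phi_i, \phi_j)$ is the correlation coefficient between modal shapes $i$ and $j$, and $\mathrm{MPC}(\phi_i)$ measures the complexity of modal shape $i$. MAC is calculated as
$$\mathrm{MAC}(\phi_i, \phi_j) = \frac{\left|\phi_i^{H} \phi_j\right|^2}{\left(\phi_i^{H} \phi_i\right)\left(\phi_j^{H} \phi_j\right)},$$
where $\phi_i$ and $\phi_j$ denote the modal shapes of two different modes.
All these criteria can be used to estimate the similarity between two modes. $d(\lambda_i, \lambda_j)$, $d(f_i, f_j)$, $d(\xi_i, \xi_j)$, and $d(\mathrm{MTN}_i, \mathrm{MTN}_j)$ are relative criteria, whose values equal 0 for ideal physical modes and 1 for ideal mathematical ones. MAC and MPC are the criteria estimating similarities of modal shapes, and their values go to 1 when the associated modal shapes are ideally physical.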
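The two kinds of criteria described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the implementation used in [16]: the function names are hypothetical, mode shapes are assumed to be given as sequences of real or complex numbers, and the MAC is written with the conjugate of the first vector so that complex mode shapes are handled.

```python
def relative_difference(x_i, x_j):
    """Dimensionless distance |x_i - x_j| / max(|x_i|, |x_j|); also works for complex eigenvalues."""
    denom = max(abs(x_i), abs(x_j))
    return abs(x_i - x_j) / denom if denom > 0 else 0.0


def mac(phi_i, phi_j):
    """Modal assurance criterion between two mode shapes of equal length."""
    cross = sum(a.conjugate() * b for a, b in zip(phi_i, phi_j))
    norm_i = sum(abs(a) ** 2 for a in phi_i)
    norm_j = sum(abs(b) ** 2 for b in phi_j)
    return abs(cross) ** 2 / (norm_i * norm_j)


# Hypothetical values: two close frequencies and two proportional mode shapes.
d_f = relative_difference(4.0, 5.0)          # |4 - 5| / 5
mac_same = mac([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
mac_orth = mac([1.0, 0.0], [0.0, 1.0])
```

Proportional mode shapes give a MAC of 1 regardless of scaling, and orthogonal ones give 0, which is why 1 − MAC can serve as a distance.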
3.3. k-Means Clustering
The k-means clustering method is utilized in the first stage to separate the calculated modal data into two parts, i.e., the certainly mathematical modal data and the probably physical ones. The k-means algorithm is a typical partition-based clustering algorithm, and the main idea is to separate the sample data into several groups by minimizing some index iteratively. The clustering procedure can be explained as follows:
Suppose $X = \{x_1, x_2, \ldots, x_n\}$ is a given data set of $n$ samples. The k-means algorithm partitions $X$ into $C$ clusters by minimizing an objective function:
$$J = \sum_{c=1}^{C} \sum_{k=1}^{K_c} \left\| x_k^{(c)} - \mu_c \right\|^2, \tag{14}$$
where $K_c$ denotes the number of modes in cluster $c$; $x_k^{(c)}$ denotes the sample vector of the $k$th sample in cluster $c$; and, for the two clusters used here ($C = 2$), $\mu_1$ and $\mu_2$ denote the centroids of the physical and spurious mode clusters, respectively. This process works iteratively by continuously updating the centroid of each cluster until the objective function in equation (14) is minimized. The modal results are finally categorized into two clusters, and the cluster with centroid $\mu_2$ is discarded.
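The iteration described above can be sketched as follows for the two-cluster case. This is a minimal sketch in plain Python; the function name `two_means`, the feature values, and the initial centroids at all zeros (physical) and all ones (mathematical) are assumptions made for illustration and are not taken from the paper.

```python
def two_means(samples, center_phys, center_math, max_iter=100):
    """Split samples (lists of floats) into two clusters.

    Returns one label per sample: 0 for the physical cluster, 1 for the mathematical one.
    """
    centers = [list(center_phys), list(center_math)]
    labels = None
    for _ in range(max_iter):
        new_labels = []
        for x in samples:
            # Squared Euclidean distance to each centroid.
            dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
            new_labels.append(0 if dists[0] <= dists[1] else 1)
        if new_labels == labels:          # assignments are stable: converged
            break
        labels = new_labels
        for k in (0, 1):
            members = [x for x, lab in zip(samples, labels) if lab == k]
            if members:                   # keep the old centroid if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical two-component feature vectors (0 = ideal physical, 1 = ideal mathematical).
features = [[0.01, 0.02], [0.03, 0.01], [0.90, 0.80], [0.70, 0.95]]
labels = two_means(features, center_phys=[0.0, 0.0], center_math=[1.0, 1.0])
```

Fixing the initial centroids makes the result deterministic, which matters for a fully automated procedure; a random initialization would be the more common general-purpose choice.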
3.4. Modal Uncertainty
Although the soft validation criteria and the k-means clustering process help to clear the stabilization diagram, our investigation shows, as presented later, that the soft validation criteria do not always work well in distinguishing the mathematical modal data. Further, if the mathematical modal data cannot be eliminated completely, the clustering threshold calculated later will not be precise, which may result in imprecise clustering results. In this study, a more effective validation criterion, i.e., the modal uncertainty, is introduced in the first stage to automatically distinguish the mathematical modal data.
The modal uncertainty originates from the non-white-noise input signal of the SSI method. The authors in [23–25] derived the calculation procedure, which was later made more efficient in [26]. The modal uncertainty is a good indicator for distinguishing mathematical from physical modes. However, it cannot improve the precision of the identified modal parameters; that is, the spots in the stabilization diagram are the same whether or not the modal uncertainty is calculated. Döhler and Mevel [26] confirmed that the modal uncertainties of mathematical modes are much larger than those of physical ones and suggested using 1.5% of the frequency value, which is almost the same as the static frequency threshold in the traditional clustering process, as the threshold for distinguishing mathematical modal results. The general procedure for calculating the modal uncertainty is summarized in Figure 2.
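Once the uncertainties are available, applying the 1.5% threshold is a simple filter. The sketch below assumes that each mode is stored as a dictionary with a frequency `f` and a frequency uncertainty `sigma_f` expressed as a standard deviation in Hz; these field names, the function name, and the sample values are hypothetical.

```python
def filter_by_uncertainty(modes, rel_threshold=0.015):
    """Keep modes whose frequency uncertainty is at most rel_threshold times the frequency."""
    return [m for m in modes if m["sigma_f"] <= rel_threshold * m["f"]]


# Hypothetical candidates from a stabilization diagram.
modes = [
    {"order": 40, "f": 3.87, "sigma_f": 0.004},   # small uncertainty: kept
    {"order": 40, "f": 7.31, "sigma_f": 0.350},   # large uncertainty: treated as mathematical
    {"order": 42, "f": 9.77, "sigma_f": 0.020},   # small uncertainty: kept
]
kept = filter_by_uncertainty(modes)
```

The threshold is relative to each mode's own frequency, so the same setting can be applied across the whole frequency band without tuning.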

3.5. Hierarchical Clustering-Based Automated Stabilization Diagram Interpretation
The main purpose of automated stabilization diagram interpretation is to automatically extract the true modal data, which is usually accomplished by clustering. The automated modal identification process aims to find the stabilization axes formed by modal data that are similar in frequency, modal shape, and damping ratio. The clustering method groups the modal data based on the distances between different modes, and modes whose distance is smaller than a threshold are grouped together.
The hierarchical clustering method [16, 19, 20] is used in the second step to automatically interpret the stabilization diagram. Hierarchical clustering was first introduced in automated modal identification in [27], using the eigenfrequency difference and the MAC value as distance measures to estimate the similarities between different modes. Then, the authors in [16] proposed a method to automatically calculate the clustering threshold in order to automate the whole process, and it was later used in [19, 20]. However, examples indicate that the precision of this threshold is questionable [19].
The identified modal parameters include three main modal indexes, namely, the modal frequency, modal shape, and damping ratio. The clustering process groups the similar modes together based on the similarities between different modes. The similarities can be estimated through $d(f_i, f_j)$, $1 - \mathrm{MAC}(\phi_i, \phi_j)$, and $d(\xi_i, \xi_j)$, which reveal the similarities between modes in frequency, modal shape, and damping ratio, respectively. For modes similar to each other, all three criteria are small, while for dissimilar ones, these criteria are rather large.
The $d(f_i, f_j)$ and $\mathrm{MAC}(\phi_i, \phi_j)$ are calculated as
$$d(f_i, f_j) = \frac{|f_i - f_j|}{\max(f_i, f_j)}, \qquad \mathrm{MAC}(\phi_i, \phi_j) = \frac{\left|\phi_i^{H} \phi_j\right|^2}{\left(\phi_i^{H} \phi_i\right)\left(\phi_j^{H} \phi_j\right)},$$
where $f$ denotes the frequency; $\phi$ denotes the modal shape; and $(\cdot)^{H}$ denotes the (conjugate) matrix transposition.
However, due to the high scatter of the identified damping ratios, in both operational and experimental modal analysis, the $d(\xi_i, \xi_j)$ values are much more scattered than those of $d(f_i, f_j)$ and $\mathrm{MAC}(\phi_i, \phi_j)$, which may severely distort the similarity estimation between different modal results. Thus, $d(f_i, f_j)$ and $\mathrm{MAC}(\phi_i, \phi_j)$ are commonly used in the clustering process to estimate the similarities of different modal results [16, 19, 20]. Another problem is the non-uniform values of the similarities. The similarities between physical modes are not identical and vary within a limited range. The clustering process groups the similar modes together by comparing the similarities with a threshold, which is also called the cutoff distance, and the modes with similarities smaller than the threshold are grouped together.
In this study, the only input, i.e., the cutoff distance, is automatically calculated from the $d(f_i, f_j)$ and MAC values of all the remaining probably physical modal results. It is recommended to be calculated from the mean value and the standard deviation of the distance measure over all the remaining probably physical modal data in the cleared stabilization diagram.
The similarities between modes vary within a limited range. However, there is no specific method prescribing how the threshold should be selected. Considering the scatter of the similarities, the threshold is automatically calculated from a statistical perspective. The mean value and standard deviation of the distance measure reflect the characteristics of the similarities and are used in many studies [16, 20]. Hierarchical clustering is a clustering method that groups the data by building a nested hierarchical clustering tree. The cutoff distance determines whether smaller data sets should be merged into a larger one, and it directly determines the final clustering results.
The automatically calculated threshold is influenced by the result of the mathematical data elimination. If the spurious modal data are not removed completely, or if physical modal data are eliminated by mistake, which can be called under-clearing and over-clearing, respectively, the remaining probably physical data lead to an inaccurate clustering threshold. So, if too many mathematical modal data remain, the threshold will not be precise, leading to inaccurate clustering results.
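To make the role of the cutoff distance concrete, the sketch below computes a distance of the form $d(f_i, f_j) + 1 - \mathrm{MAC}$, derives a threshold from the sample mean and standard deviation, and cuts a clustering at that threshold. Several choices here are assumptions for illustration only and are not confirmed by the text: the threshold form "mean plus k standard deviations" with k = 2, the use of single linkage (implemented as a union-find over close pairs), real-valued mode shapes, and all names and sample values.

```python
import statistics


def distance(m_i, m_j):
    """d(f_i, f_j) + 1 - MAC for modes given as (frequency, real mode shape)."""
    (f_i, phi_i), (f_j, phi_j) = m_i, m_j
    d_f = abs(f_i - f_j) / max(f_i, f_j)
    cross = sum(a * b for a, b in zip(phi_i, phi_j))
    mac = cross ** 2 / (sum(a * a for a in phi_i) * sum(b * b for b in phi_j))
    return d_f + 1.0 - mac


def cutoff_distance(distances, k=2.0):
    """Assumed form: sample mean + k * sample standard deviation."""
    return statistics.mean(distances) + k * statistics.stdev(distances)


def cluster(modes, cutoff):
    """Single-linkage clustering cut at `cutoff`; returns sorted lists of mode indices."""
    parent = list(range(len(modes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i in range(len(modes)):
        for j in range(i + 1, len(modes)):
            if distance(modes[i], modes[j]) < cutoff:
                parent[find(i)] = find(j)   # merge the two groups
    groups = {}
    for i in range(len(modes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical modes: three near 3.87 Hz and two near 4.9 Hz with a different shape.
modes = [(3.86, [1.0, 0.50]), (3.87, [1.0, 0.52]), (3.88, [1.0, 0.49]),
         (4.90, [1.0, -0.80]), (4.92, [1.0, -0.79])]
clusters = cluster(modes, cutoff=0.05)
example_cutoff = cutoff_distance([0.01, 0.02, 0.03])
```

A larger cutoff merges more modes into each group, which is how retained mathematical data, by inflating the mean and standard deviation, end up widening the extracted stabilization axes.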
3.6. Selection of Physical Modal Results
After the clustering process, many clusters of similar modal data are formed. Because the mathematical modal data are not eliminated completely, some clusters consisting of mathematical modal data are formed as well. The main difference between the physical and mathematical clusters is the number of data they contain: the physical clusters contain more data, while the mathematical clusters contain fewer. To make the modal identification process fully automated, the k-means clustering method [16], with k equal to 2, is used here again to automatically separate the clusters into two groups based on the number of data in each cluster. The group containing the clusters with more data is considered physical and provided as the final clustering result.
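Because only one feature (the cluster size) is involved, this selection reduces to a one-dimensional 2-means problem. The following is a minimal sketch; the function name, the initialization at the smallest and largest size, and the example sizes are assumptions for illustration.

```python
def select_physical_clusters(sizes, max_iter=100):
    """Return the indices of clusters assigned to the larger-size group by 1-D 2-means."""
    lo, hi = float(min(sizes)), float(max(sizes))   # initial centroids
    for _ in range(max_iter):
        big = [s for s in sizes if abs(s - hi) <= abs(s - lo)]
        small = [s for s in sizes if abs(s - hi) > abs(s - lo)]
        new_lo = sum(small) / len(small) if small else lo
        new_hi = sum(big) / len(big) if big else hi
        if (new_lo, new_hi) == (lo, hi):            # centroids are stable: converged
            break
        lo, hi = new_lo, new_hi
    return [i for i, s in enumerate(sizes) if abs(s - hi) <= abs(s - lo)]


# Hypothetical numbers of modes per cluster after hierarchical clustering.
cluster_sizes = [62, 58, 3, 5, 60, 2, 4]
physical = select_physical_clusters(cluster_sizes)
```

This step is sensitive to mathematical data that were merged into clusters earlier, since an inflated size can move a spurious cluster into the physical group.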
3.7. Boxplot-Based Outlier Detection
The precision of the clustering results mainly depends on the clustering threshold. The threshold determines how wide the extracted stabilization axes will be. In other words, the threshold determines how scattered the data forming a stabilization axis will be. The precision of the extracted modal parameters, especially the damping ratios, is important for model updating, condition assessment, and so on. However, no specific index has so far been provided to estimate the precision of the clustering threshold, so the precision of the clustering results remains questionable.
Neu et al. [19] proposed a method to detect the outliers of clustering results under the assumption that the modal data in the clusters obey the t-distribution. However, no solid evidence has been provided to indicate which distribution the modal data obey. The distribution characteristic is not known before the true modal results are produced. Therefore, a method that can assess the precision of the clustering results without using distribution information is needed.
To meet this requirement, the boxplot is introduced in this paper to detect the outliers of the clustering results and, at the same time, provide precise modal parameters. The boxplot is widely used for outlier detection in statistics because it identifies the outliers from the characteristics of the data themselves and needs no prior distribution information.
The boxplot was first proposed by John Tukey in 1977. It is a kind of statistical chart used to analyze the distribution characteristics of data. Boxplot-based outlier detection is often used to detect outliers in intelligent algorithms. It calculates the maximum, minimum, median, and upper and lower quartiles of the sample data and uses an upper and a lower bound as the indices to determine whether a data point should be treated as an outlier. The bounds are calculated as
$$U = Q_3 + 1.5\,\mathrm{IQR}, \qquad L = Q_1 - 1.5\,\mathrm{IQR},$$
where $U$ and $L$ denote the upper and lower bounds, respectively. $Q_1$ and $Q_3$ denote the first and third quartiles of the sample data. IQR denotes the interquartile range and is calculated as
$$\mathrm{IQR} = Q_3 - Q_1.$$
The calculation of the quartiles of a boxplot depends on no hypothesis about the distribution. In other words, the boxplot needs no distribution assumption for the sample data and detects the outliers from the characteristics of the data to be analyzed. Even if there are outliers in the data sample, they have little influence on the quartiles, which ensures that the data remaining after outlier detection reveal the true characteristics of the data.
In this step, an iterative outlier detection process is established. The boxplot is applied to detect the outliers of the clustered frequencies and modal damping ratios. Once an outlier is found in the cluster, it is eliminated and the boxplot is used again to check whether there are still outliers in the remaining data. This process continues until no outlier is found. The boxplot-based outlier detection process is applied to every cluster from the automated clustering process, and it consists of the following steps:
(1) Start from the frequency values.
(2) Calculate the upper and the lower bound of the data sample.
(3) Look for outliers in the data and eliminate them if any are found.
(4) Repeat steps (2) and (3) until no outlier is detected.
(5) Process the identified modal damping ratios of the cluster with steps (2)∼(4) until no outlier is found.
(6) Check the processed data set again. If no outliers are found, the remaining data are provided as the final precise modal results; otherwise, repeat steps (1) to (5).
Because boxplot-based outlier detection identifies the outliers with no distribution assumption for the data, it reveals the true characteristics of the identified results. The iterative outlier detection and elimination process works as a data refining method and finally produces precise modal results automatically.
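The iterative procedure can be sketched as follows for a single cluster. This is a minimal sketch under stated assumptions: quartiles are computed by linear interpolation between order statistics (one of several common conventions, and the text does not say which one is used), each mode is a dictionary with hypothetical keys `f` and `xi`, and the sample values are invented for illustration.

```python
def quartiles(values):
    """First and third quartiles by linear interpolation between order statistics."""
    s = sorted(values)

    def q(p):
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    return q(0.25), q(0.75)


def outlier_mask(values):
    """True for values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v < lower or v > upper for v in values]


def refine_cluster(modes):
    """Iteratively drop modes whose frequency or damping ratio is a boxplot outlier."""
    modes = list(modes)
    changed = True
    while changed and len(modes) > 1:
        changed = False
        for key in ("f", "xi"):                 # frequency first, then damping
            while len(modes) > 1:
                mask = outlier_mask([m[key] for m in modes])
                if not any(mask):
                    break
                modes = [m for m, bad in zip(modes, mask) if not bad]
                changed = True                  # re-check both quantities afterwards
    return modes


# Hypothetical cluster: one frequency outlier (4.50 Hz) and one damping outlier (5 %).
freqs = [3.85, 3.86, 3.86, 3.87, 3.87, 3.88, 4.50]
damps = [0.010, 0.011, 0.009, 0.010, 0.012, 0.050, 0.010]
cluster_modes = [{"f": f, "xi": xi} for f, xi in zip(freqs, damps)]
refined = refine_cluster(cluster_modes)
```

In this example the 4.50 Hz mode is removed in the frequency pass and the mode with 5% damping in the damping pass; a final pass over both quantities finds no further outliers, so five modes remain.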
4. Validation Example: The Z24 Bridge
4.1. Structure Description
The Z24 bridge was part of the road connection between the villages of Koppigen and Utzenstorf, Switzerland, overpassing the A1 highway between Bern and Zurich. It was a post-tensioned concrete two-cell box-girder bridge with a main span of 30 m and two side spans of 14 m. A full forced vibration test and an ambient operational vibration test were performed before the bridge was demolished. In total, 291 degrees of freedom were measured, with three acceleration components on the pillars and mainly vertical and lateral accelerations on the bridge deck. The data were collected in 9 different setups, with 5 channels, called reference channels, common to all setups. The acceleration data of all nine experimental scenarios were released as benchmark data for assessing the performance of modal identification and damage detection methods. A detailed description of the Z24 bridge and the experimental scenarios is provided in [28].
The acceleration data collected in the 9 scenarios are used here to validate the proposed method. The raw acceleration data are first processed by the SSI-Cov/ref method to calculate the modal candidates and establish the stabilization diagrams. Then the proposed method is applied to the stabilization diagrams to automatically extract the physical modal parameters. The identified modal parameters can then be used to assess the service condition of the bridge or to update the FEM model, which is outside the scope of this paper and is not discussed.
4.2. Modal Identification and Uncertainty Calculation
The acceleration data of the nine scenarios were processed with the SSI-Cov/ref algorithm. The five reference channels were chosen for the calculation of the modal uncertainty, l = 50 was chosen as half the number of block rows in the data matrix, and the model order range was set from 2 to 160 in steps of 2, the same settings as in [16]. The original experimental data were downloaded from the website of Leuven University, ensuring that all factors influencing the identification results stay the same.
Nine stabilization diagrams, one for each experimental scenario, were created; due to space limitations, only the results of the first, fifth, and ninth setups are provided in Figure 3. Figures 3(b), 3(d), and 3(f) show the identified modal frequencies and the associated modal uncertainties, with the modal uncertainties plotted as horizontal bars. Physical modes stabilize across model orders, and several vertical lines formed by physical modes can be seen; these vertical lines are the potential stabilization axes to be extracted automatically. Meanwhile, many spurious modal data can also be seen in the stabilization diagrams, obscuring the true stabilization axes.

[Figure 3, panels (a)–(f)]
4.3. Mathematical Results Elimination
The soft validation criteria, hard validation criteria, and modal uncertainty threshold were used in sequence to eliminate the mathematical results. The soft validation criteria vector consisted of the following indexes:
The initial values of the clustering center of physical and mathematical modal data were set aswhere and denote the values of initial cluster centers of definitely physical and certainly mathematical modal data, respectively.
The 2-means clustering method was used to separate the modal data into mathematical and probably physical results, and the mathematical data were discarded. Then the probably physical results that did not meet the hard validation criteria were eliminated directly. Finally, the modal uncertainty was applied to the remaining probably physical data: modal data with uncertainties larger than 1.5% of the associated frequency values were considered mathematical and eliminated.
Figures 4(a), 4(c), and 4(e) show the stabilization diagrams cleared by the proposed improved method. Some obvious vertical lines formed by physical frequency spots show up clearly, and most of the visually scattered modal data outside the stabilization axes, which are considered mathematical modal results, are eliminated. The improved method eliminates almost all of the mathematical modal data and retains the probably physical ones in the stabilization axes.
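The hard validation step in this sequence can be sketched as a simple test on the continuous-time poles. This is a minimal sketch: the function name, the tolerance used to match a conjugate pair, the strict inequalities for the damping range, and the pole values are all assumptions made for illustration.

```python
def passes_hard_criteria(pole, all_poles, xi_max=0.1, tol=1e-8):
    """True if the pole has 0 < damping < xi_max and its complex conjugate is also present."""
    if pole.imag == 0:
        return False                       # real pole: no oscillating mode
    xi = -pole.real / abs(pole)            # damping ratio of the continuous-time pole
    has_conjugate = any(abs(p - pole.conjugate()) <= tol * abs(pole) for p in all_poles)
    return 0.0 < xi < xi_max and has_conjugate


# Hypothetical continuous-time poles identified at one model order.
poles = [
    complex(-0.24, 24.3), complex(-0.24, -24.3),   # lightly damped conjugate pair: kept
    complex(-5.0, 20.0), complex(-5.0, -20.0),     # damping of about 24 %: rejected
    complex(-0.3, 46.0),                           # no conjugate partner: rejected
]
kept_poles = [p for p in poles if passes_hard_criteria(p, poles)]
```

Both checks are cheap and parameter-light, which is why they can be applied as fixed rules before or after the soft criteria without affecting the automation.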

[Figure 4, panels (a)–(f)]
To better illustrate the advantage of the improved method in eliminating mathematical data, the stabilization diagrams cleared without using the modal uncertainty are also provided in Figures 4(b), 4(d), and 4(f). Compared with the results of the improved process, many scattered spurious modal data are retained, and these modal data are mistakenly treated as probably physical results in the later calculation of the clustering threshold.
Compared with Figures 4(a), 4(c), and 4(e), Figures 4(b), 4(d), and 4(f) contain more scattered frequency spots outside the vertical lines, and many single spots, which are definitely mathematical modal results, are retained. The main purpose of the mathematical data elimination process is to eliminate as many as possible of the scattered data spots, which are considered mathematical modal data, and to retain the data forming the vertical lines, which are considered physical modal data. The remaining modal data determine the precision of the clustering threshold calculated later. The more mathematical modal data are retained, the larger the threshold will be, because the distances between mathematical modes are much larger than those between physical ones, as is verified in Section 4.4, and this leads to imprecise clustering results. The elimination result indicates that the modal uncertainty is more effective than the soft validation criteria in distinguishing mathematical modal data.
4.4. Automated Calculation of Clustering Threshold
Based on the cleared stabilization diagrams, the clustering threshold was automatically calculated. The sample mean and standard deviation of the distance measure of the remaining probably physical modal data, which are shown in Figure 4, are used to calculate the clustering thresholds. Figure 5 shows the thresholds automatically calculated from the stabilization diagrams cleared by the proposed method and by the previous method [16], respectively. In many scenarios, the thresholds based on the stabilization diagrams cleared by the previous method [16] are almost twice as large as those calculated by the proposed method, because more mathematical data are retained in the stabilization diagrams processed by the previous method. Most of the clustering thresholds calculated by the proposed method are near 0.07, which is almost the same as the empirical static threshold, indicating the feasibility of the proposed method.

The scatter of the distances between mathematical modal data is much larger than that between physical ones, which results in a larger sample mean and sample standard deviation of $d(f_i, f_j)$ and $1 - \mathrm{MAC}(\phi_i, \phi_j)$. The maximum threshold obtained by the previous method, in setup 3, exceeds 0.2, as is also mentioned in [19], which is too large for civil structures. By introducing the modal uncertainty, more mathematical modal results can be eliminated and more precise clustering thresholds can be calculated.
4.5. Automated Clustering
The automatically calculated thresholds were used directly as cutoff distances in the hierarchical clustering process. Figures 6(a), 6(c), and 6(e) show the clustering results of scenarios 1, 5, and 9 obtained by the proposed method. Different data clusters are plotted with different markers. The remaining physical stabilization axes are marked with solid vertical lines. The first five stabilization axes are found in all nine scenarios.

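Figure 6: Clustering results of scenarios 1, 5, and 9: (a), (c), (e) by the proposed method; (b), (d), (f) by the previous method [16].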
The final clustering results produced by the previous method [16] are provided for comparison in Figures 6(b), 6(d), and 6(f), where more stabilization axes are identified. Except for the first two, these stabilization axes are all visibly wider than those identified by the proposed method. Some data spots lying clearly outside the stabilization axes, which are considered mathematical results, are nevertheless grouped into the clusters; this is the limitation of the previous method [16]. Because the truly physical clusters are selected in step 3 according to the number of data in each cluster, this limitation may cause clusters containing many spurious modal data to be accepted as physical results.
On the contrary, the stabilization axes identified by the proposed method are much thinner and clearer, and no true modal cluster includes data points that visibly fall outside its stabilization axis, indicating that the proposed method yields more precise clustering results. As discussed before, the precision of the clustering results is influenced by the cutoff distance. The cutoff distance calculated automatically by the proposed method is more reasonable than that of the previous method [16], which is why the clustering results of the proposed method are more precise.
Different clustering results can be seen in the fifth and ninth scenarios. The first two physical clusters identified in all setups are almost the same for both methods, because the mathematical data near these physical modes are eliminated almost completely. The other physical clusters identified by the previous method are wider than those identified by the proposed method, because of the incomplete elimination of mathematical data and the associated larger threshold. The remaining mathematical modal data near the physical data are grouped with them because of the larger threshold, which increases the number of data in the associated clusters. Such misclustering raises the risk that clusters which are actually mathematical, and would otherwise contain few data, are treated as physical results because of their inflated data count. The stabilization axes identified near 20 Hz differ considerably. The previous method produced 5 physical clusters in scenario 5, while the proposed method provides only 2. However, as shown, the data spots in the 3 extra stabilization axes are visibly scattered and highly dispersed, indicating that the reliability of these extra axes is questionable.
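The influence of the cutoff distance on cluster width can be reproduced with a small agglomerative clustering sketch. Several simplifications are assumed here: the distance uses only the relative frequency difference (no mode shape term), single linkage is used, and the frequencies and cutoff values are hypothetical; the paper does not state these details in this section.

```python
def relative_distance(f_i, f_j):
    """Relative frequency difference between two poles (assumed metric)."""
    return abs(f_i - f_j) / max(f_i, f_j)


def hierarchical_clusters(freqs, cutoff):
    """Single-linkage agglomerative clustering stopped at the cutoff distance."""
    clusters = [[i] for i in range(len(freqs))]

    def linkage(a, b):
        return min(relative_distance(freqs[i], freqs[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        dist, ia, ib = min(
            (linkage(a, b), ia, ib)
            for ia, a in enumerate(clusters)
            for ib, b in enumerate(clusters)
            if ia < ib
        )
        if dist > cutoff:
            break  # no remaining pair is close enough to merge
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return sorted(sorted(c) for c in clusters)


# Hypothetical poles: two physical modes plus one stray pole at 4.60 Hz.
poles = [3.86, 3.87, 3.85, 4.90, 4.91, 4.60]

tight = hierarchical_clusters(poles, cutoff=0.01)
loose = hierarchical_clusters(poles, cutoff=0.10)
print(tight)
print(loose)
```

With the small cutoff the stray pole stays in its own cluster, whereas the larger cutoff merges it into the neighbouring physical cluster and makes that stabilization axis wider.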
Figure 7 shows the first five mode shapes identified by the proposed method. The nine partial mode shapes identified from the different scenarios are assembled using the least squares method. The assembled mode shapes are the same as those in [16], indicating the correctness of the identified results.
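One common way to assemble partial mode shapes by least squares is to rescale each setup so that its reference sensors best match those of a base setup. The sketch below follows that idea under stated assumptions: every setup shares the same reference sensors, the first setup is taken as the base, and the names `least_squares_scale` and `assemble_mode_shape` as well as the numbers are hypothetical. It is not claimed to be the exact procedure used for Figure 7.

```python
def least_squares_scale(ref_base, ref_other):
    """Scalar alpha minimising sum |alpha * ref_other[i] - ref_base[i]|^2."""
    numerator = sum(o.conjugate() * b for o, b in zip(ref_other, ref_base))
    denominator = sum(abs(o) ** 2 for o in ref_other)
    return numerator / denominator


def assemble_mode_shape(setups):
    """Glue partial shapes; each setup is (reference_part, roving_part)."""
    base_ref, base_roving = setups[0]
    global_shape = list(base_ref) + list(base_roving)
    for ref, roving in setups[1:]:
        alpha = least_squares_scale(base_ref, ref)
        # Rescale the roving sensors of this setup to the base setup.
        global_shape.extend(alpha * value for value in roving)
    return global_shape


# Hypothetical data: the second setup is scaled by a factor of 2.
setups = [
    ([1.0, -0.5], [0.2, 0.4]),
    ([2.0, -1.0], [1.0, 1.6]),
]
assembled = assemble_mode_shape(setups)
print(assembled)
```

The closed-form scale factor also works for complex mode shapes, since Python floats and complex numbers both provide `conjugate()`.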

Figure 7: First five mode shapes identified by the proposed method, panels (a) to (e).
4.6. Boxplot-Based Outlier Detection
The boxplot was then applied to each cluster. The frequency results were processed first, followed by the damping ratios. For each modal criterion, if a value was identified as an outlier, that value and its associated counterpart (for a frequency, the associated damping ratio) were eliminated simultaneously, which ensures that the final modal results are precise in both frequency and damping ratio. Figures 8 and 9 show the boxplots of the first five frequencies of scenarios 1 and 9 before the outlier detection process; many outliers were found at each order because they do not conform to the characteristics of the data themselves. The boxplots of the refined modal frequencies are provided in Figures 10 and 11, where no outliers can be found. The refined results are taken as the final clustering results.
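The iterative boxplot refinement of one cluster can be sketched as follows. The sketch assumes the standard 1.5 IQR fences, quartiles computed with the "inclusive" interpolation of Python's `statistics.quantiles`, and repetition of the frequency and damping passes until neither flags anything; the quartile convention, the stopping rule, and the sample values are assumptions, not details given in the paper.

```python
from statistics import quantiles


def fences(values, k=1.5):
    """Lower and upper boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def refine_cluster(freqs, damps, k=1.5):
    """Drop (frequency, damping) pairs until no boxplot outlier remains."""
    pairs = list(zip(freqs, damps))
    changed = True
    while changed:
        changed = False
        for column in (0, 1):  # frequency first, then damping ratio
            if len(pairs) < 4:
                return pairs  # too few points for meaningful quartiles
            low, high = fences([p[column] for p in pairs], k)
            kept = [p for p in pairs if low <= p[column] <= high]
            if len(kept) < len(pairs):
                pairs, changed = kept, True
    return pairs


# Hypothetical cluster: one frequency outlier and one damping outlier.
freqs = [3.84, 3.86, 3.88, 3.85, 3.87, 3.86, 3.83, 3.89, 4.20, 3.86]
damps = [0.80, 0.85, 0.90, 0.82, 0.88, 0.86, 0.84, 2.50, 1.00, 0.87]

refined = refine_cluster(freqs, damps)
print(len(refined), "of", len(freqs), "modes retained")
```

Because each pair is removed as a whole, a mode rejected for its frequency also loses its damping ratio and vice versa, which mirrors the simultaneous elimination described above.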

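Figures 8 and 9: Boxplots of the first five frequencies of scenarios 1 and 9 before outlier detection, panels (a) to (e) each.
Figures 10 and 11: Boxplots of the refined modal frequencies, panels (a) to (e) each.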
The boxplot can reveal the distribution characteristics of the sample data. For normally distributed data, the median, represented by the red middle line in the box, coincides with the mean, represented by the blue line in the box. The boxplots of the frequencies and damping ratios show that not all samples are normally distributed; the distributions of most frequencies and damping ratios are skewed, which means that the former method [19], which identifies outliers based on the t-distribution, is not strictly justified theoretically.
The outlier detection result of the method in [19] is also provided as a comparison. Figure 12(a) shows the relationship between damping ratio and frequency for the initial clustering result, in which some obvious outliers can be found at each order. Figures 12(b) and 12(c) show the same relationship after eliminating the outliers by the proposed method and by the previous method [19], respectively. Obvious outliers can still be found in Figure 12(c) in the fourth- and fifth-order damping ratios, indicating that the previous method fails to find all the outliers. As discussed before, [19] identifies the outliers on the assumption that the sample data are normally distributed, which does not agree with the real distribution of the sample data. Another reason is that the mean value, which the previous method [19] uses in outlier detection, is easily influenced by outliers. The mean is normally larger than the true value because it is corrupted by the outliers, so fewer outliers are detected.
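The masking effect of a mean-based rule can be shown with a short comparison. A plain "mean plus or minus two standard deviations" rule is used here only as a stand-in for a mean-based detector; it is not the t-distribution procedure of [19], and the damping values are hypothetical.

```python
from statistics import mean, median, quantiles, stdev


def mean_rule_outliers(values, k=2.0):
    """Values farther than k sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return sorted(v for v in values if abs(v - m) > k * s)


def boxplot_outliers(values, k=1.5):
    """Values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return sorted(v for v in values if v < q1 - k * iqr or v > q3 + k * iqr)


# Hypothetical damping ratios (%) of one cluster with two spurious values.
damping = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.02, 0.98, 3.0, 5.0]

print(f"mean = {mean(damping):.3f}, median = {median(damping):.3f}")
print("mean-based rule:", mean_rule_outliers(damping))
print("boxplot rule:   ", boxplot_outliers(damping))
```

In this example the two spurious values pull the mean and the standard deviation upward, so the mean-based rule flags only the largest one, while the quartile-based fences, which are barely affected, flag both.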

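Figure 12: Relationship between damping ratio and frequency: (a) initial clustering result; (b) after outlier elimination by the proposed method; (c) after outlier elimination by the previous method [19].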
The clustering results calculated by the proposed method, the previous method, and manual analysis are provided in Table 2. The mean values of the frequencies and damping ratios are similar to each other [16]. The standard deviation of the modal results calculated by the improved automated method is smaller, indicating that the proposed method can successfully extract the modal parameters and produce more precise results.
Table 2 shows the mean value and standard deviation of the modal parameters of the Z24 bridge computed by the improved method, the previous method, and manual analysis.
4.7. Sensitivity Analysis
For boxplot analysis, the criteria Q1 − 1.5IQR and Q3 + 1.5IQR are proposed from a statistical perspective. To investigate the influence of the criteria on the outlier detection result, two more criteria, namely, twice and one time the IQR, are used to detect the outliers of the clustering results. Figure 13 shows the relationship between damping ratio and frequency after outlier detection with the different criteria. The larger the criteria are set, the more data are retained and the more dispersed the resulting modal data are. With twice the IQR, some obvious outliers, visible in the fourth-order damping ratio, are retained, indicating that this criterion is not reasonable, whereas with one time the IQR most of the outliers are eliminated. Theoretically, the smaller the criteria are set, the more precise the remaining data will be. However, setting the criteria too small is not suggested, because it increases the risk of eliminating true modal data. The criteria Q1 − 1.5IQR and Q3 + 1.5IQR are therefore still recommended: all the obvious outliers are eliminated, which guarantees precision on the one hand and avoids the risk of eliminating true data on the other.
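The sensitivity to the IQR multiplier can be explored with a single-pass sketch that counts the flagged values for multipliers 1, 1.5, and 2. The damping values are hypothetical and the quartiles use the "inclusive" interpolation of `statistics.quantiles`, which is an assumption about the quartile convention.

```python
from statistics import quantiles


def flagged(values, k):
    """Values outside Q1 - k*IQR and Q3 + k*IQR (single pass, no iteration)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return sorted(v for v in values if v < q1 - k * iqr or v > q3 + k * iqr)


# Hypothetical damping ratios (%): ten consistent values and three suspects.
damping = [0.94, 0.96, 0.98, 0.99, 1.00, 1.01, 1.02, 1.03, 1.04, 1.06,
           1.15, 1.18, 1.50]

flagged_counts = {}
for k in (1.0, 1.5, 2.0):
    outliers = flagged(damping, k)
    flagged_counts[k] = len(outliers)
    print(f"k = {k}: flagged {outliers}")
```

The count of flagged values decreases as the multiplier grows, which reflects the trade-off discussed above: a small multiplier removes more suspect data but risks discarding true modal data, while a large one retains visible outliers.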

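Figure 13: Relationship between damping ratio and frequency after outlier detection with different criteria, panels (a) and (b).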
5. Conclusions
This paper presents an improved clustering process that identifies the modal parameters by automatically interpreting the stabilization diagram. The modal uncertainty is introduced to eliminate the mathematical modal data because of its high effectiveness. The boxplot is introduced to detect the outliers of the clustering results and thereby produce precise modal results. The Z24 benchmark data are analyzed to validate the feasibility of the proposed method. The following conclusions can be drawn:
(1) The modal uncertainty is highly effective in distinguishing mathematical modal data compared with the other soft validation criteria. By introducing the modal uncertainty in the first stage, the proposed method identifies and eliminates more mathematical modal data and produces a clearer stabilization diagram.
(2) The thresholds calculated automatically from the modal data cleared by the improved method are smaller and more reasonable than the previous ones, because the improved method eliminates more mathematical modal data, which reduces the discreteness of the similarity measures, such as df, of the remaining modal data. With precise clustering thresholds, more reasonable clustering results can be obtained.
(3) The modal results produced by the clustering process alone are not statistically precise. With the help of the boxplot, the outliers of the clustering results are found and eliminated. The boxplot identifies outliers from the characteristics of the sample data themselves, with no prior assumption about their distribution, which makes it suitable for automated modal identification.
Data Availability
The data used to support the findings of this study were downloaded from https://bwk.kuleuven.be/bwm/z24.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to gratefully acknowledge the support from National Key R&D Program of China (Grant nos. 2018YFB1600300, 2018YFB16003001), the National Science Foundation of China (Grant no. 51878059), and the Foundation Research Funds for the Central University (Grant nos. 300102218406 and 300102219202). The great work from the measurement campaign of Z24 bridge is also acknowledged.