Constructing a Method for an Evaluation Index System Based on Graph Distance Classification and Principal Component Analysis

Shi, Keyou; Liu, Yong; Zhang, Zhijun; Yu, Qing; Zhang, Qiucai

doi:https://doi.org/10.1155/2019/6015754

Advances in Materials Science and Engineering

Research Article | Open Access

Volume 2019 | Article ID 6015754 | https://doi.org/10.1155/2019/6015754

Keyou Shi,¹ Yong Liu,¹ Zhijun Zhang,¹ Qing Yu,¹ and Qiucai Zhang¹

Academic Editor: Andrey E. Miroshnichenko

Received 12 Dec 2018

Accepted 12 Mar 2019

Published 01 Apr 2019

Abstract

To make up for the deficiencies of principal component analysis (PCA) in index screening, a new method that combines PCA with graph distance classification is presented and applied to the construction of an evaluation index system for the environmental quality of decommissioning uranium tailing. The seepage indexes were classified into six classes using graph distance classification, and one representative element was selected from each class: pH, , ²¹⁰Pb, ²¹⁰Po, F⁻, and . The representative elements were then analyzed by PCA, which determined the retained seepage indexes (pH, U, Ra, , NH₄-N, and F⁻) and established an index system for environmental quality evaluation that consists of two primary indexes (seepage and radiation environment) and 12 secondary indexes. The results showed that the method ensures that the screened indexes have a significant effect on the evaluation result, avoids the deletion of important indexes, and has strong applicability and operability.

1. Introduction

The selection of an evaluation index and whether the index system is reasonable or not have an essential effect on the evaluation results. Thus, how to select an index in a complicated and enormous index system is a problem in the construction of an evaluation index system for the environmental quality of decommissioning uranium tailing. If all of the indexes are evaluated, then the computing effort of the information processing will be greatly increased because of the excessive indexes. However, if only a few indexes are selected individually, then a large amount of the original data’s information could be lost, which causes inaccuracy of the evaluation results.

Principal component analysis (PCA), as an evaluation index screening method, can substitute a small number of comprehensive variables for the original multidimensional variables and markedly simplify the data structure while minimizing the loss of the original data’s information, and it also avoids subjectivity and arbitrariness. PCA is widely applied in index selection, evaluation, and prediction, with strong applicability [1–5]. However, it also has some defects. For example, the contribution rate of a discarded component may be similar to that of a chosen principal component, or the major elements of a chosen principal component may be highly correlated with the elements of a discarded component; in either case, important indexes are prone to be lost, which affects the evaluation results.

To make up for the deficiency of PCA, a new classification method that combines the correlation coefficient [6] with the shortest path theory [7, 8], namely, graph distance classification (GDC), was presented. Combining GDC with PCA, GDC was used to classify the indexes, and the representative elements selected from the classification were analyzed by PCA; this approach can simplify the data and reduce the redundant indexes to simplify the evaluation index system. This method was applied to simplify the evaluation index system for the environmental quality of decommissioning uranium tailing, which can reduce the computing effort of the information processing and ensure the rationality of the index system.

2. Construction of the Method of an Evaluation Index System

The establishment of an evaluation index system based on graph distance classification and principal components analysis can be divided into two stages: index classification and index screening, as shown in Figure 1. The first stage is the selection of the initial indexes, and then, the indexes are classified and selected by graph distance classification and principal components analysis. Eventually, the simplified index system is determined. In the graph distance classification method, the highly correlated elements are ascribed to the same class. Then, the representative elements that are selected from the classification are analyzed by PCA, which can reduce the information processing workload. According to the screening results of the PCA, the selected indexes and the same class indexes serve as important indexes when constructing the index system, which can avoid the loss of important indexes.

3. Index Classification Model Based on a Graph Distance Classification

3.1. Data Preprocessing and Standardization

For subjective and objective reasons, some values are missing from the sample data, so polynomial interpolation is adopted to fill them in. In addition, the indexes are often measured in different units; to eliminate the influence of the dimension, data standardization [5] is used to make the indexes dimensionless.

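Let $x_{ki}$ be the observed value of index $i$ for evaluation object $k$, and let $z_{ki}$ be the standardized data of the index. The formula of the standardization is

$$z_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}, \tag{1}$$

where $\bar{x}_i$ and $s_i$ represent the mean and standard deviation of the observed values of index $i$, respectively.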
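As a minimal sketch of this standardization step, the Python snippet below converts the observations of one index to dimensionless scores. The function name `standardize`, the sample values, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration; the paper does not state which denominator it uses.

```python
from statistics import mean, stdev


def standardize(values):
    """Return (x - mean) / std for the observations of a single index."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    if s == 0:
        raise ValueError("a constant index cannot be standardized")
    return [(x - m) / s for x in values]


# Hypothetical observations of one index at six monitoring points
obs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0]
z = standardize(obs)
```

After standardization, the scores of every index have zero mean and unit standard deviation, so indexes measured in different units become comparable.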

3.2. Determination of the Distance between the Indexes Based on the Correlation Coefficient

The reciprocal of the correlation coefficient is used as the distance between the indexes: the higher the correlation coefficient between two indexes is, the shorter the distance between them is.

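The formula for the correlation coefficient is

$$r_{ij} = \frac{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{n} (x_{ki} - \bar{x}_i)^2 \sum_{k=1}^{n} (x_{kj} - \bar{x}_j)^2}}, \tag{2}$$

where $r_{ij}$ is the correlation coefficient between index $i$ and index $j$. Here, $x_{ki}$ and $x_{kj}$ represent the observed values of index $i$ and index $j$ of the evaluation object $k$ ($k = 1, 2, \ldots, n$), respectively, and $\bar{x}_i$ and $\bar{x}_j$ represent the average values of index $i$ and index $j$, respectively.

Let the reciprocal of the correlation coefficient be the distance $d_{ij}$ between the indexes. In other words, we have

$$d_{ij} = \frac{1}{r_{ij}}. \tag{3}$$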
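The following Python sketch shows one way to turn observations into a distance matrix. It computes the Pearson correlation coefficient for each pair of indexes and takes the reciprocal as the distance. The function names and sample columns are hypothetical. Using the absolute value of the correlation (so that distances stay positive) and assigning an infinite distance to uncorrelated indexes are assumptions that the text does not spell out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance_matrix(columns):
    """Distance between indexes = reciprocal of |r| (absolute value assumed)."""
    m = len(columns)
    d = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            r = abs(pearson(columns[i], columns[j]))
            d[i][j] = d[j][i] = float("inf") if r == 0 else 1.0 / r
    return d


# Hypothetical data: three indexes observed at four monitoring points
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
d = distance_matrix(columns)
```

Because the magnitude of a correlation coefficient never exceeds 1, every distance produced this way is at least 1, and perfectly correlated indexes sit at distance exactly 1.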

3.3. Calculation of the Shortest Path Based on the Floyd Algorithm

The complete weighted graph $G$ is determined according to the distance between the indexes. The initial indexes are used as the vertexes of graph $G$, and the shortest path between the vertexes is the shortest distance between the indexes.

Let each index correspond to one vertex in graph $G$. Suppose that graph $G$ is a complete simple graph with the vertex set $V = \{v_1, v_2, \ldots, v_m\}$ and edge set $E = \{v_i v_j \mid i \neq j\}$.

Let the determining weights of each edge be

$$w_{ij} = d_{ij}, \tag{4}$$

where $d_{ij}$ is the distance between index $i$ and index $j$.

The graph $G$ is said to be a complete weighted graph if each edge $v_i v_j$ has determining weights $w_{ij}$. Let $l_{ij}$ be the shortest distance between vertexes $v_i$ and $v_j$. The shortest path between any two vertexes is calculated by using the Floyd algorithm, the steps of which are as follows:
(i) Input the weight matrix $W = (w_{ij})$ of the complete weighted graph.
(ii) For vertex $v_i$ and vertex $v_j$, when an intermediate vertex $v_k$ gives $w_{ik} + w_{kj} < w_{ij}$ in the adjacency matrix, the data must be updated, using $w_{ik} + w_{kj}$ instead of $w_{ij}$. Repeat this step until the shortest paths are found, and determine the shortest path matrix $L = (l_{ij})$.

Here, $w_{ij}$ is the distance between vertex $v_i$ and vertex $v_j$, namely, the determining weights of the edge $v_i v_j$ ($i \neq j$). Additionally, $w_{ik}$ and $w_{kj}$ represent the determining weights of edge $v_i v_k$ and edge $v_k v_j$, respectively.
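The update rule described above is the standard Floyd (Floyd-Warshall) relaxation. The sketch below is a plain Python version, not the authors' MATLAB program; the function name `floyd` and the small weight matrix are hypothetical.

```python
def floyd(weights):
    """All-pairs shortest distances for a weighted graph given as a matrix."""
    m = len(weights)
    dist = [row[:] for row in weights]  # copy so the input stays unchanged
    for k in range(m):          # candidate intermediate vertex
        for i in range(m):
            for j in range(m):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k  # going through k is shorter
    return dist


# Hypothetical weights: the direct edge 0-2 is long, the detour via 1 is short
w = [
    [0.0, 1.1, 5.0],
    [1.1, 0.0, 1.2],
    [5.0, 1.2, 0.0],
]
shortest = floyd(w)
```

In this example the shortest distance between vertexes 0 and 2 becomes 1.1 + 1.2 rather than the direct weight 5.0, which is how two weakly correlated indexes can still end up close when both are strongly correlated with a third.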

3.4. Index Classification

Index classification, in essence, is a partition of the vertex set $V$ of graph $G$. In other words, the vertexes are divided into classes according to the shortest distances between them and a distance parameter $\lambda$, which is determined according to the actual situation.
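The exact partition rule is not recoverable from the text here, so the sketch below rests on an explicit assumption: two vertexes fall into the same class when they are linked by a chain of vertexes whose consecutive shortest distances do not exceed the distance parameter. The function name `classify`, the parameter name `lam`, and the matrix values are hypothetical.

```python
def classify(shortest, lam):
    """Group vertexes whose shortest distance is <= lam (assumed rule)."""
    m = len(shortest)
    seen = set()
    classes = []
    for start in range(m):
        if start in seen:
            continue
        seen.add(start)
        group, stack = [start], [start]
        while stack:  # depth-first search over "close enough" vertexes
            i = stack.pop()
            for j in range(m):
                if j not in seen and shortest[i][j] <= lam:
                    seen.add(j)
                    group.append(j)
                    stack.append(j)
        classes.append(sorted(group))
    return classes


# Hypothetical shortest-distance matrix for four indexes
shortest = [
    [0.0, 1.1, 2.3, 4.0],
    [1.1, 0.0, 1.2, 4.0],
    [2.3, 1.2, 0.0, 4.0],
    [4.0, 4.0, 4.0, 0.0],
]
coarse = classify(shortest, 1.5)
fine = classify(shortest, 1.15)
```

A larger distance parameter merges more indexes into one class, and a smaller one splits them, which is why the parameter has to be tuned to the actual data.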

4. Model of Index Screening Based on Principal Component Analysis

4.1. Selection of the Representative Element

According to the shortest distance relationship graph of all of the classes, the sum of the distance between one index and the other indexes can be calculated. The smaller the distance is, the closer the relationship is. The minimum distance index will be used as the representative element to be analyzed by the PCA. The representative elements for all of the classes are shown in Table 1.
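A minimal sketch of this selection rule: for each index in a class, sum its shortest distances to the other members and keep the index with the smallest sum. The function name `representative` and the distances are hypothetical.

```python
def representative(shortest, members):
    """Return the class member with the smallest total distance to the others."""
    def total(i):
        return sum(shortest[i][j] for j in members if j != i)
    return min(members, key=total)


# Hypothetical shortest distances within one class of three indexes
shortest = [
    [0.0, 1.1, 1.3],
    [1.1, 0.0, 1.0],
    [1.3, 1.0, 0.0],
]
rep = representative(shortest, [0, 1, 2])  # distance sums: 2.4, 2.1, 2.3
```

A class with a single index trivially returns that index as its own representative.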

4.2. Principal Component Analysis

The basic model of principal component analysis [9] is as follows:

$$F_j = a_{1j} x_1 + a_{2j} x_2 + \cdots + a_{mj} x_m,$$

where $x_i$ is the index $i$ ($i = 1, 2, \ldots, m$), $F_j$ is the principal component $j$, and $a_{ij}$ is the principal component load of index $i$ in principal component $j$.

The concrete steps of selecting the index based on PCA are as follows:
(i) Calculate the correlation matrix $R$ of the standardized data of the index.
(ii) Calculate the eigenvalues and eigenvectors of the correlation matrix $R$, the variance contribution rate, the cumulative contribution rate, and the factor load of the principal components.
(iii) Select the principal components and determine their number according to the eigenvalues or the cumulative contribution rate.
(iv) Screen the indexes according to the absolute value of the factor load of the principal component. The larger the absolute value of the factor load is, the more significant the influence of the index on the evaluation results is. Such an index should be retained.
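Steps (ii) and (iii) can be illustrated without a full eigen-decomposition. The sketch below assumes the eigenvalues of the correlation matrix are already available and computes the variance contribution rates, the cumulative contribution rates, and the number of components to keep. The eigenvalues are made up, and the 0.85 threshold is exposed as a parameter rather than fixed.

```python
def contribution_rates(eigenvalues):
    """Variance contribution rate and cumulative rate of each component."""
    total = sum(eigenvalues)
    rates = [v / total for v in eigenvalues]
    cumulative, running = [], 0.0
    for r in rates:
        running += r
        cumulative.append(running)
    return rates, cumulative


def n_components(eigenvalues, threshold=0.85):
    """Smallest number of leading components reaching the cumulative threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    _, cumulative = contribution_rates(ordered)
    for count, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6)
eigs = [3.6, 1.8, 0.3, 0.2, 0.07, 0.03]
rates, cumulative = contribution_rates(eigs)
kept = n_components(eigs)
above_one = sum(1 for v in eigs if v > 1)  # eigenvalue-greater-than-1 rule
```

With these made-up values both criteria agree on two components, but in general the eigenvalue rule and the cumulative-rate rule can disagree, so it is worth checking both.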

5. Application Example

The construction of an environmental quality evaluation index system of decommissioning uranium tailing is taken as an example to realize the process of making and modifying the index system. Graph distance classification was used to classify the indexes, and the principal component analysis was used to select the indexes.

5.1. Selection of the Initial Evaluation Index

According to the evaluation purpose and considering the availability and integrity of the existing monitoring data, the construction of an environmental evaluation index system focuses on the pollution angle and selecting the seepage index and radiation index as the primary indexes. The seepage indexes include pH, , , U, Ra, ²³⁰Th, ∑Th, ²¹⁰Po, ²¹⁰Pb, Mn, NH₄-N, F⁻, , , Zn, and Cd. The radiation indexes include the radon concentration, radon exhalation rate, α aerosol, γ, surface α, and surface β.

5.2. Index Classification

This paper takes the seepage indexes of decommissioning uranium tailing as the research object, and the water monitoring data of six monoliths (A–F) of a decommissioning uranium tailing is targeted as the sample data. The original data originate from the environmental monitoring report of a decommissioning uranium tailing. The standardized data of the seepage indexes are shown in Table 1.

The standardized data of the seepage indexes were substituted in formula (2) to calculate the correlation coefficient, which is shown in Table 2. The distances between the indexes were the reciprocals of the correlation coefficients.

The vertexes of graph $G$ are determined according to the number of indexes. Let each of the 16 seepage indexes be a vertex of graph $G$, so that the vertex set is $V = \{v_1, v_2, \ldots, v_{16}\}$. Let $d_{ij}$ be the weight of each edge $v_i v_j$, which produces the weight matrix $W = (d_{ij})_{16 \times 16}$.

Calculate the shortest path by the Floyd algorithm, using MATLAB programming, which is the shortest distance between each pair of indexes, as shown in Table 3.

Given the distance parameter $\lambda$ and the shortest distances (Table 3), the vertex set $V$ is divided into six categories, $V_1, V_2, \ldots, V_6$, which satisfy the conditions $V_1 \cup V_2 \cup \cdots \cup V_6 = V$ and $V_s \cap V_t = \varnothing$ ($s \neq t$).

According to the construction method of graph $G$, six subgraphs ($G_1$, $G_2$, …, $G_6$) were obtained, as shown in Figure 2. The result of the index classification is shown in Table 4.

5.3. Index Screening

Taking $G_2$ as an example, the relation graph of the shortest distance in graph $G_2$ is shown in Figure 3. The sum of the distances between U and the other indexes is 3.296, and the other indexes (Ra, , and NH₄-N) have the sums 3.3977, 3.1888, and 3.2659, respectively. The smallest sum, 3.1888, belongs to the index listed between Ra and NH₄-N, which indicates that its relation to the other indexes is the closest. For this reason, it was chosen to be the representative element of graph $G_2$. The representative elements of the other classes are shown in Table 4.

According to the standardized data of the representative element, principal component analysis can be realized by SPSS [11]. There are two principles for the selection of the principal component: an eigenvalue greater than 1 and over 85% of the cumulative contribution rate. In this paper, the first and second principal components were extracted, as shown in Table 5. In the first principal component, we select the indexes for which the absolute value of the factor load is over 0.9. In the second principal component, we select the index with the largest absolute value of the factor load. Table 6 shows that pH and F⁻ have a higher load in the first principal component and has a higher load in the second principal component.
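The screening rule described in this paragraph can be sketched as follows: keep every index whose load on the first principal component exceeds 0.9 in absolute value, plus the index with the largest absolute load on the second component. The index names `a` to `f` and all load values are hypothetical and are not the values of Table 6.

```python
def screen(loadings, first_cutoff=0.9):
    """Select indexes from (PC1 load, PC2 load) pairs using the rule above."""
    keep = [name for name, (pc1, _) in loadings.items()
            if abs(pc1) > first_cutoff]
    best_pc2 = max(loadings, key=lambda name: abs(loadings[name][1]))
    if best_pc2 not in keep:
        keep.append(best_pc2)
    return keep


# Hypothetical factor loads of six representative elements
loadings = {
    "a": (0.95, 0.12),
    "b": (0.35, 0.88),
    "c": (0.62, -0.40),
    "d": (-0.70, 0.30),
    "e": (-0.93, 0.15),
    "f": (0.55, 0.60),
}
retained = screen(loadings)
```

Taking absolute values matters because a strongly negative load (index `e` here) signals as much influence as a strongly positive one.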

5.4. Determination of the Simplified Index System

According to the screening results of the PCA, the selected indexes have a significant impact on the evaluation results and are used as important indexes to construct the index system. Because indexes in the same class are highly correlated with one another, the indexes that share a class with a selected index also have a significant impact and are likewise used as important indexes.

The seepage indexes were classified into 6 classes through the index classification, and the representative elements were selected. The screening results of the PCA are shown in Table 6. The retained indexes were used as the important indexes of the evaluation index system, and the same class indexes as the retained indexes were also used as important indexes. The simplified index system of environmental quality assessment of decommissioning uranium tailing is shown in Figure 4.
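The final step can be read as a simple set expansion: every class that contains a retained index contributes all of its members to the simplified index system. The sketch below uses hypothetical index names and classes, not the paper's classification.

```python
def expand(retained, classes):
    """Collect every member of each class that contains a retained index."""
    final = []
    for members in classes:
        if any(name in retained for name in members):
            final.extend(members)
    return final


# Hypothetical classes and PCA-retained representative elements
classes = [["a"], ["b", "c", "d"], ["e", "f"]]
retained = {"a", "c"}
final_indexes = expand(retained, classes)
```

Here the class `["e", "f"]` is dropped entirely because none of its members was retained, while `b` and `d` are kept because they share a class with `c`.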

6. Conclusions

(1) The seepage indexes were classified into 6 classes by using the graph distance classification: $V_1$ = {pH}, $V_2$ = {U, Ra, , NH₄-N}, $V_3$ = {, ²³⁰Th, ∑Th, ²¹⁰Pb, Mn}, $V_4$ = {²¹⁰Po, Zn, Cd}, $V_5$ = {F⁻}, and $V_6$ = {, }. The representative elements were selected, including pH, , ²¹⁰Pb, ²¹⁰Po, F⁻, and .

(2) On the basis of the index classification, the representative elements were analyzed by PCA, selecting the retained indexes, including pH, , and F⁻. Considering the classification, there were 6 indexes that remained in the final evaluation index system, including pH, U, Ra, , NH₄-N, and F⁻, which is consistent with the analysis results on the main pollution sources of seepage in the environmental monitoring report.

(3) In the graph distance classification method, the highly correlated elements were grouped into one class. Then, the representative elements selected from the classification were analyzed by PCA, which reduces the number of indexes passed to PCA, lowers the information processing workload, and avoids repeated information analysis. According to the screening results of the PCA, the selected indexes were used as the important indexes with which to construct the index system, which indicates that the selected indexes had a significant impact on the evaluation results. At the same time, the indexes in the same class as the retained indexes also had a significant impact and were also used as important indexes, which prevents the loss of important indexes. The method that combines graph distance classification with principal component analysis can make up for the deficiencies of PCA, is suitable for complex systems with many indexes, and favors the further application of PCA to the construction of index systems.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 51774187), the Natural Science Foundation of Hunan Province, China (No. 2018JJ3448), the Hunan Province Engineering Research Center of Radioactive Control Technology in Uranium Mining and Metallurgy and Hunan Province Engineering Technology Research Center of Uranium Tailings Treatment Technology (No. 2018YKZX1010), the Hunan Provincial Science and Technology Department Central Guide to Local Science and Technology Development Project Funded Project (No. 2017CT5006), the Key Research and Development Project of Hunan Province (No. 2017SK2280), and the Key Research Foundation Projects of Hunan Education Department (Nos. 17A184 and 18B226).

References

[1] H. Zhang and Z. Qiu-hong, “An economic indicator screening method based on fundamental principle of principal components analysis,” Journal of Shandong University of Finance, vol. 124, no. 2, pp. 52–61, 2013.
[2] J. Zhang, “Pipeline risk assessment method based on principle component-clustering analysis,” Oil-Gas Storage and Transportation, vol. 33, no. 2, pp. 139–143, 2014.
[3] Y. Nazzal, F. K. Zaidi, I. Ahmed et al., “The combination of principal component analysis and geostatistics as a technique in assessment of groundwater hydrochemistry in arid environment,” Research Communications, vol. 108, no. 6, pp. 1138–1145, 2015.
[4] X.-l. Wang, Z.-b. Wei, P. Shi-tao et al., “Comprehensive assessment model of liquid pipeline leakage consequences based on principle component analysis,” Journal of Safety Science and Technology, vol. 10, no. 5, pp. 84–89, 2014.
[5] R.-y. Zhou, A. Zhong, R. Jin-zhou et al., “An accident forecasting method of ANN based on PCA and its application,” China Safety Science Journal, vol. 23, no. 7, pp. 55–60, 2013.
[6] C. Guo-tai, T.-t. Cao, and K. Zhang, “The establishment of human all-around development evaluation indicators system based on correlation-principle component analysis,” Systems Engineering-Theory and Practice, vol. 32, no. 1, pp. 111–119, 2012.
[7] F. Fu-gui, “Study on the algorithm and applications in graph theory,” Computer and Digital Engineering, vol. 40, no. 2, pp. 115–117, 2012.
[8] Y.-j. Mao, “Floyd algorithm and MATLAB program realization of shortest path problem,” Journal of Hebei North University (Natural Science Edition), vol. 29, no. 5, pp. 13-14, 2013.
[9] L. Jin-tao, X.-b. Li, F.-q. Gong et al., “Recognizing of mine water inrush sources based on principal components analysis and Fisher discrimination analysis method,” China Safety Science Journal, vol. 22, no. 7, pp. 109–115, 2012.
[10] S. V. Mikoni, “Neural network approach to the formation models of multiattribute utility,” International Journal Information Models & Analyses, vol. 3, no. 1, pp. 3–9, 2014.
[11] R.-l. Huang, Data Statistical Analysis: SPSS Principles and Applications, vol. 7, Higher Education Press, Beijing, China, 2010.

Copyright

Copyright © 2019 Keyou Shi et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
