Abstract
After a water inrush occurs during mine production, the source of the inrush must be identified urgently so that countermeasures can be formulated under the complex hydrogeological conditions of coal mines. Accurate identification of the mine groundwater source is therefore one of the keys to preventing mine water disasters. According to the differences between the hydrochemical compositions of three aquifers in Chengjiao coal mine, six primary ions (Na++K+, Ca2+, Mg2+, SO42-, Cl-, and HCO3-) were selected as the indexes for groundwater source identification. On this basis, a mathematical model for groundwater source identification was established by combining the analytic hierarchy process (AHP)-entropy weight method and the set pair analysis (SPA) theory. This model was then used to identify the sources of 10 sets of water samples from the mine, and the identification results were compared with those of conventional models established using the Fisher discriminant analysis (FDA) and Bayes discriminant analysis (BDA) methods. The results show that the SPA-based model performs better in identifying the groundwater sources. Furthermore, the model was used to identify the source of water inflow in the No. 21304 panel. Analysis of the identification results reveals that the area close to the F20 normal fault tends to receive water from the Ordovician limestone aquifer and the Taiyuan Formation limestone aquifer, so it should be regarded as a key area for water inrush prevention and control.
1. Introduction
China is the largest coal producer and consumer in the world. Coal production was 3.84 billion tons in 2020 and kept increasing at a rate of about 0.9%, according to the Energy Production Report (2020) released by the National Bureau of Statistics of China. At present, coal plays an important role in China’s economic development, accounting for 62% and 68.5% of China’s energy structure and energy consumption, respectively. However, water disasters happen frequently during mining because of the complex hydrogeological conditions of Chinese coal mines [1]. Groundwater may suddenly rush into the mine roadway when faults, mined-out spaces, and karst collapse columns are disturbed and broken by mining activities. Mine water inrush is violent, often engulfing the roadway in an instant and causing considerable casualties and economic losses. A total of 779 water disasters occurred in China in the period 2000-2020, causing 3,831 deaths and enormous economic losses [2]. After a water inrush occurs during mine production, the source of the inrush must be identified urgently and corresponding control measures formulated, and these measures should vary with the water richness of the aquifer concerned. For aquifers with little water, the water inrush hazard can generally be eliminated by constructing temporary water bins in low-lying areas or installing enough water pumps. For water-rich aquifers, however, water inrush hazards are often controlled by grouting the aquifer’s fissures to transform it into an aquiclude or by retaining water-resisting coal pillars [3]. In short, rapid and accurate groundwater source identification is essential for the reasonable selection and optimization of water control measures [4–6].
Reasonably determining the weight of each identification index is an essential step in groundwater source identification [7–9]. Although existing methods have improved identification accuracy, they still have limitations; for instance, weights are often assigned with a degree of arbitrariness, and fuzzy indexes are generally generated from human experience. Studies based on such methods could neither overcome the complexity and fuzziness of the multiaquifer system nor fully represent the hydrochemical characteristics of the aquifers, which reduces the accuracy of the identification results. To solve this problem, this paper combined the analytic hierarchy process (AHP)-entropy weight method and the set pair analysis (SPA) theory to establish a new groundwater source identification model. Specifically, the objective weight was calculated based on information entropy, while the subjective weight was calculated by means of AHP. The two weights were then combined into a comprehensive weight for each identification index. This weighting method not only reflects the knowledge and experience of experts but also avoids the subjectivity of traditional experience-based methods [10–12], which helps ensure that the weights of the mine water source identification indexes are scientifically sound and comprehensive.
The SPA theory, proposed by the Chinese scholar Keqin Zhao in 1989, is a systematic analysis method for uncertain issues [13, 14]. It has been applied in various fields, such as building sustainable performance [15], disease hazard [16], efficacy of medicine, tourism resources [17], urban ecosystem [18], information technology [19], water environment [20, 21], and water resource system [22]. It focuses on the relationship between the accurate and inaccurate features of two related data sets and expresses this relationship in a mathematical form characterized by identity-discrepancy-contrary coefficients. In a mine, there generally exist several aquifers capable of supplying the inrush water to be identified. These aquifers can hardly be distinguished accurately through a single index because of their significantly varied hydrochemical characteristics and unclear boundaries. SPA can effectively address this problem. In this study, an SPA-based mathematical model was established to improve the accuracy of groundwater source identification based on the characteristic values of water samples from potential aquifers.
In this work, a mathematical model for mine water source identification was proposed based on the analytic hierarchy process (AHP)-entropy weight method and the set pair analysis (SPA) theory. In addition, water samples taken from different aquifers in Chengjiao coal mine were screened with the Piper trilinear diagram and cluster analysis to exclude abnormal data. Finally, the characteristic values of water samples from potential aquifers were determined. This practical study not only provides a positive reference for identifying groundwater sources but also lays an important foundation for optimizing water disaster prevention and control schemes.
2. Materials and Methods
2.1. The AHP-Entropy Weight Method
The AHP-entropy weight method is a weighting method that combines the objective weight with the subjective weight and adapts to the conditions of the object to be evaluated. It comprehensively considers subjective and objective situations to ensure the rationality of the assignment of index weight. The objective weight of each index was calculated by using the entropy weight method. According to the definition of entropy, the entropy of the $j$th ($j = 1, 2, \ldots, n$) index could be expressed as

$$H_j = -\frac{1}{\ln m}\sum_{i=1}^{m} f_{ij}\ln f_{ij} \quad (1)$$

where $m$ is the total number of samples; $n$ is the total number of indexes; $f_{ij} = r_{ij}/\sum_{i=1}^{m} r_{ij}$; $r_{ij}$ is the quantity value of each index of each water source; and if $f_{ij} = 0$, then $f_{ij}\ln f_{ij} = 0$. The entropy weight value of the $j$th index could be calculated by

$$\alpha_j = \frac{1 - H_j}{n - \sum_{j=1}^{n} H_j} \quad (2)$$
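As an illustration, the entropy weight calculation can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the common formulation in which each index column is normalized to proportions, the entropy of each index is scaled by 1/ln(m) for m samples, and the weight of an index is one minus its entropy, normalized over all indexes. The function name `entropy_weights` and the sample values are hypothetical.

```python
import numpy as np

def entropy_weights(R):
    """Objective (entropy) weights for an (m samples) x (n indexes) matrix.

    Assumes non-negative values, at least two samples, a positive sum in
    every column, and at least one index that is not uniform across samples.
    """
    R = np.asarray(R, dtype=float)
    m, n = R.shape
    f = R / R.sum(axis=0)  # proportion of each sample within its index column
    # Convention: f * ln(f) is taken as 0 where f == 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(f > 0, f * np.log(f), 0.0)
    H = -terms.sum(axis=0) / np.log(m)  # entropy of each index, in [0, 1]
    return (1.0 - H) / (n - H.sum())    # normalized so the weights sum to 1

# Hypothetical 2 x 2 example: the first index is identical in both samples,
# so it carries no information and its weight is numerically zero.
example_weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
```

An index whose values barely differ between samples has entropy close to 1 and therefore contributes little to discrimination, which is why it receives a small weight.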
The subjective weight of each index was obtained by means of AHP. First, the importance of the indexes was compared pairwise to establish a judgment matrix. Then, the characteristic equation was solved:

$$A\omega = \lambda_{\max}\omega \quad (3)$$

where $\lambda_{\max}$ is the maximum eigenvalue of the judgment matrix $A$; $\omega$ is the corresponding eigenvector of $A$. Then, the subjective weight of the $j$th index could be obtained after normalizing the eigenvector $\omega$:

$$\beta_j = \frac{\omega_j}{\sum_{j=1}^{n} \omega_j} \quad (4)$$

Finally, the comprehensive weight of the $j$th index could be determined by combining the objective weight with the subjective weight (Equation (5)).
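The subjective and comprehensive weights can be sketched in the same way. The following is a hedged example rather than the authors' implementation: `ahp_weights` normalizes the principal eigenvector of a judgment matrix, and `combine_weights` assumes a multiplicative synthesis (the product of the two weights, renormalized), which is one common way of combining AHP and entropy weights and may differ from the paper's Equation (5). Both function names and the judgment matrix are hypothetical.

```python
import numpy as np

def ahp_weights(A):
    """Return (largest eigenvalue, normalized principal eigenvector) of a
    positive reciprocal judgment matrix A."""
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)        # index of the maximum eigenvalue
    v = np.abs(eigvecs[:, k].real)     # principal eigenvector, sign removed
    return eigvals[k].real, v / v.sum()

def combine_weights(objective, subjective):
    """ASSUMPTION: multiplicative synthesis, w_j proportional to
    objective_j * subjective_j. Not confirmed by the source."""
    product = np.asarray(objective, dtype=float) * np.asarray(subjective, dtype=float)
    return product / product.sum()

# Hypothetical, perfectly consistent 3 x 3 judgment matrix.
A_example = [[1.0, 2.0, 4.0],
             [0.5, 1.0, 2.0],
             [0.25, 0.5, 1.0]]
lambda_max, subjective = ahp_weights(A_example)
comprehensive = combine_weights([0.5, 0.3, 0.2], subjective)
```

For a perfectly consistent judgment matrix the maximum eigenvalue equals the matrix order, so the gap between the two is a quick check on the quality of the pairwise comparisons.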
2.2. The SPA Theory
SPA is a systematic analysis method for dealing with uncertainty. Its core idea is to describe the uncertainty of things through dialectical analysis of identity, discrepancy, and contrariety, that is, to describe the uncertainty with a degree of connection [23]. In the process of mine groundwater source identification, it is assumed that the degree of connection between the set $A$ and the set $B$ is expressed by $\mu$ and that the two sets constitute the set pair $H = (A, B)$. The degree of connection could be expressed by a mathematical expression:

$$\mu = a + bi + cj \quad (6)$$
Equation (6), referred to as the ternary degree of connection, is the basic formula of SPA. $a$, $b$, and $c$ in the equation, commonly known as the three components of the degree of connection, are the identity coefficient, the discrepancy coefficient, and the contrary coefficient, respectively. In order to adapt to the complexity, ambiguity, and comprehensiveness of mine groundwater source identification, Equation (6) could be extended into where and are the coefficients of the water source type’s left and right adjacent intervals; and are the coefficients of the water source type’s secondary left and right adjacent intervals; , , , , and . The intervals of mine groundwater source identification are interpreted in Figure 1. The whole identification interval was equally divided into three parts, namely, the water source type’s membership interval (1/3 in total), the adjacent intervals (1/3 in total), and the secondary adjacent intervals (1/3 in total). That is, the left adjacent interval, secondary left adjacent interval, right adjacent interval, and secondary right adjacent interval each account for 1/6 of the whole identification interval.

where and are the lower and upper limits of a certain index in the water source type’s membership interval, respectively; and are the lower and upper limits of this index in the water source type’s adjacent intervals, respectively.
Assuming that the water sample to be evaluated is , then the calculation formulas for the connection degree components of ’s th index are displayed as With the aid of the components of connection degree, the set pair trend between and a certain water source type’s identification interval could be calculated by where is the comprehensive weight of ’s th () index; is the set pair trend. When , and the water source type’s identification interval share the same trend in the dialectical relationship; and the larger the value of , the stronger the trend. Thus, ’s water source type could be determined by Equation (10).
3. The Geological Setting
3.1. Study Area
Chengjiao coal mine is located in Yongcheng City, Henan Province, China (Figure 2). Overall, the terrain of the mine area is higher in the north and west and lower in the south and east. The surface water system is poorly developed, and vertical infiltration of atmospheric precipitation is the main source of groundwater supply here. The main mining coal seam, i.e., the II2 coal seam, is located in the Lower Permian Shanxi Formation. The main sources of groundwater are the Shanxi Formation sandstone aquifer, the Taiyuan Formation limestone aquifer (in the upper section), and the Ordovician limestone aquifer. In addition, a well-developed fault destroys the continuity of the aquiclude and facilitates hydraulic connections between aquifers, which greatly complicates water control in the mine.

3.2. Major Hydrological Problem
The No. 21304 panel, whose plane position is shown in Figure 2, is the first mining face in the south wing of the mine. The F20 fault lies outside the panel, and its profile section is also shown in Figure 2. Since the fault throw is greater than 400 m, the top surface of the Ordovician limestone in the fault footwall rises to an elevation of about -510 m, which is far higher than the current coal seam elevation (-880 m) of the panel. If the fault is water-conducting, Ordovician limestone water from the footwall on the opposite side of the fault is likely to flow into the panel through fractures and cracks and pose a considerable threat to mining safety. Hence, the groundwater source in the panel must be identified accurately so that the corresponding water control measures can be formulated to ensure safe mining in this region.
4. Results and Discussion
4.1. Establishment of an Identification Model
To identify the groundwater source accurately in the No. 21304 panel, the AHP-entropy weight method and the SPA theory were combined to establish an identification model, which was implemented with the NumPy library in Python. Figure 3 shows the workflow of the identification model.

The specific steps are as follows:
Step 1. Sampling.
Hydrochemical characteristics, especially main ion contents (Na++K+, Ca2+, Mg2+, SO42-, Cl-, and HCO3-), are the basis of establishing a groundwater source identification model. In this study, a total of 47 water samples (1 to 47 in Supplemental File) were extracted from different sampling sites of the three aquifers to establish the model.
Step 2. Exclusion of abnormal data.
The water samples were screened by means of the Piper trilinear diagram and the cluster analysis to exclude exceptional samples. In this way, the remaining ones could faithfully reflect the characteristic values of corresponding aquifers.
The Piper trilinear diagram, first proposed by Piper in 1944 [24], indicates the contents of six primary ions Na++K+, Ca2+, Mg2+, SO42-, Cl-, and HCO3- in a water sample. Corresponding analysis could be conducted according to the distributions of the six ions. The Piper trilinear diagram could illustrate the hydrochemical characteristics of groundwater through the relative compositions of the chemical components [25].
The Piper trilinear diagram was plotted based on the data of the 47 water samples (Figure 4). As shown in Figure 4, all water samples belong to Na-SO4-type water, while the cationic compositions of the three water source types are quite different. Sample 10 (from the Ordovician limestone aquifer) and sample 45 (from the Shanxi Formation sandstone aquifer) have hydrochemical characteristics notably different from those of the other samples, as marked by the red oval area.

Cluster analysis refers to the process of grouping a set of physical or abstract objects into multiple classes composed of similar objects [26]. It classifies the research objects (samples or indexes) according to their characteristics so that abnormal objects can be excluded. Specifically, each water sample is regarded as a vector comprising $n$ indexes, and the space composed of $n$-dimensional vectors is approximated as a distance space [27]. Provided that other factors exert a limited effect, groundwater samples from the same source or with the same hydrochemical characteristics are classified into one category because of the relatively short distance between them, while those from different sources or with different hydrochemical characteristics are classified into different categories.
In this paper, the cluster analysis was also conducted based on the data of the 47 water samples (Figure 5). The cluster analysis results also show that samples 10 and 45, which were already flagged in the Piper diagram, have hydrochemical characteristics distinct from those of the other samples. This confirms that the two exceptional samples should be excluded from further analysis.
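The screening idea can be illustrated with a short NumPy sketch. It is a simplified stand-in, not the authors' procedure: the paper does not state which linkage or distance was used, so the sketch assumes Euclidean distance on standardized indexes and plain single-linkage agglomerative clustering, and it flags samples that end up in clusters smaller than a chosen size. The function names, the thresholds, and the two-index toy data are all hypothetical.

```python
import numpy as np

def single_linkage_labels(X, n_clusters):
    """Agglomerative single-linkage clustering; returns one label per row."""
    X = np.asarray(X, dtype=float)
    labels = np.arange(len(X))  # start with every sample in its own cluster
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    while np.unique(labels).size > n_clusters:
        same = labels[:, None] == labels[None, :]
        masked = np.where(same, np.inf, D)  # ignore pairs already merged
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        labels[labels == labels[j]] = labels[i]  # merge the two closest clusters
    return labels

def flag_isolated_samples(X, n_clusters=3, min_size=2):
    """Indices of samples that fall in clusters with fewer than min_size members."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each index
    labels = single_linkage_labels(Z, n_clusters)
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    return np.flatnonzero(counts[inverse] < min_size)

# Hypothetical toy data: two tight groups plus one far-away sample (row 10).
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05],
                [10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [10.1, 10.1],
                [10.05, 10.05], [100.0, 100.0]])
flagged = flag_isolated_samples(toy, n_clusters=3, min_size=2)
```

In practice the flagged samples would still be checked against an independent view of the data, as the text does with the Piper diagram, before being excluded.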

Step 3. Establishment of an index system.
With the contents of six ions Na++K+, Ca2+, Mg2+, SO42-, Cl-, and HCO3- in the water samples taken as the indexes, the index set was established.
The evaluation set of water source types for the identification model was established in accordance with the three water source types (represented by I, II, and III, respectively).
Step 4. Weight assignments for the indexes.
The objective weights of the indexes were calculated based on the mass concentration data of six indexes Na++K+, Ca2+, Mg2+, SO42-, Cl-, and HCO3- of the remaining 45 water samples (abnormal samples excluded). Next, the subjective weights and the comprehensive weights of the indexes were calculated through Equations (4) and (5) in the AHP-entropy weight method. The calculation results are listed in Table 1.
Step 5. Groundwater source identification based on SPA.
A box plot of the mass concentration variations of the ions in different aquifers was drawn based on the remaining 45 water samples (Figure 6). The box plot is a statistical tool that reflects the distribution characteristics of the original data [28]. It summarizes the data through the quartiles and the interquartile range. Quartiles are resistant to outliers: up to 25% of the data can lie arbitrarily far away without greatly disturbing them. Using the lower and upper quartiles of the box plot as the water source discrimination interval can therefore objectively reflect the hydrochemical characteristics of aquifers [29]. The lower quartile and upper quartile of the box plot were set as the lower limit and upper limit of each index in the water source type’s membership interval, and then the corresponding limits of the adjacent intervals were calculated through Equation (8). The results are given in Table 2.
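The quartile-based membership interval can be computed directly with NumPy. The sketch below is illustrative only: it assumes one array of samples per aquifer with one column per index, uses NumPy's default linear-interpolation percentile (the paper does not say which quartile convention was used), and does not attempt the adjacent-interval limits of Equation (8). The function name and the sample values are hypothetical.

```python
import numpy as np

def membership_intervals(samples_by_aquifer):
    """Map each aquifer name to (lower quartile, upper quartile) per index."""
    intervals = {}
    for name, samples in samples_by_aquifer.items():
        X = np.asarray(samples, dtype=float)  # rows: samples, columns: indexes
        q1, q3 = np.percentile(X, [25, 75], axis=0)
        intervals[name] = (q1, q3)
    return intervals

# Hypothetical concentrations (two indexes, five samples) for one aquifer.
example = membership_intervals({
    "I": [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]],
})
lower_I, upper_I = example["I"]
```

Because the interval spans only the middle half of the samples, a single extreme sample in an aquifer does not move its limits much, which is the robustness property the text relies on.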

Then, the components of connection degree and the set pair trends between each water sample and each water source type’s identification interval could be calculated through Equations (9) and (10) in the SPA theory we mentioned above. The water source to which the sample belongs is determined by comparing the set pair trends.
4.2. Model Validation and Analysis
The established identification model was utilized to identify another 10 samples ( to ) taken from Chengjiao coal mine for verifying its accuracy. The samples to be identified and the identification results are exhibited in Table 3. Shanxi Formation sandstone water, Taiyuan Formation limestone water, and Ordovician limestone water were represented by I, II, and III, respectively.
A comparison between the identification results and the actual water source types indicates that the two were completely consistent for all 10 samples, which means the model achieved 100% identification accuracy on this validation set. However, for one water sample, two of the set pair trend values were very close, which reduces the confidence of the final identification result. There are two possible reasons for this phenomenon. First, the water sample may come from mixed water of more than one aquifer, which increases the connection degree between some indexes and multiple aquifers in the SPA-based identification model and thus yields similar set pair trends. Second, the model may use too few water samples from each aquifer to fully reflect the hydrochemical characteristics of that aquifer, which increases the error of the identification results. Additionally, FDA and BDA were also employed for groundwater source identification, and their identification accuracy rates were 70% and 80%, respectively. To sum up, the SPA-based identification model performed better in identifying the sources of the water samples.
4.3. Determination of the Source of Water Inflow in No. 21304 Working Face
A water sample collected from the No. 21304 panel was identified by using the model established above. The sample and identification result are disclosed in Table 4.
According to the identification result, there is no doubt that the Ordovician limestone aquifer water was the major source of inflow of the No. 21304 panel and that the Taiyuan Formation limestone aquifer water may be highly related to the inflow water. However, there is only a minuscule chance that the Shanxi Formation sandstone aquifer water relates to the inflow.
To further verify the accuracy of the above identification results, it is assumed that all three aquifers contribute to water inflow in the No. 21304 panel. According to the principle that the content of each ion component of a mixed solution is conserved, the element composition and content remain unchanged after water of the three water source types mixes to form mine water. The migration of groundwater is accompanied by ion exchange, which leads to the alternating adsorption of Na++K+ and Ca2++Mg2+ by rocks and thus changes the concentrations of Na+, K+, Ca2+, and Mg2+ in water [30]. Meanwhile, the content of HCO3- in water is also easily affected by other ions and pH. However, SO42- and Cl-, whose contents are less affected by other ions and pH, are not adsorbed by rock mass and soil, so they are relatively stable ions in groundwater. On this premise, the mass balance equations of SO42- and Cl- could be established to calculate the relative proportions of the sources of water inflow:

$$\begin{cases} x_1 + x_2 + x_3 = 1 \\ c_1^{\mathrm{SO_4}} x_1 + c_2^{\mathrm{SO_4}} x_2 + c_3^{\mathrm{SO_4}} x_3 = c_{\mathrm{mix}}^{\mathrm{SO_4}} \\ c_1^{\mathrm{Cl}} x_1 + c_2^{\mathrm{Cl}} x_2 + c_3^{\mathrm{Cl}} x_3 = c_{\mathrm{mix}}^{\mathrm{Cl}} \end{cases} \quad (11)$$

where $x_1$, $x_2$, and $x_3$ are the proportions of the Shanxi Formation sandstone aquifer water, the Taiyuan Formation limestone aquifer water, and the Ordovician limestone aquifer water of total panel inflow, respectively; $c_k^{\mathrm{SO_4}}$ and $c_k^{\mathrm{Cl}}$ are the SO42- and Cl- concentrations of the $k$th water source type; and $c_{\mathrm{mix}}^{\mathrm{SO_4}}$ and $c_{\mathrm{mix}}^{\mathrm{Cl}}$ are the corresponding concentrations of the panel inflow water.
Solving Equation (11) yields $x_1 = 0.0042$, $x_2 = 0.3325$, and $x_3 = 0.6633$. The results show that the Ordovician limestone aquifer water accounts for 66.33% of total water inflow, while the Taiyuan Formation limestone aquifer and the Shanxi Formation sandstone aquifer water account for 33.25% and 0.42% of water inflow, respectively. This is basically consistent with the results of the groundwater source identification model based on the AHP-entropy weight method and the SPA theory, which supports the reliability of the model.
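The mixing calculation amounts to solving a small linear system. The sketch below is a hedged illustration, not the authors' computation: it assumes the balance consists of a closure condition (the three proportions sum to 1) plus one conservation equation each for SO42- and Cl-, and the concentrations used are made-up numbers chosen so that the answer is known in advance, not values from the paper's tables. The function name `mixing_proportions` is hypothetical.

```python
import numpy as np

def mixing_proportions(c_so4, c_cl, mix_so4, mix_cl):
    """Solve for the proportions of three end-member waters in a mixture.

    c_so4, c_cl : SO4 and Cl concentrations of the three end members
    mix_so4, mix_cl : SO4 and Cl concentrations of the mixed water
    """
    A = np.array([np.ones(3), c_so4, c_cl], dtype=float)
    b = np.array([1.0, mix_so4, mix_cl], dtype=float)
    return np.linalg.solve(A, b)  # raises LinAlgError if A is singular

# Hypothetical end members (mg/L); the mixture is built from known
# proportions 0.2, 0.3, 0.5 so the solver should recover them.
c_so4 = [100.0, 300.0, 600.0]
c_cl = [50.0, 200.0, 150.0]
proportions = mixing_proportions(c_so4, c_cl, mix_so4=410.0, mix_cl=145.0)
```

With real data, a negative proportion or one greater than 1 would indicate that the two-ion conservative mixing assumption does not hold for that sample.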
Specific water inrush controlling measures should be formulated for the three aquifers due to their varying water abundances. Although the Shanxi Formation sandstone aquifer is the closest to the II2 coal seam, its maximum unit water inflow is only 0.07 L/s·m. Considering its poor water abundance, its threat of water disaster could generally be eliminated through measures such as temporary water bin construction in low-lying areas and water pump installation. In contrast, the Taiyuan Formation and the Ordovician limestone aquifer possess abundant water, with their maximum unit water inflows being 2.87 L/s·m and 3.56 L/s·m, respectively. As a result, their water disasters are often controlled by measures such as aquiclude grouting reinforcement and water-resisting coal pillar retention.
Accordingly, the Ordovician limestone aquifer water was the major source of inflow of the No. 21304 panel. In the follow-up production, the area of the No. 21304 panel that is close to the F20 fault should be regarded as the focus of attention. In this area, safer and stricter mining methods, as well as more reliable and effective prevention and control measures, such as aquiclude reinforcement by grouting and water-resisting coal pillar retention, should be adopted to ensure production safety in the panel.
5. Conclusions
The identification of mine groundwater sources is a complex problem. Under the influence of multiple factors, it is difficult to identify groundwater sources accurately during coal mining. Quick and accurate identification of the mine groundwater source is of great significance to the prevention and control of water inrush accidents in coal mines. Thus, a reliable method is urgently needed to solve this problem.
In this study, the hydrochemical characteristics of water sources from different aquifers in Chengjiao coal mine were determined by means of the Piper trilinear diagram and cluster analysis, and the ions in the groundwater were analyzed by combining the AHP-entropy weight method and the SPA theory. On the basis of the analysis results, a mine groundwater source identification model was established, and the identification reliability of the model was verified. The verification results showed that the model based on the AHP-entropy weight method and the SPA theory performs better in identifying the groundwater source than those established using the FDA and BDA methods.
After systematic analysis with the established model, the source of water inflow in the No. 21304 panel was identified, and the primary source was revealed. Analysis of the identification results reveals that the area close to the F20 fault tends to receive water from the Ordovician limestone aquifer and the Taiyuan Formation limestone aquifer, so it should be regarded as the key area for mine water inrush prevention and control.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
This work was supported by the Chengjiao Coal Mine in Henan Province, China. This research was financially supported by the National Key Research and Development Program of China (No. 2019YFC1805400), the National Natural Science Foundation of China (42172272), the Fundamental Research Funds for the Central Universities (No. 2020ZDPY0201), and the National Science Foundation of China (No. U1710253).
Supplementary Materials
The tables of 47 water samples from different aquifers, the classification function coefficients of the FDA method, and the classification function coefficients of the BDA method are in the supplementary files.