Abstract
A reliable method for recognizing the degree of infiltration of a tunnel lining has been difficult to establish. To address this problem, we propose a recognition method based on a deep convolutional neural network. In laboratory tests, we prepare cement mortar specimens with different saturation levels to simulate different degrees of infiltration of tunnel concrete linings and establish an infrared thermal image data set covering these degrees. The data set is then used to train a Faster R-CNN+ResNet101 network, which yields the recognition model. The experiments show that the model can select cement mortar specimens with different degrees of infiltration using an accurately minimized rectangular outer frame. These results indicate that the classification recognition model for tunnel concrete lining infiltration established by the indoor experimental method has high recognition accuracy.
1. Introduction
With the rapid development of China’s transportation industry, the total operational mileage of rail transit in China is estimated to reach 8565 km by the end of 2020, a significant portion of which runs through underground or above-ground tunnels. However, during operation, many tunnels suffer, to varying degrees, from leakage, lining cracking, and voids [1]. In the field of geotechnical engineering, many disasters are caused by water [2, 3]: water not only reduces the strength of the lining and hence the stability of the tunnel lining structure but also causes traffic accidents through slick surfaces (hydroplaning) and ice on pavements [4]. Statistically, 28.4% of railway tunnels and 30% of highway tunnels in China have serious water leakage, and approximately 30% of urban subway tunnels have water leakage damage. How to detect tunnel leakage is therefore a problem that China needs to address in the coming decades.
At present, the main tunnel leakage detection method is manual inspection. This is mainly based on the results of visual observations, which are greatly influenced by human factors and have problems of low efficiency and poor accuracy. Furthermore, because many water leakages are at the top of the tunnel and the waist of the arch, examiners need to stand on a lifting platform and cooperate with other departments, which costs manpower and presents a safety risk with traffic below [5, 6].
Distributed optical-fiber temperature sensor (DOFTS), ground-penetrating radar (GPR), and infrared thermography (IRT) are nondestructive testing methods that have been used in geotechnical engineering in recent years [7–10]. By monitoring whether the temperature field around a monitoring point changes, DOFTS can be used to determine whether there is leakage around that point [11, 12]. However, this result is only qualitative: the degree and area of leakage cannot be obtained, and arranging the optical fibers is difficult and costly [13–15]. GPR is an electromagnetic technology that uses antennas to transmit and receive high-frequency electromagnetic waves in order to detect the characteristics and distribution of underground structures [16]. When it is used for leakage detection, the presence and extent of leakage in the tunnel lining can be judged by comparing the signal reflection intensity in the radar spectrum [17]. However, GPR is suitable only when there is a large amount of leakage behind the lining, its detection efficiency is low in practical applications, and it cannot meet the requirements of rapid leakage detection for cable tunnels [18]. As a fast, quantitative, and nondestructive testing technology, IRT has been widely used [19, 20]. In tunnel leakage detection, the temperature difference observed between leakage and nonleakage areas in infrared images is used to evaluate leakage [21]. Asakura and Kojima used a vehicle-borne infrared camera for preliminary detection of water leakage from a tunnel lining, which confirmed the feasibility of using IRT to detect tunnel lining water leakage. However, the key problem in applying IRT to tunnel detection is how to recognize objects in the images.
Traditional image processing methods include image preprocessing, image segmentation, feature extraction, object recognition, and structural analysis [22]. However, feature construction and feature selection are difficult with these methods, and the results need to be combined with manual correction.
Recently, the use of complex deep learning (DL) models has received significant attention [23]. DL is a branch of machine learning based on representation learning of data. Unlike traditional machine learning, deep learning models can construct complex high-level features from low-level features, which automates the feature construction process for the problem at hand and makes it possible to process large amounts of detection data effectively [24, 25]. At present, in the field of civil disease detection, some research based on deep learning has been carried out both in China and internationally. Hoang et al. [26] established and compared the performance of two intelligent approaches for automatic recognition of pavement cracks. The first model relies on the Sobel and Canny edge detection algorithms; the second model is built on a convolutional neural network (CNN). Experimental results show that the CNN-based model achieves better prediction performance than the method based on the edge detection algorithms. Kumar et al. [27] point out that closed-circuit television (CCTV) has been commonly used for sewer pipe inspection, but that this process requires a large amount of image preprocessing and, in certain cases, the design of a complex feature extractor. Such feature extraction uses preengineered features for classifying images, which leads to poor generalization. To address this problem, they proposed a method based on a deep convolutional neural network to detect and classify defects from CCTV inspections and achieved better prediction performance. Dung and Anh [28] proposed a crack detection method based on a deep fully convolutional network (FCN) for semantic segmentation of concrete crack images; the FCN achieves about 90% average precision. Cha and Choi [29] used a deep CNN to detect concrete cracks. Under different conditions (such as strong spots, shadows, and very thin cracks), the recognition rate reached 98%. Compared with the traditional Canny and Sobel edge-detection methods, their result demonstrated that the deep learning method can better solve the problem of concrete crack identification. Chen and Jahanshahi [30] proposed a deep learning model based on a CNN and a naive Bayesian data fusion scheme (called NB-CNN). The CNN was used to detect concrete cracks in each video frame, and the naive Bayesian decision effectively eliminated errors. This framework achieved a 98.3% hit rate. Xue and Li [31] used a fully convolutional network (FCN) model for classification. Compared with a traditional method, their model is very fast and efficient, allowing automatic intelligent classification and detection of tunnel lining defects.
Tunnel disease detection based on deep learning is advantageous because of its automatic feature construction, fast recognition speed, and high accuracy. According to the Technical Specification of Maintenance for Highway Tunnel JTG H12-2015, tunnel leakage diseases are divided into no leakage, infiltration, dripping, gushing, and spraying water. At present, most research on the detection of tunnel water leakage diseases directly analyzes an image of the scene and uses the grey-white binary method to detect the leakage and the leakage area. Quantitative detection of the degree of leakage (infiltration) in a tunnel lining has not yet been reported. Therefore, this paper proposes a recognition method in which the infrared radiation characteristics of concrete samples with different degrees of saturation are obtained by indoor experiments. The recognition model is established using a deep learning method (the Faster R-CNN+ResNet101 network model), and the results show the feasibility and validity of the recognition method.
2. Recognition Theory and Model
2.1. Classification Recognition Method for Infiltration Degree of Concrete Lining
There is a clear nonlinear relationship between the radiation intensity of an infrared thermal image and the degree of infiltration (i.e., saturation) of a tunnel concrete lining, and the infrared radiation differs for different degrees of infiltration. To study this relationship, the data set is composed of infrared thermal images and infiltration degrees. It comprises three parts: the set of collected infrared thermal images, each of which is the infrared thermal image of a certain infiltration degree; the set of characteristic matrices of the infrared thermal images (such as grayscale, geometric features, and texture features), in which fk is a characteristic matrix; and the infiltration degrees of the concrete lining. The entire infiltration interval is divided into subintervals, each of which corresponds to one infiltration degree. Together, the subintervals cover all possible values of the degree of infiltration for a tunnel concrete lining. Expressing the saturation extent in this way provides completeness and feasibility for the subsequent identification tasks.
In order to establish the recognition model, firstly, infrared thermal images of the tunnel lining under different infiltration degrees are collected, and then the features of the infrared thermal images are automatically extracted using a deep learning algorithm. The correlation between infrared image radiation characteristics and the infiltration degree is established, that is, the classification recognition of the degree of tunnel lining infiltration based on infrared radiation characteristics is realized (Figure 1). Using this model, the infrared image of the unknown infiltration degree can be acquired in real time and then inputted to recognize the actual degree of infiltration in the tunnel lining. The degree of infiltration of the tunnel lining can be recognized rapidly and automatically, so the extent of infiltration damage in the tunnel can be evaluated.

2.2. Selection of Recognition Model Based on Deep Learning
Since Krizhevsky et al. proposed the AlexNet network recognition model [32], deep learning has been widely used in computer vision for image classification, object detection, image segmentation, visual question answering, image captioning, image generation, and other tasks. The classification and recognition of the degree of infiltration of a tunnel concrete lining is an object detection problem in computer vision, so this paper uses an image detection network model to study the classification of the degree of infiltration of tunnel concrete linings.
At present, image detection algorithms can be divided into two categories: (1) two-stage detection algorithms, which divide the detection problem into two stages, and (2) one-stage detection algorithms, which directly generate the class probability and position coordinate values of objects in a single stage. In two-stage detection algorithms, region proposals are first generated and then classified (generally, location refinement is also needed). Prototypical two-stage detection algorithms are those in the R-CNN series, such as R-CNN, Fast R-CNN, and Faster R-CNN [33–35]. In contrast, one-stage detection algorithms do not need to generate region proposals. Typical one-stage detection algorithms are YOLO and SSD [36, 37]. Generally speaking, two-stage algorithms are more accurate but cost more time and space, while one-stage algorithms are preferable when the time cost (speed) and space cost (memory consumption) matter most.
Because this paper is a preliminary study, detection accuracy is the first consideration, and detection speed and memory consumption come second. Moreover, in tunnel field detection it is not necessary to obtain the results in real time, so infrared images can be collected first and processed afterwards. This study therefore chose a two-stage detection algorithm (one-stage detection algorithms also have a poor ability to detect small targets, which is not conducive to future extended applications). Compared with the earlier models in the R-CNN series, Faster R-CNN shares the convolution layer computation (Figure 2) in feature extraction. It integrates the Region Proposal Network (RPN) layer, which replaces the off-line Selective Search (SS) module, greatly reducing the time consumed by image recognition and eliminating the performance bottleneck. Additionally, Faster R-CNN implements an end-to-end training mode.

In the development of network models, researchers generally believe that the deeper the network is, the more abstract the extracted features are and the more semantic information they carry, and therefore the higher the accuracy is. However, as the network deepens, the accuracy on the training set decreases. To solve this problem, He et al. proposed the residual network [37] and achieved a 3.57% top-5 error in the ImageNet challenge in 2015. It differs from an ordinary network in that it adds a shortcut connection, that is, an identity mapping, so that the deep network can learn features faster and more easily. To ensure both the detection accuracy and the speed of the model, the Faster R-CNN+ResNet101 network model was adopted after the performance results of various tests were compared (Table 1). The model was developed with TensorFlow, the open-source framework developed by Google.
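The shortcut connection described above can be illustrated with a minimal sketch. It is not the ResNet101 implementation used in the study: the function names `residual_block`, `zero_residual`, and `small_residual` and the toy feature vector are hypothetical, and plain Python lists stand in for feature maps. The block outputs F(x) + x, so when the learned residual F is zero the block reduces to the identity.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn computes the residual mapping F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("an identity shortcut needs matching dimensions")
    # Element-wise sum of the residual branch and the shortcut (identity) branch.
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x):
    """Hypothetical residual branch that has learned nothing yet."""
    return [0.0 for _ in x]


def small_residual(x):
    """Hypothetical residual branch that applies a small correction."""
    return [0.1 * xi for xi in x]


features = [1.0, -2.0, 3.0]
identity_out = residual_block(features, zero_residual)   # equals the input
adjusted_out = residual_block(features, small_residual)  # input plus a correction
```

Because the shortcut passes the input through unchanged, a deeper network only has to learn the correction on top of what the shallower layers already provide.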
3. Acquisition and Establishment of Image Data Set
The establishment of the image data set is the first step of deep learning recognition and plays an important role in model construction. A sufficient image data set can improve the abstract expression ability of the network model and can increase the robustness of the network model to the data and avoid overfitting of the model [38].
3.1. Acquisition of Image Data Set
As cement mortar is similar to tunnel concrete lining material and has a similar infrared emissivity, the degree of infiltration of the tunnel lining was simulated by preparing several cement mortar samples with different saturation levels. Acquisition of the image data set involved three laboratory tests: the cement mortar specimen preparation test, the preparation of cement mortar specimens with different saturation levels, and the infrared image acquisition test.
3.1.1. Cement Mortar Specimen Preparation Test
In order to best simulate the surface of the tunnel concrete lining, for the cement mortar, the proportion of cement to sand to water was set to 1 : 3.19 : 0.6 by weight. The preparation steps were as follows: (1) mixing the cement mortar evenly according to their weight ratios and (2) putting the mixed mortar into the mold and curing it in the concrete steaming room for 28 days. A total of 36 specimens were made. The specimen size was .
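As a worked example of the 1 : 3.19 : 0.6 weight ratio, the following sketch splits a batch of mortar into its component masses. The helper `batch_masses` and the 10 kg batch size are assumptions made for illustration; the source does not state the batch size used.

```python
# Cement : sand : water by weight, as stated in the text.
MIX_RATIO = {"cement": 1.0, "sand": 3.19, "water": 0.6}


def batch_masses(total_mass_kg, ratio=MIX_RATIO):
    """Split a total batch mass into component masses according to the ratio."""
    parts = sum(ratio.values())  # 1 + 3.19 + 0.6 = 4.79 parts in total
    return {name: total_mass_kg * share / parts for name, share in ratio.items()}


# Hypothetical 10 kg batch.
masses = batch_masses(10.0)
for name, mass in masses.items():
    print(f"{name}: {mass:.3f} kg")
```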
3.1.2. Preparation of Cement Mortar Specimens with Different Saturation Levels
Considering the practical application value of tunnel infiltration detection, this paper divides the degree of tunnel infiltration into four levels: dry, semidry, semiwet, and wet, with corresponding saturation levels of 0–5%, 5–60%, 60–90%, and 90–100%, respectively. The 36 specimens were divided into nine groups, with four pieces in each group numbered 1, 2, 3, and 4 corresponding, respectively, to the four saturation levels.
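The four-level division can be written as a small lookup. This is a sketch only: the function name `infiltration_level` is hypothetical, and because the text does not say to which level a boundary value such as exactly 5% belongs, the code assumes that a boundary value falls into the higher level.

```python
from bisect import bisect_right

# Upper bounds (in percent) separating the four levels given in the text.
UPPER_BOUNDS = [5.0, 60.0, 90.0]
LEVELS = ["dry", "semidry", "semiwet", "wet"]


def infiltration_level(saturation_percent):
    """Map a saturation percentage to one of the four infiltration levels.

    Assumption: a value lying exactly on a boundary (5, 60, or 90)
    is assigned to the higher level.
    """
    if not 0.0 <= saturation_percent <= 100.0:
        raise ValueError("saturation must lie between 0 and 100 percent")
    return LEVELS[bisect_right(UPPER_BOUNDS, saturation_percent)]


for value in (3.0, 40.0, 75.0, 98.0):
    print(value, infiltration_level(value))
```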
The cement mortar specimens with different saturation levels were prepared in three steps. (1) The cement mortar test pieces were placed in the vacuum drying box (Figure 3(a); State Key Laboratory of Deep Geotechnical Mechanics and Underground Engineering, Beijing, China) for 24 h. They were then taken out, kept at room temperature (22°C) for 12 h, and weighed on the balance, and the weight was recorded. Sample no. 1 of each group was placed into a sealed bag. (2) Sample nos. 2, 3, and 4 were placed into a water tank (Figure 3(b); Keheng Instrument Equipment Co., Ltd., Shanghai, China) and boiled at an electronically controlled constant temperature for 8 h so as to reach a state of saturation. They were taken out after cooling and weighed on a balance once the free water on the surface had disappeared. Sample no. 4 of each group was placed into a sealed bag. (3) The remaining samples, nos. 2 and 3, were placed into the gaseous water adsorption intelligent test system for deep soft rock (Figure 3(c); State Key Laboratory of Deep Geotechnical Mechanics and Underground Engineering, Beijing, China). An evaporation experiment was conducted to prepare samples with saturation levels of 5–60% and 60–90%, and the samples were then placed into sealed bags (Table 2).
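The weighings in steps (1) and (2) give a dry mass and a saturated mass for each specimen, from which an intermediate saturation can be estimated gravimetrically. The sketch below assumes the common definition of saturation as the absorbed water mass divided by the water mass at full saturation; the source does not state its formula, and the function names and the example masses are hypothetical.

```python
def saturation_percent(mass_g, dry_mass_g, saturated_mass_g):
    """Estimate saturation from three weighings (assumed gravimetric definition)."""
    capacity = saturated_mass_g - dry_mass_g  # water held at full saturation
    if capacity <= 0:
        raise ValueError("saturated mass must exceed dry mass")
    return 100.0 * (mass_g - dry_mass_g) / capacity


def target_mass(target_saturation_percent, dry_mass_g, saturated_mass_g):
    """Mass at which evaporation should stop to reach a target saturation."""
    capacity = saturated_mass_g - dry_mass_g
    return dry_mass_g + target_saturation_percent / 100.0 * capacity


# Hypothetical specimen: 200 g dry, 220 g saturated.
current = saturation_percent(205.0, 200.0, 220.0)
stop_at = target_mass(60.0, 200.0, 220.0)
print(f"current saturation: {current:.1f}%, stop evaporating at {stop_at:.1f} g")
```

Under this assumption, the evaporation experiment in step (3) amounts to weighing the specimen repeatedly until its mass falls into the range that corresponds to the desired saturation interval.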

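Figure 3: (a) vacuum drying box; (b) constant-temperature water tank; (c) gaseous water adsorption intelligent test system for deep soft rock.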
3.1.3. Infrared Image Acquisition Experiment
Tau 640, an uncooled long-wave infrared thermal imager manufactured by FLIR (United States), was used to collect the infrared thermal images. It contains a highly sensitive microthermal infrared sensor with a vanadium-oxide focal plane array of 17 μm pixels and can generate high-definition infrared thermal images. To improve the infrared image acquisition, the specimens were imaged one group of four at a time (nine groups in total). Images of the front, back, left, right, top, and bottom sides of the specimens were collected twice per side at a time interval of 30 s, giving twelve infrared images per group and 108 infrared images in all. To improve the generalization ability of the data, one each of the no. 1, 2, 3, and 4 specimens was randomly selected from the 36 specimens and reassembled into a new group for imaging; three such groups were formed, yielding a further 36 infrared images. A total of 144 infrared images were collected over the whole experiment. The image size is pixels. The experimental acquisition system is shown in Figure 4.
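The image counts stated above follow from simple arithmetic, which the short sketch below reproduces. The variable names are illustrative only; all numbers are taken from the text.

```python
SIDES = ("front", "back", "left", "right", "top", "bottom")
SHOTS_PER_SIDE = 2       # each side is imaged twice, 30 s apart
REGULAR_GROUPS = 9       # the nine original groups of four specimens
MIXED_GROUPS = 3         # randomly reassembled groups

images_per_group = len(SIDES) * SHOTS_PER_SIDE
regular_images = REGULAR_GROUPS * images_per_group
mixed_images = MIXED_GROUPS * images_per_group
total_images = regular_images + mixed_images

print(images_per_group, regular_images, mixed_images, total_images)
```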

3.2. Specimen Labelling
The 144 images collected in the experiment were screened to remove unclear images, which were probably caused by poor focusing of the infrared camera, leaving 121 infrared images. Because a supervised learning method was used for model training, the images had to be labelled to define the target content of each image. LabelImg was used to set the image labels: 0–5% was labelled as I, 5–60% as II, 60–90% as III, and 90–100% as IV, representing the dry, semidry, semiwet, and wet infiltration degrees, respectively (Figure 5). Before being input to the deep network for training, each image was cut into four pictures according to the annotated rectangular frames, forming a data set of 484 image samples. As the sample data set was small, it was divided into a training set and a verification set in an approximate proportion of 7 : 3, giving 339 images in the training set and 145 images in the verification set.
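The sample counts and the approximate 7 : 3 split can be reproduced with the sketch below. The source does not describe how samples were assigned to the two sets, so the seeded random shuffle, the function name `split_dataset`, and the `(image_id, crop_id)` identifiers are assumptions made for illustration.

```python
import random


def split_dataset(samples, train_fraction=0.7, seed=0):
    """Shuffle samples reproducibly and split them into training and verification sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 121 screened images, each cut into 4 pictures (one per annotated specimen).
crops = [(image_id, crop_id) for image_id in range(121) for crop_id in range(4)]
train_set, verification_set = split_dataset(crops)

print(len(crops), len(train_set), len(verification_set))
```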

4. Training and Effectiveness of the Recognition Model Based on Deep Learning
4.1. Training Process
The training phase is the process in which the weight parameters of the deep convolutional network model are determined automatically. Training was run on a computer with an Intel(R) Xeon(R) Bronze 3104 @ 1.70 GHz 6-core, 12-thread processor, 32 GB of RAM, and three NVIDIA GeForce RTX 2080 Ti GPUs. The training set was input into the deep learning network model, and training stopped after 300,000 iterations. The whole process took 22 hours (using one RTX 2080 Ti), and the learning rate was set to 0.0003. A loss function measures the difference between the predicted value and the actual data. The cross-entropy loss function was selected for training, and the results are presented in Figure 6. Figure 6(a) presents the bounding box classification loss, which mainly reflects how accurately the target category (I, II, III, or IV) of the extracted region is judged; the smaller the value, the higher the recognition accuracy. Figure 6(a) shows that the classification loss converges after approximately 50,000 steps at approximately 0.0707. Figure 6(b) presents the bounding box localization loss, which mainly reflects the accuracy of the minimized rectangular outer frame in the picture; the smaller the value, the more accurately the box encloses the target. Figure 6(b) shows that approximately 100,000 steps were required for convergence, at a localization loss of approximately 0.4018. Figure 6(c) shows the total loss, which is the sum of all the loss terms; it converges after approximately 200,000 steps at approximately 0.6129. In addition, to control the influence of exploding and vanishing gradients during training, a gradient threshold was set so that the total loss could be reduced to a satisfactory level.
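Two ingredients of the training process, the cross-entropy classification loss and the gradient threshold, can be sketched in a few lines. This is a pure-Python illustration rather than the TensorFlow code used in the study. The source does not say whether the threshold acts on the gradient norm or on individual values, so clipping by norm is an assumption, and the function names and example numbers are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Cross-entropy loss for one sample over the four classes I to IV."""
    return -math.log(softmax(logits)[true_index])


def clip_by_norm(gradient, max_norm):
    """Rescale the gradient if its L2 norm exceeds the threshold (assumed scheme)."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= max_norm:
        return list(gradient)
    scale = max_norm / norm
    return [g * scale for g in gradient]


uncertain_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], true_index=2)  # ln(4)
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0], true_index=2)  # close to 0
clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)                    # norm 5 scaled to 1
```

A confident correct prediction gives a loss near zero, which is why a smaller classification loss in Figure 6(a) corresponds to higher recognition accuracy.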

Figure 6: (a) bounding box classification loss; (b) bounding box localization loss; (c) total loss.
4.2. Analysis of Model Effectiveness
4.2.1. Detection Accuracy
Detection accuracy is the critical index for measuring the quality of a trained model. To evaluate the trained model, it was tested on the verification set (145 infrared images), and the test results were analyzed. The specific test results are shown in Figure 7, where I, II, III, and IV indicate the infiltration degree and 99% is the recognition accuracy at an intersection over union (IoU) of 0.5 (IoU is the area of the overlap of two regions divided by the area of their union). It can be seen that the specimens are all enclosed by accurately minimized rectangular outer frames and that the infiltration degrees of the different specimens are correctly identified. The detection accuracy is evaluated in terms of the mean average precision (mAP). Here, P stands for precision; AP is the average precision of a single class label, that is, the average of the maximum precision at each recall level; and mAP is the mean of the AP values over all class labels. At the three IoU thresholds considered, the mAP values are 0.99, 0.99, and 0.95, respectively. The recognition effectiveness is therefore very good, which shows that the deep-learning-based recognition method for classifying the degree of tunnel concrete infiltration is feasible and that its recognition accuracy is high.
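The two metrics defined above can be made concrete with a short sketch. The exact evaluation protocol used in the study is not stated, so the all-point interpolated AP below is one common variant chosen as an assumption; the function names, the `(x_min, y_min, x_max, y_max)` box format, and the example detections are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, num_ground_truth):
    """AP for one class from (confidence, is_true_positive) pairs.

    A detection counts as a true positive when its IoU with a ground-truth
    box reaches the chosen threshold; that matching is assumed done already.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls, true_positives = [], [], 0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        true_positives += 1 if is_tp else 0
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap area 1, union area 7
ap_example = average_precision([(0.9, True), (0.8, False), (0.7, True)], num_ground_truth=2)
```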

Figure 7: Detection results on the verification set, panels (a)–(d).
4.2.2. Detection Speed
Another important performance index is the detection speed, which is key to real-time detection. The detection speed is commonly evaluated by the time required to process one picture. Here, during model testing, annotating the bounding box takes 270 ms and classifying the result takes 50 ms, giving a total of 320 ms per picture. Note that this total detection time was measured on the computer configuration described earlier and may vary for different processor specifications.
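As a worked example, the per-picture time can be converted into a throughput and into the time needed for a batch of pictures. The sketch assumes that pictures are processed one after another with no overlap between stages; the helper `batch_seconds` is hypothetical.

```python
BOX_ANNOTATION_MS = 270   # bounding box annotation time from the text
CLASSIFICATION_MS = 50    # classification time from the text

total_ms = BOX_ANNOTATION_MS + CLASSIFICATION_MS
images_per_second = 1000 / total_ms


def batch_seconds(num_images):
    """Time to process a batch sequentially at the measured per-picture cost."""
    return num_images * total_ms / 1000


print(total_ms, images_per_second, batch_seconds(145))
```

At this rate the 145 verification pictures would take roughly 46 s, which is compatible with the collect-first, process-later workflow described for tunnel field detection.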
4.2.3. Robustness Analysis
Robustness refers to the ability of a system to maintain some level of performance under certain parameter perturbations (e.g., in structure or size). In deep learning, robustness is often used to assess the quality of the trained model. In this test, the robustness of the trained model was tested using transformed versions of the original data set images. The test results show that the model can learn the geometric features, texture features, and local features (such as distortion, extrusion, size, and edge transformation) and has strong adaptability (see Figures 8(a)–8(d)).

Figure 8: Robustness test results on transformed images, panels (a)–(d).
4.2.4. Model Comparison
Different CNN network models based on Faster R-CNN were selected for comparison, and the results are shown in Table 3. The results demonstrate that the two models perform equally at two of the IoU thresholds, while the difference between their mAPs at the third is large. This shows that Faster R-CNN+ResNet101 is more accurate in localization and detection than the other model. Faster R-CNN+ResNet101 is also significantly faster: it recognizes a single image in 320 ms, much less than the Faster R-CNN+Inception2 network model requires. In summary, the Faster R-CNN+ResNet101 network model selected in this paper performs better in both detection accuracy and speed.
5. Discussion
In this paper, a classification recognition method for the degree of tunnel lining infiltration was established via laboratory experiments. The recognition model was used to identify cement mortar specimens with different degrees of infiltration, and the good results show that the method is effective. However, this study has some limitations. Although the experimental method ensures the quality of the training data set, the laboratory environment is relatively isolated and the test conditions are relatively stable, which is quite different from the complex environment of a tunnel site. In a tunnel, wind, light, and lining roughness will affect the acquisition of the infrared image. Therefore, the recognition model needs to be developed further under more complex laboratory conditions that include wind, ambient temperature, lining roughness, and so on. In addition, in the field of tunnel detection, people are more concerned about whether the tunnel leaks than about its infiltration grade, so the method may also be applied to determining the water content of rock in the geotechnical field. The strength of rock is closely related to its water content, and many geological disasters are caused by water. Engineers often want to know the water content of rock in order to judge its strength, so applying the method to this problem may be even more significant.
6. Conclusion
In this paper, cement mortar specimens with different levels of saturation were made to simulate different degrees of infiltration of tunnel concrete lining via laboratory experiments, and a data set of infrared thermal images with different degrees of infiltration was established. Then, based on a deep learning method, the Faster R-CNN+ResNet101 network was used to train the data set, the recognition model was established, and its effectiveness was tested. The following conclusions were obtained.
(i) The recognition model established using the deep learning method displayed good recognition ability for cement mortar specimens with different degrees of infiltration. The specimens with different infiltration extents were selected by accurately minimized rectangular outer frames. The mean average precisions at the three intersection over union (IoU) thresholds were 1, 1, and 0.948, respectively. The classification recognition method for infiltration of tunnel concrete lining is thus shown to be feasible and accurate.
(ii) The recognition model detects a single picture in 320 ms (processed on an RTX 2080 Ti card). With an improved processor configuration, it is possible to realize on-site real-time detection.
(iii) The recognition model can learn geometric features, texture features, and local features (such as distortion, extrusion, size, and edge transformation) of the image and has strong adaptability. However, it is sensitive to image color features (such as brightness and contrast) and adapts poorly to them. Further work should consider increasing the diversity of the image data, as well as changing the annotation methods and forms, in order to improve the robustness and adaptability of the recognition model to the environment.
(iv) The Faster R-CNN+ResNet101 model has significant advantages over Faster R-CNN+Inception2, with significant improvements in detection accuracy and speed.
Data Availability
The data are available and explained in this article. Readers can access the data supporting the conclusions of this study.
Conflicts of Interest
The authors have no conflicts of interest to declare.
Acknowledgments
The authors offer many thanks for the support of NSFC for this research. This work was supported by the National Natural Science Foundation for Young Scientists of China (grant no. 51604276).