Abstract
In recent years, accidents caused by pipeline corrosion and groundwater infiltration have caused irreparable losses of life and property. It is therefore important to install sensors along pipelines for real-time monitoring and alarm. To address distributed optical fiber sensing in real-time pipeline monitoring, this study proposes a pipeline invasion detection method based on sliding window outlier analysis. By optimizing the length of the sliding window and the length of the abnormal state diagnosis window, the false alarm rate of the system is greatly reduced. The method uses only amplitude data to identify abnormal pipeline conditions. A simulation experiment of pipeline groundwater invasion monitoring was carried out, and the results show that the algorithm generates no false positives when the pipeline is in good condition. Once the pipeline is invaded, the algorithm quickly determines the location, and the error range is stable at 0.5 m. The method is an unsupervised artificial intelligence approach to pipeline invasion detection and has good application prospects in the monitoring of abnormal pipeline conditions.
1. Introduction
As an effective, economical, and environmentally friendly transportation method, pipeline transportation has unique advantages in transporting resources such as oil and natural gas. At the same time, pipeline safety and regulatory requirements are becoming increasingly strict, and reliable intrusion detection systems are currently a focus of attention [1, 2]. Among the existing pipeline abnormal state monitoring methods, the stress wave method [3] has a short application distance and cannot be used in complex underground pipeline networks; the negative pressure wave method [4] overcomes the limitation of pipeline length but is insensitive to small-flow leakage and performs poorly in monitoring abnormal conditions such as invasion; ultrasonic detection [5] places high demands on operators and cannot monitor in real time. Real-time monitoring by the pressure gradient method [6] and the mass flow balance method [7] is accompanied by a very high false alarm rate.
In recent years, owing to its high sensitivity, strong immunity to electromagnetic interference, and good corrosion resistance, distributed optical fiber sensing has shown irreplaceable advantages in long-distance continuous sensing [8, 9], and researchers have begun to use it for abnormal state monitoring of pipelines. In 2001, Vogel et al. [10] used Raman scattering distributed optical fiber sensing to monitor oil and gas pipeline leakage and identified leaks by monitoring temperature changes around the pipeline, but environmental factors strongly affected the results. In 2012, Jia et al. [11] used Brillouin scattering distributed optical fiber sensing to monitor oil and gas pipeline leakage and identified leaks by monitoring the stress change during leakage. This method eliminated the influence of environmental factors but was supported only by simulated data and lacked real data. In 2018, Stajanca et al. [12] used a distributed optical fiber acoustic sensing (DAS) system based on Rayleigh scattering to monitor gas pipeline leakage and proposed a method based on the DAS signal in the frequency domain to identify and locate the leak. However, the experimental results were strongly influenced by external factors, the positioning accuracy was low, and the fiber consumption was large [13]. Based on the above analysis, this study proposes a new algorithm for monitoring the abnormal state of pipelines, namely a method for locating and monitoring abnormal pipeline states based on sliding window outlier analysis.
When the pipeline data are analyzed as a whole, the change curve of the amplitude data is found to be nonstationary [14, 15] and therefore does not satisfy the Gaussian distribution [16, 17], so the amplitude data of the pipeline cannot be analyzed directly. However, this result arises mainly from treating the entire pipeline record as a whole; on a shorter time scale, the amplitude monitoring data of the pipeline are stationary. Once a pipeline intrusion occurs, the amplitude data at the intrusion point become significantly abnormal. Therefore, this study proposes a sliding window outlier analysis method. The algorithm first segments the amplitude data and then diagnoses the abnormal state of the pipeline in real time using the stationarity of the amplitude data on the short-term time scale.
2. Research on Pipeline Invasion Identification Method Based on Sliding Window Outlier Analysis
2.1. Distributed Optical Fiber Monitoring Spatiotemporal Big Data
As shown in Figure 1, a pipeline invasion monitoring system based on distributed optical fiber amplitude measurement usually consists of three parts: a sensor system, a DAS system, and an invasion diagnosis system. The sensor system consists of distributed fiber-optic acoustic sensors, usually placed under the pipeline. The DAS system emits pulsed laser light into the fiber-optic sensors. When an external vibration acts on a certain position of the transmission fiber, the external stress or strain at that position stretches the fiber and changes its refractive index, which in turn changes the phase of the backscattered light during transmission, so the amplitude measurement is realized by detecting the phase change. The invasion diagnosis system locates and identifies the abnormal state of the pipeline by analyzing the amplitude data in real time.

The distance between the measuring points of the DAS system must be smaller than the spatial resolution so that any position on the pipeline can be monitored. In addition, a higher sampling frequency improves the real-time performance of the system. The spatiotemporal amplitude monitoring data have the volume characteristic of big data, the high sampling frequency of continuous monitoring matches the velocity characteristic, and the distribution of the amplitude data in time and space reflects the variety characteristic [18].
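The spatiotemporal amplitude data obtained by the distributed optical fiber monitoring system for underground pipeline groundwater invasion can be expressed in the form of a matrix:

$$A_{m \times n} = \begin{bmatrix} a(x_1, t_1) & a(x_1, t_2) & \cdots & a(x_1, t_n) \\ a(x_2, t_1) & a(x_2, t_2) & \cdots & a(x_2, t_n) \\ \vdots & \vdots & \ddots & \vdots \\ a(x_m, t_1) & a(x_m, t_2) & \cdots & a(x_m, t_n) \end{bmatrix} \quad (1)$$

where $A_{m \times n}$ is the spatiotemporal big data matrix of monitoring amplitude, with dimension $m \times n$; $x_i$ ($i = 1, \ldots, m$) is the position of the measuring points distributed along the sensor path, with a total of $m$ measuring points; $t_j$ ($j = 1, \ldots, n$) is the measurement moment, with a total of $n$ measurement moments; and $t_n$ is the current moment [18].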
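As a minimal sketch of how such a matrix might be held in memory during continuous monitoring, the following Python class keeps only the most recent measurement moments. The class name `SpatiotemporalWindow`, its methods, and the parameter names `m` (number of measuring points) and `n` (number of retained measurement moments) are illustrative assumptions and are not part of the monitoring system described here.

```python
from collections import deque


class SpatiotemporalWindow:
    """Rolling buffer of amplitudes: m measuring points by at most n moments."""

    def __init__(self, m, n):
        self.m = m
        # Each stored item is one column of the matrix:
        # the amplitudes of all m measuring points at one moment.
        self.columns = deque(maxlen=n)

    def push(self, sample):
        """Append the newest moment; the oldest column drops out when full."""
        if len(sample) != self.m:
            raise ValueError("expected one amplitude per measuring point")
        self.columns.append(tuple(sample))

    def shape(self):
        """Return (number of measuring points, number of stored moments)."""
        return (self.m, len(self.columns))

    def series_at(self, i):
        """Return the time series (oldest first) at measuring point index i."""
        return [column[i] for column in self.columns]


# Hypothetical data: 3 measuring points, window of 2 moments, 3 samples pushed.
window = SpatiotemporalWindow(m=3, n=2)
for sample in ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]):
    window.push(sample)
```

A bounded `deque` is used so that appending the newest moment and discarding the oldest one is a single operation, which matches the idea of a window that slides forward in time.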
2.2. Analysis Method for the Diagnosis of Pipeline Groundwater Invasion
The refractive index of the fiber at the point of intrusion changes with the intrusion event, resulting in a phase change in the backscattered Rayleigh light. Because of interference, the intensity of the back Rayleigh scattering changes correspondingly. The Rayleigh scattered light reflected from different positions of the fiber is received by the detector, and the weak interference signal is extracted [19, 20]. In a single-mode fiber, according to the one-dimensional impulse response model of fiber back Rayleigh scattering, when the incident laser is a rectangular pulse injected into the fiber at time $t = 0$, the amplitude of the back Rayleigh scattering wave [21] is obtained as [13]

$$e(t) = \sum_{i=1}^{N} a_i \exp\left(-\alpha \frac{c \tau_i}{n_f}\right) \exp\left[j 2 \pi f (t - \tau_i)\right] \mathrm{rect}\left(\frac{t - \tau_i}{w}\right) \quad (2)$$

where $N$ is the number of scattering centers, $a_i$ is the amplitude of the $i$th scattered wave, $\alpha$ is the attenuation coefficient, $c$ is the speed of light, $n_f$ is the refractive index, $\tau_i$ is the time delay of the $i$th scattered wave, $f$ is the frequency, $w$ is the pulse width, and $z_i$ is the length of the fiber from the $i$th scattering center to the input, so that $\tau_i = 2 n_f z_i / c$ [13]. When $0 \le (t - \tau_i)/w \le 1$, the rectangular function $\mathrm{rect}[(t - \tau_i)/w] = 1$; otherwise, it is 0.
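The scattering model can be made concrete with a short numerical sketch. It assumes the commonly used single-pulse form of the one-dimensional impulse response model, $e(t) = \sum_i a_i \exp(-\alpha c \tau_i / n_f) \exp[j 2 \pi f (t - \tau_i)]\, \mathrm{rect}[(t - \tau_i)/w]$ with round-trip delay $\tau_i = 2 n_f z_i / c$. The function name `backscatter_amplitude` and all numerical values below are illustrative assumptions, not parameters of the experiment.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def backscatter_amplitude(t, scatterers, alpha, n_f, f, w, c=C):
    """Complex backscattered amplitude at time t for one rectangular pulse.

    scatterers is a list of (a_i, z_i) pairs: scattered-wave amplitude and
    distance of the scattering center from the fiber input, in meters.
    """
    total = 0j
    for a_i, z_i in scatterers:
        tau_i = 2.0 * n_f * z_i / c          # round-trip time delay
        u = (t - tau_i) / w
        if 0.0 <= u <= 1.0:                  # rect(u) = 1 only inside the pulse
            attenuation = math.exp(-alpha * c * tau_i / n_f)
            phase = cmath.exp(2j * math.pi * f * (t - tau_i))
            total += a_i * attenuation * phase
    return total


# Hypothetical values: 1550 nm light, 100 ns pulse, 0.2 dB/km class attenuation.
params = dict(alpha=4.6e-5, n_f=1.468, f=1.934e14, w=1e-7)
tau = 2.0 * 1.468 * 100.0 / C
single = backscatter_amplitude(tau + 5e-8, [(1.0, 100.0)], **params)
```

With a single scattering center the magnitude reduces to $a_i \exp(-2 \alpha z_i)$, because $c \tau_i / n_f = 2 z_i$; interference appears only when several scattering centers fall inside the same pulse.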
The modulation frequency of AOM is f. After continuous injection of m pulses, the input of the detector will obtain a continuous backscattered Rayleigh wave with a period of and its amplitude is expressed as follows [13]:
The light power is [13]:where represents the sum of the optical powers of each independent backscattering center, while is the sum of the light power generated by the interference of the backscattered Rayleigh light, which has serrated ripples. When the sawtooth ripple is generated by , the phase difference between the two scattering waves is [13]:
At this point, the data of formula (1) are transformed into two-dimensional data of time-phase difference and space-phase difference. Because of complex factors such as soil and underground pipeline conditions, the amplitude distribution along buried pipelines is uneven. Severe weather such as heavy rain and high temperature also affects the amplitude of the soil surrounding the pipeline; in addition, observation noise and systematic errors of the monitoring instruments cause deviations in the measurement results.
Outlier analysis is part of the pipeline abnormal state diagnosis method and is suitable for data with Gaussian distribution characteristics. It first assumes that the data can be characterized by their mean and variance, statistics that define the outlier metric of the system. For a datum $x_\zeta$, the inconsistency index $z_\zeta$ is defined as

$$z_\zeta = \frac{\left| x_\zeta - \bar{x} \right|}{\sigma} \quad (6)$$

where $x_\zeta$ is the amplitude datum at the current moment, and $\sigma$ and $\bar{x}$ are the standard deviation and mean of the data. A threshold is set for data that satisfy the Gaussian distribution. Once the threshold is determined, the state of the current data, and hence the current state of the system, can be judged from the result of formula (6).
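A small worked example may help fix the definition. It assumes the inconsistency index takes the usual form of the absolute deviation from the mean divided by the standard deviation, and it uses the population standard deviation, which is an assumption of this sketch. The function name `inconsistency_index` and the sample values are hypothetical.

```python
import math
import statistics


def inconsistency_index(window, x):
    """Return |x - mean(window)| / std(window) for the datum x."""
    mu = statistics.fmean(window)
    sigma = statistics.pstdev(window)   # population standard deviation (assumed)
    if sigma == 0:
        # A perfectly flat window: any deviation at all is infinitely unusual.
        return 0.0 if x == mu else math.inf
    return abs(x - mu) / sigma


# Hypothetical amplitudes with mean 5 and population standard deviation 2.
history = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
index_for_9 = inconsistency_index(history, 9.0)   # (9 - 5) / 2
```

For Gaussian data, an index above about 3 occurs for roughly 0.3% of samples, which is why a threshold on this index separates rare outliers from ordinary fluctuation.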
The steps of the pipeline intrusion diagnosis and analysis method are as follows (Figure 2 shows the algorithm process in the form of a flowchart):
(a) Data acquisition: after the construction of the entire pipeline system is completed, the distributed optical fiber monitoring system is used to monitor pipeline groundwater invasion online, and real-time distributed amplitude monitoring data along the pipeline are obtained.
(b) Determination of the sliding window: after the source data are obtained, the total length of the monitoring data in the window, that is, the sliding window length $n$, is determined by trial and error.
(c) Statistical calculation: first, the sliding window length $n$ and the abnormal state diagnosis window length $k$ are determined; finding the optimal values of $n$ and $k$ requires multiple attempts. For the monitoring data of each measuring point in the window, the mean $\bar{x}$ and the standard deviation $\sigma$ are calculated.
(d) Calculation of the inconsistency index: the amplitude data at any position and any moment on the pipeline are substituted into formula (6) to obtain the inconsistency index $z$.
(e) Determination of the threshold: whether a datum is an outlier is judged against a threshold, so a boundary that separates outliers from normal values must be defined. Generally, the values of $z$ calculated in step (d) are sorted from largest to smallest, and the value at a preset rank is taken as the outlier threshold.
(f) Abnormal state diagnosis: as the current time advances, the amplitude data at the current time are substituted into formula (6), and the resulting inconsistency index is compared with the threshold. If the index does not exceed the threshold, the system is in a normal state and the window slides forward to the next moment. If the index exceeds the threshold, the monitoring data at the current moment may be an outlier, and the window stops sliding. However, because of data disturbance, an invasion alarm cannot be raised from a single calculation result at the current moment. Monitoring must continue, and the inconsistency index is calculated for the next $k$ samples ($k$ is the number of samples in the abnormal state diagnosis window, usually much less than $n$). If the threshold is exceeded continuously, the invasion alarm is given and the location of the invasion is determined. If only some of the data exceed the threshold, the current situation is not considered an abnormal state, and the algorithm continues to run.
(g) The window continues to slide forward, and steps (a) to (f) are repeated for pipeline invasion diagnosis.

The pipeline abnormality diagnosis method described above introduces outlier analysis into the data analysis to improve accuracy and rigor. Common outlier analysis methods cannot be applied to nonstationary data, but the introduction of the sliding window makes real-time identification of pipeline anomalies possible. In addition, to avoid false alarms caused by accidental external disturbance or measurement error, the concept of the abnormal state diagnosis window is proposed to improve the reliability of diagnosis. The method only needs to consider the amplitude data in the window and does not require machine learning training with labels, so it can analyze distributed amplitude data and identify the abnormal state of pipelines in real time.
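The diagnosis procedure can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `n` is the sliding window length, `k` is the abnormal state diagnosis window length, the threshold is passed in as a number, and samples that exceed the threshold are simply not added to the window. The function names `rank_threshold`, `first_alarm`, and `alarm_positions`, the `fraction` parameter, and the synthetic data are all hypothetical.

```python
import math
import statistics
from collections import deque


def rank_threshold(indices, fraction):
    """Sort indices from largest to smallest; return the value at the given rank."""
    ordered = sorted(indices, reverse=True)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


def first_alarm(series, n, k, threshold):
    """Return the index of the sample that raises the alarm, or None."""
    window = deque(series[:n], maxlen=n)    # initial sliding window
    run = 0                                 # consecutive samples above threshold
    for index in range(n, len(series)):
        x = series[index]
        mu = statistics.fmean(window)
        sigma = statistics.pstdev(window)
        if sigma > 0:
            z = abs(x - mu) / sigma
        else:
            z = 0.0 if x == mu else math.inf
        if z > threshold:
            run += 1                        # window stays frozen during a run
            if run >= k:
                return index                # k exceedances in a row: alarm
        else:
            run = 0                         # isolated disturbance, not abnormal
            window.append(x)                # window slides forward one sample
    return None


def alarm_positions(data, positions, n, k, threshold):
    """Return the positions whose time series raise an alarm."""
    return [p for p, series in zip(positions, data)
            if first_alarm(series, n, k, threshold) is not None]


# Synthetic series: a quiet point, and a point with one isolated spike
# followed later by three consecutive abnormal samples.
quiet = [0.0, 1.0] * 10
disturbed = [0.0, 1.0] * 5 + [10.0] + [0.0] + [10.0] * 3
located = alarm_positions([quiet, disturbed], [0.0, 0.5], n=10, k=3, threshold=3.0)
```

In this sketch the isolated spike at index 10 of `disturbed` does not raise an alarm because the next sample is normal, while the three consecutive abnormal samples at indices 12 to 14 do, which is the role of the diagnosis window.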
3. Physical Model Verification of Pipeline Groundwater Invasion
3.1. Test Device
To verify the effectiveness of the above method, we conducted a distributed optical fiber monitoring test of buried pipeline invasion. First, a test box was assembled. An opening at the bottom of the box was used for drainage; above the gravel layer was a soil layer 20 cm thick, which was compacted in layers and then backfilled. To simulate intrusion, a water pipe was connected at a fixed position on the test pipeline, and external liquid was pumped into the pipeline through the water pipe by a pump. A valve and a flow meter installed on the water pipe were used to control the simulated intrusion and to monitor the intrusion flow.
The test pipeline was a steel tube with a length of 12 m, a diameter of 250 mm, and a wall thickness of 10 mm. Whenever an abnormal state such as an invasion occurs at some place on the pipeline, the amplitude at that place differs from the amplitude in the previous running state. Therefore, the experiment used different invasion locations, three temperature differences between the inside and outside of the pipe, and different invasion flow rates to study the effectiveness of the sliding window outlier analysis method in diagnosing groundwater invasion. The data are shown in Table 1. The test simulated groundwater invasion occurring simultaneously at two places on the pipeline, with the invasion points located at 3.5 m and 10 m in the longitudinal direction of the pipeline (denoted by L1 and L2). Figure 3 shows the composition of the device.

In the test, the invasion simulation for each case in Table 1 was performed after the liquid in the tube had settled to a static, stable state, and the distributed optical fiber sensor was used to monitor the amplitude change during the entire invasion process.
3.2. Data Analysis
3.2.1. Amplitude Data
The performance of the algorithm depends on the change in the pipeline amplitude before and after the invasion. In cases 1 and 2, the temperature in the pipe was different, but the invasion flow rate was 2.0 L/min in both. Before each experiment, the pipe was monitored with the fiber-optic sensor for about 30 minutes, and the change in the pipe amplitude was recorded. The amplitude monitoring data curves along the pipeline before and after the invasion in case 1 are shown in Figure 4. Before the invasion, the pipeline amplitude hardly changed with position, and its fluctuation range was within 2 mm. In the experiment, the groundwater intrusion lasted about 20 minutes. After the invasion was over, analysis of the amplitude data showed significant jitter near the intrusion points. In case 1, the abnormal amplitude range near the invasion point L1 was 3.8∼5.4 m, and the highest amplitude was about 8.05 mm; the abnormal amplitude range near the intrusion point L2 was 9.8∼11.8 m, and the highest amplitude was about 12 mm.

Although the highest amplitude point did not appear at the invasion point, the amplitude anomaly range included the invasion point, indicating that there was a clear correspondence between the invasion point and the amplitude anomaly. The monitoring data of case 1 showed that the invasion of liquid would cause the amplitude of the soil near the leakage point to increase significantly, showing a typical local amplitude abnormal phenomenon. The abnormal situation of the amplitude monitoring data changing with the location provided reliable evidence for the location diagnosis of the pipeline groundwater invasion.
Based on the monitoring data in Figure 4, we analyzed how the amplitude at the invasion point L1 changed over time when the invasion flow rate was varied and the other variables were held constant. The result is shown in Figure 5. For the two invasion flows in Figure 5, the invasion started at 1830 s, and the period before it was the static, stable process. Before the invasion, the amplitude monitoring data were relatively stable in both cases, with fluctuation within 2 mm. As the invasion continued, although the invasion flow rates were different, both cases showed a significant increase in amplitude, and the trend and speed were basically the same. The invasion ended after 3300 s, and the amplitude monitoring data at the invasion point L1 stabilized again. Analysis of the amplitude data at the intrusion point showed that the distributed optical fiber sensor was sensitive to the amplitude anomaly caused by different intrusion flows and recorded a clear process of increasing amplitude jitter.

To sum up, the amplitude data obtained by this algorithm provided the information of groundwater intrusion, but from the analysis of Figures 4 and 5, it could be seen that due to the influence of surrounding physical factors, the position and time-history curve data of the amplitude showed a certain degree of fluctuations. At the same time, it could be seen from Figures 6(a) and 6(b) that the choice of different coefficients had a great impact on the accuracy of the alarm, so a fixed threshold could not be set in location and time for invasion identification. Amplitude data must be analyzed in time and space to arrive at accurate diagnostic results.

3.2.2. Determination of the Length of the Sliding Window and the Abnormal State Diagnosis Window
In pipeline groundwater invasion diagnosis, the sliding window length $n$ and the abnormal state diagnosis window length $k$ are the key parameters that determine the accuracy of invasion recognition. If $n$ is too large, the stationarity requirement of the monitoring data cannot be met; if $n$ is too small, the sampled data cannot satisfy the normal distribution assumption. If $k$ is set too large, the timeliness of the diagnosis is affected; if $k$ is set too small, normal data fluctuations also trigger an alarm. To determine the optimal values of $n$ and $k$, we ran trial calculations with different combinations of $n$ and $k$ and counted the number of false positives in invasion diagnosis. The results are shown in Table 2.
When the sliding window length $n$ was 100, all values of the diagnosis window length $k$ produced false alarms, and the smaller the value of $k$, the more false alarms occurred. With such a short sliding window, the statistical characteristics of the amplitude monitoring data could not be accurately extracted from the data in the time window, which reduced the reliability of the outlier analysis results and made false alarms likely.
When $n$ was 700, the number of false alarms varied with $k$ in basically the same way as when $n$ = 100. This indicates that with a longer sliding window (about 7000 s, close to 2 h), the amplitude of the soil around the pipeline had changed to some extent because of environmental factors; that is, the amplitude at each location could no longer meet the stationarity assumption on this time scale. In this case, the statistics obtained were unstable and could not reflect the true state of the pipeline amplitude variation. Data within the window could easily exceed the threshold, leading to diagnostic errors.
When $n$ was 200 and the abnormal state diagnosis window was short ($k$ = 10 or 20), false alarms also occurred, but as $k$ increased, the number of false alarms gradually decreased. When $n$ was 500, there were no false alarms only when $k$ = 50.
Table 2 shows that the number of false alarms decreased as the length $k$ of the abnormal state diagnosis window increased, especially when the sliding window length $n$ was 200 or 500, for which false alarms no longer occurred as $k$ increased. However, increasing $k$ reduces the real-time performance of the diagnosis. Therefore, to identify pipeline invasion in time while keeping the monitoring data stationary on the time scale of the window, $n$ = 200 and $k$ = 30 were used in the subsequent analysis to diagnose pipeline groundwater invasion.
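A trial calculation of this kind can be sketched as a grid search over the two window lengths on intrusion-free data, where every alarm is by definition a false alarm. The code below is an illustrative assumption rather than the procedure used to produce Table 2: the names `exceedance_flags`, `count_alarms`, and `sweep` are hypothetical, the threshold is a fixed number, the population standard deviation is used, and the data are synthetic.

```python
import math
from collections import deque


def mean_std(values):
    """Mean and population standard deviation of a sequence."""
    m = sum(values) / len(values)
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def exceedance_flags(series, n, threshold):
    """Flag each sample after the first n as exceeding the threshold or not."""
    window = deque(series[:n], maxlen=n)
    flags = []
    for x in series[n:]:
        mu, sigma = mean_std(window)
        if sigma > 0:
            exceeded = abs(x - mu) / sigma > threshold
        else:
            exceeded = x != mu
        flags.append(exceeded)
        if not exceeded:
            window.append(x)        # the window slides only on normal samples
    return flags


def count_alarms(flags, k):
    """Count runs of at least k consecutive exceedances (one alarm per run)."""
    alarms = run = 0
    for exceeded in flags:
        run = run + 1 if exceeded else 0
        if run == k:
            alarms += 1
    return alarms


def sweep(series, window_lengths, diagnosis_lengths, threshold):
    """Return {(n, k): number of alarms} for every combination of n and k."""
    table = {}
    for n in window_lengths:
        flags = exceedance_flags(series, n, threshold)   # does not depend on k
        for k in diagnosis_lengths:
            table[(n, k)] = count_alarms(flags, k)
    return table


# Synthetic record: steady fluctuation with a three-sample disturbance.
series = [0.0, 1.0] * 10 + [10.0] * 3 + [0.0, 1.0] * 5
table = sweep(series, window_lengths=[10], diagnosis_lengths=[2, 3, 4], threshold=3.0)
```

In this sketch the three-sample disturbance is counted as an alarm for a diagnosis window of 2 or 3 samples but not for 4, which mirrors the trend that a longer diagnosis window suppresses false alarms at the cost of a slower response.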
3.2.3. Pipeline Invasion Diagnosis Based on Sliding Window Outlier Analysis
The distributed amplitude monitoring data were processed by the data dimensionality reduction analysis method, and the abnormal state of the pipeline was identified and located according to the result of the sliding window outlier analysis. The analysis results of cases 2 and 3 are plotted in Figure 7. In both cases, before the pipeline intrusion, the outlier analysis judged the pipeline to be in a normal state and no false alarms occurred. Since the sliding window length and the abnormal state diagnosis window length were determined from the results of case 1, the diagnostic results of cases 2 and 3 further demonstrate the effectiveness of the method.

Compared with case 1, the invasion position and invasion flow rate of case 2 were the same, but the temperature of the liquid in the tube was lower. The analysis results in Figures 7(a) and 7(c) were similar to those in Figures 4 and 5. The final invasion ranges were 3.42∼5.15 m and 10.55∼11.75 m. The former covered the invasion point L1, and the latter did not include the invasion point L2 but lay only about 0.55 m from it. The abnormal state was first diagnosed in case 2 at 2210 s (the invasion time was 1840 s); that is, the first outlier appeared 370 s after the invasion occurred, which was 270 s (about 4.5 min) later than in case 1. This was attributed to the lower temperature in the tube, which had a relatively small impact on the amplitude. Therefore, the temperature difference between the inside and outside of the pipe affected the time needed to diagnose the pipeline intrusion but did not affect the positioning accuracy.
Compared with case 2, the invasion position and the temperature difference between the inside and outside of the pipeline were basically the same in case 3, but the invasion flow was smaller. According to Figures 7(b) and 7(c), the invasion ranges of the pipeline were located at 3.73∼4.95 m and 10.54∼11.75 m, respectively; the former included the intrusion point L1, and the latter was only about 0.65 m away from the actual position of the intrusion point L2. In case 3, the abnormality was first diagnosed at 3110 s (the invasion time was 2750 s), a delay of 360 s, so the response speed was basically the same as in case 2. This shows that a small-flow intrusion could still be diagnosed by the algorithm without reducing the positioning accuracy.
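The delay and distance figures quoted for case 2 follow from simple arithmetic, sketched below. The helper names `detection_delay` and `distance_to_range` are hypothetical, and the calculation assumes that the invasion point L2 lies 10 m along the pipeline.

```python
def detection_delay(t_detected, t_invasion):
    """Seconds between the start of the invasion and the first diagnosis."""
    return t_detected - t_invasion


def distance_to_range(point, low, high):
    """Distance from a point to the interval [low, high]; 0 if it lies inside."""
    if low <= point <= high:
        return 0.0
    return low - point if point < low else point - high


# Case 2: invasion at 1840 s, first diagnosed at 2210 s.
delay_case2 = detection_delay(2210, 1840)              # 370 s
# Case 2 was diagnosed 270 s later than case 1, so the case 1 delay follows.
delay_case1 = delay_case2 - 270                        # 100 s
# Case 3: invasion at 2750 s, first diagnosed at 3110 s.
delay_case3 = detection_delay(3110, 2750)              # 360 s
# Case 2, point L2 (assumed at 10 m) against the range 10.55 to 11.75 m.
offset_case2 = distance_to_range(10.0, 10.55, 11.75)   # about 0.55 m
```

Measuring the offset from the nearest edge of the diagnosed range, rather than from its center, is a choice of this sketch; it reproduces the 0.55 m figure stated for case 2.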
Table 3 is a summary of monitoring data for cases 2 and 3.
The analysis showed that the monitoring data analysis method based on sliding window outlier analysis could accurately identify the location of abnormal pipeline conditions with high real-time performance. In addition, compared with the pressure gradient method, the false alarm rate of this algorithm was greatly reduced. In practice, because a large number of redundant optical fibers already exist underground, the cost of detection is also greatly reduced. Moreover, the algorithm produces its results directly as images, which makes it more convenient to operate than ultrasonic detection.
4. Conclusion
This study proposes a method for locating abnormal pipeline conditions based on sliding window outlier analysis of amplitude data monitored by distributed optical fiber sensors, which realizes real-time diagnosis of the abnormal state of pipelines under unknown conditions. Exploiting the stationarity of amplitude data on short time scales, the method introduces a sliding window so that amplitude data changing in time and space can be analyzed in real time. Through the invasion test of the buried pipeline, spatiotemporal amplitude monitoring data were obtained for different invasion positions, temperature differences between the inside and outside of the pipe, and invasion flow rates. In addition, the optimal lengths of the sliding window and the abnormal state diagnosis window were determined, which greatly reduces the false alarm rate of the system, and pipeline invasion was identified and located in the different cases. The results show that the method does not cause false alarms when the pipeline is in good condition; once a pipeline invasion event occurs, the method quickly identifies the event and accurately locates the invasion position, and the error range is stable at 0.5 m.
Data Availability
The data included in this study are available without any restriction.
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
All authors have seen the manuscript and approved its submission to the journal.
Acknowledgments
The study was supported in part by the Technology Innovation Center for Geological Environment Monitoring of China under Grant 2020KFK1212005 and the Science and Technology Development Plan of Jilin Province under Grant 20200602046ZP.