Abstract

Due to the important role of crude oil desalting for the whole petroleum refining process, the near-infrared spectroscopy resulting from molecular vibration is used to detect and isolate potential faults of the desalting process in this paper. With the molecular spectral data reflected by the near-infrared spectroscopy, the principal component analysis is adopted to monitor the process to see if it is in a normal operating condition or not. Considering the feature that the dimension of near-infrared spectroscopy is much larger than the sample size, the least absolute shrinkage and selection operator is employed to achieve an automatic variable selection procedure of the observed spectral data. Simultaneously, if some faults occur, the least absolute shrinkage and selection operator can be used to locate the spectral region affected by the failure. In such a way, the roots of faults can be tracked according to the change of the wavelength numbers. Performances of the proposed fault detection and isolation approaches are evaluated based on the near-infrared spectroscopy sampled for the crude oil desalting process to show the effectiveness.

1. Introduction

For various reasons, different types of inorganic salts are commonly found in crude oil, for example, NaCl, CaCl2, and MgCl2. During the refining process, these chlorides may lead to severe problems such as reduced catalyst efficiency or lifetime (which is costly), equipment and pipeline corrosion, fouling, and plugging, all of which reduce the profitability of the petroleum refining industry [1–3]. Consequently, crude oil desalting is an indispensable step in petroleum refining. Moreover, the modern petroleum refining process is highly integrated, so a failure may cause an entire plant to shut down and even lead to a catastrophic accident [4–6]. Given the crucial role of desalting, real-time monitoring is therefore necessary. The existing monitoring techniques can be conveniently classified into two main categories: model-based approaches and data-driven approaches [7–12]. Compared with the model-based methods, data-driven monitoring is based on analysing routinely measured process variables, such as temperature, pressure, flow rate, and level [13–16], which makes it more suitable for practice.

Among new analytical techniques, near-infrared (NIR) spectroscopy has been widely developed in many fields due to its unique features of noninvasiveness, low pollution, and suitability for online analysis [17–20]. Essentially, it is a vibrational spectroscopy technique that probes the overtones and combination bands of fundamental vibrations, mainly those of hydrogen-containing groups such as -OH, -NH, and -CH [21]. The information presented in NIR spectra can be used to determine the composition of the sample or its physical properties. Generally speaking, NIR spectroscopy provides new perspectives for the analysis and synthesis of industrial processes at the molecular level. For these reasons, a significant number of works have exploited NIR for process monitoring with effective results. For example, taking NIR as an in situ monitor, Santos et al. proposed a method to monitor the coffee roasting process online [22]. With the same purpose but in different fields, NIR has been employed to monitor the simulated moving bed and to control fluidized bed granulation and coating processes [23]. To monitor the production of 2G ethanol from lignocellulosic sugarcane residues, Pinto et al. combined NIR with partial least squares (PLS) regression to predict glucose and ethanol concentrations [24].

Nevertheless, to the best of our knowledge, the application of NIR to monitoring the crude oil desalting process has not yet been detailed in the existing literature. To be specific, the oil desalting unit separates the salts from the raw feed by means of solvent addition and extraction equipment. A simple desalting process with the NIR analyser layout is illustrated in Figure 1. The primary task is to monitor the process and make sure that the desalted oil transported to the following processing stage is qualified. If the operation is abnormal, it is essential to capture the faults early and identify their cause accurately so that the operators can troubleshoot as soon as possible.

Fault diagnosis techniques are among the essential methods for ensuring the safety of industrial processes. The overall concept of fault diagnosis includes fault detection, fault isolation, and fault analysis. Many different fault diagnosis methods have been introduced in the literature [25–27]. However, a review of the current results shows that only a few address the application of NIR to diagnosing or isolating process faults. Recently, Sales used the contribution plot method to identify the variables affected by disturbances [28]. Although this method is easy to understand and follow, its results are easily corrupted by the smearing phenomenon. To be specific, a fault occurring at one process variable may cause a significant increase in the contributions of fault-free variables [29]. This phenomenon is more evident for NIR. For example, a fault such as additional water changes the concentration, so that all the spectral regions change [21]. Consequently, contribution plots can no longer serve as a satisfactory index for pointing out the root causes of the failures.

Based on the above motivations, the primary task of this paper is to monitor the crude oil desalting process using NIR measurements. The contributions of this paper are as follows: (1) the proposed method can isolate the fault and identify the variables primarily responsible for the failure after the detection; (2) considering that the dimension of the NIR spectra is much larger than the sample size, the least absolute shrinkage and selection operator (LASSO) is employed to reduce the dimension of the spectra by imposing a bound on the $\ell_1$-norm of the regression coefficients; and (3) with the molecular spectral data collected by NIR, the least angle regression (LARS) algorithm is used to find the spectral region most affected by the failure, showing the causal origins of the fault clearly. The application of the proposed fault detection and isolation approaches to the crude oil desalting process demonstrates the effectiveness and advantages of the theoretical results.

The rest of this paper is organized as follows. In Section 2, fault detection based on principal component analysis (PCA) for the oil desalting process is reviewed. The fault isolation method based on LASSO is then elaborated in Section 3, where its motivation and advantages are discussed. Section 4 presents the application of the results to the crude oil desalting process. Finally, the conclusions are presented in Section 5.

2. PCA Algorithm for Fault Detection

The PCA method has been widely used for fault detection, due to its simple structure and user-friendly features [30–36]. In this section, we briefly review the PCA method in the framework of fault diagnosis based on the spectra data.

Following the standard modelling procedure, the obtained spectra data are split into two parts: the training data set and the testing data set. The training data set is used to identify a nominal model in a fault-free environment. Once the fault-free features of the process have been extracted, any abnormal condition will deviate from the normal conditions and show up as an inconsistency with the model.

Collect all the spectra data in a matrix $X$ with dimension $n \times m$, where $n$ denotes the number of samples, and the column number $m$ is the number of wavelength values. Basically, $m \gg n$. To build a PCA model, each column of $X$ needs to be normalized to zero mean and unit variance. Then, the PCA model can be formulated based on the normalized data as follows:

$$X = \sum_{i=1}^{k} t_i p_i^{T} + E = T P^{T} + E,$$

where the number of principal components (PCs) in the model is denoted by $k$, $p_i$ is the loading vector for the corresponding PC, $t_i$ is the score vector, and $E$ represents the residual. Omitting the error matrix $E$, the remaining part $\hat{X} = T P^{T}$ is the structured part of the data and is named the PC model. To show the uncertainties of the PC model, the covariance matrix is commonly computed by the following equation:

$$S = \frac{1}{n-1} X^{T} X = V \Lambda V^{T},$$

in which $\Lambda$ is a diagonal matrix containing all the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$ of $S$, and $V V^{T} = I$ with the columns of $V$ being the eigenvectors, where $I$ denotes the identity matrix. To show the uncertainty captured by each PC, the cumulative percent variance (CPV) index can be used to determine the number of PCs. That is, we can compute it by

$$\mathrm{CPV}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\%.$$
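The model-building step above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `fit_pca`, the dictionary keys, and the use of an SVD (convenient when there are far more wavelengths than samples) are assumptions made here.

```python
import numpy as np

def fit_pca(X, cpv_threshold=0.95):
    """Fit a PCA model to training spectra X (rows = samples, columns = wavelengths).

    Hypothetical helper: returns the scaling parameters, the retained loadings,
    all eigenvalues, and the number of PCs chosen by the CPV rule.
    """
    n = X.shape[0]
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # guard against constant columns
    Xs = (X - mean) / std                    # zero mean, unit variance per column

    # SVD of the scaled data avoids forming the m x m covariance matrix.
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    eigvals = s ** 2 / (n - 1)               # eigenvalues of the covariance, descending

    cpv = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(cpv, cpv_threshold)) + 1   # smallest k with CPV >= threshold
    k = min(k, len(eigvals))
    return {"mean": mean, "std": std, "P": Vt[:k].T,
            "eigvals": eigvals, "k": k, "cpv": cpv}
```

A call such as `model = fit_pca(X_train)` would then give the loadings `model["P"]` and the eigenvalues needed by the monitoring statistics; the default threshold of 0.95 mirrors the 95% figure commonly used for CPV.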

To reduce the dimension of the original data, only a subset of the PCs is selected. The number of PCs is chosen according to the percentage of the variance that should be captured; generally, the threshold used is 95%. Once the PCs are selected, we can use the Hotelling $T^2$ and SPE statistics as the fault detection indices [37]. The SPE statistic represents the change in the residual-space projection of the observed data at the moment $i$. The $T^2$ statistic measures the variation within the principal component subspace: it reflects how far, in magnitude or trend, each sampled data point lies from the origin of that subspace. In other words, the SPE reflects deviations of any variable in the data matrix that the PC model does not explain, whereas the $T^2$ statistic reflects deviations of the variables along the directions of the retained principal components. Here, the $T^2$ statistic is computed as follows:

$$T^2 = x^{T} P_k \Lambda_k^{-1} P_k^{T} x,$$

where $T^2$ is the target statistic performance index, $x$ is a normalized sample, $k$ is the number of PCs determined, $P_k$ contains the corresponding selected loading vectors, and $\Lambda_k^{-1}$ is the inverse matrix of $\Lambda_k$, where $\Lambda_k = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$.

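To carry out the fault diagnosis task, the control limits need to be determined. For the Hotelling $T^2$, since it follows the F-distribution, we have

$$T^2_{\alpha} = \frac{k (n^2 - 1)}{n (n - k)} F_{\alpha}(k, n - k),$$

where $T^2_{\alpha}$ is the resulting control limit and $F_{\alpha}(k, n-k)$ represents the F-distribution. Here, $k$ and $n - k$ are the degrees of freedom, with $n$ the number of training samples and $k$ the number of PCs, and $\alpha$ determines the degree of confidence. The SPE statistic is expressed with the projection onto the residual subspace; the expression is written as

$$\mathrm{SPE} = \left\| \left( I - P_k P_k^{T} \right) x \right\|^2,$$

where $\mathrm{SPE}$ denotes the statistic and $P_k$ contains the retained PCs. The distribution of SPE can be approximated by that of a weighted sum of chi-squared variables. Therefore, we can obtain the control limit as

$$\mathrm{SPE}_{\alpha} = \theta_1 \left[ \frac{c_{\alpha} \sqrt{2 \theta_2 h_0^2}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} \right]^{1/h_0}, \qquad \theta_i = \sum_{j=k+1}^{m} \lambda_j^{\,i} \;(i = 1, 2, 3), \qquad h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2},$$

where $\mathrm{SPE}_{\alpha}$ denotes the control limit, $\theta_i$ is a parameter computed from the eigenvalues $\lambda_j$ of the covariance matrix of $X$ that belong to the residual subspace, and $c_{\alpha}$ is the standardized normal variable. By using this index, the confidence level is $1 - \alpha$, and $c_{\alpha}$ has the same sign as $h_0$.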
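As a rough numerical illustration of the two control limits, the sketch below uses only the Python standard library. It assumes the common forms $T^2_\alpha = \frac{k(n^2-1)}{n(n-k)} F_\alpha(k, n-k)$ and the Jackson-Mudholkar normal approximation for the SPE limit; the function names are hypothetical, and the F-distribution quantile is passed in by the caller (for example from a statistical table) rather than computed here.

```python
import math
from statistics import NormalDist

def t2_control_limit(k, n, f_quantile):
    """T^2 limit for k retained PCs and n training samples.

    f_quantile is the chosen upper quantile of F(k, n - k), supplied by the caller.
    """
    return k * (n ** 2 - 1) / (n * (n - k)) * f_quantile

def spe_control_limit(residual_eigvals, confidence=0.99):
    """SPE limit from the eigenvalues of the discarded (residual) PCs."""
    theta1, theta2, theta3 = (sum(lam ** i for lam in residual_eigvals)
                              for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    # Standard normal deviate, given the same sign as h0.
    c = math.copysign(NormalDist().inv_cdf(confidence), h0)
    bracket = (c * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
               + 1.0
               + theta2 * h0 * (h0 - 1.0) / theta1 ** 2)
    return theta1 * bracket ** (1.0 / h0)
```

For instance, `spe_control_limit([0.5, 0.3, 0.2], 0.99)` evaluates the limit for three illustrative residual eigenvalues; raising the confidence level raises the limit, which is the expected behaviour of a control chart threshold.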

In simple terms, the training set is used to obtain the PCA model. Then, the statistics (Hotelling $T^2$ and SPE) are applied to detect deviations from normal behaviour using the control limits obtained from the training set. Finally, the statistics are calculated for the test set and compared with the control limits to judge its status. Two statuses for normal and fault are defined as follows:

$$\text{Normal: } T^2 \le T^2_{\alpha} \ \text{and} \ \mathrm{SPE} \le \mathrm{SPE}_{\alpha}; \qquad \text{Fault: } T^2 > T^2_{\alpha} \ \text{and} \ \mathrm{SPE} > \mathrm{SPE}_{\alpha}.$$

Sometimes, $T^2 > T^2_{\alpha}$ and $\mathrm{SPE} \le \mathrm{SPE}_{\alpha}$ (or $T^2 \le T^2_{\alpha}$ with $\mathrm{SPE} > \mathrm{SPE}_{\alpha}$) can also be observed. In these cases, we can also state that a fault is detected. However, in this paper, for the sake of robustness, a fault is declared only when both performance indexes exceed their thresholds simultaneously.
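The detection rule can be sketched as follows. The helper names are hypothetical, the sample `x` is assumed to have been scaled with the training mean and standard deviation, `P` holds the retained loading vectors as columns, and `eigvals_k` holds the corresponding eigenvalues.

```python
import numpy as np

def monitoring_statistics(x, P, eigvals_k):
    """Return (T^2, SPE) for one scaled sample x."""
    t = P.T @ x                              # scores in the PC subspace
    t2 = float(np.sum(t ** 2 / eigvals_k))   # Hotelling T^2
    residual = x - P @ t                     # part of x not explained by the PC model
    spe = float(residual @ residual)         # squared prediction error
    return t2, spe

def is_fault(t2, spe, t2_limit, spe_limit):
    """Declare a fault only when both indices exceed their control limits."""
    return t2 > t2_limit and spe > spe_limit
```

Requiring both indices to exceed their limits trades some sensitivity for fewer false alarms, which matches the robustness argument given above.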

3. Fault Isolation Based on LASSO Algorithm

Once faults have been detected, some changes in the NIR spectra must have occurred. If we could determine which specific ranges of the spectra, and hence which molecular structures, vary accordingly, the fault might be located or isolated on the basis of the chemical properties of the desalting process and NIR spectrometric knowledge. However, the dimension of the preprocessed NIR spectra data used for the fault detection is high, much larger than the sample size, making it difficult to figure out the spectral region affected by the fault. Therefore, a suitable method to deal with this issue needs to be developed [15].

3.1. Fault Isolation Using LASSO

Considering that an observation $x$ has been detected as a fault, the corresponding normal value can be expressed using the reconstruction method as follows [38]:

$$x^{*} = x - \xi \circ f,$$

where $\circ$ is the Hadamard product (if $C = A \circ B$, then the element $c_{ij}$ in $C$ is the product of the elements $a_{ij}$ and $b_{ij}$ of the original two matrices $A$ and $B$), $\xi$ is a direction vector with elements 1 or 0 representing whether the values are influenced by the fault or not, and the elements in $f$ denote the degree to which the corresponding values are influenced by the fault. The purpose of this method is to eliminate the effects of the fault by bringing $x^{*}$ as close as possible to the normal region. According to the monitoring statistics of the PCA-based fault detection approach in this paper, the above method can be formulated as an optimization problem which is shown as follows:

$$\min_{\xi,\, f} \; (x - \xi \circ f)^{T} M \, (x - \xi \circ f),$$

where $M$ denotes the symmetric matrix that defines the monitoring statistic as a quadratic form $x^{T} M x$.

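Since the information reflected by the NIR spectra within a certain band is the same, the dimension can be reduced by incorporating some constraints. Therefore, the fault isolation problem can be considered as a subset selection problem. Introducing the $\ell_1$-norm penalty, the optimization problem can be reformulated as

$$\min_{f} \; (x - f)^{T} M \, (x - f) + \lambda \|f\|_{1}, \tag{12}$$

where $f$ is a redenoted notation for $\xi \circ f$, $M$ is the symmetric matrix of the quadratic-form monitoring statistic, $\|\cdot\|_{1}$ represents the $\ell_1$-norm, and $\lambda$ is the penalty factor. Consider the following linear regression model:

$$y = Z \beta + e, \tag{13}$$

where $y$ is the response matrix, $Z$ is the predictor matrix, $e$ is the residual matrix, and $\beta$ contains the unknown regression coefficients to be estimated. The LASSO algorithm introducing the $\ell_1$-norm penalty term into the objective function is expressed as follows [39]:

$$\hat{\beta} = \arg\min_{\beta} \; \|y - Z \beta\|_{2}^{2} + \lambda \|\beta\|_{1}. \tag{14}$$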

This algorithm not only shrinks the regression coefficients but also selects the variables automatically. This characteristic makes it possible to handle the high dimension of the NIR spectra. Noticing the form of equation (12) and conducting the Cholesky decomposition on $M$, the matrix of the quadratic form in equation (12), we get

$$M = \Gamma^{T} \Gamma.$$

Equation (12) can then be transformed into the following form:

$$\min_{f} \; \|\Gamma x - \Gamma f\|_{2}^{2} + \lambda \|f\|_{1}. \tag{15}$$

Comparing equations (14) and (15), LASSO can be applied to solve the fault isolation problem based on the reconstruction, which is formulated in equation (12), by denoting $y = \Gamma x$, $Z = \Gamma$, and $\beta = f$.
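The mapping from the reconstruction problem to a LASSO regression can be sketched as below. This is an illustrative sketch under stated assumptions: the objective is taken as $(x - f)^{T} M (x - f) + \lambda \|f\|_1$ with a positive definite matrix `M` (so that a Cholesky factor exists), and a plain cyclic coordinate descent solver is swapped in for simplicity; it is not the LARS procedure used later in the paper. The names `lasso_cd` and `isolate_fault` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink v towards zero by t."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_cd(Z, y, lam, n_iter=200):
    """Minimise ||y - Z b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    p = Z.shape[1]
    beta = np.zeros(p)
    col_sq = np.sum(Z ** 2, axis=0)
    resid = np.array(y, dtype=float)             # y - Z @ beta with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Z[:, j] * beta[j]           # remove variable j from the fit
            rho = Z[:, j] @ resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            resid -= Z[:, j] * beta[j]           # put the updated variable back
    return beta

def isolate_fault(x, M, lam):
    """Solve the penalised reconstruction problem via its LASSO form."""
    L = np.linalg.cholesky(M)                    # M = L @ L.T
    Gamma = L.T                                  # so that M = Gamma.T @ Gamma
    f = lasso_cd(Gamma, Gamma @ x, lam)          # y = Gamma x, Z = Gamma, beta = f
    return f, np.flatnonzero(np.abs(f) > 1e-10)  # fault magnitudes, affected indices
```

The indices returned by `isolate_fault` correspond to the wavelengths whose reconstruction magnitude is nonzero, that is, the candidates for the spectral region affected by the fault at the chosen penalty.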

3.2. Algorithm of LASSO

It is seen that once the penalty factor $\lambda$ is given, equation (15) can be easily solved. Nevertheless, different values of $\lambda$ lead to different results for $f$. Larger values of $\lambda$ result in fewer nonzero items in $f$. In other words, as $\lambda$ increases, fewer bands are treated as the spectral regions affected by the potential fault. If $\lambda$ is too large, some bands affected by the fault may not be isolated. On the contrary, if $\lambda$ is too small, some bands not affected by the fault may be selected. Thus, a suitable algorithm for selecting $\lambda$ should be given. According to the research results in [40], there exists a finite sequence: $\lambda_0 > \lambda_1 > \cdots > \lambda_K = 0$. For $\lambda > \lambda_0$, all values of $f_j$ are shrunk to zero, where $f_j$ is the $j$-th entry of $f$ calculated by solving the optimization problem in equation (12). In each interval $(\lambda_{i+1}, \lambda_i)$, the active set $\mathcal{A}$ and the sign vector $s$ do not change with the value of $\lambda$, where $\mathcal{A} = \{ j : f_j \neq 0 \}$, $s = \mathrm{sign}(f)$, and $\mathrm{sign}(\cdot)$ is the sign function defined as follows:

$$\mathrm{sign}(a) = \begin{cases} 1, & a > 0, \\ 0, & a = 0, \\ -1, & a < 0. \end{cases}$$
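To make the role of the penalty factor concrete, consider the simplest special case in which the quadratic form is the identity, so the problem reduces to minimising $\|x - f\|_2^2 + \lambda \|f\|_1$. Under this assumption (made here purely for illustration), each entry has the closed-form solution $f_j = \mathrm{sign}(x_j) \max(|x_j| - \lambda/2, 0)$, and entry $j$ becomes nonzero exactly when $\lambda$ drops below $2|x_j|$. The function names below are hypothetical.

```python
import math

def soft_threshold(v, t):
    """Shrink v towards zero by t, keeping its sign."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def lasso_identity(x, lam):
    """Closed-form minimiser of ||x - f||^2 + lam * ||f||_1 (identity quadratic form)."""
    return [soft_threshold(v, lam / 2.0) for v in x]

def transition_points(x):
    """Values of lam at which an entry of f switches on, largest first."""
    return sorted((2.0 * abs(v) for v in x), reverse=True)

# Sweeping lam downwards switches the entries on one at a time.
x = [0.2, -3.0, 1.5]
for lam in transition_points(x):
    f = lasso_identity(x, 0.9 * lam)      # just below each switching value
    active = [j for j, v in enumerate(f) if v != 0]
    print(f"lam = {0.9 * lam:.2f}: active entries {active}")
```

In this toy case the entry with the largest magnitude enters first, then the second largest, and so on, which is the ordering behaviour that the finite sequence of $\lambda$ values describes in the general case.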

The values $\lambda_0, \lambda_1, \ldots, \lambda_K$ are named transition points. In other words, the first transition point isolates the value most affected by the fault, the second transition point isolates the second most affected value, and the rest follow in the same manner. Using the LARS algorithm proposed by Efron et al. in [41], the spectral region most affected by the fault can be located. The specific LARS algorithm is as follows (Algorithm 1).

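(1) Set $\hat{\mu} = 0$ and the beginning active set $\mathcal{A} = \emptyset$. Start with $\hat{\beta} = 0$.
(2) Get the correlation vector $\hat{c} = Z^{T}(y - \hat{\mu})$ and find the greatest absolute correlation value $\hat{C} = \max_j |\hat{c}_j|$, where $\hat{c}_j$ is the $j$-th element in $\hat{c}$. Then, update the active set as $\mathcal{A} = \{ j : |\hat{c}_j| = \hat{C} \}$.
(3) Calculate the equiangular vector $u_{\mathcal{A}} = Z_{\mathcal{A}} w_{\mathcal{A}}$, where $Z_{\mathcal{A}} = (\cdots\, s_j z_j \,\cdots)$ with $s_j = \mathrm{sign}(\hat{c}_j)$ for $j \in \mathcal{A}$, $z_j$ is the $j$-th column in the matrix $Z$, $w_{\mathcal{A}} = A_{\mathcal{A}} G_{\mathcal{A}}^{-1} 1_{\mathcal{A}}$ with $G_{\mathcal{A}} = Z_{\mathcal{A}}^{T} Z_{\mathcal{A}}$ and $A_{\mathcal{A}} = (1_{\mathcal{A}}^{T} G_{\mathcal{A}}^{-1} 1_{\mathcal{A}})^{-1/2}$, and $1_{\mathcal{A}}$ is the ones vector.
(4) Get the step size as $\hat{\gamma} = \min^{+}_{j \in \mathcal{A}^{c}} \left\{ \dfrac{\hat{C} - \hat{c}_j}{A_{\mathcal{A}} - a_j}, \dfrac{\hat{C} + \hat{c}_j}{A_{\mathcal{A}} + a_j} \right\}$, where $\mathcal{A}^{c}$ is the complementary set of $\mathcal{A}$, $a_j$ is the $j$-th element in $a = Z^{T} u_{\mathcal{A}}$ for $j \in \mathcal{A}^{c}$, and $\min^{+}$ indicates that the minimum is taken over only positive components within each choice of $j$.
(5) Let $\hat{\mu} \leftarrow \hat{\mu} + \hat{\gamma} u_{\mathcal{A}}$ and update $\hat{\beta}_j \leftarrow \hat{\beta}_j + \hat{\gamma} s_j w_j$ for $j \in \mathcal{A}$, where $w_j$ is the entry of $w_{\mathcal{A}}$ corresponding to variable $j$.
(6) Return to Step 2 until the stopping condition is met, which also means that the monitoring statistics of the reconstructed sample fall within the corresponding control limits in Section 2.
(7) Finally, the variables in the active set $\mathcal{A}$ are isolated as the spectral region affected by the fault. Moreover, because variables enter the active set one step at a time, the affected spectral regions are isolated in order from the most to the least affected.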
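The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch of plain LARS in the spirit of Efron et al., without the additional rule that drops a variable when its coefficient crosses zero; the function name `lars`, the optional `stop` callback (a stand-in for the control-limit check in Step 6), and the tolerance are assumptions introduced here.

```python
import numpy as np

def lars(Z, y, stop=None, tol=1e-10):
    """Return (order of entry of the variables, coefficient vector)."""
    n, p = Z.shape
    mu = np.zeros(n)                          # current fit
    beta = np.zeros(p)
    active = []                               # active set, in order of entry
    for _ in range(min(n, p)):
        c = Z.T @ (y - mu)                    # Step 2: current correlations
        inactive = [j for j in range(p) if j not in active]
        j_new = max(inactive, key=lambda j: abs(c[j]))
        if abs(c[j_new]) < tol:
            break                             # nothing left to explain
        active.append(j_new)
        inactive.remove(j_new)
        C = np.max(np.abs(c[active]))
        s = np.sign(c[active])

        ZA = Z[:, active] * s                 # Step 3: signed active columns
        G = ZA.T @ ZA
        g_inv_one = np.linalg.solve(G, np.ones(len(active)))
        AA = 1.0 / np.sqrt(g_inv_one.sum())
        w = AA * g_inv_one
        u = ZA @ w                            # equiangular vector
        a = Z.T @ u

        gamma = C / AA                        # Step 4: full step if no variable is left
        for j in inactive:
            for num, den in ((C - c[j], AA - a[j]), (C + c[j], AA + a[j])):
                if den > tol and tol < num / den < gamma:
                    gamma = num / den         # smallest positive step

        mu += gamma * u                       # Step 5: move along u
        beta[active] += gamma * s * w
        if stop is not None and stop(beta):   # Step 6: optional stopping rule
            break
    return active, beta
```

For fault isolation, `Z` and `y` would be the quantities obtained from the reconstruction problem, and `stop` could, for example, test whether the monitoring statistic of the reconstructed sample has fallen below its control limit; the returned order of entry then ranks the wavelengths from the most to the least affected.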

Because the effectiveness of the PCA and LASSO methods has been well documented [35, 42], the proposed method can be expected to be effective as well, and we illustrate this point in Figure 2.

4. Case Study

4.1. NIR Spectra Acquisition and Preprocessing

According to Figure 1, the NIR spectrometer probe was inserted into the desalted oil output pipe. The NIR spectra were continuously collected during the oil desalting process for 7 days including the fault part. Each spectrum was collected with a resolution of . The raw NIR spectra are shown in Figure 3. Preprocessing methods such as standard normal variate (SNV) and the first derivative are applied to eliminate the spectra shift along the process temperature and light source life [43, 44]. In this study, the first derivative was used, and the preprocessed spectra are shown in Figure 4.
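The two preprocessing options mentioned above can be sketched with the standard library alone. The source does not state how the derivative was computed, so the simple finite difference used here is an assumption (smoothed derivatives are also common in practice), and both function names are hypothetical.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(v - m) / s for v in spectrum]

def first_derivative(spectrum, step=1.0):
    """First derivative by finite differences between neighbouring points."""
    return [(b - a) / step for a, b in zip(spectrum, spectrum[1:])]

# A constant baseline shift disappears after differentiation.
raw = [1.0, 4.0, 9.0, 16.0]
shifted = [v + 10.0 for v in raw]
print(first_derivative(raw))        # [3.0, 5.0, 7.0]
print(first_derivative(shifted))    # [3.0, 5.0, 7.0]
```

The small example shows why a derivative is a natural choice for removing additive baseline shifts such as those caused by temperature drift or lamp ageing: the offset cancels in every difference.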

4.2. Fault Detection and Isolation Results

According to the collected spectra, the first 3 days’ spectra data are normal (training set) and the remaining 4 days’ spectra data are abnormal including the fault part (test set). The preprocessed spectra data are used to do the fault detection based on the PCA algorithm. One preprocessed spectrum constitutes a row of the matrix; the value loading on each wavelength constitutes the elements of the matrix. The data analysis is performed using the Unscrambler 9.6 (CAMO, Oslo, Norway) and MATLAB 7.5 software (MathWorks, MA, USA).

The detection results for different faulty scenarios are shown in Figures 5–8. Specifically, we test the proposed method for the case in which a fault occurs at sample = 300 in Figure 5, while an abnormal (not fault) operating condition exists from sample = 300 to sample = 700 in Figure 6. In Figure 7, a fault occurs at sample = 300 and disappears at sample = 700. On the contrary, a normal operating condition exists only from sample = 300 to sample = 700 in Figure 8. All these results lead to the consistent conclusion that the proposed method can be used with the NIR measurements to give a satisfactory monitoring performance for the oil desalting process.

Furthermore, sample 300 is the earliest spectrum containing early information about the fault, as shown in Figure 5. Using this sample yields more accurate results and provides earlier advice for the operators to take appropriate measures. The fault isolation result based on this sample (sample = 300) is obtained using the algorithm described in Section 3. By reducing the value of $\lambda$, the affected spectral regions are selected one by one. Compared with the wavelength number of preprocessed NIR spectra, the first band isolated in the black area ranges from (), and this region is the first overtone of C-H stretching. The second band isolated in the red area ranges from (), and this region is the first overtone of O-H stretching. The significant changes in O-H stretching indicate that a water-related event may have caused the process issues. After consulting with process reliability engineers, it was found that the incident happened due to the corrosion of a separator heated by hot water. The leakage of water into the oil stream caused an extremely high water concentration, which may bring a large increase in chlorides and quickly damage the process equipment. This fault isolation result shows the effectiveness of the LASSO-based algorithm for the oil desalting process and similar processes.

5. Conclusions

The potential of NIR combined with the LASSO algorithm for establishing monitoring schemes for the crude oil desalting process has been investigated in this paper. The fault detection method based on PCA is adopted to monitor the process. Then, the LASSO algorithm is utilized to find the spectral region affected by the disturbances to provide some advice for the root cause diagnosis. With the help of the LARS algorithm, the spectral regions affected by the fault are located in order from the most to the least affected, so that the fault can be isolated accurately. Because changes at the molecular level can be identified earlier than their physical manifestations in the process, NIR spectra-based monitoring has the advantage of higher sensitivity to early failures, which allows the operators to capture faults earlier and handle them with enough time. In our future work, the adaptive LASSO algorithm will be investigated to improve the fault isolation performance by applying adaptive weights to the $\ell_1$-norm penalty.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Key Laboratory of Advanced Perception and Intelligent Control of High-End Equipment, Ministry of Education, Anhui Polytechnic University, under Grant no. GDSC202019.