Abstract
Accurate identification of lithology is the basis and key step of fine logging interpretation and evaluation. However, reservoirs formed in different sedimentary environments and by different tectonic movements generally have complex, diverse lithology and strong heterogeneity, which makes reservoir lithology difficult to identify. This paper proposes an automatic lithology logging identification technology based on a support vector machine optimized by the gray wolf optimizer (GWO-SVM). The technology is applied to measured data, and the results are compared with those of the support vector machine cross-validation optimization model, the PNN (probabilistic neural network) model, and the ELM (extreme learning machine) model. The results show that the GWO-SVM lithology logging recognition model solves the lithology recognition and classification problems in complex reservoir analysis efficiently, with strong adaptability and higher recognition accuracy than the other models.
1. Introduction
Accurate identification of lithology is the most direct and effective method for stratigraphic evaluation, fine reservoir description, and reservoir identification [1–3]. It is the basis for deepening reservoir understanding and improving the accuracy of reservoir logging interpretation. At present, reservoir lithology is determined accurately by coring and analyzing the core data. However, coring is expensive and core data are limited, so it is difficult to obtain detailed core data for every well. Logging data, by contrast, have high vertical resolution, strong continuity, and are convenient to acquire. Commonly used logging lithology identification methods establish a mapping between logging parameters and rock types and use this mapping to identify the rock types of unsampled well sections. With the improvement of logging technology, increasingly comprehensive logging data such as electrical imaging logging data, lithology scan data, and element logging data are being measured, and some new lithology identification technologies have emerged. These include comprehensive lithology recognition based on image and element logging data [4], intelligent lithology recognition by deep learning on rock images [5, 6], and lithology identification by pixel-based and object-based image analysis of WorldView-3 VNIR data [7]. However, these methods are not applicable to old wells that lack these new types of data [8, 9].
As machine learning algorithms have matured, more and more intelligent algorithms and mathematical methods have been applied to lithology identification. Common logging lithology identification methods can be divided into three categories: the intersection graph (crossplot) method, cluster analysis, and pattern recognition. The intersection graph method analyzes the core data to establish intersection graphs of logging curve parameters and lithology types; the joint constraint conditions of multiple parameters are then determined to complete the identification of lithology. Cluster analysis is algorithmically simple, easy to implement, and converges quickly. However, it places high demands on the data and is sensitive to noise and outliers: a small number of such points can strongly affect the cluster averages. Therefore, this method is often difficult to apply to reservoirs with complex lithology, such as carbonate reservoirs and shale reservoirs.
Many pattern recognition methods related to machine learning have been proposed in succession for lithology recognition: a method based on the gradient boosting decision tree [10], a differential-evolution-based neural network method using logging information [11], automatic classification of lithology from drill core images with a convolutional neural network [12], a semisupervised learning method using the Laplacian support vector machine [13], an attention-based bidirectional gated recurrent unit neural network [14], a method based on the recurrent neural network [15], a method based on the parameter-optimized AdaBoost algorithm [16], a method based on the adaptive kernel density Bayesian probability model [17], a method based on parameter-optimized ensemble learning [18], and a logging lithology identification method based on an improved multigranularity cascade forest [19, 20].
Lithology identification is an important part of reservoir evaluation. High-precision lithology identification results lay a solid foundation for subsequently determining the mineral composition of the formation and analyzing the physical properties of the reservoir, especially for calculating porosity and permeability. This paper proposes a lithology logging recognition technology in which a support vector machine (SVM) is optimized by the gray wolf optimization (GWO) algorithm. The main idea is to use the GWO algorithm to optimize the key parameters of the SVM. The novelty of this study lies in the use of the GWO algorithm: compared with other function optimization algorithms, it has a simple principle, few parameters, excellent local search ability, and high solution accuracy, and it is combined here with the advantages of the SVM in processing small-sample data. The technology is applied to measured data, and the classification results are compared with those of the SVM cross-validation model, the ELM (extreme learning machine) model, and the PNN (probabilistic neural network) model in order to evaluate the performance of the new technology comprehensively and analyze its advantages and disadvantages. On the measured data, the accuracy of lithology identification based on GWO-SVM reaches 94.4444%. The proposed algorithm was developed in MATLAB [21]. The technology enriches the existing reservoir lithology identification techniques and provides a further option when selecting a lithology identification method.
2. Intelligent Algorithm
This paper uses the gray wolf optimization algorithm to optimize the two key parameters of the support vector machine to establish a lithology recognition model. The following section briefly introduces the basic principles of the two intelligent algorithms.
2.1. Support Vector Machines
The support vector machine (SVM) is a supervised machine learning algorithm for small-sample data. It is based on the VC theory of statistical learning and considers both empirical risk and structural risk minimization. It is often used to solve data classification problems in data mining and pattern recognition. As shown in Figure 1, the basic idea of the algorithm is to find a hyperplane that separates the categories while maximizing the distance between the hyperplane and the nearest data points; this distance is defined as the maximum classification interval. The larger the classification interval, the better the corresponding hyperplane, and the hyperplane with the maximum classification interval is taken as the optimal solution of the support vector machine. The sample points that the dashed lines on both sides of the hyperplane pass through are called “support vectors”; they support, or define, the hyperplane. For data that are not linearly separable, a nonlinear kernel function transforms the linearly inseparable problem in the low-dimensional feature space into a linearly separable problem in a high-dimensional feature space. The optimal hyperplane is then sought in this high-dimensional feature space so that the classification interval is maximized. This is a convex quadratic programming problem that is solved with methods from operations research: the objective function is the classification interval, and the constraints require the samples to lie on the correct side of the hyperplane. To control the error during the solution, a penalty factor c is added.
The penalty factor c and the kernel function parameter g are the main parameters that affect the prediction accuracy of the SVM. The penalty factor c affects the complexity and stability of the model. If c is too small, large errors and underfitting are likely. If c is too large, the learning accuracy improves, but the generalization ability decreases and the model is prone to overfitting. The kernel function parameter g mainly reflects the correlation between the support vectors. The smaller g is, the weaker the connection between the support vectors and the worse the generalization ability; the tighter the connection, the lower the learning accuracy. Therefore, choosing a suitable (c, g) has a large impact on the performance of the SVM model, and the optimal parameter combination must be obtained with an optimization algorithm to ensure the prediction accuracy of the model.
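For reference, the roles of c and g can be read off the standard soft-margin formulation. The block below is a sketch in common textbook notation, assuming a radial basis function (RBF) kernel, which the text does not state explicitly. The symbols $w$, $b$, $\xi_i$, $\phi$, and $N$ are introduced here for illustration, and $y_i \in \{-1, +1\}$ denotes a binary class label.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + c\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad
y_i\left(w^{\top}\phi(x_i)+b\right)\ge 1-\xi_i,\qquad \xi_i\ge 0,\qquad i=1,\dots,N

K(x_i,x_j)=\phi(x_i)^{\top}\phi(x_j)=\exp\left(-g\,\lVert x_i-x_j\rVert^{2}\right)
```

Under this assumption, c weights the slack variables $\xi_i$ (the misclassification penalty) against the margin term $\frac{1}{2}\lVert w\rVert^{2}$, and g sets the width of the kernel. Multi-class problems such as the six lithology categories are usually handled by combining several binary classifiers of this form.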

2.2. Gray Wolf Optimization Algorithm
The optimization principle of the gray wolf optimizer (GWO) is derived from the hierarchy mechanism and hunting method of the gray wolf pack [15]: the hunt is led by the head wolf, and the other wolves encircle the prey. The algorithm realizes adaptive control according to the social hierarchy of the gray wolves (as shown in Figure 2). It divides the gray wolf population into four levels: $\alpha$, $\beta$, $\delta$, and $\omega$. A wolf at a lower level must obey the wolves at the higher levels in order to complete the tracking and hunting of the prey. The position of the $\alpha$ wolf is the optimal solution of the objective function. The predation process of the gray wolf population is simulated as follows:

First, the gray wolf pack encircles its prey according to formulas (1) and (2). In this process, the positions of the low-level wolves are adjusted continuously as the positions of the high-level wolves are updated, so as to find the optimal hunting position.
$$D = \left| C \cdot X_p(t) - X(t) \right| \tag{1}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{2}$$
In the formulas, $t$ is the current iteration number, $X_p$ is the current position of the prey, $X$ is the current position of the wolf, $D$ is the distance between the wolf and the prey, and $A$ and $C$ are coefficient vectors.
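Second, after the gray wolves have encircled their prey, the hunting process begins. According to the range of the prey currently obtained, three optimal solutions are determined, namely, the positions of the wolves of the first three levels, and the positions of the other wolves are updated dynamically as these three wolves move. In this process, the wolves of the first three levels guide the direction and approach the prey until it is captured, and the position of each individual gray wolf is updated dynamically according to
$$D_\alpha = \left| C_1 \cdot X_\alpha - X \right|,\quad D_\beta = \left| C_2 \cdot X_\beta - X \right|,\quad D_\delta = \left| C_3 \cdot X_\delta - X \right| \tag{3}$$
$$X_1 = X_\alpha - A_1 \cdot D_\alpha,\quad X_2 = X_\beta - A_2 \cdot D_\beta,\quad X_3 = X_\delta - A_3 \cdot D_\delta,\quad X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{4}$$
In the formulas, $X_\alpha$, $X_\beta$, and $X_\delta$ are the positions of the wolves of the first three levels, $X$ is the position of the current solution, and $C_1$, $C_2$, and $C_3$ are randomly generated vectors. $D_\alpha$, $D_\beta$, and $D_\delta$ are the distances between the three leading wolves and the other wolves, respectively. After the distances are defined, the final position of the current solution is calculated according to formulas (3) and (4): $A_1$, $A_2$, and $A_3$ are random vectors, $X_1$, $X_2$, and $X_3$ are the positions suggested by the three leading wolves, and $X(t+1)$ is the final position of $X$.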
Finally, the location of the prey is determined and the prey is attacked. To model the approach to the prey mathematically, the magnitude of the coefficient vector $A$ is adjusted continuously, which determines whether the position of the prey has been found.
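The three stages described above (encircling, hunting guided by the three leading wolves, and attacking) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in this paper: the function name `gwo_maximize`, the linear decay of the control parameter `a` from 2 towards 0, and the coefficient definitions `A = 2*a*r1 - a` and `C = 2*r2` follow the form commonly given in the GWO literature and are not taken from the text.

```python
import random


def gwo_maximize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Gray wolf optimizer sketch: maximize `fitness` inside box `bounds`.

    `bounds` is a list of (low, high) pairs, one per dimension.
    Requires n_wolves >= 3 so that alpha, beta, and delta exist.
    """
    rng = random.Random(seed)
    dim = len(bounds)

    # Random initial positions inside the bounds.
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # (fitness, position) of the alpha, beta, and delta wolves

    for t in range(n_iter):
        # Rank the pack together with the previous leaders and keep the best three.
        scored = leaders + [(fitness(w), list(w)) for w in wolves]
        scored.sort(key=lambda s: s[0], reverse=True)
        leaders = scored[:3]

        a = 2.0 * (1.0 - t / n_iter)  # assumed linear decay from 2 towards 0
        for w in wolves:
            for k in range(dim):
                guided = []
                for _, leader in leaders:
                    A = 2.0 * a * rng.random() - a  # assumed form of coefficient A
                    C = 2.0 * rng.random()          # assumed form of coefficient C
                    D = abs(C * leader[k] - w[k])   # distance to this leader
                    guided.append(leader[k] - A * D)
                lo, hi = bounds[k]
                # New coordinate: average of the three guided positions, kept in bounds.
                w[k] = min(max(sum(guided) / len(guided), lo), hi)

    # Evaluate the final pack once more and return the best solution seen.
    scored = leaders + [(fitness(w), list(w)) for w in wolves]
    best_fit, best_pos = max(scored, key=lambda s: s[0])
    return best_pos, best_fit
```

In the setting of this paper, the position vector would hold the two SVM parameters (c, g) and `fitness` would return a classification score for that pair. The sketch keeps the best three solutions found so far across iterations, which is one possible design choice.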
3. Optimization of Lithology Logging Parameters
3.1. Lithology Classification
The data set of this study is taken from the University of Kansas data for the Hugoton and Panoma gas fields. The lithology data of the study area are all accurate core-based names with clear corresponding lithology information, which provides strong support for this research. Samples were selected according to the following principles: avoid sample points in thin layers and at layer interfaces, and select sample points where the lithology is relatively stable and the wellbore section is relatively smooth. In total, 880 sample points were selected. Based on the lithological composition and structural characteristics, thin sections, and core identification data, the lithology of the study area was classified into 6 categories: marine, paralic, floodplain, channel, splay, and paleosol. As shown in Table 1, a category label was assigned to each class to facilitate training.
3.2. Optimal Selection of Lithological Sensitivity Logging Parameters
Logging parameters comprehensively reflect changes in the physical and chemical properties of the formation. The differences in logging parameters between formations depend mainly on the lithology, the grain size of the rock, and the properties of the fluids in the pores. Different lithologies have different logging response characteristics that follow certain rules, and these differences appear as different amplitude ranges of the log curves. As shown in the box plots in Figure 3, the value ranges of the six lithologies on the six logging curves differ considerably. The differences in maximum, minimum, and mean represent the overall differences between lithology categories, and the difference between the median and the mean reflects the degree of data aggregation. In the figure, U, TH, and K are radioactive logging curves, RHOB is the density logging curve, PHIN is the neutron logging curve, and UMAA is the volumetric photoelectric cross-section coefficient. The differences in these statistics also indirectly confirm that the lithology categories are separable on the basis of the logging parameters.

When a lithology recognition model is created from logging curve parameters, a sensitivity analysis of the logging parameters is needed in order to select the parameters with high sensitivity. Two-dimensional and three-dimensional intersection maps (crossplots) of the lithologies under different logging parameters were made and analyzed, as shown in Figures 4 and 5. The results show that the lithologies overlap to different degrees in different logging parameters and that different lithologies have different high-sensitivity parameters. It is therefore difficult to separate the lithologies with any single logging parameter; accurate identification is possible only when the response characteristics of the six lithologies to all logging parameters are considered together. Finally, six curves that are more sensitive to lithology were selected: uranium, thorium, potassium, neutron, density, and volumetric photoelectric absorption index.


4. The Realization of the GWO-SVM Lithology Logging Model
This part mainly involves the design idea of the GWO-SVM model, accuracy evaluation index, algorithm flow, example application, and analysis.
4.1. Model Analysis and Design
The GWO-SVM lithology logging recognition model consists mainly of the selection and processing of sample data, the optimization of key parameters, and the accuracy evaluation of the model. In this paper, the lithology logging identification model is built from logging curves in five steps: (1) select the sample data, normalize them, and divide them into training samples and test samples; (2) analyze the relationship between logging parameters and lithology types; (3) find the best combination of the parameters c and g with the GWO algorithm; (4) create the GWO-SVM lithology logging recognition model; (5) test the model on the test set and analyze and evaluate its performance.
4.2. Model Accuracy Evaluation Index
The accuracy of the model is the basic index for evaluating its performance. In actual research and application, different accuracy indicators have different emphases: some measure how accurately the test samples are predicted, and others measure the correlation between the predicted and true values of the lithology category. In specific experiments, several indicators should be considered together to evaluate the performance of the model objectively and comprehensively. At present, the main indicators for the accuracy of lithology logging recognition are the mean square error (MSE), the mean absolute error (MAE), and the accuracy. They are defined as follows:
(1) Mean square error (MSE): the average of the squared difference between the true value and the predicted value. The smaller the MSE, the higher the prediction accuracy of the model.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
(2) Mean absolute error (MAE): the average of the absolute error. MAE better reflects the actual size of the prediction error. The smaller the value, the higher the prediction accuracy of the model and the better the prediction effect.
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_i - \hat{y}_i \right|$$
(3) Accuracy: the proportion of samples in the test set whose category is predicted correctly. It reflects the number of correctly predicted samples intuitively and clearly. The larger the value, the higher the prediction accuracy of the model and the better the prediction effect.
$$\mathrm{Accuracy} = \frac{m}{n} \times 100\%$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, $m$ is the number of samples correctly predicted in the test set, and $n$ is the total number of samples in the test set.
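The three indicators can be computed directly from the true and predicted category labels. The following is a minimal sketch; the function names are chosen here for illustration, and the labels are assumed to be the numeric category labels assigned for training.

```python
def mse(y_true, y_pred):
    """Mean square error between true and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)


# Six samples, one misclassified (true label 6 predicted as 4).
y_true = [1, 2, 3, 4, 5, 6]
y_pred = [1, 2, 3, 4, 5, 4]
print(mse(y_true, y_pred), mae(y_true, y_pred), accuracy(y_true, y_pred))
```

Because the category labels are nominal, MSE and MAE depend on how the classes are numbered: confusing labels 6 and 4 costs more than confusing labels 5 and 4. Accuracy does not depend on the numbering.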
4.3. Data Selection and Preprocessing
The lithology classification data are used as the training set and the test set of the lithology logging recognition model. The total number of samples is 880. Each sample contains 6 logging parameters as input parameters and the lithology category label as the output parameter. The samples are divided randomly into a training set and a test set at a ratio of 8:2.
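A random 8:2 division of the samples can be sketched as follows. The function name and the fixed random seed are assumptions made here for reproducibility; the text does not specify how the random split was generated.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) from a random shuffle of the samples."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = train_test_split_indices(880)
print(len(train_idx), len(test_idx))
```

Working with indices rather than copies of the data keeps the logging parameters and the category labels aligned when both are selected with the same index list.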
Logging data are of many types with different properties, and these factors affect the performance of the model. Therefore, the logging data must be preprocessed. Data preprocessing has two main purposes. The first is to normalize all data and make them dimensionless, which simplifies the calculations and eliminates unreasonable phenomena. The second is to avoid three problems caused by large differences between the data: slow convergence, excessively long training time, and unsatisfactory prediction performance. The data are normalized as
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x'$ is the normalized data, $x$ is the original data, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the original data, respectively.
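Normalization with the maximum and minimum of the original data can be sketched per logging curve as below. Two points are assumptions made here rather than statements from the text: the target range is taken to be [0, 1], and the minimum and maximum are estimated on the training samples and then reused for the test samples. The function names are illustrative.

```python
def fit_min_max(rows):
    """Column-wise minimum and maximum of a list of equal-length rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale each column to [0, 1] using the given minima and maxima."""
    scaled = []
    for row in rows:
        scaled.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column maps to 0
            for x, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


train_rows = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
mins, maxs = fit_min_max(train_rows)
print(apply_min_max(train_rows, mins, maxs))
```

Reusing the training minima and maxima for the test samples avoids leaking information from the test set into the model, at the cost that some test values may fall slightly outside [0, 1].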
4.4. GWO-SVM Model Construction and Result Analysis
The GWO-SVM lithology logging recognition model was constructed according to the flowchart in Figure 6. First, the sample data are input and divided into a training set and a test set in the chosen proportion. The number of wolves, the maximum number of iterations, the number of parameters to be optimized, the upper and lower limits of the parameters, and the initial positions of the wolves are set. The fitness is calculated, and the three wolves with the largest fitness values are saved. Second, the iteration is repeated until the maximum number of iterations is reached; training then stops, and the optimal c and g are output. Finally, the best parameter combination and the model obtained by training are applied to the test set to complete the division of lithology categories.
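The fitness evaluated for each wolf is not specified in detail above. A common choice, assumed here, is the mean k-fold cross-validation accuracy on the training set for the pair (c, g) encoded by the wolf's position. The sketch below shows only that wrapper. The callback `train_and_score` is hypothetical: it stands for any routine that trains an SVM with the given c and g on the training indices and returns the accuracy on the validation indices.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def make_cv_fitness(train_and_score, n_samples, k=5):
    """Build fitness(position), where position = [c, g].

    `train_and_score(c, g, train_idx, val_idx)` is a hypothetical callback
    supplied by the user; it must return a validation score.
    """
    folds = k_fold_indices(n_samples, k)

    def fitness(position):
        c, g = position
        scores = []
        for i, val_idx in enumerate(folds):
            # All folds except the i-th form the training part.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(train_and_score(c, g, train_idx, val_idx))
        return sum(scores) / len(scores)

    return fitness
```

The returned `fitness` can be passed to any optimizer that maximizes a function of a position vector within the upper and lower limits set for c and g. If the samples are ordered by well or by depth, they should be shuffled before contiguous folds are formed.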

As shown in Figure 7, the best parameter combination (c, g) obtained with the GWO-SVM model is (14.2772, 60.9962). The results are accuracy = 94.4444%, MSE = 0.22778, and MAE = 0.10556, which indicates a good prediction effect.

4.5. Comparative Analysis
To further verify the superiority of the GWO-SVM model, the same lithology identification experiments were carried out with the SVM cross-validation model, the ELM model, and the PNN model. As shown in Table 2 and Figure 8, the models rank as follows. Accuracy: GWO-SVM > PNN > SVM > ELM. MSE: GWO-SVM < SVM < PNN < ELM. MAE: GWO-SVM < SVM < PNN < ELM. The greater the accuracy and the smaller the MSE and MAE, the better the performance of the model. On all three evaluation indicators, the GWO-SVM model has the highest recognition accuracy.

In addition, the error curve of each model is plotted. A value of 0 represents a correctly predicted sample, so the classification effect of each model can be seen intuitively from its error curve. As shown in Figure 9, the error curve of GWO-SVM has the most sample points at 0, which also indicates that GWO-SVM is more accurate than the other models. The error is calculated as
$$e_i = y_i - \hat{y}_i$$
where $y_i$ is the true value of the lithology category label and $\hat{y}_i$ is the predicted value of the lithology category label.

5. Conclusions and Recommendations
This paper proposes a method that improves the accuracy of lithology logging recognition by using the gray wolf algorithm to optimize the parameters of the support vector machine. From the application of the model and a comparative analysis of the prediction performance of the four models GWO-SVM, SVM, PNN, and ELM, the following conclusions are obtained:
(1) Logging data reflect the physical characteristics of the formation continuously and in situ. The logging curve is the comprehensive response of the formation rock, and the differences in the observed values of each logging curve depend mainly on the lithology. Therefore, the idea of inversion can be borrowed to identify lithology from log features. High-precision lithology identification results lay a solid foundation for the subsequent calculation of formation mineral components, reservoir physical property analysis, oil-bearing analysis, and so on. Traditional methods of lithology identification have difficulty meeting the requirements of fine reservoir evaluation, and lithology identification with machine learning and intelligent algorithms has become the mainstream research method. In addition to the GWO-SVM model proposed in this paper, other options can be tried in the future, such as optimizing SVM, PNN, ELM, BP, and other methods with PSO, GWO, genetic algorithms, and so on.
(2) The core of the GWO-SVM model is to use the gray wolf optimization algorithm to optimize the key parameters of the support vector machine, namely the penalty factor c and the kernel function parameter g. The gray wolf optimization algorithm has excellent local search capabilities, and the support vector machine is well suited to problems with small amounts of data. The combination of the two intelligent algorithms improves the generalization ability and classification accuracy of support vector machine classification, and the model has few parameters and is simple, efficient, and easy to implement.
(3) In practical application, the lithology predicted by the GWO-SVM lithology logging recognition technology agrees with the lithology from core analysis for 94.4444% of the samples. Compared with the SVM, PNN, and ELM lithology recognition models on the three accuracy evaluation indicators MAE, MSE, and accuracy, the GWO-SVM model has smaller MAE and MSE and higher accuracy, so the GWO-SVM lithology logging recognition model performs better than the other three models.
(4) The lithology identification methods commonly used in the past are subject to strong geographical restrictions and are not universal. The lithology logging identification technology based on the GWO-SVM algorithm model is not tied to a particular area in this way and only needs to process the logging curve parameters. It therefore has strong universality and is suitable for popularization and application.
Data Availability
The data that support the findings of this study are available on request from the corresponding author (RD).
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
Shengyan Lu conducted conceptualization, data curation, methodology, investigation, validation, and writing of the original draft. Moujie Li and Rui Deng conducted supervision, formal analysis, funding acquisition, provision of resources, writing, review, and editing. Na Luo, Wei He, Xiaojun He, and Changjian Gan conducted supervision, formal analysis, and provision of resources.
Acknowledgments
The research was funded by the Major National Science and Technology Projects of China “Multidimensional and High Precision Imaging Logging Series” (No. 2017ZX05019001) and the Key Project of Science and Technology Research Program of Hubei Provincial Department of Education (grant number D20191302).