Abstract

As one of the most serious geological hazards, landslides threaten infrastructure construction. Thus, it is vital to prepare reliable landslide susceptibility evaluation maps so that landslide-prone areas can be avoided in construction projects. Machine learning has been applied to landslide susceptibility in a growing number of recent studies, but methods for producing high-precision evaluation maps still require investigation. In this article, a 60 km oil pipeline in China’s Kunming was selected as the research area. The data of 141 landslide points in the research area were obtained through field work and data collection. Meanwhile, lithology, elevation, aspect, slope, stream power index, topographic wetness index, average annual rainfall from 2017 to 2021, distance to roads, and terrain roughness were selected as causal factors of landslide susceptibility in the research area. First, the information value method was used to quantify the impact of conditional factors on landslides. Genetic algorithm (GA), particle swarm optimization (PSO), and bat algorithm (BA) were then used for parameter tuning, and the support vector machine (SVM) was used to analyze landslide susceptibility in the research area. Finally, the receiver operating characteristic (ROC) curve was used to test the model performance after parameter tuning with GA, PSO, and BA. The results show that the area-under-the-curve (AUC) values obtained with SVM, GA-SVM, PSO-SVM, and BA-SVM are 81.1%, 86.2%, 89%, and 91.8%, respectively. The SVM performed best after parameter tuning with the BA algorithm.

1. Introduction

The surface of the earth is the basis of human existence, but it is constantly changing, driven by both natural processes and human activity [1]. These changes give rise to geological disasters such as landslides and collapses, causing varying levels of damage to the land surface and to homes. Economic development and urbanization have had a major impact on human life [1]. In particular, industrialization, extreme climate, and human activity have increased the risk of geological disasters, while disaster prevention and mitigation remain challenging. Owing to its vast territory and complex terrain dominated by plateaus, mountains, and hills, China has long been among the countries most seriously affected by geological disasters [2]. Thus, there is an urgent need to draw up accurate landslide susceptibility maps for the planning, construction, operation, and maintenance of infrastructure projects.

Landslide susceptibility evaluation predicts geological hazards in a specific area and determines the probability of landslides. Studies on landslide susceptibility can be classified into two categories: qualitative and quantitative [3]. Qualitative research is a relatively simple approach that relies on expert ratings: an on-site evaluation based on the subjective judgment and experience of experts. Quantitative methods are more rigorous and objective than qualitative ones. Quantitative methods can be further categorized into statistical methods, deterministic approaches (geotechnical engineering methods), and machine learning methods [3]. In statistical methods, the principles and functions of statistics are used to analyze the probability of an event such as a landslide. Statistical methods include logistic regression (LR), certainty factor (CF), and the analytic hierarchy process (AHP), and they are widely used. For example, Rai et al. used logistic regression to study landslide susceptibility in Nepal’s Dailekh district [4], while Xu et al. used the certainty factor method to examine landslide susceptibility after the Wenchuan earthquake in China [5]. Moreover, Agrawal and Dixit used AHP and fuzzy AHP methods to study landslide susceptibility in India’s Meghalaya [6], and Zhang et al. used a hybrid model of logistic regression and the index of entropy (IOE) to evaluate landslide susceptibility [7]. Deterministic approaches have been widely used in landslide susceptibility evaluation to analyze the stability of slopes from geotechnical parameters. However, they are practical only in relatively small areas, because the difficulty and cost of obtaining geotechnical parameters increase sharply over large areas [8].

With the development of computer science and interdisciplinary fields, intelligent computing technologies such as machine learning and data mining have been widely used for classification and regression on big data, and their algorithms have gradually been adopted in landslide susceptibility evaluation. Compared with purely data- and knowledge-driven methods, machine learning is increasingly recognized as a more accurate prediction approach that does not depend as heavily on data quality. Machine learning methods mainly include artificial neural networks (ANN), support vector machines (SVM), random forest (RF), adaptive neuro-fuzzy inference systems (ANFIS), and extreme gradient boosting (XGBoost) [9–13]. Jennifer and Saravanan et al. used an artificial neural network model for landslide susceptibility mapping and concluded that ANN can objectively determine the significance of conditional factors without assumptions or biases [14]. Meanwhile, Alqadhia et al. used particle swarm optimization (PSO), ANN, and other methods to study landslide susceptibility [15]. Husam et al. used the Ohe-X transformation to significantly improve the performance of ANN in landslide susceptibility evaluation [16]. Kalantar et al. found that SVM provided good classification accuracy in landslide susceptibility evaluation [17], while Saha et al. asserted that SVM can solve regression and classification problems while reducing the error rate [18]. Lucchese et al. concluded that random forest achieved higher accuracy than an artificial neural network in the landslide susceptibility evaluation of the Itajaí-Açu river valley [19]. Many swarm intelligence algorithms have been used for hyperparameter optimization of SVM. For example, Al-Shabeeb et al. used GA for hyperparameter tuning of SVM in landslide susceptibility evaluation [20]. Meanwhile, Chen et al. used a coupled ant colony optimization and particle swarm optimization algorithm to tune an SVM for landslide susceptibility evaluation of the Anninghe Fault Zone [21].

In summary, various algorithms have been widely used in landslide susceptibility evaluation, and the evaluation accuracy depends on the model used as well as on data accuracy and effectiveness. The predictive power of landslide susceptibility evaluation models still needs to be improved, so algorithms that can strengthen these models are required. In previous studies, algorithms such as particle swarm optimization (PSO) and genetic algorithm (GA) were used to tune the machine learning models in most landslide susceptibility evaluations, whereas the bat algorithm has rarely been applied in this field. Thus, a 60 km oil pipeline in China’s Kunming was selected as the research area in this article. The conditional factors of landslides were extracted from the compiled landslide data. Particle swarm optimization (PSO), the genetic algorithm (GA), and the bat algorithm (BA) were used to tune the SVM model. Landslide susceptibility in the research area was evaluated, and the evaluation results before and after parameter tuning with the GA, PSO, and BA algorithms were analyzed and compared, with the aim of improving landslide susceptibility evaluation for the planning, construction, operation, and maintenance of infrastructure projects.

2. Study Area

The study area is located in China’s Kunming. Its geographic coordinates lie between 102°29′58.90″–102°58′E and 25°15′37.13″–24°51′33.53″N, and it spans an area of about 1,447 km2, as shown in Figure 1. With an average altitude of 1,891 m and an annual average temperature of 12–22°C, the study area has a subtropical monsoon climate. The average annual rainfall ranges from 900 mm to 1,200 mm. The rainy season lasts from June to September and accounts for 85% of the annual rainfall; landslides develop most frequently during this period. The oil pipeline route selected in this article crosses five districts of Kunming, from Changpo Oil Transportation Station to Yang Tian Chong Oil Transportation Station. During the data collection and field work stage, we found that this section of the pipeline has complex geological conditions and is more prone to geological hazards than other sections, making it well suited for examining landslide susceptibility.

3. Research Methods

3.1. Data Preparation

Landslide inventory mapping is the first step of landslide susceptibility analysis and includes the collection of all available information and data on landslides in the area. The accuracy of landslide data has a major impact on the validity of the research methods. A landslide inventory map of the study area was obtained from historical geological disaster records, satellite images, and field survey data. A total of 141 landslide points were identified in the study area. In addition, 141 nonlandslide sample points located at least 500 m away from landslide points were randomly selected, and the landslide and nonlandslide points were combined into one dataset. 70% of the data were used to train the machine learning models, and the remaining 30% were used to test their accuracy and generalization ability. The distribution of landslide points and nonlandslide points is shown in Figure 2.
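As a purely illustrative sketch of this data preparation step (the study’s own processing was done in ArcGIS and MATLAB; the file name, column names, and the scikit-learn split below are assumptions, not the authors’ code), the 70/30 split could be reproduced as follows:

```python
# Hypothetical sketch: assemble the 141 landslide + 141 non-landslide samples
# and make the 70/30 train/test split described above.
import pandas as pd
from sklearn.model_selection import train_test_split

samples = pd.read_csv("landslide_points.csv")   # hypothetical file of sample points
X = samples.drop(columns=["label"])             # conditional-factor values per point
y = samples["label"]                            # 1 = landslide, 0 = non-landslide

# 70% for training, 30% held out to test accuracy and generalization.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
```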

3.2. Identification and Classification of Conditional Factors

While taking into account previous studies and the research area’s conditions, this article selected 9 conditional factors of landslides: lithology, elevation, aspect, slope, stream power index (SPI), topographic wetness index (TWI), annual average rainfall (2017–2021), distance to roads, and terrain roughness. Elevation, aspect, slope, SPI, TWI, and terrain roughness were derived from a digital elevation model (DEM). The distribution of the hydrologic system usually has a significant impact on landslides, but this article did not consider the “distance to the river” factor because the research area is located in a plateau and mountainous region with low river density. Only one river passes through the research area, and it had almost no influence on the density of existing landslide hazards; thus, “distance to the river” was excluded. In the susceptibility evaluation of geological hazards, conditional factors represent the environmental information of a specific location. Hence, it is necessary to check the collinearity of factors in the landslide susceptibility modeling process to avoid poor prediction performance [22]. This article used the Pearson correlation coefficient to measure the correlation between conditional factors (Table 1). Table 1 shows that the absolute value of the correlation between any two selected factors is less than 0.8, meaning the factors are not highly correlated and are suitable for this article. Natural breaks classification was used to partition the intervals of the conditional factors [23]. The nine factors are described below (a short sketch of the SPI, TWI, and roughness formulas follows the list).

(1) Lithology is one of the most common factors in studies of landslide susceptibility and usually has a major impact on geological hazards in a specific area [24]. The geological map of the research area was obtained by vectorizing the existing geological map in Bigemap GIS Office. Table 2 shows the lithology characteristics, and Figure 3 is a lithology distribution map of the study area.

(2) Elevation is a key factor affecting the local climate, vegetation, and potential energy [25–27]. Rising elevation forms terrain with relatively large height differences, which increases the potential energy of landslides. Different elevation zones have different temperatures, vegetation types, vegetation coverage, intensities of human activity, and degrees of rock weathering. The elevation distribution of the study area is shown in Figure 4. It can be partitioned into five intervals: 1,785.96–1,957.21 m, 1,957.21–2,062.85 m, 2,062.85–2,172.72 m, 2,172.72–2,292.57 m, and 2,292.57–2,565.19 m.

(3) Aspect is also an important indicator in landslide susceptibility studies. The direction of the slope does not directly affect slope stability; rather, its influence is reflected in the solar radiation, vegetation coverage, and evaporation controlled by aspect, which indirectly affect slope stability [20, 28]. The aspect distribution in the study area is shown in Figure 5. It can be classified into nine intervals: flat ground, north (337.5°–360°, 0°–22.5°), northeast (22.5°–67.5°), east (67.5°–112.5°), southeast (112.5°–157.5°), south (157.5°–202.5°), southwest (202.5°–247.5°), west (247.5°–292.5°), and northwest (292.5°–337.5°).

(4) Slope refers to the angle between the tangent plane passing through any point on the ground and the horizontal plane, and it directly affects slope stability. The slope controls not only the water content of the soil and the pore pressure [20] but also the stress distribution within the slope, and it affects surface runoff and loose deposits on the slope [29]. The slope of the study area ranges from 0° to 65°. As shown in Figure 6, it can be partitioned into five intervals: 0°–4.33°, 4.33°–9.67°, 9.67°–15.53°, 15.53°–23.42°, and 23.42°–64.92°.

(5) Stream power index (SPI) is another landslide susceptibility factor. It measures the erosive power of flowing water at a given point of the topographic surface [30]. When the catchment area and slope increase, the amount and velocity of water flowing across the slope also increase, and the flowing water erodes the surrounding slopes along the flow direction. Slope stability deteriorates and landslides become more likely because of water erosion at the slope toe [31, 32]. The SPI map was generated by processing the DEM in ArcGIS. SPI is calculated as SPI = ln[SCA × tan(slope)], where SCA is the specific catchment area. The SPI distribution map is shown in Figure 7. It can be partitioned into five intervals: −8.46 to −1.87, −1.87 to 1.94, 1.94 to 3.85, 3.85 to 6.91, and 6.91 to 15.88.

(6) Topographic wetness index (TWI) is a conditional factor that captures topographic changes in a watershed and their impact on soil runoff. It can be used to identify rainfall and runoff patterns, potential areas of increased soil moisture, and stagnant water [33]. TWI quantifies topographic controls on hydrological processes and is calculated as TWI = ln[SCA/tan(slope)], where SCA is the specific catchment area. Figure 8 shows the TWI distribution map of the study area. It can be partitioned into five intervals: 2.94–6.11, 6.11–7.99, 7.99–10.87, 10.87–15.63, and 15.63–28.12.

(7) Rainfall is a major causal factor of landslides, and rainfall intensity affects landslide susceptibility. The study area is located in the mountainous area of southwest China, where the landform is complex and heavy rainfall in the rainy season can trigger landslides. It is difficult to obtain hourly, daily, and total rainfall data for a specific location, whereas accurate average rainfall data are easier to obtain. This article compiled the rainfall data of seven meteorological stations in Kunming from 2017 to 2021 and imported the annual average rainfall data into ArcGIS 10.2 to produce the annual average rainfall distribution map of the study area through Kriging interpolation. As shown in Figure 9, the annual average rainfall of the study area ranges from 918.78 mm to 977.07 mm and can be partitioned into four intervals: 918.78–934.78 mm, 934.78–946.90 mm, 946.90–960.39 mm, and 960.39–977.07 mm.

(8) Terrain roughness characterizes the unevenness of the ground surface, which reflects the undulation and degree of erosion of the surface and affects slope stability [34–36]. Areas with high terrain roughness need to accumulate greater stress to keep the slope stable [36]. Terrain roughness is calculated as T = 1/cos(slope × 3.14/180). The terrain roughness of the study area is shown in Figure 10. It can be partitioned into five intervals: 1–1.016, 1.016–1.058, 1.058–1.144, 1.144–1.436, and 1.436–2.356.

(9) In terms of distance to roads, slope excavation is a key factor affecting slope stability [37]. Landslides are more likely to occur where road density is high, human activity is intense, and the site is close to roads. In this article, road vector data downloaded from Bigemap GIS Office and the Euclidean distance tool in ArcGIS 10.2 were used to calculate the distance from each location in the study area to roads. Figure 11 shows the distribution of distance to roads. It can be partitioned into five intervals: 0–896.96 m, 896.96–2,167.65 m, 2,167.65–3,699.95 m, 3,699.95–5,680.74 m, and 5,680.74–9,530.18 m.
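The SPI, TWI, and terrain roughness formulas quoted above can be expressed compactly as raster operations. The following is a minimal sketch assuming slope (in degrees) and specific catchment area grids have already been extracted from the DEM; the function and array names are hypothetical and do not reflect the ArcGIS workflow actually used:

```python
# Hedged sketch of the SPI, TWI, and roughness formulas from the text,
# applied to hypothetical slope (degrees) and SCA (specific catchment area) grids.
import numpy as np

def terrain_indices(slope_deg, sca, eps=1e-6):
    slope_rad = np.radians(slope_deg)
    tan_slope = np.tan(slope_rad) + eps        # avoid log of / division by zero on flat cells
    spi = np.log(sca * tan_slope)              # stream power index: ln[SCA x tan(slope)]
    twi = np.log(sca / tan_slope)              # topographic wetness index: ln[SCA / tan(slope)]
    roughness = 1.0 / np.cos(slope_rad)        # terrain roughness: 1 / cos(slope)
    return spi, twi, roughness
```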

3.3. Data Processing Based on the Information Value Method

Based on previous studies and the study area’s conditions, a 30 m × 30 m grid cell was selected as the evaluation unit for landslide susceptibility assessment, and the study area was partitioned into 1,642,020 grid cells. The processed conditional factors were superimposed, and the information of each grid cell formed a conditional factor matrix. Each unit in the study area was then simulated and predicted using the model trained with the landslide data.

This article used the information value method to quantify the impact of each conditional factor on landslides. The information value method (Info Val) is an indirect statistical method that can objectively evaluate landslide susceptibility. It enables the quantification of susceptibility through ratings, even for terrain units that have not yet been affected by landslides. As with other bivariate statistical techniques used in landslide susceptibility mapping, each variable is superimposed on the landslide locations to determine its significance [38]. The greater the information value of a conditional factor class, the higher the probability of landslides in that interval of the study area, and vice versa. The information value of class i of a factor is calculated as IV_i = ln[(N_i/N)/(S_i/S)], where N_i is the number of landslide cells falling in class i, N is the total number of landslide cells, S_i is the number of cells in class i, and S is the total number of cells in the study area. The information value of each conditional factor in the study area is shown in Table 3.
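A minimal sketch of this calculation is shown below. Only the 141 landslide points and the 1,642,020 grid cells come from the study; the per-class counts in the example call are hypothetical:

```python
# Information value of one class of one conditional factor:
# IV = ln[(N_i / N) / (S_i / S)], as defined in the text.
import numpy as np

def information_value(landslide_cells_in_class, total_landslide_cells,
                      cells_in_class, total_cells):
    landslide_ratio = landslide_cells_in_class / total_landslide_cells
    area_ratio = cells_in_class / total_cells
    return np.log(landslide_ratio / area_ratio)

# Hypothetical example: a class covering ~12% of the area that contains ~20% of
# the landslides receives a positive information value (more susceptible).
iv = information_value(28, 141, 197_042, 1_642_020)
```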

3.4. Landslide Prediction Model Based on SVM
3.4.1. Support Vector Machine

Support vector machine (SVM) is a supervised learning model derived from statistical learning theory. It maximizes the width of the margin between two classes, which can be formalized as a convex quadratic programming problem. Former Soviet scholar Vapnik first developed SVM based on his Vapnik–Chervonenkis dimension theory [39]. Built on the principle of structural risk minimization, SVM improves the generalization ability of the model and can avoid overfitting to a certain extent. SVM is also simpler and more practical than a neural network, as its decision function is determined by only a small number of support vectors. Moreover, its computational complexity depends on the number of support vectors rather than on the dimensionality of the sample space, which helps avoid the “curse of dimensionality.” The method is not only simple but also robust, since only a small number of support vectors are needed to determine the final result; these capture the key samples and remove a large number of redundant samples [39, 40].

Suppose the landslide sample dataset X consists of samples Xi, where i = 1, 2, ..., n and n is the number of samples. Xi is the input vector of landslide evaluation factors, and the binary variable yi ∈ {−1, 1} is its learning target, i.e., the two output values landslide (1) and nonlandslide (−1) corresponding to Xi in the landslide susceptibility evaluation. The classification goal of SVM is to find an optimal hyperplane that partitions the sample dataset into the two output classes. If a hyperplane exists as the decision boundary in the feature space containing the input data, separating the learning targets into landslide (1) and nonlandslide (−1) with the distance from every sample point to the plane greater than or equal to 1, it can be expressed as follows:

Decision boundary: ω^T X + b = 0.

Distance from a sample point to the plane: d_i = |ω^T Xi + b| / ‖ω‖.

Here, ω is the normal vector of the hyperplane and b is the intercept. When ω and b reach their optimal values, the optimal classification hyperplane that maximizes the distance between landslide and nonlandslide samples in the binary classification is determined.

Linear inseparability means that some training samples cannot satisfy the condition yi(ω^T Xi + b) ≥ 1. Since all sample points should be considered in the initial expression of the optimization problem, the maximum geometric margin between the positive and negative classes is determined on this basis. The geometric margin represents a distance and therefore cannot be negative. The optimization problem has no solution when the data are noisy. Thus, slack variables can be used to allow certain constraints to be violated, so that the solution is not dictated by individual points lying close to the plane. Adding the slack parameter ε to the constraints, the expression is written as follows:

yi(ω^T Xi + b) ≥ 1 − εi, with εi ≥ 0.

All training points will satisfy the above condition when ε becomes sufficiently large. However, a larger ε is not necessarily better; thus, a penalty parameter c is added to the objective function, yielding the following optimization problem:

minimize (1/2)‖ω‖² + c·Σ εi, subject to yi(ω^T Xi + b) ≥ 1 − εi and εi ≥ 0, i = 1, 2, ..., n.

In the above equation, the penalty parameter c represents the error tolerance. The larger c is, the lower the error tolerance and the easier it is for overfitting to occur. On the contrary, underfitting may arise when c is too small. In other words, the selection of penalty parameter c is critical as either a too big or too small value will affect the model’s generalization ability.

Since landslide susceptibility evaluation is a nonlinearly separable problem, a nonlinear function is used to map the problem from the original feature space into a higher-dimensional Hilbert space, transforming it into a linearly separable problem. The decision boundary is then expressed as

ω^T φ(X) + b = 0,

where φ is the mapping function. Because the mapping function is complex and its inner product is difficult to compute directly, the kernel method is used: the inner product of the mapping function is defined as a kernel function κ(X1, X2) = φ(X1)^T φ(X2), which avoids the explicit calculation of the inner product. The kernel function maps a low-dimensional space to a high-dimensional space and allows linear classifiers to be applied to nonlinear problems in that higher-dimensional space [40, 41].

3.4.2. Selection of Kernel Function

The selection of the SVM kernel function is vital to its performance. Commonly used kernel functions of SVM include linear kernel, polynomial kernel, radial basis function (also known as Gaussian function), and Sigmoid kernel [41].

Linear kernel: κ(x1, x2) = <x1, x2>. It is mainly used for linearly separable data.

Polynomial kernel: κ(x1, x2) = (<x1, x2> + R)^d. The polynomial kernel can map a low-dimensional input space to high-dimensional spaces, but it has many parameters, and higher-order polynomials are more expensive to compute.

Radial basis function: κ(x1, x2) = exp(−γ‖x1 − x2‖²), where γ = 1/(2σ²). γ is the parameter of the radial basis function that controls the high-dimensional mapping of low-dimensional samples. The larger γ is, the higher the mapping dimension, but a large γ is prone to overfitting. RBF is a localized kernel function that can map samples to higher-dimensional spaces, up to infinitely many dimensions, and is one of the most widely used kernel functions.

Sigmoid kernel: κ(x1, x2) = tanh(<x1, x2> + θ). With the sigmoid kernel, the SVM behaves like a multilayer neural network.

Empirically, the linear kernel is preferred when the number of features is large or the sample size is very large, whereas the RBF kernel is preferred when the number of features is small and the sample size is moderate. This article studies a nonlinear problem with a moderate sample size and a small number of features, so the RBF kernel was selected after taking all factors into account.
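For illustration only (the paper implements SVM with the LIBSVM package in MATLAB; scikit-learn is substituted here), the sketch below fits an RBF-kernel SVM on the training split from the earlier data-split sketch. The C and gamma values shown are placeholders, not the tuned values reported later:

```python
# Illustrative RBF-kernel SVM, reusing X_train/y_train/X_test/y_test from the
# earlier split sketch; C and gamma are placeholders to be replaced by tuned values.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm_rbf = make_pipeline(
    StandardScaler(),                                  # kernel methods are scale-sensitive
    SVC(kernel="rbf", C=1.0, gamma=0.5, probability=True))
svm_rbf.fit(X_train, y_train)
print("test accuracy:", svm_rbf.score(X_test, y_test))
```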

3.4.3. Optimization of Support Vector Machine Algorithms

In the conventional SVM algorithm, the most important parameters are the penalty parameter c and the kernel parameter γ of the RBF [42]. These parameters control the complexity and accuracy of the model [42, 43]. Thus, the parameters of the SVM must be tuned carefully, or accurate results cannot be obtained. This article used the genetic algorithm (GA), particle swarm optimization (PSO), and the bat algorithm (BA) for parameter tuning of the SVM, and their improvements to the SVM were compared.

Proposed by John Holland in the 1970s, the genetic algorithm (GA) is a random search algorithm based on the laws of biological evolution. Through mathematical methods and computer simulation, the algorithm converts the problem-solving process into a process analogous to the crossover and mutation of chromosomes and genes in biological evolution.

The calculation process of GA is as follows (a simplified code sketch follows the steps):

Step 1: the value of each gene of every chromosome in the population of initial size N is generated with a random number generator within the range defined by the problem. The current evolution generation (Generation) is set to 0.

Step 2: an evaluation function is used to evaluate all chromosomes in the population, the fitness value of each chromosome is calculated separately, and the chromosome with the largest fitness value (Best) is saved.

Step 3: roulette wheel selection is applied to the chromosomes of the population, generating a population of size N.

Step 4: chromosomes are selected from the population for mating with probability P. Every two mating parent chromosomes exchange some of their genes to produce two new offspring chromosomes, which replace the parents in the new population. Chromosomes that do not mate are copied into the new population.

Step 5: gene mutation of chromosomes in the new population is performed with probability P. The values of mutated genes are changed, and the mutated chromosomes replace the original chromosomes in the new population. Chromosomes that do not mutate enter the new population directly.

Step 6: the new population after mutation replaces the original population, and the fitness value of each chromosome in the population is recalculated. If the maximum fitness value of the population is greater than the fitness value of Best, the chromosome corresponding to the maximum fitness value replaces Best.

Step 7: the current evolution generation is incremented by one. If Generation exceeds the specified maximum number of generations or Best meets the specified error requirement, the algorithm ends. Otherwise, return to Step 3.
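The following simplified sketch shows how these steps could be applied to tuning the SVM’s (c, γ) pair. It substitutes scikit-learn and a cross-validated accuracy fitness for the paper’s MATLAB/LIBSVM setup, and the population size, parameter ranges, and crossover/mutation rates are illustrative assumptions:

```python
# Simplified GA over (C, gamma) of an RBF SVM: roulette selection, arithmetic
# crossover, and uniform mutation; settings are illustrative, not the paper's.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
BOUNDS = np.array([[0.01, 100.0],      # C
                   [0.001, 10.0]])     # gamma

def fitness(ind, X, y):
    c, g = ind
    return cross_val_score(SVC(kernel="rbf", C=c, gamma=g), X, y, cv=5).mean()

def ga_tune(X, y, pop_size=20, generations=30, p_cross=0.8, p_mut=0.1):
    pop = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(pop_size, 2))
    best, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        if fits.max() > best_fit:
            best, best_fit = pop[fits.argmax()].copy(), fits.max()
        probs = fits / fits.sum()                              # roulette-wheel selection
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):                    # arithmetic crossover
            if rng.random() < p_cross:
                a = rng.random()
                children[i]     = a * parents[i] + (1 - a) * parents[i + 1]
                children[i + 1] = a * parents[i + 1] + (1 - a) * parents[i]
        for ind in children:                                   # mutation: redraw one gene
            if rng.random() < p_mut:
                j = rng.integers(2)
                ind[j] = rng.uniform(BOUNDS[j, 0], BOUNDS[j, 1])
        pop = children
    return best, best_fit

# Usage (with the earlier split): (best_c, best_gamma), cv_acc = ga_tune(X_train, y_train)
```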

The particle swarm optimization (PSO) algorithm is a swarm intelligence algorithm proposed by Kennedy and Eberhart [44]. It was developed by modeling and simulating the foraging behavior of birds. It searches for an optimal solution through cooperation and information sharing among individuals in the population. PSO is initialized as a swarm of random particles (random solution). An optimal solution is then found based on iterations. The particles update themselves by tracking two extreme values in each iteration. One is the optimal solution found by the particle itself, which is known as local extremum. The other is the optimal solution found in the entire population, which is known as global extremum.

We assume that N particles form a colony in a D-dimensional space, where the ith particle is represented as a D-dimensional vector:

Xi = (xi1, xi2, ..., xiD), i = 1, 2, ..., N.

The speed of the ith particle is expressed as follows:

Vi = (vi1, vi2, ..., viD).

The optimal solution pbest found by each individual and the current optimal solution gbest of the population are saved. The ith particle updates its speed and position based on the following equations:

vid = w·vid + c1·r1·(pid − xid) + c2·r2·(pgd − xid),
xid = xid + vid,

where pid is the best known solution of the individual, pgd is the best known solution of the population, w is the inertia weight, c1 and c2 are the learning factors, and r1 and r2 are random numbers within [0, 1].
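A minimal PSO sketch for the same (c, γ) search follows, applying the velocity and position update rule given above; the inertia weight and learning factors are common textbook defaults, not necessarily the values used in the paper, and the cross-validated fitness mirrors the GA sketch:

```python
# Minimal PSO over (C, gamma) of an RBF SVM, following the update rule above.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
BOUNDS = np.array([[0.01, 100.0], [0.001, 10.0]])     # (C, gamma)

def fitness(p, X, y):
    return cross_val_score(SVC(kernel="rbf", C=p[0], gamma=p[1]), X, y, cv=5).mean()

def pso_tune(X, y, n_particles=20, iterations=30, w=0.7, c1=1.5, c2=1.5):
    pos = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_particles, 2))
    vel = np.zeros_like(pos)
    pbest, pbest_fit = pos.copy(), np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iterations):
        r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, BOUNDS[:, 0], BOUNDS[:, 1])
        fits = np.array([fitness(p, X, y) for p in pos])
        improved = fits > pbest_fit                    # update personal bests
        pbest[improved], pbest_fit[improved] = pos[improved], fits[improved]
        gbest = pbest[pbest_fit.argmax()].copy()       # update global best
    return gbest, pbest_fit.max()

# Usage: (best_c, best_gamma), cv_acc = pso_tune(X_train, y_train)
```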

Developed by Xin-She Yang, the bat algorithm (BA) is a metaheuristic search algorithm based on swarm intelligence and an effective method for global optimization [45]. The algorithm is initialized with a set of random solutions, searches for the optimal solution through iteration, and generates new local solutions through a random walk to strengthen the local search. The BA is similar to particle swarm optimization (PSO), but it has two additional parameters, frequency and loudness. BA outperforms other algorithms in terms of accuracy and effectiveness, and it does not require the tuning of many parameters.

Bats fly randomly at position xi with velocity vi while searching for a target, object, or prey, emitting pulses with a fixed minimum frequency fmin, varying wavelength λ, and loudness A0. The frequency varies from fmin to fmax, and the loudness of the sound ranges between A0 and Amin. Yang developed a set of rules to update the velocity, position, and loudness of the bats while they search for prey. The equations of the bat algorithm are expressed as follows [45]:

fi = fmin + (fmax − fmin)·β,
vi(t) = vi(t − 1) + [xi(t − 1) − x*]·fi,
xi(t) = xi(t − 1) + vi(t),
Ai(t + 1) = α·Ai(t),
ri(t + 1) = ri(0)·[1 − exp(−γt)],

where β is a uniformly distributed random number in [0, 1], x* represents the global optimal solution in the current population, r is the pulse emission rate, and α and γ are constants with 0 < α < 1 and γ > 0.

Once the global optimal solution is selected, each local solution xold in the current population updates its position using the following equation:

xnew = xold + ε·A(t),

where ε is a random number in [−1, 1] and A(t) is the average loudness of the bats at the current step.
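The sketch below applies the BA updates quoted above (frequency, velocity, position, local random walk, loudness decay, and pulse-rate growth) to the same (c, γ) search. The constants alpha, gamma_r, the walk step scale, and the other settings are illustrative assumptions, not the paper’s values:

```python
# Simplified bat algorithm over (C, gamma) of an RBF SVM.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
BOUNDS = np.array([[0.01, 100.0], [0.001, 10.0]])     # (C, gamma)

def fitness(p, X, y):
    return cross_val_score(SVC(kernel="rbf", C=p[0], gamma=p[1]), X, y, cv=5).mean()

def ba_tune(X, y, n_bats=20, iterations=30, f_min=0.0, f_max=2.0,
            loudness=1.0, pulse_rate=0.5, alpha=0.9, gamma_r=0.9):
    pos = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_bats, 2))
    vel = np.zeros_like(pos)
    fits = np.array([fitness(p, X, y) for p in pos])
    best = pos[fits.argmax()].copy()
    A, r = loudness, pulse_rate
    for t in range(1, iterations + 1):
        beta = rng.random((n_bats, 1))
        freq = f_min + (f_max - f_min) * beta                  # f_i
        vel = vel + (pos - best) * freq                        # velocity update
        cand = np.clip(pos + vel, BOUNDS[:, 0], BOUNDS[:, 1])  # position update
        walk = rng.random(n_bats) > r                          # local random walk near best
        cand[walk] = np.clip(best + 0.1 * A * rng.uniform(-1, 1, (walk.sum(), 2)),
                             BOUNDS[:, 0], BOUNDS[:, 1])
        cand_fits = np.array([fitness(p, X, y) for p in cand])
        accept = (cand_fits > fits) & (rng.random(n_bats) < A) # accept improved solutions
        pos[accept], fits[accept] = cand[accept], cand_fits[accept]
        best = pos[fits.argmax()].copy()
        A *= alpha                                             # loudness decay
        r = pulse_rate * (1 - np.exp(-gamma_r * t))            # pulse-rate growth
    return best, fits.max()

# Usage: (best_c, best_gamma), cv_acc = ba_tune(X_train, y_train)
```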

4. Results and Discussion

This article developed the SVM model using the LIBSVM software package written by Professor Chih-Jen Lin of National Taiwan University and the MATLAB programming language [46]. A conventional SVM model without optimization and SVM models tuned with the GA, PSO, and BA algorithms were used to generate landslide susceptibility maps and to compare the optimization results of the GA, PSO, and BA algorithms. First, the preprocessed data were input into the models for training. The trained models were then used to predict susceptibility across the research area with the SVM, GA-SVM, PSO-SVM, and BA-SVM models. The radial basis kernel function was selected. The optimal (c, γ) values obtained after optimization with the GA, PSO, and BA algorithms were [1.18, 2.56], [2.31, 1.15], and [0.35, 0.3], respectively. The output results were normalized to the interval [0, 1] and imported into ArcGIS 10.2 for visualization, and landslide susceptibility maps were generated. Natural breaks classification was used to partition the values into five classes: very low, low, moderate, high, and very high. Figure 12 shows the landslide susceptibility map of the conventional SVM model, and Figures 13–15 show the landslide susceptibility maps of the GA-SVM, PSO-SVM, and BA-SVM models. The closer the value is to 0, the lower the probability of landslides; the closer the value is to 1, the higher the probability of landslides.

The results show that the very low, low, moderate, high, and very high susceptibility classes account for 30.8%, 22.4%, 21.2%, 11.7%, and 11.1% of the research area in the landslide susceptibility map produced with the conventional SVM model. With the GA-SVM model, the very low, low, moderate, high, and very high susceptibility classes account for 32.2%, 19.3%, 14.3%, 20.4%, and 13.8% of the research area, and 41% of the landslide points fall in the very high susceptibility class. With the PSO-SVM model, the very low, low, moderate, high, and very high susceptibility classes account for 37.5%, 17.5%, 15.5%, 14.7%, and 14.8% of the research area, and 44% of the landslide points fall in the very high susceptibility class. With the BA-SVM model, the very low, low, moderate, high, and very high susceptibility classes account for 22.2%, 23.4%, 15.8%, 25.7%, and 13.9% of the research area, and 51% of the landslide points fall in the very high susceptibility class. The prediction results of the four models are consistent with the actual landslide conditions. Compared with the conventional SVM model, the BA-SVM model enlarged the very high susceptibility class by 2.8% of the area and raised the proportion of landslide points in that class by about 20%, improving the prediction accuracy for the most susceptible areas, which is of great significance to disaster prevention and mitigation.

To analyze and compare the optimization achieved by the GA-SVM, PSO-SVM, and BA-SVM models, this article tested the performance of the landslide susceptibility models with the receiver operating characteristic (ROC) curve. The ROC curve was initially used to test radar performance, but it is now widely used to test the accuracy of machine learning models. The ROC curve is drawn by varying the classification threshold of a binary classifier, with the true positive rate on the ordinate and the false positive rate on the abscissa. The area under the curve (AUC) is calculated to evaluate the prediction ability and accuracy of the selected model [47–49]. The larger the AUC value, the better the classifier; an AUC greater than 0.65 usually indicates a good classifier. This article used the SPSS statistical software to generate the ROC curves of the conventional SVM, GA-SVM, PSO-SVM, and BA-SVM models (Figures 16–19). As shown in the figures, the AUC of the conventional SVM model is 81.1% and the AUC of the GA-SVM model is 86.2%, while the AUC of the PSO-SVM model is 89% and the AUC of the BA-SVM model is 91.8%. This suggests that all four evaluation models perform relatively well, and that the prediction performance improved by 5% and 8% after hyperparameter tuning with GA and PSO, respectively. Furthermore, the BA-SVM model improved most significantly, with its accuracy growing by nearly 11% after parameter tuning with the BA algorithm.
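For reference, the same ROC/AUC check can be reproduced programmatically (the paper used SPSS; scikit-learn is substituted here, reusing the svm_rbf pipeline and test split from the earlier sketches):

```python
# Illustrative ROC/AUC evaluation on the held-out test split.
from sklearn.metrics import roc_auc_score, roc_curve

probs = svm_rbf.predict_proba(X_test)[:, 1]     # susceptibility scores in [0, 1]
fpr, tpr, _ = roc_curve(y_test, probs)          # false/true positive rates per threshold
print("AUC:", roc_auc_score(y_test, probs))
```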

5. Conclusion

A 60 km oil pipeline in Yunnan Province, China, was used as the research area in this article. Based on field work and the collection of geological environment data in the research area, a conventional SVM model and SVM models tuned with the GA, PSO, and BA algorithms were used to evaluate landslide susceptibility. The accuracy of the SVM evaluation model before and after parameter tuning with the GA, PSO, and BA algorithms was compared and analyzed. The following conclusions can be drawn:

(1) The performance of the SVM model improved significantly after parameter tuning with GA, PSO, and BA. The AUCs of the landslide susceptibility evaluations of the research area with the conventional SVM, GA-SVM, PSO-SVM, and BA-SVM models were 81.1%, 86.2%, 89%, and 91.8%, respectively. Optimization with the GA, PSO, and BA algorithms increased the AUC of the SVM model by 5%, 8%, and 11%, respectively, with the BA algorithm yielding the largest improvement.

(2) The prediction results of the four models were consistent with the actual landslide conditions. Compared with the conventional SVM model, the BA-SVM model enlarged the very high susceptibility class by 2.8% of the area and raised the proportion of landslide points in that class by about 20%, improving the prediction accuracy for the most susceptible areas, which is of great significance to disaster prevention and mitigation.

Data Availability

All data included in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest, financial, or otherwise.

Acknowledgments

The authors acknowledge the support of the Key Research and Development Program of Yunnan Province in 2022 (202203AC100003) and the National Natural Science Foundation of China (Grant no. 42267020).