Abstract

This study aimed to map landslide susceptibility in the Chemoga watershed, Ethiopia, using a Geographic Information System (GIS) and bivariate statistical models. Based on Google Earth imagery and a field survey, 169 landslide locations were identified and randomly divided into training (70%) and test (30%) datasets. Eleven landslide conditioning factors, namely slope, elevation, aspect, curvature, topographic wetness index, normalized difference vegetation index, road, river, land use, rainfall, and lithology, were integrated with the training landslides to determine the weights of each factor and factor class using both the frequency ratio (FR) and information value (IV) models. The final landslide susceptibility maps were classified into five classes: very low, low, moderate, high, and very high. The area under the curve (AUC) analysis showed that the success rates of the FR and IV models were 87.00% and 90.10%, while the prediction rates were 88.00% and 92.30%, respectively. This type of study will be useful to the local government for future planning and decisions on landslide mitigation.

1. Introduction

Landslides are a major natural hazard that poses a significant threat to human lives and infrastructure [1, 2]. Natural hazards such as landslides, floods, earthquakes, and droughts cannot be avoided completely, but their processes and consequences can be mitigated [3, 4]. The Chemoga watershed, located in the northern part of Ethiopia, is prone to landslide hazards due to its steep slopes, rugged topography, and intense rainfall. Increasing population pressure and the rapid expansion of infrastructure have also contributed to the occurrence of landslides in the area [5, 6].

In Ethiopia, landslides mostly manifest as rock falls, earth slides, debris flows, and mudflows, especially in the steep and hilly areas of the highlands above 1,500 m altitude [7, 8]. According to Meten et al. [9], from 1960 to 2010 about 388 people were reported dead and 24 injured, and a great deal of agricultural land, houses, and infrastructure was affected. The occurrence of landslides is an extremely complex phenomenon that depends on various factors such as geologic structure, lithological association, topography, rainfall, earthquakes, and human activity [10]. One of the most widely used approaches to reduce landslide damage is preparing a landslide susceptibility map using suitable models and effective conditioning factors [11, 12]. Over the last decades, many studies have used different models to prepare landslide susceptibility maps. These include the frequency ratio (FR) model [2, 4, 13–18], frequency ratio and Shannon entropy models [19–24], the weights of evidence model [12, 25–29], and the Shannon entropy model [11, 30–33]. Landslide susceptibility models based on the bivariate frequency ratio and weights of evidence models [34], FR and information value (IV) models [1, 10, 35], machine learning models [36, 37], and deep learning models [38, 39] have also been developed. With the development of Geographic Information Systems (GIS), other researchers have used bivariate FR and multivariate logistic regression models [40–44] to help in the calculation and visualization of the cumulative effects of conditioning factors on landslides.

In this study, we aimed to develop a landslide susceptibility map of the Chemoga watershed, Ethiopia, using GIS and bivariate statistical models. We collected landslide inventory data through field surveys and prepared thematic layers of slope, elevation, aspect, curvature, topographic wetness index (TWI), normalized difference vegetation index (NDVI), road, river, land use, rainfall, and lithology from the digital elevation model (DEM), satellite imagery, and other sources. Two bivariate statistical models, namely FR and IV, were used to analyze the relationships between the landslide occurrences and the thematic layers. The accuracy of the models was evaluated using a validation dataset.

The results of this study can provide valuable information for land use planning and management in the Chemoga watershed. The development of a landslide susceptibility map can help in identifying areas that are prone to landslide hazards and prioritizing mitigation measures to reduce the risk of landslide disasters.

2. Materials and Methods

2.1. Description of the Study Area

The Chemoga watershed is located in the upper Abay River basin, Ethiopia, and covers an area of 1,414.85 km². In the UTM coordinate system (zone 37N), the watershed lies approximately between 330,000–380,000 m E and 1,110,000–1,170,000 m N (Figure 1). Topographically, the altitude ranges from 863 to 3,946 m and the slope angle varies from 0° to 67°. In terms of land use, most of the watershed is covered by scrub/shrub and crop lands. The study area receives a high amount of rainfall during the summer season. Based on data from the Ethiopian National Meteorological Agency, the average recorded annual precipitation and temperature of the area are 1,376 mm and 16.95°C, respectively.

2.2. Data Source and Methodology

In this study, both primary and secondary data were used. The primary data were collected from field survey and observation, and the secondary data were acquired from governmental and nongovernmental institutions, journals, the internet, and other documents. The main data used for this study were Sentinel-2 images, a 30 m DEM, Google Earth imagery, and a topographical map of the area. The land use and NDVI layers were derived from the Sentinel-2 images, and the DEM was used to create the slope, elevation, aspect, curvature, and TWI layers and their extents through spatial analysis tools. The annual rainfall data were obtained from the National Meteorological Agency of Ethiopia. The main roads and rivers were digitized from the topographical map of Ethiopia, and the geological map was used to create the lithology layer of the study area. All the data layers were constructed and combined in ArcGIS 10.4. The FR and IV models were then used to generate the landslide susceptibility maps. The conditioning factors considered, their formats, and their sources are presented in Table 1, while the methodological workflow is shown in Figure 2.

2.3. Landslide Inventory Map

Landslide inventory mapping is the systematic mapping of existing landslides in a region using various techniques such as field survey, interpretation of aerial photographs or Google Earth imagery, satellite image interpretation, literature searches of technical, scientific, and governmental reports, and interviews with experts [45, 46]. In this study, the landslide inventory map, which contains a total of 169 individual landslide locations, was generated by integrating different data sources: Google Earth imagery digitized into points and field surveys, i.e., GPS points (for the period between 2016 and 2022). Landslide types in the study area include rockslide, soil slide, debris flow, earth flow, rock fall, and rock toppling. Although there is no specific rule for allocating landslide occurrences to training and validation datasets [47], studies usually use 70% of the landslide events as training data and the remaining 30% for validation of the output model [11, 14, 48]. In this study, 118 (70%) of the inventoried landslides were used for model training and the remaining 51 (30%) were used for validation.
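The 70/30 random allocation described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run in ArcGIS: the function name `split_inventory`, the integer landslide IDs, and the fixed random seed are assumptions introduced here for repeatability.

```python
import random

def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation lists."""
    ids = list(landslide_ids)
    rng = random.Random(seed)      # hypothetical seed, only for repeatability
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)   # 169 * 0.7 = 118.3 -> 118
    return ids[:n_train], ids[n_train:]

# 169 inventoried landslides, identified here by placeholder IDs 0..168
train_ids, test_ids = split_inventory(range(169))
print(len(train_ids), len(test_ids))
```

With 169 locations, truncating 118.3 to an integer gives the 118 training and 51 validation landslides reported in the text.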

2.4. Landslide Conditioning Factors

Selecting landslide conditioning factors is a complex task, because there is no standard rule for which factors to use [49]. In this study, 11 conditioning factors were selected based on the literature, their effectiveness, the availability of data, and their relevance to landslide occurrence [23]. These conditioning factors are slope, elevation, aspect, curvature, TWI, NDVI, road, river, land use, rainfall, and lithology. Each factor was converted to raster format and classified using the Jenks natural breaks method in ArcGIS (Figure 3).

In landslide susceptibility studies, slope is considered one of the major contributing factors [21, 50]. Given the importance of slope to landslide occurrence, the slope data were classified into five classes. With an increase in slope angle, the possibility of landslide occurrence increases [19, 51, 52]. Elevation is an important conditioning factor in landslide susceptibility mapping, and it also affects environmental conditions on slopes such as human activity, vegetation, soil moisture, and climate [53, 54]. Curvature plays an important role in surface runoff and ground infiltration and thus affects surface erosion and the groundwater condition of the region [17]. The curvature map was classified into concave (negative), convex (positive), and flat (zero) surfaces. In the case of curvature, the more negative the value, the higher the probability of landslide occurrence [29]. Aspect represents the direction that a slope faces [53]. Slope aspect affects erosion, surface evaporation, desertification, solar heating, and surface weathering, thus affecting the occurrence of landslides [50, 55].

TWI is one of the important factors responsible for landslides. It is a widely used terrain attribute that quantitatively expresses the control of terrain on the spatial distribution of soil moisture. The TWI was derived from the 30 m DEM using Equation (1):

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{1}$$

where $A_s$ is the specific catchment area (m²/m) and $\beta$ is the slope angle in degrees [56]. TWI is used to measure the topographic control of hydrological processes [57]. Rainfall is also considered a landslide conditioning factor. The rainfall map was prepared from five station locations in the study area through IDW interpolation of the annual average precipitation (1990–2021). Roads are among the most influential factors in landslide occurrence [1]. Road construction near hillsides may change the natural conditions of an area. River networks play an important role in landslide occurrence through their close relation to surface water.

The NDVI was obtained from Sentinel-2 satellite imagery with 30 m spatial resolution using Equation (2):

$$\mathrm{NDVI} = \frac{IR - R}{IR + R} \tag{2}$$

where IR is the infrared band and R is the red band of the electromagnetic spectrum. NDVI values range between −1.0 and 1.0: negative values are mainly generated by clouds, water, and snow, values near zero are mainly generated by rock and bare soil, and positive values indicate that the ground is covered by vegetation. Land use is an important conditioning factor that affects the occurrence of landslides. The land use map was derived from Sentinel-2 satellite imagery using a supervised classification technique and classified into six classes. The study area is predominantly covered by cropland and scrub. The lithology was classified into four classes, and the dominant lithology in the study area is Tertiary extrusive and intrusive rocks.
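The two index definitions above, TWI as the natural logarithm of the specific catchment area divided by the tangent of the slope angle, and NDVI as the normalized difference of the infrared and red bands, can be illustrated with a short NumPy sketch. The function names, the sample arrays, and the `min_tan` floor used to avoid division by zero on flat cells are assumptions introduced here; the study itself derived these layers in ArcGIS.

```python
import numpy as np

def topographic_wetness_index(specific_catchment_area, slope_deg, min_tan=1e-6):
    """TWI = ln(As / tan(beta)), with slope given in degrees.

    min_tan is a hypothetical floor that keeps flat cells (tan = 0) finite.
    """
    a_s = np.asarray(specific_catchment_area, dtype=float)
    tan_beta = np.tan(np.radians(np.asarray(slope_deg, dtype=float)))
    return np.log(a_s / np.maximum(tan_beta, min_tan))

def ndvi(infrared, red):
    """NDVI = (IR - R) / (IR + R); cells where IR + R = 0 are set to 0."""
    ir = np.asarray(infrared, dtype=float)
    r = np.asarray(red, dtype=float)
    total = ir + r
    result = np.zeros_like(total)
    np.divide(ir - r, total, out=result, where=total != 0)
    return result

# Made-up sample grids, only to exercise the functions
as_grid = np.array([[100.0, 250.0], [40.0, 900.0]])    # m^2/m
slope_grid = np.array([[45.0, 10.0], [30.0, 2.0]])     # degrees
twi_grid = topographic_wetness_index(as_grid, slope_grid)

ir_band = np.array([0.50, 0.30, 0.05])
red_band = np.array([0.10, 0.30, 0.10])
ndvi_values = ndvi(ir_band, red_band)
print(twi_grid)
print(ndvi_values)
```

Gentle slopes with large upslope areas give high TWI values, and the NDVI output stays within the −1.0 to 1.0 range described in the text.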

2.5. Landslide Susceptibility Modeling
2.5.1. Frequency Ratio (FR) Model

FR is one of the most widely adopted methods for landslide susceptibility assessment [14, 16, 58]. The FR is the ratio of the area where landslides occurred to the total study area, and also the ratio of the probability of landslide occurrence to the probability of non-occurrence for a given attribute [59, 60]. Generally, a greater ratio indicates a stronger relationship between a conditioning factor and landslides, and vice versa. An FR value greater than 1 indicates a high probability of landslide occurrence, and a value less than 1 indicates a low probability. The landslide susceptibility map (LSM) can be calculated by summing the FR values of all the factors considered, as in Equation (3):

$$\mathrm{LSM} = \sum_{i=1}^{n} FR_i \tag{3}$$

where LSM is the landslide susceptibility map, $FR_i$ is the frequency ratio of each factor type or class, and $n$ is the number of factors. The FR can be obtained by Equation (4):

$$FR = \frac{N_{pix}(SX_i) \Big/ \sum_{i=1}^{m} N_{pix}(SX_i)}{N_{pix}(X_j) \Big/ \sum_{j=1}^{n} N_{pix}(X_j)} \tag{4}$$

where $N_{pix}(SX_i)$ is the number of landslide pixels in class $i$ of factor $X$; $N_{pix}(X_j)$ is the total number of pixels within factor $X_j$; $m$ is the number of classes in factor $X_i$; and $n$ is the total number of factors in the study area [60].
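A minimal sketch of the FR calculation and the pixel-wise summation follows. It reads the FR of a class as the share of landslide pixels falling in the class divided by the share of the study area occupied by the class. The function names and the tiny 2 × 3 rasters are hypothetical and serve only to make the arithmetic visible; they are not the study data.

```python
import numpy as np

def frequency_ratio(class_raster, landslide_mask):
    """Return {class_id: FR} for one classified conditioning-factor raster."""
    classes = np.asarray(class_raster)
    slides = np.asarray(landslide_mask, dtype=bool)
    total_pixels = classes.size
    total_slides = slides.sum()
    fr = {}
    for c in np.unique(classes):
        in_class = classes == c
        slide_share = (slides & in_class).sum() / total_slides  # landslide ratio
        area_share = in_class.sum() / total_pixels              # area ratio
        fr[int(c)] = float(slide_share / area_share)
    return fr

def susceptibility_index(class_rasters, weight_tables):
    """Sum the class weights of all factors, pixel by pixel."""
    lsm = np.zeros(np.asarray(class_rasters[0]).shape, dtype=float)
    for raster, table in zip(class_rasters, weight_tables):
        raster = np.asarray(raster)
        for class_id, weight in table.items():
            lsm[raster == class_id] += weight
    return lsm

# Hypothetical slope classes (1..3) and training-landslide mask
slope_classes = np.array([[1, 1, 2],
                          [2, 3, 3]])
landslides = np.array([[0, 0, 0],
                       [1, 1, 1]])

fr_slope = frequency_ratio(slope_classes, landslides)
lsm = susceptibility_index([slope_classes], [fr_slope])
print(fr_slope)
print(lsm)
```

In this toy case, class 3 holds two of the three landslide pixels on one third of the area, so its FR is 2.0, class 2 is exactly average (FR = 1.0), and class 1 contains no landslides (FR = 0.0). With several factors, `susceptibility_index` receives one raster and one weight table per factor.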

2.5.2. Information Value (IV) Model

The IV model is a bivariate statistical approach that objectively assesses landslide susceptibility using information theory, which is an advantage in accurately identifying areas at risk of landslides. The model was originally proposed in [61] and later slightly modified in [46]. The IV model is used to evaluate the spatial relationship between the conditioning factor classes and the probability of landslide occurrence. Generally, a higher IV corresponds to a stronger relationship between the probability of landslide occurrence and the conditioning factor class. An IV greater than 0 indicates a high probability of landslide occurrence, and an IV less than 0 indicates a low probability. The LSM value of each pixel was therefore computed by summing the information values of the factor classes as follows:

$$\mathrm{LSM} = \sum_{i=1}^{n} IV_i$$

where LSM is the landslide susceptibility map, $IV_i$ is the information value of each factor class, and $n$ is the number of factors. The weights were assigned to each class of each conditioning factor using the following formula [61]:

$$IV = \log\left(\frac{N_{slpix}/N_{cpix}}{N_{tspix}/N_{tapix}}\right)$$

where $N_{slpix}$ is the number of landslide pixels in a given class, $N_{cpix}$ is the number of pixels in a given class, $N_{tspix}$ is the total number of landslide pixels in the study area, and $N_{tapix}$ is the total number of pixels in the study area.
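The information value of a single factor class can be sketched as below. The base-10 logarithm is an assumption made here; it is chosen because the IV values reported in this paper match the base-10 logarithm of the corresponding FR values (for example, FR = 9.27 and IV = 0.967 for the steepest slope class). The function name and the pixel counts are hypothetical.

```python
import math

def information_value(n_slpix, n_cpix, n_tspix, n_tapix):
    """IV of one factor class: log of class landslide density over map density.

    Base-10 logarithm is an assumption (see the lead-in above).
    """
    if n_slpix == 0:
        return float("-inf")   # class without landslides: log of zero density
    class_density = n_slpix / n_cpix     # landslide density inside the class
    map_density = n_tspix / n_tapix      # landslide density over the whole area
    return math.log10(class_density / map_density)

# Hypothetical counts: the class is ten times denser in landslides than the map
iv_dense = information_value(n_slpix=20, n_cpix=1_000, n_tspix=100, n_tapix=50_000)
# Hypothetical counts: the class is half as dense as the map
iv_sparse = information_value(n_slpix=5, n_cpix=5_000, n_tspix=100, n_tapix=50_000)
print(round(iv_dense, 3), round(iv_sparse, 3))
```

A class with above-average landslide density yields a positive IV and one with below-average density yields a negative IV, which is the sign convention used in the interpretation above.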

3. Results and Discussion

3.1. Application of Frequency Ratio (FR) Model

The FR was calculated for each class of every landslide conditioning factor by dividing the landslide occurrence ratio by the area ratio. The results of the FR model for each factor class are shown in Table 2. In general, an FR value of 1 indicates an average correlation between landslide occurrence and a factor class. An FR value greater than 1 indicates a high likelihood of landslide occurrence, while an FR value less than 1 indicates a low likelihood [47]. For slope, the 33°–67° class has the highest FR value (9.27), and the remaining slope classes have low probabilities of landslide occurrence. In the study area, it was observed that the probability of landslide occurrence increased with slope gradient up to a certain extent and then decreased, consistent with results from other studies [20]. This is because higher slope values increase the effects of gravity and shear stress [46]. For elevation, the 1,509–2,042 m class, with an FR value of 2.78, has a high probability of landslide occurrence. The 863–1,509, 2,042–2,513, 2,513–3,059, and 3,059–3,946 m classes have lower FR values (0.28, 0.99, 0.41, and 0.53, respectively), indicating low probabilities of landslide occurrence. Commonly, as the elevation increases, the probability of landslide occurrence increases. The aspect classes with FR values above 1 are east (FR = 1.03), southeast (FR = 1.60), south (FR = 1.24), southwest (FR = 1.42), west (FR = 1.24), and northwest (FR = 1.04), indicating a high probability of landslide occurrence on these slopes. The remaining aspect classes have FR values less than 1, indicating low probabilities of landslide occurrence.

For land use, the water body, forest, grass and scrub/shrub, and bare land classes have FR values of 2.07, 1.50, 1.54, and 22.92, respectively, implying high probabilities of landslide occurrence. The highest FR value, for bare land, is due to its exposure to erosion and soil moisture [41]. For curvature, the concave (−16.55 to −0.98) and convex (0.75 to 20.22) classes have the highest FR values (1.39 and 1.38, respectively), indicating high probabilities of landslide occurrence, whereas the flat class has a low FR value (0.70), indicating a low probability. For distance from roads, the 6,985–11,577 m class, with an FR value of 2.05, has the greatest influence on landslide occurrence. Commonly, the landslide frequency increases as the distance from roads decreases; existing roads and ongoing construction disturb the stability of slopes, thereby increasing the probability of landslide occurrence [19, 20]. According to Guzzetti [62], the landslide probability decreases with increasing distance from river networks. In this study area, the 2,560–4,133 m distance class exerts the highest influence on landslide occurrence. The reason is that permanent rivers are the main source of moisture for landslide occurrence.

For NDVI, the classes −0.04 to 0.10 and 0.23 to 0.48 have FR values greater than 1, indicating high probabilities of landslide occurrence. This range of NDVI values represents bare land, built-up areas, and scrub. The remaining NDVI classes have FR values less than 1. For TWI, the class from 2.72 to 5.30 has the highest FR value (1.98). With regard to rainfall, the 1,484–1,519, 1,519–1,539, and 1,539–1,563 mm/yr classes have higher FR values than the other classes and contribute more to landslide occurrence. For lithology, the Precambrian (FR = 1.89) and Triassic and Permian (FR = 2.29) classes have the highest FR values, indicating high probabilities of landslide occurrence. The remaining lithology classes have FR values less than 1, indicating low probabilities of landslide occurrence.

3.2. Application of Information Value Model

The information value of each conditioning factor was calculated through Equation (5), and the spatial relationship between each conditioning factor and landslide occurrence is shown in Table 2. If the IV of a factor class is negative, there is a low likelihood of landslide occurrence; if the value is positive, there is a high likelihood [46]. For slope, the 33°–67° class is highly prone to landslides, with the highest IV (0.967), whereas the flat slope class shows a lower probability. The occurrence of landslides tends to increase with higher slopes and decrease with lower slopes. For elevation, the 1,509–2,042 m class (IV = 0.445) has a high probability of landslide occurrence, and all other classes have a very low impact. Generally, landslides mostly occur at higher elevations, but in this study they occurred mainly at lower elevations. For aspect, the flat (IV = −0.534), north (IV = −0.434), and northeast (IV = −0.166) classes have the lowest values, indicating low probabilities of landslide occurrence. The remaining aspect classes have positive IVs, indicating a high probability of landslide occurrence. In terms of curvature, the flat class has the lowest IV (−0.156), indicating a low probability of landslide occurrence, while the convex and concave classes have higher IVs (0.142 and 0.139, respectively), indicating a high probability. For distance from roads, the 6,985–11,577 m class has the highest IV (0.312), indicating a high probability of landslide occurrence. For distance to rivers, the 2,560–4,133 m class has a high IV (0.178), while the remaining classes have low IVs, indicating a low probability of landslide occurrence.

The NDVI classes −0.04 to 0.10 and 0.23 to 0.48 have positive IVs, indicating a high probability of landslide occurrence, while the remaining NDVI classes have negative IVs, indicating a low probability. The TWI class 2.72–5.30 also has a positive IV (0.297), indicating a higher landslide occurrence. In terms of land use, settlements and crops can reduce the likelihood of landslide occurrence, while forest, water body, grass, and bare land have a high impact on landslide occurrence. For average annual rainfall, the classes with higher rainfall (1,484–1,519, 1,519–1,539, and 1,539–1,563 mm/yr) have positive IVs, indicating a high probability of landslide occurrence, while the other classes have negative IVs, indicating a low probability. Lithology is the other important conditioning factor in this study. The Precambrian (IV = 0.278) and Triassic and Permian (IV = 0.360) classes have the highest values, indicating high probabilities of landslide occurrence. The remaining lithology classes have negative IVs, indicating low probabilities of landslide occurrence.
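The FR and IV results reported in Sections 3.1 and 3.2 can be cross-checked arithmetically. The sketch below assumes that the IV of a class is the base-10 logarithm of its FR; this is an assumption inferred from the reported numbers, not a statement taken from the text. The (FR, IV) pairs are copied from the two sections, the class labels are informal, and the tolerance of 0.005 is a hypothetical allowance for rounding in the published values.

```python
import math

# (FR, IV) pairs as reported in Sections 3.1 and 3.2
reported = {
    "slope 33-67 degrees": (9.27, 0.967),
    "curvature, flat": (0.70, -0.156),
    "distance from road 6,985-11,577 m": (2.05, 0.312),
    "TWI 2.72-5.30": (1.98, 0.297),
    "lithology, Precambrian": (1.89, 0.278),
    "lithology, Triassic and Permian": (2.29, 0.360),
}

tolerance = 0.005  # hypothetical allowance for two- and three-decimal rounding
differences = {}
for name, (fr, iv) in reported.items():
    differences[name] = abs(math.log10(fr) - iv)
    print(f"{name}: log10(FR) = {math.log10(fr):.3f}, reported IV = {iv:.3f}")

max_difference = max(differences.values())
print("largest difference:", round(max_difference, 4))
```

Under this assumption, every listed class agrees to within rounding, so the two models rank factor classes identically and differ only in how the class weights are scaled before summation.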

3.3. Landslide Susceptibility Maps

The summed FR and IV values of each pixel indicate its relative susceptibility to landslide occurrence. Higher LSM pixel values indicate higher landslide susceptibility, while lower pixel values indicate lower susceptibility. The LSM values of the FR and IV models in the study area range from 14 to 81 (Figure 4(a)) and from −5.59 to 3.91 (Figure 4(b)), respectively. These values were classified into five susceptibility classes (very low, low, moderate, high, and very high) in both models using the geometrical interval method for visual interpretation (Table 3).
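Assigning pixels to the five susceptibility classes amounts to binning the LSM values at four break points, as sketched below. The break values used here are purely hypothetical placeholders within the reported IV range of −5.59 to 3.91; they are not the breaks produced by the geometrical interval method in ArcGIS, which this sketch does not reproduce.

```python
import numpy as np

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]

def classify_lsm(lsm, breaks):
    """Map LSM values to class indices 0..4 using four ascending break values."""
    breaks = np.asarray(breaks, dtype=float)
    if breaks.size != 4 or np.any(np.diff(breaks) <= 0):
        raise ValueError("expected four strictly ascending break values")
    # np.digitize returns i where breaks[i-1] <= value < breaks[i]
    return np.digitize(np.asarray(lsm, dtype=float), breaks)

# Sample IV-model LSM values spanning the reported range
sample_lsm = np.array([-5.59, -2.0, 0.0, 1.0, 3.91])
hypothetical_breaks = [-3.0, -1.0, 0.5, 2.0]   # placeholders, not Table 3 values

class_index = classify_lsm(sample_lsm, hypothetical_breaks)
labels = [CLASS_NAMES[i] for i in class_index]
print(labels)
```

The same function applies to the FR-based map once break values on its 14 to 81 scale are supplied.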

3.4. Validation of Landslide Susceptibility Maps

The FR and IV models were validated to check their reliability and performance. In the present study, the performance of the LSMs produced by the FR and IV models was evaluated using the area under the curve (AUC). The AUC indicates the accuracy of the landslide susceptibility maps through success rate and prediction rate curves [63]. The success rate curve represents how well the model fits the existing (training) landslides, while the prediction rate curve indicates how well the model predicts future landslides [47]. The curves were drawn for both the training and validation landslides, with the false positive rate on the x-axis and the true positive rate on the y-axis. The total AUC value can be used as a measure of the accuracy of the susceptibility map, where a larger value indicates higher accuracy. AUC values from 0.5 to 1.0 are used to evaluate the accuracy of the model [63]. The qualitative relationship between AUC and prediction accuracy can be classified as follows: excellent (0.9–1.0), very good (0.8–0.9), good (0.7–0.8), average (0.6–0.7), and fair (0.5–0.6) [63]. An AUC value close to 1.0 indicates ideal performance, whereas a value equal to or less than 0.5 indicates poor performance [64]. The AUC values for the success rate curves were 0.870 and 0.901 for the FR and IV models, respectively, corresponding to success rates of 87.00% and 90.10% (Figure 5(a)). The AUC values for the prediction rate curves were 0.880 and 0.923 for the FR and IV models, respectively, corresponding to prediction accuracies of 88.00% and 92.30% (Figure 5(b)). The success and prediction rates of the FR model fall within 0.8–0.9, indicating very good performance, while those of the IV model fall within 0.9–1.0, indicating excellent performance.
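As a sketch of the validation step, the AUC can be computed from the LSM values at landslide and non-landslide sample points as the probability that a randomly chosen landslide point scores higher than a randomly chosen non-landslide point, which equals the area under the true positive rate versus false positive rate curve. The function names, the sample scores, and the use of non-landslide sample points are assumptions made here; the text does not state how non-landslide locations were sampled. Treating the lower bound of each quality band as inclusive is also an assumption.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly.

    Ties count as half. Pairwise comparison is O(n_pos * n_neg) in memory,
    which is acceptable for point samples but not for full rasters.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one landslide and one non-landslide point")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (pos.size * neg.size))

def rate_auc(auc):
    """Qualitative rating following the bands quoted from [63] and [64]."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "very good"
    if auc >= 0.7:
        return "good"
    if auc >= 0.6:
        return "average"
    if auc > 0.5:
        return "fair"
    return "poor"

# Hypothetical validation sample: 1 = landslide point, 0 = non-landslide point
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # made-up LSM values at those points
auc = roc_auc(labels, scores)
print(round(auc, 3), rate_auc(auc))

# Ratings of the AUC values reported in this section
reported_ratings = {v: rate_auc(v) for v in (0.870, 0.901, 0.880, 0.923)}
print(reported_ratings)
```

Eight of the nine landslide and non-landslide pairs in the toy sample are ordered correctly, giving an AUC of about 0.889. Applied to the reported values, the rating function returns "very good" for the FR model and "excellent" for the IV model, in line with the interpretation above.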

4. Conclusion

The use of GIS and bivariate statistical models proved to be an effective approach for mapping landslide susceptibility in the Chemoga watershed, Ethiopia. The study considered several factors that influence landslide occurrence in the watershed: slope, elevation, aspect, curvature, TWI, NDVI, road, river, land use, rainfall, and lithology. A landslide inventory map was prepared using Google Earth imagery and field survey assessment, through which 169 landslide locations were identified and mapped. The susceptibility maps produced with the FR and IV models were divided into five susceptibility classes: very low, low, moderate, high, and very high. The AUC quantitatively indicates the performance of the susceptibility maps. The results showed that the IV model outperformed the FR model, with success rates of 90.10% and 87.00% and prediction rates of 92.30% and 88.00%, respectively. The findings of this study can contribute to the development of a comprehensive disaster risk reduction strategy in the study area and in other landslide-prone regions of Ethiopia.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares that there is no conflict of interest.

Acknowledgments

The author thanks the staff of the Civil Engineering Department, Debre Markos University, Ethiopia. A preprint of this work is available online at Research Square: https://www.researchsquare.com/article/rs-2319713/v1. This research was funded by the author.