Abstract
The permeability coefficient (k-value) of soil is an important parameter in the civil engineering design of roads, tunnels, dams, and other structures. However, determining the k-value experimentally, in the laboratory or in the field, remains costly and time-consuming. Moreover, it requires special equipment and special care in collecting soil samples for laboratory study. Therefore, in this study, we propose a hybrid machine learning (ML) model, teaching learning-based optimization of an artificial neural network (TLBO-ANN), to predict the k-value of soil from a limited set of parameters (natural water content, void ratio, specific gravity, liquid limit, plastic limit, and clay content) that can be determined easily in the laboratory. Test results of 84 soil samples obtained from the Da Nang-Quang Ngai expressway project in Vietnam are used in the model development. Statistical indicators, namely the correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE), are used to validate and evaluate the accuracy of the model. The results show that the TLBO-ANN model is an effective tool for predicting the k-value of soil (R = 0.905) for use in the design of structures founded on soil.
1. Introduction
Soil permeability is one of the important parameters in the design of most civil engineering structures constructed on soil, such as roads, tunnels, and dams [1]. The permeability coefficient of soil (k-value) quantifies the ability of liquids to flow through interconnected voids in the soil under a hydraulic gradient [2]. The k-value is used in many theoretical and practical problems, for example, in modeling underground water flow, consolidation settlement rate, and slope stability of ground mass [3]. The required k-value often depends on the type of soil and the service life of the structure. For example, a higher k-value is required for filter layers and drainage construction, while a lower k-value is required for roadbeds or dams. The major factors that govern the variation of the k-value include density, voids (size and type), particles (size, distribution, and shape), and the surface roughness of the soil [1, 4].
Accurate determination of the k-value is not an easy task because of field conditions and laboratory test methods. The common feature of these experiments is that they are complex, time-consuming, and costly. An alternative is to predict the k-value from empirical formulas. The formulas of David [5], Alyamani and Sen [6], and Chapuis [7] use particle size to estimate soil permeability. The formulas of David [5], Cheng and Chen [8], Terzaghi [8], and Milan and Andjelko [9] show that the k-value depends on porosity, particle size, and other factors. Lebron et al. [10] predicted the k-value based on bulk density, particle size, and shape. These formulas provide a relatively fast and simple tool for calculating the k-value. However, the k-values obtained from experiments and from empirical formulas differ significantly in many cases, which indicates that the formulas should be applied only in preliminary calculations. Furthermore, empirical formulas are not applicable to all soil types [1].
Therefore, artificial intelligence (AI) or machine learning (ML) methods have been developed in recent decades to predict the k-value of soil accurately from limited geotechnical parameters while reducing cost and time. Such methods include the artificial neural network (ANN) [11–14], adaptive neuro-fuzzy inference system (ANFIS) [15, 16], hybrid models that optimize ANFIS with genetic algorithms (GA-ANFIS) [15], support vector machine (SVM), random forest (RF) [12], M5P, and Gaussian process (GP) [17].
Soft computing-based models (AI or ML) have been observed to be excellent tools for predicting the k-value [18]. Among them, the ANN model is commonly used because of several advantages: (i) it has a simple architecture, (ii) it is easy to train and generalize, and (iii) it can solve nonlinear problems with high accuracy [19]. However, this method also has weaknesses, such as slow convergence and a tendency to become trapped in local optima. Optimization algorithms can help overcome these drawbacks and improve prediction performance [20]. Such an algorithm adjusts properties of the neural network, such as the weights and the learning rate, to reduce the loss [21].
Teaching learning-based optimization (TLBO) has been proposed in recent years [22]. It is a swarm intelligence optimization algorithm that simulates the teaching-learning process of a classroom [23]. It has been tested on several constrained and unconstrained nonlinear programming problems, including some combinatorial optimization problems, and has achieved considerable success [24]. According to recent literature reviews, TLBO appears to have the potential to solve combinatorial optimization problems [25]. However, its performance has yet to be tested on shelf-space allocation problems [26]. The continuous development of metaheuristics helps to provide effective solutions to optimization problems [27].
Therefore, in the present study, we use the hybrid ML model TLBO-ANN, which combines the advantages of TLBO and ANN, to predict the k-value. To the best of the authors’ knowledge, this is the first time the TLBO-ANN model has been used to determine the k-value in the Vietnamese study area. The main objective of this study is to apply the newly developed hybrid model to predict the k-value from data collected at the Da Nang-Quang Ngai Expressway project site in Vietnam and to assess its predictive capability for further use in other areas. Statistical metrics, namely the correlation coefficient (R), root mean square error (RMSE), and mean absolute error (MAE), are used to validate the model performance. MATLAB software is used to process the data and to simulate the model.
2. Materials and Methods
2.1. Data Used
The k-value is affected by many factors such as porosity, particle composition, mineral composition, and physical and mechanical parameters [1, 2, 4, 28–30]. However, this study will focus on the key factors that significantly influence soil permeability to reduce the complexity of the model. In the present study, data on 84 soil samples were collected from the Da Nang-Quang Ngai expressway project.
The experimental program in the laboratory consists of the following two parts:
(1) Specific tests determined the water content, void ratio (e), specific gravity (γ), clay content (CC), liquid limit (LL), and plastic limit (PL). The collected results were used as input parameters for the predictive model.
(2) The permeability test determined the k-value. The collected results were used as the output parameter for the predictive model.
A statistical analysis of these parameters is provided in Table 1, which lists the maximum, minimum, mean, and standard deviation of the six input variables and the one output variable used in this study.
All data, including the input and output parameters, were normalized. Normalization (scaling) was performed to minimize noise and errors in the model study. In this process, the values in the dataset were rescaled to a common range of 0–1 without distorting the differences between their ranges. Each column was normalized according to the following equation:
$$x_i^{\text{norm}} = \frac{x_i - \beta}{\alpha - \beta},$$
where α and β are the maximum and minimum values of the parameter i.
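The column-wise scaling to the range 0–1 described above can be sketched as follows. This is a minimal illustration, assuming standard min-max scaling with the column maximum and minimum; the function name `min_max_normalize` and the sample values are hypothetical and are not taken from the study's data.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to [0, 1] using (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    beta = X.min(axis=0)     # column minimum
    alpha = X.max(axis=0)    # column maximum
    # Guard against division by zero for constant columns (assumption).
    span = np.where(alpha > beta, alpha - beta, 1.0)
    return (X - beta) / span

# Made-up values for illustration only (two columns, three samples).
samples = np.array([[20.0, 0.60],
                    [35.0, 0.90],
                    [50.0, 1.20]])
scaled = min_max_normalize(samples)
```

Each column is scaled independently, so parameters with very different units end up on the same footing before training.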
2.2. Methods Used
In this study, a hybrid model (TLBO-ANN) was developed by optimizing an artificial neural network (ANN) with the teaching learning-based optimization (TLBO) algorithm. Both methods are briefly described below.
2.2.1. Artificial Neural Network (ANN)
ANN is a common and powerful technique that imitates the activity and performance of the human brain and nervous system [31]. It has several important abilities, such as generalization, learning from data, and handling a large number of variables. The major characteristics of ANN have been reported to include continuous nonlinear dynamics, high fault tolerance, collective computation, self-learning, self-organization, and real-time treatment [32]. Thus, this algorithm has been widely and successfully applied to many problems in geotechnical engineering. For both linear and nonlinear patterns, ANN uses hidden layers between the input and output neurons; as a result, it can learn relationships and patterns in the data by itself. To predict the permeability coefficient of soil, a multilayer perceptron (MLP) was adopted as the regression technique. The sigmoid function is used as the activation function in the neurons to transform the weighted inputs. Here, h_i denotes the permeability coefficient (output), and x = (x_1, x_2, …, x_i) denotes the input parameters (i.e., the factors affecting the permeability coefficient).
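To make the role of the sigmoid activation concrete, the following sketch shows a forward pass through an MLP with one hidden layer. It is an illustration under stated assumptions rather than the network used in the study: the hidden-layer size, the linear output unit, the random weights, and the names `sigmoid` and `mlp_forward` are all hypothetical, since the architecture is not specified in the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid activation: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer MLP with sigmoid hidden units."""
    hidden = sigmoid(x @ w_hidden + b_hidden)  # weighted inputs through the sigmoid
    return hidden @ w_out + b_out              # linear output for regression (assumption)

rng = np.random.default_rng(0)
n_inputs, n_hidden = 6, 8                      # six inputs as in the study; 8 is hypothetical
w_hidden = rng.normal(size=(n_inputs, n_hidden))
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

x = rng.random((5, n_inputs))                  # five made-up, already normalized samples
k_pred = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
```

Training consists of choosing the weights and biases so that `k_pred` matches the measured k-values as closely as possible.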
2.2.2. Teaching Learning-Based Optimization (TLBO)
The teaching learning-based optimization (TLBO) algorithm was proposed by Rao et al. [33, 34] and is inspired by the interaction between students and teachers in a class. TLBO has been reported to outperform other algorithms such as particle swarm optimization, harmony search, and the artificial bee colony algorithm [34]. In addition, other researchers have indicated that TLBO gives better results than the genetic algorithm and ant colony optimization [35]. The algorithm mimics the influence of a teacher on the output of the students in a class. Teachers and students are the two main components of the algorithm, and they represent two basic modes of learning: learning from the teacher (teacher phase) and learning through interaction among learners (learner or student phase). The output of the algorithm is the grades or results of the learners, which are strongly affected by the teacher’s quality. A high-quality teacher can encourage the learners and thus enhance the performance of the class. Each learner attempts to follow the teacher and thereby improves the performance of the class. In addition, each learner interacts and exchanges knowledge with other learners to enhance their individual performance. TLBO is a population-based method in which the population consists of the learners. The design variables correspond to the different subjects offered to the learners, and the learners’ results correspond to the fitness value of the optimization. The whole process of the TLBO algorithm comprises two phases, namely, the teacher phase and the learner phase. Details of the two phases and the procedure of the algorithm can be found in the published literature [24, 36].
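The two phases can be sketched in a few lines. The code below follows the commonly described form of TLBO (a teacher phase that moves learners toward the best solution and away from the class mean, then a learner phase with pairwise interaction, each with greedy acceptance). It is a minimal sketch and not the implementation used in the study: the population size, iteration count, bound clipping, the sphere test function, and the names `tlbo` and `sphere` are assumptions made for illustration.

```python
import numpy as np

def tlbo(objective, lower, upper, pop_size=20, iterations=100, seed=0):
    """Minimize `objective` within box bounds using a basic TLBO loop."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, dim))   # learners
    cost = np.array([objective(p) for p in pop])            # their "grades"

    def try_move(i, candidate):
        # Greedy acceptance: keep the new position only if it is better.
        candidate = np.clip(candidate, lower, upper)         # bound handling (assumption)
        c = objective(candidate)
        if c < cost[i]:
            pop[i], cost[i] = candidate, c

    for _ in range(iterations):
        # Teacher phase: move toward the best learner, away from the class mean.
        teacher = pop[np.argmin(cost)].copy()
        mean = pop.mean(axis=0)
        for i in range(pop_size):
            tf = rng.integers(1, 3)                          # teaching factor, 1 or 2
            try_move(i, pop[i] + rng.random(dim) * (teacher - tf * mean))

        # Learner phase: interact with a randomly chosen different learner.
        for i in range(pop_size):
            j = (i + rng.integers(1, pop_size)) % pop_size   # guarantees j != i
            if cost[i] < cost[j]:
                step = pop[i] - pop[j]                       # move away from the worse learner
            else:
                step = pop[j] - pop[i]                       # move toward the better learner
            try_move(i, pop[i] + rng.random(dim) * step)

    best = np.argmin(cost)
    return pop[best].copy(), cost[best]

def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return float(np.sum(x ** 2))

best_x, best_cost = tlbo(sphere, lower=[-5.0] * 3, upper=[5.0] * 3)
```

In a TLBO-ANN setting, each learner would hold one candidate set of network weights and the objective would be the training error, so no gradient information is needed.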
2.2.3. Validation Indicators
(1) Statistical Indices. In this study, three statistical indicators, R, RMSE, and MAE, are used to evaluate the performance of the TLBO-ANN model. In model validation, the R-value expresses the correlation between the actual and predicted values. RMSE measures the difference between the actual and predicted values, while MAE gives the average absolute error between them. Lower RMSE and MAE values indicate higher model accuracy and better performance. Conversely, a higher R-value means better model performance. The R-value varies between −1 and 1, and the closer its absolute value is to 1, the more accurate the model. The formulas for R, RMSE, and MAE are as follows:
$$R = \frac{\sum_{i=1}^{n}(M_i-\bar{M})(N_i-\bar{N})}{\sqrt{\sum_{i=1}^{n}(M_i-\bar{M})^2\,\sum_{i=1}^{n}(N_i-\bar{N})^2}},$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(M_i-N_i)^2},$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|M_i-N_i\right|,$$
where M is the actual experimental value, N is the value predicted by the model, $\bar{M}$ and $\bar{N}$ are the corresponding mean values, and n is the total number of samples in the dataset.
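The three indicators can be computed directly from their standard definitions, as in the following sketch. The function names and the four sample values are hypothetical and serve only to illustrate the calculation; R is taken here to be the Pearson correlation coefficient.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between actual (M) and predicted (N) values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between actual (M) and predicted (N) values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    dm = actual - actual.mean()
    dn = predicted - predicted.mean()
    return float(np.sum(dm * dn) / np.sqrt(np.sum(dm ** 2) * np.sum(dn ** 2)))

# Made-up values for illustration only.
M = np.array([1.0, 2.0, 3.0, 4.0])   # actual
N = np.array([1.0, 2.0, 3.0, 6.0])   # predicted

r_value = pearson_r(M, N)
rmse_value = rmse(M, N)
mae_value = mae(M, N)
```

Note that a single large error raises RMSE more than MAE, which is why the two are usually reported together.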
(2) Cost Function. To show the difference between the predicted and the actual value, a cost function or loss function is generally used. The loss function refers to the error for a single training example, while the cost function refers to the average of the loss functions over the entire training dataset [37]. The cost function acts as an indicator of the model’s performance improvement during adjustment of error [38]. The main objective of the optimization strategy is to minimize the value of the cost function [39]. Some of the cost functions used in ML models include the regression cost function, the binary classifier cost function, and the multiclass classification cost function. Iterative strategies are applied during the training of ML models to reduce loss. In this study, the regression cost function was used during the training of the models.
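The distinction between loss and cost can be written out for a regression problem. The text does not state which regression cost function was used, so the squared-error form below is an assumption chosen for illustration; here M_i is the actual value of sample i, N_i is the model prediction, and n is the number of training samples.

```latex
\ell_i = (M_i - N_i)^2, \qquad
J = \frac{1}{n}\sum_{i=1}^{n} \ell_i = \frac{1}{n}\sum_{i=1}^{n} (M_i - N_i)^2
```

Under this assumption, the RMSE reported for the model is simply the square root of the cost J, so minimizing J also minimizes RMSE.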
2.3. Methodology
The proposed methodology of the present study consists of three main steps: data preparation, model building, and model validation (Figure 1):
(1) Data preparation: in the first step, a database of 84 laboratory test results is used, with six parameters (natural water content, e, γ, LL, PL, and CC) as inputs and the k-value as output. The data are normalized to the range [0, 1] and randomly divided into two parts in a 70 : 30 ratio. The training part accounts for 70% of the dataset and the testing part for the remaining 30%, based on the experience of the authors and published literature [12, 16, 17].
(2) Model building: in the second step, the training dataset is used to generate the initial ANN model, which is then optimized by the TLBO algorithm to form the TLBO-ANN hybrid model.
(3) Model validation: in the final step, the testing data are used to validate the proposed model. Statistical indicators, namely R, RMSE, and MAE, are used to evaluate the performance of the model.
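The random 70 : 30 division in the data preparation step can be sketched as below. The arrays are random placeholders with the same shape as the dataset described (84 samples, six inputs), not the study's measurements; the function name `split_70_30`, the seed, and the rounding of 70% of 84 to 59 training samples are assumptions, since the exact counts are not stated.

```python
import numpy as np

def split_70_30(X, y, train_fraction=0.7, seed=0):
    """Randomly split samples into training and testing parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))                 # shuffled sample indices
    n_train = int(round(train_fraction * len(X)))   # rounding rule is an assumption
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Placeholder data with the dataset's shape: 84 samples, 6 normalized inputs.
rng = np.random.default_rng(1)
X = rng.random((84, 6))
y = rng.random(84)

X_train, y_train, X_test, y_test = split_70_30(X, y)
```

The training part is then used to fit the model, and the testing part is held back until validation so that the reported metrics reflect unseen samples.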

3. Results
The performance of the TLBO-ANN model was first evaluated based on the cost function. The goal in training is for the model to converge in as few iterations as possible while minimizing the cost function. Figure 2 shows the convergence of RMSE, MAE, and R for the TLBO-ANN model over 500 iterations. The convergence curves were obtained by plotting each of the three indices at every iteration (red lines represent the training data and blue lines the testing data). The indices showed different convergence behaviors. The R-value tends to increase, whereas MAE and RMSE decrease markedly as the number of iterations increases. After very strong fluctuations in the first iterations, RMSE converged and was almost stable after the 35th iteration for the testing data, and reached relative stability after the 115th iteration for the training data. MAE converged by the 42nd iteration (for both the training and testing data), although slight variation remained as the number of iterations increased. R converged by the 45th iteration (for both datasets); for the training data, a slight but insignificant upward trend remained. Overall, R, MAE, and RMSE of the TLBO-ANN model converge rapidly in the simulation runs.

Typical results after 500 simulations of the TLBO-ANN model are presented next. Figure 3 compares the experimental k-values (black line) with the values predicted by the TLBO-ANN model (red line) for the training and testing processes. In this figure, the horizontal axis represents the sample number in the dataset, and the vertical axis represents the k-value of the soil in units of 10⁻⁹ cm/s. The predicted k-values of the samples in the training dataset are quite close to the actual results (Figure 3(a)). For the testing dataset, the experimental results are also predicted with small errors (Figure 3(b)).

The performance of the model is also evaluated by the error evaluation criteria, namely, MAE and RMSE presented above. The values of these criteria for the training and testing dataset are shown in Figure 4. The RMSE values are 3.0541 and 2.9401 while the MAE values are 2.1721 and 2.3075 for the training and testing dataset, respectively.

Figure 5 shows the frequency histogram and probability density function of the prediction errors of the k-value. The errors are most concentrated in the range −0.002 to 0.001 for the training dataset and −0.009 to 0.012 for the testing dataset, indicating a highly concentrated probability density function in these ranges. There are also a few cases where the error is large, about −1.5, but they account for a tiny percentage and do not affect the overall result. In addition, with a very small mean error (−0.0052 for the training data and 0.0491 for the testing data), the TLBO-ANN model shows very high predictive accuracy.
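An error histogram and mean error of this kind can be obtained as in the following sketch. The sign convention (predicted minus actual), the bin count, the function name `error_summary`, and the five sample values are assumptions for illustration; the text does not specify how the errors in Figure 5 were defined or binned.

```python
import numpy as np

def error_summary(actual, predicted, bins=10):
    """Return the mean prediction error and its histogram (counts, bin edges)."""
    # Error defined as predicted minus actual (assumed sign convention).
    errors = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    counts, edges = np.histogram(errors, bins=bins)
    return float(errors.mean()), counts, edges

# Made-up values for illustration only.
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = np.array([1.1, 1.9, 3.0, 4.2, 4.8])

mean_error, counts, edges = error_summary(actual, predicted, bins=4)
```

A mean error near zero shows that over- and under-predictions balance out, but it does not by itself show that individual errors are small, which is why it is read together with RMSE and MAE.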

Figure 6 shows the regression between the values predicted by the TLBO-ANN model and the actual values for the training and testing datasets. The horizontal axis represents the experimental results, and the vertical axis represents the values predicted by the proposed model. The values obtained from the proposed model for the training dataset (Figure 6(a)) and the testing dataset (Figure 6(b)) are very close to the experimental results. These results show that the TLBO-ANN model can generalize the relationship between the input and output parameters and gives reasonable predictions. The correlation between simulated and experimental results reached R = 0.951 for the training dataset and R = 0.905 for the testing dataset, and the error is mainly concentrated in the first quartile. This indicates that the predictive power of the model is very good. The fitted line y = 0.88x + 0.0052 represents the relationship between the experimental and simulated data for the training dataset, and the line y = x + 0.0051 does so for the testing dataset. The coefficients of these two equations are similar. Together with the R values, this shows that it is feasible to apply the TLBO-ANN model to predict the k-value.
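A regression line of the form y = ax + b between experimental and predicted values can be obtained with an ordinary least-squares fit, as sketched below. The data are synthetic and constructed to lie exactly on a line with slope 0.9 and intercept 0.1; they are not the study's results, and the function name `fit_line` is hypothetical.

```python
import numpy as np

def fit_line(actual, predicted):
    """Least-squares fit of predicted = slope * actual + intercept."""
    slope, intercept = np.polyfit(actual, predicted, deg=1)  # highest power first
    return float(slope), float(intercept)

# Synthetic values for illustration only.
actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
predicted = 0.9 * actual + 0.1

slope, intercept = fit_line(actual, predicted)
```

A slope close to 1 and an intercept close to 0 indicate that the predictions track the measurements without systematic scaling or offset.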

4. Discussion
The performance of the TLBO-ANN model has been validated by statistical metrics. The results of this study are compared with those of Pham et al. [12, 17], who used the ML models M5P, GP, ANN, SVM, and RF to predict the k-value with the same six input parameters at the Da Nang-Quang Ngai Expressway project (Table 2).
The results show that the correlation coefficient of the TLBO-ANN model in this study (R = 0.905) is much higher than that of the M5P and GP models (R < 0.77), as well as the ANN, SVM, and RF models (R < 0.851). Thus, the TLBO-ANN model is clearly superior to the single ANN model and to the other models (M5P, GP, SVM, and RF). The results also show that the TLBO algorithm enhances the performance of the model, which is consistent with published studies. TLBO has been shown to be an optimization algorithm with high reliability, accuracy, and fast convergence. It does not require any algorithm-specific parameters, so it can also be called an algorithm-specific parameter-free algorithm [40]. The TLBO algorithm is based on the influence of the teacher’s presence on student outcomes in the classroom, where outcomes are measured by semester grades. TLBO has been observed to perform better than other optimization algorithms (archive-based microgenetic algorithm (AMGA), clustering multiobjective evolutionary algorithm (clustering MOEA), differential evolution with self-adaptation and local search algorithm (DECMOSA-SQP), dynamical multiobjective evolutionary algorithm (DMOEA), generalized differential evolution 3 (GDE3), LiuLi algorithm, multiobjective evolutionary algorithm based on decomposition (MOEAD), and enhancing MOEA/D with guided mutation and priority update (MOEADGM)) on unconstrained benchmark problems and unconstrained multiobjective problems [41]. When optimization algorithms are applied to enhance the learning process of ANN, TLBO achieves better training accuracy than two other algorithms, particle swarm optimization (PSO) and differential evolution (DE) [19].
However, it should be noted that because only 84 test results from one project in Da Nang, Vietnam, were used, the k-value predictions of this paper are highly reliable only within the range of the experimental data. Therefore, future research will include more experiments on other physical parameters of the soil, such as particle shape, particle distribution, and effective particle diameter, as well as sampling at different projects, to expand the scope and number of inputs.
5. Conclusion
In this study, the proposed hybrid ML model TLBO-ANN was successfully developed for the prediction of the k-value and evaluated using the cost function and statistical measures (R, RMSE, and MAE). The results show that the TLBO-ANN model is a good predictor of the k-value of soil, with R = 0.905. Its performance is also much superior to that of the single ANN model and other models such as M5P, GP, SVM, and RF. Therefore, the TLBO-ANN model can be used for accurate prediction of the k-value of soil. However, as the sample size of the present study is limited, future studies should include more samples and different combinations of soil input parameters so that the model, which predicted the k-value with high accuracy at the Da Nang-Quang Ngai Expressway, Vietnam, can be applied more widely in other areas.
Data Availability
The data used to support the findings of this study are available from the corresponding authors upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The authors would like to thank the University of Transport Technology for its support.