Abstract
One of the challenges in optimizing concrete is formulating a mathematical equation that relates the various constituents of concrete to its properties. In this work, the compressive strength of concrete admixed with metakaolin was modelled using the Gene Expression Programming (GEP) algorithm. A dataset obtained from laboratory experiments was used for the analysis. The mixtures were proportioned with three water/binder ratios (0.4, 0.5, and 0.6), and the concrete grades produced were M15 and M20. The compressive strength of the concrete was determined after 28 days of curing. The input variables of the GEP algorithm were the cement, water, metakaolin, fine aggregate, and coarse aggregate contents, and the response was the compressive strength. The model was trained and tested on this dataset, and the R-square value obtained with the GEP algorithm was compared with that of conventional stepwise regression analysis. With a coefficient of determination (R-square value) of 0.95, the GEP algorithm has proven to be a good alternative for modelling concrete compressive strength.
1. Introduction
The sustainability of concrete is strongly tied to the service life and performance of the binder system used. Conventional binder systems based on Portland cement have performed exceptionally well over a wide range of conditions. However, as with the manufacture of most building materials, the production of Portland cement requires a significant amount of energy and inherently produces greenhouse gases. This is even more thought-provoking given the growing use of Portland cement over the years, with about 3.6 billion tons of cement produced in 2011 [1]. Engineers have developed approaches to improving the sustainability of concrete by adopting, and increasing the use of, cementitious materials that rely little or not at all on Portland cement and more on alternative materials, for example, fly ash, silica fume, granulated blast furnace slag, and natural pozzolans such as metakaolin. The use of alternatives to Portland cement will only increase in the future, so there is a need to optimize the use of these materials. However, these Supplementary Cementitious Materials (SCMs) must be used without jeopardizing the service life and performance attributes that have made concrete the most widely used construction material on the planet [2].
Apart from high performance, concrete is expected to provide adequate workability, strength, and durability at all times. Advances in technology now make it possible to produce concretes that meet these requirements. However, there is still no established method for optimizing the mixture proportions of concrete according to the required performance, and only a few attempts have been made to address this problem. The main reason is that a wide variety of mixture proportions is possible, and it is mathematically challenging to optimize the problem appropriately under many criteria (represented by objective functions) [3]. Therefore, this paper presents the use of GEP in modelling the compressive strength of concrete with metakaolin as the supplementary cementitious material.
The application of metaheuristic algorithms in various fields of engineering has expanded, and many more studies are ongoing. Jahed Armaghani et al. [4] studied the performance of support vector machine (SVM) models with different kernels for modelling rock brittleness and compared the importance of the inputs in the different SVM models. A neural network and particle swarm optimization were also hybridized as a neuro-swarm model to estimate pile settlement [5]. Gülbandılar et al. [6] reported that a fuzzy decision-making algorithm is applicable and useful for selecting materials for cement mortar. Akin and Abejide [7] used Gene Expression Programming (GEP) to model the compressive strength of concrete produced by partially replacing Portland cement with ground-granulated blast furnace slag. Artificial Neural Networks (ANN) and Adaptive Network-based Fuzzy Inference Systems were adopted to model the behavior of concrete containing zeolite and diatomite [8]. Kocaka et al. [9] used a fuzzy logic approach to investigate the properties of cement produced by partially replacing Portland cement with various proportions of blast furnace slag and waste tire powder.
Metakaolin is kaolin clay calcined at temperatures above 650°C. Rocks rich in kaolinite are referred to as China clay or kaolin and are traditionally used in the production of porcelain. Metakaolin particles are finer than those of Portland cement but not as fine as silica fume. In metakaolin, the Si–O network remains unaltered, while the Al–O network restructures itself. Whereas raw kaolinite is crystalline, metakaolin is largely disordered in structure, which gives it good properties as a mineral additive [10]. Metakaolin is not cementitious in itself, but it is a highly reactive pozzolan that, in the presence of water, reacts readily with lime to form hydrated calcium silicate and aluminosilicate compounds. It is therefore regarded as a good synthetic pozzolan. Its pozzolanic behaviour and potential for use in concrete production are currently being researched [11, 12].
2. Overview of Gene Expression Programming
Gene expression programming (GEP) is an evolutionary algorithm that generates computer programmes or models. These programmes are represented as tree structures that adapt by changing their size, shape, and composition, much like a living organism.
GEP was first introduced by Ferreira [13] as an extension of genetic programming (GP) [14] that retains some properties of genetic algorithms (GA) [15]. GEP is a powerful data analysis tool that has been adopted in various fields, and it has been reported to overcome some of the shortcomings of other data analysis tools [16].
A GEP gene is a fixed-length list of symbols, each of which can be any element of the function set, such as {+, ×, −, /, √}, or of the terminal set (the input variables and constants). A typical GEP gene built from the given function and terminal sets is given in equation (1), where the terminals are variables and C1 is a constant, say 3. A gene written in this form is said to be in Karva notation and is referred to as a K-expression. A K-expression is mapped into an expression tree (ET) in a breadth-first (width-first) fashion; the ET of the sample gene in equation (1) is shown in Figure 1. The conversion starts from the first position in the K-expression, which corresponds to the root of the ET, and reads through the string one symbol at a time, filling each level of the tree from left to right before moving on to the next level. The GEP gene in equation (1) can also be expressed in mathematical form as follows:

An expression tree can be reconverted into a K-expression by recording the nodes from left to right in each layer of the ET, from the root layer down to the deepest one to form the string. Figure 1 shows the gene expression tree from which equation (2) is decoded.
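To make the breadth-first mapping concrete, the Python sketch below decodes a K-expression into an expression tree, evaluates it, and reads the tree back into a K-expression layer by layer. The gene "Q*-+abcdab", the use of Q for the square root, and the helper names (decode, encode, evaluate, to_infix) are hypothetical choices for illustration; they are not the gene of equation (1) or the function set used in this study.

```python
import math
import operator
from collections import deque

# Illustrative symbol set (an assumption, not the one used in this study):
# 'Q' denotes the square root; any symbol not listed here is a terminal.
BINARY = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
ARITY = {**{s: 2 for s in BINARY}, "Q": 1}


class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = []


def decode(kexpr):
    """Map a K-expression to an expression tree, filling it level by level."""
    symbols = iter(kexpr)
    root = Node(next(symbols))
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node.symbol, 0)):  # terminals take no children
            child = Node(next(symbols))
            node.children.append(child)
            queue.append(child)
    return root  # symbols never read form the non-coding part of the gene


def encode(root):
    """Read the tree back into a K-expression, layer by layer, left to right."""
    out, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        out.append(node.symbol)
        queue.extend(node.children)
    return "".join(out)


def evaluate(node, env):
    """Evaluate the tree; env maps terminal symbols to numbers."""
    if node.symbol not in ARITY:
        return env[node.symbol]
    args = [evaluate(child, env) for child in node.children]
    if node.symbol == "Q":
        return math.sqrt(args[0])
    return BINARY[node.symbol](*args)


def to_infix(node):
    """Write the tree as an ordinary algebraic expression."""
    if node.symbol not in ARITY:
        return node.symbol
    if node.symbol == "Q":
        return f"sqrt({to_infix(node.children[0])})"
    left, right = (to_infix(child) for child in node.children)
    return f"({left} {node.symbol} {right})"


tree = decode("Q*-+abcdab")   # the trailing "ab" is never read
print(encode(tree))           # Q*-+abcd
print(to_infix(tree))         # sqrt(((a - b) * (c + d)))
print(evaluate(tree, {"a": 5, "b": 1, "c": 2, "d": 2}))  # 4.0
```

Reading the tree back yields only the coding part of the gene, which is why the round trip drops the unused tail symbols.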
Unlike the parse-tree representation in canonical genetic programming, GEP uses fixed-length character strings to represent solutions to problems; these are then expressed as parse trees (called "expression trees" in GEP) of different sizes and shapes when their fitness is evaluated [17].
One of the advantages of the GEP technique is that the creation of genetic diversity is greatly simplified, because the genetic operators work at the chromosome level. Another strength of GEP is its unique multigenic nature, which allows the evolution of more complex programs composed of several subprograms [18–20]. The length of the genes is fixed in advance for a given problem; thus, what varies in GEP is not the length of the genes but the size of the corresponding ETs [21].
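A fixed-length gene can encode trees of different sizes because of Ferreira's head/tail rule: the head (length h) may contain functions or terminals, while the tail contains only terminals and has length t = h(n - 1) + 1, where n is the largest arity in the function set. This guarantees that every gene closes into a valid tree, and only the leading open reading frame (ORF) is expressed. The sketch below counts the nodes each gene actually expresses; the function set, terminals, and example genes are hypothetical.

```python
import random

# Illustrative function set (an assumption): binary + - * /, unary Q (square root).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}
TERMINALS = ["a", "b", "c", "d"]


def tail_length(head, max_arity):
    """Head/tail rule t = h(n - 1) + 1: enough terminals to close any head."""
    return head * (max_arity - 1) + 1


def orf_length(gene):
    """Length of the coding region, i.e. the number of nodes in the expression tree."""
    open_slots = 1  # the root still has to be filled
    for i, symbol in enumerate(gene):
        open_slots += ARITY.get(symbol, 0) - 1
        if open_slots == 0:
            return i + 1
    raise ValueError("gene is too short to close its expression tree")


def random_gene(head, rng):
    """Head: functions or terminals; tail: terminals only."""
    t = tail_length(head, max(ARITY.values()))
    return ([rng.choice(list(ARITY) + TERMINALS) for _ in range(head)]
            + [rng.choice(TERMINALS) for _ in range(t)])


# Four genes of the same length (h = 4, t = 5, so 9 symbols) ...
for gene in ["++++abcda", "Q*-+abcda", "+a*bcdabc", "a+*-bcdab"]:
    print(gene, orf_length(gene))  # ... with trees of 9, 8, 5 and 1 nodes

# The rule guarantees that randomly generated genes always decode to a valid tree.
rng = random.Random(42)
assert all(orf_length(random_gene(4, rng)) <= 9 for _ in range(1000))
```

The first gene is the worst case: a head made entirely of binary functions needs exactly h + 1 = 5 terminals, which is what the tail supplies.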
The schematic diagram in Figure 2 shows the whole process of a GEP algorithm from the start to end.
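As a rough companion to the schematic in Figure 2, the following is a deliberately small GEP loop in Python: create a random population, express and evaluate each chromosome, keep the best one (elitism), and breed the rest by selection, one-point recombination, and point mutation until the generation limit is reached. Everything here is an assumption for illustration: single-gene chromosomes, a toy target x² + x + 1, tournament selection (Ferreira's original description uses roulette-wheel selection), and no transposition operators. It is not the configuration of Table 5 or the software used in this study.

```python
import operator
import random

# Toy settings chosen for illustration only (assumptions, not Table 5 values).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMS = ["x", "1"]                 # one input variable and the constant 1
HEAD = 6
TAIL = HEAD * (2 - 1) + 1          # head/tail rule for maximum arity 2
XS = [float(v) for v in range(-5, 6)]


def target(x):
    return x * x + x + 1.0         # hypothetical function to be rediscovered


def random_gene(rng):
    return ([rng.choice(list(OPS) + TERMS) for _ in range(HEAD)]
            + [rng.choice(TERMS) for _ in range(TAIL)])


def evaluate(gene, x):
    """Decode the K-expression breadth-first, then evaluate the tree at x."""
    children, next_free, i = {}, 1, 0
    while i < next_free:                      # walk the coding region only
        arity = 2 if gene[i] in OPS else 0
        children[i] = range(next_free, next_free + arity)
        next_free += arity
        i += 1

    def value(k):
        if gene[k] in OPS:
            left, right = (value(c) for c in children[k])
            return OPS[gene[k]](left, right)
        return x if gene[k] == "x" else 1.0

    return value(0)


def error(gene):
    """Fitness as the sum of squared errors on the training points (lower is better)."""
    return sum((evaluate(gene, x) - target(x)) ** 2 for x in XS)


def mutate(gene, rng, rate=0.05):
    """Point mutation that keeps only terminals in the tail, so genes stay valid."""
    return [rng.choice(list(OPS) + TERMS if i < HEAD else TERMS)
            if rng.random() < rate else s
            for i, s in enumerate(gene)]


def tournament(pop, scores, rng):
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if scores[i] <= scores[j] else pop[j]


def run_gep(generations=100, pop_size=40, seed=1):
    rng = random.Random(seed)
    pop = [random_gene(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [error(g) for g in pop]
        best = min(range(pop_size), key=lambda k: scores[k])
        history.append(scores[best])
        new_pop = [pop[best]]                 # elitism: the best gene survives unchanged
        while len(new_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            cut = rng.randrange(1, HEAD + TAIL)   # one-point recombination
            new_pop.append(mutate(parent_a[:cut] + parent_b[cut:], rng))
        pop = new_pop
    scores = [error(g) for g in pop]
    best = min(range(pop_size), key=lambda k: scores[k])
    history.append(scores[best])
    return pop[best], history


best_gene, history = run_gep()
print("".join(best_gene), history[0], history[-1])
```

Because the best chromosome is copied unchanged into each new generation, the best error recorded in `history` can never increase; whether it reaches zero depends on the seed and the number of generations.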

3. Materials and Method
3.1. Materials
Cement. The cement used in this research was Dangote brand 42.5 N ordinary Portland cement, obtained from an open market in Samaru, Zaria. Its specific gravity was found to be 3.16.
Metakaolin. Kaolin clay obtained from Kankara Village in Katsina State, Nigeria, was calcined at about 800°C for three hours to convert it to metakaolin. Its oxide composition is given in Table 1.
Fine and Coarse Aggregate. The aggregates were obtained from a local supplier in Zaria, Kaduna State, and subjected to appropriate preliminary tests to ensure that they met the relevant standards. The specific gravities of the fine and coarse aggregates are 2.56 and 2.70, respectively.
Water. Potable water free from salt was used for both mixing and curing.
3.2. Concrete Mixture Proportion
Table 2 presents the mixture proportions: three different water/binder ratios for each of the M15 and M20 concrete grades.
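The water/binder ratio fixes the water content once the total binder (cement plus metakaolin) is chosen, and the metakaolin replacement level fixes how that binder is split. The sketch below does this arithmetic for the three w/b ratios used in the study; the binder content of 350 kg/m3, the replacement levels, and the helper name binder_split are hypothetical and do not reproduce the proportions of Table 2.

```python
# Hypothetical values for illustration; the actual proportions are in Table 2.
BINDER_KG_M3 = 350.0
MK_LEVELS = (0.0, 0.05, 0.10, 0.15)


def binder_split(binder, wb_ratio, mk_fraction):
    """Split a binder content (kg/m3) into water, cement and metakaolin masses."""
    if not 0.0 <= mk_fraction < 1.0:
        raise ValueError("metakaolin fraction must lie in [0, 1)")
    water = wb_ratio * binder            # w/b = water / (cement + metakaolin)
    metakaolin = mk_fraction * binder    # metakaolin replaces part of the cement
    cement = binder - metakaolin
    return water, cement, metakaolin


for wb in (0.4, 0.5, 0.6):               # the three w/b ratios used in the study
    for mk in MK_LEVELS:
        water, cement, mk_mass = binder_split(BINDER_KG_M3, wb, mk)
        print(f"w/b={wb:.1f}  MK={mk:4.0%}  water={water:5.1f}  "
              f"cement={cement:5.1f}  metakaolin={mk_mass:4.1f}")
```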
3.3. Test Procedure
Three cube samples were used for each mixture. Cubes with 100 mm sides were cast from fresh concrete and demoulded after 24 hours; the samples were then cured in clean water for 28 days, because concrete has been reported to attain 99% of its strength at 28 days. Each sample was crushed in a compression testing machine, and the compressive strength was calculated as the failure load divided by the loaded area of the cube face.
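For a 100 mm cube the loaded area is 100 × 100 = 10,000 mm2, so a failure load in kN converts to N/mm2 (equivalently MPa) by multiplying by 1000 and dividing by 10,000. The short sketch below applies this to three cubes and averages them; the failure loads and the helper name cube_strength are hypothetical.

```python
CUBE_SIDE_MM = 100.0  # cube size stated in the test procedure


def cube_strength(load_kn, side_mm=CUBE_SIDE_MM):
    """Compressive strength in N/mm2 (MPa) = failure load / loaded area."""
    area_mm2 = side_mm * side_mm
    return load_kn * 1000.0 / area_mm2


loads_kn = [250.0, 262.0, 244.0]  # hypothetical failure loads of three cubes
strengths = [cube_strength(p) for p in loads_kn]
mean = sum(strengths) / len(strengths)
print(f"{mean:.1f} N/mm2")  # prints 25.2 N/mm2
```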
4. Modelling of the Compressive Strength
The experimental dataset described in Table 2 and the corresponding compressive strengths are summarized in Table 3 and were used to model the compressive strength of concrete.
The dataset was obtained from laboratory experiments carried out on low-strength and medium-strength concrete only, and the average of the tests conducted for each sample is given in Table 2. For more than three to five input variables, a larger dataset would be required to obtain an accurate model.
4.1. Model Construction Using Gene Expression Programming
The major task here is to identify the hidden function f that connects the input variables x1, x2…x5 with the output or target value y. This relationship can be written in the following form:

$$y = f(x_1, x_2, \ldots, x_5)$$
The models were developed for the 28-day compressive strength of concrete. The variables used in the modelling are given in Table 4.
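The R-square (R2), Mean Square Error (MSE), and Root-Mean-Squared Error (RMSE) were used as the statistical criteria for evaluating the performance of the GEP-based models. The coefficient of determination (R-square) is given in the following equation:

$$R^2 = \left( \frac{n\sum_{i=1}^{n} t_i o_i - \sum_{i=1}^{n} t_i \sum_{i=1}^{n} o_i}{\sqrt{\left[n\sum_{i=1}^{n} t_i^2 - \left(\sum_{i=1}^{n} t_i\right)^2\right]\left[n\sum_{i=1}^{n} o_i^2 - \left(\sum_{i=1}^{n} o_i\right)^2\right]}} \right)^2, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(t_i - o_i\right)^2$$

where t = target value, o = output value, and n = number of data points.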
For an ideal or perfect fit, ti = oi and MSE = 0. The MSE therefore ranges from 0 to infinity, with 0 representing an ideal, exact prediction; the lower the MSE, the better the model. The GEP algorithm parameters that guide the operation of the algorithm in this work are given in Table 5.
These GEP algorithm parameters, in particular the number of generations and the number of chromosomes, were selected as given in Table 5 to improve the accuracy of the generated model. The function set was also chosen so that the model remains unambiguous and clear enough to interpret, while still allowing a good prediction model to be obtained through the use of trigonometric and exponential functions.
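The evaluation criteria can be computed directly from the target values t and the model outputs o. The sketch below assumes that R2 is the squared Pearson correlation between t and o (algebraically equal to the n-based computational form) and, for comparison, also computes the alternative definition 1 − SS_res/SS_tot; which definition the GEP software used is an assumption here, and the helper names are hypothetical. The two agree for a least-squares linear fit with an intercept but can differ for a GEP model, which is one reason to report the MSE alongside R2.

```python
import math


def mse(t, o):
    """Mean square error; 0 only when every output equals its target."""
    return sum((ti - oi) ** 2 for ti, oi in zip(t, o)) / len(t)


def rmse(t, o):
    """Root-mean-squared error, in the same units as the strength."""
    return math.sqrt(mse(t, o))


def r2_pearson(t, o):
    """Squared Pearson correlation between targets and outputs."""
    n = len(t)
    mean_t, mean_o = sum(t) / n, sum(o) / n
    cov = sum((ti - mean_t) * (oi - mean_o) for ti, oi in zip(t, o))
    var_t = sum((ti - mean_t) ** 2 for ti in t)
    var_o = sum((oi - mean_o) ** 2 for oi in o)
    return cov * cov / (var_t * var_o)


def r2_residual(t, o):
    """Alternative definition: 1 - SS_res / SS_tot."""
    mean_t = sum(t) / len(t)
    ss_res = sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    ss_tot = sum((ti - mean_t) ** 2 for ti in t)
    return 1.0 - ss_res / ss_tot


targets = [20.0, 25.0, 30.0]          # hypothetical strengths, N/mm2
outputs = [21.0, 24.0, 30.0]
print(f"MSE={mse(targets, outputs):.4f}  RMSE={rmse(targets, outputs):.4f}")
print(f"R2 (Pearson)={r2_pearson(targets, outputs):.4f}  "
      f"R2 (1-SSres/SStot)={r2_residual(targets, outputs):.4f}")

# A biased model: perfectly correlated with the targets, yet 5 N/mm2 too high.
biased = [t + 5.0 for t in targets]
print(r2_pearson(targets, biased), mse(targets, biased))  # 1.0 25.0
```

The biased example shows why a squared-correlation R2 alone cannot reveal a systematic offset, while the MSE does.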
5. Results and Discussion
5.1. Results of the Experimental Program
From Tables 6 and 7, the 28-day compressive strength increased relative to the control up to about 10% replacement, after which it began to drop. For all the w/b ratios of 0.4, 0.5, and 0.6, the strength of the concrete decreased as the metakaolin content increased above 10%. This agrees with the findings of Murali and Sruthee [23] that metakaolin contents of 5 and 10% considerably increased the strength characteristics of concrete relative to conventional concrete.
In addition, the strength loss with increasing metakaolin content beyond 10% replacement was consistent across all the water/binder ratios, which suggests that natural pozzolans such as metakaolin have little or no effect on the water demand of concrete.
5.2. Model Results
Descriptive statistics of the dataset used in the modelling are given in Table 3. Running the GEP algorithm on the concrete mix dataset can take considerable computational time, depending on the processing power of the computer.
Part of the data was used to test the performance of the GEP algorithm in predicting the compressive strength, and the best solution obtained, the GEP model, is given in equation (5). Figure 3 shows the gene expression trees from which the model is derived.
Figure 3: Gene expression trees of the GEP model, panels (a), (b), and (c).
The best solution, which is the final GEP model, is given in equation (7). Of the solutions produced by the algorithm, equation (7) gave the best correlation coefficient and also retained all the input variables, so that none of them was rendered negligible.
Figure 4 shows a close match between the actual and predicted curves. The regression plot in Figure 5 confirms this, as the scatter points are closely packed around the line. However, some large errors are observed in the curve fitting. At about the fifteenth observation, the error is quite large: the target was 16 N/mm2, but the model gave about 8.9 N/mm2, a positive error (target minus prediction) of 7.04 N/mm2. This underestimation may be the result of irregularities in the dataset, since for most of the curve the predicted values closely approximate the target values.


With MSE and R-squared (R2) values of 16.05 and 0.95, respectively, the model performs reasonably well. Other evaluation metrics used to validate the model are the Root-Mean-Squared Error (RMSE), Residual Standard Error (RSE), and Mean Absolute Error (MAE). Table 8 gives the results of these evaluation metrics.
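The remaining metrics can be sketched in the same way. RMSE is the square root of the MSE, so an MSE of 16.05 corresponds to an RMSE of about 4.01 N/mm2. The residual standard error below divides the residual sum of squares by n - p, where p is the number of fitted parameters; how p should be counted for a GEP model is an assumption left to the reader, and the helper names and sample values are hypothetical.

```python
import math


def mae(t, o):
    """Mean absolute error."""
    return sum(abs(ti - oi) for ti, oi in zip(t, o)) / len(t)


def residual_standard_error(t, o, n_params):
    """sqrt(SS_res / (n - n_params)); n_params counts the fitted parameters."""
    ss_res = sum((ti - oi) ** 2 for ti, oi in zip(t, o))
    dof = len(t) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    return math.sqrt(ss_res / dof)


print(round(math.sqrt(16.05), 2))   # RMSE implied by MSE = 16.05: 4.01
targets, outputs = [20.0, 25.0, 30.0], [21.0, 24.0, 30.0]   # hypothetical
print(mae(targets, outputs), residual_standard_error(targets, outputs, n_params=1))
```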
5.3. Regression Analysis Model
To assess the benefit of the GEP algorithm for prediction, a linear regression equation was formulated using the classical statistical software package SPSS. The resulting equation for modelling the compressive strength is given in the following equation:
Equation (8) has an R2 value of 0.828, compared with 0.95 for the GEP model. Given these results, the GEP algorithm can be regarded as a more accurate tool for modelling the compressive strength of concrete.
As can be seen from the models, the GEP model is highly nonlinear and is therefore difficult to solve with conventional techniques, whereas the regression-based model is a linear function and relatively easy to solve. The statistical results show that the GEP model is more accurate for predicting concrete compressive strength.
Table 9 shows that, according to the performance metrics, the predictions of the GEP model are closer to the target values than those of the regression-based model. This indicates that GEP has better predictive ability than classical statistical regression analysis.
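For readers who want to reproduce a linear baseline outside SPSS, the sketch below fits an ordinary least-squares model with an intercept and five inputs through the normal equations and reports R2. It is a stand-in under stated assumptions: it fits all five inputs at once rather than reproducing SPSS's stepwise entry and removal, the helper names are hypothetical, and the data are synthetic random numbers, not the dataset of this study.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(x) for x in X]       # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(c * xi for c, xi in zip(coef[1:], x))


def r_squared(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the five inputs (cement, water, metakaolin, fine and
# coarse aggregate), scaled to [0, 1]; NOT the data of this study.
rng = random.Random(0)
true_coef = [10.0, 2.0, -3.0, 1.5, 0.5, 4.0]
X = [[rng.random() for _ in range(5)] for _ in range(30)]
y = [predict(true_coef, x) + rng.gauss(0.0, 0.2) for x in X]

coef = fit_linear(X, y)
fitted = [predict(coef, x) for x in X]
print([round(c, 2) for c in coef], round(r_squared(y, fitted), 3))
```

For a least-squares fit with an intercept, this R2 coincides with the squared correlation between observed and fitted values, so the comparison with the GEP model is like-for-like only if both are computed with the same definition.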
6. Conclusions
The following conclusions were drawn from the investigation:
(i) The experimental study showed that metakaolin can be used at up to 10% cement replacement; it can also be deduced that metakaolin has little or no effect on the water demand of concrete.
(ii) A mathematical equation relating concrete compressive strength to its constituents has been derived using GEP. With an R2 value of 0.95, the GEP algorithm has proven to be a good prediction tool for modelling the compressive strength of concrete. The derived model can serve as the objective function in the optimization of concrete.
(iii) These results show that the relationship between concrete properties and their constituents is nonlinear, so it is best represented by nonlinear numerical models.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.