Abstract
The probabilistic seismic demand model (PSDM) is a critical component of performance-based earthquake engineering frameworks. This study proposes a procedure that uses machine-learning methods to generate PSDMs for a typical regular continuous-girder bridge subjected to far-fault and near-fault ground motions (GMs). A series of nonlinear time history analyses (NTHAs) is carried out to calculate the damage caused by far-fault and near-fault GMs for four different site conditions, and 21 seismic intensity measures (IMs) are considered. PSDMs relating the IMs to engineering demand parameters are then established from the NTHA data using machine-learning methods, including linear regression, Bayesian regression (BR), and tree-based models. Based on the coefficients of determination, random forest (RF) is the most suitable model for predicting the longitudinal and transverse curvatures at the bottom of the four piers. The relative importance of each input parameter is also evaluated, and peak ground velocity (PGV), peak spectral velocity (PSV), Arias intensity (AI), and Fajfar intensity (FI) are found to be the critical factors for the RF-based PSDM. All of these parameters except AI are velocity-related. The results offer a new method for establishing seismic demand models of continuous-girder bridges that can support seismic damage prediction and seismic insurance risk evaluation.
1. Introduction
Recent earthquakes have highlighted that continuous girder bridges, as key components of transportation networks, are among the most vulnerable infrastructure components [1, 2]. As a key step in performance-based seismic design (PBSD) [3, 4], the probabilistic seismic demand model (PSDM) provides a means of probabilistically describing the relationship between seismic intensity measures (IMs) and engineering demand parameters for bridges subjected to potential earthquakes.
Several studies have investigated seismic demand models for continuous girder bridges in recent years. Mackie et al. [5, 6] established a PSDM for the columns of California highway bridges using nonlinear time history analysis and incremental dynamic analysis. Nielson et al. [7, 8] established PSDMs for columns, bearings, and abutments by analyzing the seismic fragility curves of typical bridges in the central and southeastern United States. Padgett et al. [9] investigated PSDMs for a class of retrofitted multispan continuous concrete girder bridges using two suites of synthetic ground motions (GMs). Pan et al. [10] focused on multispan simply supported steel highway bridges in New York, USA, and established PSDMs for columns and bearings based on nonlinear time history analysis. Pan et al. [11] used the demand/capacity ratio as a variable and adopted a quadratic rather than a linear regression model to predict the demand when analyzing the fragility of bridges in New York, USA. Padgett et al. [12] proposed criteria, such as efficiency, practicality, proficiency, sufficiency, and hazard computability, for selecting optimal IMs for highway bridges. Zheng et al. [13] established PSDMs for columns, bearings, and abutments by analyzing the seismic fragility curves of typical simply supported girder bridges in Wenchuan, China; their results showed that the relationship between peak ground acceleration and the engineering demand parameters of components was not ideal in logarithmic space. Ma et al. [14] compared the damage mechanisms of continuous bridges under near-fault and far-fault earthquakes and established a PSDM using an intensity parameter; their results showed that Housner intensity had the best correlation with the pier drift ratio. The strategy currently employed to establish PSDMs for continuous girder bridges is linear regression between a single IM and an engineering demand parameter. However, a single IM cannot account for all relevant characteristics of a ground motion.
Therefore, it is essential to include more IMs to improve the ability of regression models to fit the data, and several methods have been proposed to improve the evaluation. Jiang et al. [15] investigated the optimal IMs of PSDMs for isolated bridges subjected to pulse-like GMs. Wang et al. [16] proposed a multidimensional fragility evaluation methodology that considers multiple performance limit states and seismic demand parameters, indicating that the uncertainty of, and dependence between, seismic demand parameters are indispensable in the fragility analysis process. Taking into account the dependence of seismic demands on ground motion characteristics and the prevailing uncertainties, Huang et al. [17] constructed probabilistic demand models for reinforced concrete highway bridges with one single-column bent.
Machine learning (ML) methods have become prevalent owing to their high accuracy and efficiency. Compared with traditional linear regression, ML methods have advantages in analyzing complex and uncertain problems and in facilitating decision-making [18]. However, the application of ML methods to PSDMs of regular continuous girder bridges with multiple IMs has rarely been investigated, which motivates the present study. This paper proposes a general procedure for establishing PSDMs for continuous girder bridges based on ML models. A series of nonlinear time history analyses (NTHAs) was performed on a finite element model of the bridge in OpenSees, and 21 seismic IMs were considered. Various ML models, including linear regression, Bayesian regression, and tree-based models, were used to establish PSDMs for the continuous girder bridge and were compared with the conventional single-IM linear regression model. From this comparison, random forest (RF) was found to be the most suitable model, and the relative importance of each input IM was determined.
2. Overview of ML Regression Methods
ML is an important branch of artificial intelligence: it optimizes the performance of a computational process using available data or previous experience. The ML regression methods shown in Figure 1, namely linear regression, Bayesian regression (BR), and tree-based models [19], are used in this paper to establish PSDMs for continuous girder bridges.

2.1. Linear Regression
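Linear regression fits the data with a linear function and uses the mean square error (MSE) between the observed and predicted values as the penalty function; the set of weights that minimizes the MSE is found with the gradient descent method. The model [19] takes the following form:
$$y = \hat{y} + \varepsilon, \qquad \hat{y} = \mathbf{w}^{\mathsf{T}}\mathbf{x}$$
where $\mathbf{w}$ is the regression coefficient vector, $\mathbf{x}$ is the input variable vector, $y$ is the observed response, $\hat{y}$ is the predicted response, and $\varepsilon$ is the error term. For $n$ training samples, with the input vectors stacked row-wise in the matrix $X$ and the observed responses in the vector $\mathbf{y}$, the penalty function is
$$\mathrm{MSE}(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 = \frac{1}{n}\left\lVert \mathbf{y} - X\mathbf{w} \right\rVert_2^2$$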
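The paper states that the weights are found by gradient descent on the MSE. The following is a minimal sketch of that idea, assuming synthetic data and a hypothetical helper `fit_linear_gd`; it checks the result against the closed-form least-squares solution, which is the same problem scikit-learn's `LinearRegression` solves directly.

```python
import numpy as np


def fit_linear_gd(X, y, lr=0.1, n_iter=5000):
    """Fit y ~ X @ w + b by batch gradient descent on the mean squared error."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        residual = X @ w + b - y            # predicted minus observed
        grad_w = 2.0 / n * X.T @ residual   # d(MSE)/dw
        grad_b = 2.0 / n * residual.sum()   # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))               # hypothetical inputs (e.g., ln IMs)
y = X @ np.array([1.0, -0.5, 2.0]) + 0.3 + 0.05 * rng.normal(size=200)

w_gd, b_gd = fit_linear_gd(X, y)

# Closed-form least squares for comparison (intercept via a column of ones)
A = np.column_stack([X, np.ones(len(X))])
coef_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w_gd, 3), round(b_gd, 3))
print(np.round(coef_ls, 3))
```

Both routes reach the same minimizer here; gradient descent becomes attractive mainly when the data are too large for a direct solve.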
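Lasso regression (LR) and ridge regression (RR) add L1 and L2 norm regularization terms, respectively, to the standard linear regression penalty function. For LR, the penalty function [19] becomes
$$\min_{\mathbf{w}}\ \frac{1}{2n}\left\lVert \mathbf{y} - X\mathbf{w} \right\rVert_2^2 + \alpha \left\lVert \mathbf{w} \right\rVert_1$$
while the penalty function [19] for RR can be written as follows:
$$\min_{\mathbf{w}}\ \left\lVert \mathbf{y} - X\mathbf{w} \right\rVert_2^2 + \alpha \left\lVert \mathbf{w} \right\rVert_2^2$$
where $X$ is the matrix of the $n$ input samples, $\mathbf{y}$ is the vector of observed responses, and $\alpha \ge 0$ is the regularization strength.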
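The two regularization terms lead to different results: the L1 penalty can shrink some coefficients exactly to zero, so LR performs implicit feature selection and yields sparse models, whereas the L2 penalty only shrinks coefficients toward zero, so RR keeps all inputs in the model.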
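Elastic net (EN) is a linear regression model that combines the L1 and L2 norm regularization terms. The penalty function [19] of EN is as follows:
$$\min_{\mathbf{w}}\ \frac{1}{2n}\left\lVert \mathbf{y} - X\mathbf{w} \right\rVert_2^2 + \alpha\rho \left\lVert \mathbf{w} \right\rVert_1 + \frac{\alpha(1-\rho)}{2}\left\lVert \mathbf{w} \right\rVert_2^2$$
where $\alpha \ge 0$ is the regularization strength and $\rho \in [0, 1]$ is the mixing ratio between the L1 and L2 terms (EN reduces to LR for $\rho = 1$).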
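To illustrate the practical difference between the L1 and L2 terms, the sketch below fits scikit-learn's `Lasso`, `Ridge`, and `ElasticNet`, whose objectives have the lasso, ridge, and elastic-net forms described in this section, to synthetic data in which three of six inputs are irrelevant. The regularization strengths are arbitrary illustrative choices, not values from the study.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(1)
n, p = 300, 6
X = rng.normal(size=(n, p))                          # hypothetical standardized inputs
true_w = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.5])   # inputs 2, 3, 4 are irrelevant
y = X @ true_w + 0.1 * rng.normal(size=n)

models = {
    "LR (lasso, L1)": Lasso(alpha=0.1),
    "RR (ridge, L2)": Ridge(alpha=10.0),
    "EN (L1 + L2)": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
coefs = {}
for name, model in models.items():
    model.fit(X, y)
    coefs[name] = model.coef_
    n_zero = int(np.sum(model.coef_ == 0.0))         # coefficients set exactly to zero
    print(f"{name:16s} coef={np.round(model.coef_, 3)} exact zeros={n_zero}")
```

The lasso fit sets the irrelevant coefficients exactly to zero, while the ridge fit only shrinks all of them.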
Support vector regression (SVR) is another linear regression model, based on the concept of the support vector machine (SVM). Given a training dataset for binary classification, an SVM constructs a hyperplane that divides the data into two classes, where the term ‘support vector’ refers to the data points nearest to the hyperplane. The hyperplane is chosen to maximize its distance to the support vectors, i.e., the SVM maximizes the classification margin. Analogously, SVR seeks a regression hyperplane that is as flat as possible while keeping the sample points within a tube of half-width $\epsilon$ around it, so that deviations smaller than $\epsilon$ are not penalized. The optimization problem [19] for finding the regression hyperplane can be written as follows:
$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\left\lVert \mathbf{w} \right\rVert_2^2 \quad \text{subject to} \quad \left| y_i - \left(\mathbf{w}^{\mathsf{T}}\mathbf{x}_i + b\right) \right| \le \epsilon, \quad i = 1, \ldots, n$$
where $\epsilon$ is the hyperparameter that determines the width of the interval boundary.
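scikit-learn's `SVR` solves a soft-margin version of this problem, in which points may leave the tube at a cost controlled by `C`. The sketch below (synthetic data; `eps_insensitive_loss` is a hypothetical helper) shows the ε-insensitive loss and how widening ε reduces the number of support vectors, since points strictly inside the tube do not become support vectors.

```python
import numpy as np
from sklearn.svm import SVR


def eps_insensitive_loss(residual, eps):
    """Zero inside the tube |r| <= eps, growing linearly outside it."""
    return np.maximum(np.abs(residual) - eps, 0.0)


rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(100, 1))           # hypothetical single input
y = 1.5 * X[:, 0] + 0.1 * rng.normal(size=100)

n_support = {}
for eps in (0.05, 0.2):
    svr = SVR(kernel="linear", C=10.0, epsilon=eps).fit(X, y)
    n_support[eps] = len(svr.support_)               # indices of the support vectors
    print(f"epsilon={eps}: {n_support[eps]} support vectors of {len(X)}")
```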
2.2. Bayesian Regression
BR is a linear regression model solved by Bayesian statistical inference. BR treats the parameters of a linear model as random variables and combines their prior distributions with the likelihood of the observed data to obtain posterior estimates. Compared with other models, BR can help avoid the underfitting and overfitting caused by choosing too simple or too complex a model, and it makes full use of the available data. To obtain a fully probabilistic BR model, the output is assumed to be Gaussian distributed around $X\mathbf{w}$ [18]:
$$p\left(\mathbf{y} \mid X, \mathbf{w}, \alpha\right) = \mathcal{N}\left(\mathbf{y} \mid X\mathbf{w},\ \alpha^{-1}\mathbf{I}_n\right)$$
where $\alpha$ is the noise precision, treated as a random variable to be estimated from the data, $\mathbf{w}$ is the weight vector, and $X$ is the input data.
BR can also incorporate regularization into the estimation procedure. In the regularized linear models of Section 2.1, the regularization parameters are not selected automatically but must be adjusted manually. In Bayesian ridge regression (BRR), by contrast, the regularization parameters are not adjusted manually but are treated as random variables and estimated from the data. BRR assumes that the prior of the coefficients $\mathbf{w}$ is a spherical Gaussian distribution [18]:
$$p\left(\mathbf{w} \mid \lambda\right) = \mathcal{N}\left(\mathbf{w} \mid \mathbf{0},\ \lambda^{-1}\mathbf{I}_p\right)$$
where the priors over $\alpha$ and $\lambda$ are chosen to be gamma distributions, $\lambda$ is the precision of the prior (so that $\lambda^{-1}$ controls its variance), and $\mathbf{I}_p$ is the $p$-dimensional identity matrix.
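For fixed precisions, the Gaussian likelihood and the spherical Gaussian prior give a closed-form Gaussian posterior for $\mathbf{w}$, with covariance $S = (\alpha X^{\mathsf{T}}X + \lambda I)^{-1}$ and mean $m = \alpha S X^{\mathsf{T}}\mathbf{y}$, where $\alpha$ is the noise precision and $\lambda$ the prior precision; the mean coincides with a ridge solution whose penalty is $\lambda/\alpha$. The sketch below (synthetic data, hypothetical helper `gaussian_posterior`) computes this posterior and then plugs in the $\alpha$ and $\lambda$ that scikit-learn's `BayesianRidge` estimates from the data (its `alpha_` and `lambda_` attributes).

```python
import numpy as np
from sklearn.linear_model import BayesianRidge


def gaussian_posterior(X, y, alpha, lam):
    """Posterior N(m, S) of w for likelihood N(y | Xw, I/alpha) and prior N(w | 0, I/lam)."""
    p = X.shape[1]
    S = np.linalg.inv(alpha * X.T @ X + lam * np.eye(p))   # posterior covariance
    m = alpha * S @ X.T @ y                                 # posterior mean
    return m, S


rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))                               # hypothetical inputs
y = X @ np.array([0.8, 0.0, -1.2]) + 0.2 * rng.normal(size=150)

# Fixed, hand-picked precisions (noise std 0.2 -> precision 25)
m, S = gaussian_posterior(X, y, alpha=25.0, lam=1.0)

# BayesianRidge estimates alpha and lambda from the data instead
brr = BayesianRidge(fit_intercept=False).fit(X, y)
m_brr, S_brr = gaussian_posterior(X, y, brr.alpha_, brr.lambda_)
print("estimated noise precision alpha_:", round(brr.alpha_, 2))
print("estimated weight precision lambda_:", round(brr.lambda_, 3))
print("posterior mean with estimated precisions:", np.round(m_brr, 3))
```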
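Automatic relevance determination (ARD), also known as sparse Bayesian learning, is very similar to BRR but typically leads to sparser weights $\mathbf{w}$. Specifically, ARD drops the assumption that the Gaussian prior is spherical and instead assumes an axis-parallel elliptical Gaussian prior for $\mathbf{w}$, with an independent precision for each coefficient [18]:
$$p\left(\mathbf{w} \mid \boldsymbol{\lambda}\right) = \mathcal{N}\left(\mathbf{w} \mid \mathbf{0},\ A^{-1}\right)$$
where $A$ is a diagonal matrix with $\mathrm{diag}(A) = \boldsymbol{\lambda} = \{\lambda_1, \ldots, \lambda_p\}$.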
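The sketch below, on synthetic data with only two relevant inputs out of eight, contrasts scikit-learn's `BayesianRidge` (one shared prior precision) with `ARDRegression` (one precision per coefficient, exposed as `lambda_`). Under ARD, the coefficients of irrelevant inputs are driven toward zero.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

rng = np.random.default_rng(4)
n, p = 200, 8
X = rng.normal(size=(n, p))                  # hypothetical inputs
true_w = np.zeros(p)
true_w[0], true_w[3] = 1.0, -2.0             # only inputs 0 and 3 matter
y = X @ true_w + 0.1 * rng.normal(size=n)

brr = BayesianRidge().fit(X, y)              # spherical prior: one precision for all weights
ard = ARDRegression().fit(X, y)              # elliptical prior: one precision per weight

print("BRR coef:", np.round(brr.coef_, 3))
print("ARD coef:", np.round(ard.coef_, 3))
print("ARD per-weight precisions:", np.round(ard.lambda_, 1))
```

Large entries of `ard.lambda_` correspond to coefficients pinned near zero, which is how ARD produces sparser weights than BRR.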
2.3. Tree-Based Models
Tree-based models have the advantages of strong interpretability, convenience, and high accuracy. Here, a regression decision tree refers to the classification and regression tree (CART) algorithm. Each internal node tests a condition on one input feature and answers “yes” or “no,” so the tree has a binary structure. A regression tree divides the feature space into several regions (leaves), each with a specific output value. Because each test compares a single feature with a threshold, the region boundaries are parallel to the coordinate axes. The output for a test sample is obtained by passing it down the tree to the region it falls into and returning that region's output, which for a CART regression tree is the mean response of the training samples in that region.
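A depth-one regression tree on a one-dimensional step illustrates these points: the single split is an axis-parallel threshold, and each leaf predicts the mean training response of its region. The data are synthetic, and the sketch uses scikit-learn's `DecisionTreeRegressor`, which implements an optimized version of CART.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)    # hypothetical 1-D input
y = np.where(x[:, 0] >= 0.5, 1.0, 0.0)           # step response

tree = DecisionTreeRegressor(max_depth=1).fit(x, y)
print(export_text(tree, feature_names=["x"]))    # shows the single threshold test

# Each leaf returns the mean training response of its region
pred = tree.predict(np.array([[0.2], [0.8]]))
print(pred)
```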
RF improves on bagged trees mainly by reducing the correlation among the individual trees. In the training stage, RF draws multiple bootstrap samples (random samples with replacement) from the training data set and trains a separate decision tree on each; in addition, only a random subset of the input features is considered at each split, which decorrelates the trees. In the prediction stage, the predictions of all the decision trees in the forest are averaged to obtain the final result.
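The two ingredients of RF, bootstrap sampling and a random feature subset at each split, can be written out explicitly around scikit-learn's `DecisionTreeRegressor`. This is a minimal sketch with hypothetical helpers `fit_random_forest` and `predict_random_forest` and an arbitrary `max_features=0.5`; in practice, `RandomForestRegressor` packages the same procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_random_forest(X, y, n_trees=50, max_features=0.5, seed=0):
    """Bagged regression trees, each considering a random feature subset at every split."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)              # bootstrap sample (with replacement)
        tree = DecisionTreeRegressor(
            max_features=max_features,                # fraction of features tried per split
            random_state=int(rng.integers(1_000_000)),
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_random_forest(trees, X):
    """Average the predictions of all trees."""
    return np.mean([t.predict(X) for t in trees], axis=0)


rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))                         # hypothetical inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

forest = fit_random_forest(X, y)
y_hat = predict_random_forest(forest, X[:5])
print(np.round(y_hat, 3))
```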
Gradient boosting decision tree (GBDT) combines the gradient boosting method with ordinary decision trees: trees are added sequentially, each fitted to the negative gradient of the loss of the current ensemble (for squared error, the residuals), so that the learning accuracy improves gradually. Adaptive boosting (AdaBoost) [20] increases the weights of the samples that the previous base learner handled poorly and retrains the base learner on all the reweighted samples. A new weak learner is added in each round until a predetermined small error rate or the maximum number of iterations is reached. Light gradient boosting machine (LightGBM) [20] is based on decision tree algorithms in which trees are grown leaf-wise rather than depth-wise (as in other decision tree-based methods). Leaf-wise growth leads to more complex and often more accurate trees, although it can overfit small datasets unless the tree size is limited.
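For squared-error loss, the negative gradient is simply the residual, so GBDT reduces to repeatedly fitting a small tree to the current residuals and adding a shrunken copy of it to the ensemble. The sketch below (synthetic data, hypothetical helper `fit_gradient_boosting`) uses 100 rounds of depth-3 trees with a learning rate of 0.1, which mirror the defaults of scikit-learn's `GradientBoostingRegressor`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Squared-error gradient boosting: each tree is fitted to the current residuals."""
    f0 = float(np.mean(y))                          # initial constant model
    pred = np.full(len(y), f0)
    trees, train_mse = [], [float(np.mean((y - pred) ** 2))]
    for _ in range(n_rounds):
        residual = y - pred                         # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)   # shrunken additive update
        trees.append(tree)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return f0, trees, train_mse


rng = np.random.default_rng(6)
X = rng.normal(size=(300, 3))                        # hypothetical inputs
y = np.exp(0.5 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=300)

f0, trees, mse = fit_gradient_boosting(X, y)
print(f"training MSE: start {mse[0]:.3f}, after {len(trees)} rounds {mse[-1]:.3f}")
```

Because each tree's leaf values are residual means, every shrunken update can only lower the training MSE; the test error, by contrast, eventually stops improving, which is why the number of rounds is tuned in practice.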
3. Example Bridge and Its Modeling
A three-span reinforced concrete continuous girder bridge in China was used as a case study to evaluate the proposed ML-based demand models. The girder had a box cross section 1.6 m high and 10.5 m wide. Each of the two piers consisted of two solid circular columns with a diameter of 1 m. The longitudinal reinforcement consisted of 20 bars with a diameter of 25 mm arranged at equal intervals, and the stirrups had a diameter of 16 mm and a spacing of 10 cm. Pot rubber bearings supported the girder at the bent cap beams and the abutments.
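As a quick check on the reported section details, the longitudinal reinforcement ratio of each column can be derived from the bar count and diameters. This ratio is computed here from the stated dimensions (gross section, cover ignored) and is not a value reported by the authors; it comes to 1.25%.

```python
import math

# Dimensions stated for each pier column
column_diameter_mm = 1000.0
bar_diameter_mm = 25.0
n_bars = 20

gross_area = math.pi * column_diameter_mm**2 / 4.0          # gross concrete section, mm^2
steel_area = n_bars * math.pi * bar_diameter_mm**2 / 4.0    # total longitudinal steel, mm^2
rho_long = steel_area / gross_area                          # longitudinal reinforcement ratio
print(f"A_s = {steel_area:.0f} mm^2, rho_l = {rho_long:.4f} ({100 * rho_long:.2f}%)")
```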
A 3D finite element model of the bridge was constructed on the OpenSees platform [21] (Figure 2). The girder was modeled with elastic beam-column elements because it is expected to remain elastic under earthquake excitation. Nonlinear beam-column elements were adopted for the piers because plastic hinges are expected to develop in them. The pier cross sections were fiber sections composed of concrete fibers and steel fibers. The Concrete02 material model was used for both the confined and unconfined concrete, with different material parameters assumed for each. The ReinforcingSteel material model was used for the longitudinal reinforcement. The bridge bearings were modeled with flat slider bearing elements. The hyperbolic gap model proposed by Wilson and Elgamal [22] was used to simulate the nonlinear deformation characteristics and resistance of the abutments. To simplify the model and reduce the computation time, soil-structure interaction was ignored.
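For readers unfamiliar with the abutment model, the sketch below evaluates a monotonic hyperbolic backbone of the form $F(d) = d / (1/K_{max} + R_f\, d / F_{ult})$, where $d$ is the displacement beyond the gap. This form is an assumption based on the hyperbolic model of Wilson and Elgamal [22], the parameter values are hypothetical, and the unloading/reloading rules and sign conventions of the OpenSees `HyperbolicGap` material are not reproduced.

```python
import numpy as np


def hyperbolic_gap_force(disp, k_max, f_ult, r_f, gap):
    """Monotonic passive backbone of a hyperbolic gap abutment spring (compression positive).

    No force until the gap closes; beyond it, F = d / (1/k_max + r_f * d / f_ult),
    where d is the displacement in excess of the gap (assumed form, see lead-in).
    """
    d = np.maximum(np.asarray(disp, dtype=float) - gap, 0.0)   # closure beyond the gap
    return d / (1.0 / k_max + r_f * d / f_ult)


# Hypothetical parameters, not from the study: kN/m, kN, dimensionless, m
k_max, f_ult, r_f, gap = 20000.0, 1500.0, 0.7, 0.02
disp = np.array([0.0, 0.02, 0.05, 0.10, 0.50])
print(np.round(hyperbolic_gap_force(disp, k_max, f_ult, r_f, gap), 1))
```

The initial slope after gap closure equals `k_max`, and the stiffness softens as the passive resistance is mobilized.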

4. Ground Motion Selection
Ground motion is a time-varying process whose three key characteristics are amplitude, frequency content (spectrum), and duration. In recent years, earthquake engineering researchers have proposed many ground motion parameters to describe the intensity of ground motion. Because different IMs correlate differently with the seismic response of the same bridge structure, multiple IMs are needed to evaluate the seismic response of bridges. The current study used the suite of GMs developed by Baker et al. [23] as part of the PEER Transportation Research Program for the seismic risk assessment of infrastructure systems in California. Four sets of GMs were selected (set 1: M = 6, R = 25 km, soil site; set 2: M = 7, R = 10 km, rock site; set 3: M = 7, R = 10 km, soil site; set 4: pulse-like GMs), where M is the magnitude and R is the distance. Figure 3 presents the response spectra for each ground motion set. A total of 21 IMs were considered, as listed in Table 1. In this paper, the bridge was excited in the longitudinal direction.
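Several of the IMs considered follow directly from the acceleration history. The sketch below computes PGA, PGV, Arias intensity $AI = \frac{\pi}{2g}\int a^2(t)\,dt$, the 5-95% significant duration, and Fajfar intensity $FI = PGV \cdot t_d^{0.25}$ for a hypothetical record, using common definitions. The helper names, the use of the 5-95% Arias duration as $t_d$, and the SI units are assumptions (FI is often reported with PGV in cm/s), and no baseline correction is applied before integrating to velocity.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def cumulative_trapezoid(y, dt):
    """Cumulative trapezoidal integral of uniformly sampled y, starting at zero."""
    return np.concatenate([[0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)])


def basic_ims(acc, dt):
    """A few common IMs from an acceleration history acc (m/s^2) sampled at dt (s)."""
    vel = cumulative_trapezoid(acc, dt)               # m/s, no baseline correction
    pga = np.max(np.abs(acc))
    pgv = np.max(np.abs(vel))
    husid = cumulative_trapezoid(acc**2, dt)          # running integral of a^2
    arias = np.pi / (2.0 * G) * husid[-1]             # Arias intensity, m/s
    t = np.arange(len(acc)) * dt
    t5 = t[np.searchsorted(husid, 0.05 * husid[-1])]
    t95 = t[np.searchsorted(husid, 0.95 * husid[-1])]
    d595 = t95 - t5                                   # 5-95% significant duration, s
    fajfar = pgv * d595**0.25                         # Fajfar intensity
    return {"PGA": pga, "PGV": pgv, "AI": arias, "D5-95": d595, "FI": fajfar}


# Hypothetical record: a decaying 2 Hz sine, 20 s long, dt = 0.01 s
dt = 0.01
t = np.arange(0.0, 20.0, dt)
acc = 3.0 * np.exp(-0.2 * t) * np.sin(2.0 * np.pi * 2.0 * t)
print({k: round(float(v), 3) for k, v in basic_ims(acc, dt).items()})
```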

5. Probabilistic Seismic Demand Models
In this study, the horizontal (longitudinal and transverse) components of each ground motion [23] were applied to the bridge model in the longitudinal direction, and the seismic responses were calculated. The seismic responses considered are the longitudinal pier curvatures (Figure 2) and the transverse pier curvatures at the bottom of piers #1, #2, #3, and #4. Both the IMs and the observed responses were transformed by taking their natural logarithm.
As a key step in PBSD, PSDMs are established to describe the probabilistic relationship between engineering demand parameters and IMs. Previous studies on PSDMs mainly used the model proposed by Cornell et al. [31], who assumed that the median seismic demand $S_D$ and the intensity measure $IM$ satisfy the following power-law relationship:
$$S_D = a\,(IM)^{b}$$
where $a$ and $b$ are the regression coefficients. Taking the natural logarithm of both sides of this relationship yields a linear model in log space:
$$\ln S_D = \ln a + b \ln IM$$
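In the present work, the developed models were assessed using the coefficient of determination $R^2$:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ and $\hat{y}_i$ are the observed and predicted responses (in natural-log space), $\bar{y}$ is the mean of the observed responses, and $n$ is the number of samples. The closer $R^2$ is to 1, the better the regression fits the data.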
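The single-IM model of Cornell et al., $\ln S_D = \ln a + b \ln IM$, is an ordinary least-squares fit in log space, and its $R^2$ follows from the residuals. The sketch below uses hypothetical PGV and curvature samples and a hypothetical helper `fit_cornell_psdm`; the dispersion $\beta$ (standard error of the log-space residuals) is a quantity commonly reported alongside such PSDMs and is included only for illustration.

```python
import numpy as np


def fit_cornell_psdm(im, edp):
    """Least-squares fit of ln(EDP) = ln(a) + b ln(IM); returns a, b, R^2 and dispersion."""
    x, y = np.log(im), np.log(edp)
    b, ln_a = np.polyfit(x, y, 1)                  # slope and intercept in log space
    y_hat = ln_a + b * x
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    beta = np.sqrt(ss_res / (len(x) - 2))          # lognormal dispersion of EDP given IM
    return np.exp(ln_a), b, r2, beta


rng = np.random.default_rng(7)
pgv = rng.lognormal(mean=np.log(0.3), sigma=0.6, size=320)    # hypothetical PGV, m/s
curv = 0.004 * pgv**1.2 * rng.lognormal(sigma=0.3, size=320)  # hypothetical curvature, 1/m
a, b, r2, beta = fit_cornell_psdm(pgv, curv)
print(f"a={a:.4f}, b={b:.3f}, R^2={r2:.3f}, beta={beta:.3f}")
```

Repeating the fit for each IM and comparing the resulting $R^2$ values is one way to rank single IMs.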
The coefficients of determination ($R^2$) of the considered seismic responses for each of the 21 IMs listed in Table 1 are summarized in Figure 4(a), and the $R^2$ values greater than 0.7 are shown in Figure 4(b). PGV and PSV fit the data better than the other IMs. Figures 5 and 6 present the PSDM results for the seismic responses versus PGV and PSV. These figures show that the predictive ability of a regression model based on a single IM (PGV or PSV) leaves room for improvement, mainly because a single IM cannot capture all the main characteristics of a ground motion. To improve the fit of the regression models and account for more IMs, ML methods were adopted to establish the PSDMs in this study.



The ML methods described in Section 2 were used to establish ML-based PSDMs. Of the available data, 90% were used to build the prediction models (training set) and the remaining 10% to evaluate their performance (test set). The data were split randomly into the training and test sets, and the performance of each model on the test set was taken as an indication of its performance on unseen data. The ML models described in Section 2 were implemented using the open-source Python package scikit-learn [32]. The performance of each ML model was evaluated using the coefficient of determination ($R^2$) (Table 2), in which the maximum value in each column is shown in bold. As shown in Table 2, the tree-based models are more accurate than the linear regression and BR models, while the results of the different tree-based models (RF, GBDT, AdaBoost, and LightGBM) are close to one another. Figure 7 shows that the RF predictions (green) are closest to the original data (black), which is consistent with the results in Table 2. Thus, RF was deemed the most suitable model for establishing the PSDMs in this research.
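The training and evaluation workflow can be sketched with scikit-learn as follows. The data are a synthetic stand-in for the NTHA database (21 log-transformed IMs, one log-transformed engineering demand parameter), the hyperparameters are illustrative rather than those used in the study, and LightGBM, which is distributed in the separate `lightgbm` package, is left out. One such model would be trained per engineering demand parameter.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ARDRegression, BayesianRidge, ElasticNet, Lasso, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in data: 320 records, 21 ln(IM) inputs, one ln(EDP) output
rng = np.random.default_rng(8)
X = rng.normal(size=(320, 21))
y = 1.2 * X[:, 0] + 0.8 * np.tanh(X[:, 1]) + 0.3 * X[:, 2] * X[:, 3] + 0.1 * rng.normal(size=320)

# Random 90/10 split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)


def scaled(model):
    """Standardize the inputs before the linear and kernel-based models."""
    return make_pipeline(StandardScaler(), model)


models = {
    "LR": scaled(Lasso(alpha=0.01)),
    "RR": scaled(Ridge(alpha=1.0)),
    "EN": scaled(ElasticNet(alpha=0.01, l1_ratio=0.5)),
    "SVR": scaled(SVR(kernel="linear")),
    "BRR": scaled(BayesianRidge()),
    "ARD": scaled(ARDRegression()),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))   # test-set R^2
for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:9s} test R^2 = {r2:.3f}")
```

With only 32 test records, the ranking can change with the random split, so repeating the split (or using cross-validation) is a reasonable check before selecting a model.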

To evaluate how the input parameters affect the performance of the RF model, the importance of each input parameter was analyzed, as shown in Figure 8. The importance of a feature [32] was computed as the (normalized) total reduction of the splitting criterion brought by that feature, also known as the Gini importance [33]; the higher the importance, the more influential the feature. The importance values shown above the vertical bars in Figure 8 sum to 100%. As seen in Figure 8, PGV, PSV, AI, and FI are the critical factors of the PSDMs, while all other IMs have much less influence on the demand models. Based on their definitions, all of these critical IMs except AI are velocity-related: PGV and PSV are velocity quantities, and FI is computed from PGV, whereas AI is defined from the integral of the squared ground acceleration.
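scikit-learn exposes these impurity-based importances as the `feature_importances_` attribute of a fitted `RandomForestRegressor`, normalized to sum to 1. The sketch below uses placeholder IM names and synthetic data in which the first input dominates; it does not reproduce the study's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder IM names and synthetic data; the first input drives the response most
im_names = ["IM_1", "IM_2", "IM_3", "IM_4", "IM_5"]
rng = np.random.default_rng(9)
X = rng.normal(size=(320, len(im_names)))
y = 2.0 * X[:, 0] + 0.7 * X[:, 1] + 0.1 * rng.normal(size=320)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances (normalized to sum to 1), reported as percentages
importance_pct = 100.0 * rf.feature_importances_
for name, pct in sorted(zip(im_names, importance_pct), key=lambda kv: kv[1], reverse=True):
    print(f"{name:5s} {pct:5.1f}%")
```

Impurity-based importances can be biased toward features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.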

6. Conclusions
The main purpose of this paper was to develop PSDMs based on ML methods to predict the seismic response of regular continuous girder bridges. The efficiency of various ML regression models, namely lasso regression, ridge regression, elastic net, support vector regression, Bayesian ridge regression, automatic relevance determination, random forest, gradient boosting decision tree, adaptive boosting, and light gradient boosting machine, for generating PSDMs for a regular continuous girder bridge was evaluated. A three-dimensional numerical model of the bridge was generated in OpenSees, and a total of 320 GMs attributed to four site conditions were selected to excite it. The coefficient of determination ($R^2$) was used to assess the developed ML models. The main conclusions are as follows:
(i) PGV and PSV are more appropriate than other IMs for establishing single-IM PSDMs for regular continuous girder bridges subjected to the selected GMs. However, ML-based demand models that use more than one IM perform better than traditional demand models based on a single IM.
(ii) This paper evaluated three mainstream ML regression methods (linear regression, BR, and tree-based models), comprising 10 ML models: LR, RR, EN, SVR, BRR, ARD, RF, GBDT, AdaBoost, and LightGBM. The tree-based models were more accurate than the linear regression and BR models, with significantly higher coefficients of determination. The results indicate that RF can be used to predict the seismic behavior of regular continuous girder bridges subjected to far-fault and near-fault GMs.
(iii) This study identified the critical factors of the RF-based PSDMs for various engineering demand parameters of the bridge. PGV, PSV, AI, and FI were found to be the critical factors, and all of them except AI are velocity-related.
Although the findings of this study are based on a case study of a three-span regular continuous girder bridge in China, the methodology can be applied to other bridges. Because accurate probabilistic seismic demand analysis is a common challenge in performance-based earthquake engineering frameworks, the proposed approach can help improve the establishment of PSDMs.
Data Availability
The data used to support the findings of this study were supplied by Li under license and so cannot be made freely available. Requests for access to these data should be made to lwshan1995@gmail.com.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This research was funded by the Scientific Research Fund of Institute of Engineering Mechanics, China Earthquake Administration (Grant nos. 2019EEEVL0403 and 2021EEEVL0313).