Abstract
Bridge weigh-in-motion (BWIM) is a method to obtain the weights of passing vehicles from bridge responses. Most BWIM systems proposed so far rely on measurements of the bridge's global response, usually strain, to determine the vehicle load. However, because the bridge's global response is sensitive to all vehicles on the bridge, global-response-based BWIM techniques usually suffer from inaccuracy when multiple vehicles are present on the bridge. In this paper, a data-driven approach is proposed to extract the passing vehicle's weight and driving speed from the vertical acceleration at a bridge joint. As a type of local vibration, the impulse acceleration response at a bridge joint is recorded only during the short period when a vehicle passes over the joint and is thus not sensitive to vehicles at other locations on the bridge. A field test is conducted at a bridge to prepare labeled training data for a convolutional neural network. One accelerometer is installed on the bridge joint to record the impulse acceleration, while the vehicle's weight and driving speed are obtained from a WIM station and a camera near the bridge, respectively. A network that detects the vehicle's passage as well as its passing lane is first proposed, followed by a 1-D convolutional neural network that uses the raw acceleration data as the input to predict the vehicle's gross weight and driving speed. A comparison is made between the 1-D network and an updated 2-D network that uses the wavelet coefficients as the input matrix. The latter shows better performance, indicating the importance of choosing proper input data for the network to be trained. A transfer learning technique is then used to test the extendability of the proposed method. Results show that, with limited data, the proposed method can be extended to bridges other than the one where the network is trained.
1. Introduction
Bridges are susceptible to dynamic loads, including seismic movement, wind, and traffic loading, during their service lives. Among these dynamic loads, the traffic-induced load can be predominant for bridges and their components, in particular for fatigue, when vehicles with heavy weights pass on bridges frequently. On the other hand, bridges are suffering from deterioration caused by their aging processes. By 2019, 27.4% of bridges in China were built before 2000 [1], while around 25% of bridges were built before the 1970s in Japan [2]. Deterioration can lead to a decrease in the load capacity of bridges. As a result, the vehicles with heavy weights pose a nonnegligible risk on bridges and may give rise to serious problems such as fatigue or even failure of the bridge in some extreme cases [3]. In this regard, monitoring the weight of the passing vehicles reveals the severity of the loading environment at the bridge for maintenance purposes and provides basic data for the design of future bridges.
The most direct and accurate way to monitor the weight of passing vehicles is to use a static scale, which is both time-consuming and costly and requires stopping vehicles. Therefore, research studies have proposed the idea of bridge weigh-in-motion (BWIM), which takes the bridge itself as a weighing scale. The vehicle-induced bridge responses are measured, typically by strain gauges, and the passing vehicle's weight is calculated from the bridge responses as an inverse dynamic problem. Since this idea was introduced by Moses in the 1970s [4], many BWIM algorithms have been proposed. Some of these algorithms are based on extensions of Moses' method [5–7], while others treat the problem from a system identification point of view [8–13]. While easy-to-install BWIM techniques using accelerometers have been proposed, they suffer from two major disadvantages. First, in multiple-vehicle cases, the bridge acceleration measured by the sensors is excited by all vehicles present on the bridge. In such cases, the identification of each vehicle's weight can become an ill-conditioned problem, especially when the distances among vehicles are short compared with the bridge span, which reduces the identification accuracy of each vehicle's weight [14]. Second, it is reported that the longitudinal and lateral locations of the vehicle at each time instant are also important factors affecting the identification accuracy [15], while such location information is, unfortunately, not necessarily available with sufficient accuracy in real cases. A method that is insensitive to both the presence of surrounding vehicles and the vehicle's exact location on the bridge is therefore desired.
In this paper, a BWIM method that uses the acceleration responses at bridge joints is proposed. When vehicles enter or leave the bridge, impact responses are observed because of the joints at the ends of the bridge. This impact acceleration response is influenced by many factors, including the vehicle's weight, instant driving speed, and number of axles, while remaining insensitive to vehicles at other locations on the bridge. A field test is conducted at a two-lane girder bridge, and a data-driven approach utilizing the impact acceleration is proposed. Training data for the passing vehicles' weights and instant driving speeds at the joints were obtained from a nearby vehicle weighing scale and a video camera installed near the bridge joints, respectively. Accelerometers were used to measure the bridge joint responses. Convolutional neural networks (CNNs) are used first for the detection of passing vehicles at the joints and then for vehicle weight identification. The effect of different types of input dataset on the performance of the network is investigated by comparing a 1-D and a 2-D network structure. The network is trained on one lane of the bridge and then successfully extended to another lane by transfer learning, showing the practicality of the proposed method. While BWIM methods based on deep learning on bridge response data have been proposed [16], this paper is unique in that only girder-end acceleration signals are employed. A 2-D network structure with a wavelet transform is shown to improve the performance, and transfer learning is shown to be effective.
This paper is organized as follows: In Section 2, the experimental setup of the field test used to obtain the various types of dataset is described, and some representative data are provided. In Section 3, a neural network is proposed to detect the passing vehicle from the measured bridge responses and to classify the detected vehicles according to their passing lanes. Based on the vehicle detection in Section 3, Section 4 proposes a 1-D regression CNN to estimate the weight and driving speed of passing vehicles, whose performance is compared with that of the 2-D CNN described in Section 5. To increase the practicality of the proposed method, a transfer learning technique is used to extend the trained model to another bridge lane in Section 6. Finally, conclusions are drawn in Section 7.
2. Field Test and Data Collection
2.1. Experimental Setup
The field test was conducted at an expressway bridge in Gifu prefecture, Japan. The bridge is a two-span steel girder bridge with a length of 74 m. A camera was located near the exit of the bridge to capture a video of each vehicle passing over the joint. To measure the vehicle-induced responses, two wireless sensors equipped with Epson MA351AU three-axis accelerometers, namely, Accelerometer I and Accelerometer II, were installed on the entrance joint of the bridge, one at each lane [17, 18]. The sampling frequency was set to 100 Hz, and an internal finite impulse response (FIR) Kaiser filter was employed. Because the proposed data-driven approach requires the high-frequency acceleration responses at the joint, and because the gentle roll-off of the internal filter does not eliminate signal components above its cutoff frequency, a post-processing band-pass filter was applied to extract the 10–20 Hz signal, which corresponds to the vehicle excitation frequency [19]. The top view of this bridge and the experimental setup are shown in Figure 1. In addition, the weight of every vehicle entering the bridge was measured at a weighing station embedded in the pavement several kilometers away from the bridge.
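For readers who wish to reproduce the preprocessing, the following Python sketch illustrates one possible implementation of the 10–20 Hz band-pass step described above; the zero-phase Butterworth filter, its order, and the function name are illustrative assumptions rather than the exact post-processing filter used in the field test.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_10_20(acc, fs=100.0, order=4):
    """Zero-phase band-pass filter keeping the 10-20 Hz band associated
    with vehicle excitation (an illustrative stand-in for the
    post-processing filter used in the field test)."""
    sos = butter(order, [10.0, 20.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, acc)

# Example: filter a 3 s joint acceleration record sampled at 100 Hz
raw = np.random.randn(300)          # placeholder for a measured record
filtered = bandpass_10_20(raw)
```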

The experimental setup consists of three subsystems: (a) a camera, (b) a weighing scale, and (c) accelerometers. The functions of these subsystems are briefly introduced herein. For each vehicle entering the bridge, the camera captured a video, from which the driving speed, number of axles, appearance interval, and plate number were extracted by a computer-vision technique described in [16]. The vehicle appearance records were compared with those of the monitoring system at the weighing scale to extract the measured weight of the corresponding vehicle [20]. The accelerometers and the camera are synchronized through postprocessing, making it possible to extract the vehicle-induced responses for each specific vehicle. In this manner, the bridge acceleration responses, the vehicle's entering and leaving times, the vehicle's weight and driving speed, and other vehicle information are obtained for each vehicle, forming a database for training the deep learning networks.
2.2. Brief Analysis of Collected Data
Typical impact acceleration responses induced by a four-axle passing vehicle with a weight of 11.4 t and a driving speed of 75.5 km/h are depicted in Figure 2. The distances between the axles of this vehicle are 1.76 m, 4.24 m, and 1.14 m, respectively. It is shown that the vertical impulse acceleration is a local response within 1 second. The impulse response of the four axles can be roughly detected, as shown by the dashed lines in the figure. Because this vertical impulse acceleration mostly contains high-frequency components, sensors that have good performance in high-frequency ranges are preferred. Intuitively, the peak values of the joint acceleration should increase together with the weight of the passing vehicles. However, a time history of around 400 seconds recorded by using Accelerometer I, which is labeled with camera detection and measured vehicle weights, shows that a heavier vehicle weight does not necessarily result in a larger peak acceleration value, as shown in Figure 3. This phenomenon indicates that the vehicle’s weight may be related to some other factors with a more complex relationship.


The above phenomenon is further illustrated in Figure 4, where the maximum acceleration is plotted against the vehicle's weight and driving speed for a total of 5900 vehicle passages. Each vehicle is represented by a data point in the figure. From Figure 4(a), it is observed that the maximum acceleration tends to become larger with increasing vehicle weight. However, the data are very scattered, indicating that large inaccuracy would occur if this linear trend were used to predict the vehicle's weight from the maximum joint acceleration. This is because the impact acceleration response is also affected by the driving speed, the number of axles, the axle distances, the vehicle's passing route, and so on, resulting in a highly nonlinear relationship. In Figure 4(b), the maximum acceleration shows no significant relationship with the vehicle's driving speed. Therefore, an algorithm that can capture the nonlinear relationship among the vehicle's weight, driving speed, joint acceleration, and many other factors is needed.

3. Vehicle Detection Based on Joint Acceleration
Correct detection of vehicles entering and leaving the bridge provides the foundation for vehicle weight estimation. In this section, a CNN-based classification algorithm is proposed to determine whether a vehicle is passing across the bridge joint within a time window and to determine from which lane the vehicle is entering the bridge. In the proposed network, the input is a matrix with 2 rows and 300 columns, representing the time histories measured by using Accelerometers I and II within a 3-second time window. The output of this CNN structure is a vector containing three values representing the three categories, namely, “vehicle passing on Lane I,” “vehicle passing on Lane II,” and “not detected.” The structure of the CNN is given in Figure 5, and the labels of the categories are listed in Table 1.
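As a rough illustration of the detection network, the following PyTorch sketch reproduces the input and output dimensions described above (a 2 × 300 acceleration matrix in, three category scores out); the filter counts, kernel sizes, and layer names are assumptions, since the exact layer parameters are only given in Figure 5.

```python
import torch
import torch.nn as nn

class JointVehicleDetector(nn.Module):
    """Classification CNN sketch: the input is a 2 x 300 matrix
    (Accelerometers I and II over a 3 s window at 100 Hz) and the
    output contains scores for the three categories in Table 1.
    Filter counts and kernel sizes are illustrative assumptions."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 75, 64), nn.ReLU(),
            nn.Linear(64, n_classes),   # softmax is applied in the loss
        )

    def forward(self, x):               # x: (batch, 2, 300)
        return self.classifier(self.features(x))

scores = JointVehicleDetector()(torch.zeros(1, 2, 300))   # shape (1, 3)
```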

For the purpose of the classification, a softmax function, expressed in equation (1), is adopted for the last fully connected layer [21]:

$$S_{jc} = \frac{e^{a_{jc}}}{\sum_{k=1}^{3} e^{a_{jk}}}, \tag{1}$$

where $a_{jc}$ represents the $c$th output element of the last fully connected layer for the $j$th sample and $S_{jc}$ is the corresponding normalized value. Through this function, the output vector contains values within the range between 0 and 1.
The loss function is defined by the cross entropy, expressed in equation (2), to quantify the error level between the predicted categories and the real categories:

$$E = -\sum_{j} \sum_{c=1}^{3} y_{jc} \ln S_{jc}, \tag{2}$$

where $y_{jc}$ equals 1 if the $j$th sample belongs to category $c$ and 0 otherwise.
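The following NumPy snippet verifies equations (1) and (2) on a single hypothetical sample; the score values are made up for illustration only.

```python
import numpy as np

def softmax(a):
    """Normalized class scores, equation (1)."""
    e = np.exp(a - a.max())              # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(scores, one_hot):
    """Cross-entropy loss for one sample, equation (2)."""
    return -np.sum(one_hot * np.log(softmax(scores)))

scores = np.array([2.1, 0.3, -1.0])      # hypothetical last-layer outputs
label = np.array([1.0, 0.0, 0.0])        # true category: vehicle on Lane I
print(softmax(scores), cross_entropy(scores, label))
```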
For the training data labeled as Category I and Category II, the acceleration time histories at Accelerometers I and II corresponding to vehicle passage over the joint were extracted based on the recordings from the camera. Each category has 9730 labeled time histories for training. For Category III, acceleration was extracted from the time periods in which vehicles are not at the joint, including the case of ambient vibration and the case where vehicles are on other parts of the bridge. The amount of data in Category III is made equal to the number of the other two categories.
The labeled data were used to train the network by a stochastic gradient descent method. Before training, 80% of the data were randomly selected as training data, while the other 20% were used as validation and test data. The accuracy and the loss function are plotted in Figure 6 for each iteration of the training process. The validation data are used to test the accuracy of the network every 50 iterations, and the training process is terminated when the validation loss exceeds the previous minimum three consecutive times. In Figure 6(a), the accuracy, defined as the ratio of the number of correctly classified samples to the number of all samples, keeps increasing with the number of iterations for both the training and validation data. After around 900 iterations, the abovementioned termination criterion is met, and the training process is stopped. The final accuracy of the trained network reaches 94.93%.
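The stopping rule above can be summarized by the following sketch of a training loop; `step_fn` and `validate_fn` are hypothetical callables standing in for one SGD update and one evaluation on the validation set.

```python
def train_with_early_stopping(step_fn, validate_fn, max_iters=5000,
                              check_every=50, patience=3):
    """Run SGD updates, check the validation loss every 50 iterations,
    and stop once it exceeds the running minimum three times in a row
    (the termination criterion described above)."""
    best_loss, strikes = float("inf"), 0
    for it in range(1, max_iters + 1):
        step_fn()                                   # one SGD update
        if it % check_every == 0:
            val_loss = validate_fn()
            if val_loss < best_loss:
                best_loss, strikes = val_loss, 0
            else:
                strikes += 1
                if strikes >= patience:
                    break
    return best_loss
```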

To evaluate the classification accuracy of the trained network, a confusion matrix is constructed, as shown in Table 2, which includes the classification results of all the test data. The values of the diagonal elements indicate the number of results correctly classified, while other values indicate the wrongly classified results. The accuracy is calculated and listed at the bottom of the matrix for each category, showing that the presence and the passing lane of the coming vehicle can be detected with acceptable accuracy.
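A confusion matrix of this kind can be assembled with a few lines of NumPy; the per-category accuracy shown here follows one common convention (diagonal count divided by the column total) and is an assumption about how the values at the bottom of Table 2 are computed.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=3):
    """Count test samples for each (true category, predicted category) pair."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_category_accuracy(cm):
    """Correctly classified counts divided by the per-category totals."""
    return cm.diagonal() / cm.sum(axis=0)
```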
Once a vehicle passing across the joint is detected, the identification of the vehicle weight and the driving speed is the next step, which is discussed in the following sections.
4. Vehicle Weight and Speed Identification through 1-D CNN
In this section, the impact acceleration responses at bridge joints are used to give an estimation of the vehicle’s weight and driving speed using 1-D CNN for regression. The labeled training data include the vehicle-induced joint vertical impulse acceleration, the vehicle’s weight, and instant driving speed. The weight and driving speed are extracted from a weighing scale and a camera, respectively.
4.1. 1-D CNN Structure for Vehicle Weight and Speed Identification
As shown in Section 3, acceleration signals such as the one in Figure 2 can tell whether a vehicle is passing over the joint. However, in addition to the presence of the vehicle, further details, namely, the vehicle's weight and driving speed, can also be extracted from the acceleration signals. In this regard, a 1-D CNN structure is constructed, as shown in Figure 7, where the signal recorded by the accelerometer corresponding to the detected lane is adopted as the input of the network, and the output vector contains the passing vehicle's weight and driving speed. Before being fed into the network, the time history undergoes a time-shifting process that positions the maximum absolute value of the acceleration at the center of the window, as sketched below. This step reduces the complexity of the network input and enhances the accuracy of the vehicle weight and speed identification.
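A minimal sketch of this time-shifting step is given below; the window length of 300 samples (3 s at 100 Hz) follows the text, while the zero-padding at the window edges is an assumption.

```python
import numpy as np

def center_on_peak(acc, length=300):
    """Place the maximum absolute acceleration value at the center of a
    fixed-length window, zero-padding where the record runs out."""
    peak = int(np.argmax(np.abs(acc)))
    half = length // 2
    start = peak - half                         # window start in the record
    out = np.zeros(length)
    src_lo, src_hi = max(start, 0), min(start + length, len(acc))
    dst_lo = src_lo - start
    out[dst_lo:dst_lo + (src_hi - src_lo)] = acc[src_lo:src_hi]
    return out
```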

Similar to the training process for vehicle detection described in Section 3, 80% of the data serve as training data, while the other 20% are adopted as the validation and test data. Before training, the labeled vehicle weights and driving speeds are normalized by dividing by their maximum values among all passing vehicles, and the output of the network is normalized in the same way:

$$\bar{w}_i = \frac{w_i}{w_{\max}}, \qquad \bar{s}_i = \frac{s_i}{s_{\max}},$$

where $w_{\max}$ and $s_{\max}$ are the maximum weight and maximum driving speed in the dataset.
In this manner, the output values of the network are all in the range between 0 and 1.
To quantify the prediction error of the vehicle weight and driving speed, the loss function for the regression CNN is defined as follows:

$$E = \frac{1}{2N}\sum_{i=1}^{N}\left[\left(w_{i,\mathrm{pre}} - w_{i,\mathrm{tar}}\right)^{2} + \left(s_{i,\mathrm{pre}} - s_{i,\mathrm{tar}}\right)^{2}\right],$$

where $w_i$ and $s_i$ stand for the (normalized) weight and driving speed of the $i$th vehicle sample, $N$ is the number of samples, and the subscripts "pre" and "tar" indicate the predicted and target values, respectively.
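For reference, a vectorized NumPy version of the loss above is sketched below, assuming the predictions and targets are stored as N × 2 arrays of normalized weight and speed.

```python
import numpy as np

def regression_loss(pred, target):
    """Half mean squared error over the normalized weight and speed outputs
    (a sketch of the regression loss defined above)."""
    pred, target = np.asarray(pred), np.asarray(target)     # shape (N, 2)
    return 0.5 * np.mean(np.sum((pred - target) ** 2, axis=1))
```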
4.2. Training Process and Test Results for 1-D CNN
The evolution of the loss function is shown in Figure 8. As the training starts, the loss function decreases from a high value. As in Section 3, the validation data are applied to the network every 50 iterations, and the training process stops when the validation loss exceeds the previous minimum three consecutive times to prevent overfitting.

For the estimated vehicle weights and driving speeds, the target and predicted values from the network for all test data are plotted in Figures 9(a) and 9(b), respectively. A reference line, y = x, is plotted in the figure. Ideally, the data points, represented by circles in grey, should lie on the reference line. However, due to many factors, including training error and measurement noise, the data points are scattered around the reference line. To quantify the scatter level of these data points, a correlation coefficient (CC) is calculated following the definition in equation (6) and is shown in the figure:

$$\mathrm{CC} = \frac{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}}, \tag{6}$$

where $x_i$ and $y_i$ are the target and predicted values of the $i$th vehicle and $\bar{x}$ and $\bar{y}$ are their mean values. For the vehicle weight and driving speed, this coefficient is calculated to be 0.87 and 0.69, respectively.
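Equation (6) is the standard Pearson correlation coefficient and can be computed directly, for example:

```python
import numpy as np

def correlation_coefficient(target, predicted):
    """Pearson correlation coefficient between target and predicted values,
    equation (6); np.corrcoef(target, predicted)[0, 1] gives the same result."""
    x, y = np.asarray(target, float), np.asarray(predicted, float)
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return num / den
```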

An error distribution histogram for the vehicle weight and driving speed predictions is plotted in Figure 10, with the error defined as

$$\varepsilon_i = \frac{y_{i,\mathrm{pre}} - y_{i,\mathrm{tar}}}{y_{i,\mathrm{tar}}} \times 100\%,$$

where $\varepsilon_i$ indicates the relative error of the $i$th vehicle and $y_{i,\mathrm{pre}}$ and $y_{i,\mathrm{tar}}$ are the corresponding predicted and target values (weight or speed). The error distribution has its highest value around zero, indicating that the proposed algorithm neither overestimates nor underestimates the target on average.

To further evaluate the overall performance of the network, a mean absolute error (MAE) and a root mean square error (RMSE) are defined in equations (8) and (9), respectively:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_{i,\mathrm{pre}} - y_{i,\mathrm{tar}}\right|, \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{i,\mathrm{pre}} - y_{i,\mathrm{tar}}\right)^{2}}, \tag{9}$$

where $N$ is the total number of vehicles.
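Both indices are straightforward to evaluate, for example:

```python
import numpy as np

def mae(pred, target):
    """Mean absolute error, equation (8)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(target)))

def rmse(pred, target):
    """Root mean square error, equation (9)."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))
```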
From the abovementioned definitions, the MAE and RMSE of the vehicle's weight and driving speed are calculated and listed in Table 3. It is observed that the estimation accuracy is within an acceptable level, while the RMSE is larger than the MAE due to the existence of very large and very small, and thus rare, vehicle weights and driving speeds, as indicated in Figure 9.
Note that, in this network, the recorded acceleration responses are fed into the network without any preprocessing, so the network has to discover the deep relation between the input and output data by itself. To increase the prediction accuracy of the vehicle's weight and driving speed, a 2-D CNN based on wavelet coefficients is presented in the next section, and its performance is evaluated and compared with that of the 1-D CNN.
5. Vehicle Weight and Speed Identification through 2-D CNN
When utilizing CNNs for a task, the selection of the input matrix is important and may have a significant influence on the accuracy of the prediction. Although a powerful network has the capability to extract high-dimensional relations between the input and the output, it is always preferable that the relevant properties are not buried too deep in the input signals. For the identification of the vehicle weight and driving speed, although it is still possible to use the raw signals measured by the accelerometers in the same way as in Section 4, it is better to apply preprocessing that exposes some key properties of the joint acceleration response. In this section, a wavelet transform is adopted for this purpose and is briefly reviewed herein.
5.1. Wavelet Transform to Generate 2-D Signals
The continuous wavelet transform, which is defined in equation (10), is widely used in signal processing and vibration analysis:

$$W(f,\tau) = \frac{1}{\sqrt{s(f)}}\int_{-\infty}^{+\infty} x(t)\,\Psi^{*}\!\left(\frac{t-\tau}{s(f)}\right)\mathrm{d}t, \tag{10}$$

in which $x(t)$ is the input signal to be analyzed, $s(f)$ is the scale parameter related to the frequency $f$, $\tau$ is the time shift, and $\Psi^{*}$ indicates the complex conjugate of the mother wavelet $\Psi$. In this study, the mother wavelet $\Psi$ is chosen to be a Morse wavelet, whose shape is determined by a symmetry parameter and a time-bandwidth parameter; these two values are empirically set to 3 and 60, respectively.
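A sketch of this preprocessing step in Python is given below. PyWavelets does not provide a Morse wavelet, so a complex Morlet wavelet is used here purely as an illustrative stand-in for the Morse wavelet (symmetry 3, time-bandwidth 60) adopted in the study; the 54 scales match the matrix size reported in Section 5.2.

```python
import numpy as np
import pywt

def cwt_matrix(acc, fs=100.0, n_scales=54):
    """Continuous wavelet transform of a joint acceleration record,
    returning a (frequency x time) magnitude matrix for the 2-D CNN.
    The complex Morlet wavelet is a stand-in for the Morse wavelet."""
    scales = np.arange(1, n_scales + 1)
    coeffs, freqs = pywt.cwt(acc, scales, "cmor1.5-1.0",
                             sampling_period=1.0 / fs)
    return np.abs(coeffs), freqs        # shape (54, len(acc)), frequencies in Hz

# Example: a 3 s record (301 samples at 100 Hz) gives a 54 x 301 input matrix
matrix, freqs = cwt_matrix(np.zeros(301))
```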
Figure 11 shows the wavelet transform of a time history recorded by Accelerometer I and induced by a two-axle vehicle passing on Lane I. Note that these wavelet coefficients are both time- and frequency-dependent. In this manner, the original 1-D signal is converted into a 2-D matrix, which helps exploit the advantages of CNNs. Moreover, the properties of the recorded signal in both the time and frequency domains are expressed explicitly in this matrix, making it easier for the CNN to extract the hidden features.

5.2. CNN Structure for Vehicle Weight and Speed Identification
As stated above, the input matrix of the network for vehicle weight and speed identification consists of the wavelet coefficients. For a 3 s time history with a 100 Hz sampling frequency, as in the current case, the wavelet analysis gives 54 values along the frequency axis. Therefore, the input matrix in this study has a size of 54 × 301, corresponding to the frequency and time domains. Two convolution layers with filter sizes of 20 × 20 and 10 × 10 follow the input matrix. A ReLU function is used after each of the convolution layers to provide the network with the ability to extract nonlinear features. The maximum pooling layer has a stride of 2 × 2. The fully connected layers lead to an output vector containing two elements, which represent the vehicle weight and driving speed, respectively. The structure of the network is depicted in Figure 12.
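The following PyTorch sketch mirrors the structure described above (54 × 301 input, 20 × 20 and 10 × 10 convolution kernels, 2 × 2 max pooling, two outputs); the numbers of filters, the exact placement of the pooling layer, and the size of the fully connected layers are assumptions, since the full details are only given in Figure 12.

```python
import torch
import torch.nn as nn

class WaveletWeightSpeedCNN(nn.Module):
    """2-D regression CNN sketch: the input is the 54 x 301 wavelet
    coefficient matrix and the output is the normalized vehicle weight
    and driving speed. Filter counts and layer ordering are assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=20), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(8, 16, kernel_size=10), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 8 * 132, 64), nn.ReLU(),
            nn.Linear(64, 2),            # [normalized weight, normalized speed]
        )

    def forward(self, x):                # x: (batch, 1, 54, 301)
        return self.regressor(self.features(x))

out = WaveletWeightSpeedCNN()(torch.zeros(1, 1, 54, 301))   # shape (1, 2)
```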

5.3. Training Process and Test Results
The loss function is defined in the same way as in Section 4 and is plotted against the training iterations in Figure 13. This curve is similar to the one in Figure 8 that shows the training process of the 1-D CNN. It is observed that the final value of the loss function is lower than its counterpart for the 1-D CNN, indicating that the network proposed in this section performs better than the previous one.

The target and predicted values are plotted in Figure 14. Similar to Figure 9, the data points are scattered around the reference line. However, the data points are closer to the reference line than the results from the 1-D CNN, as reflected by the higher CCs when compared with Figure 9.

The relative error distribution is plotted in Figure 15. For both the vehicle weight and driving speed, the histograms are observed to be sharper and narrower compared to those in Figure 10, indicating that the estimation accuracy is increased using the 2-D structure. Table 3 summarizes the MAE, RMSE, and CC for the two CNN structures. Results show that the 2-D structure performs better than the 1-D structure from the perspective of all accuracy indices.

The test results are also examined from the aspect of the vehicle weight histogram, which is of significant importance in the field of bridge weigh-in-motion because it is highly related to bridge fatigue life; such a histogram can be directly adopted in fatigue analysis. In Figure 16, the target and predicted histograms are plotted together for comparison. It is observed from the target histogram that most vehicles fall within the ranges of 12–16 t and 20–25 t. The predicted histogram is shown to coincide well with the target one.

6. Possibility of Algorithm Extension by Transfer Learning
The abovementioned training and testing processes are based on Accelerometer I located at the entrance of Lane I. Strictly speaking, the trained networks are only suitable for vehicles entering Lane I of this bridge, where the network is trained, so the practicality of the proposed network is quite limited. In engineering practice, it is desired that the trained network can be conveniently adopted on other bridges. Although the quantitative nonlinear physical relation between the vehicle's weight and the vertical acceleration may differ because of the construction details of the joints, some basic characteristics are common. For example, the acceleration responses at bridge joints are all impulse-like, with the number of peaks roughly determined by the number of axles and the peak values related to the vehicle weight. These similarities provide the basis for extending the proposed network to other lanes or even other bridges. The extension is analyzed through a technique known as transfer learning: if two tasks share similar characteristics, the network trained on the first task is adopted as the initial value of the network for the second task. In this manner, the amount of training data needed for the second network can be much smaller than in the case where the network is randomly initialized.
In this paper, transfer learning is tested by using the data of vehicles entering Lane II. From the measurement records, there were 5160 vehicles entering from Lane I and 740 vehicles entering from Lane II. First, the same training process is conducted from scratch on the Lane II data, and its progress is plotted in Figure 17. Then, to illustrate the effectiveness of transfer learning, the convolutional, ReLU, and pooling layers of the network trained on Lane I are frozen, and only the fully connected layer parameters are retrained; this training process is plotted in the same figure. It is clear that the training processes with and without transfer learning behave quite differently. Without transfer learning, the termination criterion is satisfied much earlier, and overfitting is also observed, possibly due to insufficient training data. When transfer learning is applied, it takes more iterations to reach the termination criterion, and the final value of the loss function becomes lower than the one without transfer learning.
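A sketch of the freezing step is shown below; the attribute names `features` and `regressor` refer to the illustrative network sketched in Section 5.2 and are assumptions, not the authors' implementation.

```python
import torch.optim as optim

def prepare_transfer(model, lr=1e-3):
    """Reuse the network trained on Lane I for Lane II: freeze the
    convolutional/ReLU/pooling layers and retrain only the fully
    connected layers on the (smaller) Lane II dataset."""
    for p in model.features.parameters():
        p.requires_grad = False                    # frozen feature extractor
    return optim.SGD(model.regressor.parameters(), lr=lr)
```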

The target and predicted values of the test data are plotted in Figure 18, where the results of training with and without transfer learning are plotted together. Because the two sets of data contain the same number of points, the larger scatter of the points obtained without transfer learning clearly indicates that adopting transfer learning leads to higher estimation accuracy.

Table 4 shows some details of the training processes discussed above. The number of vehicles passing over Lane II is much smaller than the number passing over Lane I. This reduction in the amount of training data leads to larger errors, i.e., MAE and RMSE, and a lower CC when estimating the passing vehicle's weight. Once transfer learning is introduced, the estimation error decreases to a level comparable to the case where sufficient training data are available.
This section tests the possibility of using transfer learning when there are not sufficient data to train the network from scratch. In this paper, the training data for vehicle weights are obtained from a nearby WIM station. A future BWIM scenario based on this paper's findings is as follows: for bridges where no WIM station is available, a portable BWIM system based on accelerometers can be temporarily installed on the bridge to obtain the passing vehicles' weights for training [7, 22]. Once the training process is finished using the proposed method, most accelerometers can be removed, while only one accelerometer per lane at the bridge joint remains on the bridge for long-term BWIM purposes.
7. Summary and Conclusions
This paper proposes a data-driven approach for the purpose of BWIM. In the first step, a classification CNN is proposed to detect the vehicle's presence on the joint and classify the vehicles by their passing lanes. Once the vehicle is detected, a regression CNN is used to predict the weight and driving speed of passing vehicles. The labeled training data are extracted from a WIM station and a camera set nearby. The possibility of adopting transfer learning to extend the feasibility of the proposed method is tested. The following conclusions are drawn from this study.
(1) By attaching two accelerometers on different lanes of the joint, the vehicle's entrance as well as its passing lane can be detected through a classification CNN.
(2) The relation among the vehicle's weight, driving speed, and impact acceleration responses at bridge joints can be trained through a deep learning technique using only one accelerometer at the bridge joint.
(3) Although the vehicle's weight and driving speed can be predicted directly from the acceleration time history, the prediction accuracy can be improved by preprocessing with the wavelet transform to expose more details in the frequency domain.
(4) When the training data are not sufficient (e.g., only a portable and temporary BWIM system is available for the target bridge), the proposed network can serve as a foundation for the implementation of transfer learning.
Data Availability
The data used to support the findings of this study are available from the authors upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the Council for Science, Technology, and Innovation, "Cross-Ministerial Strategic Innovation Promotion Program (SIP), Infrastructure Maintenance, Renovation, and Management" (funding agency: JST); JSPS KAKENHI (Grant No. 17H03295); and the Chenguang Program (Grant No. 20CG27) of the Shanghai Education Development Foundation and the Shanghai Municipal Education Commission. The authors would like to thank Dr. Tajiri and Dr. Sanada from the Central Nippon Expressway Company and Dr. Suganuma from TTES for their valuable comments on this work.