Abstract
Pavement damage is the main factor affecting road performance. Pavement cracking, a common type of road damage, is a key challenge in road maintenance. In order to achieve an accurate crack classification, segmentation, and geometric parameter calculation, this paper proposes a method based on a deep convolutional neural network fusion model for pavement crack identification, which combines the advantages of the multitarget single-shot multibox detector (SSD) convolutional neural network model and the U-Net model. First, the crack classification and detection model is applied to classify the cracks and obtain the detection confidence. Next, the crack segmentation network is applied to accurately segment the pavement cracks. By improving the feature extraction structure and optimizing the hyperparameters of the model, pavement crack classification and segmentation accuracy were improved. Finally, the length and width (for linear cracks) and the area (for alligator cracks) are calculated according to the segmentation results. Test results show that the recognition accuracy of the pavement crack identification method for transverse, longitudinal, and alligator cracks is 86.8%, 87.6%, and 85.5%, respectively. It is demonstrated that the proposed method can provide the category information for pavement cracks as well as the accurate positioning and geometric parameter information, which can be used directly for evaluating the pavement condition.
1. Introduction
Pavement distress is the main factor affecting road performance. Timely and accurate detection of pavement damage is a crucial step in pavement maintenance. Cracks are the initial manifestation of various types of pavement distress. Pavement cracks not only affect pavement appearance and driving comfort but can also easily expand to cause structural damage and shorten the service life of the pavement [1, 2]. Therefore, early crack detection and timely maintenance of cracked pavement can reduce the economic cost of pavement repair and ensure the safety of vehicles and drivers travelling on the pavement.
Early pavement detection and maintenance mainly rely on manual detection, which is not only time-consuming and laborious but also has low detection accuracy and some associated risks [3–5]. Scholars from across the world, taking advantage of recent science and technology developments, have carried out a series of extensive and in-depth research to accurately and efficiently extract crack information from images [6–8]. In 2014, Wang et al. [9] proposed a pavement crack extraction method based on the valley bottom boundary; it uses a series of image processing algorithms to obtain the crack detection results. In 2015, Liang et al. [10] proposed a pavement crack connection algorithm based on the Prim minimum spanning tree, which obtains the crack structure by filling the fracture.
The disadvantages of these traditional crack detection methods are obvious. Each method is designed for a specific database or scenario, but the crack detector will fail if the dataset or scenario changes. With the rapid development of artificial intelligence, convolutional neural networks (CNNs) have been widely applied in the field of image recognition [11–14]. In recent years, deep learning methods have been increasingly applied to pavement crack detection and segmentation [15, 16]. Combining deep learning methods with pavement crack detection techniques considerably improves the efficiency and accuracy of pavement crack detection [17].
In 2016, Zhang et al. [18] proposed a crack detection method based on deep learning. They trained a deep CNN based on supervised learning, proving the feasibility of combining deep learning with pavement crack recognition. In 2017, Zhao et al. [19] proposed a pavement crack detection method based on a CNN using images of different scales and taken at different angles for training, achieving the detection of cracks of various shapes. However, owing to road surface interference and noise, the detection accuracy of this system peaked at 82.5%. In 2017, Markus et al. developed the open dataset GAPs for the training of deep neural network and evaluated the pavement damage detection technology for the first time, which is of great significance [20, 21]. In 2018, Nhat-Duc et al. [22] established an intelligent method for the automatic recognition of pavement crack morphology; this study constructs a machine learning model for pavement crack classification that included multiple support vector machines and an artificial swarm optimization algorithm. Using feature analysis, a set of features is extracted from the image projection integral, which can significantly improve the prediction performance. However, the algorithm is complex and programming it becomes very difficult. In 2020, Zhaoyun Sun et al. [23] proposed a method to detect pavement expansion cracks with the improved Faster R-CNN, which can achieve accurate expansion crack location detection through the optimization model. The aforementioned studies only detect and classify pavement cracks and their location but cannot quantify certain crack characteristics, such as crack width and area. On the other hand, there are also many studies on crack segmentation. In 2018, Zhang and Wang [24] proposed CrackNet, which is an efficient architecture based on CNN to predict the class of each image pixel, but its network structure is related to input image size, which prevents the generalization of the method. 
In the same year, Sen Wang et al. [25] proposed using fully convolutional networks (FCNs) to detect cracks; taking into account the shortcomings of the FCN model in crack segmentation experiments, they built the Crack-FCN model and obtained a complete crack image. However, the highest accuracy obtained by their method is only 67.95%; thus, segmentation performance needs to be improved. In 2019, Piao Weng et al. [26] proposed a pavement crack segmentation method based on the VGG-U-Net model. It solves the problem of breaks in the crack segmentation result against complex backgrounds, but its training time is slightly longer and its efficiency is low. In 2020, Zhun Fan et al. [27] proposed U-HDN, an encoder-decoder architecture based on hierarchical feature learning and dilated convolution that detects cracks in an end-to-end manner. The U-HDN method can extract and fuse feature maps of different context sizes and different levels, so it achieves high performance. In the same year, Zhun Fan et al. [28] proposed an ensemble of convolutional neural networks based on probability fusion for the automatic detection and measurement of pavement cracks, in which the predicted crack morphology is measured by a skeleton extraction algorithm. In summary, these previous studies use only segmentation, which cannot achieve accurate crack classification and location determination.
Although all of these methods were able to recognize crack diseases from pavement images to a certain extent, some problems remain: (1) Most algorithms still extract crack information using feature extraction, which is relatively complex and programmatically difficult [6–8]. (2) Although deep learning-based pavement crack recognition algorithms already exist, they still do not eliminate the specificity requirement of “one device, one algorithm.” If the source image is taken by different devices or on different road sections, using a single dataset leads to inaccurate results. Therefore, their adaptability is poor [18, 19]. (3) Complex environmental factors affect the stability and accuracy of crack identification algorithms [29]. (4) Although there is already a pavement crack recognition method based on CNNs, the model has a single function and most of the cracks are classified and roughly positioned through the detection box and cannot be directly used for evaluating road conditions [23–26].
Given the abovementioned problems in pavement crack identification, this paper proposes a method based on a deep convolutional neural network fusion model for pavement crack identification, which is applicable to many crack detection scenarios (including images from a detector vehicle and a smartphone). By training on image data from a variety of sources and of various sizes, the method can effectively identify cracks while maintaining recognition accuracy. At the same time, a detected crack can be segmented, and the segmented binary image can be used to calculate the geometric parameters of the crack. The proposed model is therefore of great significance for intelligent pavement detection; it can also achieve detection and segmentation simultaneously, thereby significantly improving model efficiency.
2. Methodology
In this paper, a crack identification method based on a deep CNN fusion model is proposed. First, the image dataset is established, and the image noise in the dataset is filtered out to increase the contrast between road cracks and background. Next, the processed images are provided as input into an improved single shot multibox detector (SSD) crack detection model and an improved U-Net crack segmentation model for training. Then, the binary image of a crack obtained by the segmentation model is used to calculate the geometric parameters of the crack. By integrating the advantages of the two models, this pavement crack identification method can effectively overcome the single-model limitations of inaccurate positioning and imperfect information. The overall process flowchart is shown in Figure 1. The details of each step are discussed in Section 2.1.

2.1. Image Collection and Preprocessing
To be applicable to crack detection in a variety of scenarios, the proposed method uses a detector vehicle (Figure 2(a)) and a smartphone (Figure 2(b)) to collect crack images. The images captured by the detector vehicle have a resolution of 1024 × 960 pixels, and those captured by the smartphone have a resolution of 2560 × 1024 pixels.

The pavement crack images are preprocessed before network training to reduce the noise in the images and improve the prediction accuracy. Preprocessing consists mainly of augmenting and denoising the pavement crack images.
First, the number of images needs to be increased. As it is difficult to distinguish the effects of rotation by using an actual crack image, a black-and-white double-arrow picture is used here to exemplify how to increase the number of images (Figure 3). By horizontal reflection, vertical reflection, and clockwise rotation of the image by 45°, 90°, and 180°, the training image dataset can be expanded eightfold.
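The reflections and right-angle rotations described above can be sketched in a few lines. This is a minimal illustration that assumes an image is a list of pixel rows; the function names are hypothetical. The 45° rotation is left out because it needs interpolation and padding choices that the text does not specify, so this sketch yields five variants rather than the full augmented set.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise: the bottom-to-top columns become rows."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_180(img):
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def augment(img):
    """Return the original image plus its lossless variants."""
    return [
        [row[:] for row in img],
        flip_horizontal(img),
        flip_vertical(img),
        rotate_90_clockwise(img),
        rotate_180(img),
    ]


sample = [[1, 2],
          [3, 4]]
for variant in augment(sample):
    print(variant)
```

The same transformation must be applied to the label (bounding box or mask) of each image so that the annotations stay aligned with the augmented pixels.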

The images, as taken by the camera, are seriously affected by discrete pulse noise and zero-mean Gaussian noise. Therefore, median and bilateral filters are used to denoise the road images. Then, contrast enhancement is performed to increase the difference between crack information and road background, which improves the quality of the sample images. As seen in Figure 4, the processed crack information is more prominent than in the original image.
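To illustrate why a median filter suits impulse noise, the following sketch applies a 3 × 3 median filter to a grayscale image stored as a list of rows. The 3 × 3 window and the edge clamping are assumptions made for illustration; the text does not state the kernel size, and the bilateral filter and contrast enhancement steps are not shown.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighbourhood.

    Coordinates outside the image are clamped to the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out


# A uniform patch with one isolated bright pixel (impulse noise).
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
cleaned = median_filter_3x3(noisy)
print(cleaned[2][2])  # 10
```

Because an isolated outlier never reaches the middle of the sorted window, it is removed without blurring the surrounding values, which is the property that makes the filter attractive before crack extraction.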

2.2. Classification and Detection Using Improved SSD Model
The SSD [30] network model used in this study was proposed in 2016; it combines the characteristics of the You Only Look Once (YOLO) model [31], which provides fast speed, and the Faster R-CNN (Region-CNN) model [32], which provides accurate recognition. We implement the SSD network using the TensorFlow and Keras frameworks.
The characteristic feature of the SSD network model is its capability of performing multiscale feature map detection. The model adds several convolutional feature layers at the end of the feature extraction network, and the feature maps extracted from these layers progressively decrease in size. Image prediction is carried out by fusing the multiscale detection results. In Figure 5, the feature fusion of conv4_3 and a second convolutional layer is given as an illustrative example. conv4_3_norm_priorbox generates four preselected boxes at each point. The sample dataset used in this study contains three categories; thus, the conv4_3_norm_mbox_conf layer has 12 channels (4 × 3). Each preselected box returns four position offsets, so the conv4_3_norm_mbox_loc layer has 16 channels (4 × 4). The second layer generates six preselected boxes per point; its remaining settings are analogous. Finally, the mbox_conf and mbox_loc outputs are merged, mbox_conf is reshaped, and softmax classification is performed. As this example shows, each convolutional feature layer produces its own prediction results, and the predictions at different scales are then fused to obtain the best crack prediction.
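The channel counts quoted above follow from simple arithmetic: boxes per location times categories for the confidence branch, and boxes per location times four offsets for the localization branch. The helper below is a hypothetical illustration of that arithmetic, not part of any SSD library.

```python
def ssd_head_channels(boxes_per_location, num_classes, coords_per_box=4):
    """Return (confidence_channels, localization_channels) for one feature layer."""
    conf_channels = boxes_per_location * num_classes
    loc_channels = boxes_per_location * coords_per_box
    return conf_channels, loc_channels


# conv4_3: four preselected boxes per point, three crack categories.
print(ssd_head_channels(4, 3))  # (12, 16)

# A layer with six preselected boxes per point, same three categories.
print(ssd_head_channels(6, 3))  # (18, 24)
```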

Increasing the number of network layers can improve the accuracy of the network in identifying pavement cracks, but very deep plain networks become difficult to train because accuracy degrades as more layers are stacked. Therefore, in this study, the feature extraction network structure Visual Geometry Group 16 (VGG16) in the SSD network model was replaced with a deep residual network to improve the pavement crack identification accuracy. The deep residual network [33, 34] solves this degradation problem by fitting a residual map instead of the original map and by adding shortcut connections between layers.
In order to convert the basic VGG16 network of the SSD model into a deep residual network, it is necessary to connect the convolution layer feature map output size in the residual network with the matching VGG16 convolution layer. Table 1 shows the names and outputs of convolution layers matching the outputs of the feature maps of the two feature extraction networks.
The structure of the improved crack classification and detection network is shown in Figure 6. The convolutional layer of different feature map sizes contains two kinds of convolution kernels. One is used for position regression of the prediction box, and the other is used for crack classification.

To improve the robustness of the model, all predicted crack boxes of different sizes at all positions of the feature map are combined to form a diversified prediction set. The pavement crack images were input for training with steps_per_epoch = 100 and final_epoch = 100, giving 100 × 100 = 10,000 training iterations. The test set was used for evaluation after each step. Whenever val_loss decreased, the weight file was saved, and training then continued until completion.
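The save-on-improvement rule can be sketched as a small training loop. This is a framework-independent illustration: `run_epoch`, `evaluate`, and `save_weights` are hypothetical callables supplied by the caller, and evaluating once per epoch is an assumption made to keep the example short.

```python
def train_with_checkpoints(run_epoch, evaluate, save_weights, epochs=100):
    """Train for `epochs` epochs and save weights whenever val_loss improves.

    run_epoch(epoch)         -> training loss for that epoch
    evaluate(epoch)          -> validation loss after that epoch
    save_weights(epoch, val) -> persists the current weights
    """
    best_val_loss = float("inf")
    history = []
    for epoch in range(1, epochs + 1):
        train_loss = run_epoch(epoch)
        val_loss = evaluate(epoch)
        history.append((epoch, train_loss, val_loss))
        if val_loss < best_val_loss:
            # Only an improvement over the best value so far triggers a save.
            best_val_loss = val_loss
            save_weights(epoch, val_loss)
    return best_val_loss, history


# Toy run with made-up loss values standing in for a real model.
fake_val = {1: 3.0, 2: 2.0, 3: 2.5, 4: 1.0}
saved_epochs = []
best, _ = train_with_checkpoints(
    run_epoch=lambda e: 0.0,
    evaluate=lambda e: fake_val[e],
    save_weights=lambda e, v: saved_epochs.append(e),
    epochs=4,
)
print(best, saved_epochs)  # 1.0 [1, 2, 4]
```

In Keras, the same behaviour is usually obtained with a checkpoint callback that monitors `val_loss`; the loop above only makes the rule explicit.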
Figure 7 shows the loss function curves obtained by training on the pavement crack dataset in the improved model. The red line represents the training set loss curve, and the blue line represents the validation set loss curve. By epoch 90, the model is effectively stable.

The reason behind improving the model is to improve crack detection accuracy and to generate a better detection box to surround the cracks. In order to ascertain whether the model prediction results were improved after improving the crack detection model, model results before and after the improvement were compared using the test set (Table 2).
By repeatedly adjusting the hyperparameters of the model and comparing the accuracy of crack classification and detection, the precision–recall (PR) curves for the recognition of three types of pavement cracks (transverse, longitudinal, and alligator) were constructed (Figure 8).
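For readers unfamiliar with PR curves, the sketch below shows how precision and recall pairs arise from sweeping the confidence threshold over ranked detections. It assumes each detection has already been marked as correct (1) or incorrect (0) against the ground truth and that every ground-truth crack appears among the detections; the matching rule itself is not specified in the text.

```python
def pr_curve(scores, labels):
    """Return (precision, recall) pairs, one per detection, in descending score order.

    scores: detection confidences
    labels: 1 if the detection is a true positive, 0 if it is a false positive
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = []
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points


for precision, recall in pr_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1]):
    print(round(precision, 3), round(recall, 3))
```

A curve that stays close to the upper right corner means the model keeps precision high while recall grows, which is the visual criterion used when comparing the models.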

Three types of cracks were randomly selected for testing, and the test results are shown in Figure 9. By comparing the prediction results of the model before and after the improvement for the same category and the same picture, it can be seen that the improved crack classification and detection model provides a higher degree of confidence for identifying the crack category in the pavement image, and the prediction results are more accurate, which demonstrates the effectiveness of model improvement and optimization.

By replacing the feature extraction structure of the original SSD network with the deep residual network, the network accuracy and recall rate in predicting pavement cracks were substantially improved. The experimental results show that the proposed method achieves good results in the classification and detection of cracks. However, a detection method based on the SSD model alone yields only a bounding box around each crack; this information is incomplete, is not conducive to the subsequent computation of crack geometry parameters, and would produce large calculation errors. Because such a model has limited practical value on its own, this study fuses it with a segmentation model to address this problem.
2.3. Crack Segmentation Using Improved U-Net Model
In the proposed method, the U-Net [35] model is used as the pavement crack segmentation model. We implement the U-Net network using the TensorFlow and Keras frameworks. As shown in Figure 10, the U-Net structure is divided into two main parts, the first for feature extraction and the second for the upsampling operation; this structure is also known as the encoder-decoder structure. U-Net fuses features by concatenation: feature maps are concatenated along the channel dimension to form richer features, which facilitates the network's learning of crack features.

The structure of the U-Net network is simple, and the original U-Net network has crack segmentation accuracy problems. Therefore, the feature extraction network of the U-Net crack segmentation model was also replaced with a deep residual network to fully extract crack features and ensure crack segmentation accuracy. The specific improvement steps are similar to those of the SSD model. As shown in Table 3, the two basic network feature graph outputs match the network layers. After adjusting the corresponding layers of the feature extraction network, it is still necessary to adjust the network parameters through continuous training to optimize the crack segmentation effect. The improved crack segmentation model is shown in Figure 11.

In the training of the U-Net crack segmentation model, the ReLU function is used as the activation function and the input data samples are normalized repeatedly. Normalization adjusts the output values of each convolutional layer to the same distribution, thereby avoiding a shift in the distribution of feature vectors caused by network deepening. The segmentation model uses upsampling: a feature map of the new size is obtained by transposed convolution (deconvolution), and the feature map of the corresponding size from the convolutional layers is added to form the upsampling result. The segmentation network upsamples the feature maps of the feature extraction network at four scales, and at each scale the upsampling process combines the feature extraction network feature map of the matching size; this improves the segmentation network accuracy through multilevel joint learning.
Figure 12 shows the loss function curve obtained by training the pavement crack dataset in the improved model. The red line represents the training loss curve of the training set, and the blue line represents the val loss curve of the test set. According to the loss curve, the model training effectively reached stability by the 50th epoch. The detection results for the test set before and after model improvement are shown in Table 4.

The model was improved to optimize the hyperparameters in the network. After comparing the segmentation effect under different hyperparameter settings, the activation function was set as the sigmoid function and the SGD optimizer was selected to optimize the network training. The PR curves of the pavement crack segmentation model before and after improvement for the three types of crack images are shown in Figure 13. As can be seen, the upper right convexity of the PR curve after improvement is more evident than before the improvement, indicating a better performance by the improved model.

Figures 14–16 show the segmentation effect of the three crack segmentation models for transverse, longitudinal, and alligator cracks, respectively. It can be seen that before improvement, the crack information obtained by the model is significantly wider and somewhat distorted when compared with the ground truth. The improved model, by contrast, can obtain more accurate segmentation results. Therefore, the improved method is more suitable for crack segmentation.

The crack segmentation model uses the cascade mode of multiple residual elements, which can effectively extract the morphological characteristics of the pavement crack and improve the learning effectiveness of the neural network on the crack characteristics. A single crack segmentation method based on U-Net can provide the crack pixel location information, but it cannot classify the crack [35]. Therefore, in this study, a fusion of two models was adopted to identify pavement crack images and to obtain the crack category, location information, and geometric parameters, thereby facilitating accurate quantification and evaluation of pavement cracks.
2.4. Proposed Fusion Model
2.4.1. Fusion Model Design
As shown in Figure 17, when only the SSD detection network is used, the number of cracks can be accurately obtained, but the crack width cannot be quantified. Use of only the U-Net segmentation model will lead to misjudgment of the number of cracks; e.g., a fractured crack may be misidentified as multiple cracks. The fusion of the detection and segmentation networks can avoid this phenomenon and ensure that the crack is identified as a single crack. Thus, the advantage of the fusion model is that it can accurately identify the number of cracks and ensure that cracks are quantified correctly.

The proposed fusion model adopts the following order: (1) detection, (2) segmentation. The detection network first produces a detection box that contains one crack, and the segmentation network then segments that crack. This order not only ensures that the number of cracks is accurate but also allows the crack width to be quantified, which prevents the miscounting of cracks caused by using only the segmentation model. If the order is reversed, i.e., segmentation first and detection second, a crack that contains breaks will be misjudged as multiple cracks in the segmentation result. Moreover, the segmented image is a binary image, which is not suitable for detection.
The proposed fusion model can improve pavement crack identification accuracy and provide a good foundation for measuring crack parameters. The fusion model structure is shown in Figure 18. First, the input crack image enters the feature extraction network for feature extraction and learning. Convolutional feature maps with sizes of 64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4, 2 × 2, and 1 × 1 are used for multiscale fusion classification, and feature maps with sizes of 512 × 512, 256 × 256, 128 × 128, 64 × 64, and 32 × 32 are used for upsampling. After continuous calculation of losses and updating of the weights, the crack classification and position results are obtained. Then, the geometric parameters of the crack are calculated according to the segmentation results, and the identification results of the final fusion model are obtained.

The specifications of the fusion model process are shown in Figure 19. First, the standardized pavement crack image is input into the crack classification and detection model for classification. If there are cracks in the image, the categories and confidence values of the cracks are obtained. Next, the image of a detected crack and its category information are input into the crack segmentation model to obtain the precise pavement crack location, and the geometric parameters of the crack are calculated according to the segmented binary image of the pavement. Finally, the classified detection results are fused with the segmentation results, and the crack’s geometric parameters are calculated. The result is restored to the corresponding original image for display, and the final pavement crack identification result is obtained.
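The detect-then-segment order can be sketched as a short pipeline. This is only an outline under stated assumptions: `detect`, `segment`, and `measure` are hypothetical callables standing in for the trained detection network, the trained segmentation network, and the geometric parameter calculation; the image is a list of pixel rows, and boxes are `(x0, y0, x1, y1)` with exclusive upper bounds.

```python
def identify_cracks(image, detect, segment, measure):
    """Detect first, then segment and measure each detected crack.

    detect(image)          -> iterable of (category, confidence, (x0, y0, x1, y1))
    segment(patch)         -> binary mask (list of rows of 0/1) for the patch
    measure(category, mask) -> geometric parameters for that crack
    """
    results = []
    for category, confidence, (x0, y0, x1, y1) in detect(image):
        # Crop the detection box so that one patch corresponds to one crack.
        patch = [row[x0:x1] for row in image[y0:y1]]
        mask = segment(patch)
        results.append({
            "category": category,
            "confidence": confidence,
            "box": (x0, y0, x1, y1),
            "mask": mask,
            "geometry": measure(category, mask),
        })
    return results


# Toy stand-ins for the two networks and the measurement step.
image = [[0, 0, 0, 0],
         [0, 5, 5, 0],
         [0, 0, 0, 0]]
found = identify_cracks(
    image,
    detect=lambda img: [("transverse", 0.9, (1, 1, 3, 2))],
    segment=lambda patch: [[1 if v > 0 else 0 for v in row] for row in patch],
    measure=lambda cat, mask: {"area_px": sum(map(sum, mask))},
)
print(found[0]["category"], found[0]["geometry"])
```

Because each result is tied to one detection box, a crack that appears broken in the mask is still counted once, which is the motivation given for running detection before segmentation.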

2.4.2. Feature Extraction and Analysis
The deep convolutional network is trained by means of supervised learning. By extracting the morphological features of the cracks and continuously comparing them with the tag values to calculate the loss, the parameters of each network layer are continuously adjusted to finally reach the state of small loss and accurate judgment. With the deepening of convolutional layers in the network model, the model has more powerful feature extraction capabilities and can therefore identify and detect more abstract information. At this stage of the experiment, the crack characteristics learned by the deep convolutional network are analyzed and studied, and the characteristics of the three crack types at different depths are displayed by means of feature visualization.
Figure 20(a) shows original transverse, longitudinal, and alligator crack images (left to right) that were provided as input into the model for feature visualization. Because the convolution in each layer generates a large number of feature maps, displaying all of them would be confusing; instead, we analyze a single feature map generated by one convolution kernel in each layer. Figure 20(b) shows a single feature map for each of the three kinds of crack on a convolutional layer with an output size of 128 × 128. It can be seen from the feature map that the morphological features of the cracks are still relatively evident. The model has learned the curve shapes and directions of the three types of cracks, which can be discerned by the human eye, although these features can be somewhat unclear. Figure 20(c) shows a single feature map for each of the three kinds of crack on a convolutional layer with an output size of 64 × 64. Compared with the 128 × 128 crack feature map, the pavement crack feature obtained by this layer is less clear, and the location feature of the crack is magnified and brighter. However, the basic linear morphological features of the three crack types can still be distinguished by eye. Figure 20(d) shows a single feature map for each of the three kinds of crack on a convolutional layer with an output size of 32 × 32. Compared with the other two convolutional layer feature maps, the brightness of the feature map of this layer is considerably higher, whereas the morphological features of the three crack types can no longer be distinguished, and the learned features are more complex.

From the above analysis, it can be seen that crack features extracted by different convolutional layers in the deep convolutional network are not the same. Meanwhile, with the deepening of network layers, crack features extracted by the deep convolutional neural network evolve from low-order features to high-order features. Therefore, in order to perform a comprehensive study of crack features and improve the identification accuracy of pavement cracks, it is necessary to fully extract their features by increasing the convolutional network depth.
2.5. Calculation of Crack Parameters
As shown in Figure 21, linear cracks (transverse and longitudinal cracks) and alligator cracks have different morphological characteristics; the former need two parameters, length and width, to be calculated, whereas the latter needs area parameters to be calculated. Therefore, these two types of morphological cracks are considered separately when the geometric parameters of pavement cracks are calculated. Each image is a two-dimensional image, in which each point represents a pixel. There may be differences in the size of the image captured by different cameras, but the size of the pixel points is fixed. According to the corresponding relationship between image size and the pixel points, the real crack parameter value (unit: centimeter) can be calculated; e.g., if the width of the image is 1000 pixels and the camera’s shooting range is 4 meters wide, a pixel point represents 4 m/1000 = 0.4 cm.
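The pixel-to-centimeter conversion in the example above can be written out as follows. The function names are hypothetical, and the sketch assumes square pixels and a camera that looks straight down, so that one scale factor applies in both directions.

```python
def cm_per_pixel(field_width_m, image_width_px):
    """Physical size of one pixel, in centimeters."""
    return field_width_m * 100.0 / image_width_px


def to_centimeters(length_px, scale_cm_per_px):
    """Convert a length or width measured in pixels to centimeters."""
    return length_px * scale_cm_per_px


def to_square_centimeters(area_px, scale_cm_per_px):
    """Convert an area measured in pixels to square centimeters."""
    return area_px * scale_cm_per_px ** 2


scale = cm_per_pixel(4, 1000)  # 0.4 cm per pixel, as in the example above
print(scale)
print(to_centimeters(250, scale))          # a 250-pixel crack, in cm
print(to_square_centimeters(5000, scale))  # a 5000-pixel area, in cm^2
```

Note that areas scale with the square of the pixel size, so an area in pixels must be multiplied by the scale factor twice.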

In the binary image output by the pavement crack segmentation model, the pavement background is in black (a pixel value of 0) and the crack information is in white (a pixel value of 1). The steps for calculating the geometric parameters of a crack are as follows.
2.6. Calculation of Alligator Crack Area
Step 1: set the initial area (in pixels) of the pavement crack as $S = 0$, and scan the pixels in the image from left to right and top to bottom. A pixel value of 1 means that the pixel belongs to the crack area, and the area increases by 1: $S = S + 1$.
Step 2: if the pixel value is 0, it is part of the background, and the scan moves to the next pixel.
Step 3: the value of $S$ obtained after all pixels have been scanned is the crack area (in pixels).
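The area scan described in these steps amounts to counting the pixels equal to 1. A minimal sketch, assuming the binary mask is a list of rows of 0/1 values (the function name is hypothetical):

```python
def crack_area_px(mask):
    """Count crack pixels (value 1) by scanning row by row, left to right."""
    area = 0
    for row in mask:
        for pixel in row:
            if pixel == 1:
                area += 1  # crack pixel: grow the area by one
            # background pixels (value 0) are skipped
    return area


mask = [[0, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1]]
print(crack_area_px(mask))  # 5
```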
2.7. Calculation of Linear Crack Length
Step 1: set the parameter $L_x$, denoting the crack length along the horizontal direction (in pixels), to an initial value of 0. Scan the pixels in the image columns from left to right. When a column is found to contain a pixel value of 1, $L_x = L_x + 1$. Continue until all columns are scanned, and retain the final value.
Step 2: scan the pixels in the rows from top to bottom according to the same principle as in the previous step, and set the parameter $L_y = 0$. If the value of any pixel in a row is 1, $L_y = L_y + 1$. Continue scanning all the rows, and keep the final value.
Step 3: according to the Pythagorean theorem, the crack length $L$ (in pixels) can be calculated as follows:
$$L = \sqrt{L_x^2 + L_y^2}$$
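A minimal sketch of the length calculation, assuming the same list-of-rows binary mask: it counts the columns and the rows that contain at least one crack pixel and combines the two counts with the Pythagorean theorem. The function name is hypothetical.

```python
import math


def crack_length_px(mask):
    """Estimate linear crack length in pixels from its horizontal and vertical extent."""
    # Number of columns containing at least one crack pixel.
    cols_with_crack = sum(1 for col in zip(*mask) if any(p == 1 for p in col))
    # Number of rows containing at least one crack pixel.
    rows_with_crack = sum(1 for row in mask if any(p == 1 for p in row))
    return math.hypot(cols_with_crack, rows_with_crack)


# A slanted crack spanning 3 columns and 4 rows.
mask = [[1, 0, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 0, 1]]
print(crack_length_px(mask))  # 5.0
```

This treats the crack as the diagonal of its occupied extent, so it is best suited to cracks that are roughly straight; a strongly curved crack would be underestimated.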
2.8. Calculation of Linear Crack Width
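The width of the crack can be obtained by dividing the linear crack area by its length, as follows:
$$W = \frac{S}{L}$$
where $W$ is the crack width, $S$ is the crack area, and $L$ is the crack length, all in pixels.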
Through these steps, the length, width, and area of the crack can be expressed in pixels. Some pavement images were randomly selected for calculation of their geometric parameters, and the results are shown in Table 5.
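The width calculation can be sketched by combining the area and length computations on the same binary mask. The function name is hypothetical, and returning 0.0 for an empty mask is a choice made here to avoid division by zero, not something stated in the text.

```python
import math


def crack_width_px(mask):
    """Mean crack width in pixels: crack area divided by crack length."""
    area = sum(1 for row in mask for p in row if p == 1)
    cols = sum(1 for col in zip(*mask) if any(p == 1 for p in col))
    rows = sum(1 for row in mask if any(p == 1 for p in row))
    length = math.hypot(cols, rows)
    if length == 0:
        return 0.0  # no crack pixels in the mask
    return area / length


# A horizontal crack 4 pixels long and 2 pixels thick.
mask = [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
print(round(crack_width_px(mask), 3))
```

The result is an average over the whole crack, so local widening or narrowing is smoothed out.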
3. Implementation Details and Results
3.1. Data Preparation and Environment Setup
In this research, a detector vehicle and a smartphone were used to collect crack images. The sample dataset contained 8000 crack images, including 2800 transverse, 2800 longitudinal, and 2400 alligator crack images. With a ratio of 6 : 2 : 2, the sample data were divided into a training set (4800), validation set (1600), and test set (1600) for training and testing the pavement crack identification model. The size of each image is 1024 × 960 pixels. Figure 22(a) shows the proportional distribution of the numbers of the three crack types, and Figure 22(b) shows the proportional distribution of the dataset partitions for training. Details of the experimental equipment and software used in this study are given in Table 6.
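A 6 : 2 : 2 partition can be sketched as below. The random shuffle and the fixed seed are assumptions for illustration; the text does not say how images were assigned to each partition.

```python
import random


def split_dataset(items, ratios=(6, 2, 2), seed=0):
    """Shuffle items and split them into train/validation/test by the given ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(8000))
print(len(train), len(val), len(test))  # 4800 1600 1600
```

If the three crack types should keep the same proportions in every partition, the split would be applied to each category separately and the results merged.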

3.2. Comparison of Each Individual Model Optimization
3.2.1. Learning Rate
For this study, a dynamic learning rate was adopted to adjust the learning rate of the model, thereby optimizing the learning rate efficiently and improving the network training efficiency. The initial learning rate can be set to be relatively large and can then be gradually reduced as the number of iterations increases. Figure 23 shows the learning rate of the pavement crack segmentation model during the training process. As the number of iterations increases, the learning rate is dynamically adjusted, becoming increasingly smaller.
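One common way to realize a learning rate that starts large and shrinks over time is step decay. The schedule below is a hypothetical example: the decay factor, the interval, and the initial value are assumptions, since the text does not give the exact schedule that was used.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


for epoch in (0, 9, 10, 25, 40):
    print(epoch, step_decay(0.01, epoch))
```

A large early rate lets the loss fall quickly, and the later small rates allow the weights to settle without oscillating around the minimum.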

3.2.2. Activation Function
The ReLU function is linear for positive inputs and zero otherwise [36, 37]. In this study, the ReLU, tanh, and sigmoid functions were compared, as shown in Figure 24. When the SGD algorithm is used, the ReLU function converges faster than the other two functions, with the additional advantage of low computational complexity: it does not need to perform an exponential calculation; it only needs to apply an activation threshold. Moreover, it is well suited to backpropagation because its gradient is constant for positive inputs, which avoids vanishing gradients. For these reasons, the ReLU function was selected as the activation function of the network model proposed in this paper.
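The gradient behaviour behind this choice can be checked numerically. The sketch below defines the three activation functions and their derivatives using the standard textbook formulas and compares the gradients at a moderately large input.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


x = 10.0
print("relu   ", relu_grad(x))
print("sigmoid", sigmoid_grad(x))
print("tanh   ", tanh_grad(x))
```

At this input the ReLU gradient is still 1, whereas the sigmoid and tanh gradients are close to zero; multiplied across many layers, such small factors are what make gradients vanish.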

3.2.3. Optimizer
The role of the optimizer in a deep neural network is to update the network parameters that affect model training and model output so that they approach their optimal values and the loss function is minimized. For deep convolutional neural networks, the choice of optimizer plays a decisive role in the final recognition accuracy of the model. In this study, Adam, SGD, AdaGrad, AdamW, and Nadam were used to train and test the model. The test results are shown in Table 7. When training on crack data, the Adam optimizer trains relatively fast, but its crack prediction accuracy is not the best. The differences in training time among the optimizers were small, whereas the accuracy values clearly differed. Therefore, SGD was selected as the model optimizer.
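To make the update step concrete, the following minimal sketch applies plain SGD to a one-parameter toy problem (fitting y = w·x with a squared-error loss). The toy data, learning rate, and function name `sgd_fit` are assumptions for illustration; this is not the training code or configuration used in the study.

```python
import random


def sgd_fit(samples, lr=0.05, epochs=50, seed=0):
    """Fit y = w * x by SGD on squared error, one sample per update."""
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)                 # visit samples in random order
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad                # SGD update: step against the gradient
    return w


# Toy data generated from y = 3x (hypothetical, for demonstration only).
samples = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w = sgd_fit(samples)
print(f"estimated w = {w:.6f}")
```

Each update moves the parameter a small step against the gradient of the loss on one sample, so the estimate converges toward the slope that generated the data.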
3.2.4. Transverse and Longitudinal Ratios
Owing to the linear shape of transverse and longitudinal cracks, their length differs greatly from their width. Considering this particularity of pavement crack morphology, this study modified the transverse-to-longitudinal (aspect) ratio parameter in the model in an attempt to improve pavement crack identification accuracy. Results for the modified model tests are shown in Table 8. Modifying the transverse-to-longitudinal ratio used for training has little direct influence on the model, but when the gap between the transverse and longitudinal ratios is set too large, the prediction accuracy for alligator cracks is substantially reduced, which in turn reduces the average precision of the model. Therefore, the transverse-to-longitudinal ratio parameter was set as (1.0, 2.0, 0.5, 3.0, 1.0/3.0).
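The effect of these ratios on box shape can be sketched as follows. This assumes that the transverse-to-longitudinal ratio corresponds to the width-to-height aspect ratio of SSD default boxes, for which the width is s·√a and the height is s/√a at scale s. The scale value 0.2 is hypothetical.

```python
import math

# Ratio values taken from the text.
ASPECT_RATIOS = (1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)


def default_box_shapes(scale, aspect_ratios=ASPECT_RATIOS):
    """Return (width, height) per aspect ratio, relative to the image size."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar)) for ar in aspect_ratios]


shapes = default_box_shapes(scale=0.2)  # hypothetical scale for one feature map
for ar, (w, h) in zip(ASPECT_RATIOS, shapes):
    print(f"ratio {ar:.3f}: width = {w:.3f}, height = {h:.3f}")
```

Every box keeps the same area (s²) while becoming more elongated as the ratio moves away from 1. Elongated boxes suit linear cracks, but very extreme ratios fit roughly square alligator cracks poorly, which is consistent with the accuracy drop reported above.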
3.2.5. Optimization Results
(a) Crack classification and detection network. As shown in Table 9, after optimizing the feature extraction network and hyperparameters of the crack classification and detection model, the model accuracy is improved.
According to these pavement crack identification results, the improved and optimized method developed in this study is effective for classifying and locating cracks in pavement images.
3.3. Fusion Model Identification Results and Analysis
To demonstrate the effectiveness of the proposed model, three sample images of each pavement crack type were randomly selected from the test set; the parameter information identified by the model for each crack is shown in Table 11. The confusion matrices for the training and testing phases are shown in Figure 25.

Figure 25: Confusion matrices for the training and testing phases (panels (a) and (b)).
The effectiveness of crack positioning is displayed visually in the form of images (Figures 26–28). As can be seen in the figures, the accuracy of crack identification and positioning is considerably improved by image segmentation. The length, width, and area of pavement cracks can be calculated more accurately from the segmented binary image. In addition, the segmentation result can be mapped back onto the original crack image so that it accurately covers the crack area.

Figures 26–28: Crack identification and positioning results, each with panels (a)–(c).
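A rough sketch of how geometric parameters can be read from a segmented binary image is given below. It is a simplified approximation and not the calculation procedure used in this paper: area is obtained by counting crack pixels, length is approximated by the longer side of the crack's bounding box, and mean width is taken as area divided by length. The function name `crack_geometry` and the `mm_per_pixel` calibration value are hypothetical.

```python
def crack_geometry(mask, mm_per_pixel=1.0):
    """Estimate area, length, and mean width from a binary mask (list of rows)."""
    pixel_count = sum(1 for row in mask for v in row if v)
    if pixel_count == 0:
        return {"area": 0.0, "length": 0.0, "mean_width": 0.0}

    # Bounding box of all crack pixels.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    box_height = rows[-1] - rows[0] + 1
    box_width = cols[-1] - cols[0] + 1

    area = pixel_count * mm_per_pixel ** 2               # pixel counting
    length = max(box_height, box_width) * mm_per_pixel   # longer box side
    return {"area": area, "length": length, "mean_width": area / length}


# Toy mask: a horizontal crack 6 pixels long and 2 pixels wide.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
result = crack_geometry(mask, mm_per_pixel=2.0)
print(result)
```

The bounding-box length underestimates curved or diagonal cracks; a skeleton-based length would be more faithful, at the cost of an image-processing dependency.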
3.4. Comparison of Different Datasets
To illustrate the reliability of this experiment, we carried out the experiment on three different datasets: our dataset, the CFD dataset, and the AigleRN dataset. Figure 29 shows sample images from the three datasets. The CrackForest Dataset (CFD) is an annotated road crack image database that reflects urban road surface conditions in general. Its website is https://github.com/cuilimeng/CrackForest-dataset. The AigleRN dataset contains 38 images with pixel-level annotation; the images were acquired at driving speed by the Aigle RN system, which is used for regular monitoring of French road conditions. Its website is https://www.irit.fr/~Sylvie.Chambon/AigleRN.html. Table 12 shows the accuracy of the proposed segmentation model on the three datasets. The experimental results show that the model performs well on all three datasets.

Figure 29: Sample images from the three datasets (panels (a)–(c)).
4. Conclusions
In this research, a crack identification method based on a deep convolutional neural network fusion model is proposed. The model optimization strategy was developed through repeated experiments, and the model hyperparameters were optimized, effectively improving the pavement crack identification accuracy. The following conclusions can be drawn from this research:
(1) To achieve accurate crack classification and segmentation, we propose a fusion model. The SSD network is used as the detection model and the U-Net network is used as the segmentation model, so that cracks are classified and, at the same time, segmented within the detection box. Moreover, crack length, width, and area parameters are calculated using the crack-segmented binary image. This method not only ensures an accurate crack count but also calculates crack parameters, which prevents the miscounting of cracks that can occur when only a single model is used.
(2) The SSD network model was adopted as the pavement crack classification and detection network. We improved the SSD model by replacing the VGG16 feature extraction network with a deep residual network. Experimental results show that the mAP of the improved model is 6.5% higher than that of the original model, which indicates that the classification and detection of pavement cracks can be improved by optimizing the network. We made the same improvement to the U-Net model. By placing the segmentation model after the detection model, we can solve the problem of inaccurate crack location in the pavement classification and detection network. Results show that the precision of the improved segmentation model is 6.9% higher than that of the original model. Therefore, the proposed fusion model has value in the field of pavement crack identification and classification.
(3) The learning rate, activation function, optimizer, and other parameters of the model were compared and selected. Experimental results demonstrate that the optimized model outperforms the original model and achieves higher accuracy, which gives it practical application value.
In summary, the method developed in this study provides an intuitive display of identified pavement cracks and, through the calculated parameter information, an accurate reference for future road surface maintenance and automatic pavement crack repair.
Data Availability
Some or all data, models, or code that support the findings of this study are available from the corresponding author upon reasonable request. Images from all sample sets used in this paper can be obtained from the corresponding authors after publication. Models and code used in this paper can also be obtained from the corresponding authors upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
LX, XF, and WL conceptualized the study; LX and LP curated the data; LX, XF, and ZM performed the formal analysis; LX, XF, and ZM were involved in the methodology; LX and WL were responsible for the project administration; XF, LX, and HJ collected the resources; WL was responsible for the funding acquisition; LX, ZS, and XF investigated the study; LX, ZS, XF, and HS validated the study; LX, LP, and ZM wrote and prepared the original draft; XF, LX, WL, LP, ZS, ZM, and HJ reviewed and edited the manuscript.
Acknowledgments
This research was funded by the National Key Research and Development Program “comprehensive transportation and intelligent transportation” special project “road infrastructure intelligent perception theory and method” (Grant number: 2018YFB1600202), National Natural Science Foundation of China (Grant number: 51978071), and Fundamental Research Funds for the Central Universities, CHD (Grant numbers: 300102249301 and 300102249306), China. This research was also supported by the Norman W. McLeod Chair in Sustainable Pavement Engineering, Centre for Pavement and Transportation Technology (CPATT), University of Waterloo, Waterloo, Ontario, Canada. Their support is gratefully acknowledged.