Abstract

COVID-19 has become the largest public health event worldwide since its outbreak, and early detection is a prerequisite for effective treatment. Chest X-ray images have become an important basis for screening and monitoring the disease, and deep learning has shown great potential for this task. Many studies have proposed deep learning methods for the automated diagnosis of COVID-19. Although these methods have achieved excellent detection performance, most have been evaluated on limited datasets and typically use a single deep learning network to extract features. To this end, the dual asymmetric feature learning network (DAFLNet) is proposed, which is divided into two modules, DAFFM and WDFM. DAFFM mainly comprises the backbone networks EfficientNetV2 and DenseNet for feature fusion. WDFM performs weighted decision-level fusion and features a new pretrained network selection algorithm (PNSA) for determining the optimal weights. Experiments on a large dataset were conducted using two schemes, DAFLNet-1 and DAFLNet-2, and both schemes outperformed eight state-of-the-art classification techniques. DAFLNet-1 achieved an average accuracy of up to 98.56% for the triple classification of COVID-19, pneumonia, and healthy images.

1. Introduction

COVID-19 is a highly contagious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Since its appearance in December 2019, it has spread across the globe, forcing countries to take drastic measures, including closing borders, canceling flights, and quarantining people in countries with confirmed cases. Containing the spread of the virus remains a challenging task [1]. Due to the critical health risks involved, the World Health Organization (WHO) declared COVID-19 an international public health emergency on 30 January 2020 and a pandemic on 11 March 2020. As of June 20, 2022, a cumulative 524.17 million confirmed COVID-19 cases and 6.34 million deaths had been reported worldwide [2]. Although China has largely contained the epidemic, the situation in some other countries remains dire, and new outbreaks driven by SARS-CoV-2 gene mutations are emerging more rapidly [3].

As of this writing, the diagnosis of COVID-19 relies mainly on two strategies. The first is real-time reverse-transcription polymerase chain reaction (RT-PCR), an RNA detection method performed on respiratory secretions collected from the nasopharynx with swabs. The process takes hours or days and requires a high level of technical proficiency and experience to identify the signs of COVID-19 infection accurately [4]. The second is chest X-ray imaging [5]; radiographs of COVID-19 patients show asymmetric peripheral ground-glass opacities that are not present in healthy individuals. Manual interpretation by radiologists, however, is easily influenced by fatigue and emotion. Moreover, the diagnostic throughput of human experts cannot match that of machines, and early symptoms are subtle and may be overlooked [6]. Therefore, there is an urgent need for smarter and more accurate algorithms for the automatic detection of diseases such as COVID-19.

Recently, artificial intelligence has shown rapid growth, and researchers have used it to develop novel approaches to disease diagnosis. For example, Kaluri and Reddy [7] performed image feature region segmentation using modified region growing after preprocessing the image. The images were then classified using a genetic algorithm combined with a fuzzy classifier to optimize the selection law. Rajaraman and Antani [8] opted for phased deep learning strategies to train the network and achieved good results for COVID-19 diagnosis. Panwar et al. [9] investigated three datasets using transfer learning and rendered the features in a more interpretable way using the Grad-CAM visualization method. Similarly, Wang et al. [10] analyzed CT images of multiple patients and provided guiding opinions for the diagnosis of the disease. In one work, Rahimzadeh and Attar [11] achieved a classification accuracy of 91.4% using Xception and ResNet50V2 networks for the triple classification of COVID-19, pneumonia, and normal images.

The accuracy of disease diagnosis is mainly influenced by the feature extraction and classification abilities of the network. Related studies from these two aspects are presented next.

First, feature extraction can be improved through feature screening and fusion. Koppu et al. [12] proposed a new method for data cleaning and filtering based on the Fitness-Oriented Dragonfly Optimization algorithm (F-DA); it was experimentally shown to be superior to FireFly, gray wolf optimization, and particle swarm optimization for feature filtering. Kaluri and Pradeep Reddy [13] proposed a self-improved genetic algorithm to extract useful features, which proved more powerful than traditional feature extraction methods. Zhang et al. [14] presented DenseNet with optimized transfer learning settings (DenseNet-OTLS), in which different learning factors were assigned to the frozen, intermediate, and new layers; training was then resumed to refine the parameters learned through transfer. Varela-Santos and Melin [15] utilized the gray-level co-occurrence matrix to extract features from the pooling of two datasets and used a neural network for triple classification, resulting in an accuracy of 98.8%. Moreover, Ozturk et al. [16] utilized a network of 17 convolutional layers, where each layer used a different filter to extract different features; the model, labeled DarkNet, achieved 98.08% accuracy in binary classification and 87.02% in multiclass classification. All the above algorithms employed single features and did not perform feature fusion. However, multifeature fusion yields a richer representation of information and consequently increased accuracy. For instance, Chen et al. [17] used ResNet and DenseNet to extract features and weighted them with multiple classifiers to classify 14 lung diseases (the DualCheXNet model). Likewise, Wang et al. [18] collected images from local hospitals and used an improved transfer learning method with two networks, DenseNet and ResNet, for discriminant correlation analysis feature fusion and further classification (the CCSHNet model). Recently, Wang et al. [6] utilized a GCN as the backbone network and proposed a deep feature fusion method with high classification accuracy for COVID-19.

Second, the network’s classification capability can be improved by replacing classifiers or by performing decision-level fusion. For example, Toğaçar et al. [19] trained deep learning models (MobileNetV2 and SqueezeNet) on three datasets and used an SVM as the classifier, thus not only utilizing a deep learning framework but also applying machine learning methods. Chen et al. [17] improved classification accuracy by balancing three classifiers for decision-level fusion in the identification of fourteen lung diseases and provided experimental evidence of their method’s effectiveness. Canayaz [20] replaced the original fully connected layer with an SVM after filtering features with two metaheuristic algorithms, achieving 99.38% accuracy for COVID-19 multiclass classification.

In past studies, using a single network to extract features was common, but this did not yield rich features. Furthermore, most methods used one classifier with an unbalanced classification ability, which led to low diagnostic performance. In this paper, both feature extraction capability and disease classification capability are taken into account: two separate networks extract features for fusion, improving feature extraction capability, and the weighted decision results of three classifiers improve classification capability. Disease feature learning and diagnosis results are also influenced by the size of the dataset, and many past studies used datasets with a small number of images. In contrast, this study uses a dataset of 9208 images with rich sample diversity. In addition, a spatial attention mechanism module and a channel attention mechanism module are introduced into the network to further improve its feature extraction ability.

From the technical point of view, the main contributions of this work are as follows:
(1) The efficient classification model DAFLNet is proposed, which consists of two modules: the dual asymmetric feature-level fusion module (DAFFM) and the weighted decision-level fusion module (WDFM). In addition, two schemes are proposed, DAFLNet-1 and DAFLNet-2.
(2) The proposed model employs both feature-level fusion and weighted decision-level fusion.
(3) To determine the optimal weights for weighted decision-level fusion, a new pretrained network selection algorithm (PNSA) is proposed.
(4) Multiway data augmentation is performed on chest X-ray images (CXRs) to improve sample diversity.
(5) Compared to 8 state-of-the-art diagnosis methods for COVID-19, experimental results show that DAFLNet is effective and gives better performance.

The structure of this paper is organized as follows. An overview of recent attempts to detect COVID-19 in medical images and the relevant studies reported in this regard are first presented. In Section 2, the dataset, the involved deep learning methods, and the proposed new model are described. Subsequently, in Section 3, the data preprocessing methods are presented along with the experimental steps and methods. The experimental results are also discussed, analyzed, and compared with the existing methods. Finally, in Section 4, a summary of the paper is presented.

2. Dataset and Methods

This section comprises four parts. First, the dataset and preprocessing methods are introduced. The second part describes the deep learning networks, while the third part introduces the two modules proposed in this paper. Finally, the proposed method is described in detail.

2.1. Dataset
2.1.1. Original Dataset

Sait et al. [21] collected 15 publicly available COVID-19 datasets and removed duplicates to form a new dataset, which is the one used in this work. The dataset contains 1281 COVID-19 X-rays, 1656 viral pneumonia X-rays, 3270 normal X-rays, and 3001 bacterial pneumonia X-rays; the viral and bacterial pneumonia images were combined into a single pneumonia category. The dataset includes images from both children and adults. Three randomly selected images from the three categories are shown in Figure 1.

2.1.2. Dataset Preprocessing

The dataset contained healthy, COVID-19, and pneumonia X-ray images, marked as $H$, $C$, and $P$, so that the original image set is $I_{\mathrm{original}} = \{H, C, P\}$ (Equation (1)). If the input image size is large, the redundant information present is likely to slow down the processing speed of the neural network, so the set of original images was resized to a uniform, smaller size, and a new image set $I_{\mathrm{resized}} = f_{\mathrm{resize}}(I_{\mathrm{original}})$ was obtained (Equation (2)).

The size reduction also substantially reduced the overall storage requirements, enabling all CXRs to be stored in RAM and thereby accelerating the proposed algorithms.

In our experiments, the dataset was randomly divided into three subsets: the training set ($A$: 70%), the validation set ($B$: 10%), and the test set ($C$: 20%). The relevant information is listed in Table 1.
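
A minimal sketch of this preprocessing is shown below, assuming grayscale PNG inputs; the target resolution, file layout, and function names are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: resize all X-rays to a uniform size and split them 70/10/20 into
# training (A), validation (B), and test (C) subsets via random hold-out.
import random
from pathlib import Path
from PIL import Image

def resize_dataset(src_dir: str, dst_dir: str, size=(224, 224)):
    """Resize every image to a uniform size (224x224 is an assumption)."""
    for img_path in Path(src_dir).rglob("*.png"):
        out_path = Path(dst_dir) / img_path.relative_to(src_dir)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        Image.open(img_path).convert("L").resize(size).save(out_path)

def random_holdout_split(items, ratios=(0.7, 0.1, 0.2), seed=0):
    """Randomly partition a list of samples into A (70%), B (10%), C (20%)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(ratios[0] * len(items))
    n_val = int(ratios[1] * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```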

2.2. Deep Learning Models
2.2.1. EfficientNetV2

EfficientNet [22] balances input image size, network depth, and width to obtain the best results. In this study, EfficientNetV2 [23] was used as the backbone network. Fused-MBConv is the core component of EfficientNetV2; it replaces the expansion convolution and depth-wise convolution operations of MBConv with an ordinary convolution to reduce training time. EfficientNetV2 is approximately ten times faster than EfficientNet in training and has better performance [24]. Figure 2 illustrates the structure of MBConv and Fused-MBConv. In addition, with EfficientNetV2, the whole training process is divided into four stages, and a progressive learning strategy with strong regularization is used in each stage, resulting in a network with few parameters and high accuracy.
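
The following simplified PyTorch sketch contrasts the two blocks of Figure 2; SE attention, stochastic depth, and stride handling are omitted for brevity, and all channel settings are illustrative.

```python
# Sketch of MBConv vs. Fused-MBConv: the latter fuses the 1x1 expansion and
# 3x3 depth-wise convolutions into one ordinary 3x3 convolution.
import torch.nn as nn

def conv_bn_act(c_in, c_out, k, groups=1):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class MBConv(nn.Module):
    """1x1 expansion -> 3x3 depth-wise conv -> 1x1 projection."""
    def __init__(self, c, expand=4):
        super().__init__()
        mid = c * expand
        self.block = nn.Sequential(
            conv_bn_act(c, mid, 1),                # expansion conv
            conv_bn_act(mid, mid, 3, groups=mid),  # depth-wise conv
            nn.Conv2d(mid, c, 1, bias=False),      # projection
            nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.block(x)                   # residual shortcut

class FusedMBConv(nn.Module):
    """Expansion + depth-wise convs fused into one ordinary 3x3 conv."""
    def __init__(self, c, expand=4):
        super().__init__()
        mid = c * expand
        self.block = nn.Sequential(
            conv_bn_act(c, mid, 3),                # single fused 3x3 conv
            nn.Conv2d(mid, c, 1, bias=False),      # projection
            nn.BatchNorm2d(c),
        )

    def forward(self, x):
        return x + self.block(x)
```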

2.2.2. DenseNet+CBAM

DenseNet (Dense Convolutional Network) was proposed by Huang et al. [25]. It connects each layer with all other layers in a feed-forward fashion. In traditional convolutional neural networks, a network with $L$ layers has $L$ connections, while in DenseNet, there are $L(L+1)/2$ connections, and the input of each layer comes from the outputs of all the preceding layers. DenseNet consists of DenseBlocks and transition layers. Within a DenseBlock, the spatial size of each layer does not change, while the channel dimension grows. Suppose each nonlinear transformation outputs $k$ feature maps; then, the input of the network at layer $l$ has $k_0 + k \times (l - 1)$ channels, where $k_0$ is the number of input channels. The structure of DenseNet is shown in Figure 3. DenseNet121 and DenseNet169 were used for this study’s experiments.
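
The dense connectivity can be made concrete with a short sketch: with growth rate $k$, layer $l$ inside a block receives $k_0 + k(l-1)$ input channels (illustrative PyTorch code, not the torchvision implementation):

```python
# Minimal DenseBlock: each layer's input is the concatenation of all
# previous outputs, so channel count grows by `growth_rate` per layer.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, c_in, growth_rate=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for l in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c_in + l * growth_rate),
                nn.ReLU(inplace=True),
                # each layer produces k = growth_rate new feature maps
                nn.Conv2d(c_in + l * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```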

The convolutional block attention module (CBAM) [26] is separated into two distinct parts, referred to as the channel attention (CA) module and the spatial attention (SA) module. This design not only keeps the number of parameters and the computational cost low but also allows CBAM to be integrated as a plug-and-play module into existing network architectures. Each channel of a feature map acts as a specialized detector, so channel attention focuses on which features are meaningful, while the spatial attention module concentrates on where the meaningful features are. To aggregate spatial information, CBAM uses both global average pooling and global max pooling to exploit different aspects of the features. Since EfficientNetV2 already integrates an SE attention module, in this paper we add CBAM to DenseNet so that both networks can exploit useful information and achieve good feature extraction capability. The outputs of the two networks are then fused to enrich the extracted features.
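
A compact CBAM sketch following [26] is shown below; the reduction ratio of 16 and the 7x7 spatial kernel are the commonly used defaults, assumed here rather than taken from the paper.

```python
# CBAM: channel attention (shared MLP over avg- and max-pooled descriptors)
# followed by spatial attention (7x7 conv over channel-wise avg/max maps).
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, c, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(c, c // reduction), nn.ReLU(),
            nn.Linear(c // reduction, c))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max map
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ca, self.sa = ChannelAttention(c), SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))           # channel first, then spatial
```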

2.3. Feature-Level Fusion Module

EfficientNetV2 and DenseNet acquire features in different ways, so their outputs are fused to obtain complementary features. In DAFFM, these two deep learning networks extract features in parallel. Two schemes are proposed, called DAFLNet-1 and DAFLNet-2, where DAFLNet-1 consists of EfficientNetV2 and DenseNet121, while DAFLNet-2 consists of EfficientNetV2 and DenseNet169.

The information fusion is divided into three levels: pixel-level fusion (PLF), feature-level fusion (FLF), and decision-level fusion (DLF). Commonly, there exist two feature fusion methods, concat and add. Add corresponds to an increase in the amount of information in the features describing the image; the number of dimensions describing the image does not increase, only the amount of information under each dimension. Concat, on the other hand, merges the channels: the number of channels describing the image increases, while the information in each individual feature stays constant. The relevant mathematical expressions are given in Equations (3) and (4):

$F_{\mathrm{fusion}} = \mathrm{concat}(F_E, F_D)$, (3)

$F_{\mathrm{fusion}} = F_E + F_D$, (4)

where $F_E$ is the feature extracted by EfficientNetV2, $F_D$ is the feature extracted by DenseNet, and $F_{\mathrm{fusion}}$ is the fused feature set. In this study, we used concat for feature-level fusion, as shown in Figure 4; the number of channels of Fusion_Feature1_Feature2 is the sum of the Feature1 and Feature2 channel counts.
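
In PyTorch terms, Equations (3) and (4) amount to channel-wise concatenation versus element-wise addition (the feature shapes below are illustrative):

```python
# concat vs. add fusion of two feature maps with matching spatial size.
import torch

f_e = torch.randn(8, 512, 7, 7)   # features from EfficientNetV2 (shape assumed)
f_d = torch.randn(8, 512, 7, 7)   # features from DenseNet (shape assumed)

fused_concat = torch.cat([f_e, f_d], dim=1)  # concat: channels become 1024
fused_add = f_e + f_d                        # add: channels stay 512
print(fused_concat.shape, fused_add.shape)
```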

2.4. Pretrained Network Selection Algorithm

There are three types of features in WDFM. The first is extracted by EfficientNetV2, the second is extracted by DenseNet, and the third is obtained through the fusion of the two network outputs; the three are both related and different. The three types correspond to three classifiers: auxiliary classifier1, auxiliary classifier2, and the fusion classifier. Experiments have shown that a single classifier is not effective and that three classifiers with different weights achieve better results. Thus, the loss of DAFLNet is

$L = w_1 L_1 + w_2 L_2 + w_3 L_3$, (5)

where $L_1$, $L_2$, and $L_3$ are the losses corresponding to the three classifiers, and $w_1$, $w_2$, and $w_3$ are the weights corresponding to the three losses. Since different features enable different classification capabilities, a greedy selection algorithm is unsuitable for determining the weights because it does not guarantee that the fusion-based classification will yield the best performance. Therefore, a new pretrained network selection algorithm (PNSA) is proposed to aid the selection of the optimal weights within the search range. The pseudocode of PNSA is presented in Algorithm 1. In this study, the fusion features are considered the richest, so $w_3$ is set to 1. The two auxiliary classifiers serve the same purpose, so $w_1 = w_2$. The system’s performance was analyzed for different values of $w_1 = w_2$, and we finally determined that the network performance was best when $w_3$ was set to 1 and $w_1 = w_2$ were set to 0.4.

Step 1 Input: Obtain the EfficientNetV2 features, DenseNet features, and fusion features from DAFFM.
Step 2 For each candidate weight $w_1 = w_2 = w$ in the search range:
  Step 3 Obtain the loss $L_1$ of auxiliary classifier1.
  Step 4 Obtain the loss $L_2$ of auxiliary classifier2.
  Step 5 Obtain the loss $L_3$ of the fusion classifier.
  Step 6 Calculate the total loss $L$ using Eq. (5).
 End
Step 7 Generate and sort the indicator vector of total losses.
Step 8 Obtain the minimum loss.
Step 9 Output: the weight corresponding to the smallest loss.
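
To make the search concrete, the Python sketch below mirrors Algorithm 1 (it is an illustration, not the authors' released code): the fusion weight is fixed to 1, the tied auxiliary weights are swept over a candidate grid, and the setting with the smallest loss is kept. The helper `train_and_validate` is hypothetical; it is assumed to train DAFLNet with the given loss weights and return the resulting validation loss.

```python
# Hedged sketch of PNSA (Algorithm 1). `train_and_validate(w1, w2, w3)` is a
# hypothetical helper: it trains DAFLNet with L = w1*L1 + w2*L2 + w3*L3 and
# returns the validation loss for that weight setting.
def pnsa(train_and_validate, candidates=(0.1, 0.2, 0.3, 0.4, 0.5,
                                         0.6, 0.7, 0.8, 0.9, 1.0)):
    losses = {}
    for w in candidates:                    # Steps 2-6: sweep w1 = w2 = w
        losses[w] = train_and_validate(w1=w, w2=w, w3=1.0)
    ranked = sorted(losses.items(), key=lambda kv: kv[1])  # Step 7: sort
    best_w, best_loss = ranked[0]           # Steps 8-9: minimum loss, weight
    return best_w, best_loss
```
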
2.5. Proposed Approach

DAFLNet is mainly used for the disease classification of chest X-ray images, in which COVID-19 manifests as ground-glass-like shadows. In order to obtain richer features from limited images, four multiway data augmentation approaches were adopted. The proposed approach is divided into two modules, DAFFM and WDFM, which improve the network's feature extraction and classification ability, respectively. DAFFM uses the EfficientNetV2 and DenseNet deep learning models, with CBAM added to DenseNet, and the features extracted by the two networks are fused to improve feature extraction capability. The weighted classification fusion strategy is used in WDFM to improve disease prediction accuracy. Figure 5 shows the overall design of the proposed method. To elaborate further, pseudocode for DAFLNet is given in Algorithm 2.

Phase I: Preprocessing
Step 1 Input: original image set $I_{\mathrm{original}}$. See Eq. (1).
Step 2 Dataset resizing to obtain $I_{\mathrm{resized}}$. See Eq. (2).
Step 3 $I_{\mathrm{resized}}$ is divided into training set $A$, validation set $B$, and test set $C$. See Eq. (6).
Step 4 MDA($A$): apply noise injection, rotation, gamma correction, and mirroring to training set $A$.
Phase II: DAFFM
Step 5 Input the raw pretrained EfficientNetV2 and DenseNet models.
Step 6 Obtain the MBConv and Fused-MBConv networks from EfficientNetV2.
Step 7 For each DenseNet variant (DenseNet121, DenseNet169):
 Step 8 Add the CA and SA modules (CBAM) to DenseNet.
 Step 9 Obtain the DenseBlock networks from DenseNet.
 Step 10 Fuse the EfficientNetV2 and DenseNet features by concat. See Eq. (3).
 Step 11 Generate DAFLNet-1 and DAFLNet-2 based on the concat fusion model.
 End
Step 12 Generate DAFFM.
Phase III: WDFM
Step 13 Obtain the EfficientNetV2, DenseNet, and fusion features from DAFFM.
Step 14 Obtain the three classifier losses $L_1$, $L_2$, and $L_3$.
Step 15 Obtain the weights $w_1$, $w_2$, and $w_3$ according to PNSA (Algorithm 1).
Step 16 Combine the weighted losses to obtain the total loss $L$. See Eq. (5).
Step 17 Compute the test confusion matrix and calculate the indicators.
Step 18 Output: the DAFLNet model and its performance.
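
As a concrete illustration of Phases II and III, the following PyTorch sketch assembles a DAFLNet-1-style model from torchvision backbones. It is a simplified sketch under stated assumptions: CBAM, MDA, and the paper's exact classifier heads are omitted, and the feature dimensions (1280 and 1024) follow the torchvision implementations rather than the paper.

```python
# Sketch: two backbones extract features in parallel; the concatenated
# features feed the fusion classifier, each backbone feeds an auxiliary
# classifier, and the three losses are weighted (0.4, 0.4, 1.0) per PNSA.
import torch
import torch.nn as nn
import torchvision.models as models

class DAFLNetSketch(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.eff = models.efficientnet_v2_s(weights="DEFAULT").features
        self.dense = models.densenet121(weights="DEFAULT").features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.aux1 = nn.Linear(1280, num_classes)           # EfficientNetV2-S dim
        self.aux2 = nn.Linear(1024, num_classes)           # DenseNet121 dim
        self.fusion = nn.Linear(1280 + 1024, num_classes)  # fusion classifier

    def forward(self, x):
        f_e = self.pool(self.eff(x)).flatten(1)
        f_d = self.pool(self.dense(x)).flatten(1)
        f_fuse = torch.cat([f_e, f_d], dim=1)              # feature-level fusion
        return self.aux1(f_e), self.aux2(f_d), self.fusion(f_fuse)

def dafl_loss(outputs, target, w=(0.4, 0.4, 1.0)):
    """Weighted decision-level loss of Eq. (5)."""
    ce = nn.CrossEntropyLoss()
    y1, y2, y3 = outputs
    return w[0] * ce(y1, target) + w[1] * ce(y2, target) + w[2] * ce(y3, target)
```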

3. Experimental Analysis and Results

3.1. Experiment Platform

In this experiment, the Python language was used for implementation and data preprocessing. The hardware utilized was a 32 GB Tesla V100 graphics card and an Intel® Xeon® CPU E5-2698 v4 @ 2.20 GHz processor running the Linux operating system. Libraries such as NumPy and the PyTorch deep learning toolbox were used. The learning rate (LR), batch size (BS), number of epochs, optimizer, and dropout rate (DR) made up the tuned hyperparameters of the model. The values listed in Table 2 produced the optimal experimental results. When the loss stopped diminishing, the LR was decreased to 0.1 of its original value. The training process was terminated when the training loss failed to improve for 3 consecutive iterations.
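
Assuming this schedule corresponds to standard plateau-based LR decay and early stopping, it can be sketched with PyTorch utilities as follows; `train_one_epoch` is a hypothetical helper, and the hyperparameter values are placeholders for those in Table 2.

```python
# Sketch: decay LR to 0.1x when the loss plateaus, and stop when the loss
# fails to improve for 3 consecutive epochs.
import torch

def fit(model, train_one_epoch, max_epochs=100, patience=3):
    # train_one_epoch(model, optimizer) returns the epoch's training loss
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1)
    best, bad = float("inf"), 0
    for _ in range(max_epochs):
        loss = train_one_epoch(model, optimizer)
        scheduler.step(loss)              # LR -> 0.1x on plateau
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad >= patience:           # early stopping after 3 bad epochs
                break
    return model
```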

3.2. Data Augmentation

To mitigate any potential overfitting, multiway data augmentation (MDA) [6] was utilized. As shown in Figure 6, we used four ways to enhance the training data. Furthermore, through the random hold-out (RHO) method, the resized dataset was partitioned into three sections, the training set $A$, the validation set $B$, and the test set $C$:

$I_{\mathrm{resized}} = A \cup B \cup C$, (6)

where $A$, $B$, and $C$ are pairwise disjoint and the sizes of these subsets satisfy $|A| : |B| : |C| = 7 : 1 : 2$, where $|\cdot|$ refers to the cardinality of a set.

Assuming that there are $q$ MDA techniques and $m$ images are generated using each technique, eventually $q \times m$ new images are generated. The following four MDA methods were used in this study ($q = 4$).

3.2.1. Noise Injection (NI)

Gaussian noise was injected into all the images of the training set, thereby generating new noisy images: $A_{\mathrm{NI}} = f_{\mathrm{NI}}(A)$, where $f_{\mathrm{NI}}$ denotes the noise injection function.

3.2.2. Rotation (Rot)

The rotation angle $\theta$ was applied to the images: $A_{\mathrm{Rot}} = f_{\mathrm{Rot}}(A)$, where $f_{\mathrm{Rot}}$ corresponds to the rotation function.

3.2.3. Gamma Correction (GC)

The gamma correction factor $r_{\mathrm{GC}} = 1.5$ was used to produce new images as follows: $A_{\mathrm{GC}} = f_{\mathrm{GC}}(A)$, where $f_{\mathrm{GC}}$ refers to the gamma correction function.

3.2.4. Mirror (Mir)

$A_{\mathrm{Mir}} = f_{\mathrm{Mir}}(A)$, where $f_{\mathrm{Mir}}$ is the mirroring function.
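
Putting the four operations together, a minimal NumPy/Pillow sketch follows. The gamma factor 1.5 comes from the text, while the noise standard deviation and rotation angle are illustrative assumptions.

```python
# Sketch of the four MDA operations on one uint8 grayscale image array.
import numpy as np
from PIL import Image

def noise_injection(img, sigma=10.0):
    # sigma is an assumption; the paper does not state the noise level here
    noisy = img.astype(np.float32) + np.random.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def rotate(img, angle=10):
    # angle is an assumption; the paper's rotation angle is not given here
    return np.asarray(Image.fromarray(img).rotate(angle))

def gamma_correction(img, r_gc=1.5):
    return (255.0 * (img / 255.0) ** r_gc).astype(np.uint8)

def mirror(img):
    return img[:, ::-1]  # horizontal flip

def mda(img):
    """Apply all four augmentations, yielding four new images per input."""
    return [noise_injection(img), rotate(img), gamma_correction(img), mirror(img)]
```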

3.3. Evaluation Metrics

To assess the performance of the proposed DAFLNet method, various indicators were computed, namely, accuracy (Acc), precision (Pre), sensitivity (Sen), specificity (Spe), recall (Rec), and F1-score (F1). The true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) were used to characterize the model's diagnoses of CXRs. A TP indicates that both the real category of the sample and the recognition result are positive. A FN denotes a positive real category identified as negative by the model. Conversely, a FP refers to a negative real category recognized as positive by the model. Finally, a TN indicates a negative real sample category correctly recognized as such. The corresponding equations are expressed below:

$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$, $\mathrm{Pre} = \frac{TP}{TP + FP}$, $\mathrm{Sen} = \mathrm{Rec} = \frac{TP}{TP + FN}$, $\mathrm{Spe} = \frac{TN}{TN + FP}$, $\mathrm{F1} = \frac{2 \times \mathrm{Pre} \times \mathrm{Rec}}{\mathrm{Pre} + \mathrm{Rec}}$.
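
These standard definitions translate directly into code; per-class values computed this way are averaged for the multiclass results reported below.

```python
# Standard classification metrics from one class's TP/TN/FP/FN counts.
def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }
```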

3.4. Feature-Fusion Methods and Decision-Level Fusion Weight

In our experiments, a comparative study was performed to determine the fusion method. Both concat and add fusion were performed for DAFLNet-1 and DAFLNet-2, and the corresponding results from the experiment are given in Figure 7. Generally, the concat results yielded better performance compared to add, so the concat method was chosen for the ensuing experiments.

The weights of decision-level fusion determine the performance of the network. Experiments were conducted using DAFLNet-1 and DAFLNet-2 to determine the optimal value of $w_1 = w_2$ experimentally. The results are shown in Figure 8. The best performance is achieved when $w_1 = w_2 = 0.4$, with DAFLNet-2 exhibiting better performance than DAFLNet-1.

3.5. Proposed DAFLNet

In order to determine the optimal fusion strategy, the two schemes DAFLNet-1 and DAFLNet-2 were tested under different configurations. This process comprised a total of 9 stages (M1–M9), as shown in Table 3.

3.6. Experimental Results
3.6.1. Effect of DAFLNet

The experiment was conducted five times, and the best results on the test set are shown in Table 4. M4 and M5 were more accurate than M1, M2, and M3, highlighting that the fused networks were generally more effective than the individual networks. M6 and M7 were built on M4 and M5 by adding the convolutional block attention module (CBAM) to the DenseNet branch. The experimental results show that the performance of the networks was improved with the addition of CBAM. Figure 9 illustrates the overall performance of the nine networks visually. Finally, the performance of the network was further improved through the addition of weighted decision-level fusion, with DAFLNet-1 achieving a classification accuracy of 98.86%.

3.6.2. Classification Performance Validation

To illustrate the classification of the 921 CXRs in the validation set, confusion matrices [27] were created. Figures 10 and 11 show the confusion matrices of the proposed DAFLNet-1 and DAFLNet-2 models.

Table 5 demonstrates the overall performance of DAFLNet-1 and DAFLNet-2 for the validation set, revealing an average accuracy of 99.61%, 98.63%, and 98.41% for COVID-19, pneumonia, and normal X-ray images for DAFLNet-1, respectively. The average accuracy in DAFLNet-2 was 99.56%, 98.52%, and 98.31% for COVID-19, pneumonia, and normal X-ray images, respectively. In conclusion, the DAFLNet proposed in this paper performed well in terms of accuracy, recall, and F1-score even with unbalanced data.

3.6.3. Comparison with State-of-the-Art Approaches

To demonstrate the proposed method’s contribution, the proposed DAFLNet method was compared with 8 state-of-the-art approaches: ECOVNet-EfficientNetB3 base [28], BCNN_SVM [29], COVNet [30], DTL-V19 [9], DenseNet121 [31], Resnet50V2 [32], Xception [33], and MobileNetV2 [34]. All methods were evaluated on the same dataset, and the results shown are the mean and standard deviation of 5 runs. Table 6 illustrates the relevant comparison results.

Compared to the other methods, the strategy proposed in this work is unique. BCNN_SVM used BCNN bilinear fusion of two deep learning networks, VGG16 and VGG19, to extract features and then used an SVM to classify for the presence of COVID-19. Since both fused networks were based on VGG, the extracted features were similar, and the resulting accuracy was lower than that of the proposed DAFLNet. Additionally, DTL-V19 used deep transfer learning to adapt VGG19 to COVID-19 classification. This resulted in fewer training parameters; nevertheless, the network had fewer layers, resulting in limited performance and overfitting. ECOVNet-EfficientNetB3 and COVNet use EfficientNetB3 and ResNet50, respectively, as pretrained networks for classification. These are single-network models, so their performance is inferior to that of the proposed network owing to the absence of feature fusion. Furthermore, the proposed method was compared with the currently popular classification networks DenseNet121, Resnet50V2, Xception, and MobileNetV2, and the results indicate that these networks are not as effective as DAFLNet.

DAFLNet takes into account both feature extraction and disease classification, using two networks to extract features for fusion, thus improving feature extraction capability, while the weighted classification results of three classifiers improve its classification capability. Among all methods, the proposed DAFLNet achieved the best results: DAFLNet-1 achieved an average accuracy of 98.56%, and DAFLNet-2 achieved an average accuracy of 98.41%. The high accuracy was mainly achieved through the coordinated feature fusion and attention mechanisms, the use of the newly proposed EfficientNetV2 as the backbone network, and the utilization of an improved classifier, the effectiveness of which is demonstrated by the experimental results. In addition, the proposed multiway data augmentation prevents model overfitting, further improving performance.

4. Conclusions

In this paper, the dual asymmetric feature learning network (DAFLNet) is proposed, which achieved an average accuracy of 98.56%. From the performance perspective, the network can accurately distinguish COVID-19, pneumonia, and healthy images. To validate the performance of DAFLNet, it was compared with 8 state-of-the-art methods. The experimental results showed that DAFLNet achieved the best detection performance in terms of accuracy, precision, sensitivity, and F1-score. From the perspective of computational efficiency, the fusion of two deep learning networks together with CBAM results in longer training times, which is a disadvantage. However, the trained model can diagnose the disease quickly, with a diagnostic speed similar to those of the other methods. In terms of generalization ability, the proposed method can extract more disease features for diagnosis. With regard to practicality, DAFLNet can be deployed on hospital servers to assist doctors in faster and more accurate diagnosis, making a significant contribution to society and hospitals.

The comparison of this method with other methods illustrates its effectiveness. However, the method has not been validated on CT images. Furthermore, the proposed method involves a large number of parameters due to the fusion of two networks. Thus, future research will be directed towards reducing the number of parameters and experimenting on different types of datasets.

Data Availability

The data that support the findings of this study are open at https://data.mendeley.com/datasets/9xkhgts2s6/1.

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This work is supported by the Changchun Science & Technology Development Project (No. 21ZY35).