Abstract

Gastrointestinal (GI) diseases, particularly tumours, are among the most widespread and dangerous diseases and therefore require timely health care for early detection to reduce deaths. Endoscopy is an effective technique for diagnosing GI diseases, but it produces a video containing thousands of frames, and it is difficult and time-consuming for a gastroenterologist to analyse them all. Artificial intelligence systems address this challenge by analysing thousands of images with high speed and effective accuracy. Hence, systems based on different methodologies are developed in this work. The first methodology diagnoses endoscopy images of GI diseases using VGG-16 + SVM and DenseNet-121 + SVM. The second methodology diagnoses endoscopy images of GI diseases with an artificial neural network (ANN) based on features fused from VGG-16 and DenseNet-121 before and after dimensionality reduction by principal component analysis (PCA). The third methodology also uses an ANN and is based on features fused between VGG-16 and handcrafted features and between DenseNet-121 and handcrafted features. Herein, the handcrafted features combine those of the gray level co-occurrence matrix (GLCM), discrete wavelet transform (DWT), fuzzy colour histogram (FCH), and local binary pattern (LBP) methods. All systems achieved promising results for diagnosing endoscopy images of the gastroenterology data set. Based on the fused features of DenseNet-121 and the handcrafted features, the ANN reached an accuracy, sensitivity, precision, specificity, and AUC of 98.9%, 98.70%, 98.94%, 99.69%, and 99.51%, respectively.

1. Introduction

Cancer is the second leading cause of death and a hindrance to life worldwide [1]. The gastrointestinal (GI) tract is affected by various diseases and abnormalities, such as ulcers, bleeding, and benign and malignant polyps [2]. In 2018, a total of 5 million cases of GI cancer were diagnosed, and 3.6 million people died of GI cancer. The GI tract comprises the upper GI tract (oesophagus and stomach) and the lower GI tract (colon and rectum). Colorectal cancer is one of the most common GI cancers, accounting for 6.1% of all cancers, while stomach and oesophageal cancers account for 5.7% and 3.2%, respectively [1]. An early diagnosis is necessary before GI diseases develop into malignant diseases, and there are vital indicators for diagnosing GI disorders. However, the death rate remains high, which indicates a failure of early diagnosis to deliver appropriate treatment and reduce deaths [3]. Polyps affect the colon and stomach and result from abnormal cells that grow slowly [4]. Polyps are curable if diagnosed early; however, symptoms do not appear until the polyp becomes large, and it can turn into cancer if it is not diagnosed early and treated appropriately [5].

Some endoscopic techniques cause pain when detecting benign and malignant tumours, ulcers, and bleeding. In 2000, wireless capsule endoscopy (WCE) was introduced for internal examination of the GI tract [6]. WCE detects GI diseases with a tiny camera measuring 11 mm × 30 mm. The technique records for more than 2 hours, producing a video divided into thousands of frames [7]. In 2019, 1 million people were examined with WCE technology [8]. Nevertheless, this technique faces challenges such as the large number of images per patient, its time-consuming nature, and the lack of experts. In addition, ulcers, polyps, and bleeding appear in only a few frames and are absent from most, which causes incorrect manual diagnoses [9]. The similarity of texture, shape, and colour is an obstacle for doctors in distinguishing between disease types. Thus, computer-aided diagnosis can solve these challenges and provide physicians with the proper identification of the type and location of the lesion in endoscopy images [10], improving the diagnosis and reducing its duration. Researchers have focused their efforts on diagnosing GI diseases using artificial intelligence techniques, and automatic early diagnosis of GI diseases remains an area under study aiming for satisfactory accuracy in detecting early lesions [11]. In recent years, deep learning techniques have proven superior in the classification, segmentation, and deep feature extraction for each type of GI disease [12]. However, a challenge facing deep learning is the lack of images in data sets, which is solved by data augmentation techniques [13, 14]. Moreover, deep learning techniques have been shown to outperform feature-extraction machine learning algorithms [15], making deep learning one of the best image recognition techniques [16]. Furthermore, deep learning models apply convolutions that merge spatial and channel information to extract deep features, a process called spatial encoding. Hence, in this study, features were extracted by several hybrid approaches because of the morphological similarity of GI diseases.
The features were extracted from more than one CNN model and combined to obtain more representative features. A hybrid technique has been used to combine handcrafted features with deep features of CNN models to obtain more efficient and representative feature vectors for each disease.

The most important contributions in this work are as follows:

(i) Two consecutive filters are used to enhance endoscopy images of the gastroenterology dataset.
(ii) Essential features are selected to reduce the dimensions of the VGG-16 and DenseNet-121 features by PCA.
(iii) The last classification layer of the VGG-16 and DenseNet-121 models is replaced with the SVM algorithm.
(iv) Endoscopy images of GI diseases are analysed by incorporating the features of the VGG-16 and DenseNet-121 models before and after using PCA and diagnosing them by ANN.
(v) Endoscopy images of GI diseases are diagnosed by ANN with fused features of the CNN and the handcrafted methods.

The rest of the paper is organised as follows: previous studies related to GI diseases are analysed and discussed in Section 2. Endoscopy images of GI diseases are analysed by the applied methods and materials in Section 3. The performance of the methodologies is summarised in Section 4. The performance of all proposed methodologies is discussed and compared in Section 5. The conclusions of the research are presented in Section 6.

2. Related Works

Khan et al. proposed two deep learning models with the Moth-Crow optimisation method for diagnosing GI diseases. Firstly, image contrast was increased; then, data augmentation and pretrained deep learning models were applied. The features of each image were extracted, combined by the distance-canonical-correlation method, and fed into a machine learning algorithm for classification [17]. Meanwhile, Thambawita et al. presented machine learning algorithms to classify universal features and deep learning models to classify different types of GI diseases; the systems were trained on one data set and evaluated on another [18]. Yogapriya et al. applied three CNNs to classify the gastroenterology data set. The images were optimised, and data augmentation was used to overcome the lack of images in the data set. The VGG16 model obtained the best accuracy and recall of 96.33% and 96.37%, respectively [19]. In addition, Lonseko et al. proposed a CNN to extract spatial features using encoding and decoding layers, with data augmentation used to balance the data set. GoogLeNet and ResNet50 reached accuracies of 91.38% and 90.28%, respectively [20]. Dheir and Abu-Naser proposed five CNN models to classify the Kvasir dataset for detecting GI diseases; image processing was applied to remove unwanted artifacts, and data augmentation was used to obtain enough images. The ResNet and Inception-v3 models achieved 92.3% and 90% accuracy, respectively [21]. Wan et al. presented the YOLOv5 model to detect polyps based on a self-attention method; self-attention is activated during feature extraction to enhance channels rich in information and suppress channels carrying little information. The YOLOv5 model achieved a precision and recall of 91.5% and 98.9%, respectively [22]. Nogueira-Rodríguez et al. proposed a system trained on one dataset and generalised to test other data sets; the system achieved an F1 score of 91% on an internal data set [23]. Meanwhile, Dulf et al. proposed a two-network approach: a pretrained deep learning network classified GI diseases with a sensitivity of 98.13%, while the second network, based on extracting the lesion and isolating it from the healthy region, achieved a Jaccard index of 75.18% [24]. Mohammad and Al-Razgan proposed DenseNet-201 and Inception-v3 to extract and classify the features of GI diseases. Firstly, the data were augmented to improve the system during training, the deep features were extracted, and the features of the two models were merged using the dragonfly optimisation method before classification of the fused features [25]. Khan et al. presented a system to diagnose GI diseases based on a combination of features: VGG16 was trained for extracting and integrating features by a matrix-based method, the features were selected by the PSO method with a fitness function, and the cubic SVM algorithm classified the selected features [26]. Khan et al. also proposed a CNN model to detect GI diseases. The ulcer area was segmented with a modified mask, and the dataset was then trained using an RCNN to obtain the ulcer area; the features were extracted by ResNet101 and optimised by Grasshopper optimisation with a fitness function [27]. Meanwhile, Öztürk and Özkaya used a CNN to classify the gastroenterology data set: the pooling-layer feature maps were provided to an LSTM and then classified by combining all LSTM layers [28]. Okimoto et al. proposed the ResNet50 model, trained on images of a GI dataset and tested on independent endoscopy images; for active cases of EoE, the system achieved an accuracy and sensitivity of 94.7% and 90.8%, respectively [29].

Various researchers have focused their efforts on reaching satisfactory accuracy in distinguishing between GI diseases. Considering the similarity of features between various early-stage GI diseases, this study focuses on applying various methods to extract features and fuse them to obtain effective features. Deep features were extracted from more than one deep learning model and incorporated into the same feature vectors. In addition, CNN features were integrated with the handcrafted features extracted by the GLCM, DWT, FCH, and LBP algorithms.

3. Materials and Methods

This section describes the methodologies and materials employed to analyse endoscopy images of GI diseases, as shown in Figure 1. All GI images were subjected to optimisation techniques and fed into the VGG-16 and DenseNet-121 models. The first methodology for diagnosing endoscopy images of GI diseases used VGG-16 + SVM and DenseNet-121 + SVM. The second methodology integrated the features of VGG-16 and DenseNet-121 before and after applying PCA and diagnosed them by an ANN. The third methodology integrated the features of VGG-16 and DenseNet-121 separately with the handcrafted features and then diagnosed them by an ANN.

3.1. Kvasir Dataset

This study assessed the systems on the Kvasir dataset for diagnosing GI diseases. The images of the Kvasir dataset were collected with endoscopic equipment from four hospitals of Vestre Viken Health Trust (VV) in Norway. The images were then carefully evaluated by endoscopic medical experts from VV and by Cancer Registry of Norway experts from the South East Norway Health Authority, which screens for cancer to prevent its progression. The dataset consists of 8,000 images divided equally among eight classes of GI findings: 1,000 images each of dyed-lifted-polyps, dyed-resection-margins, esophagitis, normal-cecum, normal-pylorus, normal-z-line, polyps, and ulcerative-colitis [30]. The images have varying resolutions between 720 × 576 and 1920 × 1072 pixels. Figure 2(a) shows a set of images randomly selected by the system to represent all classes of the Kvasir dataset.

3.2. Enhancement of Endoscopy Images

The endoscopy images of gastrointestinal disease contained noise and artifacts such as bubbles, fluid, and residues of stool in the colon or food in the oesophagus, as well as low contrast, particularly in ulcer images, all of which hinder deep learning models from achieving satisfactory diagnostic accuracy. Therefore, the images were enhanced by removing artifacts and increasing the contrast of the affected low-contrast regions. In this study, the average of each RGB channel was calculated, an average filter was applied to remove artifacts, and a Laplacian filter was applied to increase low contrast [31].

The average filter was used to improve endoscopy images of the GI tract. The filter was assigned a window of pixels; in each iteration, it selected a pixel from the image, calculated the average of its 24 contiguous pixels, and replaced the pixel with the average of its neighbours [32]. The process continues until every pixel in the image has been replaced, as in the following equation:

$$F(x) = \frac{1}{M}\sum_{i=1}^{M} y(i), \tag{1}$$

where $F(x)$, $y(i)$, and $M$ denote the enhanced image, the input pixels of the previous image, and the number of pixels in the window, respectively.

The Laplacian filter was used to increase the contrast of the low-contrast affected areas, as in the following equation:

$$\nabla^2 F = \frac{\partial^2 F}{\partial x^2} + \frac{\partial^2 F}{\partial y^2}, \tag{2}$$

where $\nabla^2$ refers to the second-order differential operator and $x$ and $y$ denote the two spatial dimensions.

The endoscopy image enhanced by the averaging filter was subtracted from the endoscopy image enhanced by the Laplacian filter to obtain the final enhanced image, as in the following equation:

$$I_{\text{final}} = I_{\text{Laplacian}} - I_{\text{average}}. \tag{3}$$

Finally, the enhanced images were obtained as shown in Figure 2(b).
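For illustration, the following is a minimal sketch of this enhancement pipeline, assuming OpenCV as the implementation library and a 5 × 5 averaging window (the 24 neighbours plus the centre pixel); the file names and the exact subtraction order are assumptions based on the description above, not the authors' code.

```python
# A minimal sketch of the enhancement step, assuming OpenCV and a
# 5x5 averaging window (24 neighbours + centre). File names are placeholders.
import cv2
import numpy as np

img = cv2.imread("endoscopy_frame.png")  # hypothetical input frame

# Average filter: replace each pixel with the mean of its 5x5 neighbourhood.
avg = cv2.blur(img, (5, 5))

# Laplacian filter: emphasise edges in low-contrast regions.
lap = cv2.Laplacian(img, ddepth=cv2.CV_16S, ksize=3)

# Combine the two filtered images as in equation (3); the subtraction
# order follows the description in the text and is an assumption.
enhanced = cv2.subtract(lap, avg.astype(np.int16))
enhanced = cv2.convertScaleAbs(enhanced)  # back to the 8-bit range

cv2.imwrite("enhanced_frame.png", enhanced)
```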

3.3. Hybrid Approach between Deep Learning and SVM

Deep learning models take a long time to train a dataset and require expensive computers. Moreover, standalone deep learning models did not achieve satisfactory results for diagnosing GI diseases. The hybrid approach, VGG-16 + SVM and DenseNet-121 + SVM, addresses these challenges [33]. In this approach, VGG-16 and DenseNet-121 first extract features from the endoscopy images; because these models produce high-dimensional features, the PCA algorithm then reduces the dimensionality while retaining the essential features. Finally, an SVM classifier classifies the low-dimensional features.

3.3.1. Deep Feature Extraction

CNN models have great abilities to serve humanity in various areas, such as healthcare and medical image diagnosis. They extract deep features with high precision and without manual assistance. In this study, the enhanced endoscopy images of the GI dataset were passed to the VGG-16 and DenseNet-121 models, flowing through several layers, each performing complex calculations for a specific task. The layers of CNN models consist of neurons linked by millions of weights and connections, and the features of endoscopy images are extracted by convolutional, pooling, and auxiliary layers [34].

Convolutional layers are among the crucial layers of a CNN for extracting the deep features of the input images [35]. The number of convolutional layers differs from one model to another, and each layer has a precise assignment: one layer detects edges, another extracts geometric features, another extracts colour features, another extracts shape features, and so on. Each convolutional layer has three parameters: the filter size, the stride (the number of steps the filter moves over the image at a time), and zero padding. The filter f(t) is convolved with the endoscopy image x(t) according to the filter size [36]. Features are extracted each time based on the filter's size [37], and the procedure is repeated until all features are extracted, as in equation (4). The stride parameter determines the number of steps the filter moves over the image, while the zero-padding parameter preserves the original image size [38]:

$$W(t) = (x * f)(t) = \sum_{a} x(a)\, f(t - a), \tag{4}$$

where $W(t)$, $f(t)$, and $x(t)$ denote the output of the layer, the filter of the layer, and the input image, respectively.
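As a toy illustration of equation (4), the following sketch convolves a small matrix with a 3 × 3 filter at stride 1 with zero padding, using SciPy; the values are illustrative and not taken from the paper.

```python
# A small numerical illustration of equation (4): a 3x3 filter sliding
# over a toy "image" with stride 1 and zero padding.
import numpy as np
from scipy.signal import convolve2d

x = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 image
f = np.array([[0, -1,  0],
              [-1, 4, -1],
              [0, -1,  0]], dtype=float)       # Laplacian-like 3x3 filter

# mode="same" keeps the output the spatial size of the input (zero padding).
w = convolve2d(x, f, mode="same", boundary="fill", fillvalue=0)
print(w.shape)  # (5, 5)
```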

Convolutional layers create millions of neurons and require complicated computations and long execution times [39]; thus, CNNs include pooling layers to address this issue. Pooling layers decrease the dimensions of the features through two methods: max and average pooling [40]. The max pooling method selects a group of pixels of the image and replaces them with the maximum value among them, as in equation (5). The average pooling method selects a group of pixels of the image, computes the average of the selected values, and keeps the average instead of the set of chosen values, as in equation (6):

$$z(m, n) = \max_{0 \le i, j < f} y(m p + i,\; n p + j), \tag{5}$$

$$z(m, n) = \frac{1}{k} \sum_{0 \le i, j < f} y(m p + i,\; n p + j), \tag{6}$$

where $m$ and $n$ refer to the location of the cell in the output matrix and $f$, $p$, and $k$ denote the filter size, the stride over the image, and the number of values in the pooling window, respectively.
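The following is a minimal NumPy sketch of equations (5) and (6), assuming a square window of size f and stride p; the 2 × 2 window and toy data are illustrative.

```python
# A minimal NumPy sketch of max pooling (equation (5)) and average
# pooling (equation (6)) with window size f and stride p.
import numpy as np

def pool2d(y, f=2, p=2, mode="max"):
    """Pool a 2-D feature map y with window size f and stride p."""
    h, w = (y.shape[0] - f) // p + 1, (y.shape[1] - f) // p + 1
    out = np.empty((h, w))
    for m in range(h):
        for n in range(w):
            window = y[m * p:m * p + f, n * p:n * p + f]
            out[m, n] = window.max() if mode == "max" else window.mean()
    return out

y = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(y, mode="max"))  # 2x2 map of window maxima
print(pool2d(y, mode="avg"))  # 2x2 map of window means
```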

The images of the GI dataset were passed through the VGG-16 and DenseNet-121 models for deep feature extraction, and the high-dimensional features were reduced using the PCA method.

The last layer in the VGG-16 and DenseNet-121 models produces high-level features of size (7, 7, 512) and (16, 32, 512), respectively. A global average pooling layer was used as the last layer of the two models to convert the high-level features into feature vectors of size 2048 for each of the VGG-16 and DenseNet-121 models. Thus, the features were saved in feature matrices of 2048 × 8000 for VGG-16 and 2048 × 8000 for DenseNet-121 and sent to the PCA algorithm. PCA removes the redundant features, keeps the most important ones, and saves them in matrices of 512 × 8000 for each of the VGG-16 and DenseNet-121 models.

3.3.2. SVM Algorithms

SVM is a classification and regression algorithm belonging to the family of machine learning classifiers. When the dataset features are fed into an SVM classifier, it creates an N-dimensional space matching the dataset features. The algorithm creates various lines, called hyperplanes, to separate the data set into multiple classes [41]. Each hyperplane has a margin separating the data set classes, and the SVM performs best when this margin is at its maximum [42]. The hyperplane is positioned based on the support vectors, which are the data points closest to the separating line between classes [43]. Furthermore, there are two kinds of SVM: the linear SVM works with linearly separable data, whereas the nonlinear SVM works with nonlinearly separable data by applying a kernel. The kernel function converts a nonseparable data set into a linearly separable one. Figure 3 illustrates the methodology for diagnosing endoscopy images of the gastroenterology data set. The VGG-16 and DenseNet-121 models receive the enhanced endoscopy images, extract features, and keep them in feature matrices of size 8000 × 2048 for each model. The high-dimensional features were reduced by the PCA method [44] and stored in two feature matrices of size 8000 × 512, one for each model. The SVM classifier receives the endoscopy image feature matrices of the VGG-16 [45] and DenseNet-121 [46] models, trains on them at high speed, and classifies them efficiently.
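A hedged end-to-end sketch of this first methodology appears below, using Keras' pretrained VGG-16 with global average pooling, scikit-learn's PCA, and an RBF-kernel SVM. Note that the standard VGG-16 produces 512-dimensional pooled features rather than the 2048 reported above, and stand-in random data replaces the Kvasir images, so this illustrates the pipeline rather than reproducing the paper's system.

```python
# Sketch of the CNN-features -> PCA -> SVM pipeline under stated assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Stand-in data; in the paper: 8000 preprocessed Kvasir frames, 8 classes.
images = np.random.rand(160, 224, 224, 3).astype("float32")
labels = np.random.randint(0, 8, size=160)

# Pretrained VGG-16 without its classifier; pooling="avg" applies global
# average pooling, yielding a 512-dim vector per image for this model.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
features = extractor.predict(images, verbose=0)   # shape (N, 512)

# Reduce dimensionality (the paper keeps 512 components of its full matrix;
# 128 is used here because the toy sample is small).
reduced = PCA(n_components=128).fit_transform(features)

X_tr, X_te, y_tr, y_te = train_test_split(reduced, labels, test_size=0.2)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```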

3.4. A Hybrid Approach Based on Integrating Features of CNN Models

Training a data set using a CNN takes a long time and requires expensive computing hardware, and diagnosing a data set using a single deep learning model does not achieve satisfactory accuracy. This section discusses a hybrid method that integrates the features of the VGG-16 and DenseNet-121 models to solve these challenges [47].

This approach consists of two proposed systems, as shown in Figure 4. The implementation steps of the first system are as follows. Firstly, the images of the GI dataset were optimised and fed into the VGG-16 model; the model extracted the features of the endoscopy images through its convolutional and pooling layers, and the features were stored in vectors of size 8000 × 2048. Secondly, the endoscopy images of the GI dataset were optimised and fed to the DenseNet-121 model, which extracted the deep features through its convolutional and pooling layers and saved them with the size of 8000 × 2048. Thirdly, the features extracted by the VGG-16 and DenseNet-121 models were fused and saved with the size of 8000 × 4096. Fourthly, the dimensions were reduced by the PCA method, after which the size became 8000 × 720. Fifthly, the 8000 × 720 feature vectors were fed into the ANN classifier for fast training, and its performance on endoscopy images of GI diseases was evaluated.

As for the second system, the first and second steps were the same as in the first system. Thirdly, the high-dimensional features of the VGG-16 and DenseNet-121 models were reduced; low-dimensional features were obtained using the PCA method and saved in vectors of size 8000 × 512 for each of the VGG-16 and DenseNet-121 models. Fourthly, the low-dimensional feature vectors of the two models were integrated to obtain new vectors of size 8000 × 1024. Fifthly, the 8000 × 1024 feature vectors were fed into the ANN classifier for fast training, and its performance on endoscopy images of GI diseases was evaluated.
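The following sketch illustrates the two fusion orders with scikit-learn's PCA, assuming the two feature matrices have already been extracted; the shapes are scaled-down stand-ins for the paper's 8000-image matrices.

```python
# Sketch of the two fusion orders: fuse-then-reduce vs. reduce-then-fuse.
import numpy as np
from sklearn.decomposition import PCA

vgg_feats = np.random.rand(800, 2048)    # stand-in for 8000 x 2048
dense_feats = np.random.rand(800, 2048)  # stand-in for 8000 x 2048

# System 1: fuse first (-> 4096 dims), then reduce (paper: 8000 x 720).
fused = np.concatenate([vgg_feats, dense_feats], axis=1)
fused_reduced = PCA(n_components=720).fit_transform(fused)

# System 2: reduce each model's features first (-> 512 each), then fuse.
vgg_reduced = PCA(n_components=512).fit_transform(vgg_feats)
dense_reduced = PCA(n_components=512).fit_transform(dense_feats)
fused_after = np.concatenate([vgg_reduced, dense_reduced], axis=1)  # 1024-dim

print(fused_reduced.shape, fused_after.shape)
```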

This approach aims to represent each GI image with features of VGG16 and DenseNet-121 models and evaluate the execution of the ANN classifier with the features combined before and after using the PCA method.

3.5. A Hybrid Approach Based on Integrating Features of CNN with Handcrafted

This section presents a novel approach for diagnosing endoscopy images of the GI dataset by an ANN fed with CNN features together with the shape, colour, geometry, and texture features extracted by the GLCM, DWT, FCH, and LBP methods, collectively called handcrafted features [48].

This approach consists of two systems for the diagnosis of GI disease and passes through several steps, as shown in Figure 5. Firstly, the endoscopy images are enhanced and fed to the VGG-16 and DenseNet-121 models separately. Each model extracts deep features from the endoscopy images of GI diseases through its convolutional and pooling layers and saves them in matrices of size 8000 × 2048 for the VGG-16 model and 8000 × 2048 for the DenseNet-121 model. Secondly, because the VGG-16 and DenseNet-121 models produce high-dimensional features, the features are reduced by the PCA method and saved in vectors of size 8000 × 512 for each model separately. Thirdly, the GLCM, DWT, FCH, and LBP methods are applied to extract the shape, colour, geometric, and texture features, respectively, and all the algorithms' features are combined into feature vectors of size 8000 × 244.

The handcrafted features, representing shape, colour, geometry, and texture, are among the most representative and vital features of each type of GI disease.

The GLCM algorithm extracts the texture features of the region of interest (ROI) in GI endoscopy images by transforming the ROI into a grey-level matrix. The algorithm examines the ROI based on spatial information, comparing each target pixel with its neighbours at a distance d and the principal angles 0°, 45°, 90°, and 135°. Based on the pixels of the region, the algorithm decides whether the texture is smooth or rough: if the pixel values are equal, the texture is smooth; if they differ, the texture is rough [49]. GLCM generates features based on statistical metrics and stores them in 8000 × 13 feature vectors.
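A possible realisation of this step with scikit-image is sketched below; the distance d = 1 and the subset of statistics are assumptions, since the paper does not enumerate its 13 GLCM metrics.

```python
# Sketch of GLCM texture features over the four principal angles.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in ROI

glcm = graycomatrix(gray, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# A subset of the statistical metrics, averaged over the four angles.
feats = [graycoprops(glcm, prop).mean()
         for prop in ("contrast", "homogeneity", "energy", "correlation")]
print(feats)
```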

The DWT algorithm analyses the endoscopy image by dividing it into four components. The algorithm works with four filter combinations, each specialised in analysing one component of the image. The low-pass (low-low) filter analyses the first component of the endoscopy image and extracts approximation coefficients, which are summarised by contrast, mean, and standard deviation. The low-high and high-low filters analyse the second and third components, respectively, and extract detail coefficients, summarised by the contrast, mean, and standard deviation of each component [50]. Meanwhile, the high-pass (high-high) filter analyses the last component and extracts detail coefficients, also summarised by contrast, mean, and standard deviation. Finally, the features of the four components are collected and saved in feature vectors of size 8000 × 12.
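The sketch below illustrates this step with PyWavelets, assuming a single Haar decomposition level and using the range of each sub-band as a simple contrast proxy; it reproduces the 12-feature count stated above.

```python
# Sketch of one-level 2-D DWT: four sub-bands, three statistics each.
import numpy as np
import pywt

gray = np.random.rand(256, 256)             # stand-in grey image

LL, (LH, HL, HH) = pywt.dwt2(gray, "haar")  # approximation + detail bands

features = []
for band in (LL, LH, HL, HH):
    # mean, standard deviation, and a simple contrast proxy (range)
    features += [band.mean(), band.std(), band.max() - band.min()]
print(len(features))  # 12 features, matching the 8000 x 12 matrix
```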

Colour features are among the strongest representative features of the types of GI diseases. The FCH algorithm is devoted to extracting the colour features of endoscopy images of GI diseases. The algorithm extracts local colours and represents them in histogram bins, distributing varying membership proportions to represent each colour. Any two colours in two different histogram bins are considered different even if they are similar, and each histogram bin contains similar colours even if the colours differ slightly [51]. The algorithm then checks the membership of each colour and extracts the colour features. The FCH method produces a total of 16 colour features from each image and saves them in feature vectors of size 8000 × 16.
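Because FCH implementations vary, the following is only a loose sketch of the idea: each pixel contributes to all 16 bins according to a Gaussian membership of its distance to the bin centres, rather than to a single crisp bin. The bin centres, the sigma, and the normalisation are illustrative assumptions, not the paper's exact algorithm.

```python
# Loose sketch of a 16-bin fuzzy colour histogram (illustrative only).
import numpy as np

def fuzzy_colour_histogram(img, n_bins=16, sigma=20.0):
    pixels = img.reshape(-1, 3).astype(float)            # (N, 3) RGB pixels
    rng = np.random.default_rng(0)
    centres = rng.uniform(0, 255, size=(n_bins, 3))      # assumed bin centres
    d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
    member = np.exp(-(d ** 2) / (2 * sigma ** 2))        # fuzzy memberships
    member /= member.sum(axis=1, keepdims=True) + 1e-12  # normalise per pixel
    return member.sum(axis=0) / len(pixels)              # 16-dim histogram

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
print(fuzzy_colour_histogram(img).shape)  # (16,)
```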

The LBP method acquires the surface texture features of the ROI in endoscopy images of the GI tract by converting the ROI into a grey-level matrix. The algorithm analyses the ROI to measure the local variances that describe the texture of the endoscopy images. The algorithm is configured with a neighbourhood of radius R and P neighbouring pixels. It begins by analysing the ROI, replacing each central pixel based on its 24 adjacent pixels according to the mechanism shown in equation (7). The procedure is repeated for every pixel of the ROI: each time, a central pixel is targeted and replaced with another value computed from its neighbouring pixels [52]. The LBP method yields 203 features for each image, which are saved in vectors of size 8000 × 203:

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \quad s(x) = \begin{cases} 1, & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{7}$$

where $g_c$, $g_p$, $R$, and $P$ denote the grey value of the target pixel, the grey value of the neighbouring pixels, the radius of the neighbourhood, and the number of neighbouring pixels, respectively.
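A hedged sketch with scikit-image is shown below; P = 24 matches the 24 adjacent pixels mentioned above, whereas the radius R = 3 and the binning of the LBP codes into 203 values are assumptions made only to match the stated vector length.

```python
# Sketch of LBP texture features; R = 3 and the 203-bin histogram are assumed.
import numpy as np
from skimage.feature import local_binary_pattern

gray = np.random.randint(0, 256, (256, 256), dtype=np.uint8)  # stand-in ROI

# Equation (7) with P = 24 neighbours at radius R = 3 (R assumed).
codes = local_binary_pattern(gray, P=24, R=3, method="default")

# Binned to 203 values to match the paper's vector length (scheme assumed).
hist, _ = np.histogram(codes, bins=203, density=True)
print(hist.shape)  # (203,)
```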

Fourthly, the low-dimensional features of the VGG-16 model, of size 8000 × 512, are combined with the 8000 × 244 handcrafted features to form feature vectors of size 8000 × 756, which are then fed to an ANN that categorises them into eight classes. Fifthly, the low-dimensional features of the DenseNet-121 model, of size 8000 × 512, are likewise combined with the 8000 × 244 handcrafted features to form 8000 × 756 feature vectors, which are fed to an ANN that categorises them into eight classes.
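The final fusion and classification step could look like the following sketch, with scikit-learn's MLPClassifier standing in for the paper's ANN; the feature matrices and labels are stand-in data, and the hidden-layer size is an assumption.

```python
# Sketch of fusing 512 reduced CNN features with 244 handcrafted features
# and training an ANN classifier (MLP as a stand-in for the paper's ANN).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

cnn_feats = np.random.rand(800, 512)    # stand-in reduced CNN features
handcrafted = np.random.rand(800, 244)  # GLCM + DWT + FCH + LBP features
labels = np.random.randint(0, 8, size=800)

fused = np.concatenate([cnn_feats, handcrafted], axis=1)  # (N, 756)

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.2,
                                          stratify=labels, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300).fit(X_tr, y_tr)
print("test accuracy:", ann.score(X_te, y_te))
```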

4. Experimental Results of the Performance of the Proposed Systems

4.1. Split of the GI Tract Dataset and Training Options

Several hybrid systems based on fused features were applied in this study, all aiming to achieve a promising accuracy in diagnosing the GI dataset endoscopy images. The Kvasir data set was divided into 80% for the training and validation phases (80 : 20) and 20% for the testing phase. Each class in the dataset was thus divided into 640, 160, and 200 images for training, validation, and testing, respectively.
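The split can be reproduced per class as in the following sketch, where an 80/20 split is applied twice to yield the 640/160/200 partition.

```python
# Sketch of the 80/20 split with an 80:20 train/validation split inside
# the training portion, giving 640/160/200 images per 1,000-image class.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000)  # indices of one class's images
train_val, test = train_test_split(X, test_size=0.20, random_state=0)
train, val = train_test_split(train_val, test_size=0.20, random_state=0)
print(len(train), len(val), len(test))  # 640 160 200
```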

Table 1 shows the hyperparameters of the training options for tuning the DenseNet121 and VGG16 models.

4.2. Evaluation Metrics

All the proposed systems were evaluated with the same metrics on the gastroenterology data set [53]. Each system produces a confusion matrix that supplies the variables required to calculate the system's performance. The confusion matrix is a square matrix in which the main diagonal represents the correctly classified images and the remaining cells represent the incorrectly classified images.
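The reported metrics can be derived from the confusion matrix as in the following hedged sketch, which decomposes the 8 × 8 matrix into per-class TP/FP/FN/TN counts; the macro-averaging scheme is an assumption, as the paper does not state how it averages over classes.

```python
# Sketch of deriving accuracy, sensitivity, precision, and specificity
# from a multi-class confusion matrix (macro-averaging assumed).
import numpy as np

def metrics_from_confusion(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class c but wrong
    fn = cm.sum(axis=1) - tp          # class c missed by the classifier
    tn = cm.sum() - (tp + fp + fn)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "sensitivity": np.mean(tp / (tp + fn)),  # macro-averaged recall
        "precision": np.mean(tp / (tp + fp)),
        "specificity": np.mean(tn / (tn + fp)),
    }

cm = np.random.randint(0, 50, (8, 8))  # stand-in 8x8 confusion matrix
print(metrics_from_confusion(cm))
```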

4.3. Data Augmentation

CNN models require huge data sets to obtain sufficient images during training; therefore, a lack of data represents a challenge for CNNs to achieve superior diagnostic accuracy. The data augmentation procedure solves this challenge by artificially increasing the number of endoscopy images [54]. Data augmentation comprises various operations, such as flipping, shifting, and rotating at various angles. Given that the data set is balanced, the increase in images is equal across all classes. During training, the endoscopy images of the gastroenterology data set were increased nine-fold per image, equally for each class [55]. There were 640 images per class during training before augmentation; thus, there are 6,400 images per class after augmentation, with each original image generating nine artificial images.
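A hedged sketch of such a nine-fold augmentation with Keras' ImageDataGenerator is given below; the specific operation parameters are illustrative assumptions, as the paper names only flipping, shifting, and rotation.

```python
# Sketch: each training image yields nine artificial variants via
# flips, shifts, and rotations (parameter values assumed).
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=30,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               horizontal_flip=True,
                               vertical_flip=True)

image = np.random.rand(1, 224, 224, 3)         # stand-in training image
flow = augmenter.flow(image, batch_size=1)
augmented = [next(flow)[0] for _ in range(9)]  # nine variants per original
print(len(augmented))                          # 640 -> 6,400 per class
```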

4.4. Result of the Hybrid Approach between Deep Learning and SVM

This section explains the diagnostic performance of the hybrid approaches VGG-16 + SVM and DenseNet-121 + SVM for endoscopy image diagnosis of the GI dataset. The enhanced endoscopy images are fed to the VGG-16 and DenseNet-121 models, which analyse the images and extract their features through several successive convolutional layers. The PCA algorithm receives the high-dimensional features and reduces them while retaining the essential ones. The SVM then receives the low-dimensional features, which are divided into training and testing sets, and classifies the endoscopy images into eight classes.

Table 2 summarises the performance of the VGG16 + SVM and DenseNet-121 + SVM systems for diagnosing endoscopy images. As shown, DenseNet-121 + SVM is slightly superior to VGG16 + SVM. The VGG16 + SVM system yielded an accuracy, sensitivity, precision, specificity, and AUC of 95.6%, 95.49%, 95.56%, 99.35%, and 99.02%, respectively. In contrast, the DenseNet-121 + SVM model achieved an accuracy, sensitivity, precision, specificity, and AUC of 96.4%, 96.33%, 96.45%, 99.49%, and 98.57%, respectively.

Figure 6 illustrates the execution of the VGG-16 + SVM and DenseNet-121 + SVM systems to diagnose endoscopy images of the GI dataset.

Figure 7 illustrates the performance of the two systems, VGG-16 + SVM and DenseNet-121 + SVM, through the confusion matrix for diagnosing endoscopy images of the gastroenterology dataset into eight classes. DenseNet-121 + SVM achieved an accuracy as follows: for dyed-lifted-polyps 97%, for dyed-resection-margins 99%, for esophagitis 96.5%, for normal-cecum 95.5%, for normal-pylorus 96.5%, for normal-z-line 97.5%, for polyps 96%, and for ulcerative-colitis 93.5%. In contrast, VGG-16 + SVM yielded an accuracy as follows: for dyed-lifted-polyps 92.5%, for dyed-resection-margins 96.5%, for esophagitis 95.5%, for normal-cecum 95%, for normal-pylorus 97%, for normal-z-line 98.5%, for polyps 95.5%, and for ulcerative-colitis 94%.

4.5. Result of the Hybrid Approach Based on Integrating Features of CNN Models

This section presents the performance of the ANN algorithm with the features of VGG-16 and DenseNet-121 fused before and after using the PCA method for endoscopy image diagnosis of the gastroenterology dataset. The ANN was fed with two types of feature vectors: the features extracted from VGG-16 and DenseNet-121 and merged directly, and the same features merged after dimensionality reduction, in which the feature vectors extracted by the two models were submitted to the PCA algorithm to reduce their high dimensionality.

Table 3 summarises the performance of the ANN when fed with the features of VGG-16 and DenseNet-121 combined before and after using PCA. The ANN with the VGG-16 and DenseNet-121 features merged before using PCA performed better than with the features integrated after using PCA. By integrating the features of VGG-16 and DenseNet-121 before using PCA, the ANN yielded an accuracy, sensitivity, precision, specificity, and AUC of 97.8%, 97.26%, 97.45%, 99.52%, and 98.88%, respectively.

In contrast, ANN yielded an accuracy, sensitivity, precision, specificity, and AUC of 97.3%, 95.49%, 97.33%, 99.88%, and 99.05%, respectively, by integrating the features of VGG-16 and DenseNet-121 after using PCA.

Figure 8 presents the performance of the ANN when fed with fused features between VGG-16 and DenseNet-121 before and after using PCA to diagnose endoscopy images of the gastroenterology dataset.

Figure 9 shows the performance of ANN with features of VGG-16 and DenseNet-121 models before and after using PCA through the confusion matrix to diagnose endoscopy images of the gastroenterology dataset into eight classes. An ANN with features of VGG-16 and DenseNet-121 before using PCA achieved an accuracy for each class: for class 1 (dyed-lifted-polyps) 97.5%, for class 2 (dyed-resection-margins) 99%, for class 3 (esophagitis) 98.5%, for class 4 (normal-cecum) 98.5%, for class 5 (normal-pylorus) 97.5%, for class 6 (normal-z-line) 96.5%, for class 7 (polyps) 97.5%, and for class 8 (ulcerative-colitis) 97.5%. In contrast, with fused features between VGG-16 and DenseNet-121 after using PCA, an ANN yielded an accuracy for each class: for class 1 (dyed-lifted-polyps) 95.5%, for class 2 (dyed-resection-margins) 99%, for class 3 (esophagitis) 98.5%, for class 4 (normal-cecum) 96.5%, for class 5 (normal-pylorus) 97.5%, for class 6 (normal-z-line) 98.5%, for class 7 (polyps) 97.5%, and for class 8 (ulcerative-colitis) 95.5%.

4.6. Result of the Hybrid Approach Based on Integrating Features of CNN with Handcrafted

This section summarises the diagnosis of endoscopic images of gastroenterology by the ANN when fed with two feature vectors: the first is a fusion of VGG16 and handcrafted features, and the second is a fusion of DenseNet121 and handcrafted features. The following tools were used to evaluate the ANN in diagnosing the gastroenterology dataset into eight classes.

4.6.1. Validation Checks and Gradient

The validation checks and gradient tool shows the performance of the ANN across all stages of the dataset when diagnosing endoscopy images of the gastroenterology dataset. The tool records the lowest error of the ANN by comparing the expected and actual values in each epoch. Figure 10 describes the ANN performance on endoscopy images of the gastroenterology dataset. With the fused features of VGG16 and handcrafted, the ANN recorded the lowest error when the validation checks were 6 at epoch 53 with a gradient of 0.010961. Meanwhile, with the fused features of DenseNet121 and handcrafted, the ANN recorded the lowest error when the validation checks were 6 at epoch 80 with a gradient of 0.01194.

4.6.2. Error Histogram

When diagnosing endoscopy images of gastroenterology by the ANN, the error histogram is a tool that displays the performance of the ANN during the phases of the dataset. The error histogram records the performance of the ANN based on the lowest error when comparing the expected and actual values across all instances of the dataset. It displays three colours for the evaluation of the ANN, with each colour assigned to one of the split stages: blue, green, and red represent the performance of the ANN in training, validating the data set, and testing new samples, respectively. Figure 11 shows the error histogram of the ANN performance on endoscopy images of the gastroenterology dataset. With the fused features of VGG16 and handcrafted, the ANN recorded its lowest errors within a 20-bin error histogram spanning −0.9401 to 0.9401. Meanwhile, with the fused features of DenseNet-121 and handcrafted, the ANN likewise recorded its lowest errors within a 20-bin error histogram spanning −0.9401 to 0.9401.

4.6.3. Best Validation Performance

When diagnosing endoscopy images of a gastroenterology dataset with an ANN, the best validation tool shows the ANN's performance during the dataset's phases. The tool records the performance of the ANN based on the lowest error when comparing the expected and actual values in each epoch. It shows three colours for the evaluation of the ANN, with each colour assigned to one of the splits of the dataset: blue, green, and red represent the performance of the ANN in training the data set, validating the data set, and testing new data, respectively. Figure 12 shows the best validation performance of the ANN on endoscopy images of the gastroenterology data set. With the fused features of VGG-16 and handcrafted, the ANN recorded its minimum error of 0.044408 at epoch 47. Meanwhile, with the fused features of DenseNet-121 and handcrafted, the ANN recorded its minimum error of 0.047531 at epoch 74.

4.6.4. Confusion Matrix

The confusion matrix is the gold standard for evaluating the performance of artificial intelligence techniques. In this work, endoscopy images of the gastroenterology dataset were diagnosed by the ANN when fed with two feature vectors, each containing fused features: the first vector with fused features from VGG16 and handcrafted, and the second with the fused features of DenseNet121 and handcrafted. The ANN trained on the fused features, evaluated its performance, and classified all endoscopy images of GI diseases into eight classes. Figure 13 shows the resulting confusion matrices based on the fused features of the CNN models (VGG-16 and DenseNet-121) and the handcrafted features.

Based on the fusion of features between VGG-16 and handcrafted features, ANN achieved an accuracy for each class: for class 1 (dyed-lifted-polyps) 98%, for class 2 (dyed-resection-margins) 99.5%, for class 3 (esophagitis) 98.5%, for class 4 (normal-cecum) 98.5%, for class 5 (normal-pylorus) 98.5%, for class 6 (normal-z-line) 99.5%, for class 7 (polyps) 98.5%, and for class 8 (ulcerative-colitis) 97.5%. In contrast, based on a fusion of features between DenseNet-121 and handcrafted features, ANN achieved an accuracy for each class: for class 1 (dyed-lifted-polyps) 98%, for class 2 (dyed-resection-margins) 99.5%, for class 3 (esophagitis) 98%, for class 4 (normal-cecum) 100%, for class 5 (normal-pylorus) 99.5%, for class 6 (normal-z-line) 99.5%, for class 7 (polyps) 98.5%, and for class 8 (ulcerative-colitis) 98.5%.

Table 4 summarises the performance of the ANN with the fused features of the CNNs and the handcrafted features for diagnosing endoscopic images of GI diseases. Based on the fused features of VGG16 and handcrafted, the ANN performed with an accuracy, sensitivity, precision, specificity, and AUC of 98.6%, 98.20%, 98.56%, 99.65%, and 99.58%, respectively. In contrast, based on the fused features of DenseNet-121 and handcrafted, the ANN performed with an accuracy, sensitivity, precision, specificity, and AUC of 98.9%, 98.70%, 98.94%, 99.69%, and 99.51%, respectively.

Figure 14 depicts the ANN performance with CNN features and handcrafted features for endoscopy image diagnosis of the gastroenterology dataset.

5. Discussion of the Performance of All Systems

In this work, various automated techniques were developed that aim to represent each image of GI diseases through features fused from more than one method. The endoscopy images were optimised with the same techniques for all systems. This work contains three different methodologies, each with two approaches. The first methodology diagnoses endoscopy images of GI diseases using the two hybrid approaches VGG-16 + SVM and DenseNet-121 + SVM, which achieved accuracies of 95.6% and 96.4%, respectively. The second methodology diagnoses endoscopy images of GI diseases using an ANN with the features of VGG-16 and DenseNet-121 fused before and after dimension reduction using PCA. Based on the features of VGG-16 and DenseNet-121 fused before using PCA, the ANN achieved an accuracy of 97.8%; based on the features fused after using PCA, the ANN achieved an accuracy of 97.3%. The third methodology diagnoses endoscopy images of GI diseases by an ANN based on the fusion of features between VGG-16 and handcrafted features and between DenseNet-121 and handcrafted features. Based on the fusion of features between VGG-16 and handcrafted features, the ANN achieved an accuracy of 98.6%; based on the fusion of features between DenseNet-121 and handcrafted features, the ANN achieved an accuracy of 98.9%.

The results of all systems in this work for diagnosing endoscopic images of the GI disease data set are summarised in Table 5 and Figure 15. The table contains the performance of each system in diagnosing each class in the dataset. The best diagnostic accuracies for the dyed-lifted-polyps and dyed-resection-margins classes, achieved by the ANN with fused CNN and handcrafted features, were 98% and 99.5%, respectively. For the esophagitis class, the best accuracy, 98.5%, was achieved by the ANN with the fused features of VGG16 and DenseNet121 and with the fused features of VGG16 and handcrafted. For the normal-cecum, normal-pylorus, normal-z-line, polyps, and ulcerative-colitis classes, the best accuracies, achieved by the ANN based on the fused features of DenseNet-121 and handcrafted, were 100%, 99.5%, 99.5%, 98.5%, and 98.5%, respectively.

6. Conclusion

In recent years, the number of deaths caused by GI tumours has increased because of the lack of healthcare resources. The endoscopic technique produces a video from which thousands of frames are extracted; however, it is difficult to track all the frames by manual diagnosis, particularly because the lesions appear in only a few images. Artificial intelligence techniques solve this challenge by diagnosing thousands of images with speed and high accuracy. In this work, several systems based on fused features were developed, with varied methodologies for diagnosing endoscopic images of GI diseases. The first methodology is a hybrid one in which features are extracted by a CNN and diagnosed by the SVM algorithm, namely VGG-16 + SVM and DenseNet-121 + SVM. The second methodology diagnoses endoscopic images of GI diseases by an ANN based on the features of the VGG-16 and DenseNet-121 models fused before and after dimension reduction through PCA. Meanwhile, the third methodology diagnoses endoscopic images of GI diseases by an ANN based on the features fused between VGG-16 and the handcrafted features and between DenseNet-121 and the handcrafted features. All the systems in this study reached a promising accuracy for diagnosing GI diseases. When fed with the fused features of DenseNet-121 and the handcrafted features, the ANN yielded an accuracy, sensitivity, precision, specificity, and AUC of 98.9%, 98.70%, 98.94%, 99.69%, and 99.51%, respectively.

One limitation we faced was the lack of images in the data set, which was overcome by the data augmentation method.

Data Availability

In this study, supporting images of the systems were collected from a publicly available dataset in the following link: https://datasets.simula.no/kvasir/#download.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank the Deputy for Research and Innovation, Ministry of Education through the Initiative of Institutional Funding at the University of Ha’il-Saudi Arabia through project number IFP-22 139.