Abstract
Detecting fires is essential to guarantee the security of buildings and forests. However, it is difficult to detect fire stages quickly and accurately in complex environments because the color, texture, and shape features of flame and smoke images vary widely. In this paper, a statistical image feature-based deep belief network (DBN) is proposed for fire detection. First, for each individual image, the statistical image features extracted from flame and smoke images in the time, frequency, and time-frequency domains are calculated to construct the training and testing samples. Then, the constructed samples are fed into a DBN to classify the multiple fire stages in complex environments. The DBN automatically learns fire features layer by layer using restricted Boltzmann machines (RBMs). Experiments on benchmark data comprising three groups of fire and fire-like images are classified by the present method, and the classification results are compared with those of the commonly used support vector machine (SVM) and convolutional deep belief networks (CDBNs) to demonstrate the superior classification accuracy of the present method.
1. Introduction
Accurate and timely detection of fires is important to prevent losses of life, property, and economic value. Recently, fire detection methods have been developed to monitor forest fires, civil infrastructure, and industrial fires [1–6]. The International Association of Fire and Rescue Services (CTIF) reported 23,535 building fire incidents in 18 cities around the world in 2017 [7]. Therefore, accurate and timely detection of fires using sensors is of great significance for protecting public safety [8].
Fire features, such as heat, gas, flame, and smoke, are the most commonly used indicators in fire detection techniques. Point sensors are the most direct and commonly used fire sensing technique in engineering applications [9–11]. However, point sensors may produce high false-positive rates when signal processing techniques are used to extract features in complicated fire environments [11]. Fire images are effective and direct representations of fires in complex outdoor environments, which has attracted numerous researchers to construct modern fire alarm systems. Smoke or flame is usually the first signal of a fire. Therefore, effective smoke or flame detection plays an important role in fire detection.
Color, location, height, and optical density are characteristic features of smoke, which is a combination of gases, airborne solids, and liquid particulates [12–15]. However, color-based smoke detection is not always reliable because gray or black smoke can be confused with nonsmoke pixels of other objects [16]. The exothermic reaction of fuel and oxidant creates flames with various colors, flickering motions, and dynamic texture features. Spatial and temporal wavelet analysis combined with Weber contrast analysis and color segmentation was presented to enhance the dynamic texture features of flames [17]. However, although flames have distinct color features, such methods fail to distinguish fire-colored objects in the background, which makes choosing a suitable color model a difficult problem. In general, once fires occur in buildings or forests, they are reflected in the observed smoke and flames with certain characteristics. However, the smoke or flame features are usually too weak to be observed due to strong background noise and other disturbances.
Traditional machine learning methods include the support vector machine (SVM) [18], fuzzy neural networks [19], the smoke segmentation-based local binary pattern Silhouettes coefficient variant (LSPSCV) [20], the color segmentation-based radial basis function (RBF) nonlinear Gaussian kernel binary SVM [21], color segmentation-based fuzzy models [22], and fire frame segmentation-based Markov random fields [23]. However, most existing fire detection studies train models on smoke or flame extracted from videos and often struggle to track more complicated smoke situations. Moreover, traditional machine learning methods only consider fire smoke or flame against an ideal background without much disturbance. Therefore, these artificial-intelligence fire detection models still struggle to learn complicated nonlinear relationships and thus have limited representation capacity.
Deep learning, a new branch of machine learning, has shown excellent ability in learning features from raw images [24]. The most distinctive property of deep learning models is their multilayer structure. With multiple hidden layers stacked hierarchically, a deep learning model can realize very complicated transformations and abstractions of raw images [25–28]. The deep belief network (DBN) is a generative deep learning model with powerful feature learning ability [29].
Pundir and Raman adopted DBNs to recognize fires accurately and robustly in a variety of scenarios, such as wildfire smoke, hill-base smoke, and indoor or outdoor smoke videos [30]. A DBN was carefully designed to extract nonlinear features for a better description of the important trends in a combustion process [31]. Kaabi et al. [32] developed a Gaussian mixture model (GMM) and the corresponding energy attitude of the smoke region based on RGB rules to preprocess smoke and flame images for DBN classification. Wang et al. [33] extended DBNs to model coal-fired boilers and predict NOx emissions. The main advantage of such intelligent diagnosis solutions is that the DBN does not rely on manual feature extraction and selection.
In this paper, a new DBN-based fire detection method is proposed to detect fires using smoke and flame images. First, the statistical image features extracted from the raw images are obtained to characterize the fire status. Second, the training and testing samples are constructed from the statistical image features. Finally, a DBN is employed to identify the fire status from smoke or flame images under strong background noise and other disturbances. Compared with other commonly used methods, the classification accuracy of the proposed method is demonstrated using open-access experimental data.
2. A Basic Theory of DBN
A DBN is a neural network with multiple hidden layers, which allow it to learn complex functions that realize successive data transformation and abstraction. The main architecture of a DBN is an ensemble of stacked RBMs; every two adjacent layers compose an RBM. Here, the DBN is composed of three stacked RBMs and an output layer. The learning process of a DBN consists of two stages: one is pretraining every individual RBM layer by layer in an unsupervised manner; the other is fine-tuning the whole network with the back-propagation algorithm in a supervised manner. More details can be found in References [34–36].
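The unsupervised pretraining of each RBM is typically done with contrastive divergence. The following minimal numpy sketch shows a single CD-1 update for one binary RBM; the layer sizes, learning rate, and batch handling here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_h, b_v, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM.
    v0: (batch, n_visible) batch of visible activations in [0, 1]."""
    rng = rng or np.random.default_rng(0)
    # Positive phase: hidden probabilities and a binary sample, driven by data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling back to the visible layer.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Gradient approximation: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    return W, b_h, b_v
```

Stacking RBMs then amounts to feeding each trained layer's hidden probabilities as the visible data of the next RBM, before supervised fine-tuning of the whole stack.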
3. Statistical Image Feature-Based DBN
3.1. Overview
The diagram of the fire detection method is shown in Figure 1. The general procedure is summarized as follows:
Step 1. Predefine smoke or flame patterns.
Step 2. Collect raw images of the smoke or flame patterns using the image capture system. Raw images of unknown status are also collected with the same system.
Step 3. Calculate feature descriptors from the raw images, i.e., color moments in HSV space, statistical features of the gray-scale image, statistical features of the gray-level co-occurrence matrix, the local binary pattern histogram, and wavelet transform decomposition (WTD)-based statistical features of the gray-scale image, which are discussed in detail later.
Step 4. Construct training samples from the known smoke or flame patterns and testing samples from the unknown patterns using the feature descriptors.
Step 5. Train each individual RBM layer by layer and then fine-tune the DBN (DBN learning).
Step 6. Obtain the unknown smoke or flame patterns after the DBN learning process.

3.2. Color Descriptor
3.2.1. Color Moments in HSV Space
RGB space is the most commonly used color representation. However, researchers [37, 38] found that HSV space is closer to human visual perception and friendlier to basic visual feature extraction. HSV refers to hue (0 to 360°, H), saturation (0 to 1, S), and value (0 to 1, V). In HSV space, colors are represented as combinations of the three channels. For any channel of the HSV color space with pixel values p_j (j = 1, ..., N, where N is the number of pixels), the first-order moment (denoted as M1) is calculated by

M1 = (1/N) \sum_{j=1}^{N} p_j,

the second-order moment (denoted as M2) is

M2 = [ (1/N) \sum_{j=1}^{N} (p_j - M1)^2 ]^{1/2},

and the third-order moment (denoted as M3) is

M3 = [ (1/N) \sum_{j=1}^{N} (p_j - M1)^3 ]^{1/3}.
Figure 2(a) shows a smoke image in RGB space, and Figure 2(b) shows the same image in HSV space. The two spaces do not differ much except in a certain region of concern. Figure 2(c) shows the detection result of the smoke region in RGB space, and Figure 2(d) shows the same result in HSV space. It can be seen that HSV space performs better. Therefore, the first, second, and third moments M1, M2, and M3 of the three HSV channels generate the 9 × 1 statistical feature vector F1.
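The 9 × 1 vector F1 can be computed with a short numpy sketch. This is a hedged illustration, assuming the image has already been converted to HSV (e.g., by an image library); the channel ranges are taken as floats:

```python
import numpy as np

def color_moments_hsv(hsv):
    """First three color moments per HSV channel -> 9-dim vector F1.
    hsv: (H, W, 3) float array whose channels are already in HSV space."""
    feats = []
    for c in range(3):
        ch = hsv[:, :, c].ravel()
        m1 = ch.mean()                              # first-order moment (mean)
        m2 = np.sqrt(((ch - m1) ** 2).mean())       # second-order moment
        m3 = np.cbrt(((ch - m1) ** 3).mean())       # signed cube root of 3rd moment
        feats.extend([m1, m2, m3])
    return np.array(feats)
```

The cube root keeps the sign of the third moment, so skew toward dark or bright pixels is preserved in the feature.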

3.2.2. Statistical Features in Gray-Scale Image
Typical statistical features of a gray-scale image [39] are considered in this part, as listed in Table 1. The gray-level histogram represents the frequency distribution of pixels over gray-level bins: it counts similar pixels and stores the counts. The histogram captures the statistics of single pixels at each gray level and can reflect changes of translation, rotation, and angle in individual parts of an image. Entropy is calculated from the gray-level histogram. The 7 statistical features of the gray-scale image generate the 7 × 1 statistical feature vector F2.
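A sketch of histogram-based gray-scale statistics follows. Since Table 1 is not reproduced here, the particular set of 7 features below (mean, standard deviation, smoothness, third moment, uniformity, entropy, and kurtosis) is an assumption chosen from commonly used histogram statistics, not necessarily the paper's exact set:

```python
import numpy as np

def gray_stats(img, bins=256):
    """Seven histogram-based statistics of a gray-scale image -> vector F2.
    img: 2-D float array with values in [0, 1]."""
    hist, _ = np.histogram(img.ravel(), bins=bins, range=(0, 1))
    p = hist / hist.sum()                         # normalized histogram
    levels = (np.arange(bins) + 0.5) / bins       # bin centers
    mean = (levels * p).sum()
    var = ((levels - mean) ** 2 * p).sum()
    std = np.sqrt(var)
    smooth = 1 - 1 / (1 + var)                    # smoothness measure R
    mu3 = ((levels - mean) ** 3 * p).sum()        # third moment
    unif = (p ** 2).sum()                         # uniformity (energy)
    ent = -(p[p > 0] * np.log2(p[p > 0])).sum()   # entropy in bits
    kurt = ((levels - mean) ** 4 * p).sum() / var ** 2 if var > 0 else 0.0
    return np.array([mean, std, smooth, mu3, unif, ent, kurt])
```

For a flat (constant) image the histogram collapses to one bin, so uniformity is 1 and entropy is 0, which matches the intuition that such an image carries no texture information.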
3.3. Texture Descriptor
3.3.1. Gray-Level Co-Occurrence Matrix
Texture can be characterized by the direction, adjacent interval, and range of variation of an image. There is a certain gray-level relationship between two pixels separated by a given distance in image space, i.e., the spatial correlation characteristic of gray levels in the image. The gray-level co-occurrence matrix (GLCM) [40] describes such relations by measuring the co-occurrence frequencies among pairs of gray-level pixel values. The GLCM is obtained from the statistics of pixel pairs at a given distance in the image. The elements on the diagonal of the GLCM tend to have larger values when the image is composed of blocks of pixels with similar gray values, while the off-diagonal elements have larger values when the image pixels vary locally.
The GLCM reflects comprehensive information about the gray-level co-occurrence of an image. Through the GLCM, the local patterns and arrangement rules of the image can be analyzed. To describe texture, statistics are usually computed from the GLCM. Table 2 gives the five typical GLCM statistics, which form the 5 × 1 statistical feature vector F3.
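A minimal numpy sketch of the GLCM and five commonly used statistics follows. The exact five statistics of Table 2 are not reproduced here, so contrast, correlation, energy, homogeneity, and entropy are assumed; the offset (one pixel to the right) and the number of gray levels are likewise illustrative:

```python
import numpy as np

def glcm_features(img, levels=8):
    """GLCM for a horizontal one-pixel offset plus five common statistics
    (contrast, correlation, energy, homogeneity, entropy) -> vector F3.
    img: 2-D float array with values in [0, 1]."""
    q = np.minimum((img * levels).astype(int), levels - 1)   # quantize gray levels
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):    # count co-occurring pairs
        glcm[a, b] += 1
    p = glcm / glcm.sum()                                    # normalize to probabilities
    i, j = np.indices((levels, levels))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    si = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sj = np.sqrt(((j - mu_j) ** 2 * p).sum())
    contrast = ((i - j) ** 2 * p).sum()
    corr = (((i - mu_i) * (j - mu_j) * p).sum() / (si * sj)) if si * sj > 0 else 1.0
    energy = (p ** 2).sum()
    homog = (p / (1 + np.abs(i - j))).sum()
    ent = -(p[p > 0] * np.log2(p[p > 0])).sum()
    return np.array([contrast, corr, energy, homog, ent])
```

In practice, scikit-image's `graycomatrix`/`graycoprops` provide the same quantities with configurable distances and angles; the loop above is kept explicit for clarity.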
3.3.2. Local Binary Pattern Histogram
The local binary pattern (LBP) is an operator used to describe the local texture features of an image [41]. It is multiresolution and invariant to gray scale and rotation; therefore, it is a good measure for image feature extraction, as shown in Figure 3. LBP is often combined with histograms to represent images as feature vectors, i.e., LBPH. As a visual descriptor, the LBP can be obtained directly with the scikit-image package [42], which requires at least two parameters: the number of circularly symmetric neighbor set points and the radius of the circle. Histograms are then extracted from the generated LBP; we use 18 bins to get the final histogram of the raw image, i.e., a feature vector denoted as b1 to b18.

From Figure 3, it can be seen that different parts of the frequency histogram capture certain textures of the image. Therefore, the local binary pattern histogram is chosen here as a feature for smoke/flame detection. F4 is the 18 × 1 statistical feature vector formed by the frequency histogram.
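A simplified stand-in for the LBPH computation is sketched below. Note that 18 bins match scikit-image's 'uniform' LBP with P = 16 neighbors (which yields P + 2 output patterns); the plain 8-neighbor 3 × 3 LBP used here, binned into 18 bins, is only an approximation for illustration:

```python
import numpy as np

def lbp_histogram(img, bins=18):
    """Basic 3x3 LBP followed by a normalized histogram -> feature vector F4.
    img: 2-D float gray-scale array."""
    c = img[1:-1, 1:-1]                                    # center pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]           # 8 neighbors, clockwise
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy: img.shape[0] - 1 + dy,
                 1 + dx: img.shape[1] - 1 + dx]            # shifted neighbor view
        code += (nb >= c).astype(int) << bit               # threshold against center
    hist, _ = np.histogram(code, bins=bins, range=(0, 256))
    return hist / hist.sum()                               # normalized histogram
```

In the actual pipeline one would call `skimage.feature.local_binary_pattern(img, P=16, R=2, method="uniform")` (parameter values here are hypothetical) and histogram its output directly.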
3.4. WTD-Based Statistical Features of Gray-Scale Image
WTD can be understood as successive low-pass and high-pass filtering. An image after WTD is therefore a set of subimages containing different details of the raw image. Figure 4 shows a 3rd-level WTD: L3 is the low-resolution approximation of the original image, and H3_h, H3_v, and H3_d; H2_h, H2_v, and H2_d; and H1_h, H1_v, and H1_d are the wavelet subimages of horizontal, vertical, and diagonal details at each decomposition level. Due to its simplicity and efficiency, the Haar wavelet is widely used, especially in image processing fields such as feature detection and image compression; it is therefore chosen in the present method. Figure 5 shows the corresponding subimages after the 3rd-level WTD. In the present investigation, we focus only on the lower frequency bands, so the subimages H1_h, H1_v, and H1_d are not considered, and the final 7 subimages are L3, H3_h, H3_v, H3_d, H2_h, H2_v, and H2_d. As can be seen from Figure 5, the raw fire image and its 7 subimages show different details. We calculate the statistical features of Table 1 for each of the 7 subimages, yielding 7 feature vectors denoted as W1 to W7.
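The successive-filtering view of WTD can be sketched with the Haar wavelet in a few lines of numpy. This is an averaging variant of the Haar transform (normalization conventions differ between libraries such as PyWavelets), keeping L3 and the level-2/level-3 detail bands as described above:

```python
import numpy as np

def haar_level(img):
    """One level of 2-D Haar decomposition: returns (L, Hh, Hv, Hd)."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    L  = (a + b + c + d) / 4          # low-resolution approximation
    Hh = (a + b - c - d) / 4          # horizontal detail
    Hv = (a - b + c - d) / 4          # vertical detail
    Hd = (a - b - c + d) / 4          # diagonal detail
    return L, Hh, Hv, Hd

def wtd_subimages(img, levels=3):
    """3-level decomposition; keep L3 plus the detail bands of levels 2 and 3
    (the finest-level details are discarded, as in the text)."""
    subs = []
    L = img
    for lev in range(1, levels + 1):
        L, Hh, Hv, Hd = haar_level(L)
        if lev >= 2:                   # skip H1_h, H1_v, H1_d
            subs.extend([Hh, Hv, Hd])
    return [L] + subs                  # 7 subimages: L3 + 2 levels x 3 details
```

The statistics of Table 1 would then be computed on each of the 7 returned subimages to form W1 to W7.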


4. Experimental Investigations
4.1. Data and Evaluation Metrics
In our experiments, two case studies on smoke detection and fire detection are carried out. The benchmark data are collected from a smoke dataset [43] and a fire dataset [44]. In the smoke dataset, the first 500 images contain smoke, and the last 500 are normal scene images containing roads, trees, white curtains, lights, jets of water, and so on. In the fire dataset, the first 1048 images contain fire, and the last 1048 are normal images of natural landscapes, some of which contain sunlight that appears very similar to fire. Table 3 lists brief information on the two datasets, and Figure 6 shows some of the images. The large differences between images indicate that detecting smoke in these images is very challenging.

To evaluate the performance of the present method, we use the same evaluation metrics as in [40]: the detection rate, i.e., the true positive rate (TPR), the false alarm rate, i.e., the false positive rate (FPR), and the average accuracy rate (AAR), given as follows:

TPR = TP / (TP + FN),
FPR = FP / (FP + TN),
AAR = (TP + TN) / (TP + TN + FP + FN),

in which TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.
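The three metrics follow directly from the confusion counts; a minimal sketch:

```python
def fire_metrics(tp, tn, fp, fn):
    """TPR (detection rate), FPR (false alarm rate), and AAR
    from confusion-matrix counts."""
    tpr = tp / (tp + fn)                       # fraction of fires detected
    fpr = fp / (fp + tn)                       # fraction of normals misflagged
    aar = (tp + tn) / (tp + tn + fp + fn)      # overall accuracy
    return tpr, fpr, aar
```

For example, `fire_metrics(90, 80, 20, 10)` gives a TPR of 0.90, an FPR of 0.20, and an AAR of 0.85.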
4.2. Result and Discussion
We first apply the model to smoke detection. In this case study, we conduct an ablation study with respect to various feature combinations and a comparison study with other methods. The smoke and nonsmoke images are divided into training and testing sets with a ratio of 0.6. Features extracted from the images are fed into the deep belief network in the following 9 combinations: F1, F2, F3, F4, W, (F1 and F2), (F3 and F4), (F1, F2, F3, and F4), and (F1, F2, F3, F4, and W), where W denotes the combined wavelet feature vectors W1 to W7. Details are shown in Table 4.
Figure 7 shows the results of the ablation study over the three metrics with respect to (w.r.t.) the 9 statistical feature vectors. Figure 8 shows the average accuracy and standard deviation over the 9 statistical feature vectors F1, F2, F3, F4, W, (F1 and F2), (F3 and F4), (F1, F2, F3, and F4), and (F1, F2, F3, F4, and W). The combination of color features, texture features, and wavelet features shows the best performance.


Then, we conduct a comparison study with two other machine learning methods. One is the convolutional deep belief network (CDBN) [45]. We choose this model for comparison because it can learn hierarchical representations from raw images in an unsupervised manner without any label information, which resembles the way we extract the statistical features. The other is the support vector machine (SVM) [46]. Codes released by the authors are used. The DBN in the proposed method is simply set to 200-100-50, while the CDBN is a two-layer unsupervised feature learning model followed by a softmax classifier, whose parameters are listed in Table 5. The kernel type of the SVM is sigmoid, with gamma set to 1 and cost set to 10; the other parameters take their default values. The inputs to the models are all the same except for the CDBN.
We take 50 trials to obtain the average results of the three evaluation metrics for the three compared methods, as shown in Figure 9. The TPR, FPR, and AAR of the proposed method are 97.14%, 5.28%, and 95.88%, respectively; those of the CDBN are 87.61%, 10.64%, and 88.42%; and those of the SVM are 64.39%, 34.63%, and 64.80%. Table 6 lists the best results among the 50 trials. The best average classification accuracy rates of the three methods are 97.68%, 92.72%, and 71.25%, respectively. Compared with the CDBN and SVM models, the present statistical image feature-based DBN obtains the best performance. Comparisons with more advanced models are shown in Table 7. We acknowledge that these SOTA models have strong feature learning ability from raw images but may suffer from a large number of training parameters combined with limited training samples, as well as the various backgrounds, diverse angles, and different lighting conditions.

Fire detection is conducted next using the proposed method. As in the smoke detection case, both an ablation study and a comparison study are conducted. The fire and nonfire images are also divided into training and testing sets with a ratio of 0.6. Figure 10 shows the results of the ablation study over the three metrics w.r.t. the 9 feature combinations. The features are calculated using the color descriptor, the texture descriptor, and the WTD-based statistical features. The 9 feature combinations in this case are the same as those in the smoke detection case.

Figure 11 shows the average accuracy and standard deviation over the 9 feature combinations, and the full combination of color, texture, and wavelet features has the best performance in average accuracy.

In the comparison study of fire detection, we also take 50 trials. Figure 12 shows the average results of the three evaluation metrics. The TPR, FPR, and AAR of the proposed method are 91.68%, 9.66%, and 90.94%, respectively; those of the CDBN are 80.69%, 19.10%, and 80.79%; and those of the SVM are 69.56%, 29.96%, and 69.76%. Table 8 lists the best results among the 50 trials in the fire detection case. The best average classification accuracy rates of the three methods are 92.65%, 84.50%, and 75.06%, respectively. The present statistical image feature-based DBN outperforms the CDBN and SVM models. SOTA results are given in Table 9, which shows results similar to those in the smoke detection case.

5. Conclusion
Fires in buildings and forests are difficult to detect quickly and accurately due to complex environments. A statistical image feature-based DBN is proposed for fire detection to reduce the influence of variations in the color, texture, and shape features of flame and smoke images. Experiments using open-access benchmark data of fire and fire-like images verify the effectiveness of the present method. The relatively optimal combinations of statistical image features are analyzed using different feature combinations. The best average classification accuracy rates of the present method reach 97.68% and 92.65% for detecting smoke and fire, respectively, on the benchmark data. Compared with the two typical classification models, CDBN and SVM, the classification accuracy rate is higher by 4.96% and 26.43% for smoke detection, and by 8.15% and 17.59% for fire detection, respectively. The results show that the present method can learn effective flame and smoke features with high accuracy in complicated classification tasks. Moreover, the method is expected to detect fire stages quickly with the help of small samples of statistical image features.
Data Availability
The fire dataset is available in https://mivia.unisa.it/datasets/video-analysis-datasets/fire-detection-dataset/ (accessed on 8 September 2020).
Conflicts of Interest
The authors declare that there are no conflicts of interest.
Acknowledgments
This work was funded in part by the Zhejiang Provincial Key Research and Development Project under Grant 2020C03096 and Zhejiang Provincial Science Foundation under Grant GF19F020010.