Abstract
Multimodal biometric fusion plays an important role in the field of biometrics. This paper therefore presents a multimodal biometric fusion algorithm based on deep reinforcement learning. To reduce the influence of user behavior, users' personal characteristics, and ambient light on image data quality, data preprocessing is performed through data transformation and region segmentation of single-modal biometric images. A two-dimensional Gabor filter is used to analyze the texture of local sub-blocks, qualitatively describe the similarity between the filter and the sub-blocks, and extract the phase information and local amplitude information of multimodal biometric features. Deep reinforcement learning is used to construct classifiers for the different modal biometrics, and weighted-sum fusion of the different modalities is implemented through matching-score information, yielding the multimodal biometric fusion algorithm. The CASIA-Iris-Interval-v4 and NFBS datasets were used to test the performance of the proposed algorithm. The results show that the fused image quality is better than that of comparison algorithms, the feature extraction accuracy lies between 84% and 93%, the average classification accuracy reaches 97%, multimodal biometric classification takes only 110 ms, and multimodal biometric fusion takes only 550 ms, indicating good performance and strong practicability.
1. Introduction
With the development of biometric extraction and related technologies, biometric products are becoming increasingly mature and are gradually entering every aspect of society, bringing great changes to people's daily lives. Among them, fingerprint attendance machines, face recognition access control, iris-based customs clearance systems, and similar products have played a major role in security and brought great convenience to people's lives. However, with increasing demands on security, there is an urgent need for better biometric recognition systems that satisfy several requirements: uniqueness, universality, stability, collectability, ease of use, and security. Universality means that the trait used must be possessed by normal, healthy human beings; uniqueness means that the biometric information used is unique to each individual; stability means that the biometric trait must remain stable over a long time; collectability means that the biometric information can be collected by external equipment; and ease of use means that the system must meet users' needs for convenience [1]. Security means that the trait should not be easily left behind, so as to avoid theft and forgery by others. These requirements can be met through multimodal biometric fusion. Research on this problem contributes positively to the field of biometric extraction and has high research value; therefore, a multimodal biometric fusion algorithm is studied here [2].
With more and more attention paid to security in various fields, research on multimodal feature fusion algorithms has gradually attracted much attention. Dinh Phu-Hung [3] proposed a multimodal medical image fusion algorithm based on the marine predator algorithm (MPA) and three-scale image decomposition. The algorithm uses the Kirsch compass operator with a local energy function for detail-level fusion so that the output image retains important information, and uses the MPA to fuse the base layer by optimizing parameters so that the output image has good quality; however, the algorithm suffers from low feature extraction accuracy. Xinhua Li and Jing Zhao [4] proposed a new algorithm for multimodal medical image fusion. First, CT and MRI images are decomposed into low- and high-frequency sub-bands by NSCT multiscale geometric transformation; second, for the low-frequency sub-band, a fusion rule based on the local standard deviation is selected, while for the high-frequency sub-band, an adaptive pulse-coupled neural network model is constructed and fusion rules are set according to the cumulative firing times of its iterative operations; finally, the fused image is obtained by image reconstruction. However, the algorithm suffers from a high error rate. M. Zhu and X. Yu [5] proposed a multifeature fusion algorithm for detail enhancement of VR panoramic images. Shadow detection results are obtained using HSV color features and texture features, and the final detection results are obtained by fusion. Experimental results show that the algorithm greatly reduces the false detection rate; however, it suffers from low average classification accuracy. Geng Peng and Xiuming Sun [6] proposed a multimodal medical image fusion algorithm based on the quaternion wavelet transform. The algorithm can fuse not only CT and MR image pairs but also CT and proton density-weighted MR image pairs, as well as multispectral MR images such as T1 and T2; however, it suffers from a high average authentication time per user. Castro et al. [7] proposed a CNN-based multimodal feature fusion algorithm for gait recognition. Starting from raw pixels and simple functions derived from them, the algorithm uses advanced learning techniques to extract relevant features and fuses the raw pixel information with information from optical flow and depth maps; however, it also suffers from a high error rate. Therefore, a new multimodal biometric fusion algorithm based on deep reinforcement learning is proposed. The main contributions of this paper are as follows: (1) data preprocessing is realized through data transformation, single-modal biometric image region segmentation, and related steps, reducing the impact of user behavior, users' personal characteristics, and ambient light on image data quality. (2) A two-dimensional Gabor filter is used to analyze local texture sub-blocks, qualitatively describe the similarity between the filter and the sub-blocks, and extract the phase information and local amplitude information of multimodal biometric features, thereby improving the accuracy and efficiency of feature extraction.
(3) Deep reinforcement learning is used to construct classifiers for different modal biometrics, and weighted-sum fusion of the different modalities is implemented through matching-score information, realizing a multimodal biometric fusion algorithm that improves both fusion efficiency and fusion quality.
2. Materials and Methods
2.1. Dataset Description
To test the designed multimodal biometric fusion algorithm based on deep reinforcement learning, the CASIA-Iris-Interval-v4 and NFBS datasets are used as the experimental datasets.
The CASIA-Iris-Interval-v4 dataset includes binocular iris images of 100 users. The image size is 640 × 480 pixels; seven images were acquired for each of the left and right eyes of each person, for a total of 1400 images. The NFBS (Neurofeedback Skull-stripped) dataset includes 125 sets of brain image data; each set includes a defaced T1-weighted MR image, a skull-stripped MR image, and brain segmentation results, and the dataset's modality is T1-weighted MRI scans. In the performance test of the algorithm, the experimental parameters are set as follows: the input batch size of the convolutional neural network is 16; Adam is the optimizer; the initial learning rate is 0.01; and the probability of random inactivation (dropout) of neurons is 0.8. Each dataset is divided into a verification set, a test set, and a training set at a ratio of 1 : 2 : 7.
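For reproducibility, the sketch below shows one way to realize the stated 1 : 2 : 7 verification/test/training split and hyperparameters; the `split_dataset` helper and the random seed are illustrative assumptions, not part of the datasets or the paper.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle and split samples into training/test/verification sets (7 : 2 : 1)."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    n = len(samples)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    return (samples[:n_train],                   # training set (70%)
            samples[n_train:n_train + n_test],   # test set (20%)
            samples[n_train + n_test:])          # verification set (10%)

# Hyperparameters stated in the text.
BATCH_SIZE = 16       # input batch size of the CNN
LEARNING_RATE = 0.01  # initial learning rate (Adam optimizer)
DROPOUT_PROB = 0.8    # probability of random inactivation of neurons
```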
2.2. Preprocessing of Single-Modal Biometric Image Data
The single-modal biometric image data are preprocessed to eliminate the adverse effects of user behavior, users' personal characteristics, and ambient light. The specific preprocessing steps are data transformation and single-modal biometric image region segmentation.
Data transformation rescales the value range of single-modal biometric image data as required so as to facilitate subsequent processing [8]. The transformation method is min–max normalization, which linearly maps the original value range onto a unified target range [9]. The equation is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}\left(d_{\max} - d_{\min}\right) + d_{\min},$$

where $x'$ refers to the normalized single-modal biometric sample data; $x$ refers to the original single-modal biometric sample data; $x_{\min}$ refers to the minimum value of the sample data in the value range; $x_{\max}$ refers to the maximum value of the sample data in the value range; $d_{\min}$ refers to the minimum of the target classification range; and $d_{\max}$ refers to the maximum of the target classification range [10].
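A minimal NumPy sketch of the min–max normalization above; the function name and example values are illustrative.

```python
import numpy as np

def min_max_normalize(x, new_min=0.0, new_max=1.0):
    """Linearly map sample data x into the target range [new_min, new_max]."""
    x = np.asarray(x, dtype=np.float64)
    x_min, x_max = x.min(), x.max()          # extremes of the original value range
    scaled = (x - x_min) / (x_max - x_min)   # map into [0, 1]
    return scaled * (new_max - new_min) + new_min

# Example: rescale 8-bit pixel intensities into [0, 1].
pixels = np.array([0, 64, 128, 255])
print(min_max_normalize(pixels))  # [0.  0.2509...  0.5019...  1.]
```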
Single-modal biometric region segmentation mainly detects the edge between the background region and the single-modal biometric region; that is, the single-modal biometric contour is detected, and the single-modal biometric region is segmented from the image background [11].
Under infrared illumination, the brightness of the single-modal biometric region is often lower than that of the surrounding background. Edge detection is implemented with the Canny operator. The detection steps are as follows:
(1) A smoothing filter is constructed from the one-dimensional Gaussian function

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^{2}}{2\sigma^{2}}\right),$$

where $\sigma$ is the standard deviation of the Gaussian function [12]. The image is convolved with this filter by rows and by columns, respectively, to obtain the smoothed image $I(x, y)$.
(2) Within a 2 × 2 neighborhood, the gradient magnitude $M(x, y)$ and gradient direction $\theta(x, y)$ of $I(x, y)$ are calculated from first-order partial derivatives [13]:

$$M(x, y) = \sqrt{P_{x}^{2}(x, y) + P_{y}^{2}(x, y)}, \qquad \theta(x, y) = \arctan\frac{P_{y}(x, y)}{P_{x}(x, y)},$$

where $P_x(x, y)$ refers to the result of convolving the smoothed image by rows and $P_y(x, y)$ refers to the result of convolving the smoothed image by columns. The convolution template of $P_x$ is

$$\frac{1}{2}\begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix},$$

and the convolution template of $P_y$ is

$$\frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}.$$

(3) Nonmaximum suppression is applied to the gradient magnitude to obtain all candidate edge points in the smoothed image $I(x, y)$.
(4) The result of step (3) is segmented with a high threshold and a low threshold, respectively, to obtain two threshold edge images. In the low-threshold image, edges are collected continuously until all gaps in the high-threshold image are connected [14].
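The sketch below realizes these four steps with OpenCV's standard Gaussian smoothing and Canny implementation; the kernel size, standard deviation, and threshold values are illustrative assumptions rather than values from the paper.

```python
import cv2

def segment_biometric_region(image_path, low_thresh=50, high_thresh=150):
    """Gaussian smoothing followed by Canny edge detection with hysteresis
    thresholding; threshold values here are illustrative."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    smoothed = cv2.GaussianBlur(img, (5, 5), 1.4)  # step (1): Gaussian smoothing
    # cv2.Canny internally performs gradient computation, nonmaximum
    # suppression, and double-threshold edge linking (steps (2)-(4)).
    edges = cv2.Canny(smoothed, low_thresh, high_thresh)
    return edges
```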
Through the above steps, the preprocessing of single-modal biometric image data is completed.
2.3. Multimodal Biometric Extraction
Local texture sub-block analysis of the segmented single-modal biometric region is implemented through a two-dimensional Gabor filter. The similarity between the filter and the local sub-blocks is qualitatively described [15], and the phase information and local amplitude information of multimodal biometric features are extracted [16]. When the convolution result is positive, the local texture phase code of a sub-block is 1; otherwise, it is 0.
The two-dimensional Gabor filter is defined as follows:

$$G(x, y) = \frac{1}{2\pi\delta^{2}}\exp\left(-\frac{x^{2} + y^{2}}{2\delta^{2}}\right)\exp\left[2\pi i\left(u_{0}x + v_{0}y\right)\right],$$

where $u_{0}$ refers to the time-domain frequency of the two-dimensional Gabor wavelet; $v_{0}$ refers to its frequency in the frequency domain; and $\delta$ refers to the localization frequency of the two-dimensional Gabor wavelet. In the texture analysis, $T$ refers to the local texture threshold of a sub-block; $\psi$ refers to the wavelet iterative function; and $\Delta s$ refers to the scale difference function of texture extreme points.
The texture coordinates in a sub-block are mapped into a rectangular region of fixed scale so that each pixel in the single-modal biometric region can be described:

$$r' = \frac{r}{S}, \qquad \theta' = \frac{\theta}{2\pi},$$

where $r$ refers to the coordinate-transformation distance; $\theta$ refers to the coordinate-transformation angle; $S$ refers to the scale of the rectangular region; $r'$ refers to the normalized value of the coordinate-transformation distance; and $\theta'$ refers to the normalized value of the coordinate-transformation angle [17, 18].
The normalized coordinates $(r', \theta')$ are filtered through $G(x, y)$, and each local sub-block is qualitatively described as 1 or 0. The single-modal biometric is described through the descriptions of its local sub-blocks, and corresponding multibit feature codes are created for each modality to realize the extraction of multimodal biometric features [19, 20].
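A sketch of the Gabor-based phase coding described above, using OpenCV's `getGaborKernel`; the filter parameters and sub-block size are illustrative assumptions rather than the paper's settings.

```python
import cv2
import numpy as np

def gabor_phase_code(region, block_size=16):
    """Encode each local sub-block as 1 or 0 according to the sign of its
    response to a 2-D Gabor filter (parameters here are illustrative)."""
    # Real part of a 2-D Gabor kernel; sigma, wavelength, etc. are assumptions.
    kernel = cv2.getGaborKernel((15, 15), 3.0, 0.0, 8.0, 0.5, 0.0)
    response = cv2.filter2D(region.astype(np.float32), -1, kernel)
    h, w = response.shape
    bits = []
    for i in range(0, h - block_size + 1, block_size):
        for j in range(0, w - block_size + 1, block_size):
            block = response[i:i + block_size, j:j + block_size]
            # Positive convolution result -> phase code 1; otherwise 0.
            bits.append(1 if block.mean() > 0 else 0)
    return np.array(bits, dtype=np.uint8)
```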
2.4. Multimodal Biological Feature Fusion Algorithm
Deep reinforcement learning is used to construct classifiers for the different modal biometrics, and weighted-sum fusion of the different modal biometric features is implemented through matching-score information to realize the multimodal biometric fusion algorithm.
(i) Input: test sample set of multimodal biometrics.
(ii) Output: fusion results of multimodal biometrics.
(1) A deep reinforcement learning algorithm based on convolutional neural networks is used to construct the classifiers for the different modal biometrics. Each classifier is composed of two modules: an Inception-ResNet module and a residual connection module [21].
The residual connection module accumulates the output of the upper network layer with a copy of its input and feeds the sum directly to the next layer. Accumulating the output with the original feature map helps reduce the training error. The module is composed of multiple stacked residual blocks, forming a deepened network that helps prevent overfitting; the residual structure in the middle of the network realizes skip connections [22, 23].
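A minimal PyTorch sketch of a residual block with the skip connection described above; the channel count and layer configuration are illustrative, not the paper's exact architecture.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A minimal residual block: the block's input is added to its output,
    so the stacked network learns residual corrections (sketch only)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: output accumulated with input copy
```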
The Inception-ResNet module uses the Inception structure and can find the optimal sparse structure unit. The single module between two activation functions is expanded into a new neural network, and multiple convolution kernels of different sizes are used to realize feature classification at different scales. The loss function of the classifier is as follows:

$$L_{C} = \frac{1}{2}\sum_{i=1}^{m}\left\| x_{i} - c_{y_{i}} \right\|_{2}^{2},$$

where $L_C$ refers to the loss value; $m$ refers to the number of samples; $x_i$ refers to the feature value of sample $i$; and $c_{y_i}$ refers to the feature center point of the category $y_i$ corresponding to sample $i$.
When the loss function reaches its minimum, the network reaches its most convergent state.
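The loss above matches the standard center loss; a minimal PyTorch sketch under that reading follows, with hypothetical tensor shapes.

```python
import torch

def center_loss(features, labels, centers):
    """Center loss L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2 over a batch.
    features: (m, d) feature values; labels: (m,) class indices;
    centers: (num_classes, d) per-class feature center points."""
    diff = features - centers[labels]   # x_i - c_{y_i} for each sample
    return 0.5 * (diff ** 2).sum()

# Example with m = 2 samples and 2-D features.
feats = torch.tensor([[1.0, 2.0], [0.0, 1.0]])
labs = torch.tensor([0, 1])
cents = torch.tensor([[1.0, 1.0], [0.0, 0.0]])
print(center_loss(feats, labs, cents))  # 0.5 * (1 + 1) = 1.0
```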
(2) The $n$ classifiers are represented by the following equation:

$$C = \{c_{1}, c_{2}, \ldots, c_{n}\},$$

where $c_i$ represents the $i$th classifier.
The classification results of the $n$ classifiers are as follows:

$$R_{i} = \begin{cases} 1, & s_{i} \geq \bar{T} \\ 0, & s_{i} < \bar{T}, \end{cases}$$

where $s_i$ is the matching score output by classifier $c_i$ and $\bar{T}$ represents the average classification threshold:

$$\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_{i}.$$
(3) The optimal weight of the classification results is estimated as follows:

$$w_{i} = \frac{\bar{s}_{i}^{\,t} - \bar{s}_{i}^{\,f}}{\bar{s}_{i}},$$

where $\bar{s}_{i}^{\,t}$ is the mean value of the true (genuine) scores, $\bar{s}_{i}^{\,f}$ is the mean of the false (impostor) scores, and $\bar{s}_{i}$ is the mean of the matching scores.
(4) The uncertain area in the feature information is quantified through the mean closure strategy as follows:

$$q = \Phi(u),$$

where $q$ refers to the quantized value of the uncertain area and $\Phi(\cdot)$ refers to the quantization function of the uncertain value $u$.
(5) Weights are assigned to the uncertain areas according to the value of $q$ as follows:

$$\alpha_{i} = \frac{q_{i}}{\sum_{j=1}^{n} q_{j}}.$$

The assigned weight values satisfy the following equation:

$$\sum_{i=1}^{n} \alpha_{i} = 1, \qquad 0 \leq \alpha_{i} \leq 1.$$
Through the reasonable distribution of weights, the weighted summation and fusion of the matching-score information of the different modal features are realized. The process of the proposed algorithm is shown in Figure 1.
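A minimal NumPy sketch of the weighted-sum score fusion in steps (3)–(5); the normalization enforces the constraint that the weights sum to 1, and the example scores and weights are hypothetical.

```python
import numpy as np

def weighted_score_fusion(scores, weights):
    """Weighted-sum fusion of per-modality matching scores.
    scores: (n,) matching scores from the n classifiers, each in [0, 1];
    weights: (n,) nonnegative reliability weights."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()   # enforce sum(weights) = 1
    return float(np.dot(weights, scores))

# Example: fuse iris-based and MRI-based scores with unequal reliability.
fused = weighted_score_fusion(scores=[0.91, 0.78], weights=[0.7, 0.3])
print(fused)  # 0.871
```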

2.5. Experimental Index
In the experiments, the multimodal medical image fusion algorithm based on the marine predator algorithm and three-scale image decomposition [3], the new algorithm for multimodal medical image fusion [4], the multifeature fusion algorithm for detail enhancement of VR panoramic images [5], the multimodal medical image fusion algorithm based on the quaternion wavelet transform [6], and the CNN-based multimodal feature fusion algorithm for gait recognition [7] are used as comparison algorithms, denoted algorithms 1, 2, 3, 4, and 5, respectively.
The performance of the proposed algorithm is tested using the following experimental indexes. Fusion effect: brain CT images and MRI images are taken as examples for multimodal biometric fusion, and the fusion results, that is, the quality of the fused images, are compared. Accuracy of multimodal biometric extraction: the extraction accuracy is calculated as follows:

$$A = \frac{N_{c}}{N} \times 100\%,$$

where $N_c$ refers to the number of correctly extracted multimodal biometric samples and $N$ refers to the total number of multimodal biometric samples.
Average classification accuracy (ACA): the average probability of correctly classifying multimodal biometrics over multiple experiments, calculated as follows:

$$\mathrm{ACA} = \frac{1}{k}\sum_{j=1}^{k} \frac{m_{j}}{M_{j}},$$

where $m_j$ refers to the number of correctly classified multimodal biometric samples in the $j$th experiment, $M_j$ refers to the experimental data size of the $j$th experiment, and $k$ refers to the number of experiments.
Time consumed by multimodal biometric classification: the time spent classifying multimodal biometrics, calculated as follows:

$$t = t_{e} - t_{s},$$

where $t_s$ represents the classification start time and $t_e$ represents the classification end time.
Multimodal biometric fusion time: the sum of the time taken to complete all multimodal biometric fusion steps, expressed as follows:

$$T = \sum_{i=1}^{n} t_{i},$$

where $t_i$ represents the time taken for the $i$th multimodal biometric fusion step.
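A small Python sketch computing the extraction accuracy and ACA indexes defined above; the example counts are hypothetical.

```python
def extraction_accuracy(n_correct, n_total):
    """Feature-extraction accuracy: correctly extracted samples / total samples."""
    return n_correct / n_total

def average_classification_accuracy(correct_counts, totals):
    """ACA: mean of the per-experiment classification accuracies."""
    accs = [c / t for c, t in zip(correct_counts, totals)]
    return sum(accs) / len(accs)

# Example: three experiments of 100 samples each.
print(extraction_accuracy(93, 100))                                     # 0.93
print(average_classification_accuracy([97, 96, 98], [100, 100, 100]))  # 0.97
```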
3. Results and Discussion
This paper takes brain CT images and MRI images as examples for multimodal biometric fusion; the original images are shown in Figure 2.

The fusion results of different algorithms are shown in Figure 3.

As Figure 3 shows, the images fused by algorithms 1–5 are of low quality and poor definition, so their fusion results cannot accurately reflect the detailed features of the two source images. Compared with these algorithms, the proposed algorithm retains almost all of the soft-tissue texture of the brain MRI image and clearly retains the shadow area of a suspected blood clot in the brain CT image; the image definition is higher and the fusion effect is better.
We test the feature extraction accuracy of the designed multimodal biometric fusion algorithm based on deep reinforcement learning and five comparison algorithms. The test results are shown in Figure 4.

According to the data in Figure 4, as the feature dimension increases, the feature extraction accuracy of the proposed algorithm also increases. The feature extraction accuracy of algorithm 1 is 74%–86%, that of algorithm 2 is 63%–73%, that of algorithm 3 is 52%–64%, that of algorithm 4 is 57%–70%, and that of algorithm 5 is 56%–67%. The feature extraction accuracy of the proposed algorithm is between 84% and 93%, higher than that of all five comparison algorithms, which shows that the proposed algorithm achieves more accurate multimodal biometric extraction.
We test the average classification accuracy of the designed algorithm and the five comparison algorithms. The test results are shown in Figure 5.

The average classification accuracy results in Figure 5 show that the highest average classification accuracy is 82% for algorithm 1, 73% for algorithm 2, 76% for algorithm 3, 72% for algorithm 4, and 75% for algorithm 5. The average classification accuracy of the proposed algorithm is the highest, reaching 97%, which exceeds all five comparison algorithms and indicates strong ACA performance for multimodal biometrics.
The time-consuming test results of multimodal biometric classification are shown in Figure 6.

The test results in Figure 6 show that multimodal biometric classification takes 270 ms for algorithm 1, 340 ms for algorithm 2, 250 ms for algorithm 3, 260 ms for algorithm 4, and 460 ms for algorithm 5. The proposed algorithm takes the shortest time, realizing multimodal biometric classification in only 110 ms, and is thus the most efficient.
The test results of multimodal biometric fusion time of six algorithms are shown in Figure 7.

According to the multimodal biometric fusion time data in Figure 7, the fusion times of algorithms 1–5 are 700 ms, 740 ms, 640 ms, 700 ms, and 650 ms, respectively. The proposed algorithm has the shortest fusion time, realizing multimodal biometric fusion in only 550 ms, and is thus more efficient.
4. Conclusions
In this research, the problem of multimodal biometric fusion is addressed: a multimodal biometric fusion algorithm based on deep reinforcement learning is proposed and verified in practice. The following results were obtained: (1) the value ranges of single-modal biometric image data are unified and single-modal biometric regions are segmented; (2) the phase information and local amplitude information of multimodal biometrics are extracted by a two-dimensional Gabor filter; and (3) classifiers for different modal biometrics are designed. Owing to time constraints, issues such as the training of the classifiers have not been studied in detail; more in-depth research will be carried out in the future to further improve the classification accuracy and the effect of multimodal biometric fusion.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The author declares that there are no conflicts of interest regarding the publication of this manuscript.
Acknowledgments
This work was supported by the Natural Science Foundation of Hunan Province under Grant no. 2017JJ2124, the Teaching Reform Project of Hunan Province under Grant no. 2018436, and the Cooperative Education Project of the Ministry of Education under Grant no. 201802.