Abstract

A trustworthy and secure identity verification system is in great demand nowadays. Automatic recognition of the 3D middle finger knuckle is a new biometric identifier that could offer a precise, practical, and efficient alternative for personal identification. Earlier studies have shown that deep learning algorithms can be used for biometric identification. However, the accuracy of the current 3D middle finger knuckle recognition model is relatively low. Motivated by this fact, in this study, seven deep learning neural networks were modified and trained to identify 3D middle finger knuckle patterns using transfer learning. An extensive experiment was performed using the Hong Kong Polytechnic University’s 3D knuckle image dataset. Two sessions of data, acquired with different camera lenses, were used to assess the performance of the proposed deep learning models. The results show that InceptionV3 significantly enhanced the recognition of 3D middle finger knuckle patterns with 99.07% accuracy, followed by Xception, NasNetMobile, and DenseNet201 (97.35%, 92.92%, and 92.59%, respectively), which is superior to the existing middle finger knuckle recognition model. Such accurate, fast, and automatic middle finger knuckle identification can be deployed in real-time, small-scale settings such as offices and schools, or on personal devices like laptops and smartphones, where training is simple.

1. Introduction

Biometric technology offers enormous potential to meet a range of security requirements for the automatic and effective identification of humans. Among the various biometric identifiers, the fingerprint [1, 2] is perhaps the most widely used biometric for e-governance, e-business, and a variety of law enforcement applications.

Additional biometric identifiers, such as the face, iris, fingernails, palm, or vascular patterns, have also proven their effectiveness in a variety of applications [3]. The usefulness of a biometric identifier is determined by its accuracy, its efficiency, and, most crucially, the ease with which users can meet the application requirements. Several challenges have emerged in biometric recognition deployments using fingerprints due to frequent skin deformations, residual dirt, sweat, moisture, and/or scars. Due to its high identification accuracy, acceptable efficiency, and the ease of acquiring finger dorsal images, biometric recognition using finger knuckle images has gained increasing attention in recent years [4].

The finger knuckle patterns can be imaged simultaneously during fingerprint identification and are less susceptible to damage during daily life activities. In contrast to fingerprints, the prominent creases and curved patterns of the finger knuckle are clearly apparent to the naked eye, making them convenient to image from a distance. In short, there are good reasons to believe that adding finger knuckle patterns to biometric recognition can help overcome some of the limitations of using fingerprints alone.

Recent research trends study the 3D information of biometrics in addition to the conventional studies on 2D intensity images, owing to the rich information that can be extracted from 3D images and the fact that these images are often more stable and illumination invariant [4].

The current restriction of such 3D knuckle recognition is the limited ability of traditional methods. Recently, however, convolutional neural networks (CNNs) have achieved good performance in image classification [5]. The applications of this method have been studied in several specific biometric recognition problems, including face recognition, iris recognition, and fingerprint recognition [6–8]. Therefore, there is strong motivation to apply CNN models to advance the recent 3D middle finger knuckle recognition framework.

Contactless 3D finger knuckle recognition is a new research frontier. It was first briefly studied using finger dorsal surfaces for biometric identification [9]. Unfortunately, 3D finger knuckle recognition did not attract much attention at the time, probably because the efficiency of utilizing 3D finger knuckle data was constrained by the low resolution of the 3D finger dorsal images being studied and by the ineffective feature descriptor employed for extracting discriminative features, a generic surface shape descriptor known as the Shape Index [10, 11]. Since then, 3D finger knuckle recognition has been brought forward through only five recent studies.

The authors in [1] studied the utilization of 3D finger knuckle patterns for biometric recognition. This research studied different aspects of 3D finger knuckle recognition, such as feature description, the possibility of presentation attacks against a finger knuckle recognition system, the individuality of the finger knuckle, and comparisons of 2D and 3D finger knuckle recognition, and offered a standard database for further studies and research. The efficiency and capability of 3D finger knuckle images for biometric recognition were validated. Nevertheless, open questions remain about this emerging biometric modality, in particular whether a neural network could perform a better job.

In the same year, the authors in [4] utilized surface key points extracted from the 3D knuckle surface to develop a more effective matching method for 3D knuckle recognition. The results of their comparative experiments with the most advanced methods on the publicly available 3D knuckle database illustrate that their technique can offer over a 23-fold improvement in performance in terms of accuracy. Although their effort emphasizes 3D knuckle recognition, the performance of the method was also demonstrated on other publicly available databases with similar 3D biometric patterns (including 3D palmprints and 3D fingerprints) to verify the generality of the proposed model.

Furthermore, the authors in [12] presented work that explores the potential for biometric identification utilizing 3D middle finger knuckle patterns. The research offers a new, simple, yet effective deep convolutional neural network model, trained from scratch, designed for 3D finger knuckle pattern recognition.

The model was designed to be implemented in a real-time system. It might be utilized in small-scale settings such as workplaces and homes, or on personal devices like laptops and tablets, where training is simple. The experimental results were very encouraging and showed the potential for biometric applications utilizing the 3D middle finger knuckle pattern. The results confirmed that, despite the numerous challenges compared with other research in the same field, the proposed method offers a relatively good solution with an accuracy of 71%.

Subsequently, the authors in [13] presented a novel curvature-based feature descriptor generated from 3D finger knuckle surfaces and a technique based on the statistical distribution of the encoded feature space to calculate the similarity function. The proposed feature representation uses insights from 3D geometry to accurately encode curvature information.

When comparing a pair of templates, they calculate the similarity function according to the probability mass distribution of the encoded feature space. They use an insight from 3D geometry: for a pair of adjacent surface normal vectors, the distance between their heads is smaller than the distance between their tails if the surface is concave. Consequently, they can distinguish whether a local shape is convex or concave along a specified direction and encode the curvature information as binary templates of preferred sizes for further comparisons. Their method can be extended to patterns of different sizes. Additionally, they demonstrated the generalizability of their technique by evaluating it on additional publicly available biometric datasets of similar patterns, such as 3D palmprints and finger veins.

In the meantime, the authors in [10] emphasized the difficulties related to this emerging biometric, for example, the lack of training data as well as the significant train-test sample variability observed in real-life applications. They offered a new deep neural network-based technique for contactless 3D finger knuckle recognition. Furthermore, to create a much more powerful deep feature representation, their technique simultaneously encodes and integrates deep information from various scales. This was the first time a neural network method had been developed for 3D finger knuckle identification.

Therefore, a wide-ranging study of 3D knuckle pattern recognition using convolutional neural networks is warranted, and this work focuses on exactly that.

The rest of this paper is arranged as follows: the details of our methodology, including the materials and methods utilized for 3D finger knuckle identification, appear in Section 3. The model evaluation setup and metrics are described in Section 4. The comparative experimental results are systematically presented in Section 5, the results are discussed in Section 6, and the key conclusions from this work are summarized in Section 7.

3. Materials and Methods

This section concentrates on the methodology for creating and using the 3D middle finger knuckle recognition models. We present deep learning methods for biometric recognition of 3D middle finger knuckle patterns.

3.1. Materials (The Dataset)

In this study, the convolutional neural networks’ performance was evaluated using the recently released, publicly available HKPolyU 3D finger knuckle image dataset [1]. It provides a two-session dataset with 2D and 3D knuckle images. A photometric stereo method was used to acquire the 3D images in this dataset. The biometric imaging system is made up of a camera, seven evenly distributed illumination sources, a control circuit, and a personal computer. The data were collected from 228 distinct subjects, 190 of whom volunteered to participate in the second session of data collection. Each session for each subject includes six images of the forefinger and six images of the middle finger, and for every 3D image, there are seven photometric stereo images. As a result, forty-two 3D forefinger photometric stereo images and forty-two 3D middle finger photometric stereo images are available for each subject in each session [1, 12].

This database of 3D finger knuckles contains challenging images that depict real-world situations: the second-session photos were taken under diverse imaging conditions, using multiple imaging lenses and illumination [10]. Sample images are shown in Figure 1. Based on the concept utilized in [1], the authors noted that employing the forefinger images can lead to greater performance than using the middle finger images.

The authors of [12] mitigated the impact of this issue by proposing a CNN model for recognizing middle fingers, and their method provides a relatively good solution with an accuracy of 71%. Following that work, only the middle finger is used in this study.

In this work, we tested various CNN models to develop more precise models for the recognition of 3D middle finger images. Only the middle finger images of 36 randomly selected subjects were used; the subjects were selected at random, without any particular criteria. Using more subjects is feasible but entails longer training and testing times [12].

There are 42 3D middle finger photometric stereo images for each subject; therefore, 1512 images from session 1 were used to train the pretrained models, and the same number of images from session 2 was used for testing. The performance and quality of the pretrained convolutional neural network models were thus evaluated on a total of 3024 images. Some example images are displayed in Figure 1.
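To make the data handling concrete, the following minimal MATLAB sketch loads the two sessions as labelled datastores; the folder layout (one subfolder per subject under hypothetical session1 and session2 folders) and the Deep Learning Toolbox are assumptions, not details given in the paper.

```matlab
% Minimal sketch, assuming the dataset has been unpacked as one subfolder
% per subject (folder names serve as class labels); paths are hypothetical.
imdsTrain = imageDatastore('knuckle3D/session1', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
imdsTest  = imageDatastore('knuckle3D/session2', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
countEachLabel(imdsTrain)   % expect 42 images for each of the 36 subjects

% Resize on the fly to the input size expected by the chosen network,
% e.g. 299x299x3 for InceptionV3.
augTrain = augmentedImageDatastore([299 299 3], imdsTrain);
augTest  = augmentedImageDatastore([299 299 3], imdsTest);
```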

3.2. Methods (Modelling)

Recent developments in deep learning (DL), particularly in the biometric area, point to the possible use of multiple deep CNN architectures. Training a convolutional neural network (CNN) from scratch can be difficult, though. To get around this problem, we employ a variety of pretrained models and transfer learning strategies [14]. Transfer learning’s main benefit is that it can train faster and with fewer samples [15], because the newly trained model can reuse the knowledge gained by the previously trained model [16].

In this study, seven distinct baseline models are extensively evaluated: InceptionV3, Xception, NasNetMobile, DenseNet201, ResNet50, AlexNet, and VGG16. These models were chosen because they work effectively in computer vision. All of them have been employed in this study as transfer learning models by modifying the last output layer to match the number of classes used in the experiments, as shown in the sketch below. Each model is briefly addressed in the following subsections.
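As an illustration, the following MATLAB sketch shows the last-layer modification for InceptionV3 with 36 subject classes. The layer names ('predictions', 'predictions_softmax', 'ClassificationLayer_predictions') follow MATLAB’s pretrained InceptionV3; other backbones expose different final layer names, so this is a sketch rather than a recipe for all seven models.

```matlab
% Minimal transfer-learning sketch: swap the 1000-class ImageNet head
% for a 36-class head matching the number of knuckle subjects.
net = inceptionv3;                 % pretrained ImageNet weights
lgraph = layerGraph(net);
numClasses = 36;
lgraph = replaceLayer(lgraph, 'predictions', ...
    fullyConnectedLayer(numClasses, 'Name', 'fc_knuckle'));
lgraph = replaceLayer(lgraph, 'predictions_softmax', ...
    softmaxLayer('Name', 'softmax_knuckle'));
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', ...
    classificationLayer('Name', 'output_knuckle'));
```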

3.2.1. AlexNet Architecture

Due to its exceptional performance on image classification and recognition tasks, AlexNet is considered the first deep CNN architecture [17], as shown in Figure 2. In the early 2000s, the learning capability of deep CNN architectures was constrained to modest sizes due to hardware limits. Thus, AlexNet was trained on two NVIDIA GTX 580 GPUs simultaneously to overcome these hardware constraints and exploit the full capacity of a deep CNN. AlexNet consists of five convolutional layers, three pooling layers, and three fully connected layers with approximately 60 million trainable parameters [18].
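For readers who want to inspect this structure directly, the following brief MATLAB sketch (assuming the AlexNet support package is installed) loads the pretrained network and lists its layers:

```matlab
% Load pretrained AlexNet and inspect its layer stack: 5 convolutional,
% 3 max-pooling, and 3 fully connected layers.
net = alexnet;                         % pretrained on ImageNet
disp(net.Layers);
inputSize = net.Layers(1).InputSize    % [227 227 3] for AlexNet
```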

3.2.2. VGG16 Architecture

One of the most well-known deep CNN designs is VGG Net, which was proposed by the authors in [19] from the Visual Geometry Group at the University of Oxford, as shown in Figure 3. VGG Net won first and second place in the ILSVRC 2014 object localization and classification competitions, respectively. The main idea of this architecture is that deeper CNNs can perform more accurately on computer vision tasks when multiple smaller kernels are used in place of a single large kernel. VGG Net variants are still quite extensively used in many computer vision tasks for extracting deep image features for further processing, especially in the medical imaging field.
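A quick back-of-the-envelope MATLAB calculation illustrates the small-kernel idea: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution, with fewer weights and an extra non-linearity in between (the channel count here is an illustrative assumption):

```matlab
% Compare the weight counts of one 5x5 convolution versus two stacked
% 3x3 convolutions, with C input and C output channels.
C = 64;
params5x5   = 5*5*C*C;        % one 5x5 conv: 102,400 weights
params3x3x2 = 2*(3*3*C*C);    % two 3x3 convs: 73,728 weights
fprintf('5x5: %d weights, two 3x3: %d weights\n', params5x5, params3x3x2);
```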

3.2.3. InceptionV3 Architecture

The basic concept behind the InceptionV3 design is to address the issue of excessive variability in the position of the salient parts of the pictures under consideration by allowing the network to incorporate several distinct kernel sizes on the same level, thereby “widening” the network. The so-called Inception modules make this concept of multiple kernels operating at the same level possible, as shown in Figure 4. The original InceptionV1 (GoogLeNet) [20] was built on this fundamental concept. Later, in [21], the InceptionV2 and InceptionV3 architectures were proposed, which improved on InceptionV1 by addressing key issues regarding representational bottlenecks, adding kernel factorization, and adding batch normalization to the auxiliary classifiers. The InceptionV3 architecture was the first runner-up in the ILSVRC 2015 image classification task.
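The following minimal MATLAB sketch builds an illustrative Inception-style block with parallel 1x1, 3x3, and 5x5 branches concatenated along the channel dimension; the layer names and sizes are illustrative assumptions, not the actual InceptionV3 layers:

```matlab
% An Inception-style block: three parallel kernel sizes on the same level,
% merged with a depth (channel) concatenation.
lgraph = layerGraph([
    imageInputLayer([64 64 3], 'Name', 'in')
    convolution2dLayer(1, 16, 'Padding', 'same', 'Name', 'branch_1x1')
    depthConcatenationLayer(3, 'Name', 'concat')]);
lgraph = addLayers(lgraph, ...
    convolution2dLayer(3, 16, 'Padding', 'same', 'Name', 'branch_3x3'));
lgraph = addLayers(lgraph, ...
    convolution2dLayer(5, 16, 'Padding', 'same', 'Name', 'branch_5x5'));
lgraph = connectLayers(lgraph, 'in', 'branch_3x3');
lgraph = connectLayers(lgraph, 'branch_3x3', 'concat/in2');
lgraph = connectLayers(lgraph, 'in', 'branch_5x5');
lgraph = connectLayers(lgraph, 'branch_5x5', 'concat/in3');
```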

3.2.4. ResNet50 Architecture

The fundamental principle of ResNet frameworks is that simply stacking convolutional and pooling layers on top of one another can cause network performance to degrade because of the vanishing gradient problem. To address this, identity shortcut connections can be used, which essentially skip one or more layers. A residual block is a group of layers wrapped by such an identity connection. By including skip connections, the large training error that is frequently seen in otherwise deep architectures is practically eliminated [22]. The ResNet variant with 50 layers is called ResNet50 (see Figure 5).
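The skip-connection idea can be sketched in a few lines of MATLAB; the block below computes F(x) + x with an identity shortcut around two convolutions (layer names and sizes are illustrative assumptions):

```matlab
% A residual block: the additionLayer sums the branch output F(x)
% with the unchanged input x routed around it.
lgraph = layerGraph([
    imageInputLayer([56 56 64], 'Name', 'in')
    convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv1')
    reluLayer('Name', 'relu1')
    convolution2dLayer(3, 64, 'Padding', 'same', 'Name', 'conv2')
    additionLayer(2, 'Name', 'add')        % computes F(x) + x
    reluLayer('Name', 'relu_out')]);
lgraph = connectLayers(lgraph, 'in', 'add/in2');   % the identity shortcut
```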

3.2.5. DenseNet201 Architecture

The DenseNet architecture [23] incorporates dense connections. It improves on the ResNet architecture by connecting each layer to every other layer; in such densely connected architectures, each layer receives the feature maps of all preceding layers and passes its own feature maps to all subsequent layers. Reusing features while retaining a minimal number of parameters overall is another significant benefit of this design. The DenseNet architecture has several frequently used variants, including the DenseNet201 design employed in this work (see Figure 6).
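Dense connectivity can be sketched the same way; in the illustrative MATLAB fragment below (names and sizes are assumptions), each concatenation layer gathers the feature maps of all preceding layers:

```matlab
% A two-layer dense block: cat1 sees [conv1, x0], and cat2 sees
% [conv2, x0, conv1], so every layer receives all earlier feature maps.
lgraph = layerGraph([
    imageInputLayer([28 28 32], 'Name', 'x0')
    convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv1')
    depthConcatenationLayer(2, 'Name', 'cat1')
    convolution2dLayer(3, 32, 'Padding', 'same', 'Name', 'conv2')
    depthConcatenationLayer(2, 'Name', 'cat2')]);
lgraph = connectLayers(lgraph, 'x0', 'cat1/in2');
lgraph = connectLayers(lgraph, 'cat1', 'cat2/in2');
```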

3.2.6. Xception Architecture

Xception uses depthwise separable convolutions. Within this methodology, as shown in Figure 7, there are 36 convolutional stages altogether. In the Xception model, the one-by-one (pointwise) convolution is applied first, followed by the channel-wise spatial convolution, and there is no intermediate activation between the two. This design contributes to its strong accuracy compared with other approaches. Finally, Xception works well because the model parameters are used more effectively [24].
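A depthwise separable convolution in the order described above can be sketched in MATLAB as a pointwise 1x1 convolution followed by a channel-wise spatial convolution, with no activation between them (the sizes are illustrative assumptions):

```matlab
% Depthwise separable convolution, Xception-style: 1x1 pointwise first,
% then a channel-wise (depthwise) 3x3 spatial convolution.
layers = [
    imageInputLayer([64 64 32], 'Name', 'in')
    convolution2dLayer(1, 64, 'Name', 'pointwise')      % mixes channels
    groupedConvolution2dLayer(3, 1, 'channel-wise', ...
        'Padding', 'same', 'Name', 'depthwise')];       % one 3x3 filter per channel
```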

3.2.7. NASNetMobile Architecture

The Google Brain Team created the neural architecture search network (NASNet), which employs two primary building blocks: (1) the normal cell and (2) the reduction cell. To attain a higher mAP, NASNet first performs its search on a small dataset before transferring its blocks to a large dataset. For better NASNet performance, a customized drop path called the scheduled drop path is utilized for efficient regularization. The original NASNet architecture [25] is specified purely in terms of normal and reduction cells; the number of cells is not predetermined. Normal cells preserve the size of the feature map, while reduction cells return a feature map whose height and width are reduced by a factor of two. Based on the two initial hidden states, a controller based on a recurrent neural network (RNN) is used in NASNet to predict the whole structure of the network, as shown in Figure 8.

4. Model Evaluation

4.1. Experimental Setup

The networks were implemented on a machine equipped with an Intel Core i7 2.8 GHz processor and 16 GB of RAM, using MATLAB 2019b under macOS High Sierra. The training parameters for the CNN models were set as follows: the mini-batch size is 15, the learning rate is 3e−4, and the number of epochs is 10.
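Expressed as MATLAB trainingOptions, this configuration looks as follows; the 'adam' solver is an assumption, since the paper does not name the optimizer, and augTrain and lgraph refer to the sketches in Section 3:

```matlab
% Training configuration as stated above: mini-batch 15, learning
% rate 3e-4, 10 epochs; the solver choice is an assumption.
opts = trainingOptions('adam', ...
    'MiniBatchSize', 15, ...
    'InitialLearnRate', 3e-4, ...
    'MaxEpochs', 10, ...
    'Plots', 'training-progress');
trainedNet = trainNetwork(augTrain, lgraph, opts);
```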

4.2. Confusion Matrix

The effectiveness of the network architectures has been evaluated using confusion matrices. A confusion matrix is a table, as shown in Figure 9, that summarizes the prediction results of a classification problem [26]. The numbers of right and wrong predictions are counted and classified into four categories [26–28]:

True positive (TP): both the predicted and actual results are positive.
False positive (FP): the prediction is positive, but the actual output is negative.
True negative (TN): both the actual outcome and the prediction are negative.
False negative (FN): a negative result is predicted, but the actual result is positive.
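In MATLAB, the multi-class confusion matrix behind Figures 9 and 10 can be produced as in the brief sketch below, which assumes the trainedNet, augTest, and imdsTest variables from the earlier sketches:

```matlab
% Predict test-set labels and tabulate them against the ground truth.
predLabels = classify(trainedNet, augTest);
trueLabels = imdsTest.Labels;
cm = confusionmat(trueLabels, predLabels);   % rows: actual, columns: predicted
confusionchart(trueLabels, predLabels);      % visual confusion matrix
```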

4.3. Classification Metrics

The models’ performance was evaluated using the four metrics listed below [26]:

Accuracy: the ratio of correct predictions to all predictions.
Precision: how accurately a model classifies a sample as positive.
Sensitivity (recall): the capacity of a model to identify positive samples.
F1 score: the harmonic mean of precision and recall, balancing the two.
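These metrics can be derived from the confusion matrix cm of the previous sketch by treating each class one-vs-rest and macro-averaging, one common convention for multi-class problems (the paper does not state its averaging scheme):

```matlab
% Per-class TP/FP/FN from the confusion matrix, then macro-averaged metrics.
tp = diag(cm);
fp = sum(cm, 1)' - tp;              % column sums minus the diagonal
fn = sum(cm, 2) - tp;               % row sums minus the diagonal
accuracy  = sum(tp) / sum(cm(:));
precision = mean(tp ./ (tp + fp));
recall    = mean(tp ./ (tp + fn));  % sensitivity
f1        = 2 * precision * recall / (precision + recall);
fdr       = mean(fp ./ (fp + tp)); % false discovery rate, used in Section 5.1.4
```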

5. Experimental Results

5.1. Results of Seven Pretrained Models

The main dataset is randomly partitioned into training, validation, and test subsets. The proposed models are trained using training data and validated against the validation set after each training cycle. Then, we use the testing dataset to evaluate the models and quantify the performance of the models using evaluation metrics.

5.1.1. Classification Results

As observed in Table 1, the pretrained models InceptionV3, Xception, NasNetMobile, and DenseNet201 ranked top four in prediction accuracy with 99.07%, 97.35%, 92.92%, and 92.59%, respectively, followed by ResNet50, which provided an acceptable result with 91.80% accuracy. AlexNet achieved approximately 73.54% accuracy, while the worst result was obtained by VGG16, with an accuracy of 7.34%. Other measures, such as recall, precision, and F1 score, are presented in Table 2.

5.1.2. Confusion Matrix Results

It is clear from Figure 10 that most pretrained models perform well in the biometric identification of 3D middle finger knuckle patterns. InceptionV3 correctly identified 1498 out of 1512 images, with only 14 misclassified (99.07% accuracy). Xception recognized 1473 of 1512 images, with only 39 images unrecognized (97.35% accuracy). NASNetMobile and DenseNet201 misclassified 107 and 112 of 1512 images, respectively (about 93% accuracy). ResNet50 misclassified 124 out of 1512 images (91.80% accuracy), while AlexNet misclassified 400 out of 1512 images (73.54% accuracy). However, VGG16 correctly recognized only 111 of 1512 images (7.34% accuracy).

5.1.3. Learning Curve Results

Figure 11 displays the accuracy and loss curves for the seven pretrained CNN models during the training and validation phases. The graphs demonstrate that InceptionV3, Xception, NASNetMobile, and ResNet50 have the greatest validation accuracy rate of 100%, followed by DenseNet201, AlexNet, and VGG16 with accuracy rates of 97.92%, 88.19%, and 22.59%, respectively.

Figure 11 also shows that all model training times are rather long, owing to the use of a computer with ordinary CPU capability and the large image dataset used for training and testing the models.

The initial accuracy of each pretrained model is extremely low, less than 10%, due to the small amount of data for each subject. Using small amounts of data for each person was intended to be closer to real forensic or biometric scenarios, where the availability of samples can be limited to one or a few samples per subject. Figure 11 also displays the training loss of each pretrained model; it can be observed that the loss values dropped during the training stage. Compared with the training curves, every validation curve oscillates. This is because the size of the validation dataset is relatively small in comparison with the training dataset.

Figure 11 also shows that, overall, the validation dataset produced greater accuracy and a lower loss rate than the training dataset, indicating that the models generalized well to the validation data rather than overfitting the training data.

5.1.4. False Discovery Rate Results

The term “false discovery rate” (FDR) refers to the proportion of false discoveries among all discoveries, that is, the fraction of positive predictions that are actually incorrect.

The FDR equation is as follows:

FDR = FP/(FP + TP) = 1 − precision.

Table 3 indicates the FDR of all pretrained models. It can be observed that InceptionV3 obtained the lowest FDR with 0.82%.

5.2. Training Time Results

It is clear from Table 1 that AlexNet took the minimum time to finish one epoch of the training process. Meanwhile, Xception spent 123.8 s, a considerably longer time, to accomplish one epoch of the training process.

6. Discussion

Using deep learning approaches for biometric identification is a hot topic that has attracted much attention lately.

Numerous deep learning models have been developed and effectively deployed for personal recognition.

In this paper, a state-of-the-art study was selected for comparison purposes. After reviewing their methods and findings, we believe that there are still several research gaps for biometric identification employing 3D finger knuckle patterns.

Conversely, our paper utilized seven pretrained CNN models, including some that had not previously been applied in the 3D finger knuckle pattern recognition area.

Moreover, the scarcity of studies on this type of biometric identifier confirms the importance of evaluating such CNN models: [10, 12] are the only two previous studies of CNN-based methods for the 3D finger knuckle recognition problem, and [12] is the only preceding paper that used a CNN-based approach for 3D middle finger knuckle recognition. Hence, the results of the present study have been compared with [12], and the findings are summarized in Table 4.

It is clear from Table 4 that six of the seven pretrained CNN models tested in this study outperformed the existing classifier.

7. Conclusions and Future Works

This study implements a transfer learning approach to test seven CNN models for 3D middle finger knuckle recognition, using a dataset containing 3024 3D middle finger knuckle images. The baseline models are InceptionV3, Xception, NasNetMobile, DenseNet201, ResNet50, AlexNet, and VGG16. The performances of the pretrained models have been evaluated. Among the results, InceptionV3 outperformed the other pretrained models in 3D middle finger knuckle image classification, achieving a sensitivity of 99.07%, a recall of 99.07%, a precision of 99.18%, and an F1 score of 99.07%. Despite the excellent performance of the InceptionV3 transfer learning model, this paper has some limitations. First, only 36 subjects from one dataset have been used for evaluating the models; the classification results may vary with test images from different datasets, various imaging situations, or multiple imaging lenses and lighting conditions. Second, there are still other pretrained CNN models that have not been deployed in the 3D knuckle recognition area. Finally, other preprocessing techniques, such as image enhancement, have not been utilized in this study. In future work, image enhancement technology can be applied to determine whether it improves the results. To sum up, in this study, the InceptionV3 CNN model significantly improved 3D middle finger knuckle recognition performance, demonstrating the possibility of fully automatic and fast 3D middle finger knuckle recognition using a deep neural network model.

Data Availability

Previously reported datasets were used to support this study and are available at “The Hong Kong Polytechnic University Contactless 3D Finger Knuckle Images Database” (https://web.comp.polyu.edu.hk/csajaykr/3DKnuckle.htm). The prior study that created this dataset is cited at relevant places within the text as reference [1].

Conflicts of Interest

The authors declare that they have no conflicts of interest.