Abstract
Biometric identification is a convenient and reliable method of identity authentication. Its widespread adoption requires strong privacy protection against theft or loss of biometric data. Existing techniques for privacy-preserving biometric identification mainly rely on traditional cryptographic tools such as oblivious transfer and homomorphic encryption, which impose a heavy cost on the system and do not scale to large practical applications. To address these issues, we propose a biometric identification scheme built on a zero-knowledge succinct noninteractive argument of knowledge (zk-SNARK). Our scheme reduces the communication overhead, since only 8 constants need to be sent to the verifier, and also protects the fingerprint template from disclosure. The time complexities of proof generation and proof verification are about O(C) and O(x), respectively, and the size of the proof is only 8 constants, where C and x represent the size of the circuit and of the public input, respectively. We have implemented the proposed authentication solution on a public data set of fingerprint images and evaluated its performance and security.
1. Introduction
In the field of e-government, data security requires that data never leave the third-party security center. At the same time, data must circulate: outsourcing companies need to carry out some business functions, yet secret information cannot be disclosed to them. Combining biometrics with zero-knowledge proofs can solve the template privacy problem of traditional biometric identification. There is no need to disclose template information to outsourcing companies; the security center only needs to prove to the outsourcing company that the verification succeeded. This can benefit society in areas such as credit assessment, risk prevention and control, monitoring, and medical care.
Liu Yuqin et al. [1] classified identity authentication into three main categories: methods based on specific knowledge, on markers, and on biometrics [2]. Identity authentication based on biometrics is currently the most secure method; it stores the extracted biometrics of the user in a database as a template. This method can avoid database collision attacks and is also user-friendly [3]. Much work has been done on biometric authentication, but most of it focuses on improving recognition accuracy. Some biometric recognition methods based on neural networks have reached nearly 100% recognition accuracy. Although many identification methods meet people’s requirements for authentication accuracy, achieving more secure identity authentication remains important and practical. The security of the biometric template is particularly important in privacy-preserving biometric identification. Jain et al. [4] divided the existing schemes for protecting biometric templates into two types, namely, feature transformation and biometric encryption. However, both methods trade security off against accuracy and cannot achieve high-security identity authentication in a complex network environment. The proposal and development of zero-knowledge proof (ZKP) technology provide new ideas for secure biometric authentication. Yu et al. [5] achieved zero knowledge against the verifier. Li et al. [6] presented the primitive of fuzzy identity-based data auditing. Li et al. [7] proposed two possible solutions to balance privacy and regulation of blockchain-based cryptocurrencies. Li et al. [8] proposed a new cryptocurrency named Traceable Monero to balance user accountability and anonymity.
ZKPs, introduced by Goldwasser et al. [9], are an underlying technology of blockchain. A ZKP enables a prover to convince a verifier that a statement is true without leaking any other information. ZKPs have three important properties: completeness, soundness, and zero knowledge. Their main function is to protect data privacy. In 1988, Feige et al. [10] showed that a zero-knowledge proof enables a party to prove its identity by proving that it possesses certain knowledge rather than by proving the validity of an assertion. In other words, zero-knowledge proofs can be applied in digital identity systems to protect the privacy of identity information.
Bitansky et al. [11] gave the first definition of zk-SNARK and proved its existence. The zk-SNARK proposed by Ben-Sasson et al. in 2014 is used in Zcash. Since then, various zk-SNARKs have appeared, for example, Groth16 [12] (adopted by Zcash), GKM+18, Bulletproofs [13], Sonic [14], Marlin [15], Plonk [16], Halo [17], Fractal [18], and SuperSonic [19] [27–29], of which only Groth16 requires a circuit-specific trusted setup. Bulletproofs is an efficient zero-knowledge proof suitable for range proofs, and it has been applied in the Monero blockchain system. Plonk is an optimized and improved version of Sonic with greatly reduced verification time. At Eurocrypt 2020, Bünz et al. [19] also proposed SuperSonic, an optimized version of Sonic whose proof size and verification time are only logarithmic. Based on zk-SNARK (Groth16), Guan et al. [12] proposed BlockMaze, a blockchain account model that effectively hides account balances, transaction amounts, and the linkage between sender and receiver.
1.1. Contributions
A biometric identification scheme based on zk-SNARK is proposed to prevent leakage of biometric template information. In this work, we use zk-SNARK to hide sensitive biometric information as private input and thereby achieve zero knowledge.
We extract fingerprint feature points as biometric features because of their inherent uniqueness, persistence, accuracy, and ease of use. To improve the efficiency of the scheme, we group the collected fingerprint feature points.
In the scheme, the Euclidean distance between the two sets of fingerprint feature points (the biometric template and the freshly extracted features) is expressed in binary form; we then transform the problem into the polynomial domain to improve efficiency.
We implemented the proposed biometric identification on a public data set of fingerprint images and evaluated its performance and security. We also measured the proof time, verification time, R1CS time, QAP time, and certificate time for different numbers of fingerprint features.
1.2. Structure of the Paper
Section 2 introduces the notation, bilinear groups, the knowledge of coefficient test and assumption for quadratic arithmetic programs, noninteractive zero-knowledge arguments of knowledge, and arithmetic circuits and quadratic arithmetic programs. Section 3 describes the construction of our scheme. Section 4 gives the security description and performance analysis. Section 5 concludes the paper.
2. Preliminaries
2.1. Notation
Table 1 shows the symbols and descriptions used in this article. We write $y = A(x; r)$ when algorithm A, on input x and randomness r, outputs y. We write $y \leftarrow S$ for sampling y uniformly at random from the set S.
2.2. Bilinear Groups
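We work over bilinear groups $(p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e)$, where $\mathbb{G}_1$, $\mathbb{G}_2$, and $\mathbb{G}_T$ are groups of prime order $p$ and $e: \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ is a map with the following properties:
(i) Bilinearity: for all $u \in \mathbb{G}_1$, $v \in \mathbb{G}_2$, and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$
(ii) Nondegeneracy: there exist $g \in \mathbb{G}_1$ and $h \in \mathbb{G}_2$ such that $e(g, h) \neq 1$
(iii) Computability: $e(u, v)$ can be calculated by an efficient algorithm for all $u \in \mathbb{G}_1$, $v \in \mathbb{G}_2$
Galbraith et al. [20] classified bilinear groups as type I, where $\mathbb{G}_1 = \mathbb{G}_2$; type II, where there is an efficiently computable nontrivial homomorphism $\psi: \mathbb{G}_2 \to \mathbb{G}_1$; and type III, where no efficiently computable homomorphism exists in either direction between $\mathbb{G}_1$ and $\mathbb{G}_2$. Type III is the most efficient type of bilinear group, so it is well suited to practical applications. Our scheme can be implemented on all three types of bilinear groups.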
2.3. The Knowledge of Coefficient Test and Assumption (KCA) for Quadratic Arithmetic Programs
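For $\alpha \in \mathbb{F}_p^*$, let us call a pair of elements (a, b) in a group $G$ an $\alpha$-pair if $a, b \neq 0$ and $b = \alpha \cdot a$.
The KCA proceeds as follows:
(1) Verifier chooses random $\alpha \in \mathbb{F}_p^*$ and $a \in G$, computes $b = \alpha \cdot a$, and then sends the “challenge” pair (a, b) to the prover. Note that (a, b) is an $\alpha$-pair.
(2) Prover must now respond with a different pair $(a', b')$ that is also an $\alpha$-pair.
Verifier accepts prover’s response only if $(a', b')$ is indeed an $\alpha$-pair (as he knows $\alpha$, he can check whether $b' = \alpha \cdot a'$).
In general, if we know a set of $\alpha$-pairs $(a_1, b_1), \ldots, (a_d, b_d)$, then the only way to generate a new $\alpha$-pair is to compute a linear combination of them. In other words, if you output a new $\alpha$-pair (P, Q), you have to know combination coefficients $c_1, \ldots, c_d$ that satisfy $P = \sum_{i=1}^{d} c_i \cdot a_i$ and $Q = \sum_{i=1}^{d} c_i \cdot b_i$.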
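The knowledge of coefficient test can be illustrated with a toy group. The following is a minimal sketch, assuming a tiny prime-order subgroup of the integers modulo a safe prime (the parameters `P`, `Q`, `G` and all function names are hypothetical and offer no security). It uses multiplicative notation, so the pair condition reads b = a^α and a "linear combination" becomes a product of powers.

```python
import secrets

# Toy parameters (assumptions of this sketch, far too small for real use).
P = 2039  # safe prime, P = 2 * Q + 1
Q = 1019  # prime order of the subgroup generated by G
G = 4     # a quadratic residue, hence a generator of the order-Q subgroup


def is_alpha_pair(a, b, alpha):
    """Verifier-side check: a is not the identity and b == a**alpha."""
    return a != 1 and b == pow(a, alpha, P)


def make_challenges(alpha, count):
    """Verifier samples `count` alpha-pairs (a_i, a_i**alpha)."""
    pairs = []
    for _ in range(count):
        a = pow(G, secrets.randbelow(Q - 1) + 1, P)
        pairs.append((a, pow(a, alpha, P)))
    return pairs


def combine(pairs, coeffs):
    """Prover combines known alpha-pairs with coefficients c_i.

    The prover never learns alpha; it only raises both components of each
    pair to the same power c_i and multiplies the results.
    """
    p_out, q_out = 1, 1
    for (a, b), c in zip(pairs, coeffs):
        p_out = p_out * pow(a, c, P) % P
        q_out = q_out * pow(b, c, P) % P
    return p_out, q_out
```

With fixed pairs such as `[(pow(G, 3, P), pow(G, 21, P)), (pow(G, 5, P), pow(G, 35, P))]` for `alpha = 7`, the output of `combine(pairs, [2, 9])` again passes `is_alpha_pair`, whereas changing only one component of the response breaks the check.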
2.4. Noninteractive Zero-Knowledge Arguments of Knowledge
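Let $\mathcal{R}$ be a relation generator that, given a security parameter $\lambda$ in unary, returns a polynomial-time decidable binary relation R. For pairs $(\phi, w) \in R$, we say $\phi$ is the statement and $w$ is the witness. For simplicity, we assume in the following that $\lambda$ can be deduced from the description of R. An efficient publicly verifiable noninteractive argument for $\mathcal{R}$ consists of four probabilistic polynomial-time algorithms (Setup, Prove, Verify, Sim) as follows:
$(\sigma, \tau) \leftarrow \mathrm{Setup}(R)$: it returns a common reference string (CRS) $\sigma$ and a simulation trapdoor $\tau$ on input a relation R.
$\pi \leftarrow \mathrm{Prove}(R, \sigma, \phi, w)$: it returns an argument $\pi$ on input a CRS $\sigma$ and $(\phi, w) \in R$.
$0/1 \leftarrow \mathrm{Verify}(R, \sigma, \phi, \pi)$: it returns no (0) or yes (1) on input a CRS $\sigma$, a statement $\phi$, and an argument $\pi$.
$\pi \leftarrow \mathrm{Sim}(R, \tau, \phi)$: it returns an argument $\pi$ on input a simulation trapdoor $\tau$ and a statement $\phi$.
Perfect completeness: an honest prover who knows a witness for a true statement can always convince an honest verifier. For all $\lambda \in \mathbb{N}$, $R \in \mathcal{R}_\lambda$, $(\phi, w) \in R$,
$\Pr[(\sigma, \tau) \leftarrow \mathrm{Setup}(R);\ \pi \leftarrow \mathrm{Prove}(R, \sigma, \phi, w) : \mathrm{Verify}(R, \sigma, \phi, \pi) = 1] = 1$.
Computational soundness: it is not feasible to prove a false statement. We require that for all nonuniform polynomial-time adversaries $\mathcal{A}$,
$\Pr[(R, z) \leftarrow \mathcal{R}(1^\lambda);\ (\sigma, \tau) \leftarrow \mathrm{Setup}(R);\ (\phi, \pi) \leftarrow \mathcal{A}(R, z, \sigma) : \phi \notin L_R \text{ and } \mathrm{Verify}(R, \sigma, \phi, \pi) = 1] \approx 0$.
Perfect zero-knowledge: for all $\lambda \in \mathbb{N}$, $(R, z) \leftarrow \mathcal{R}(1^\lambda)$, $(\phi, w) \in R$, and all adversaries $\mathcal{A}$,
$\Pr[(\sigma, \tau) \leftarrow \mathrm{Setup}(R);\ \pi \leftarrow \mathrm{Prove}(R, \sigma, \phi, w) : \mathcal{A}(R, z, \sigma, \tau, \pi) = 1] = \Pr[(\sigma, \tau) \leftarrow \mathrm{Setup}(R);\ \pi \leftarrow \mathrm{Sim}(R, \tau, \phi) : \mathcal{A}(R, z, \sigma, \tau, \pi) = 1]$.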
Definition 1. We say (Setup, Prove, Verify, Sim) is a perfect noninteractive zero-knowledge argument of knowledge for R if it has perfect completeness, perfect zero-knowledge, and computational knowledge soundness.
2.5. Arithmetic Circuits and Quadratic Arithmetic Programs (QAP)
Like Boolean circuits, arithmetic circuits can be represented by graphs whose edges and vertices are wires and gates, respectively [21]. Each input and output of a gate can be assigned a value. Whether an assignment is valid depends on the circuit. The idea behind zk-SNARK protocols that use arithmetic circuits is to convert valid circuit assignments into algebraic properties of polynomials by means of a QAP.
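Definition 2 (see [22]). A QAP Q over a field $F$ contains three sets of polynomials $V = \{v_k(x)\}$, $W = \{w_k(x)\}$, $Y = \{y_k(x)\}$, for $k \in \{0, \ldots, m\}$, and a target polynomial t(x), all from F[X].
Let f be a function with input variables labeled $1, \ldots, n$ and output variables labeled $m - n' + 1, \ldots, m$. Q is a QAP that computes f if the following is true: $(a_1, \ldots, a_n, a_{m-n'+1}, \ldots, a_m)$ is a valid assignment to the input/output variables of f if and only if there exist $(a_{n+1}, \ldots, a_{m-n'})$ such that t(x) divides $\left(v_0(x) + \sum_{k=1}^{m} a_k v_k(x)\right) \cdot \left(w_0(x) + \sum_{k=1}^{m} a_k w_k(x)\right) - \left(y_0(x) + \sum_{k=1}^{m} a_k y_k(x)\right)$. The size of Q is m, and deg(t(x)) is the degree of Q.
We assume that the degrees of all $v_k(x)$, $w_k(x)$, and $y_k(x)$ are at most deg(t(x)) − 1, because they can all be reduced modulo t(x) without affecting the divisibility check (the check of whether t(x) divides the expression).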
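The divisibility check in the QAP definition can be tried on a small example. The sketch below is an illustration under stated assumptions, not the paper's circuit: it works over the toy field of integers modulo 97, interpolates at the points 1, 2, …, and uses the commonly cited example x³ + x + 5 = 35 flattened into four constraints. All names (`MOD`, `qap_satisfied`, the matrices `A`, `B`, `C`) are hypothetical.

```python
MOD = 97  # small prime field, for illustration only


def poly_add(f, g):
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(x + y) % MOD for x, y in zip(f, g)]


def poly_scale(f, k):
    return [(k * x) % MOD for x in f]


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = (out[i + j] + fi * gj) % MOD
    return out


def poly_rem(f, g):
    """Remainder of f divided by g (coefficients listed low degree first)."""
    f = f[:]
    inv = pow(g[-1], -1, MOD)
    for k in range(len(f) - len(g), -1, -1):
        coef = f[k + len(g) - 1] * inv % MOD
        for j, gj in enumerate(g):
            f[k + j] = (f[k + j] - coef * gj) % MOD
    return f[:len(g) - 1]


def lagrange(ys):
    """Polynomial through the points (1, ys[0]), (2, ys[1]), ..."""
    total = [0]
    for i, yi in enumerate(ys):
        term = [yi % MOD]
        for j in range(len(ys)):
            if j != i:
                inv = pow((i - j) % MOD, -1, MOD)
                term = poly_mul(term, [(-(j + 1) * inv) % MOD, inv])
        total = poly_add(total, term)
    return total


def combine(matrix, s):
    """Sum over k of s[k] times the polynomial interpolating column k."""
    total = [0]
    for k, sk in enumerate(s):
        column = [row[k] for row in matrix]
        total = poly_add(total, poly_scale(lagrange(column), sk))
    return total


def qap_satisfied(a, b, c, s):
    """True if Z(X) divides A(X) * B(X) - C(X) for the assignment s."""
    z = [1]
    for i in range(1, len(a) + 1):
        z = poly_mul(z, [(-i) % MOD, 1])
    p = poly_add(poly_mul(combine(a, s), combine(b, s)),
                 poly_scale(combine(c, s), MOD - 1))
    return all(r == 0 for r in poly_rem(p, z))


# Assumed layout of the assignment: s = [1, x, out, x*x, x*x*x, x*x*x + x]
A = [[0, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 1, 0, 0, 1, 0], [5, 0, 0, 0, 0, 1]]
B = [[0, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0]]
C = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1], [0, 0, 1, 0, 0, 0]]
```

For x = 3, the assignment `[1, 3, 35, 9, 27, 30]` satisfies every constraint, so `qap_satisfied(A, B, C, [1, 3, 35, 9, 27, 30])` holds; changing the output entry to 36 leaves a nonzero remainder.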
3. Our Proposed Scheme
In this section, we first outline the main idea of the biometric identification scheme based on zk-SNARK and then introduce the design of its three stages. The symbols are described in Table 1.
3.1. Scheme Overview
Our scheme consists of three stages: extraction of fingerprint feature points, construction of biometric identification based on zk-SNARK, and verification. Extraction of fingerprint feature points: we first segment the fingerprint region from the background of the image, then select the minutiae as the fingerprint feature points, and finally represent the feature points in plane rectangular coordinates and sort them according to their distance from the center point. Construction of biometric identification based on zk-SNARK: the prover P constructs a ZKP from the known template and the fingerprint feature points collected the second time, which convinces the verifier Sv that the second collected fingerprint features and the template come from the same person’s fingerprint, without leaking the template. Verification: using the proof provided by the prover and the public CRS, the verifier checks whether the prover’s claim is correct. The framework of biometric identification is described in Figure 1.

3.2. Extract Fingerprint Feature Points
The processing of the fingerprint image goes through eight steps as shown in Figure 2:
(1) Fingerprint preprocessing: the method in [23] is adopted. First, the fingerprint image is divided into blocks, and the mean and variance are calculated in each block to segment the fingerprint area. Then, the fingerprint image is enhanced to produce a fingerprint skeleton image. Finally, the cross number method is used to extract the end points and the bifurcation points of the fingerprint.
(2) Selection and representation of feature points: because of external factors such as noise, the extracted minutiae may contain many pseudominutiae (for example, in the edge part and the blurred part of the fingerprint image), so the extracted feature points must be filtered. To match the input of the fuzzy extractor, the selected feature points need to be represented in an appropriate form.
(i) Selection of the center point: the Poincare [24] indexing algorithm is currently the most commonly used method to find the center point. It may not be accurate enough (for example, it may return the center area of a fingerprint instead of a specific point), but this does not affect our approach, because we do not use the center point directly as a reference point for feature extraction or fingerprint comparison; instead, we take the region where the center point is located as a reference and select several minutiae around it.
(ii) Selection of the minutiae points: we use the cross number method to detect fingerprint end points and bifurcations. Selecting too many minutiae points increases the verification time, while selecting too few weakens the uniqueness of the fingerprint, so an appropriate number of minutiae points should be selected to represent the characteristics of the whole fingerprint. First, we set the minimum sampling radius (px denotes image pixels). Then, the end point of the fingerprint ridge line is selected as the reference point, and we establish a polar coordinate system in which (0, 0) is the center point and the coordinates of any minutiae point are $(\rho, \theta)$. Any minutiae point that satisfies the sampling-radius condition becomes a candidate minutiae point. We then select the 3n candidate minutiae points closest to the center point and add them to the selection queue.
(iii) Sorting of the minutiae points: the minutiae selected in the previous step are sorted in ascending order of $\rho$. This forms a set of minutiae points $(\rho_i, \theta_i)$, $i = 1, \ldots, 3n$, in which 3n is the number of effective minutiae points selected, $\rho_i$ is the distance from the center point, and $\theta_i$ is the angle relative to the polar axis.
(3) Feature point processing: convert the polar coordinates $(\rho_i, \theta_i)$ to rectangular coordinates $(x_i, y_i)$ as fingerprint feature points, where $x_i = \rho_i \cos\theta_i$ and $y_i = \rho_i \sin\theta_i$. We divide the 3n fingerprint feature points into n groups of 3 feature points each and save the grouped points as a template in the third-party security center (Figure 3). The same method is applied to the second collection to obtain the freshly collected groups, which do not need to be saved.
(4) Matching: calculate the Euclidean distance between each template group and the corresponding freshly collected group, and compare each distance with the threshold. If at least half of the distances are less than or equal to the threshold, the authentication passes; otherwise, it fails. The framework of the matching is described in Figure 4.
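The feature point processing and matching steps can be sketched in plain Python, outside any zero-knowledge machinery. This is a minimal illustration under assumptions: minutiae are given as (ρ, θ) tuples, the distance between two groups is taken to be the Euclidean distance between their concatenated coordinates (the exact group distance is not recoverable from the text), and every function name is hypothetical.

```python
import math


def polar_to_rect(rho, theta):
    """Convert one minutia from polar to rectangular coordinates."""
    return (rho * math.cos(theta), rho * math.sin(theta))


def extract_groups(minutiae_polar):
    """Sort by distance from the center point and split into groups of 3."""
    ordered = sorted(minutiae_polar, key=lambda m: m[0])
    rect = [polar_to_rect(rho, theta) for rho, theta in ordered]
    # Leftover points that do not fill a whole group are dropped.
    return [rect[i:i + 3] for i in range(0, len(rect) - 2, 3)]


def group_distance(group_a, group_b):
    """Assumed distance: Euclidean norm over all 6 coordinates of a group."""
    flat_a = [coord for point in group_a for coord in point]
    flat_b = [coord for point in group_b for coord in point]
    return math.dist(flat_a, flat_b)


def authenticate(template_groups, probe_groups, threshold):
    """Pass if at least half of the group distances are within the threshold."""
    hits = sum(
        group_distance(t, p) <= threshold
        for t, p in zip(template_groups, probe_groups)
    )
    return 2 * hits >= len(template_groups)
```

In the scheme itself the comparison is not done in the clear like this; the sketch only shows what relation each of the n proofs is meant to establish.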



3.3. Construction of Biometric Identification Based on Zero-Knowledge Proof
For security reasons, P obtains the template by being authorized by the third-party security center through the private key, while the template is kept secret from Sv. That is, P holds both the template and the fingerprint information collected the second time, whereas Sv only holds the fingerprint information collected the second time. Since the template is kept secret from Sv, Sv cannot calculate the distances directly, and a ZKP is needed.
The goal of our scheme is to use zk-SNARK to convince Sv that each template group and the corresponding freshly collected group satisfy the distance threshold condition, that is, that their distance is less than or equal to the threshold. The scheme performs a total of n zk-SNARKs, one per group. We take the first group as an example to elaborate on the whole process. P uses the following three-step strategy:
(1) Convert the arithmetic equations into a circuit C. The arithmetic equations are converted into the circuit C (Figure 5), which is expressed as t + 7 constraints.
(2) Convert the constraints to R1CS. For the assignment vector s, find the three vectors a, b, and c for each constraint such that $(s \cdot a)(s \cdot b) - s \cdot c = 0$. There are t + 7 constraints, so a, b, and c form three (t + 7) × (t + 18) matrices.
(3) Convert R1CS to QAP. Using Lagrange interpolation, three polynomial matrices of degree t + 6 are obtained. The assignment s satisfies $P(X) = H(X) Z(X)$, where $P(X) = A(X) \cdot B(X) - C(X)$ and H(X) = P(X)/Z(X); the equation holds exactly when s satisfies all constraints.
We now construct a noninteractive zero-knowledge argument for QAP generators. The relation defines a language of statements and witnesses (public and private inputs). The argument consists of three algorithms:
(1) Setup: a trusted third party generates the random numbers, which are kept secret from P and Sv and can then be discarded, and outputs the CRS.
(2) Prove: P generates the proof from the CRS and the public and private input s.
(3) Verify: Sv performs its computation from the CRS and verifies the following equations; if all the equalities hold, the authentication is successful and the output is yes; otherwise, the output is no.
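To make the R1CS step more concrete, the sketch below checks a rank-1 constraint system for a simple range statement: a value d fits in t bits, so d is below 2^t. This is a hypothetical stand-in for the threshold condition, not the paper's actual circuit with t + 7 constraints; the field modulus, the witness layout `s = [1, d, b_0, …, b_{t-1}]`, and all function names are assumptions of this illustration.

```python
MOD = 2**61 - 1  # a Mersenne prime used as a toy field modulus


def dot(row, s):
    return sum(r * v for r, v in zip(row, s)) % MOD


def range_r1cs(t):
    """Build (A, B, C) with t bit constraints and one recomposition constraint."""
    width = 2 + t
    A, B, C = [], [], []
    for i in range(t):
        a, b, c = [0] * width, [0] * width, [0] * width
        a[2 + i] = 1          # b_i
        b[2 + i] = 1          # b_i ...
        b[0] = MOD - 1        # ... minus 1, so the row encodes b_i * (b_i - 1) = 0
        A.append(a)
        B.append(b)
        C.append(c)
    a, b, c = [0] * width, [0] * width, [0] * width
    for i in range(t):
        a[2 + i] = pow(2, i, MOD)  # sum of 2**i * b_i
    b[0] = 1                       # times 1
    c[1] = 1                       # equals d
    A.append(a)
    B.append(b)
    C.append(c)
    return A, B, C


def witness(d, t):
    """Honest assignment: the constant 1, the value d, and its low t bits."""
    return [1, d] + [(d >> i) & 1 for i in range(t)]


def r1cs_satisfied(A, B, C, s):
    """Check (s . a) * (s . b) - (s . c) == 0 for every constraint row."""
    return all(
        (dot(a, s) * dot(b, s) - dot(c, s)) % MOD == 0
        for a, b, c in zip(A, B, C)
    )
```

With t = 4, the honest witness for d = 11 satisfies all five constraints, while d = 16 fails the recomposition row, and a witness that smuggles 16 into a bit position fails the corresponding bit constraint.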

Verify the following equations:
In Equations (12), the span-check lines verify whether A(z), B(z), and C(z) are linear combinations of the sets $\{A_i(z)\}$, $\{B_i(z)\}$, and $\{C_i(z)\}$, respectively. The line verifying L(z) = A(z) + B(z) + C(z) shows that A(z), B(z), and C(z) correspond to the same coefficients s. Line (5) being true indicates that P(z) = H(z)Z(z).
The outline of the zk-SNARK is as shown in Figure 6.

3.4. Verification
If all n zk-SNARKs verify, that is, yes is output n times, the biometric identification passes and the output is accept; otherwise, the output is reject.
4. Security Description and Performance Analysis
4.1. Security Description
By the Schwartz-Zippel lemma [25], if equations (1)–(3) are verified to hold at a randomly chosen point, then with overwhelming probability they hold identically as polynomial equations.
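The Schwartz-Zippel argument can be illustrated numerically: two distinct polynomials of degree at most d agree on at most d points, so evaluating both at one random field element exposes a difference except with probability about d divided by the field size. The sketch below assumes a toy prime modulus and hypothetical helper names.

```python
import secrets

MOD = 2**61 - 1  # toy prime field; real schemes use a pairing-friendly field


def evaluate(coeffs, x):
    """Horner evaluation; coefficients are listed low degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % MOD
    return acc


def probably_equal(f, g, trials=1):
    """Probabilistic identity test: compare f and g at random points."""
    for _ in range(trials):
        z = secrets.randbelow(MOD)
        if evaluate(f, z) != evaluate(g, z):
            return False
    return True
```

For instance, `[1, 2, 1]` (that is, (1 + x)²) and `[1, 2, 2]` differ only in one coefficient, yet a single random evaluation separates them unless the sampled point happens to be the one root of their difference.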
SNARK-completeness: it ensures that a verifier always accepts a proof generated by an honest prover. The polynomials of the QAP are generated (Algorithm 1). The prover knows a secret input s and needs to find a valid assignment for all the gates of the arithmetic circuit C. The prover constructs P(X), divides it by Z(X), and encodes the result using the proving key (Algorithm 2).
For example, the proof element for H is the encoding of the polynomial H(z) in the group. The verifier checks whether Z(X) divides P(X) (Algorithm 3). By the nondegeneracy of the pairing e, the verifying equation holds if and only if P(z) = H(z)Z(z). If the proof was constructed honestly, then P(z) = H(z)Z(z), so the equation holds, which shows completeness. In other words, if P knows s, that is, P knows the template, it must pass the verification.
SNARK-soundness: a malicious prover could use P(X) = Z(X) for his proof, with A(X) = 1, B(X) = Z(X), and C(X) = 0. The verifier would then return yes, although the prover does not know the secret s. Therefore, the verifier must perform two additional checks to ensure that P(X) was built correctly. In other words, if the verification passes, P must know s, that is, P must know the template:
(1) Correct span: the polynomials A(X), B(X), and C(X) are linear combinations of $\{A_i(X)\}$, $\{B_i(X)\}$, and $\{C_i(X)\}$, respectively
(2) Same coefficients: the linear combinations A(X), B(X), and C(X) use the same coefficients
This is achieved by requiring the prover to construct the same proof a second time, using proving keys that differ from the first ones by multiplication with a random constant (Table 2).
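The forged proof mentioned in the soundness discussion can be checked by direct substitution. As a short worked derivation, assuming P(X) = A(X)·B(X) − C(X) as in the QAP construction and using only the polynomials named in the text:

```latex
\begin{align*}
P(X) &= A(X)\cdot B(X) - C(X) = 1 \cdot Z(X) - 0 = Z(X),\\
H(X) &= \frac{P(X)}{Z(X)} = 1 .
\end{align*}
```

The divisibility check P(z) = H(z)Z(z) therefore passes for every z even though no valid assignment s was used, which is why the correct span and same coefficients checks are needed in addition.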
Correct span check: the prover has to add , , and to the proof . The verifier can use the proof to check whether
Since for example holds if and only if . Based on the knowledge of exponent [26] assumption, there is a very high probability that the prover used the encoded version and .
Same coefficient check: the prover has to add to the proof which the verifier uses to check whether .
Similarly, the prover used the same coefficients , , and if the equation holds.
Since correct span and same coefficients guarantee that the prover constructed the polynomial P(X) correctly, the divisibility check means that the probability of the prover knowing the secret s is extremely high.
Zero knowledge: throughout the whole process, the verifier Sv cannot extract any knowledge of the fingerprint template.
4.2. Performance Properties
Our experiment has two purposes: (i) evaluating the performance of our scheme using the method described in Section 3 and observing how the size of the threshold and the number of fingerprint features extracted affect accuracy and false rejection rate (FRR); (ii) measuring the execution time of each phase of our authentication scheme.
(1) Experimental setup: the scheme was implemented in Python, and its performance and running time were tested on a Mac with a 1.8 GHz Intel Core i5 processor and 8 GB of 1600 MHz DDR3 memory. The fingerprint database we use is FVC2002 DB1_B/101 and DB1_B/102 [26]. The database contains 2 fingers, and each finger has 8 different fingerprint images, so the positive samples comprise 2 × 8 × 7 = 112 groups, while the negative samples comprise 2 × 8 × 8 = 128 groups. All computations in the scheme are carried out over a finite field. Taking one of the fingerprints as an example, the extracted fingerprint coordinates are shown in Figure 7. Figure 8 shows the coefficients of the polynomials of the QAP, presented only partially because of limited space, and the proofs sent by P to Sv.
(2) Evaluation of the performance: in the following, we discuss the experimental evaluation of the performance in terms of accuracy and FRR. We varied the size of the threshold and the number of fingerprint features extracted. We ran 128 + 112 = 240 groups of experiments for each threshold size and also 128 + 112 = 240 groups of experiments for each number of fingerprint features, varying as 9, 12, 15, …, 54. Accuracy and FRR results are reported in Tables 3 and 4 and in Figures 9 and 10. According to the experimental results, the accuracy and FRR do not change with the number of fingerprint features. The reason is that our threshold increases exponentially, so the accuracy and FRR have little to do with the number of fingerprint features. We therefore only discuss the relationship between the accuracy, the FRR, and the distance threshold here. Accuracy increases as the distance threshold increases. In contrast, FRR decreases as the distance threshold increases. Therefore, depending on the security and usability trade-off required by a particular application, an appropriate balance between accuracy and FRR should be selected. Because our distance threshold is restricted to values that change exponentially, the growth rate is too large to find a more suitable distance threshold, so the experimental results are not very good. In future work, we will design a new protocol on this basis that places no restriction on the threshold, that is, the threshold can be any value, and then conduct further research.
(3) The execution times of each phase: we extracted 9, 15, 18, …, 54 fingerprint features, respectively. The time consumption of each stage of the whole scheme is shown in Table 4. Because the time consumption of each stage varies greatly, Figure 11 shows the time consumption of all stages of the scheme, and Figure 12 separately shows the stages that consume less time.
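The sample counts and the two reported metrics can be reproduced with a few lines of Python. This is a minimal sketch under assumptions: comparisons are counted as ordered pairs of images (which is what yields 112 and 128 for 2 fingers with 8 images each), accuracy and FRR follow their standard definitions, and the function names and any decision lists passed in are hypothetical, not the paper's measured data.

```python
def count_pairs(fingers=2, images=8):
    """Number of genuine and impostor comparisons as ordered image pairs."""
    genuine = fingers * images * (images - 1)
    impostor = fingers * (fingers - 1) * images * images
    return genuine, impostor


def accuracy_and_frr(genuine_accepts, impostor_accepts):
    """Each argument is a list of booleans; True means the verifier accepted."""
    true_accepts = sum(genuine_accepts)
    false_rejects = len(genuine_accepts) - true_accepts
    true_rejects = len(impostor_accepts) - sum(impostor_accepts)
    total = len(genuine_accepts) + len(impostor_accepts)
    accuracy = (true_accepts + true_rejects) / total
    frr = false_rejects / len(genuine_accepts)
    return accuracy, frr
```

Raising the distance threshold can only turn rejections into acceptances, so FRR can only fall as the threshold grows, which matches the trend described above.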






Accordingly, the setup time, proof time, CRS time, R1CS time, and verification time of the scheme are very small, and the efficiency of these stages is relatively high. Figure 12 shows the time consumption of these stages more clearly. The QAP stage is the most time-consuming in the whole scheme (Table 5). Grouping the fingerprint features greatly improves efficiency, and efficiency can be improved further in future work by exploiting parallelism. The setup time is the total time for preprocessing the fingerprint information. As the number of fingerprint features increases, the setup time and proof time increase approximately linearly (Figures 11 and 12).
5. Conclusion
We propose a novel biometric identification scheme based on zk-SNARK. In the scheme, the biometric information and templates enter the equations as private input, the Euclidean distance between the two sets of biometric features is expressed in binary form according to the size of the threshold, and the problem is then transformed into the polynomial domain to improve efficiency. Compared with general biometric identification schemes, our scheme improves efficiency and also provides privacy protection, which keeps the fingerprint template from disclosure. We have implemented the proposed authentication solution on a public data set of fingerprint images, evaluated its performance and security, and measured the proof time, verification time, R1CS time, QAP time, and certificate time for different numbers of fingerprint features. We expect such schemes to become increasingly relevant in the digital economy and other fields.
In our future work, we will (1) design new protocols on this basis that place no restriction on the threshold, that is, the threshold can be any value, and conduct further research; (2) improve the efficiency of the scheme and reduce the QAP time; and (3) design an application scenario, which this solution does not yet have.
Data Availability
The fingerprint database we use is FVC2002 DB1_B/101 and DB1_B/102.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this article.
Acknowledgments
This research was partially supported by the National Natural Science Foundation of China (no. 61772166) and the Key Program of the Natural Science Foundation of Zhejiang Province of China (no. LZ17F020002).