Abstract
A multiparty threshold private set intersection (MP-TPSI) protocol allows mutually untrusted parties holding data sets of size respectively to jointly compute the intersection over all their private data sets only if the size of intersection is larger than , while ensuring that no private information about the data sets other than the intersection is revealed, where is the threshold. In the MP-TPSI protocol, the parties first decide whether the size of the intersection is larger than the threshold ; then, they compute the intersection if the size of the intersection is larger than the threshold . However, the existing MP-TPSI protocols use different forms of evaluation polynomials in the cardinality testing and intersection computing phases, so the parties must transmit and compute a large number of evaluation values, which leads to high communication and computational complexity. In addition, the existing MP-TPSI protocols cannot guarantee the security and correctness of the results: an adversary can learn information beyond the intersection, and elements that are not in the intersection may be output as part of it. To solve these issues, we propose an MP-TPSI protocol based on threshold fully homomorphic encryption (TFHE) and sparse polynomial interpolation. In the star network topology, the theoretical communication complexity of the proposed MP-TPSI protocol depends on the threshold and the number of parties , not on the size of set . Moreover, the proposed MP-TPSI protocol outperforms other related MP-TPSI protocols in terms of computational and communication overheads. Furthermore, the proposed MP-TPSI protocol tolerates up to corrupted parties in the semi-honest model, where no set of colluding parties can learn the input of an honest party in the strictest dishonest majority setting.
1. Introduction
The private set intersection (PSI) protocol [1] allows two mutually untrusted parties and holding data sets and to jointly compute the set intersection without revealing anything except the intersection. The PSI protocol has a large number of application scenarios, e.g., DNA matching [2], botnet detection [3], and private contact discovery [4]. Over the past few decades, a long line of work [5–23] has sought to implement the PSI protocol efficiently in the semi-honest and malicious security models. The main cryptographic primitives of the existing PSI protocols include garbled circuits (GC) [24], oblivious transfer (OT) [25], homomorphic encryption (HE) [26], and pseudorandom functions (PRF) [27]. To support PSI among multiple parties, several multiparty PSI (MP-PSI) protocols [28–36] have been presented.
However, in certain application scenarios, such as vertical federated learning (VFL) [37], the MP-PSI protocols mentioned above cannot satisfy the requirements. Specifically, in vertical federated machine learning, the training data is distributed among multiple parties, and each party has different features of the same object; the parties want to combine the different features of their common samples to train a better machine learning model. It is worth noting that all parties are willing to perform multiparty entity alignment only when the number of common samples is large. If the number of common samples is too small, sample alignment will not improve the performance of the model, and the parties will not be interested in jointly computing the intersection of training samples. To meet the demand of determining whether the intersection is large enough before performing sample alignment, multiparty threshold private set intersection (MP-TPSI) protocols [38–41] have been introduced, which enable mutually distrusted parties holding data sets of size respectively to jointly compute the intersection over all their private data sets only if the size of intersection is larger than , while ensuring that no private information about the data sets other than the intersection is revealed. The MP-TPSI protocol consists of two phases: the cardinality testing phase, where the parties decide whether the size of intersection is larger than a certain threshold ; and the intersection computing phase, where the parties calculate the intersection if the size of intersection is larger than a certain threshold . Unfortunately, the existing MP-TPSI protocols [38–41] still have high communication complexity. To solve this problem, using sparse polynomial interpolation and threshold fully homomorphic encryption (TFHE) [42], this paper proposes an MP-TPSI protocol with low communication complexity.
The main contributions are as follows:
(1) First, in a star network topology where the designated party can communicate with each party , using an evaluation method that represents the set as a polynomial, we construct an MP-TPSI protocol based on the TFHE. To reduce the communication and computational cost, we use the same form of evaluation polynomial in the cardinality testing and intersection computing phases, which enables the parties to transmit and compute only a small number of evaluation values.
(2) Second, in the proposed MP-TPSI protocol, the theoretical communication complexities of the designated party and each party are () and (), respectively, which are smaller than those of the existing MP-TPSI protocols [38–40] and the TAHE-based MP-TPSI protocol [41]. In contrast to conventional MP-PSI protocols [28–36], the communication complexity of the proposed MP-TPSI protocol depends only on the threshold and the number of parties , not on the size of set .
(3) Finally, we evaluate the proposed MP-TPSI protocol and the related TFHE-based MP-TPSI protocol [41] under , , and . The experimental results demonstrate that, compared with the TFHE-based MP-TPSI protocol [41], the computational and communication costs of the proposed MP-TPSI protocol are reduced by 92.0%–97.3% and 67.2%–67.3%, respectively. The security analysis shows that the proposed MP-TPSI protocol achieves semi-honest security in the dishonest majority model, in which up to parties may be corrupted.
The remainder of the study is organized as follows. We introduce some related works in Section 2. In Section 3, we review some preliminaries. In Section 4, our protocol is described in detail. The performance evaluation of our protocol is presented in Section 5. The security analysis of our protocol is shown in Section 6. Finally, we conclude in Section 7.
2. Related Works
Some works [28–36, 38–41] closely related to this study are introduced in this section. For ease of description, we summarize the theoretical communication complexity of [28–36, 38–41] in Table 1.
By representing the set as a polynomial, and based on threshold additive HE (TAHE), which can be realized from Paillier encryption [43], Kissner et al. [28] implemented PSI operations in the multiparty setting. Leveraging Bloom filters (BF) [44] and exponential additive HE (AHE) [45], Miyaji et al. [29] presented a scalable MP-PSI protocol in which a dealer is introduced to decrease the computational complexity of the parties. In a star network topology, based on the two-party protocol of [46], Hazay et al. [30] described MP-PSI protocols in the semi-honest and malicious settings. Kolesnikov et al. [31] proposed a primitive called oblivious programmable PRF (OPPRF), designed MP-PSI protocols based on OPPRF in the semi-honest model, and further optimized them for the augmented-semi-honest model. Inbar et al. [32] extended the PSI construction of [12] to the multiparty setting and described MP-PSI protocols for the semi-honest and augmented-semi-honest settings in a star network topology. By having each party set the elements of its own set as the roots of a polynomial, and based on the OLE, Ghosh et al. [33] presented an approach to achieving secure MP-PSI in a star network topology. Lu et al. [34] proposed an MP-PSI protocol for VFL in a star network topology, which is able to compute the intersection even if some of the parties are offline. Combining the star and path communication patterns (in the former, one party at the center can communicate with all other parties; in the latter, each party can communicate with its neighboring parties), Kavousi et al. [35] presented an efficient MP-PSI protocol using oblivious PRF (OPRF). Based on TAHE schemes and BF, Bay et al. [36] proposed an MP-PSI protocol in a star network topology that is secure in the semi-honest model. However, the communication and computational complexity of the MP-PSI protocols [28–36] mentioned above depends on the size of the input data set, which is a fundamental obstacle to efficiency.
Based on the AHE, Ghosh et al. [38] introduced an MP-TPSI protocol, which is the first MP-TPSI protocol whose communication complexity depends on the threshold , not on the set size . However, Abadi et al. [47] pointed out that the protocol of [38] is not secure because an adversary can learn information about the sets of honest parties beyond the intersection. Using the OPRF and hash functions, Mahdavi et al. [39] introduced two constructions for the MP-TPSI protocol, namely and , but their computational complexity is exponential in the threshold , and thus they perform poorly. By employing the TAHE from Elgamal encryption [48] and Paillier encryption [43], Branco et al. [40] developed a protocol to securely compute linear algebra functions and proposed an MP-TPSI protocol in a star network topology. Badrinarayanan et al. [41] pointed out that the protocol of [38] has a subtle issue: elements that are not in the intersection may also be computed as elements of the intersection. To solve this issue, they proposed the TAHE-based and TFHE-based MP-TPSI protocols in the star network topology. However, their TFHE-based MP-TPSI protocol uses different forms of evaluation polynomials in the cardinality testing and intersection computing phases, which requires the transmission and calculation of a large number of evaluation values and incurs heavy communication and computational costs.
3. Preliminaries
3.1. Notations
For ease of reading, the definitions of symbols in the proposed MP-TPSI protocol are described in Table 2.
3.2. Security Model
We define the security of the proposed MP-TPSI protocol in the universal composability (UC) framework [49]. Considering a multiparty protocol that realizes the ideal functionality , we can define the security of the protocol in the ideal/real world.
In an ideal world: parties transmit all inputs to , and receive the computation result. Simulator is regarded as an adversary in an ideal world, has complete control of the parties that are corrupted, and simulates 's view of on the execution of the real protocol.
In a real world: parties perform , is permitted to call an ideal functionality . Environment selects all inputs of parties, simulates anything outside . can represent the adversary and corrupt any subset of the parties.
Assuming and are the output of in the ideal and real world, respectively, we define securely realizes , if there is a so that for any we have
3.3. The Definition of Threshold Fully Homomorphic Encryption
A TFHE scheme [42] consists of the distributed setup (), encryption (), addition (), multiplication (), partial decryption (), and combination () algorithms.
(1) Distributed setup: On input and party’s number , algorithm returns the secret key share and public key for the party .
(2) Encryption: On input and plaintext , algorithm returns the ciphertext .
(3) Addition: On input the ciphertexts and , algorithm outputs the ciphertext .
(4) Multiplication: On input the ciphertexts and , algorithm outputs the ciphertext .
(5) Partial decryption: On input the secret key share and ciphertext , algorithm outputs the partial decryption ciphertext .
(6) Combination: On input a set of partial decryption ciphertexts , algorithm outputs the plaintext .
3.4. Functionality
Ideal functionality for MP-TPSI cardinality testing: In a star network topology, for parties holding data sets of equal size , respectively, the goal of the is to execute a multiparty protocol ; at the end of , every party learns whether its data set and the intersection differ by at most , namely . The formal definition of is depicted in Figure 1.

Ideal functionality for MP-TPSI computing: In a star network topology, for parties holding data sets of equal size , respectively, the goal of the is to execute a multiparty protocol ; at the end of , every party outputs either an intersection or none . The formal definition of is described in Figure 2.
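As an illustration of what the two ideal functionalities compute, the following plaintext sketch may help. It ignores all privacy aspects and simply evaluates the input/output behaviour described above. The names `f_cardinality_test`, `f_tpsi`, and the parameter `t` (the threshold) are hypothetical labels chosen for this sketch, not notation taken from Figures 1 and 2.

```python
from typing import Optional, Sequence, Set


def intersection(sets: Sequence[Set[int]]) -> Set[int]:
    """Intersection of all parties' input sets."""
    return set.intersection(*sets)


def f_cardinality_test(sets: Sequence[Set[int]], t: int) -> bool:
    """True iff every party's set differs from the intersection by at most t elements."""
    inter = intersection(sets)
    return all(len(s - inter) <= t for s in sets)


def f_tpsi(sets: Sequence[Set[int]], t: int) -> Optional[Set[int]]:
    """Return the intersection if the cardinality test passes, otherwise None."""
    return intersection(sets) if f_cardinality_test(sets, t) else None


# Three parties with equal-size sets; the intersection is {1, 2, 3}.
parties = [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, {1, 2, 3, 5, 7}]
print(f_tpsi(parties, t=2))  # each set differs from the intersection by 2 elements
print(f_tpsi(parties, t=1))  # threshold too small, so nothing is revealed
```

With `t=2` the sketch returns `{1, 2, 3}`; with `t=1` it returns `None`, which corresponds to the case where the parties learn only that the test failed.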

4. Multiparty Threshold Private Set Intersection
In a star network topology where party is the designated party that can communicate with the other parties , suppose parties hold input sets of equal size , respectively. Based on TFHE with distributed setup, we propose an MP-TPSI protocol in which each party can compute the intersection only if . The proposed MP-TPSI protocol is formally described in Figure 3.

4.1. Correctness
MP-TPSI cardinality testing: First, we consider the situation where the MP-TPSI cardinality testing outputs . Based on the correctness of the TFHE, we only need to show that only if for any . Observing the rational interpolation polynomial, we can see that the degrees of the numerator and denominator are at most , and the degree of the rational polynomial is at most . Therefore, can be computed from a total of evaluation values, and the equation holds. Next, we consider the situation where the MP-TPSI cardinality testing outputs . From the above equation, we can observe that . Since , the degrees of and are at least , the degree of the rational polynomial is at least , and hence calculating requires at least evaluation values. However, there are only evaluation values in the MP-TPSI cardinality testing. Therefore, the equation does not hold. From the above analysis, we conclude that the MP-TPSI cardinality testing is correct.
MP-TPSI computing: If for any , the MP-TPSI computing quits after the MP-TPSI cardinality testing. If , observing the rational interpolation polynomial, we can see that the degrees of the numerator and denominator are at most , and hence is a random polynomial with degree at most . Since , no other terms will be canceled out in the numerator and denominator. Therefore, based on the correctness of the TFHE, each party is able to interpolate the rational random polynomial by utilizing evaluation values. Finally, each can easily compute the intersection from the set of the roots of the denominator of polynomial .
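The degree argument above can be illustrated with a small plaintext experiment. The sketch below is a simplified two-set instance and makes several assumptions that are not taken from the protocol in Figure 3: it works without encryption, it omits the random masking polynomials, it writes the threshold as `t`, and it uses `2t + 1` public evaluation points `xs` plus one extra check point `z` over an arbitrarily chosen prime field. It fits a rational function with numerator and denominator of degree at most `t` and then checks whether the fit predicts the value at `z`.

```python
P = 2**61 - 1  # a Mersenne prime; the field size is an arbitrary choice


def set_poly_eval(s, x):
    """Evaluate the set polynomial prod_{a in s} (x - a) mod P."""
    out = 1
    for a in s:
        out = out * (x - a) % P
    return out


def poly_eval(coeffs, x):
    """Horner evaluation mod P; coeffs are ordered lowest degree first."""
    out = 0
    for c in reversed(coeffs):
        out = (out * x + c) % P
    return out


def nullspace_vector(rows, ncols):
    """Return one non-zero v with rows * v = 0 (mod P), via Gauss-Jordan elimination."""
    rows = [r[:] for r in rows]
    pivots = {}  # pivot column -> row index
    r = 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        inv = pow(rows[r][c], -1, P)
        rows[r] = [v * inv % P for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % P for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
        if r == len(rows):
            break
    free = next(c for c in range(ncols) if c not in pivots)
    v = [0] * ncols
    v[free] = 1
    for c, i in pivots.items():
        v[c] = -rows[i][free] % P
    return v


def rational_interpolate(xs, fs, t):
    """Find (num, den), each of degree <= t, with num(x) = f * den(x) at every point."""
    rows = []
    for x, f in zip(xs, fs):
        powers = [pow(x, k, P) for k in range(t + 1)]
        rows.append(powers + [(-f * pk) % P for pk in powers])
    v = nullspace_vector(rows, 2 * t + 2)
    return v[: t + 1], v[t + 1:]


def cardinality_test(s1, si, t, xs, z):
    """True iff the degree-<=t rational fit through xs also predicts the value at z."""
    def f(x):
        return set_poly_eval(si, x) * pow(set_poly_eval(s1, x), -1, P) % P
    num, den = rational_interpolate(xs, [f(x) for x in xs], t)
    return poly_eval(num, z) == f(z) * poly_eval(den, z) % P


t = 2
xs = [1001, 1002, 1003, 1004, 1005]  # 2t + 1 public evaluation points
z = 2000                             # extra check point, distinct from xs
s1 = {1, 2, 3, 4, 5, 6}
close = {1, 2, 3, 4, 7, 8}           # differs from s1 in 2 <= t elements
far = {1, 2, 3, 7, 8, 9}             # differs from s1 in 3 > t elements
print(cardinality_test(s1, close, t, xs, z))  # True
print(cardinality_test(s1, far, t, xs, z))    # False
```

When the sets differ in at most `t` elements, the common factors of the two set polynomials cancel and the quotient is fully determined by `2t + 1` evaluations, so the check at `z` succeeds. When they differ in more than `t` elements, no rational function of that degree matches the quotient, and the check fails.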
5. Performance Evaluation
The proposed MP-TPSI protocol is an improvement of the TFHE-based MP-TPSI protocol [41], so we evaluate the proposed protocol against it. In the star network topology, we implement the proposed MP-TPSI protocol on top of Lattigo [50], a lattice-based multiparty HE library written in Go that implements the full-RNS BFV scheme [51] and its multiparty versions. We run all experiments on a 32-core Intel Xeon CPU with 256 GB of RAM. For the multiparty BFV scheme, to ensure 128-bit security, we choose a polynomial degree of 4096, a 109-bit ciphertext modulus, and a 17-bit plaintext modulus. For ease of comparison, we perform all experiments on the same machine with 16 threads, emulate the network latency by utilizing the Linux command, and consider a LAN with a 10 Gbps throughput and 0.2 ms round-trip time. It is worth noting that the authors of [41] did not implement their TFHE-based MP-TPSI protocol; for a fair comparison, we implement it in the same experimental environment.
5.1. Analysis of Computational Cost
The computational costs of the proposed MP-TPSI protocol and the TFHE-based MP-TPSI protocol [41] under , , and are shown in Table 3. All running times are averaged over 10 experiments.
As shown in Figure 4, compared with the TFHE-based MP-TPSI protocol [41], the proposed MP-TPSI protocol has a better performance in terms of computational cost. Specifically, under and , for , the computational cost in the proposed MP-TPSI protocol is almost reduced by 92.4%, 92.2%, 92.0%, 92.1%, 92.4%, 92.9%, and 93.8%, respectively, compared with the TFHE-based MP-TPSI protocol [41]. Under and , with regard to , the proposed MP-TPSI protocol decreases by almost 94.6%, 94.6%, 94.5%, 94.4%, 94.1%, 94.6% and 95.2% respectively in computational cost in comparison with the TFHE-based MP-TPSI protocol [41]. Under and , regarding , the proposed MP-TPSI protocol reduces the computational cost by almost 96.7%, 96.7%, 96.7%, 96.6%, 96.8%, 97.0%, and 97.3%, respectively, than the TFHE-based MP-TPSI protocol [41].

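Figure 4: Computational cost of the proposed MP-TPSI protocol and the TFHE-based MP-TPSI protocol [41]; panels (a), (b), and (c).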
5.2. Analysis of Communication Cost
In a star network topology, according to the selected parameters in Section 5.1, the sizes of a ciphertext, a partial decryption ciphertext, and a plaintext are bits, bits, and bits, respectively. The comparison of communication cost between the proposed MP-TPSI protocol and the TFHE-based MP-TPSI protocol [41] is shown in Table 4.
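To make the relation between the BFV parameters and the message sizes concrete, the following back-of-the-envelope sketch may be useful. It rests on assumptions that are not stated in the text: a fresh BFV ciphertext is taken to consist of two ring elements, a partial decryption share of one ring element, and a plaintext of one polynomial with coefficients modulo the plaintext modulus. Serialization overhead is ignored, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BFVParams:
    poly_degree: int = 4096      # polynomial degree reported in Section 5
    ct_modulus_bits: int = 109   # ciphertext modulus size in bits
    pt_modulus_bits: int = 17    # plaintext modulus size in bits


def message_sizes_bits(p: BFVParams) -> dict:
    """Estimated sizes in bits under the assumptions listed in the lead-in."""
    ring_element = p.poly_degree * p.ct_modulus_bits
    return {
        "ciphertext": 2 * ring_element,        # assumed: two ring elements
        "partial_decryption": ring_element,    # assumed: one ring element
        "plaintext": p.poly_degree * p.pt_modulus_bits,
    }


sizes = message_sizes_bits(BFVParams())
for name, bits in sizes.items():
    print(f"{name}: {bits} bits ({bits / 8 / 1024:.1f} KiB)")
```

Under these assumptions, a ciphertext is about twice the size of a partial decryption share, and both are much larger than a plaintext, which is why the number of ciphertexts exchanged tends to dominate the communication cost.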
For the TFHE-based MP-TPSI protocol [41], parties first run the MP-TPSI cardinality testing. Each sends ciphertexts and one ciphertext to . returns one ciphertext to each . Each sends one partial decryption ciphertext to . returns one plaintext to each . If the MP-TPSI cardinality testing passes, parties then run the MP-TPSI computing. Each sends ciphertexts to . returns ciphertexts to each . Each sends ciphertexts to . returns ciphertexts to each . Each sends partial decryption ciphertexts to . returns plaintexts to each . Therefore, the communication cost of the designated party is (namely, ()), the communication cost of each is (namely, ()), and the total communication cost is bits.
For our MP-TPSI protocol, parties first run the MP-TPSI cardinality testing. Each sends ciphertexts and one ciphertext to . returns one ciphertext to each . Each sends one partial decryption ciphertext to . returns one plaintext to each . If the MP-TPSI cardinality testing passes, parties then run the MP-TPSI computing. returns ciphertexts to each . Each sends partial decryption ciphertexts to . returns plaintexts to each . Therefore, the communication cost of the designated party is (namely, ()), the communication cost of each is (namely, ()), and the total communication cost is bits.
As shown in Figure 5, compared with the TFHE-based MP-TPSI protocol [41], the proposed MP-TPSI protocol has a better performance in terms of communication cost. Specifically, when comparing with and , for parties, the communication cost in the proposed MP-TPSI protocol is almost reduced by 67.3%, 67.3%, 67.3%, 67.3%, 67.3%, 67.3%, and 67.3%, respectively, compared with the TFHE-based MP-TPSI protocol [41]. When comparing with and , with regard to parties, the proposed MP-TPSI protocol decreases by almost 67.2%, 67.2%, 67.2%, 67.2%, 67.2%, 67.2%, and 67.2%, respectively, in communication cost in comparison with the TFHE-based MP-TPSI protocol [41]. When comparing with and , regarding parties, the proposed MP-TPSI protocol reduces the communication cost by almost 67.2%, 67.2%, 67.2%, 67.2%, 67.2%, 67.2% and 67.2% respectively than the TFHE-based MP-TPSI protocol [41].

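Figure 5: Communication cost of the proposed MP-TPSI protocol and the TFHE-based MP-TPSI protocol [41]; panels (a), (b), and (c).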
6. Security Analysis
In the security model, we assume an environment that is able to corrupt the set of parties, and a simulator that knows the output value of the ideal functionality . If , sets , otherwise sets . also has the output value or of the ideal functionality . In addition, for each corrupt party , has the input data set and random value of . The simulation strategy of is described as follows.
Initialization. represents each honest party running the distributed setup algorithm just like in the real world. also knows the secret key share of all corrupt parties .
MP-TPSI Cardinality Testing. does the following:
In Step 1, encodes the intersection set as a rational polynomial , chooses randomly a rational polynomial of degree , and computes a rational polynomial .
In Steps 2–4, whenever each honest party sends any encrypted value, computes the ciphertext employing fresh random value on behalf of just like in the real world.
In Steps 5–6, instead of computing the value by executing the partial decryption algorithm on behalf of every honest party just like in the real world, calculates the value by executing the simulator algorithm , where represents the computation circuit performed by to calculate the value just like in the real world, this corresponds to the ideal world, denotes the set of the honest parties. If is honest, sends the evaluation value just like in the real world.
MP-TPSI Computing. does the following:
In Step 1, instead of computing the value by executing the partial decryption algorithm on behalf of every honest party just like in the real world, calculates the value by executing the simulator algorithm , where represents the computation circuit performed by to calculate the value just like in the real world, this corresponds to the ideal world. If is honest, sends the evaluation value just like in the real world.
In Step 2, outputs the interpolation polynomial and set intersection on behalf of every honest party just like in the real world.
Next, given a simulator , we show that the proposed MP-TPSI protocol is secure against the environment in the semi-honest setting through a sequence of computationally indistinguishable consecutive hybrids. : simulates all operations of honest parties just like in the real world. : simulates an ideal functionality . If , returns , otherwise returns . : simulates the partial decryption performed by the honest parties just like in the ideal world. For each , computes the partial decryption as . The rational polynomial is still calculated as in the real world. : Instead of calculating the rational polynomial just like in the real world, randomly selects a rational polynomial of degree , and computes a rational polynomial . : simulates the ciphertexts computed by any honest parties as encryption of , just like does in the ideal world.
Theorem 1. Assuming that the TFHE scheme is secure, the proposed MP-TPSI protocol securely realizes and in a star network topology, and resists a semi-honest adversary who has the ability to corrupt up to parties. The theorem follows from Lemmas 1–4 in the Appendix.
7. Conclusion
In this study, using sparse polynomial interpolation and TFHE, we introduce an MP-TPSI protocol with low communication complexity, in which the communication complexity depends only on the threshold and the number of parties , not on the size of data set . Compared with the existing MP-TPSI protocols, the proposed MP-TPSI protocol utilizes the same form of evaluation polynomial in the cardinality testing and intersection computing phases, which enables the parties to transmit and compute only a small number of evaluation values, and hence reduces the communication and computational cost. Performance evaluation demonstrates that our MP-TPSI protocol reduces the computational and communication costs by at least 92.0% and 67.2%, respectively, compared with the TFHE-based MP-TPSI protocol [41]. Moreover, the proposed MP-TPSI protocol achieves the correctness of the intersection result and ensures the security of the parties' data; that is, a semi-honest adversary cannot learn additional information beyond the intersection. In the future, we will explore the MP-TPSI protocol in the broadcast communication setting, optimize the number of rounds of MP-TPSI, and design a more efficient MP-TPSI protocol with malicious security.
Appendix
Lemma 1. and are computationally indistinguishable due to the correctness of the MP-TPSI protocol .
Proof. The difference between and is that in , calls honestly, while in , simulates the ideal functionality that returns if and otherwise. In , the output result of is correct due to the correctness of our protocol . In , the output result of is always correct. Therefore, and are computationally indistinguishable.
Lemma 2. and are computationally indistinguishable due to the simulation-based security of TFHE [42].
Proof. The difference between and is that in , computes the partial decryption of TFHE of all honest parties just like in the real world, while in , simulates the partial decryption by running . If there is an that is able to distinguish and with a non-negligible probability , we are able to build a reduction algorithm that has the ability to break TFHE’s simulation-based security with a non-negligible probability . interacts with a challenger in TFHE’s simulation-based security game, and interacts with in the game of and . The corrupt parties in the game of and are the same as the corrupt parties in the game of and . sends the public key share and secret key share of the corrupt party that it receives from to , and sends the public key share of the honest party that it receives from to . sends the corrupt party’s input data set and random value that it receives from to . sends the honest party’s ciphertext that it receives from to . sends the evaluation circuit of rational polynomial to . returns the honest party’s partial decryption to . continues to interact with for the rest of the process just like in . In the interaction process, if sends honestly computed partial decryption, then the interaction process between and is associated with ; if the partial decryption is simulated by , the interaction process between and is associated with .
From the above, if there is an that is able to distinguish and with a non-negligible probability , then has the ability to break TFHE’s simulation-based security with a non-negligible probability , which contradicts TFHE’s simulation-based security [42]. Therefore, is computationally indistinguishable from .
Lemma 3. is statistically close to .
Proof. The difference between and is how the rational polynomial is calculated. In , computes . For each , . Thus, . Since is statistically close to a uniform random polynomial of degree , we can obtain , where is a uniform random polynomial of degree . In , computes . Therefore, the distribution of in is statistically close to the distribution of in .
Lemma 4. and are computationally indistinguishable due to the semantic security of TFHE [42].
Proof. The difference between and is that in , computes the encryption of TFHE of all honest parties just like in the real world, while in , computes the encryption of .
If there is an that is able to distinguish and with a non-negligible probability , we are able to build a reduction algorithm that has the ability to break TFHE’s semantic security with a non-negligible probability . interacts with a challenger in TFHE’s semantic security game, and interacts with in the game of and . The corrupt parties in the game of and are the same as the corrupt parties in the game of and . sends the public key share and secret key share of the corrupt party that it receives from to , and sends the public key share of the honest party that it receives from to . sends the honestly generated plaintext and to . returns their ciphertexts to . uses the ciphertext it receives from to interact with . continues to interact with for the rest of the process just like in . In the interaction process, if sends honestly computed ciphertext, then the interaction process between and is associated with ; if the ciphertext is computed as ’s encryption, the interaction between and is associated with .
From the above, if there is an that is able to distinguish and with a non-negligible probability , then has the ability to break TFHE’s semantic security with a non-negligible probability , which contradicts TFHE’s semantic security [42]. Therefore, is computationally indistinguishable from .
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant no. U19B2021) and the National Natural Science Foundation of China (Grant no. 62172427).