Abstract
Searchable public key encryption supporting conjunctive keyword search is an important technique in today’s cloud environment. Existing schemes usually rely on a forward index structure, which leads to linear search complexity. To obtain better search efficiency, in this paper we use a tree index structure instead of a forward index to realize such schemes. To this end, we first give a set of keyword conversion methods that convert the index and query keywords into a group of vectors and then present a novel algorithm for building an index tree based on these vectors. Finally, by using an efficient predicate encryption scheme to encrypt the index tree, we propose a tree-based public key encryption with conjunctive keyword search scheme. The proposed scheme is proven to be secure against chosen plaintext attacks and achieves sublinear search complexity. Moreover, both theoretical analysis and experimental results show that the proposed scheme is efficient and feasible for practical applications.
1. Introduction
With the rapid development of cloud computing, many enterprises and individuals are willing to store their data on cloud servers for the sake of efficiency and convenience. Although data outsourcing has brought great convenience to resource-constrained users, it inevitably raises data privacy and security concerns, since users’ data stored on the cloud server are generally in plaintext. If the cloud server is compromised by a malicious intruder, these data can be accessed without obstacle. To overcome this problem, a straightforward approach is to encrypt data before outsourcing them. Nevertheless, this trivial approach requires the client to download all encrypted data, decrypt them, and only then perform a keyword search. Because this method is inefficient when the amount of data is large, a technology that can retrieve encrypted data on the cloud server without decryption is of prime importance in practical applications.
Searchable encryption (SE) [1] is a promising method, which offers a capacity for searching over encrypted data securely. In terms of different cryptography primitives, SE can be classified into two types, that is, searchable symmetric encryption (SSE) and searchable public key encryption (SPE). SSE is fit for the scenario of one data owner and one data user, while SPE can be applied in the scenario of many data owners and one data user, e.g., Mail Routing Service (MRS) and Personal Healthcare Record (PHR). In practice, since user’s data and query can be expressed as a keyword set, many existing SPE schemes are keyword-based schemes.
The first SPE scheme that supports a keyword search, called public key encryption with keyword search (PEKS), was proposed by Boneh et al. [2]. They first defined the framework and security definition of SPE and gave a concrete scheme supporting only a single keyword search. To support conjunctive keyword search, Park et al. gave two public key encryption with conjunctive keyword search (PECK) schemes [3]. The first scheme requires too many bilinear pairing operations, while the second one requires a number of private keys that is linear in the number of keywords. Moreover, their schemes need the keyword field as additional information to perform the keyword search. To create a PECK scheme without a keyword field, a hidden vector encryption scheme was presented by Boneh and Waters [4]. To improve the search efficiency, Zhang et al. proposed a keyword field-free PECK scheme with less search overhead [5]. In the years that followed, many works aimed to improve the efficiency of PECK [6, 7], which enhanced the practicability of PECK. Nevertheless, the search complexities of the aforementioned schemes are all linear in the number of keywords in a corpus, which is not practical when the corpus is large.
Motivation and Objective. To better clarify the motivation of our scheme, we give Table 1 to show some recent works that are relevant to our work. According to Table 1, we find that most SPE schemes [5–12], whether or not they offer additional abilities, utilize a forward index structure, which suffers from linear search complexity. These schemes therefore become impractical in real applications as the volume of data increases. To improve the search efficiency, a PECK scheme using a hidden structure was recently proposed, which can achieve sublinear search efficiency [13]. However, its index building time is proportional to the square of the number of keywords in each document. For the sake of efficiency, it is necessary to construct a PECK scheme that is efficient in both encryption and search.
In the field of information retrieval, index structures with sublinear search complexity, such as inverted and tree-based index structures, are very popular and have already been adopted in the domain of SSE [14–18]. Considering that the tree structure commonly offers better search performance than the forward index structure, this paper aims to construct a PECK scheme under a tree index structure to achieve sublinear search efficiency without sacrificing encryption efficiency. For simplicity, we name this scheme tree-based public key encryption with conjunctive keyword search (T-PECK).
Contributions. The contributions of our scheme are listed as follows:
(1) We design a set of keyword conversion methods, which are used to transform the keyword set in each document and a query into a group of vector representations. The semantic information embedded in these vector representations can be utilized to appraise the relationship between documents and queries.
(2) Based on the above conversion methods, a tree building algorithm is proposed to create a tree index for all documents. Correspondingly, a tree search algorithm is also given to realize conjunctive keyword search with sublinear search complexity.
(3) Inspired by the definition of PECK proposed in [3], we give a detailed framework and security definition of T-PECK. Based on the framework, by taking advantage of the efficient predicate-only inner product encryption (PO-IPE) scheme introduced in [19] to encrypt the index tree, a concrete T-PECK scheme is proposed. Moreover, a rigorous proof is given to demonstrate the security of T-PECK according to the security definition.
In addition, to show the feasibility of our scheme, we also give a detailed experiment to verify the efficiency of our scheme. The experiment results given in Section 5 show that the search complexity of T-PECK is indeed sublinear with the number of the documents without sacrificing the encrypting efficiency, and T-PECK is more practical than recent PECK schemes [8, 13].
Related work. Generally speaking, SE can be categorized as SSE and SPE, where SSE and SPE are based on symmetric key and public key encryption systems, respectively. Although SPE is less efficient than SSE in terms of encryption and search owing to its use of public key cryptography, it allows more extensive application scenarios and better security properties.
The first SPE solution supporting keyword search (PEKS), regarded as a generalization of anonymous identity-based encryption (IBE) [20], was designed by Boneh et al. [2]. After this, Abdalla et al. gave definitions of computational and statistical consistency of PEKS and proposed a new PEKS scheme [21]. Considering that the works in [2, 21] only realize a single keyword search, Park et al. gave the first PECK scheme [3]. In their scheme, each keyword is affiliated with a keyword field. If two identical keywords are associated with two different keyword fields, they are regarded as two different keywords. This property limits its application in many scenarios. To achieve PECK without the keyword field, a hidden vector encryption (HVE) scheme was proposed by Boneh and Waters [4]. To support disjunctive keyword search, Katz et al. proposed a predicate encryption (PE) scheme supporting inner product [22], which is also called inner product encryption (IPE). In their scheme, the keyword sets of a document and a query can be converted into an attribute vector and a predicate vector, respectively. By applying the PE scheme to encrypt these vectors, a PECK or a public key encryption with disjunctive keyword search (PEDK) scheme can be obtained. In order to improve the query efficiency, some efficient PECK schemes [5, 6, 8] were proposed by reducing the bilinear pairing operations in the search process. To support conjunctive and disjunctive keyword search simultaneously, by adopting an efficient IPE scheme [23], Zhang et al. proposed a public key encryption with conjunctive and disjunctive keyword search (PECDK) scheme [24]. Then, they also presented a new PECDK scheme [25] which is more efficient than their previous scheme [24]. To support more expressive queries, Yang et al. devised an SPE scheme supporting various search functions, e.g., range search, conjunctive keyword search, and Boolean keyword search [7].
Although this scheme can support sophisticated queries, its search efficiency remains to be further improved. To address this, two SPE schemes [13, 26] supporting Boolean keyword search were proposed to improve the search efficiency. In addition, in recent years, some works have focused on other aspects of SPE, such as keyword guessing attack (KGA) [27–29], access control [30, 31], and fast search [32, 33]. These works profoundly improve the usability of SPE.
In the domain of SSE, the tree and inverted index structures are usually adopted in SSE schemes to achieve better search efficiency. The SSE schemes using inverted index were given in [17, 18]. By taking advantage of r-tree, kd-tree, and balanced binary tree, SSE schemes with the tree index structure were released in [14–16]. To the best of our knowledge, there is no PECK scheme based on the tree index structure so far. Thus, in this paper, to improve the search efficiency substantially, we attempt to construct a PECK scheme based on the balanced binary tree.
Organization. We organize this paper as follows. In Section 2, we present the framework of T-PECK and the security definition of T-PECK and introduce some backgrounds related to our scheme. The keyword conversion methods and the tree building algorithm are presented in Section 3. We construct T-PECK in Section 4, and a detailed security proof of T-PECK is also presented in this section. In Section 5, we give the theoretical and experimental analysis of T-PECK. Section 6 concludes our work.
2. Preliminary
In this section, we describe the application scenario and give the system model of the T-PECK scheme. Based on this system model, we present a formal framework and security definition of T-PECK. Besides, because PO-IPE scheme is adopted as the basic encryption module, we also introduce the bilinear pairing group and the definition of PO-IPE. To formulate our work mathematically, Table 2 shows notations that are adopted in this paper.
2.1. The Proposed T-PECK Model
In the T-PECK scheme, there are three roles: data owners (DOs), data user (DU), and cloud server (CS). DU first generates a pair of keys of which the public key () is open to the public and the secret key () is held only by DU. When one DO wants to send a group of documents to DU, he/she will encrypt these documents and send the encrypted documents along with an encrypted index tree. Note that this encrypted index tree is generated by using a group of keyword sets derived from these documents and . When DU wants to retrieve documents containing a specific list of keywords, he/she makes use of and query keywords to construct a trapdoor and sends the trapdoor to CS. Once the trapdoor is received, CS tests the encrypted index against the trapdoor and returns the matched documents to DU. The system model is illustrated in Figure 1. As shown in Figure 1, DO builds the secure index and sends it to the CS. When CS receives the trapdoor generated by DU, it performs the search algorithm and returns the query results to DU. According to this model, the definition of the framework of T-PECK is given as follows.

Definition 1. The T-PECK scheme consists of five probabilistic polynomial time (PPT) algorithms (KeyGen, CreateTree, IndexBuild, Trapdoor, and Search):
(1) KeyGen (): given a security parameter , this algorithm generates the public key  and the secret key 
(2) CreateTree (): given a dataset  in which  is the corresponding keyword set of document  and , the algorithm outputs a plaintext tree 
(3) IndexBuild (): this algorithm builds a searchable encrypted index tree  by utilizing the public key  and a plaintext tree 
(4) Trapdoor (): given the secret key  and a conjunctive keyword query , where  is a query keyword and , this algorithm produces a trapdoor 
(5) Search (, , ): this algorithm takes , , and  as input and returns a group of documents by searching the index tree
Correctness property. For a conjunctive keyword query  and a group of keyword sets , for correctly generating , , , and , if , the algorithm  outputs the document . Otherwise, it outputs  with negligible probability.
Actually, in the T-PECK scheme, the index tree  and the document set  are encrypted by  and , respectively, where  is a secure symmetric encryption scheme, e.g., AES-CBC. Moreover, in practice, the  algorithm only returns the identifiers of matched documents, and then, DU will make use of these identifiers to look up the corresponding ciphertexts. Thus, like other related work, our work only concentrates on searchable encryption part.
2.2. Security Definition of T-PECK
Before presenting the formal security definition of T-PECK, based on four kinds of private leaks, we first define a leakage function under four patterns introduced in [34], which is inevitably exposed to the cloud server.
Size Pattern. The size of encrypted documents and queries can be accessed by the cloud server as they are stored in the cloud. This leakage is denoted as the leak of size pattern.
Access Pattern. The cloud server can obtain the relationship between a specific query and identifiers of matched documents, which is called the leakage of access pattern.
Search Pattern. Given a document set and a query set , a matrix can be constructed by the cloud server. In this matrix, if matches the query , the element in the th row and th column is set to be 1. The search pattern is the information of the matrix.
Path Pattern. For a specific query, when the search algorithm traverses in the tree, the path will be revealed to the cloud server. This path information can be regarded as the leakage of the path pattern.
These leakage patterns are default in most searchable encryption schemes [5–8]. Actually, the technique called oblivious RAMs can be adopted to protect the access pattern and the search pattern, but it suffers from the problem of low efficiency in the actual applications. Thus, we do not consider these leakages in this paper.
Similar to the definition presented in [3], for two challenge index trees and based on two datasets and , we propose a security definition which demands that the encrypted index tree of and that of are computationally indistinguishable for any probabilistic polynomial time adversary A. Given a leakage function in which is an index tree of a dataset and is a keyword query, the detailed security definition is given as follows.
Definition 2. For any probabilistic polynomial time (PPT) adversary , using a security parameter , a T-PECK scheme is semantically secure against chosen plaintext attacks if ’s advantage is negligible in the following game.
(1) Setup: the challenger utilizes KeyGen () to create and and sends to .
(2) Phase 1: for any query that wants to perform, adaptively asks for the trapdoor of .
(3) Challenge: let be keyword queries that are performed in Phase 1. Two challenge index trees and are chosen and sent to under the restriction that for each . After this, selects a random bit , builds an encrypted index tree by using , and sends to .
(4) Phase 2: for any keyword query , can continue to request the trapdoor under the restriction .
(5) Response: outputs . If , wins the game.
According to the above game, the advantage of is defined as
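For a bit-guessing game of this kind, the advantage is conventionally measured as the distance of the winning probability from one half. A standard form is sketched below; the symbol names $\mathcal{A}$ (adversary), $b$ (challenger's bit), $b'$ (adversary's guess), and $\lambda$ (security parameter) are assumed here for illustration and are not confirmed notation of this paper.

```latex
\mathrm{Adv}_{\mathcal{A}}^{\text{T-PECK}}(\lambda)
  = \left| \Pr\left[ b' = b \right] - \frac{1}{2} \right|
```

Under this convention, an adversary that guesses at random wins with probability 1/2 and therefore has advantage 0.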
2.3. The Concept of PO-IPE Scheme
2.3.1. Framework of PO-IPE
The original definition of PO-IPE was presented in [19]. In this definition, a ciphertext is associated with an attribute vector , and a key is associated with a predicate vector correspondingly. The significant property of PO-IPE is that a ciphertext of can be decrypted by a key of if and only if . More explicitly, the framework of PO-IPE is introduced in the following definition.
Definition 3.  and  denote an arbitrary set of attributes and a set of predicates on , respectively. There are four PPT algorithms in a PO-IPE scheme with  and , i.e.,
(1) Setup: this algorithm inputs a security parameter  and outputs  and master secret key ()
(2) KeyGen: given  and a predicate vector  as input, this algorithm outputs the secret key  of 
(3) Enc: this encryption algorithm uses  to encrypt an attribute vector  and outputs the ciphertext 
(4) Dec: taking , , and  as input, this decryption algorithm outputs either 1 or 0
Consistency of PO-IPE. For correctly generated , , and , for all  and , the decryption algorithm  outputs 1 if , otherwise 0.
According to the definitions of T-PECK and PO-IPE, by converting the index and query keyword sets into an attribute vector  and a predicate vector , respectively, we can take advantage of a PO-IPE scheme to create our scheme. The key to realizing the goal of conjunctive keyword search is constructing a keyword conversion method to change the query and index keyword sets into vectors.
2.3.2. Security Definition of PO-IPE
The security of T-PECK depends on that of PO-IPE since PO-IPE is the basis of T-PECK. To present a detailed proof for the security of T-PECK, the security definition of PO-IPE given in [19] is described as follows.
Definition 4. Given a security parameter , for any PPT adversary , if ’s advantage is negligible in the game described below, we say that a PO-IPE scheme is semantically secure against chosen plaintext attacks.
(1) Setup: creates and by utilizing Setup () and sends to .
(2) Phase 1: can ask for predicate vectors adaptively and obtain corresponding keys , , , .
(3) Challenge: two challenge attribute vectors are selected randomly by under the following constraints. For secret keys , , , requested in Phase 1, one of the following constraints should be satisfied. and . and , where . randomly chooses a bit and sends the ciphertext to .
(4) Phase 2: can continue to request keys , , , , for other predicate vectors, , , , . These keys must comply with the restriction given in the challenge phase.
(5) Response: a bit is given by . If , wins the game.
’s advantage under the above game is described as
3. Construction of Index Tree
In this section, we introduce three methods related to building the index tree. The first method is a keyword conversion method which changes document and query keywords into an attribute vector and a predicate vector, respectively. This conversion method is used to construct leaf nodes. The second method aims to create “0-1” vectors of internal nodes of the index tree. Through randomly partitioning all the keywords in the corpus into a set of clusters, we devise a “converting and merging” algorithm to create “0-1” vectors based on these clusters. The third method is a recursive algorithm, which elaborately combines the first and second methods to create the index tree.
3.1. Keywords’ Conversion Method
We assume that any keyword can be expressed as and define a function . is collision resistant since is a prime larger than the number of words in a dictionary. That is to say, if , then , where and are two different keywords. For a document keyword set and a query , the approach that converts and into an attribute vector and a predicate vector, respectively, is described as follows:
(1) For the keyword set , we construct a function: Based on the coefficients of , the vector representation of is now developed.
(2) Similarly, for a query , we can construct a vector , where and . is the vector representation of .
(3) It is noteworthy that if there is , we can verify that . This property can be used to test whether is a subset of .
As a result, we can convert the keyword set of each document into an attribute vector and transform each query into a predicate vector by adopting the keyword conversion method. These vectors will be used to create the leaf nodes.
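The conversion above can be illustrated with a minimal sketch. Since the exact formulas are not legible here, the sketch assumes the usual polynomial construction: the attribute vector holds the coefficients of a polynomial whose roots are the hashed document keywords, and the predicate vector holds power sums of the hashed query keywords, so that the inner product equals the sum of the polynomial evaluated at each query keyword. The modulus `P`, the SHA-256 based map `H`, and all function names are hypothetical choices for illustration.

```python
import hashlib

P = 2**61 - 1  # assumed prime modulus (a Mersenne prime), not the paper's parameter


def H(word: str) -> int:
    """Map a keyword into Z_P; SHA-256 is an assumed stand-in for the paper's map."""
    return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % P


def attribute_vector(keywords, n):
    """Coefficients (a_0, ..., a_n) of f(x) = prod (x - H(w)) mod P, zero padded."""
    coeffs = [1]  # the constant polynomial 1, lowest degree first
    for w in keywords:
        h = H(w)
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i] = (nxt[i] - h * c) % P      # contribution of -H(w) * c * x^i
            nxt[i + 1] = (nxt[i + 1] + c) % P  # contribution of c * x^(i+1)
        coeffs = nxt
    return coeffs + [0] * (n + 1 - len(coeffs))


def predicate_vector(query, n):
    """Power sums (sum_j H(q_j)^i) for i = 0..n, matching the coefficient order."""
    return [sum(pow(H(q), i, P) for q in query) % P for i in range(n + 1)]


def inner(a, b):
    """Inner product over Z_P."""
    return sum(x * y for x, y in zip(a, b)) % P


doc = attribute_vector(["cloud", "index", "tree"], n=4)
print(inner(doc, predicate_vector(["cloud", "tree"], n=4)))     # subset query
print(inner(doc, predicate_vector(["cloud", "pairing"], n=4)))  # non-subset query
```

With plain power sums, two or more non-member keywords could in principle cancel each other; constructions of this type commonly multiply each query keyword's powers by a fresh random value to make such cancellation negligible, which is omitted here for brevity.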
3.2. Construction of “0-1” Vectors
Suppose that the set of all the keywords in a corpus is ; we randomly split into a group of keyword sets , where , , , and . Given the keyword sets , we describe how to use to construct a vector representing an internal node or a query as follows.
For each document along with a keyword set, , the algorithm for converting a document into a “0-1” vector is described in Algorithm 1. As shown in Algorithm 1, given a document, we first create a zero vector whose length is . Then, for each keyword, , if , the value of the th dimension of is set to be 1, where and . Finally, the value of the last dimension is set to be 1. For example, if and , suppose that , , and , and the vector of is .
For a query along with a keyword set, , the algorithm for converting a query into a “0-1” vector is described in Algorithm 2. As shown in Algorithm 2, for the query , we first create a zero vector whose length is . Then, for each keyword , if , the value of the th dimension of is increased by one, where and . At last, the last dimension’s value is set to be . For example, if and , suppose that and , the vector of is . It can be verified that if .
Moreover, taking two “0-1” vectors as input, we also give a merging algorithm to generate a new “0-1” vector, which is very useful for creating an internal node. The merging algorithm is described in Algorithm 3. Suppose that is initialized to be a zero vector; the essence of Algorithm 3 is setting = 1 if either or equals 1, where . For example, if and , the output of Algorithm 3 is .
Note that, the “0-1” vector for each document is only used in “CreateTree” algorithm and discarded when the “CreateTree” algorithm is finished.
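The three procedures described in this subsection can be sketched as follows. This is a plaintext illustration under stated assumptions, not the paper's listing: the last slot of the query vector is taken to be the negated number of query keywords (the value is not legible in the text, but this choice makes the inner product vanish exactly when every query keyword falls into a cluster that the document also hits), and the names `doc_vector`, `query_vector`, `merge`, and `dot` are hypothetical.

```python
def doc_vector(keywords, clusters):
    """Algorithm 1 sketch: slot j is 1 if the document has a keyword in cluster j."""
    v = [0] * (len(clusters) + 1)
    for w in keywords:
        for j, cluster in enumerate(clusters):
            if w in cluster:
                v[j] = 1
    v[-1] = 1  # the last dimension is always 1
    return v


def query_vector(query, clusters):
    """Algorithm 2 sketch: slot j counts the query keywords that fall in cluster j."""
    v = [0] * (len(clusters) + 1)
    for q in query:
        for j, cluster in enumerate(clusters):
            if q in cluster:
                v[j] += 1
    v[-1] = -len(query)  # assumed value of the last dimension
    return v


def merge(u, v):
    """Algorithm 3 sketch: slot-wise OR of two "0-1" vectors."""
    return [1 if a == 1 or b == 1 else 0 for a, b in zip(u, v)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


clusters = [{"w1", "w2"}, {"w3", "w4"}, {"w5", "w6"}]
d = doc_vector({"w1", "w5"}, clusters)
print(d)                                          # [1, 0, 1, 1]
print(dot(d, query_vector(["w1"], clusters)))     # 0, the node may contain a match
print(dot(d, query_vector(["w3"], clusters)))     # -1, the node can be pruned
print(merge(d, [0, 1, 0, 1]))                     # [1, 1, 1, 1]
```

Because several keywords share one cluster, a zero inner product only means that a node may contain a match (a query for `w2` also passes against `d`); the exact test is left to the attribute vector stored in the leaf.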
Algorithm 1: Converting a document into a “0-1” vector.
Algorithm 2: Converting a query into a “0-1” vector.
Algorithm 3: Merging two “0-1” vectors.
3.3. Tree Building Algorithm
The tree building algorithm mainly consists of two steps. The first one is to create the leaf node for each document. Each leaf node has two vectors: one is an attribute vector obtained by using the keyword conversion method; the other is a “0-1” vector generated by using Algorithm 1. The second step is to create the internal nodes in a “bottom-up” manner. Each internal node contains one “0-1” vector generated by using Algorithm 3. The detailed algorithms are presented in Algorithms 4 and 5.
Algorithm 4: Creating the leaf node for a document.
Algorithm 5: Building the index tree, declared as BuildTree (CurrentNodeSet).
A formal definition of the data structure of a tree node is , where is ’s identity, and are the representation vectors of , and are two pointers that point to ’s left and right children, respectively, and is the identity of a document when is a leaf node. Let be a function that can generate a unique ID for each node. According to Algorithm 4, for the leaf node associated with the document , the algorithm runs to obtain a unique ID for the node and sets the value of FID as the identifier of . Since there is no child for the leaf node, both and are set to be . Based on the keyword set of , and are generated by utilizing the keyword conversion method and Algorithm 1, respectively.
Based on all leaf nodes generated by Algorithm 4, the tree building algorithm is presented in Algorithm 5. involves a group of nodes without a parent node. Let be the number of nodes in . If equals 1, this means that the sole node in is the root of the index tree. Otherwise, the internal nodes should be generated by using each pair of nodes in . Concretely, is initialized as an empty set which is used to store the newly generated nodes. Suppose that and are a pair of nodes in ; a parent node of these two nodes is created as follows. ID of is generated by using the function . and point to the nodes and , respectively. Based on the “0-1” vectors of and , is generated by utilizing Algorithm 3. Both and are set to be since is an internal node. Once the node is created, it will be added to . Note that if is not even, the last node in is added to straightforwardly. Through calling Algorithm 5 recursively, the plaintext index tree can be built.
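The bottom-up construction described above can be sketched in a few lines. The field names (`id`, `bits`, `attr`, `left`, `right`, `fid`), the counter used as the unique-ID generator, and the function names are assumptions for illustration; only the pairing rule, the OR-merge of the “0-1” vectors, and the handling of an odd last node follow the text.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional

_ids = count(1)  # hypothetical stand-in for the unique-ID generator


@dataclass
class Node:
    id: int
    bits: List[int]                    # "0-1" vector of the node
    attr: Optional[List[int]] = None   # attribute vector, leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None          # document identifier, leaves only


def make_leaf(fid, attr, bits):
    """Leaf creation sketch: no children, both vectors and the document ID set."""
    return Node(next(_ids), bits, attr=attr, fid=fid)


def build_tree(current):
    """Tree building sketch: pair up parentless nodes until one root remains."""
    if not current:
        raise ValueError("at least one leaf node is required")
    if len(current) == 1:
        return current[0]  # the sole remaining node is the root
    parents = []
    for i in range(0, len(current) - 1, 2):
        l, r = current[i], current[i + 1]
        merged = [a | b for a, b in zip(l.bits, r.bits)]  # slot-wise OR
        parents.append(Node(next(_ids), merged, left=l, right=r))
    if len(current) % 2 == 1:
        parents.append(current[-1])  # an unpaired last node moves up unchanged
    return build_tree(parents)


leaves = [
    make_leaf("d1", [6, -5, 1], [1, 1, 0, 1]),
    make_leaf("d2", [8, -6, 1], [1, 0, 1, 1]),
    make_leaf("d3", [-5, 1, 0], [0, 0, 1, 1]),
]
root = build_tree(leaves)
print(root.bits)       # [1, 1, 1, 1]
print(root.right.fid)  # d3, the unpaired leaf sits directly under the root
```

The sketch keeps the “0-1” vectors in the leaves; dropping them after construction, as the scheme does to save storage, would be one extra pass over the leaves.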
Example 1. An example of building an index tree is illustrated in Figure 2. From Figure 2, an index tree of is given, where the dimension of the “0-1” vector for each node is 6. There are three steps for building the tree. The first step is changing the keyword set of each document into an attribute vector (denoted by ) and a “0-1” vector (denoted by a sequence of “0-1”) by using the keyword conversion method and Algorithm 1, respectively. After this step, each document is transformed into a leaf node with an attribute vector and a “0-1” vector. The second step is building the index tree based on the “0-1”vectors of the leaf nodes in a bottom-up manner by taking advantage of Algorithm 4 and 5. After this step, the index tree has been established, and each internal node in the tree owns a “0-1” vector. The third step is deleting all “0-1” vectors in the leaf nodes. The reason why we delete these vectors is that the “0-1” vectors contained in leaf nodes are only used to construct internal nodes of the index tree. In the search process, the leaf node only utilizes the attribute vector to perform keywords matching test. This step can reduce the storage cost of the index tree.

4. The Proposed Scheme
In this section, we give a T-PECK scheme with sublinear search complexity based on the aforementioned algorithms. After this concrete construction, a detailed proof is given to demonstrate the security of T-PECK based on the security definition.
4.1. Construction of T-PECK
According to Definition 3, we set , , , and as four algorithms in PO-IPE, where and are the public key and master secret key, respectively, and are the attribute vector and predicate vector, respectively, and and are the ciphertext and secret key generated by utilizing and , respectively. According to the PO-IPE scheme [19], we construct our T-PECK scheme as follows.
KeyGen (): given a security parameter , using the algorithm, this algorithm generates and and then sets and in which is given to data owner and is given to the data user.
CreateTree (): given a group of keyword sets in which each keyword set is associated with a document , this algorithm builds the index tree according to Algorithms 4 and 5.
IndexBuild (,): in order to encrypt the index tree , a subalgorithm “encryptNode (, )” is presented in Algorithm 6, where is a tree node. As shown in Algorithm 6, if is a leaf node of , “encryptNode” calls to generate and replaces with . Otherwise, it calls to generate and replaces with and then recursively traverses its children nodes. Based on Algorithm 6, the algorithm calls (, ) to generate the encrypted index , where is the root node of , and uploads to the cloud server.
Trapdoor (,,): given a keyword query , the algorithm first generates a predicate vector and a “0-1” vector by utilizing the keyword conversion approach and Algorithm 2, respectively. Then, it generates and . Finally, it sends the trapdoor of the query to the cloud server.
Search (,,): in order to search the encrypted index tree , a subalgorithm “searchNode (, , , )” is given in Algorithm 7, where is a tree node and is used to store the search result. Concretely, for an internal node , “searchNode” computes . If , it continues to search the children nodes of in a recursive way; otherwise, it stops. For a leaf node , the algorithm computes . If , it adds the of this node in . Based on Algorithm 7, the “Search” algorithm first initializes an empty set , then calls (, , , and ) to obtain all documents matched to the query , where is the root node of , and finally outputs to the data user.
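The recursive search can be illustrated on a plaintext tree. In the sketch below, the function `dec` is an assumed stand-in for PO-IPE decryption: it returns 1 exactly when the inner product of the two vectors is zero, which is the predicate that the real ciphertext and key would evaluate without revealing the vectors. Keywords are represented directly by small integers instead of hashes, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    vec: List[int]                 # leaf: attribute vector; internal: "0-1" vector
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    fid: Optional[str] = None      # document identifier, leaves only


def dec(node_vec, key_vec):
    """Stand-in for PO-IPE Dec: 1 if and only if the inner product is zero."""
    return 1 if sum(x * y for x, y in zip(node_vec, key_vec)) == 0 else 0


def search_node(u, key_pred, key_bits, out):
    """Collect the IDs of matching leaves, pruning subtrees that cannot match."""
    if u is None:
        return
    if u.left is None and u.right is None:   # leaf: exact keyword test
        if dec(u.vec, key_pred):
            out.append(u.fid)
        return
    if dec(u.vec, key_bits):                  # internal: coarse cluster test
        search_node(u.left, key_pred, key_bits, out)
        search_node(u.right, key_pred, key_bits, out)


def search(root, key_pred, key_bits):
    out = []
    search_node(root, key_pred, key_bits, out)
    return out


# Keywords 2..5 with clusters {2}, {3}, {4, 5}; attribute vectors are the
# coefficients (lowest degree first) of the polynomial with the keywords as roots.
A = Node([6, -5, 1], fid="A")     # keywords {2, 3}
B = Node([8, -6, 1], fid="B")     # keywords {2, 4}
C = Node([-5, 1, 0], fid="C")     # keywords {5}
D = Node([20, -9, 1], fid="D")    # keywords {4, 5}
root = Node([1, 1, 1, 1],
            left=Node([1, 1, 1, 1], left=A, right=B),
            right=Node([0, 0, 1, 1], left=C, right=D))

# Query {2, 3}: power sums (2, 5, 13) and "0-1" query vector (1, 1, 0, -2).
print(search(root, [2, 5, 13], [1, 1, 0, -2]))   # ['A']
```

For the query {2, 3}, the right subtree is rejected at its internal node, so the leaves C and D are never tested; this pruning is the source of the sublinear behaviour when few subtrees match.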
Example 2. We also give Figure 3 to illustrate the search process. From Figure 3, we convert the query into a “0-1” vector and a predicate vector by using Algorithm 2 and the keyword conversion method, respectively. Based on the index tree given in example 1, the search process begins with the root node and first reaches to leaf nodes and since “0-1” vectors of and are both matching the “0-1” vector of the query . Then, the search algorithm computes values of and and adds to the result list due to and . After that, the search algorithm checks the internal node . Since the “0-1” vector of the query fails to match that of the node , the children of will not be reached. Finally, the algorithm outputs .

Algorithm 6: Encrypting a tree node, encryptNode.
Algorithm 7: Searching a tree node, searchNode.
4.2. Security Proof
Our T-PECK scheme is constructed over the PO-IPE scheme. Thus, T-PECK’s security is guaranteed by the fully secure PO-IPE scheme. To demonstrate the security of T-PECK, we give Proposition 1.
Proposition 1. If the PO-IPE scheme is semantically secure against chosen plaintext attacks, our T-PECK scheme is -semantically secure against chosen plaintext attacks, where is the leakage function given in Section 2.2.
Proof. We can argue that a PPT algorithm can break the PO-IPE scheme if can break the T-PECK scheme. The proof process is given as follows.
(1) Setup: the challenger utilizes the algorithm to generate and and sets and , where and are the public key and secret key of T-PECK.
(2) Phase 1: can adaptively request trapdoors of queries . For each query , the challenger first generates two vectors and and then uses the algorithm to generate the trapdoor in which and , where . Note that each trapdoor can be seen as two decryption keys of PO-IPE.
(3) Challenge: after Phase 1, generates two plaintext trees and under the restriction that , where , and sends these two trees to . Note that can select two leaf nodes and from and , respectively, as two challenge nodes of PO-IPE. After receiving them, flipping a coin , generates an index tree by running and sends it to . According to the algorithm, we know that each node in this encrypted index tree is a ciphertext of PO-IPE. Thus, we can use the encrypted vector of in as the challenge ciphertext of PO-IPE.
(4) Phase 2: can continue to ask for trapdoors under the restriction described above. Since each trapdoor is generated by using algorithm, these trapdoors can still be regarded as a set of decryption keys of PO-IPE.
(5) Response: outputs a guess . This guess is used as the guess of the security game of PO-IPE.
According to the above game, the security game in T-PECK is consistent with that in PO-IPE. If there is a PPT adversary which can distinguish two encrypted index trees of T-PECK, we can use to distinguish two encrypted vectors of PO-IPE. Thus, we can state that our T-PECK is secure if PO-IPE is secure.
4.3. Dynamic Update Operations
Previous PECK schemes usually use a forward index structure, which inherently supports dynamic document updates. Since dynamic update is crucial for the usability of PECK, the T-PECK scheme must also support dynamic operations such as document insertion, deletion, and modification. Because T-PECK uses a tree-based index structure, we realize update operations by updating the tree’s nodes. Inspired by the update approach given in [16], we devise the update algorithm as follows:
(1) Deletion: if a data owner wants to delete a document from the index tree, he/she first finds the leaf node related to and sets this leaf node as a fake one rather than deleting it. Then, he/she encrypts this fake node and sends it to the cloud server along with the location information of this leaf node. When the cloud server receives this node, it performs the deletion by replacing the target leaf node with the fake one.
(2) Insertion: if a data owner wants to insert a document to the tree, he/she first creates a leaf node for by using Algorithm 4. Then, the data owner finds a fake leaf node and replaces it with the new leaf node. After this, according to the leaf node, the data owner updates all internal nodes on the path from the root node to the new leaf node. Finally, the data owner encrypts the leaf node and the corresponding internal nodes and sends them along with the position information to the cloud server. When the cloud server receives these nodes, it realizes the insertion by replacing the nodes at the given positions. In addition, if there is no fake leaf node that can be replaced, the data owner generates a new encrypted index tree with many fake leaf nodes and one target leaf node. After receiving this index tree, the cloud server combines the two index trees into a new one.
(3) Modification: if the data owner wants to modify a leaf node, he/she can delete this leaf node first and then insert a new leaf node according to the modification information.
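The update operations above can be illustrated with a minimal plaintext sketch. It is not the scheme's actual algorithm: the class name `PlainIndexTree`, the heap-style array layout, and the use of keyword-set unions for internal nodes (in place of the encrypted vectors of T-PECK) are assumptions made only for illustration, and encryption and the exchange with the cloud server are omitted.

```python
from typing import List, Optional, Set


class PlainIndexTree:
    """Plaintext stand-in for the index tree (hypothetical; no encryption)."""

    def __init__(self, capacity: int):
        # Assumption: a complete binary tree whose leaf count is a power of two.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Heap layout: node i has children 2i+1 and 2i+2; the last `capacity`
        # slots are leaves. None marks a fake leaf (or an empty internal node).
        self.nodes: List[Optional[Set[str]]] = [None] * (2 * capacity - 1)

    def _leaf_index(self, slot: int) -> int:
        return self.capacity - 1 + slot

    def _refresh_path(self, idx: int) -> List[int]:
        """Recompute internal nodes from a leaf up to the root.

        Returns the positions that would be re-encrypted and sent to the server.
        """
        touched = [idx]
        while idx > 0:
            idx = (idx - 1) // 2
            left = self.nodes[2 * idx + 1] or set()
            right = self.nodes[2 * idx + 2] or set()
            self.nodes[idx] = (left | right) or None
            touched.append(idx)
        return touched

    def delete(self, slot: int) -> int:
        """Deletion: overwrite the leaf with a fake one; return its position."""
        idx = self._leaf_index(slot)
        self.nodes[idx] = None
        return idx

    def insert(self, keywords: Set[str]) -> List[int]:
        """Insertion: reuse the first fake leaf and refresh the path to the root."""
        for slot in range(self.capacity):
            idx = self._leaf_index(slot)
            if self.nodes[idx] is None:
                self.nodes[idx] = set(keywords)
                return self._refresh_path(idx)
        # The text handles this case by building a second tree and merging it.
        raise RuntimeError("no fake leaf available")

    def modify(self, slot: int, keywords: Set[str]) -> List[int]:
        """Modification: delete the old leaf, then insert the new one."""
        self.delete(slot)
        return self.insert(keywords)


tree = PlainIndexTree(4)
first_path = tree.insert({"cloud", "index"})
second_path = tree.insert({"search"})
deleted_at = tree.delete(0)
```

In this sketch, deletion touches a single leaf position, whereas insertion touches one leaf plus every internal node on its path to the root, which mirrors the difference in what the data owner sends in steps (1) and (2).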
The keywords in the dictionary will change after a period of document insertions and deletions. Fortunately, this does not have much impact on the T-PECK scheme. For the leaf node of a document , the attribute vector of this leaf node is constructed from the keyword set of , which means that the keyword conversion method still works even if contains some new keywords. For the “0-1” vector of an internal node, we first review the process of creating a “0-1” vector. The dictionary is divided into a set of keyword sets, and each keyword set is associated with one dimension of the “0-1” vector. When a new keyword is added to the dictionary, we can put this keyword into any keyword set, which does not affect the “0-1” vectors generated before. For example, suppose that the dictionary is and is divided into three sets , , and and the new keyword sets are , , and after adding a keyword in the dictionary. Note that the “0-1” vectors for are both “101” before and after adding . Thus, according to the above analysis, the dynamic update method of our scheme is practicable.
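The stability of existing “0-1” vectors can be checked with a short sketch. The keyword names (`w1` to `w7`), the partition, and the rule that a dimension is 1 when the document shares at least one keyword with the associated keyword set are assumptions chosen for illustration; they are not the concrete example from the text.

```python
from typing import List, Set


def zero_one_vector(doc_keywords: Set[str], keyword_sets: List[Set[str]]) -> str:
    """Assumed rule: dimension i is 1 iff the document shares a keyword with set i."""
    return "".join("1" if doc_keywords & ks else "0" for ks in keyword_sets)


# Hypothetical dictionary {w1, ..., w6} divided into three keyword sets.
old_sets = [{"w1", "w2"}, {"w3", "w4"}, {"w5", "w6"}]

# A new keyword w7 joins the dictionary and is placed into the second set.
new_sets = [{"w1", "w2"}, {"w3", "w4", "w7"}, {"w5", "w6"}]

doc = {"w1", "w5"}  # an existing document that does not contain w7
before = zero_one_vector(doc, old_sets)
after = zero_one_vector(doc, new_sets)
print(before, after)
```

Because an existing document does not contain the new keyword, enlarging any keyword set cannot switch one of its dimensions from 0 to 1, so vectors computed earlier remain valid under this rule.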
5. Performance Evaluation
In this section, we analyze the theoretical and experimental performance of our scheme by comparing it with other recent PECK schemes.
5.1. Theoretical Analysis
We evaluate the proposed T-PECK scheme theoretically and compare it with previous PECK schemes in terms of space and computation complexity. We choose four representative PECK schemes proposed recently for comparison. For simplicity, we denote these four PECK schemes introduced in [8, 10, 13] by SPE-CKS, SPE-SMKS, SA-SCF-PECKS, and PMSEHS, respectively. For clarity, we define three important parameters associated with these PECK schemes. The first parameter is the number of keywords in an index, denoted by ; the second one is the number of keywords in a query, denoted by ; the third one is the number of documents in a corpus, denoted by . In addition, for PMSEHS, since the search process only accesses the documents containing the first keyword in the query, we denote the number of documents matching the first keyword in the query by . For T-PECK, we denote the number of keyword sets by , and the number of documents whose “0-1” vector matches the query vector by since only these documents are verified in the search process. With this notation, Table 3 compares the time and space complexities of T-PECK and the previous PECK schemes.
As shown in Table 3, the time complexities of search in SPE-CKS, SPE-SMKS, and SA-SCF-PECKS are all linear with since these three schemes adopt the forward index, while that in our scheme is linear with due to the tree index structure. According to the tree building algorithm (Algorithm 5), we can estimate that the search algorithm accesses at most nodes in the index tree. Since the search algorithm requires three pairing operations for each node, T-PECK performs at most pairing operations. Because is a large number and commonly much bigger than and , we can conclude that the proposed scheme is more efficient than SPE-CKS, SPE-SMKS, and SA-SCF-PECKS in the search process. In addition, because the tree index structure contains many internal nodes to accelerate the search process, the index building time and storage consumption of our scheme are higher than those of SPE-CKS, SPE-SMKS, and SA-SCF-PECKS.
Table 3 also shows that the time complexity of search in PMSEHS is linear with since it utilizes a hidden structure to build the index. Compared with PMSEHS, the T-PECK scheme needs less search time when . According to the definition of , , and , our scheme is more efficient when the term frequency of query keywords is high and is relatively small. Moreover, PMSEHS needs much more time and storage overhead for index generation and storage than T-PECK since the hidden structure requires many group elements of to speed up the query process. As shown in Table 3, the time complexity of index building and the space complexity of index storage in PMSEHS are both linear with while those in T-PECK are both linear with . Thus, we argue that our scheme can achieve a sublinear search complexity without sacrificing the index generation time and storage space.
5.2. Experimental Results
Our scheme is implemented with the Java Pairing-Based Cryptography (JPBC) library [36] on a machine with an Intel (R) Core (TM) i7-4570 CPU at 3.60 GHz and 16 GB of memory. A type pairing is used to realize the bilinear map in our scheme. The base field size of this pairing is 128 bits, and its security level is equivalent to 1024 bits of DLOG [36]. The real-world corpus used in our experiment is the Enron e-mail dataset [37]. To quantify the efficiency of our scheme, we focus on three parameters related to PECK in the experiment. These three parameters are , , and mentioned in Section 5.1. To demonstrate the advantages of T-PECK, we compare it with two previous PECK schemes, namely, SPE-CKS and PMSEHS. We choose these two schemes because SPE-CKS adopts a forward index, while PMSEHS utilizes a hidden structure. The comparison covers the time overhead of index building, trapdoor generation, and search.
5.2.1. Impact of the Number of Keywords in a Document () on Performance
For a query with 6 keywords () and a corpus with 300 documents (), as increases, Figure 4 shows the following facts.
(1) The execution time of index building in PMSEHS far exceeds that in SPE-CKS and T-PECK since PMSEHS performs many pairing operations to accelerate the search process. T-PECK needs more index building time than SPE-CKS since it must also encrypt internal nodes.
(2) The trapdoor generation time in PMSEHS is independent of and is better than that of the other two schemes for . T-PECK needs less time for trapdoor generation than SPE-CKS since it requires less time for exponentiation computation over group elements.
(3) The search complexity in SPE-CKS is linear with while that in the other two schemes is sublinear with . As increases, the search time of PMSEHS and T-PECK grows slightly due to the growth of parameters and . In addition, T-PECK needs more time than PMSEHS in the search phase because grows faster than as increases.

Figure 4: panels (a), (b), and (c).
5.2.2. Impact of the Number of Documents in a Corpus () on Performance
According to the analysis in Section 5.1, the time cost for index building in PMSEHS, SPE-CKS, and T-PECK is linear with , which is confirmed by the experimental result. PMSEHS costs more time for index building because it requires more pairing operations for encrypting documents. The time consumption of T-PECK is slightly more than that of SPE-CKS in the index building phase since T-PECK needs to encrypt extra internal nodes. As shown in Figure 5, the time cost of trapdoor generation in these three schemes is independent of . Moreover, Figure 5 also shows that the search complexity in SPE-CKS is linear with while those in PMSEHS and T-PECK are both sublinear with . The search time in T-PECK is more than that in PMSEHS, since T-PECK requires more pairing operations than PMSEHS as increases, which is consistent with our theoretical analysis.

Figure 5: panels (a), (b), and (c).
5.2.3. Impact of the Number of Keywords in a Query () on Performance
According to the analysis in Section 5.1, the parameter only impacts trapdoor generation and testing. For an index with 60 keywords (), as shown in Figure 6, the time overhead of trapdoor generation in PMSEHS is linear with , while that in SPE-CKS and T-PECK is independent of . Moreover, as expected, the time consumption of search in SPE-CKS is independent of while that in PMSEHS is linear with . As increases, T-PECK has better search performance than PMSEHS since the number of documents whose “0-1” vectors match the query () is reduced. Therefore, T-PECK is more efficient than PMSEHS when is large.

Figure 6: panels (a) and (b).
5.3. More Discussion
According to the experimental results, when , the time consumption of search in T-PECK is 41 s while that in SPE-CKS is 162 s. In exchange, the time overhead of index building in T-PECK is 460 s while that in SPE-CKS is 265 s. According to this result, we can argue that the search performance of our scheme is better than that of SPE-CKS without sacrificing the time complexity of index building. Compared with PMSEHS, when , the time cost of index building in T-PECK is 460 s while that in PMSEHS is 9538 s. In return, our scheme costs twice as much search time as PMSEHS. Thus, our scheme requires much less time in the index building process while ensuring sublinear search efficiency. In practice, the index building algorithm is usually performed by data owners while the search algorithm is run by the cloud server. Thus, considering that the cloud server owns much more computing and storage resources than the data owner, we reckon that it is worth sacrificing a little query efficiency to reduce the time and space costs of index building and storage.
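To make the trade-off easier to read, the timings quoted above can be turned into ratios. The following sketch only restates the reported numbers; the variable names are hypothetical, and the parameter setting under which the timings were measured is not restated in the code.

```python
# Timings in seconds, copied from the discussion above.
search_s = {"T-PECK": 41, "SPE-CKS": 162}
index_build_s = {"T-PECK": 460, "SPE-CKS": 265, "PMSEHS": 9538}

# How many times faster T-PECK searches than SPE-CKS.
search_speedup_vs_spe = search_s["SPE-CKS"] / search_s["T-PECK"]

# How many times longer T-PECK takes to build its index than SPE-CKS.
index_overhead_vs_spe = index_build_s["T-PECK"] / index_build_s["SPE-CKS"]

# How many times longer PMSEHS takes to build its index than T-PECK.
index_saving_vs_pmsehs = index_build_s["PMSEHS"] / index_build_s["T-PECK"]

print(f"search speedup vs SPE-CKS:      {search_speedup_vs_spe:.2f}x")
print(f"index overhead vs SPE-CKS:      {index_overhead_vs_spe:.2f}x")
print(f"index build saving vs PMSEHS:   {index_saving_vs_pmsehs:.1f}x")
```

On these figures, T-PECK searches roughly four times faster than SPE-CKS for about 1.7 times the index building time, and it builds its index about 21 times faster than PMSEHS.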
To summarize, our scheme maintains a high query efficiency without increasing the time cost of index generation too much. Considering that our scheme offers a good trade-off between query efficiency and index generation cost, we reckon that T-PECK is practicable in applications in which data users use resource-constrained mobile devices.
6. Conclusion
In this paper, we proposed a novel algorithm for building an index tree. Through elaborately combining the index tree and an efficient PO-IPE scheme, we proposed a PECK scheme based on a tree-based index structure. The search efficiency in the proposed scheme is sublinear with the number of documents, and our scheme is proven to be -semantically secure against chosen plaintext attacks.
To evaluate the efficiency of T-PECK, a detailed theoretical and experimental analysis is presented. This analysis shows that, compared with previous PECK schemes, T-PECK is more practical, for example, in the time it requires for index building and search. In real-world applications, the queries of data users are usually more complex than conjunctive keyword search and include Boolean keyword search, fuzzy search, and range search. Thus, it is necessary to build a tree-based SPE scheme supporting more expressive search functions.
Data Availability
The data used to support the findings of this study are available from “http://www.cs.cmu.edu/./enron/.”
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant nos. 61972090 and 31872704), Natural Science Foundation of Henan (Grant no. 202300410339), and Nanhu Scholars Program for Young Scholars of XYNU.