Abstract
Encrypted file deduplication (EFD) schemes can improve the storage space utilization of cloud storage while protecting the privacy of the stored files. However, if an enterprise stores its files in a cloud storage system that deploys an encrypted file deduplication scheme without permission checking, the access permissions of the enterprise files are destroyed, which causes security problems. This seriously limits the practical value of EFD and prevents it from being deployed in real cloud storage systems. To resolve this problem, we propose an encrypted file deduplication scheme with permission (EFDSP) and construct it by using hidden vector encryption (HVE). Our security analysis shows that EFDSP is secure and can prevent the online deduplication oracle attack. We implement EFDSP and evaluate its performance. The results show that the performance of EFDSP is slightly inferior to that of SADS, the only existing encrypted file deduplication scheme with permission, but the performance gap decreases as the number of authorized users increases, and EFDSP overcomes the security weakness of SADS.
1. Introduction
1.1. Motivation
Recently, with the rapid development of network storage technology, cloud storage has become an important storage solution. Owing to its low rental cost, outsourcing an enterprise's files to cloud storage can reduce the enterprise's management costs and improve its competitiveness. To prevent information leakage, an enterprise user usually stores its files in cloud storage in encrypted form. An encrypted file deduplication scheme can save the storage space and network bandwidth of the cloud storage and improve its performance. However, in an enterprise environment, employees of different departments have different permissions, and each employee can access only the files allowed by its permissions. If an encrypted file deduplication scheme does not support permission checking, it destroys the file permissions and causes security problems. Li et al. proposed a secure authorized deduplication scheme based on a hybrid cloud (SADS) [1]. SADS introduces a private cloud to maintain the user permissions and to generate a permission tag for a user when it uploads a file. When the cloud storage performs deduplication checking for a user, it also checks whether the user has the deduplication permission; if not, the user must upload the file even though the same file already exists in the cloud storage. The cloud storage performs file deduplication only when the user has the deduplication permission and the same file exists in the cloud storage. SADS achieves encrypted file deduplication, but it has three shortcomings:
(i) Firstly, each permission is represented by a private key.
If a user has multiple permissions, it must keep multiple private keys secret, which makes user key management very troublesome.
(ii) Secondly, when a user $U$ uploads a file $F$ or queries for a duplicate of $F$, the scheme needs to use $k$ permission keys to generate $k$ encrypted file tags for $F$ (if $U$ has been assigned $k$ permissions). As a result, the scheme incurs large network traffic.
(iii) Thirdly, SADS has a security weakness. Assume that Mike is an enterprise manager who manages department A and department B, so he has the permissions of both departments. Mike is also responsible for the finance department, so he also has the finance department permission. If the cloud storage uses SADS to deduplicate its files, then when Mike uploads a file, SADS uses the private keys of department A, department B, and the finance department to generate three encrypted file tags. As a result, the staff of department A and department B obtain the permission to deduplicate their files against the payslip file. Suppose Mike has uploaded Alice's payslip file to the cloud storage, and both Bob and Alice are employees of department B. Bob wants to learn Alice's salary. He can attack SADS with the following steps (called the online deduplication oracle attack):
(a) Bob first forges Alice's payslip file. A payslip is a low-entropy file with a fixed format: Bob knows the format, or even has a file of this kind, namely his own payslip. He also knows that Alice's salary lies within a certain range; he just does not know its exact value. Bob can therefore set the salary value to each candidate in that range and generate one forged file per candidate.
(b) Bob uploads the forged files to the cloud storage one by one. If the cloud storage deduplicates one of them, Bob learns that Alice's salary is the value contained in that file.
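The attack in steps (a) and (b) relies only on the yes/no deduplication signal. The Python sketch below simulates it against a toy store; it is a simplified model, not SADS itself. Plaintext SHA-256 hashes stand in for encrypted file tags, and a file is deduplicated whenever the uploader holds any of the permissions the stored file was tagged under. The payslip format, Alice's salary of 4057, and the 4000 to 4100 range are hypothetical values chosen for illustration.

```python
import hashlib


def payslip(name: str, salary: int) -> bytes:
    """Hypothetical fixed-format, low-entropy payslip file."""
    return f"PAYSLIP|employee={name}|salary={salary}".encode()


class CoarseDedupStore:
    """Toy deduplicating store with coarse, per-permission tags.

    A stored file remembers every permission it was tagged under; an upload is
    deduplicated if the same file exists and the uploader holds any of them.
    """

    def __init__(self) -> None:
        self.files: dict[str, set[str]] = {}  # file hash -> tagged permissions

    def upload(self, data: bytes, perms: set[str]) -> bool:
        """Return True if the upload was deduplicated (observable by the uploader)."""
        h = hashlib.sha256(data).hexdigest()
        tagged = self.files.get(h)
        if tagged is not None and tagged & perms:
            return True
        self.files.setdefault(h, set()).update(perms)
        return False


def recover_salary(store: CoarseDedupStore, low: int, high: int):
    """Online deduplication oracle attack: upload one forged file per candidate."""
    for salary in range(low, high + 1):
        if store.upload(payslip("Alice", salary), {"B"}):  # Bob holds only department B
            return salary
    return None


store = CoarseDedupStore()
store.upload(payslip("Alice", 4057), {"A", "B", "finance"})  # Mike's upload
print(recover_salary(store, 4000, 4100))  # 4057
```

The only information Bob uses is whether an upload was deduplicated, which is exactly the signal a permission check must withhold from users with insufficient permissions.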
Obviously, the attack succeeds because the authorization granularity of SADS is too coarse. When Mike generates the encrypted file tags, he implicitly grants the file deduplication permission to Bob, which bypasses the file permission check. Moreover, when the cloud storage checks for file deduplication, it only checks whether the encrypted query tag of the uploaded file matches one of the encrypted file tags stored in the cloud storage and does not check the user's permission. Therefore, we aim to design a secure encrypted file deduplication scheme with permission that enforces the deduplication permission check for each user and avoids the security issues of SADS.
1.2. Our Contributions
In this work, we study how to enable cloud storage to deduplicate users' encrypted files without destroying the file permissions. We propose the permission vector and the permission relation: the permission vector represents a user's permissions, and the permission relation compares the permission levels of two users. We design an encrypted file deduplication scheme with permission that overcomes the security weakness of SADS. In EFDSP, the file owner allows the cloud storage to perform deduplication when other users with the same or a higher permission level upload duplicate files. Our contributions can be summarized as follows:
(i) Firstly, we discover a security weakness of SADS and propose an attack against it for low-entropy files.
(ii) Secondly, we propose an encrypted file deduplication scheme with permission, which enables cloud storage to deduplicate encrypted files without destroying the file permissions. In EFDSP, a user with a lower permission level must upload the file even if a duplicate exists in the cloud storage. EFDSP prevents the online deduplication oracle attack and overcomes the security weakness of SADS.
(iii) Thirdly, we define the permission vector and the permission relation, and we use them together with hidden vector encryption to construct EFDSP.
(iv) Fourthly, we implement our scheme and conduct a performance evaluation; the results demonstrate that our scheme is reasonable.
The paper is organized as follows. Section 2 presents preliminary knowledge. Section 3 describes the problem and defines encrypted file deduplication with permission. Section 4 defines the permission vector and the permission relation. Section 5 constructs the encrypted file deduplication scheme with permission, and Section 6 optimizes EFDSP. Section 7 analyzes the security of EFDSP. Section 8 presents our implementation and the performance evaluation results. Section 9 discusses related work, and Section 10 concludes the paper.
2. Preliminaries
2.1. Bilinear Pairing
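Definition 1. $G_1$, $G_2$, and $G_T$ are three multiplicative cyclic groups of prime order $p$, and $g_1$ and $g_2$ are generators of $G_1$ and $G_2$, respectively. A bilinear pairing $e: G_1 \times G_2 \rightarrow G_T$ is a surjective map with the following properties:
(i) Bilinearity: for all $u \in G_1$, $v \in G_2$ and all $a, b \in \mathbb{Z}_p$, we have $e(u^a, v) = e(u, v)^a$ and $e(u, v^b) = e(u, v)^b$.
(ii) Nondegeneracy: $e(g_1, g_2) \neq 1$, where $1$ is the identity element of $G_T$.
(iii) Computability: for all $u \in G_1$ and $v \in G_2$, there exists an efficient algorithm to compute $e(u, v)$.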
If $G_1 \neq G_2$, we call $e$ an asymmetric bilinear pairing; if $G_1 = G_2$, we call $e$ a symmetric bilinear pairing. Proposition 2 follows easily from Definition 1.
Proposition 2. Let $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}_p$. Then $e(u^a, v^b) = e(u, v)^{ab}$.
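For reference, the standard consequence of bilinearity, $e(u^a, v^b) = e(u, v)^{ab}$ for $u \in G_1$, $v \in G_2$, and $a, b \in \mathbb{Z}_p$, follows from the two bilinearity equations of Definition 1 in two steps:

```latex
\begin{aligned}
e(u^a, v^b) &= e(u, v^b)^a                 && \text{(bilinearity in the first argument)} \\
            &= \left(e(u, v)^b\right)^a    && \text{(bilinearity in the second argument)} \\
            &= e(u, v)^{ab}.
\end{aligned}
```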
2.2. Hidden Vector Encryption
Hidden vector encryption (HVE) was first proposed by Boneh and Waters [2]. Subsequently, Katz [3] and Park [4] proposed further HVE schemes. HVE is a kind of predicate encryption in which one attribute vector is associated with the ciphertext and another with the tag; the ciphertext matches the tag only when the two vectors match. HVE uses two character sets $\Sigma$ and $\Sigma_* = \Sigma \cup \{*\}$, where $*$ is a wildcard. If a component of a vector is $*$, that component does not participate in the match. HVE is mainly composed of four algorithms: key generation, data encryption, tag generation, and data query.
(i) In the key generation phase, the trusted authority (TA) assigns a public/private key pair to a receiver.
(ii) In the data encryption phase, the user selects a vector $\vec{x} \in \Sigma^{\ell}$ to describe its data $M$ and uses the receiver's public key to encrypt $M$, obtaining the ciphertext $C$.
(iii) In the tag generation phase, the receiver first selects a vector $\vec{y} \in \Sigma_*^{\ell}$ to represent the query requirement and then uses its private key to generate a query tag $T_{\vec{y}}$. Finally, the receiver sends $T_{\vec{y}}$ to the server.
(iv) In the data query phase, if $T_{\vec{y}}$ matches $C$, the query outputs $M$, the plaintext of $C$. The matching condition is defined as follows: let $S(\vec{y}) = \{ i \mid y_i \neq * \}$ be the set of subscripts whose components are not $*$, where $y_i$ is the $i$th component of $\vec{y}$. For two vectors $\vec{x}$ and $\vec{y}$, the equality predicate $P_{\vec{y}}$ satisfies (1):
$$P_{\vec{y}}(\vec{x}) = \begin{cases} 1, & \text{if } x_i = y_i \text{ for all } i \in S(\vec{y}), \\ 0, & \text{otherwise}. \end{cases} \quad (1)$$
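The matching condition (1) can be illustrated on plaintext vectors. The Python sketch below evaluates only the predicate $P_{\vec{y}}(\vec{x})$, with `"*"` as the wildcard; a real HVE scheme evaluates the same predicate inside the pairing computation without revealing $\vec{x}$ or $\vec{y}$, which this sketch does not attempt. The function names are hypothetical.

```python
from typing import Sequence

WILDCARD = "*"


def non_wildcard_indices(y: Sequence[str]) -> list[int]:
    """S(y): 0-based positions where the query vector is not the wildcard."""
    return [i for i, y_i in enumerate(y) if y_i != WILDCARD]


def hve_predicate(x: Sequence[str], y: Sequence[str]) -> int:
    """P_y(x) = 1 iff x_i == y_i for every i in S(y), else 0."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return int(all(x[i] == y[i] for i in non_wildcard_indices(y)))


x = list("1011")                        # attribute vector bound to the ciphertext
print(hve_predicate(x, list("1*1*")))   # 1: positions 0 and 2 agree
print(hve_predicate(x, list("0***")))   # 0: position 0 differs
```

A query vector made entirely of wildcards matches every ciphertext, since $S(\vec{y})$ is then empty.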
3. Problem and Definition
3.1. The System Model
To facilitate enterprise management, we introduce a permission server (PS) to manage user permissions and a key generation server (KGS) to generate the encryption key for each uploaded file. With PS and KGS, the system model of the cloud storage is shown in Figure 1. It consists of four kinds of entities: users, the cloud storage, the permission server, and the key generation server. The permission server and the key generation server are deployed in the enterprise domain and are assumed to be fully secure. The cloud storage (CS) checks whether a duplicate of the file already exists and whether the user has the permission to deduplicate the file. If both conditions are met, the user does not need to upload the file, and the cloud storage server provides it with a file pointer; otherwise, the user must upload the file.

When the system is initialized, the system administrator grants each user its access permissions according to its permission level. The administrator can use the role-based method [5], that is, assign permissions to a user based on the user's role. Suppose an IT company has only three types of employees: manager, project leader, and engineer. If a user is assigned the manager permission, that user can access any file whose access role is manager. Each file in the cloud storage has a file permission tag describing its permission, and the cloud storage performs deduplication only when another user with the same permission uploads a duplicate file.
The cloud storage provides data storage services to its users. To reduce its storage costs, CS uses cross-user file deduplication to eliminate redundant files and stores only one copy of each file. PS and KGS are deployed in the enterprise's secure domain and are assumed to be fully secure. PS is responsible for user permission management and file permission queries, and it assists CS in permission checking and file deduplication. KGS is responsible for generating encryption keys for users: when a user needs to store a file in CS, it interacts with KGS to obtain an encryption key for the file.
3.2. Problem Formalization
In this work, we study how to enable the cloud storage to deduplicate users' encrypted files without destroying the file permissions, that is, how the file owner can allow the cloud storage to perform deduplication when other users with the same or a higher permission level upload a duplicate file. We formalize the problem as follows.
When a user $U$ wants to upload a file $F$ to CS, it first interacts with KGS to obtain the encryption key $K_F$ for $F$ and then interacts with PS. PS uses the hash value of $F$, the permission level of $U$, and its private key to generate a permission query tag of $F$ for $U$. After receiving the query tag, $U$ sends it to CS to query whether the encrypted file of $F$ already exists in the cloud storage. If it exists, $U$ does not need to upload $F$ and only needs to keep the file pointer returned by CS; otherwise, $U$ first encrypts $F$ with $K_F$ to obtain the encrypted file $C_F$, and then uses its permission level and the public key of PS to generate the encrypted file tag of $F$. Finally, $U$ sends $C_F$ and the encrypted file tag to the cloud storage.
3.3. The EFDSP Scheme
To solve the problem formalized in Section 3.2, we design an encrypted file deduplication scheme with permission.
Definition 3 (EFDSP). An encrypted file deduplication scheme with permission is a tuple of algorithms as follows:
(i) The setup algorithm takes the security parameter as input and outputs the public parameters.
(ii) KeyGeneration: it takes the public parameters as input and outputs an identity-based key pair of CS, a private/public key pair of PS, and a signature/verification key pair of PS.
(iii) The file key generation algorithm is run by KGS. It takes the hash value of the user file $F$ as input and outputs a file encryption key $K_F$ for $F$.
(iv) The file tag generation algorithm is run by the user $U$. It takes $F$, the permission level of $U$, and the public key of PS as input and outputs an encrypted file tag for $F$.
(v) FileQueryTagGeneration: this algorithm is run by PS. It takes the hash value of $F$, the permission level of $U$, the private key of PS, and the signature private key of PS as input, and outputs an encrypted file query tag for $F$ together with a signature on it.
(vi) FileQueryTagQueryAndFileDeduplication: this algorithm is run by CS. It takes the hash value of $F$, the encrypted file query tag, and its signature as input and outputs a deduplication flag and a file pointer. CS uses the query tag to match the stored encrypted files. If a match is found, CS sets the flag to indicate success, lets the pointer be the file pointer of the matched encrypted file, and adds $U$ to the file entry of that file; otherwise, CS sets the flag to indicate failure and assigns NULL to the pointer.
(vii) The file storage algorithm is run by CS. It takes the hash value of $F$, the encrypted file $C_F$ of $F$, and the encrypted file tag of $F$ as input and outputs a file pointer.
(viii) The file encryption algorithm is run by $U$; it uses $K_F$ to encrypt $F$ and obtain $C_F$.
(ix) The file decryption algorithm is run by $U$; it uses $K_F$ to decrypt $C_F$ and recover $F$.
(x) The file retrieval algorithm is run by CS; it uses a file pointer to locate the encrypted file in CS and returns the corresponding $C_F$.
The interaction process of EFDSP is described in Figure 2.

3.4. The Threat Model
Since PS is responsible for user permission management and file permission queries, and KGS is responsible for generating encryption keys for users, we assume that PS and KGS are fully secure and trustworthy. CS performs its assigned tasks honestly but is curious about the content of users' files and tries to extract secret information from them; hence we regard it as an honest-but-curious adversary [6]. Some users try to access files beyond their permissions. We also assume that all files stored in the cloud storage are confidential, so any information disclosure would cause a very large loss to the user. Under these assumptions, there are two kinds of adversaries in the system.
External adversary: it tries to obtain secret information from the cloud storage or tries to access the file beyond its permission.
Internal adversary: it can easily access the cloud storage and tries to obtain secret information from the encrypted file tags or the query tags.
3.5. The Security Requirements
According to the threat model described in Section 3.4, there exist four security requirements as follows:
The confidentiality of the encrypted file tag: an unauthorized user, including the cloud storage server, cannot get the plaintext information from the encrypted file tags stored in the cloud storage server.
The unforgeability of the encrypted file query tag: an unauthorized user without the appropriate permission must be prevented from obtaining or generating encrypted file query tags, even if it colludes with the cloud storage server.
The indistinguishability of the encrypted file query tag: a user cannot get any information from the query tags without querying the permission server, including the file content and the permissions.
The confidentiality of the file: a user who does not own the files cannot obtain the plaintext from the files stored in the cloud storage server; that is, an adversary cannot retrieve and restore files that do not belong to it.
4. The Permission Vector and the Permission Relation
To represent user permissions effectively, we define the permission vector in this section.
Definition 4 (permission vector). Let $P = \{p_1, p_2, \ldots, p_n\}$ be the set of system permissions, where $1$ to $n$ are the sequence numbers of the permissions in the system. A permission vector $\vec{v} = (v_1, v_2, \ldots, v_n)$ is an $n$-bit binary vector whose bits are numbered $1$ to $n$ from left to right, and $v_i$ represents the permission $p_i$. If $v_i = 0$, the permission $p_i$ is valid (held); otherwise, it is invalid.
Figure 3 is an example of role hierarchies given in [5]. It has four roles: programmer, test engineer, project member, and project supervisor. The permission of each role can easily be represented by a permission vector. Let ITP = {programmer, test engineer, project member, project supervisor} be the basic permission set of the system. Because there are only four basic permissions, we can use a 4-bit permission vector to represent the permission of each role; the sequence numbers of the four basic permissions in the permission vector are 1, 2, 3, and 4, respectively. In [5], role permissions can be inherited. From Figure 3, the project supervisor owns the project supervisor permission and inherits the permissions of both the test engineer and the programmer. According to Definition 4, the permission vector of the project supervisor is 0010, that of the programmer is 0111, and that of the project member is 1101.
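Definition 4 can be sketched in Python as follows. The ordering of ITP and the permission sets of the three roles are assumptions based on the description of Figure 3 (the supervisor holds the supervisor, test engineer, and programmer permissions but not the project member permission); `ITP`, `ROLES`, and `permission_vector` are hypothetical names.

```python
# Assumed ordering of the basic permission set ITP (sequence numbers 1..4).
ITP = ("programmer", "test engineer", "project member", "project supervisor")

# Assumed permissions held by each role in Figure 3, after inheritance.
ROLES = {
    "project supervisor": {"project supervisor", "test engineer", "programmer"},
    "programmer": {"programmer"},
    "project member": {"project member"},
}


def permission_vector(held: set, permissions: tuple = ITP) -> str:
    """Definition 4: bit i is '0' if permission i is held (valid), '1' otherwise."""
    unknown = held - set(permissions)
    if unknown:
        raise ValueError(f"unknown permissions: {sorted(unknown)}")
    return "".join("0" if p in held else "1" for p in permissions)


for role, held in ROLES.items():
    print(f"{role}: {permission_vector(held)}")
# project supervisor: 0010
# programmer: 0111
# project member: 1101
```

Encoding "held" as 0 follows Definition 4; it is what later lets the optimized query in Section 6 look only at the 1 bits of the querying user's vector.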

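Definition 5 (permission relation). Let $\vec{v}^A = (v^A_1, \ldots, v^A_n)$ be the permission vector of user $A$ and $\vec{v}^B = (v^B_1, \ldots, v^B_n)$ be the permission vector of user $B$. The permission relation is defined as follows:
If $v^A_i \le v^B_i$ for each $i \in \{1, \ldots, n\}$, and there exist one or more $i$ with $v^A_i = 0$ and $v^B_i = 1$, then we say that the permission level of $A$ is higher than that of $B$, denoted by $\vec{v}^A \succ \vec{v}^B$.
If $v^A_i \ge v^B_i$ for each $i \in \{1, \ldots, n\}$, and there exist one or more $i$ with $v^A_i = 1$ and $v^B_i = 0$, then we say that the permission level of $A$ is lower than that of $B$, denoted by $\vec{v}^A \prec \vec{v}^B$.
If there exists $i$ with $v^A_i = 0$ and $v^B_i = 1$, and there exists $j$ with $v^A_j = 1$ and $v^B_j = 0$, then we say that the permission level of $A$ is not equal to (incomparable with) that of $B$, denoted by $\vec{v}^A \neq \vec{v}^B$.
If $v^A_i = v^B_i$ for each $i \in \{1, \ldots, n\}$, then we say that the permission level of $A$ equals that of $B$, denoted by $\vec{v}^A = \vec{v}^B$.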
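According to Figure 3, the permission vectors of the project supervisor, the programmer, and the project member are 0010, 0111, and 1101, respectively. If both Alice and Bob are programmers, their permission vectors are both 0111. According to Definition 5, $\vec{v}^{\text{supervisor}} \succ \vec{v}^{\text{programmer}}$, $\vec{v}^{\text{programmer}} \prec \vec{v}^{\text{supervisor}}$, $\vec{v}^{\text{supervisor}} \neq \vec{v}^{\text{member}}$, and $\vec{v}^{\text{Alice}} = \vec{v}^{\text{Bob}}$.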
With the definitions of permission vector and permission relation, we can define the permission equality predicate.
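Definition 6 (permission equality predicate). Let $\vec{v}^A$ be the permission vector of user $A$ and $\vec{v}^B$ be the permission vector of user $B$. If $P(\vec{v}^A, \vec{v}^B)$ satisfies (2), we call $P$ the permission equality predicate of users $A$ and $B$:
$$P(\vec{v}^A, \vec{v}^B) = \begin{cases} 1, & \text{if } v^A_i = v^B_i \text{ for all } i \in \{1, \ldots, n\}, \\ 0, & \text{otherwise}. \end{cases} \quad (2)$$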
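Definitions 5 and 6 compare two permission vectors position by position. A minimal Python sketch, assuming vectors are encoded as `'0'`/`'1'` strings with `'0'` meaning the permission is held, and assuming the equality predicate requires agreement in every position; the function names and return symbols are hypothetical, with `'!='` standing for the "not equal" (incomparable) case.

```python
def permission_relation(va: str, vb: str) -> str:
    """Definition 5: return '>', '<', '=', or '!=' for user A versus user B."""
    if len(va) != len(vb):
        raise ValueError("permission vectors must have the same length")
    a_extra = any(a == "0" and b == "1" for a, b in zip(va, vb))  # A holds, B does not
    b_extra = any(a == "1" and b == "0" for a, b in zip(va, vb))  # B holds, A does not
    if a_extra and b_extra:
        return "!="   # each holds a permission the other lacks
    if a_extra:
        return ">"    # A holds everything B holds, and more
    if b_extra:
        return "<"    # B holds everything A holds, and more
    return "="


def permission_equality(va: str, vb: str) -> int:
    """Permission equality predicate: 1 iff the vectors agree in every position."""
    return int(va == vb)


print(permission_relation("0010", "0111"))  # >
print(permission_relation("0010", "1101"))  # !=
print(permission_equality("0111", "0111"))  # 1
```

The base scheme of Section 5 deduplicates only when `permission_equality` returns 1, which is why it cannot deduplicate across different permission levels.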
5. A Construction for EFDSP
We have defined EFDSP in Section 3.3. In this section, we construct it by using the efficient hidden vector encryption proposed by Park [4] and the permission vector defined in Section 4. We use two secure cryptographic hash functions, modeled as random oracles, and a security parameter. Our construction of EFDSP is as follows:
(i) The setup algorithm takes the security parameter as input and outputs the public parameters, which include a group of prime order $p$, a generator $g$ of this group, and the bilinear map $e$ defined on it.
(ii) KeyGeneration: it takes the public parameters as input and outputs the identity-based key pair of the cloud storage CS, the private/public key pair of PS, and the signature/verification key pair of PS. It randomly selects a master key, computes the corresponding public key, and computes the identity-based key pair of CS. It then randomly selects the secret values of PS and computes the corresponding public values, which together form the private/public key pair of PS. For the signature/verification key pair of PS, our construction uses DSA [7].
(iii) The file key generation algorithm takes the hash value of $F$ as input and runs the key generation protocol based on the BLS signature [8] with KGS to generate the file encryption key $K_F$.
(iv) The file tag generation algorithm first uses a secure cryptographic hash function to compute the hash value of $F$, then uses the permission level of the user to generate the permission vector according to Definition 4, and finally uses the public key of PS to generate the encrypted file tag. The concrete steps are as follows.
(a) It computes the hash value of $F$ by using the secure cryptographic hash function.
(b) It uses its permission level to generate the permission vector $\vec{v}$ according to Definition 4. Let $S$ be the encrypted file permission index subscript set; then $S = \{1, 2, \ldots, n\}$.
(c) It randomly selects two numbers from $\mathbb{Z}_p$ and uses them, $\vec{v}$, and the public key of PS to generate the encrypted file tag for $F$ according to (3).
(v) FileQueryTagGeneration: PS first gets the permission level of $U$ from its permission database, then generates a permission query vector for $U$ according to Definition 4, and finally uses its own private key to generate an encrypted file query tag for $F$. The concrete steps are as follows.
(a) PS gets the permission level of $U$ from its permission database and generates the permission query vector according to Definition 4.
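Let $S_Q$ be the permission query index set; then $S_Q = \{1, 2, \ldots, n\}$.
(b) PS selects fresh random values and, for each $i \in S_Q$, generates the per-position components of the query tag according to (4), using parts of the private key of PS.
(c) PS computes the encrypted file query tag according to (5).
(d) PS uses its signature private key to sign the encrypted file query tag, producing its signature.
(vi) FileQueryTagQueryAndFileDeduplication: CS first checks the format of the encrypted file query tag and its signature and stops if there is a format error. Otherwise, it looks up the encrypted file tag corresponding to the hash value of $F$ and stops if none is found. Otherwise, it evaluates (6) on the encrypted file tag and the query tag. If the result indicates a match, the encrypted file of $F$ has already been sent to CS and $U$ has the deduplication permission for it: CS performs deduplication, sets the deduplication flag to indicate success, returns the file pointer of the encrypted file, and adds the ID of $U$ to the corresponding file entry. Otherwise, CS sets the flag to indicate failure and returns NULL.
(vii) When CS receives the hash value of $F$, the encrypted file $C_F$, and the encrypted file tag, it assigns a file pointer to $C_F$ and creates a file entry to store these three items.
(viii) The file encryption algorithm encrypts $F$ with $K_F$ by using AES [9], producing $C_F$.
(ix) The file decryption algorithm decrypts $C_F$ with $K_F$ by using AES, recovering $F$.
(x) After receiving a file pointer, CS returns the corresponding encrypted file $C_F$.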
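The bookkeeping performed by CS in steps (vi), (vii), and (x) can be sketched as follows. This is a hypothetical in-memory model: the `matches` callback stands in for evaluating (6) on the encrypted file tag and the query tag, the format and signature checks are omitted, and the flag values 1 and 0 and the integer file pointers are illustrative choices rather than part of the construction.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class FileEntry:
    pointer: int
    ciphertext: bytes
    tag: object                              # encrypted file tag (opaque here)
    owners: set = field(default_factory=set)


class CloudStorage:
    def __init__(self, matches: Callable[[object, object], bool]) -> None:
        self.matches = matches               # stand-in for evaluating (6)
        self.by_hash: dict = {}              # file hash -> list of FileEntry
        self.by_pointer: dict = {}           # file pointer -> FileEntry

    def query_and_deduplicate(self, file_hash: bytes, query_tag: object,
                              user_id: str) -> tuple:
        """Step (vi): return (1, pointer) on deduplication, else (0, None)."""
        for entry in self.by_hash.get(file_hash, []):
            if self.matches(entry.tag, query_tag):
                entry.owners.add(user_id)    # add the user's ID to the file entry
                return 1, entry.pointer
        return 0, None                       # the user must upload the file

    def store(self, file_hash: bytes, ciphertext: bytes, tag: object,
              user_id: str) -> int:
        """Step (vii): assign a file pointer and create a file entry."""
        entry = FileEntry(len(self.by_pointer), ciphertext, tag, {user_id})
        self.by_hash.setdefault(file_hash, []).append(entry)
        self.by_pointer[entry.pointer] = entry
        return entry.pointer

    def retrieve(self, pointer: int) -> Optional[bytes]:
        """Step (x): return the encrypted file for a file pointer."""
        entry = self.by_pointer.get(pointer)
        return entry.ciphertext if entry is not None else None


# Plaintext stand-in: tags are permission vectors and must be equal to match.
cs = CloudStorage(matches=lambda tag, query: tag == query)
ptr = cs.store(b"h(F)", b"<C_F>", "0111", "alice")
print(cs.query_and_deduplicate(b"h(F)", "0111", "bob"))    # (1, 0)
print(cs.query_and_deduplicate(b"h(F)", "0010", "carol"))  # (0, None)
```

Keeping a list of entries per hash lets the same file be stored under several tags when users with non-matching permissions upload it.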
6. Optimization for EFDSP
Since EFDSP can only deduplicate files between users that have the same permissions, it has two shortcomings. Firstly, users with the high permission level can operate the files of users with the low permission level in the actual enterprise setting. However, EFDSP does not allow the cloud storage to perform deduplication between files of a user with high permission level and files of a user with low permission level, which violates the actual permission management in the enterprise setting, and it is not conducive to improving the deduplication efficiency. Secondly, during the generation of the encrypted file query tag in EFDSP, all the permission bits are involved in the computation which increases the computation cost.
In this section, we use the example in Figure 3 to illustrate how to optimize the permission query index subscript set to overcome the above shortcomings of EFDSP. The permission vector of the project supervisor is 0010, and that of the programmer is 0111. According to Definition 5, $\vec{v}^{\text{supervisor}} \succ \vec{v}^{\text{programmer}}$; that is, the permission level of the project supervisor is higher than that of the programmer. Since 0 indicates that a user holds a permission and 1 indicates that it does not, when EFDSP compares the permission level of a querying user $U_1$ with that of a file owner $U_2$, it only needs to check, for each permission that $U_1$ does not hold, whether $U_2$ holds it. If $U_2$ holds none of the permissions that $U_1$ lacks, the permission level of $U_1$ is higher than or equal to that of $U_2$. Otherwise, if $U_2$ holds at least one permission that $U_1$ lacks, the permission level of $U_1$ does not match that of $U_2$ (the permission level of $U_1$ is either lower than or not equal to that of $U_2$). Therefore, when EFDSP compares the two permission levels, it only needs to consider the bits of the permission vector of $U_1$ that are 1. For example, to compare the project supervisor with the programmer, since the supervisor's vector 0010 has a 1 only at bit 3, EFDSP only needs to compare bit 3 of the two vectors. Because bit 3 is 1 in both vectors, it can derive that the permission level of the project supervisor is higher than or equal to that of the programmer and determine that files of the project supervisor can be deduplicated with the files of the programmer stored in the cloud storage. To compare the project supervisor with the project member, whose vector is 1101, EFDSP again only compares bit 3. Because bit 3 is 1 in the supervisor's vector but 0 in the project member's vector, the permission levels do not match, and EFDSP determines that files of the project supervisor cannot be deduplicated with the files of the project member stored in the cloud storage.
That is, EFDSP only considers the bits of the querying user's permission vector that are 1. These bits form the set
$$S' = \{ i \mid 1 \le i \le n,\ v_i = 1 \}, \quad (7)$$
which we call the permission query index subscript set. If we replace the permission query index set $\{1, \ldots, n\}$ in the FileQueryTagGeneration algorithm with $S'$, EFDSP enables the cloud storage to deduplicate files uploaded by a user with a high permission level against files of users with lower permission levels, which improves deduplication efficiency; it also reduces the computation cost, because only the bits in $S'$ take part in generating the query tag. In addition, to prevent all bits of a permission vector from being 0 (which would make $S'$ empty), the permission vector must have more bits than the number of permissions: EFDSP reserves the last two bits of the permission vector and codes them so that the vector never consists entirely of 0s.
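The optimized check can be sketched on plaintext vectors. In this Python sketch, `query_index_set` computes $S'$ from (7) and `optimized_match` tests whether the stored tag vector is 1 at every position of $S'$. Coding both reserved bits as 1 is an assumption made for illustration, and the example vectors 0010 (project supervisor), 0111 (programmer), and 1101 (project member) assume the ordering programmer, test engineer, project member, project supervisor.

```python
RESERVED = "11"  # assumption: the two reserved bits are both coded as 1


def query_index_set(query_vector: str) -> list[int]:
    """S' in (7): 1-based positions where the querying user's vector is '1'."""
    return [i for i, bit in enumerate(query_vector, start=1) if bit == "1"]


def optimized_match(query_vector: str, tag_vector: str) -> bool:
    """True iff the stored tag vector is '1' at every position of S', i.e. the
    file's uploader holds no permission that the querying user lacks."""
    return all(tag_vector[i - 1] == "1" for i in query_index_set(query_vector))


supervisor = "0010" + RESERVED
programmer = "0111" + RESERVED
member = "1101" + RESERVED

print(query_index_set(supervisor))              # [3, 5, 6]
print(optimized_match(supervisor, programmer))  # True: higher level may deduplicate
print(optimized_match(programmer, supervisor))  # False: lower level must upload
print(optimized_match(supervisor, member))      # False: levels are not comparable
print(optimized_match("0000", "1101"))          # True: empty S' matches everything
```

The last line shows why the reserved bits matter: without them, a user holding every permission would have an empty $S'$ and would match files of any permission level, including levels it does not dominate.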
7. Security Analyses for EFDSP
In this section, we analyze EFDSP against the security requirements discussed in Section 3.5. We analyze the correctness of EFDSP, the security of the encrypted file query tag (unforgeability and indistinguishability), the confidentiality of the encrypted file tag, and the confidentiality of the encrypted file. Finally, we compare EFDSP with SADS [1].
7.1. The Correctness Analysis
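To verify the correctness of EFDSP, we verify the query process of the encrypted file query tag, that is, the computation in (6). Expanding (6) with the encrypted file tag generated according to (3) and the encrypted file query tag generated according to (5), and applying Proposition 2, shows that the result equals the expected value exactly when the permission vector embedded in the encrypted file tag agrees with the permission query vector on every position of the permission query index set.
Therefore, if the two permission vectors satisfy the permission predicate, the computation in (6) outputs the expected value; otherwise, it does not.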
7.2. The Security Analysis
(1) Analysis of the Unforgeability of the Encrypted File Query Tag. In EFDSP, the user first passes the authentication of PS and sends the hash value of $F$ to PS. After receiving it, PS searches the permission database for the user's permissions, generates a permission query vector for the user according to Definition 4, and then uses its own private key to generate the query tag. Since the private key of PS is kept secret, no other party can generate a valid query tag, which ensures the unforgeability of the encrypted file query tag.
(2) Analysis of the Indistinguishability of the Encrypted File Query Tag. The encrypted file query tag consists of four parts. Since PS selects fresh random values when it generates the query tag, these values can be regarded as random numbers. According to (4), the per-position components are computed from these random values and from parts of the private key of PS, so they can also be regarded as random numbers, and hence three of the four parts of the query tag can be regarded as random numbers. Because the hash function is a secure cryptographic hash function, the remaining part can also be regarded as a random number. Although part of the related information is publicly released, it does not help a probabilistic polynomial-time (p.p.t.) adversary distinguish the encrypted file query tag from a random number; moreover, thousands of files with the same permission exist in the cloud storage, which makes this public information useless for distinguishing the query tag from a random number. Thus, the indistinguishability of the encrypted file query tag is ensured.
(3) Analysis of the Confidentiality of the Encrypted File Tag. In EFDSP, when a user needs to generate an encrypted file tag for the encrypted file $C_F$, it first uses its own permission level to generate the permission vector according to Definition 4 and then uses the public key of PS to generate the encrypted file tag. When computing the tag, it randomly selects two numbers from $\mathbb{Z}_p$, and the components of the tag are computed from these random numbers. Since the two numbers are random, it is difficult for a p.p.t. adversary to distinguish the tag from a random value, which ensures the confidentiality of the encrypted file tag.
(4) Analysis of the Confidentiality of the File. In EFDSP, every file $F$ is stored as $C_F$, the AES encryption of $F$ under $K_F$. $K_F$ is generated by the user running a key generation protocol based on the BLS signature [8] with KGS. Since this protocol is secure, a p.p.t. adversary that does not own $F$ cannot learn $K_F$. Moreover, we use AES, a secure encryption algorithm, for encryption; therefore, $C_F$ is secure. That is, a p.p.t. adversary that does not own $F$ cannot recover $F$ from $C_F$.
7.3. The Online Deduplication Oracle Attack Analysis
In EFDSP, when a user $U$ uploads a file $F$ to CS, $U$ uses its own permission vector and the public key of PS to generate an encrypted file tag. Afterwards, CS performs file deduplication only when a user whose permission level is equal to or higher than that of $U$ uploads the same file. Assume that an adversary $\mathcal{A}$ whose permission level is lower than that of $U$ tries to exploit the file deduplication of CS to launch the online deduplication oracle attack. $\mathcal{A}$ first needs to forge candidate files for $F$ and ask PS to generate encrypted file query tags for them. PS uses its own private key, the permission vector of $\mathcal{A}$, and the forged files to generate the query tags and gives them to $\mathcal{A}$. $\mathcal{A}$ sends these query tags to CS and observes whether CS deduplicates the uploaded files in order to learn information about $F$. Because the permission level of $\mathcal{A}$ is lower than that of $U$, CS does not deduplicate any of these files and asks $\mathcal{A}$ to upload all of them. In the end, $\mathcal{A}$ obtains no information about $F$ from CS. Hence EFDSP prevents the online deduplication oracle attack.
7.4. Comparison with SADS
Since SADS is the only existing encrypted file deduplication scheme with permission, we compare EFDSP with SADS in the following aspects.
(i) In SADS [1], each permission is represented by a private key, so a user with $k$ permissions must keep $k$ private keys secret. In EFDSP, by contrast, user permissions are managed by the permission server, and a user only needs to store its own permission vector and the public key of the permission server.
(ii) In SADS, when a user uploads a file or queries for a duplicate file, if the user has been assigned $k$ permissions, the system needs to use $k$ private keys to generate $k$ encrypted file tags for the file, so the space complexity of the network traffic is $O(k)$. In EFDSP, the size of the encrypted file tag of $F$ grows linearly with the length $n$ of the permission vector, so the network traffic of uploading a file $F$ is $O(n)$. However, the query tag of $F$ consists of four parts and is independent of the number of permissions, so EFDSP requires only constant network traffic when a user queries for a duplicate of $F$.
(iii) SADS has a security weakness, which EFDSP overcomes. We use the attack against SADS in Section 1 to show how EFDSP prevents such attacks. EFDSP uses a 5-bit vector to represent permissions: the first bit represents the permission of department A, the second bit that of department B, the third bit that of financial management, and the fourth and fifth bits are reserved and coded as described in Section 6. Mike has the permissions of department A and department B and, because he is also responsible for financial management, the permission of the finance department, so the first three bits of his permission vector are all 0. Bob is an employee of department B, so the first three bits of his permission vector are 1, 0, and 1.
Suppose Mike uploads Alice's payslip file to the cloud storage. Mike uses his permission vector and the public key of the permission server to generate the encrypted file tag, and uploads the encrypted file tag and the encrypted file to the cloud storage server. Bob and Alice are both employees of department B, and Bob wants to know Alice's salary. Since the payslip file has a fixed format, it is a low-entropy file: Bob knows the format and may even hold a file in that format, namely his own payslip. He also knows that Alice's salary lies between 4000 and 4100; he just does not know the exact value. Bob can therefore set the salary item to each candidate value in this range, forge 100 payslip files, and upload them to the cloud storage one by one, using the deduplication behavior of the cloud storage to launch an online deduplication oracle attack. Under EFDSP, however, before uploading these files Bob must obtain query tags for them and send the query tags to the cloud storage server. Since Bob's permission level is lower than Mike's, the cloud storage server does not perform file deduplication even for the forged file that is identical to Mike's upload, because the permission levels do not match. Bob must upload all 100 files, so he cannot tell which of them matches the stored file; that is, Bob learns nothing about Alice's salary.
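The attack in this example can be replayed in a toy simulation. The sketch below assumes a deterministic SHA-256 tag in place of EFDSP's encrypted file and query tags, a hypothetical payslip format, a hypothetical true salary of 4057, candidate salaries 4000 to 4099 (100 forged files), and reserved bits coded as 0 in both permission vectors. It contrasts a server that ignores permissions with one that deduplicates only when the querier's vector covers the uploader's.

```python
import hashlib
from typing import Optional, Tuple

PermVec = Tuple[int, ...]

# Bit order from the example: dept A, dept B, finance, reserved, reserved.
# Reserved bits are ASSUMED to be 0; the actual coding is not given here.
MIKE: PermVec = (1, 1, 1, 0, 0)
BOB: PermVec = (0, 1, 0, 0, 0)


def payslip(name: str, salary: int) -> bytes:
    # Hypothetical fixed payslip format; only the salary field varies.
    return f"PAYSLIP|name={name}|dept=B|salary={salary}".encode()


def tag(data: bytes) -> str:
    # Deterministic stand-in for a file tag (not the HVE-based EFDSP tag).
    return hashlib.sha256(data).hexdigest()


def dominates(q: PermVec, u: PermVec) -> bool:
    # Querier holds every permission bit of the uploader (assumed rule).
    return all(a >= b for a, b in zip(q, u))


def oracle_attack(stored_tag: str, owner: PermVec, attacker: PermVec,
                  check_permission: bool) -> Optional[int]:
    """Return the salary the attacker learns, or None if nothing leaks."""
    for salary in range(4000, 4100):  # 100 forged payslips
        same_file = tag(payslip("Alice", salary)) == stored_tag
        allowed = dominates(attacker, owner) or not check_permission
        if same_file and allowed:
            return salary  # server skipped the upload: guess confirmed
    return None  # every file had to be uploaded, so all guesses look alike


alice_slip = tag(payslip("Alice", 4057))  # hypothetical true salary
print(oracle_attack(alice_slip, MIKE, BOB, check_permission=False))  # 4057
print(oracle_attack(alice_slip, MIKE, BOB, check_permission=True))   # None
```

What matters is the observable response: with the permission check, every forged payslip receives the same "please upload" answer, so the response carries no information about which guess is correct.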
8. Experiments
The experimental system consists of four PCs, which simulate the client, the permission server, the key generation server, and the cloud storage server. All PCs are interconnected through a 100 Mbps Ethernet network. Each PC has an Intel® Pentium® Dual E2160 1.68 GHz CPU, 4 GB of RAM, and a Western Digital Caviar SE hard drive with 320 GB capacity, 7200 rpm, and 8 MB cache. All experiments are performed on Fedora 8.0 with kernel 2.6.23.1. The cryptographic operations are implemented with the OpenSSL cryptography library (0.9.8g) and the pairing-based cryptography (PBC) library. The key length of AES is bits and the security parameter of the bilinear pairing is bits.
We use three kinds of files, txt, doc, and mp3, as the test data set in the experiments, as shown in Table 1. We measure the computation costs of encrypted file tag generation, query tag generation, file encryption, duplicate file checking, and file transmission in EFDSP. We analyze the performance of EFDSP from four aspects: file size, file number, file duplication rate, and the number of users with the same permission. All experimental results are average values over repeated experiments.
(1) The Performance Effect of File Size on EFDSP. Since file size affects encrypted file tag generation and file encryption in a deduplication scheme, we first test the effect of file size on the performance of EFDSP. We upload files of different sizes from two of the file sets and record the time spent in each step. As the seven files are all different, CS does not perform deduplication. The results are shown in Figure 4: file size has a great effect on key generation, encrypted file tag generation, and encryption, whose time costs all grow linearly with file size.

(2) The Performance Effect of the File Number on EFDSP. We select different files from file set 3 to perform 10 groups of experiments. Before each group, we reinitialize the system so that no encrypted file deduplication occurs. In group 1 we upload one file, in group 2 two files, and so on, adding one file per group, until all files are uploaded in group 10. For each group, we record the time spent in each step. Figure 5 shows the effect of the file number on each step: the time spent in each step grows linearly with the number of files.

(3) The Performance Effect of File Repetition Rate on EFDSP. To evaluate the effect of the file repetition rate, we divide file set 3 into two different test sets, each containing 10 files of 10 MB. In each experiment, we first upload all the files in the first test set. In the second upload, we upload another 10 files: the share of files given by the repetition rate is selected from the first test set, and the remaining files are selected from the second test set. We then record the time spent in each step of the second upload. The experimental results are shown in Figure 6. The time spent by EFDSP decreases as the file repetition rate increases. When the file repetition rate reaches 100%, no file needs to be encrypted or uploaded. The time required to complete files of is that of when the repetition rate is .
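How the second-upload batch is assembled, and why only its non-duplicate part incurs encryption and transmission, can be sketched as follows. The file names, the hypothetical `second_upload_set` helper, the seeded random selection, and the rates 0.0, 0.5, and 1.0 are illustrative assumptions; the rates actually measured are those reported in Figure 6.

```python
import random
from typing import List


def second_upload_set(first: List[str], second: List[str], rate: float,
                      k: int = 10, seed: int = 0) -> List[str]:
    """Hypothetical helper: round(rate * k) already-stored files plus new ones."""
    rng = random.Random(seed)
    dup = round(rate * k)
    return rng.sample(first, dup) + rng.sample(second, k - dup)


first = [f"set1_{i}.bin" for i in range(10)]   # uploaded in round one
second = [f"set2_{i}.bin" for i in range(10)]  # never uploaded before
stored = set(first)

for rate in (0.0, 0.5, 1.0):
    batch = second_upload_set(first, second, rate)
    # Only files not already on CS must be encrypted and transmitted.
    to_encrypt = [f for f in batch if f not in stored]
    print(rate, len(to_encrypt))  # 10, 5, and 0 files respectively
```

The count of files to encrypt and transmit is 10 × (1 − rate), which matches the observation that no file needs to be encrypted or uploaded at a 100% repetition rate.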

(4) The Performance Comparison between EFDSP and SADS. Since SADS is the only existing encrypted file deduplication scheme with permission, we perform the following experiment to compare the performance of EFDSP and SADS. We select 10 files of 10 MB from file set 3 as the test data set and set up 6 users. The first user acts as the file owner and uploads the 10 files to the cloud storage server first; each of the other five users holds exactly the same 10 files. We then configure the permissions of these users on the permission server so that one, two, three, four, and five users, respectively, have the same permission as the first user, and run one experiment for each configuration. The experimental results are shown in Figure 7. EFDSP is less efficient than SADS, but the gap decreases as the number of authorized users increases; moreover, EFDSP fixes the security weakness of SADS.

9. Related Works
Quinlan et al. proposed file deduplication to improve storage space utilization in their archival network storage system [10]. Applying file deduplication directly to cloud storage raises security issues for the stored files. To demonstrate these issues, Harnik et al. proposed three different kinds of attacks [11]. To prevent these deduplication attacks, Halevi et al. proposed proof of ownership (POW) [12], and other researchers have extended POW to improve its efficiency [13, 14]. However, these POW schemes cannot prevent attacks against low-entropy files, so POW alone cannot prevent all deduplication attacks in cloud storage.
Because encrypting the same file with different keys produces different ciphertexts, the cloud storage server cannot deduplicate the encrypted files, so file encryption and file deduplication are to some extent incompatible. To solve this issue, Douceur et al. proposed convergent encryption [15], in which the encryption key is computed by applying a hash function to the file being encrypted. Different users who encrypt the same file with convergent encryption therefore produce the same encrypted file. Storer et al. proposed a block-level encrypted file deduplication scheme in which the encryption key of each file block is determined by the block's contents [16], but their scheme cannot easily resist brute-force attacks. Bellare et al. proposed message-locked encryption based on convergent encryption [17] and designed a deduplication key generation protocol based on RSA signatures [18], but their key generation protocol is inefficient because it relies on RSA signatures. Armknecht et al. designed a server-assisted key generation protocol using BLS signatures that overcomes this shortcoming of the Bellare protocol [8].
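Convergent encryption as described above can be illustrated with a toy Python sketch. Since the Python standard library has no AES, the sketch derives the key as the SHA-256 digest of the file and uses a SHA-256 counter-mode keystream as a stand-in cipher. It is for illustration only; it is neither the construction of Douceur et al. nor any library's API, and the helper names are hypothetical.

```python
import hashlib
from typing import Tuple


def convergent_key(data: bytes) -> bytes:
    # Convergent encryption: the key is a hash of the file contents.
    return hashlib.sha256(data).digest()


def keystream(key: bytes, length: int) -> bytes:
    # Toy deterministic keystream from SHA-256(key || counter) blocks.
    # Stand-in for a real cipher such as AES; illustration only.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def ce_encrypt(data: bytes) -> Tuple[bytes, bytes]:
    key = convergent_key(data)
    ct = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return key, ct


def ce_decrypt(key: bytes, ct: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(ct, keystream(key, len(ct))))


# Two users encrypt the same file independently.
k1, c1 = ce_encrypt(b"quarterly report")
k2, c2 = ce_encrypt(b"quarterly report")
print(c1 == c2)            # True: the server can deduplicate the ciphertext
print(ce_decrypt(k1, c1))  # b'quarterly report'
```

Because the key depends only on the file contents, anyone who can guess a file can recompute its ciphertext, which is why low-entropy files remain exposed to brute-force guessing under this approach.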
Xu et al. designed a secure client-side encrypted file deduplication scheme for cloud storage [19]. Subsequently, Kaaniche, Stanek, Puzio, et al. proposed their own encrypted file deduplication schemes for cloud storage [20–22]. To achieve encrypted file deduplication without relying on a trusted key generation server, Liu et al. [23] and Dang et al. [24] each proposed a secure encrypted file deduplication scheme that requires no additional servers. However, their schemes do not support file permission. Li et al. proposed an encrypted file deduplication scheme that supports fuzzy search [25]. In all the above-mentioned schemes, the user participates in encrypted file deduplication passively. Li et al. proposed an encrypted file deduplication scheme based on a hybrid cloud that supports authorized deduplication [1], but its permission key management is cumbersome: it requires relatively large storage space and network traffic, its authorization granularity is coarse, and it has a security weakness.
10. Conclusions
An enterprise can reduce its business costs by storing its files in cloud storage. In the enterprise application environment, every file has access permissions. If the cloud storage uses an encrypted file deduplication scheme without permission, it will break the enterprise file permissions and give rise to security issues. To solve this problem, Li et al. proposed a secure encrypted file deduplication scheme with permission based on a hybrid cloud, but their scheme has a security weakness. In this paper, we design an encrypted file deduplication model, construct an encrypted file deduplication scheme with permission (EFDSP) by using the permission vector and HVE, and optimize the performance of EFDSP. We analyze the security and performance of EFDSP, and the results show that EFDSP satisfies the security requirements defined in Section 3.5. We implement EFDSP and conduct a performance evaluation. The experimental results show that the performance of EFDSP is slightly worse than that of SADS; however, the performance gap decreases as the number of authorized users increases. At the same time, EFDSP overcomes the security weakness of SADS. Liu et al. [23] and Dang et al. [24] each proposed a secure encrypted file deduplication scheme that does not rely on a trusted key generation server, but their schemes do not support file permission in deduplication. In future work, we will integrate their techniques into EFDSP.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest related to this paper.
Acknowledgments
The research was partially funded by the Doctoral Program of Hunan Institute of Engineering (Grant No. 17RC028), the Hunan Province Natural Science Foundation (Grant No. 2016JJ3051), and the National Natural Science Foundation of China (Grant No. 61502163).