Abstract
In recent years, cloud storage services have become increasingly popular: users can outsource data and access it anywhere, at any time. Accordingly, the data in the cloud is growing explosively, and much of the outsourced data is duplicated. Cloud storage service providers can save huge amounts of resources via client-side deduplication. At the same time, clients who use cloud storage services want the confidentiality and integrity of their outsourced data guaranteed, which is difficult to ensure in a cloud storage environment. Recently, in order to achieve integrity together with deduplication, the notion of deduplicatable proof of storage has emerged and various schemes have been proposed, but the existing schemes are still inefficient or insecure. In this paper, we propose a symmetric key based deduplicatable proof of storage scheme that ensures confidentiality with dictionary attack resilience and supports integrity auditing based on symmetric key cryptography. We introduce a bit-level challenge into the deduplicatable proof of storage protocol to minimize data access. In addition, we prove the security of our proposal in the random oracle model using information theory. Implementation results show that our scheme outperforms previous schemes.
1. Introduction
Cloud storage is an attractive service in which clients can outsource data to remote storage and access the outsourced data anywhere, at any time, while shedding the burden of maintaining local storage. Due to these advantages, cloud storage services are becoming increasingly popular, and the data stored in the cloud is growing explosively. According to a report, the volume of data stored in the cloud was expected to reach 40 trillion GB in 2020 [1], and much of this outsourced data is duplicated [2]. Hence, cloud storage service providers can save huge amounts of storage space via deduplication, in which the cloud server maintains only a single copy of redundant data and assigns a link to that copy to every client who owns the same data. Moreover, through client-side cross-user deduplication, in which a client checks for duplicates before deciding whether to transmit data, the cloud server saves not only storage space but also network bandwidth.
Despite these advantages, general client-side deduplication is vulnerable to a number of threats, since only a short, fixed identifier (e.g., the hash value of the data) stands in for the whole data. For example, the cloud server can unintentionally be turned into a content distribution network, and malicious clients can launch a poison attack (target collision attack) [3, 4]. Thus, a cloud server that provides client-side cross-user deduplication has to confirm that the client actually owns the data it requests to upload [4].
In addition, incidents such as cloud data leakage and corruption have made the security of outsourced data an important issue. Since the data in the cloud can be attacked by internal and external adversaries, personal information or enterprise secrets can be leaked or corrupted, which can be fatal. Therefore, the cloud server has to ensure the confidentiality and integrity of the outsourced data. However, doing so naively can defeat the deduplication goal of using resources efficiently.
First, in terms of confidentiality, if clients encrypt their data with a conventional encryption system, deduplication fails: conventional encryption uses a different key for each user, so the same input produces different ciphertexts. To overcome this, convergent encryption was proposed, where the hash value of the data is used as the encryption key [5]. The same data then encrypts to the same ciphertext, enabling the cloud server to deduplicate while ensuring confidentiality. However, convergent encryption is vulnerable to brute-force attack; that is, it ensures confidentiality only for unpredictable data. Note that, without loss of generality, we assume that the original data has enough entropy to resist message guessing attacks, as in [6].
Second, in terms of integrity, the cloud server does not intentionally damage the client's data, but outsourced data can still be corrupted by unintentional errors in the cloud's internal processes, and the cloud server may hide such an incident from the client to protect its reputation. Hence, the client needs to audit the integrity of the outsourced data periodically. However, applying conventional integrity checks, such as message authentication codes, to the cloud setting places a huge burden on both the client and the cloud server, since verification requires a local copy of the data. To overcome this problem, Ateniese et al. [7] introduced "provable data possession (PDP)" and Juels and Kaliski [8] introduced "proof of retrievability (PoR)", which enable probabilistic remote data auditing. From the perspective of deduplication, however, every client who holds the same data must generate its own metadata for integrity auditing, and the cloud server then has to store all of this metadata. This leads to a huge storage overhead and defeats the fundamental goal of efficient resource usage.
Recently, various techniques have been proposed to achieve the goal of deduplication while ensuring integrity in cloud storage environments [1, 2, 9–12]. However, the schemes in [1, 2, 9] carry heavy computational costs because they are based on public key cryptography, and they can leak client privacy because they support publicly verifiable integrity auditing: for a given file, every subsequent client has to obtain the first uploader's public key for integrity auditing and can therefore learn who owns the file. The schemes in [10–12] avoid public key cryptography and are instead based on homomorphic operations. Compared to the public key based schemes, they are more efficient in terms of computation, yet they remain inefficient, with large variations in efficiency depending on block and file size. Moreover, if the file is small, almost the entire file must be checked (e.g., if the file is smaller than 2 MB and the block size is 4 KB, more than 89% of the file must be accessed). Motivated by these issues, we apply a bit-level challenge to achieve data access efficiency even for small data.
In this paper, we propose a secure and highly efficient deduplicatable proof of storage scheme based on symmetric key cryptography, namely Sec-DPoS, which ensures data confidentiality with brute-force attack resilience and supports integrity auditing based on symmetric key cryptography. For secure client-side cross-user deduplication, we obtain a proof of ownership protocol by swapping the roles of prover and verifier in the integrity auditing protocol. In addition, we apply a bit-level challenge in both the ownership check and the integrity auditing protocols in order to support various file sizes efficiently, and we prove the confidence of the detection probability for the bit-level challenge using information theory. We summarize the properties of our construction as follows:
(1) Data confidentiality with dictionary attack resilience. For encrypted data deduplication, we exploit a key server to ensure dictionary attack resilience, as in [1, 13]. Clients encrypt data using an encryption key distributed by the key server, and the encryption key is derived from the data itself.
(2) Integrity auditing with deduplication based on symmetric key cryptography. To audit the integrity of the outsourced data, the client precomputes expected responses over the encrypted data and uploads them to the cloud server in encrypted form. Our integrity auditing protocol is motivated by symmetric key based integrity auditing [21], to which we apply a bit-level challenge. In addition, since the expected responses are generated using a message-derived key, a client with the same data can audit the integrity of the outsourced data without generating additional metadata.
(3) Secure proof of ownership over encrypted data. Proof of ownership mirrors the integrity auditing protocol with the roles of prover and verifier exchanged. In our construction, only clients with the intact data can pass the proof of ownership protocol. Moreover, since the protocol is performed over encrypted data, it does not expose any information about the plaintext.
(4) Privacy leakage resilience. In public key based solutions, subsequent clients need the public key generated by the first uploader to audit integrity and can therefore immediately learn who owns a file. In our construction, the scheme is based on symmetric key cryptography, which prevents this privacy leakage in the integrity auditing process.
Our Contributions. The main contributions of our proposal can be summarized as follows:
(1) Sec-DPoS is the first deduplicatable proof of storage scheme based on symmetric key cryptography, and it is secure and highly efficient while ensuring confidentiality. We analyze the security of Sec-DPoS and prove that our scheme is secure in the random oracle model.
(2) In contrast to previous solutions, we adopt a bit-level challenge to remain efficient even for small data, and we prove the confidence of the detection probability using information theory.
(3) We implement and evaluate our proposal. The implementation results show that our scheme achieves the highest performance among the compared schemes.
The remainder of this paper is organized as follows. In Section 2, we briefly review related work. In Section 3, we present the system model, threat model, and design goals. We then propose Sec-DPoS in Section 4 and analyze its security in Section 5. In Section 6, we present the implementation of our scheme and compare it with other schemes. Finally, we conclude the paper in Section 7.
2. Related Work
In this section, we discuss previous work on secure deduplication and integrity auditing, as well as recent solutions that achieve both goals.
2.1. Deduplication
In cloud storage environments, cloud servers can save storage space via deduplication, keeping only a single copy of each file, and can additionally save network bandwidth via client-side deduplication. However, general client-side deduplication is vulnerable because only a short, fixed identifier (e.g., the hash value of a file) stands in for the entire file [3]. Hence, the server should verify that a client who intends to upload a file actually holds the intact file. For secure client-side deduplication, Halevi et al. [4] introduced the notion of proof of ownership (PoW), in which only a client who has the intact file can pass ownership verification, and proposed several PoW schemes based on the Merkle hash tree that let the client prove ownership to the server efficiently. Di Pietro and Sorniotti [14] proposed a more efficient PoW scheme, known as s-PoW. Since s-PoW challenges only k randomly chosen bit positions, the complexity of the protocol is independent of the file size.
On the other hand, in order to achieve deduplication over encrypted data, Douceur et al. [14] proposed convergent encryption (CE), which uses the hash value of a file as the encryption key, so identical files yield identical ciphertexts. Similarly, Bellare et al. [6] proposed a cryptographic primitive known as message-locked encryption (MLE). However, since both CE and MLE derive the encryption key from the hash value of the file, they are vulnerable to dictionary attack (i.e., predictable data can be recovered by brute force). To overcome this problem, Keelveedhi et al. [13] proposed DupLESS, in which clients encrypt data using a message-derived key obtained through interaction with a key server. Since the encryption key is generated by an oblivious pseudorandom function and key generation requests are rate-limited, the outsourced data is protected from brute force attack. Liu et al. [15] proposed an encrypted data deduplication scheme that resists brute force attack without an additional independent server: the protocol is based on Password Authenticated Key Exchange (PAKE), so clients who hold the same file can share an encryption key directly. For integrated network environments, Qi et al. [16] proposed an encrypted data deduplication scheme that improves security and network latency by introducing multiple key servers in the network. Various further studies have combined both functionalities, satisfying data confidentiality while supporting ownership checks [17–20].
2.2. Integrity Auditing
From the client's perspective, integrity auditing of outsourced data is an important requirement for secure outsourcing, since the data can be corrupted by unintentional errors. Ateniese et al. [7] proposed the notion of provable data possession (PDP) for ensuring the integrity of remote data, in which the client can audit a target file without maintaining the entire file. Ateniese et al. [21] proposed a highly efficient PDP scheme based on symmetric key cryptography that supports dynamic operations except insertion. To also support insertion, Erway et al. [22] proposed dynamic PDP based on a rank-based skip list. Wang et al. [23] proposed a proxy PDP scheme that relaxes the computational overhead of tag generation, and Zhu et al. [24] proposed a cooperative PDP scheme for multicloud environments. Based on convergent encryption, Liu et al. [25] proposed an integrity auditing scheme that also considers integrity tag deduplication over encrypted data.
As another way to support integrity auditing, Juels and Kaliski [8] proposed the notion of proof of retrievability (PoR), in which integrity auditing is performed using sentinels inserted into the file. Compared with PDP, PoR supports retrievability as well as integrity auditing, yet it limits the number of queries. To achieve both private and public verifiability, Shacham and Waters [26] proposed two PoR schemes using homomorphic authenticators. Wang et al. [27] proposed an improved PoR scheme based on the Merkle hash tree that achieves the goals of PoR in dynamic scenarios, and Xu and Chang [28] proposed an improved PoR scheme with reduced communication costs. Aimed at more specific settings, Li et al. [29] proposed OPoR to support PoR on resource-limited devices, and Ren et al. [30] proposed a dynamic PoR scheme for coded cloud storage systems.
2.3. Secure Client-Side Deduplication with Integrity Auditing
As a method that provides both secure deduplication and integrity auditing, Zheng and Xu [9] first proposed proof of storage with deduplication (POSD), based on public key cryptography. However, its security breaks down if the first uploader maliciously generates the public/private key pair [31], and POSD does not ensure confidentiality of the outsourced data since it runs over plaintext. Yuan and Yu [2] proposed a scheme called PCAD that supports deduplication and integrity auditing with batch auditing, in which the server can prove possession of multiple files simultaneously. Li et al. [1] proposed two schemes, SecCloud and SecCloud+, both of which introduce an auditing entity that maintains a MapReduce cloud and helps the client generate block tags for integrity auditing. SecCloud+ additionally ensures confidentiality: the client encrypts files using a message-derived encryption key distributed by a key server. To improve efficiency, Youn et al. [32] proposed a new scheme based on the homomorphic linear authenticator [26].
However, since the schemes in [1, 2, 9, 32] are based on public key cryptography, they incur heavy computational costs and can leak privacy. In systems that support deduplication with public key based integrity auditing, subsequent uploaders must use the public key of the first uploader and can therefore learn who owns the file, leading to privacy leakage.
As another approach, Du et al. [10] proposed proof of ownership and retrievability (PoOR), based on the Merkle hash tree and homomorphic verifiable tags. Compared with public key based schemes, PoOR is more efficient in terms of computational cost. As an improvement, Chen et al. [11] proposed Message-locked PoOR, which applies a message-derived key and supports remote repairing; since clients who hold the same file can generate the same convergent key, the privacy leakage above can be prevented. However, Message-locked PoOR causes unnecessary block accesses in the ownership check protocol because it is based on HMAC. He et al. [12] proposed a deduplicatable dynamic proof of storage scheme based on a homomorphic authenticated tree in order to support dynamic scenarios. However, the schemes in [11, 12] are still inefficient and vulnerable to dictionary attack. Moreover, their efficiency varies widely with block and file size; if the file is small, almost the entire file must be checked.
3. Models and Goal
In this section, we describe the components and design goals of our proposal. We first illustrate the system model and present a threat model for the cloud storage environment. In describing the design goals, we present a trivial solution that simply combines secure client-side deduplication with a symmetric key based integrity auditing scheme. By capturing the problems of this trivial solution, we then describe how our approach achieves deduplicatable proof of storage.
3.1. System Model
Our proposed scheme involves three entities, as shown in Figure 1.
(i) The cloud server (CS) provides the cloud storage service. The cloud server typically operates large storage space and computational resources, and attempts to minimize bandwidth and optimize storage usage via client-side cross-user deduplication. We assume that the cloud server is honest-but-curious.
(ii) The client (C) uses the cloud storage service provided by the cloud server. The client uploads data and can access the outsourced data at any time.
(iii) The management server (MS) helps clients upload data. The management server distributes a message-derived secret key and manages the challenge index. We assume that the management server is a trusted third party (its functionality is described in detail in Section 4).
Figure 1: The system model of Sec-DPoS.
In our system, when a client wants to upload a file $F$, the client first interacts with the management server to obtain a secret key and a challenge index. The client then generates an identifier of the file and sends an upload request with the identifier to the cloud server. The cloud server checks whether the file already exists in storage. If it does, the client does not need to upload the file, and the cloud server provides the client with a link to the file after verifying that the client actually has it. If the file does not exist in the cloud, the client uploads the encrypted file together with the preprocessed information. After the upload, the client can audit the integrity of the outsourced data at any time.
3.2. Threat Model
In this subsection, we discuss threats to a cross-user client-side deduplication environment with remote data auditing. In our system model, we assume that the cloud server is honest-but-curious: it performs the system protocols honestly, yet it may intrude on clients' privacy since it has access to their data. Moreover, the cloud server can be the victim of an outside attack, so the clients' data in the cloud can potentially be leaked to insiders and outsiders alike. We therefore design our scheme to ensure data confidentiality with brute-force attack resilience by introducing the management server.
The cloud server does not intentionally damage outsourced data, but the data can be corrupted by unintentional system errors. When this happens, the cloud server may hide the data loss incident from the client in order to maintain its reputation. We therefore design our scheme so that a cloud server that has lost part of the outsourced data cannot forge a valid proof of integrity in an audit; more precisely, such a server fails the integrity auditing protocol with a given probability (e.g., 99%). In Section 5, we prove this unforgeability in detail.
From the perspective of cross-user client-side deduplication, as mentioned previously, a malicious client that holds only partial information about a file may claim possession in order to obtain the whole file, attempting to convince the cloud server through the ownership check protocol without holding the entire file. We therefore design our scheme so that malicious clients cannot cheat the cloud server in the ownership check; more precisely, a malicious client passes the ownership check protocol with at most negligible probability. In Section 5, we prove this uncheatability in detail.
3.3. Design Goal
To achieve both integrity auditing and secure deduplication in practice, we build on Ateniese et al.'s scheme [21], a highly efficient integrity auditing scheme based on symmetric key cryptography. A trivial solution would simply combine Ateniese et al.'s scheme with a cross-user client-side deduplication system, as in the following case:
Trivial Solution. When a client uploads a file as the first uploader, it generates and sequentially arranges the expected responses for integrity auditing, then uploads the file together with the arranged set of expected responses to the cloud. Note that the expected responses are encrypted, before upload, under a randomly chosen secret key using authenticated encryption. Since the file is uploaded for the first time, the cloud server generates its own metadata (expected responses used in the ownership check protocol) for secure deduplication and sends the file to secondary storage.
Unfortunately, this has two major limitations. First, subsequent clients who hold the same file cannot use the expected responses generated by the first uploader, since these are encrypted under the first uploader's private key. Each subsequent client would therefore have to generate another set of expected responses, which leads to heavy overheads in storage space, network bandwidth, and computation. Second, there is the management of the challenge index. Even if the first problem is resolved, challenge indices may collide: assuming subsequent clients can use the arranged set of expected responses generated by the first uploader, no client who owns the file can know which values have already been used, so certain expected responses may be used repeatedly. The cloud server could then launch a replay attack, i.e., simply pass the integrity auditing protocol by storing pairs of used challenges and responses.
To overcome the above problems, we employ a management server from which the client obtains a message-derived key, as in [1, 13]. The key is used for file encryption and integrity auditing. Moreover, we let the management server manage the challenge index in order to avoid challenge index collisions.
With respect to ownership checking, we obtain our proof of ownership scheme by swapping the roles of prover and verifier in the integrity auditing protocol. The cloud server generates expected responses for proof of ownership before transmitting the file to secondary storage and retains these expected responses in its local storage.
As illustrated above, we achieve both secure client-side cross-user deduplication and integrity auditing based on symmetric key cryptography. In our construction, the cloud server can save both storage space and network bandwidth while efficiently ensuring data integrity and confidentiality.
4. Sec-DPoS: A Symmetric Key Based Deduplicatable Proof of Storage
In this section, we describe our proposal in detail. First, we introduce preliminaries and notation; then we describe the proposed scheme, which consists of four protocols.
4.1. Preliminaries
First, we describe the building blocks as follows:
(i) Collision-resistant hash function. A hash function $H$ takes a binary string of arbitrary length as input and outputs a binary string of fixed length. $H$ is collision-resistant if it is infeasible to find two different values $x$ and $x'$ that satisfy $H(x) = H(x')$.
(ii) Key derivation function. A key derivation function $KDF$ is a deterministic function that takes a secret seed and an input and outputs a secret key.
(iii) Pseudorandom function. A pseudorandom function $f$ is a deterministic function that takes a key and an input and outputs a value that is indistinguishable from the output of a truly random function on the same input within the same range.
(iv) Pseudorandom permutation. A pseudorandom permutation $\pi$ is a deterministic function that takes a key and an integer $i$ with $1 \le i \le n$ and outputs an integer $\pi(i)$ with $1 \le \pi(i) \le n$, indistinguishable from a truly random permutation on the same domain. In our construction, we use $\pi$ to extract bit indices of a file $F$; thus $n = |F|$, where $|F|$ denotes the bit length of $F$.
(v) Deterministic symmetric encryption. A deterministic symmetric encryption algorithm takes a key and a plaintext as input and outputs a ciphertext. We write $E$ for the deterministic symmetric encryption algorithm.
(vi) Authenticated encryption. An authenticated encryption algorithm takes a key and a plaintext as input and outputs a ciphertext and an authentication tag. We write $AE$ for the authenticated encryption algorithm.
Second, we describe our notation as follows:
(i) Expected response set. An expected response set is an array of precomputed responses used to audit remote data. In particular, the client generates an expected response set (denoted $\mathsf{ERS}_C$) to audit the integrity of the outsourced data, and the cloud server generates an expected response set (denoted $\mathsf{ERS}_S$) for proof of ownership.
(ii) Challenge index. A challenge index indicates a specific position in an expected response set: $c_C$ indicates a position in $\mathsf{ERS}_C$ and $c_S$ a position in $\mathsf{ERS}_S$. The client retains $c_C$ for integrity auditing and the cloud server retains $c_S$ for proof of ownership.
(iii) Index set. An index set $I$ is a set of ordered natural numbers. $I$ is divided into subsets $I_1, \dots, I_t$ of equal size (the size of $I$ is a multiple of $t$), where each subset $I_j$ consists of consecutive indices.
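To fix ideas, the following minimal sketch shows one possible instantiation of these building blocks in Python. The concrete primitive choices (SHA-256, HMAC-SHA256) and the helper names are our own illustrative assumptions; in particular, the paper specifies a pseudorandom permutation $\pi$ for index extraction, which the sketch approximates with PRF-based sampling (with replacement).

```python
import hashlib
import hmac

def H(data: bytes) -> bytes:
    """Collision-resistant hash H (instantiated here with SHA-256)."""
    return hashlib.sha256(data).digest()

def kdf(seed: bytes, label: bytes) -> bytes:
    """Key derivation function KDF (instantiated with HMAC-SHA256)."""
    return hmac.new(seed, label, hashlib.sha256).digest()

def prf(key: bytes, msg: bytes) -> bytes:
    """Pseudorandom function f (instantiated with HMAC-SHA256)."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def extract_indices(key: bytes, n_bits: int, k: int) -> list:
    """Extract k bit positions of an n_bits-long string.
    The scheme uses a PRP pi here; this sketch approximates it with
    PRF-based sampling, which suffices for illustration."""
    return [int.from_bytes(prf(key, i.to_bytes(4, "big"))[:8], "big") % n_bits
            for i in range(k)]

def get_bit(data: bytes, i: int) -> int:
    """Return the i-th bit of the byte string data."""
    return (data[i // 8] >> (7 - (i % 8))) & 1
```

Later sketches in this section reuse these helpers.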
4.2. The Construction of Sec-DPoS
The Sec-DPoS scheme consists of four protocols. Firstly, we describe the key and index distribution protocol, where we assume that the client and management server communicate over a secure channel. The file upload process is divided into two protocols: the initial upload protocol and the subsequent upload protocol. Lastly, we describe the integrity auditing protocol.
4.2.1. Key and Index Distribution Protocol
An initial uploader generates an expected response set for a file in order to audit integrity, using a message-derived key distributed by the management server. The expected response set is then uploaded with the file, and every subsequent client who has the same file can use the expected response set generated by the first uploader. Note that every client who has the same file obtains the same secret key from the management server. One important point is that values in the expected response set must not be reused; in a naïve solution, however, values can be reused because no client, including the first uploader, knows which values have already been consumed. To avoid such challenge index collisions, we let the management server preassign the indices that point into the expected response set. In addition, since clients encrypt files using the message-derived key distributed by the management server, the outsourced data is resilient to dictionary attack in our model. The key and index distribution protocol runs as follows.
Step 1. C computes a hash value $h = H(F)$ and sends the key and index request to MS along with $h$.
Step 2. Upon receiving the key and index request, MS invokes Algorithm 1 and sends its outputs to C.
Algorithm 1: Key and index distribution (run by the management server).
Algorithm 1 generates a secret key, the highest challenge, and an index subset. Upon receiving $h$, the management server checks whether $h$ is already in its data table $T$. If not, the management server computes a secret key from $h$ using its master key and initializes the index set $I$ and the highest challenge $hc$, where $I$ is an initial ordered index set whose size equals the size of the expected response set (we assume that this size is predetermined and publicly known). In our scheme, the client precomputes an expected response set containing a fixed number of expected responses, so the client can audit integrity only as many times as there are precomputed responses; if all of them have been used or assigned (i.e., $I$ is empty), a client has to generate a new expected response set. In that case the cloud server can hold multiple expected response sets and the challenge index can become confused, so we use the highest challenge $hc$ to keep track of the challenge index and prevent confusion. The management server then records the new data in $T$, with $h$ saved as the lookup key and the secret key, $I$, and $hc$ saved as the corresponding values (line (1)-line (5)). If $h$ is already in $T$, the management server loads the values corresponding to $h$ from $T$. After the values are built or loaded (line (6)-line (10)), the management server randomly chooses an index subset from $I$ (if $I$ is empty, it first renews $I$) and sends the secret key, the highest challenge, and the chosen subset to the client; if $I$ was empty, it also sends an expected response renewal request (line (11)-line (12)).
At the end of the key and index distribution protocol, the client holds the secret key, the highest challenge, and the assigned index subset, and the management server removes the assigned subset from $I$.
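A minimal sketch of Algorithm 1 follows, using a Python dict for the table $T$ and HMAC-SHA256 for key derivation. The parameters U and T_SUBSETS, the derivation of the key from the master key and $h$, and the renewal policy are our own reading of the prose above, not a fixed specification.

```python
import hashlib
import hmac
import random

U = 100           # assumed size of the index set I (u precomputed responses)
T_SUBSETS = 10    # assumed number of subsets t (u must be a multiple of t)
table = {}        # the management server's data table T, keyed by h = H(F)

def _fresh_subsets(start: int) -> list:
    """Partition U consecutive indices, starting at `start`, into t subsets."""
    step = U // T_SUBSETS
    return [list(range(start + j, start + j + step)) for j in range(0, U, step)]

def key_index_distribution(h: bytes, master_key: bytes):
    """Return (secret key, highest challenge hc, index subset, renewal flag)."""
    if h not in table:                                   # lines (1)-(5)
        k_f = hmac.new(master_key, h, hashlib.sha256).digest()
        table[h] = {"key": k_f, "hc": U, "subsets": _fresh_subsets(0)}
    entry = table[h]                                     # lines (6)-(10)
    renewal = not entry["subsets"]
    if renewal:                                          # all indices consumed:
        start = entry["hc"]                              # renew I, advance hc
        entry["hc"] += U
        entry["subsets"] = _fresh_subsets(start)
    subset = entry["subsets"].pop(random.randrange(len(entry["subsets"])))
    return entry["key"], entry["hc"], subset, renewal    # lines (11)-(12)
```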
4.2.2. Initial Upload Protocol
The initial upload process assumes that a file is uploaded as new data that has not previously been uploaded. The client generates the expected response set to be used in integrity auditing, and the cloud server generates another expected response set to be used in the ownership check. The initial upload protocol runs as follows.
Step 1. C derives an encryption key from the secret key distributed by the management server, computes the ciphertext of the file, and computes a file tag $\tau$. C then sends a file upload request with $\tau$ to CS ($\tau$ is used as the file identifier).
Step 2. Upon receiving the upload request with $\tau$, CS checks whether the file is already in storage using $\tau$. If not, CS sends a data transmission request to C.
Step 3. Upon receiving the data transmission request, C generates an expected response set by invoking Algorithm 2 and sends the encrypted data together with the expected response set to CS.
Algorithm 2: Expected response set generation for integrity auditing (run by the client).
Step 4. Upon receiving the ciphertext and the expected response set, CS generates another expected response set by invoking Algorithm 3. CS then stores the ciphertext in secondary storage and keeps both expected response sets in local storage.
Algorithm 3: Expected response set generation for the ownership check (run by the cloud server).
Algorithm 2 generates the expected response set containing the expected responses to be used in integrity auditing. It takes the encrypted data, a file tag value, a secret key, and the highest challenge as input and outputs an expected response set.
First, the client derives the per-challenge values for each challenge index (line (1)-line (2)) and initializes a counter (line (4)). The client then generates a pseudorandom key and a nonce corresponding to the counter (line (5)-line (6)). Subsequently, the client extracts bit indices and builds a token by concatenating the bits of the encrypted data at the extracted positions (line (7)). The client then produces an expected response by hashing the token together with the nonce (line (8)). Finally, the client encrypts the expected response with the authenticated encryption scheme (line (9)). This procedure is repeated once per expected response while incrementing the counter (line (3)-line (9)); a sketch is given below.
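The following sketch of Algorithm 2 assumes SHA-256 for $H$, HMAC-SHA256 for the PRF, PRF-based index sampling in place of the PRP, AES-GCM (via the third-party `cryptography` package) for $AE$, and the derivation labels, the use of the file tag as associated data, and the challenge size K1; all of these choices are ours, not fixed by the scheme.

```python
import hashlib
import hmac
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party

K1 = 920  # assumed bit-level challenge size for integrity auditing

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def get_bit(data: bytes, i: int) -> int:
    return (data[i // 8] >> (7 - (i % 8))) & 1

def gen_expected_responses(ciphertext: bytes, tag: bytes,
                           k_f: bytes, hc: int, size: int) -> list:
    """Precompute `size` encrypted expected responses for challenge
    indices hc - size, ..., hc - 1 over the encrypted file."""
    n_bits = len(ciphertext) * 8
    ers = []
    for c in range(hc - size, hc):
        seed = prf(k_f, b"seed" + c.to_bytes(4, "big"))    # per-challenge PRF key
        nonce = prf(k_f, b"nonce" + c.to_bytes(4, "big"))  # per-challenge nonce
        bits = 0
        for i in range(K1):  # concatenate K1 bits at pseudorandom positions
            idx = int.from_bytes(prf(seed, i.to_bytes(4, "big"))[:8], "big") % n_bits
            bits = (bits << 1) | get_bit(ciphertext, idx)
        token = bits.to_bytes((K1 + 7) // 8, "big")
        resp = hashlib.sha256(token + nonce).digest()      # expected response
        ers.append(AESGCM(k_f).encrypt(nonce[:12], resp, tag))  # AE-encrypt
    return ers
```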
Algorithm 3 generates the expected response set containing the expected responses to be used in the ownership check. It takes the ciphertext, a file tag value, the cloud server's master key, and the highest challenge as input and outputs an expected response set. Note that the cloud server also uses the highest challenge to track the challenge index in the ownership check protocol; in the initial upload protocol, the highest challenge takes its initial value. The rest of the algorithm follows Algorithm 2, except that Algorithm 3 does not encrypt the expected responses, since its output is retained and used only by the cloud server.
At the end of the initial upload protocol, the cloud server initializes its challenge index. The client holds its challenge index set in local storage, and the cloud server holds both expected response sets in local storage and saves the ciphertext in secondary storage. Note that all values kept in local storage are of negligible size compared to the file.
4.2.3. Deduplication Protocol
The deduplication process assumes that an uploaded file duplicates a previous upload, so the cloud server must verify that the client actually has the file. The deduplication protocol contains the ownership check protocol and runs as follows:
Step 1. C generates an encryption key, computes the ciphertext of the file and the file tag $\tau$, and sends a file upload request with $\tau$ to CS.
Step 2. Upon receiving the upload request, CS checks whether the file is in storage using $\tau$. If it is, CS runs the ownership check, Algorithm 4, with C.
Algorithm 4: Ownership check protocol (run between the cloud server and the client).
Step 3. If Algorithm 4 returns “accept”, CS assigns a link to the file to the client. Otherwise, CS returns “reject”.
In Algorithm 4, the cloud server interacts with the client to verify that the client actually has the file. First, the cloud server derives the per-challenge values for the current challenge index from its master secret key (line (1)-line (2)). The cloud server then generates the challenge, consisting of a pseudorandom key and a nonce, and sends it to the client (line (3)-line (6)). Upon receiving the challenge, the client extracts bit indices and builds a token by concatenating the bits of the ciphertext at the extracted positions (line (7)). The client then generates a proof by hashing the token with the nonce (line (8)-line (9)) and sends the proof to the cloud server as its response. Upon receiving the response, the cloud server compares it with the corresponding value of its expected response set. If the proof matches, the cloud server accepts that the client actually has the file; otherwise, it returns “reject” (line (10)-line (13)).
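A sketch of this exchange, under the same assumed primitives as in the Algorithm 2 sketch (HMAC-SHA256 as PRF, PRF-based index sampling in place of the PRP, and our own derivation labels):

```python
import hashlib
import hmac

K2 = 2219  # challenge size for 80-bit security at rho = 0.05 (see Section 6)

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def get_bit(data: bytes, i: int) -> int:
    return (data[i // 8] >> (7 - (i % 8))) & 1

def make_token(ciphertext: bytes, seed: bytes, k: int) -> bytes:
    """Concatenate k bits of the ciphertext at pseudorandom positions."""
    n_bits = len(ciphertext) * 8
    bits = 0
    for i in range(k):
        idx = int.from_bytes(prf(seed, i.to_bytes(4, "big"))[:8], "big") % n_bits
        bits = (bits << 1) | get_bit(ciphertext, idx)
    return bits.to_bytes((k + 7) // 8, "big")

def server_challenge(master_key: bytes, tag: bytes, c_s: int):
    """Cloud server: derive the challenge (seed, nonce) for index c_s."""
    seed = prf(master_key, tag + b"seed" + c_s.to_bytes(4, "big"))
    nonce = prf(master_key, tag + b"nonce" + c_s.to_bytes(4, "big"))
    return seed, nonce

def client_prove(ciphertext: bytes, seed: bytes, nonce: bytes) -> bytes:
    """Client: hash the extracted token with the nonce to form the proof."""
    return hashlib.sha256(make_token(ciphertext, seed, K2) + nonce).digest()

def server_verify(proof: bytes, expected: bytes) -> bool:
    """Cloud server: accept iff the proof equals the stored expected response."""
    return hmac.compare_digest(proof, expected)
```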
If ownership is accepted, the cloud server assigns a link to the file to the client, so the client does not need to send the file. Note that, since every client who has the same file obtains the same secret key, such clients can audit the file's integrity without uploading any further information.
At the end of the ownership check protocol, the cloud server advances its challenge index and, if all precomputed responses have been consumed, renews its expected response set. The renewal process is identical to Algorithm 3, except that the highest challenge is increased by the size of the expected response set.
4.2.4. Integrity Auditing Protocol
A client who has ownership of a file can audit the integrity of the outsourced data at any time. Before running the integrity auditing protocol, the client chooses one element from its assigned index set, sets it as the challenge index, and removes it from the set.
Algorithm 5 presents the integrity auditing protocol. First, the client derives the per-challenge values for the chosen challenge index (line (1)-line (2)). The client then computes the pseudorandom key and nonce and sends them, together with the file tag, to the cloud server as a challenge (line (3)-line (6)). Upon receiving the challenge, the cloud server extracts bit indices and builds a token by concatenating the bits of the ciphertext at the extracted positions (line (7)). The cloud server then generates a proof by hashing the token with the challenged nonce (line (8)). Finally, the cloud server sends the proof, together with the corresponding encrypted value of the expected response set, to the client (line (9)). Upon receiving them, the client decrypts the expected response value (line (10)). If the decryption is not valid, the client returns “reject” (line (11)-line (12)); otherwise the client compares the proof with the expected response. If they are equal, the client accepts that the outsourced data is stored intact; otherwise, the client returns “reject” (line (13)-line (16)).
Algorithm 5: Integrity auditing protocol (run between the client and the cloud server).
At the end of the integrity auditing protocol, if all preassigned challenge indices have been used (i.e., the index set is empty), the client has to obtain a new challenge index set from the management server. The reissuing process is similar to Algorithm 1, except that the management server does not need to load or send the secret key. Note that if the expected response set must be renewed, the client runs Algorithm 2 and sends a new expected response set to the cloud server.
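The client's side of Algorithm 5 is sketched below, mirroring the precomputation in the Algorithm 2 sketch (same assumed derivation labels, AES-GCM for $AE$, and the file tag as associated data). A failed authenticated decryption signals that the stored expected response was altered.

```python
import hashlib
import hmac
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party
from cryptography.exceptions import InvalidTag

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def client_challenge(k_f: bytes, c: int):
    """Recompute the per-challenge PRF key and nonce for challenge index c,
    exactly as derived when the expected response was precomputed."""
    seed = prf(k_f, b"seed" + c.to_bytes(4, "big"))
    nonce = prf(k_f, b"nonce" + c.to_bytes(4, "big"))
    return seed, nonce

def client_verify(k_f: bytes, nonce: bytes, proof: bytes,
                  enc_expected: bytes, tag: bytes) -> bool:
    """Decrypt the returned expected response and compare it with the proof;
    a failed AE decryption means the stored response was altered (reject)."""
    try:
        expected = AESGCM(k_f).decrypt(nonce[:12], enc_expected, tag)
    except InvalidTag:
        return False
    return hmac.compare_digest(proof, expected)
```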
5. Security Analysis
In this section, we analyze the security of Sec-DPoS. We first formalize the security definitions, which consist of two parts: client uncheatability in proof of ownership and server unforgeability in integrity auditing.
5.1. Security Definitions
In a cross-user client-side deduplication system, a malicious client that has only partial information about a file may attempt to convince the cloud server in the ownership check protocol. It is therefore necessary that malicious clients cannot cheat the cloud server into accepting ownership of the entire file. We first summarize the overall process of the ownership check protocol in Sec-DPoS.
When a client attempts to upload duplicated data, the cloud server generates a random seed key and a nonce and sends them to the client as a challenge. Upon receiving the challenge, the client extracts bit indices and generates a token by concatenating the bits of the ciphertext at the extracted positions. Subsequently, the client generates a proof by hashing the token with the nonce and sends the proof to the cloud server as its response. Finally, the protocol outputs “accept” if the proof is valid; otherwise it returns “reject”.
We now define client uncheatability for Sec-DPoS based on a game between a challenger (playing the role of the cloud server) and an adversary (playing the role of the client). We prove the security of Sec-DPoS under a weak assumption: the adversary may build an expected token before the challenger issues a challenge. That is, the adversary can obtain bits of the ciphertext via an oracle even without possessing the whole file.
In the Sec-DPoS scheme, the experiment for uncheatability is described as follows:
(i) Setup phase. The challenger randomly chooses data $F$ and a master key, and sends $F$ to an oracle. The challenger then sets up the ownership check system of Sec-DPoS over $F$.
(ii) Learning phase. The adversary may query the oracle with indices at any point in time. When queried, the oracle replies with the bits of the ciphertext of $F$ at the queried indices.
(iii) Challenge phase. The challenger runs the ownership check protocol with the adversary. If the ownership check protocol returns “accept”, the experiment outputs 1; otherwise, it outputs 0.
Definition 1 (client uncheatability). The Sec-DPoS scheme is uncheatable if, for any probabilistic polynomial time (PPT) adversary $\mathcal{A}$ and security parameter $\lambda$,
$$\Pr[\mathsf{Exp}^{\mathsf{cheat}}_{\mathcal{A}}(\lambda) = 1] \le \mathsf{negl}(\lambda),$$
where $\mathsf{negl}$ is a negligible function.
Next, we consider server unforgeability. A client who has ownership of a file can audit the integrity of the outsourced data at any time. In Sec-DPoS, the integrity auditing protocol is similar to the ownership check protocol, except that the roles of prover and verifier are exchanged and the cloud server holds the expected responses (in encrypted form) provided by the client. From this situation, we build the following security game.
Server unforgeability is defined based on a game between a challenger (playing the role of the client) and an adversary (playing the role of the cloud server). The experiment for unforgeability is described as follows:
(i) Setup phase. The challenger randomly chooses data $F$ and a secret key, and sends $F$ to an oracle. The challenger then sets up the integrity auditing system of Sec-DPoS over $F$.
(ii) Query phase. The adversary may query the oracle with indices at any point in time. When queried, the oracle replies with the bits of the ciphertext of $F$ at the queried indices. If the adversary queries the challenger, the challenger returns an expected response in encrypted form.
(iii) Challenge phase. The challenger runs the integrity auditing protocol with the adversary. If the integrity auditing protocol returns “accept”, the experiment outputs 1; otherwise, it outputs 0.
Definition 2 (server unforgeability). The Sec-DPoS scheme is unforgeable if, for any PPT adversary $\mathcal{A}$ and security parameter $\lambda$,
$$\Pr[\mathsf{Exp}^{\mathsf{forge}}_{\mathcal{A}}(\lambda) = 1] \le \mathsf{negl}(\lambda),$$
where $\mathsf{negl}$ is a negligible function.
5.2. Security Proof
In this subsection, we prove that Sec-DPoS satisfies the security definitions above.
Before proving security with respect to these definitions, we first address the data confidentiality of Sec-DPoS. Regarding data confidentiality in our model, we state the following theorem.
Theorem 3. Sec-DPoS ensures confidentiality with brute-force attack resilience if no PPT adversary is allowed to compromise the management server.
Proof. In our system, the management server generates the convergent key using its own private key, and we assume that the key distribution process runs over a secure channel. Thus, an adversary attempting a brute force attack cannot generate the valid encryption key without the private key of the management server. Therefore, even if the file is predictable, the adversary cannot recover the plaintext by launching a brute force attack.
However, the adversary can still attempt a brute force attack through the management server, since the management server cannot distinguish an honest client from a malicious one. Hence, by applying a per-client or per-file rate-limiting strategy, as in [13] or [15], we can achieve confidentiality with brute force attack resilience.
Now, we prove the security for uncheatability and unforgeability under the assumption that Theorem 3 holds.
Theorem 4. Let $\lambda$ be a security parameter and let $H$ be a random oracle with output length $\ell$. Assume that the pseudorandom function $f$ and the pseudorandom permutation $\pi$ are secure with key length $\kappa$. Then Sec-DPoS holds client uncheatability against any PPT adversary who can make $q_1$ queries to $H$ and $q_2$ queries to an oracle that returns the bit of the ciphertext at a queried index.
Proof. To show that Theorem 4 holds, we consider the experiment $\mathsf{Exp}_0$ of Definition 1. In $\mathsf{Exp}_0$, the adversary can obtain $q_2$ bits of the ciphertext via the oracle, and the challenger runs the ownership check protocol with the adversary. The advantage of the adversary in the uncheatability game is $\mathsf{Adv}_0 = \Pr[\mathsf{Exp}_0 = 1]$.

The second experiment $\mathsf{Exp}_1$ is identical to $\mathsf{Exp}_0$ except that the adversary finds a hash collision. Since $H$ is a random oracle with output length $\ell$ and the adversary makes $q_1$ queries, we have $|\mathsf{Adv}_0 - \mathsf{Adv}_1| \le q_1^2/2^{\ell}$.

The third experiment $\mathsf{Exp}_2$ is identical to $\mathsf{Exp}_1$ except that the adversary predicts the random seed key and nonce that are to be challenged. Since the pseudorandom function $f$ is assumed secure with key length $\kappa$, we have $|\mathsf{Adv}_1 - \mathsf{Adv}_2| \le q_1/2^{\kappa}$. The advantage in the third experiment can be determined as follows.
Suppose that the adversary has obtained at most a $\delta$ fraction of the ciphertext (i.e., $\delta = q_2/n$, where $n$ is the bit length of the ciphertext). We now compute the probability that such an adversary passes the ownership check protocol.

In the ownership check protocol of our proposed scheme, the client has to extract $k_2$ bits of the target data at random indices. Let $E_1$ be the event that the adversary owns a $\delta$ fraction of the ciphertext, and let $E_2$ be the event that the adversary passes a single-bit challenge. If the single-bit challenge falls in the known fraction, the adversary always passes; if it falls in the unknown fraction, the best strategy is to guess the response, which succeeds with probability $1/2$. Letting $\rho = 1 - \delta$ denote the unknown fraction, we therefore have
$$\Pr[E_2 \mid E_1] = \delta + \frac{1 - \delta}{2} = 1 - \frac{\rho}{2},$$
and the probability that the adversary passes a $k_2$-bit challenge is
$$\left(1 - \frac{\rho}{2}\right)^{k_2}.$$

We desire this probability to be at most $2^{-\lambda}$. Since $(1 - \rho/2)^{k_2} \le e^{-\rho k_2/2}$, client uncheatability is achieved for
$$k_2 \ge \frac{2\lambda \ln 2}{\rho},$$
in which case $\mathsf{Adv}_2 \le 2^{-\lambda}$ and, collecting the game hops,
$$\mathsf{Adv}_0 \le 2^{-\lambda} + \frac{q_1^2}{2^{\ell}} + \frac{q_1}{2^{\kappa}},$$
which is negligible. Thus Sec-DPoS holds client uncheatability.

Additionally, as a realistic scenario, the adversary may know part of the plaintext, so we also consider an adversary who holds a fraction of the plaintext and attempts to convince the challenger. Let $E_1'$ be the event that the adversary owns a fraction of the plaintext and $E_2'$ the event that it passes a single-bit challenge in the ownership check protocol. Under the assumption that Theorem 3 holds, the challenge is issued over the ciphertext, so, unlike the previous case, the adversary has to guess the response regardless of the challenged position, i.e., $\Pr[E_2' \mid E_1'] = 1/2$. The probability of passing a $k_2$-bit challenge is then $2^{-k_2}$. We desire this probability to be at most $2^{-\lambda}$, for which a challenge size of $\lambda$ suffices; since $k_2 \ge 2\lambda \ln 2/\rho \ge \lambda$, the challenge size chosen above already covers this scenario.
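As a concrete instantiation of this bound (our own arithmetic, using parameters that reappear in Section 6), take $\lambda = 80$ and $\rho = 0.05$:
$$k_2 \ge \frac{2\lambda \ln 2}{\rho} = \frac{2 \cdot 80 \cdot \ln 2}{0.05} \approx 2218.1,$$
so $k_2 = 2219$, which matches the ownership check challenge size used in our implementation (Section 6).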
Next, we state the corresponding theorem for server unforgeability.
Theorem 5. Let $\lambda$ be a security parameter and let $H$ be a random oracle with output length $\ell$. Assume that the pseudorandom function $f$ and the pseudorandom permutation $\pi$ are secure with key length $\kappa$, and that $AE$ is an ideal authenticated encryption function with key length $\kappa$ and authentication code length $\mu$. Then Sec-DPoS holds server unforgeability against any PPT adversary who can make $q_1$ queries to $H$ and $q_2$ queries to an oracle that returns the bit of the ciphertext at a queried index.
Proof. To show that Theorem 5 holds, we consider the experiment $\mathsf{Exp}_0$ of Definition 2. In $\mathsf{Exp}_0$, the adversary can obtain $q_2$ bits of the ciphertext via the oracle and can obtain the expected responses in authenticated encryption form. The challenger then runs the integrity auditing protocol with the adversary. The advantage of the adversary in the unforgeability game is $\mathsf{Adv}_0 = \Pr[\mathsf{Exp}_0 = 1]$.

The second experiment $\mathsf{Exp}_1$ is identical to $\mathsf{Exp}_0$ except that the adversary cheats the authenticated encryption function, i.e., breaks the encryption or forges the authentication code. Since $AE$ is assumed to be an ideal authenticated encryption function with key length $\kappa$ and authentication code length $\mu$, we have $|\mathsf{Adv}_0 - \mathsf{Adv}_1| \le 2^{-\kappa} + 2^{-\mu}$.

The third experiment $\mathsf{Exp}_2$ is identical to $\mathsf{Exp}_1$ except that the adversary finds a hash collision. Since $H$ is a random oracle with output length $\ell$ and the adversary makes $q_1$ queries, we have $|\mathsf{Adv}_1 - \mathsf{Adv}_2| \le q_1^2/2^{\ell}$.

The fourth experiment $\mathsf{Exp}_3$ is identical to $\mathsf{Exp}_2$ except that the adversary predicts the random seed key and nonce that are to be challenged. Since the pseudorandom function $f$ is assumed secure, we have $|\mathsf{Adv}_2 - \mathsf{Adv}_3| \le q_1/2^{\kappa}$. The advantage in the fourth experiment can be determined as follows.

Suppose that the adversary has obtained at most a $\delta$ fraction of the ciphertext (i.e., $\delta = q_2/n$). We compute the probability that such an adversary passes the integrity auditing protocol. In the integrity auditing protocol of our proposed scheme, the cloud server has to extract $k_1$ bits of the target data at random indices; the rest of the analysis is the same as in the proof of Theorem 4. Thus, if the challenge size satisfies $k_1 \ge 2\lambda \ln 2/\rho$, where $\rho = 1 - \delta$, we have $\mathsf{Adv}_3 \le 2^{-\lambda}$ and, collecting the game hops,
$$\mathsf{Adv}_0 \le 2^{-\lambda} + \frac{1}{2^{\kappa}} + \frac{1}{2^{\mu}} + \frac{q_1^2}{2^{\ell}} + \frac{q_1}{2^{\kappa}},$$
which is negligible. Thus Sec-DPoS holds server unforgeability.
5.3. Analysis of Detection Probability
Unlike the schemes in [1, 2, 9–12], Sec-DPoS uses a bit-level challenge. We should therefore analyze the detection probability when an attacker who has only a fraction of the data attempts to convince the verifier that it owns the whole data (ownership check) or that the data is stored intact (integrity auditing). In the integrity auditing and ownership check protocols, the challenge sizes are $k_1$ and $k_2$ bits, respectively, and we can choose them according to the desired security parameter $\lambda$. As proved above, the passing probabilities are $(1 - \rho_1/2)^{k_1}$ and $(1 - \rho_2/2)^{k_2}$, where $1/2$ is the guessing probability for a single-bit challenge in the unknown fraction, and $\rho_1$ and $\rho_2$ denote the unknown fractions of the target data in the integrity auditing and ownership check protocols, respectively (we assume that the adversary obtains its known portion of the encrypted data by querying an oracle).
First, we analyze the detection probability in the ownership check protocol. Letting $P_2$ denote the probability of detecting a cheating client, we have
$$P_2 = 1 - \left(1 - \frac{\rho_2}{2}\right)^{k_2} \ge 1 - e^{-\rho_2 k_2/2},$$
where $\rho_2$ denotes the unknown fraction of the target data that the client does not know. As shown in Figure 2, we present the challenge size $k_2$ obtained by setting $\rho_2$ to different values for various security parameters.
Figure 2: The size of the ownership check challenge for various security parameters.
Next, we analyze the detection probability in the integrity auditing protocol. Letting $P_1$ denote the probability of detecting a corrupted file, we have
$$P_1 = 1 - \left(1 - \frac{\rho_1}{2}\right)^{k_1} \ge 1 - e^{-\rho_1 k_1/2},$$
where $\rho_1$ denotes the unknown fraction of the target data that the cloud server does not know. Note that the unknown fraction here equals the corrupted fraction: the cloud server cannot determine which part is damaged when the data is corrupted by unintentional errors. As shown in Figure 3, we present the challenge size $k_1$ obtained by setting $\rho_1$ to different values for 99% and 95% confidence.
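These challenge sizes are easy to compute from the bound above; the following small helper (ours, using the exponential bound) reproduces the ownership check sizes used in Section 6:

```python
import math

def challenge_size(rho: float, miss_prob: float) -> int:
    """Smallest k with exp(-rho * k / 2) <= miss_prob, which implies
    (1 - rho/2)^k <= miss_prob: cheating then goes undetected with
    probability at most miss_prob."""
    return math.ceil(-2.0 * math.log(miss_prob) / rho)

# Ownership check at 80-bit security (miss probability 2^-80):
for rho in (0.05, 0.10, 0.15):
    print(rho, challenge_size(rho, 2.0 ** -80))   # -> 2219, 1110, 740

# Integrity auditing at 99% / 95% detection confidence with rho = 0.01:
print(challenge_size(0.01, 0.01), challenge_size(0.01, 0.05))
```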
Figure 3: The size of the integrity auditing challenge for 99% and 95% confidence.
6. Implementation
In this section, we present our implementation results, evaluate the Sec-DPoS scheme, and compare it with other schemes, namely SecCloud [1], Message-locked PoOR [11], and DeyPoS [12]. All schemes were implemented and evaluated on an Intel Core i7-4790 CPU @ 3.60 GHz, and all reported results are the median of 100 trials.
6.1. Implementation of Ownership Check Protocol
The ownership check process in Sec-DPoS simply challenges random bit indices of an encrypted file. As analyzed in Section 5, the challenge size can be set by choosing $\rho_2$, the unknown fraction of the target data, and the ownership check protocol then detects cheating with the corresponding confidence. In our implementation, we take $\rho_2 = 0.05$, $0.1$, and $0.15$; we consider these values reasonable, since the adversary would have to know 95%, 90%, or 85% of the encrypted data, not the plaintext. As shown in Figure 2, for 80-bit security we set the challenge size to 2219, 1110, and 740 for $\rho_2 = 0.05$, $0.1$, and $0.15$, respectively. We present the implementation results for these values of $\rho_2$ and various data sizes (see Figure 4). As shown in Figure 4, the time cost of the ownership check protocol in Sec-DPoS does not depend on the size of the data.
Figure 4: Time cost of the ownership check protocol for various data sizes.
In order to evaluate the efficiency of Sec-DPoS in the ownership check protocol, we also measured the challenge, response, and verification phases separately. As shown in Figure 5, we compare our implementation results with the other schemes for a 64 MB file. Sec-DPoS was measured with a challenge size of 2219, i.e., 80-bit security. For SecCloud, the ownership check scheme is the same as PoWs [4]. For Message-locked PoOR, since the scheme is based on HMAC, the target file must be accessed during the ownership check (the other schemes do not access the target file); we set the block size to 4 KB, the same setting as in [11]. For DeyPoS, since the scheme is based on a homomorphic authenticated tree, efficiency varies widely with block size (a small block size increases the height of the tree and vice versa); we set the block size to 64 KB, and the measured computation cost was lower than those of SecCloud and Message-locked PoOR ([12] used block sizes of 4 KB, 16 KB, and 64 KB, with the latter being the most efficient in our experiment). For Sec-DPoS, the computation cost was measured as 0.1 ms, 10 ms, and 0.001 ms in the challenge, response, and verification phases, respectively. Hence, Sec-DPoS has the highest efficiency in the ownership check protocol. Note that since SecCloud and DeyPoS use tree structures in the ownership check, their time costs increase with file size, while the time costs of Message-locked PoOR and Sec-DPoS are constant. In terms of network latency, Sec-DPoS and Message-locked PoOR have constant communication cost, while the others need communication cost that grows with the file size.
Figure 5: Comparison of ownership check time costs for a 64 MB file.
6.2. Implementation of Integrity Auditing Protocol
The integrity auditing process in Sec-DPoS simply challenges random bit indices of an encrypted file. As analyzed in Section 5, the challenge size can be determined by choosing $\rho_1$, the corrupted fraction of the target data. In our implementation, we run the integrity auditing protocol of Sec-DPoS at 99% and 95% confidence; we stress that these are the same conditions as for the other schemes. As shown in Figure 3, we set the challenge sizes for 99% and 95% confidence with a corrupted fraction of 0.01 (i.e., $\rho_1 = 0.01$). We present implementation results for various data sizes (see Figure 6). As shown in Figure 6, the time cost of the integrity auditing protocol in Sec-DPoS does not depend on the size of the file.
Figure 6: Time cost of the integrity auditing protocol for various data sizes.
In order to evaluate the efficiency of Sec-DPoS in the integrity auditing protocol, we also measured the challenge, response, and verification phases separately. As shown in Figure 7, we compare our implementation results with the other schemes for a 64 MB file; all schemes were measured at 99% confidence. For SecCloud, which is based on public key cryptography, the computation cost exceeded 300 ms. For Message-locked PoOR, we set the block size to 4 KB (the same setting as in [11]) and the computation cost also exceeded 300 ms. For DeyPoS, we set the block size to 64 KB, which gave the lowest computation cost ([12] used block sizes of 4 KB, 16 KB, and 64 KB, with the latter being the most efficient in our experiment). However, since DeyPoS is based on a tree structure, its integrity auditing time increases with file size. Moreover, with a block size of 64 KB and 480 challenges, the entire file has to be accessed whenever it is smaller than 30720 KB, which is impractical; similarly, Message-locked PoOR accesses the entire file whenever it is smaller than 1920 KB. For Sec-DPoS, the computation cost was measured as 0.1 ms, 2 ms, and 0.03 ms in the challenge, response, and verification phases, respectively. Hence, Sec-DPoS has the highest efficiency in the integrity auditing protocol. In terms of network latency, Sec-DPoS and Message-locked PoOR have constant communication cost, while the others need communication cost that grows with the file size.
Figure 7: Comparison of integrity auditing time costs for a 64 MB file.
6.3. Implementation of the Initialization Phase
When a client uploads fresh data, the client and the cloud server have to run a precomputation process in which the client generates the expected responses for integrity auditing and the cloud server generates the expected responses for the ownership check. The challenge sizes are $k_1$ bits for integrity auditing and $k_2$ bits for the ownership check. We therefore measured the time cost of the initialization process while varying each parameter for integrity auditing and the ownership check, respectively. This experiment was also run on an Intel Core i7-4790 CPU @ 3.60 GHz.
For the ownership check protocol, we measured the time cost for various file sizes and challenge sizes. As shown in Figure 8, the precomputation for the ownership check does not depend on the file size, and the time cost is constant: for $k_2 = 2219$, 1110, and 740, it was approximately 2.2 ms, 1.1 ms, and 0.7 ms, respectively. For the integrity auditing protocol, we likewise measured the time cost for various file sizes and two challenge sizes. We assume that the integrity auditing protocol is executed more frequently than the ownership check protocol and therefore set the number of precomputed responses for auditing larger than that for ownership checks. As shown in Figure 9, the precomputation for integrity auditing also does not depend on the file size, and the time cost is constant: it was approximately 4.7 ms and 3.1 ms for the 99% and 95% confidence settings (challenge size 610 for the latter), respectively. Note that if the number of precomputed responses increases, the time cost also increases linearly.
Figure 8: Precomputation time for the ownership check.
Figure 9: Precomputation time for integrity auditing.
7. Conclusion
To meet the three important requirements of cloud storage environments, namely data confidentiality, integrity, and storage efficiency, we proposed Sec-DPoS, a secure and highly efficient deduplicatable proof of storage scheme based on symmetric key cryptography. Our scheme ensures data confidentiality with dictionary attack resilience and can efficiently audit the integrity of outsourced data while the cloud server saves resources. By applying a bit-level challenge, we designed Sec-DPoS to perform efficiently even for small data. Moreover, we proved the security of Sec-DPoS in the random oracle model using information theory, and our experimental results show that Sec-DPoS achieves the highest efficiency among the compared schemes.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03931071), by an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. B0717-16-0097, Development of V2X Service Integrated Security Technology for Autonomous Driving Vehicle), and by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean Government [18ZH1200, Core Technology Research on Trust Data Connectome].