FogDedupe: A Fog-Centric Deduplication Approach Using Multi-Key Homomorphic Encryption Technique

Yoosuf, Mohamed Sirajudeen; Muralidharan, C.; Shitharth, S.; Alghamdi, Mohammed; Maray, Mohammed; Rabie, Osama Bassam J.

doi:https://doi.org/10.1155/2022/6759875

Journal of Sensors

On this page

Abstract Introduction Related Works Implementation Conclusion Data Availability Conflicts of Interest References Copyright Related Articles

Special Issue

Advanced Machine Learning and Big Data Analytics with IoT Sensor Data

View this Special Issue

Research Article | Open Access

Volume 2022 | Article ID 6759875 | https://doi.org/10.1155/2022/6759875

FogDedupe: A Fog-Centric Deduplication Approach Using Multi-Key Homomorphic Encryption Technique

Mohamed Sirajudeen Yoosuf,¹C. Muralidharan,²S. Shitharth,³Mohammed Alghamdi,^4,5Mohammed Maray,⁴and Osama Bassam J. Rabie⁶

Academic Editor: Sweta Bhattacharya

Received12 Jul 2022

Accepted09 Aug 2022

Published25 Aug 2022

Abstract

The advancements in communication technologies and a rapid increase in the usage of IoT devices have resulted in an increased data generation rate. Storing, managing, and processing large quantities of unstructured data generated by IoT devices remain a huge challenge to cloud service providers (CSP). To reduce the storage overhead, CSPs implement deduplication algorithms on the cloud storage servers. It identifies and eliminates the redundant data blocks. However, implementing post-progress deduplication schemes does not address the bandwidth issues. Also, existing convergent key-based deduplication schemes are highly vulnerable to confirmation of file attacks (CFA) and can leak confidential information. To overcome these issues, FogDedupe, a fog-centric deduplication framework, is proposed. It performs source-level deduplication on the fog nodes to reduce the bandwidth usage and post-progress deduplication to improve the cloud storage efficiency. To perform source-level deduplication, a distributed index table is created and maintained in the fog nodes, and post-progress deduplication is performed using a multi-key homomorphic encryption technique. To evaluate the proposed FogDedupe framework, a testbed environment is created using the open-source Eucalyptus v.4.2.0 software and fog project v1.5.9 package. The proposed scheme tightens the security against CFA attacks and improves the storage overhead by 27% and reduces the deduplication latency by 12%.

1. Introduction

Cloud computing is a technological revolution that has enabled service providers to deliver computing resources to their users through the internet. It provides easy, scalable access to the applications installed and managed in the cloud servers. The usage of cloud computing services is increasing exponentially [1]. According to the market research report from Cision 2020, the market size of cloud computing is expected to reach $832.1 billion by 2025 with a CAGR of 17.5%. On the other hand, the advancements in communication technologies and the increase in the usage of IoT devices have resulted in increased data growth. On average, 2.5 quintillion bytes of data are generated every day. As per the reports of International Data Corporation 2020, it is expected that in 2025, around 80 Zettabytes of data might be generated from IoT and smart devices [2].

Storing and managing a large quantity of data in the cloud storage servers degrades the performance of the applications that run on the cloud. It is important to address the performance degradation issues in cloud computing technology as the data are expected to grow exponentially in near future.

Storing multiple copies of the same data in the cloud storage servers is one of the reasons for performance degradation in cloud services. In 2017, Waterford Technologies, UK claimed that around 80% of data stored in the public cloud by corporate companies is redundant. Also, the MNC companies are incurring a 12% of revenue loss every year for storing redundant data in the cloud [3]. To address the performance degradation and to remove the redundant data from the cloud storage servers, deduplication schemes are executed. It identifies redundant data and eliminates it from cloud storage servers. Many cloud providers have implemented data deduplication algorithms in their cloud architecture to improve their performance. Data deduplication schemes ensure that only one copy of the data is stored in the cloud storage server. It identifies the redundant data and replaces them with a pointer to the original copy.

The deduplication techniques can be categorized into two types based on the location where the deduplication algorithm is executed. The first type is source-level deduplication where the redundancy check is performed before the data enters the cloud storage. It decreases the ingest rates of real-world data. The second type of deduplication is post-progress deduplication, in which the redundancy check is performed only after the data enters the cloud storage. To execute the post-progress deduplication algorithm, the cloud service provider must have enough space to store the full backup (unique as well as the duplicate copies) somewhere until the duplicate data is removed from the cloud storage servers.

Existing source-level deduplication schemes Hur et al. [4], Patgiri et al. [5], and Chhabraa et al. [6] use Bloom filter as an index table in the cloud servers to perform redundancy checks. Bloom filter is a space-efficient probabilistic data structure that helps in searching for a particular element from a large set. The applications of the Bloom filter in cloud computing are keyword search, retrieving the documents from cloud storage, and cache memory. However, the Bloom filter-based index tables cannot be implemented directly on the deduplication scheme as it has a high possibility of false-positive errors. Also, existing post-progress deduplication schemes Li et al. [7], Zhou et al. [8], Liu et al. [9], Liu et al. [10], and Shen et al. [11] use convergent key encryption methods to perform deduplication, in which the data blocks are hashed and the private keys for the encryption are derived from the message digest. However, the deterministic property of the hash function may produce the same private key and identical cipher text for the redundant data blocks. Later, by comparing the identical ciphertext, the redundant data are removed from the cloud storage servers. The convergent key-based encryption method is easy and efficient to perform post-progress deduplication. However, the convergent key-based encryption methods are highly vulnerable to confirmation of file attacks (CFA) and increase privacy and security issues. Also, the convergent key-based deduplication schemes become inefficient when the number of data blocks is high.

To reduce the bandwidth wastage in the source-level deduplication, and CFA security issues in the post-progress deduplication, a FogDedupe framework is proposed. Instead of performing source-level deduplication in the cloud servers, the proposed framework introduces a concept of fog-centric deduplication, which effectively reduces bandwidth wastage. Also, to perform post-progress deduplication, additive homomorphic encryption is proposed. The data owners use a multi-key homomorphic encryption algorithm to secure their data and that allows the cloud administrator to perform operations on the corresponding ciphertexts without compromising the security. The proposed multi-key homomorphic deduplication technique allows the DOs to use different private keys to encrypt the data.

1.1. Drawbacks of Existing Deduplication Schemes

The following drawbacks in the existing deduplication schemes have to be addressed to improve the security and performance of cloud services. (i)The existing convergent key-based encryption model has a high probability of information leakage as it is vulnerable to confirmation of file attacks (CFA)(ii)High probability of false-positive issues in source-level deduplication when the incoming data increases(iii)Wastage of network bandwidth in source-level deduplication, i.e., the redundancy check is performed only at the premises of the CSP. So, the redundant data and its attributes are transferred to the cloud server and that increases the communication overhead

1.2. Contributions

Our research work proposes a FogDedupe framework that executes source-level deduplication on the fog layer and post-progress deduplication on the cloud storage server. The contributions of the proposed FogDedupe frameworks are as follows: (i)FogDedupe framework implements both source-level and post-progress deduplication simultaneously to increase the performance of the cloud services(ii)The source-level deduplication is performed on the fog nodes which are placed near the cloud customers. It efficiently reduces bandwidth wastages(iii)To perform source-level deduplication, a distributed index table (DIT) is created based on the Bloom filter and master-slave protocol(iv)To perform post-progress deduplication on the cloud storage server, a multi-key homomorphic encryption method-based scheme is proposed. It efficiently overcomes the vulnerabilities of CFA attacks

The remaining section of the paper is structured as follows: Section 2 summarizes the related works on source level and post-progress deduplication. Section 3 explains the preliminaries about homomorphic encryption and additive homomorphic operations. Section 4 provides the proposed FogDedupe framework and multi-key homomorphic encryption. The proposed work is evaluated with a testbed environment and the results are presented in Section 5, and Section 6 concludes the paper.

For efficient storage utilization, many cloud service providers (CSP) such as IBM Cloud, Dropbox, Amazon Web Service, and Google Drive are using deduplication techniques in the cloud environment. This section explains the recent research works which are related to performing deduplication on cloud storage servers. There are two types of deduplication techniques as source-level and target-level deduplication.

2.1. Source Deduplication Technique

The source-level deduplication is mainly used to reduce the network traffic and bandwidth usage in large numbers. Only after performing a redundancy check using its hash values, the cloud users are allowed to transmit data blocks to cloud servers. Therefore, source deduplication has become very popular and unavoidable in cloud storage system management. To reduce network traffic, the popular cloud service providers (CSP) such as Wuala, Mozy, and Dropbox are using source-level deduplication. Some of the most popular source-level deduplication techniques are Veritas Symantec NetBackup and Amazon CommonVault.

Halevi et al. [12] have proposed source-level deduplication in the cloud storage system. Here, the data owner has to compute the hash value for each data block and sends it to the cloud server whenever the user wants to upload the data to the cloud storage. A hash table is maintained by the cloud server to store all the received data blocks and performs a redundancy check for the newly received data blocks. If there is no match found in the hash table, then the data blocks will be allowed to enter the cloud storage server. Else means that the data block is redundant. Here, the uses of hash values are as follows: (1) verify the redundancy of the data block by the cloud server and (2) act as a “proof of owner” (PoW) to the data owner. If the attacker intentionally or accidentally gains access to the hash value of the data blocks, then the attacker may claim ownership of the particular data. Internal adversarial attacks are possible as the cloud server maintains the hash value of all the user’s data blocks. The scalability feature is not used in a traditional hashing table method. The hash collision rate of the hash table will be increased when the user/data block increases, and it will provide erroneous (false positive) redundant results.

To overcome the hash-based proof of ownership security threat in Halevi’s source-level deduplication, Pietro et al. [13], have proposed an s-PoW (secure PoW) method which is based on a challenge-response scheme. Here, to prove the ownership of the data, the server challenges the cloud users, and the data owner responds with some particular bits of the requested file. This method fails to address the security threats related to internal adversary attacks and to support scalability. Blasco et al. [14] have introduced a POW verification scheme based on the Bloom filter called “bf-PoW.” It is more efficient than Halevi’s method and Pietro’s method. But this method is not assuring scalability in handling a very large volume of user data.

Zhong et al. [15] have implemented a convergent key-based proof of ownership in the cloud storage system which follows Douceur et al. [16]. To verify the ownership the cloud server uses a convergent key instead of using hash values as PoW, which is created from the hash values of the data block, a master key is used to encrypt the actual data blocks. In this method, two keys were used to protect the user data a convergent key (to verify PoW), and the other is the master key to encrypt the data blocks. This method used both convergent keys and master keys which are created by the cloud server. Here, internal adversarial attacks are possible.

Agarwala et al. [17] have implemented source-level deduplication for images using the DICE (dual integrity convergent key) protocol. In this method, message locked encryption is used to encrypt the images. This method implemented the DICE protocol on each data block instead of encrypting the image (message) as a single file; it is decomposed into several data blocks. Here, the common blocks between two or more images are stored only once at the cloud storage. Youn et al. [18] have introduced a variant of source-deduplication using CP-ABE (Cipher policy attribute–based encryption) [13] where authorized convergent encryption is formed from attribute-based encryption (ABE). It allows only authorized users to access data stored in the cloud. Both use a third-party authorization server to generate keys for the cloud users. Yoosuf et al. [19] proposed a dual auditing scheme and an inline deduplication scheme using Bloom filters.

2.2. Post-Progress Deduplication

Post-progress deduplication is introduced to reduce the workload on cloud users because the source-level deduplication gives an extra workload (hashing data blocks, communicating hash values, and ownership tags to cloud server) to the cloud users. Here, a cloud user is unaware of the deduplication process which is performed to attain maximum storage efficiency. The workload on the client-side in performing target-level deduplication is nil. Here, the storage efficiency is improved by target deduplication.

Bellari et al. [20, 21] have introduced the first target deduplication method called DupLESS architecture using the message locked encryption (MLE) technique. The private key is generated based on the data file (message) from the dedicated key-server which is received by the cloud users. A unique key is created by the MLE key generation algorithm for each message based on the content of the data. This key is used for data file encryption and mapping with a particular tag “T.” These tags are used for the file redundancy check, and the deduplication is performed on the storage server. The fixed and shorter keys are generated by the key server which avoids extra storage overhead. Bellari’s DupLESS architecture fails to address the solution for internal adversary attacks because the keys are generated by the key server (cloud key server or third party key server) from the content of data (message), which leads to the possibility of internal adversarial attacks. It also fails to support block-level deduplication and lacks security against brute force attacks.[22, 23].

Chen et al. [24] have modified the Bellari et al.’s [21] method and proposed a BL-MLE (block level-message locked encryption) to perform block-level deduplication for larger files in the cloud storage. This method addresses the issues which are block key management and proof of ownership in Bellari’s method. In the BL-MLE method, for any given input file, a master key, a single file tag, and a set of block-level keys are generated. These file tags and block tags are used to perform deduplication on the cloud storage system. Like the MLE method, BL-MLE also has a third-party key server, which creates a path to internal adversarial attacks.

Li et al. [7] have implemented a modified convergent key-based target deduplication, where the cloud user used a master key to encrypt the convergent key which is generated by the cloud server. The encrypted convergent keys are stored in the cloud storage. This modified technique uses a master-convergent key approach where an enormous number of keys are generated when the data blocks increase. The DeKey method is introduced to reduce the key size. It distributes the convergent key across multiple servers using the Ramp Secret Scheme (RSS) instead of the key managed by the user. It splits the secret key into “n” shares and distributes it to multiple servers, such that any “k” shares can recover the secret key. It is difficult to manage all the server’s keys. If a key of a data block is shared among “n” servers, then the complexity of handling and managing the key at those servers is increased. Communication between the servers on handling user keys is also increased, which leads to increased communication overhead [25–27]. The summary of the literature survey is presented in Table 1.

In all the previous deduplication methods, a dedicated key server is used to generate and manage keys for cloud users. Qi et al. [28] have implemented an encrypted deduplication scheme for multiple key servers. Liu et al. [10] have introduced the idea of target deduplication by performing an attribute-keyword search on the ciphertexts. The results are quite promising, but the computation overhead of searchable encryption is very high compared to the normal attribute-based encryption method. Here, outsourcing decryption is used to optimize the scheme. This searchable encryption-based deduplication scheme is implemented only for text documents.

2.2.1. Homomorphic Deduplication

Muguel et al. [29] have implemented the homomorphic operations on the encrypted ciphertext to identify the redundant data blocks in the cloud storage. This homomorphic-based deduplication is to overcome the convergent deduplication technique problems. This method deployed a dedicated key server at the premises of the cloud service provider called as HEDup (homomorphic encryption deduplication). The cloud user encrypts the data with the keys provided by the HEDup key server. Here, internal adversarial attacks are possible where the keys are generated by the key-server present on the CSP. It also has large storage and latency overhead in maintaining the ciphertext.

Liu et al. [30] have introduced searching on encrypted data. The traditional encryption method will not allow the user or CSP to perform any kind of operation on the ciphertext. But homomorphic encryption technique allows the CSP to search, add, and multiply (somewhat homomorphic encryption/partial homomorphic encryption) the ciphertext. It uses searchable homomorphic encryption with tags and matching keywords used to perform deduplication. Youn et al. [31] used a challenge-response protocol and a third-party auditor to ensure the security of the entire system. To perform challenge-response protocols, a homomorphic linear authenticator is created based on the BLS signature [17].

3. Preliminaries: Homomorphic Encryption

Homomorphic encryption (HE) is a technique, where computational operations are carried by cloud service providers on top of the ciphertext without modifying the data format or compromising the security of the user data. A function between two groups is homomorphic when . . Here, is a function, which takes the input from a group and performs an operation (addition and multiplication) to map with the other set.

Implementing homomorphic applications on cloud storage is a time-consuming process, but it ensures the security of the user data in the cloud environment. It also allows the cloud service provider to perform computations on the ciphertext. Rivest et al. (1976) and Rivest et al. (1978) implemented the first practical homomorphic encryption (RSA algorithm) in 1976 [32, 33]. But in the early 1980s, the computation power of the servers and systems was not capable of performing homomorphic encryption. The improvement in the computation power is achieved by homomorphic operations perform in cloud storage. In 2009, Gentry [34] has implemented fully homomorphic encryption on cloud storage based on ideal lattices. After the successful implementation of Craig Gentry’s (Stanford Ph.D. thesis 2009) work, homomorphic operations have become an important, futuristic technique in cloud computing. Some of the recent works on homomorphic encryptions are Cominetti et al. [35], Chou et al. [36], and Turan et al. [37].

3.1. Additive Homomorphic Encryption

An encryption scheme is called additive homomorphic encryption, if and only if, , where is encryption and is set of all possible messages. To develop a practical additive homomorphic encryption (PHE), additive or multiplicative functions are the only options to perform a homomorphic operation on top of the encrypted data because any Boolean circuit can be designed only through the XOR and NAND gate, where XOR performs the addition and NAND performs the multiplication. Examples of additive homomorphic encryptions are Pallier’s encryption [38] and Elgamal encryption [39] in which the plaintexts are encoded in the exponents.

4. Problem Statement

As discussed in the related works, the existing deduplication schemes have three major challenges, such as (i)Performing source-level deduplication on the cloud storage server results in increased network bandwidth wastage(ii)Due to the inability of scaling the index size, the false-positive errors are more in the source-level deduplication(iii)The convergent key-based deduplication models have a high probability of information leakage and are vulnerable to confirmation of file attacks (CFA)

To overcome these issues, the proposed FogDedupe framework implements source-level and post-progress deduplication simultaneously. To reduce bandwidth wastage, the source-level deduplication is performed on the fog nodes that are kept closer to the data owners. Also, a Bloom filter-based distributed index table (DIT) is created and managed in the fog layer to perform source-level deduplication. It uses the master-slave protocol to frequently update the index table. In addition, a multi-key homomorphic encryption method-based scheme is proposed to perform post-progress deduplication which efficiently overcomes the vulnerabilities against CFA attacks.

5. FogDedupe Framework

The proposed FogDedupe framework performs both source-level and post-progress deduplication. The entities that are involved in the proposed deduplication scheme are (i) data owners, (ii) fog layers and fog nodes, and (iii) cloud service providers (CSP). Figure 1 describes the overall design and the entities of the proposed FogDedupe deduplication framework.

Data owners (DO) are the one who creates the data and uploads it to the cloud storage. The data owners are accountable and eligible to decide who can access the information stored in the cloud within their functional limits. To retrieve the data faster and to perform source-level deduplication, the data owner hashes the data blocks with one-way hash functions and sends the message digest to the nearby fog nodes [40, 41]. Also, the data owner creates public and private keys and encrypts the data blocks using the homomorphic encryption technique. Later, it sends the cipher text to the cloud storage servers along with the non-redundant tag (NR_T) generated from source-level deduplication.

Fog layer is a cloud entity that acts as an intermediate layer between the DO and CSP. It consists of several fog nodes that are geographically dispersed and kept closer to the data owners. Also, a dynamically scalable distributed index table (DIT) is created and managed in the fog layer. Upon receiving a source-level deduplication request from the DO, the fog node verifies the index and creates tags. If the data block is unique, the fog node creates a nonredundant tag () and sends it to the DO. If it is the redundant block, the fog node prohibits uploading the data block to the cloud storage server.

Cloud service provider (CSP) is the one who provides computing (hardware and software) services to cloud users. CSP has an unlimited resource capacity to store and process the uploaded data. It performs three major tasks: (i)Verifies whether the ciphertext has authentic tags or not(ii)Stores the ciphertext in the cloud storage server(iii)Frequently performs target-level deduplication on the cloud storage servers [23]

5.1. Generating Partial Hash Values () for Data Blocks

Initially, the data owner fragments the large size of data into several data chunks each with the size of 1024 KB. Later, for each data chunk, the data owner generates the partial hash . The data owner hashes the data chunks using number of collision-resistant one-way hash functions (HF). As a result, each hash function generates a corresponding digest of size bits. In the existing inline deduplication schemes, the data owner must share the entire bit of message digests to the CSP to verify the non-redundancy in the index table. However, it increases the possibility of confirmation of file attacks. To overcome this issue, the proposed fog-centric inline deduplication scheme uses a partial hash value instead of the entire message digest. Algorithm 1 explains the process of generating partial hash values for the data chunks.

Input: Data chunks DC _{i =}1_{, 2…m} (each with the size of 1024KB)
Output: Partial hash values (pα)
Begin:
1. Data owner HASHES the data chunks (DC _{i=1, 2…m}) using ‘n’ number of hash
functions.
2. Data owners stores resultant message digests (hash values) of the data chunks
in their local storage.
3. Generate a partial hash value
Begin
a. Divide the hash values into ‘p’ partitions (i.e.) (L bits of message
digest / ‘p’ partitions).
b. To derive the partial hash values ‘pα’, choose odd partitions from
the message digest and take the even bits from each partition.
c. Combines the even bits from each odd partition and create a
partial hash value ‘pα’ for each data chunk.
d. Creates chunk id (C_id), file id (F_id) for partial hash values.
End
4. Transfer the chunk id (C_id), file id (F_id), and its corresponding partial hash
values ‘pα’ to the nearby fog nodes.

The data owner creates partial hash values for each data chunk and sends them to the nearby fog node. Sending redundant ciphertext directly to the cloud storage increases the communication overhead as well as the storage overhead. To reduce the wastage of bandwidth the proposed framework directs the data owners to send the computed partial hash values () to the nearby fog node. Using the distributed index table, the fog node verifies the incoming data chunk and generates a tag for each data chunk. If a data block is non-redundant, then the fog node creates a non-redundant tag () and sends it to the corresponding data owner. If the data block is redundant, a redundant tag () is sent to the data owner. Upon uploading the ciphertext of the data block, the tags have to be sent along with it. Algorithm 2 explains the source-level deduplication on the fog layer.

Input: Partial hash values (pα)
Output: Redundancy verification results.
Begin
1. Fog node receives the chunk id (C_id), file id (F_id), and hash values (pα)
from the data owners.
2. Fog node identifies the location of each data chunk and stored it in the
DIT.
3. Redundancy verification (incoming data chunks)
If (all corresponding bit position of pα ==1)
a. Data chunk ← Duplicate data chunk
b. Create a tag to represent that the corresponding ciphertext of the
data chunk is duplicate and the fog node prohibits the data owner
from uploading redundant data to the cloud servers.
c. Sends tag to the data owner.
Else (bit position (SIT) ==0 or bit position (SIT) == at least one 0)
a. Data chunk ← Non-duplicated data chunk
b. Calculate the percentage of non-zero hash bits in the index table.
c. Creates tag for the non-duplicated data chunk
d. Sends the non-duplicated tag to the data owner.
e. Sends the percentage of non-zero hash bits to the cloud admin.
End

After receiving the partial hash values, the fog node calculates its corresponding hash bits and stores it in the distributed index table (DIT). The values in the DIT will either be 0 or 1. Here, 0 represents that no data chunks have been accommodated in the particular location, and 1 represents that the location is already accommodated.

Consider that the data owner uses three hash functions and creates three partial hash values for a data chunk. If all the three corresponding bits of partial hash values () is 1, then it is determined as a replicated data chunk. If any two of three (2/3) corresponding hash bits in the index table are 1 and one of three (1/3) hash bits are 0, then also it will be considered as a non-duplicated data chunk. Though it is considered a non-duplicated data chunk, there is a possibility for redundancy. To monitor these high-risk data chunks, the fog node calculates the percentage of corresponding non-zero hash bits in the index table and sends them to the cloud administrator very often.

5.2. Distributed Index Table

Managing a standard bloom filter in the fog nodes to perform source-level deduplication results in increased false-positive errors as it follows a one-dimensional data structure. Also, the standard bloom filters do not support the scalability feature. However, the velocity of incoming data is very high in cloud services. So, managing a standard bloom filter in the fog nodes might result in a bottleneck situation.

To reduce the false-positive errors in identifying duplicate data chunks in the fog nodes, the proposed scheme creates a two-dimensional scalable index table that is distributed among all the fog nodes in the fog layer. When a request is received from the data owner to perform source-level deduplication, the fog node immediately accesses the distributed index table and sends the response as a tag to the DOs. Determining the initial size of the scalable index table in the fog node, the following formula is used:

The initial size of the index table is determined based on the data chunks received at a particular period. Later, based on the velocity of incoming data chunks, the size of the index table is increased.

5.3. Scalability of the Index Table

The proposed distributed index table (DIT) is capable of scaling its size from to when the velocity of the data increases. The fog nodes continuously monitor the remaining available free slots in the index, i.e., the number of 0 s in the index table is continuously monitored. If the non-accommodated slots in the index table go below a certain limit, then a large index table is generated on the same fog node. If the velocity of incoming data is less, then the newly generated index table is two times larger than the old index table and four times larger if the velocity is high. The false-positive rate of the newly created index table is always lesser than the old index table as it has a larger size. Algorithm 3 discusses the scalability of distributed index table in the fog nodes.

Input: Growth rate, false-positive rate, and the number of hash functions used.
Output: Creating a new index table with increased size.
Begin
1. Find fill ratio of existing DIT,
2. If (fill ratio > =50)
Then
If (growth rate == High)
Then
Create a new index table with a size of 200% of mn
Insert stream bits to the new index table
Else
Create a new index table with a size of 100% of mn
Insert stream bits to the index table
Else
Return null and do not create a new index table
End

5.4. Updating Distributed Index Table Using Master-Slave Protocol

To perform source-level deduplication, a distributed index table is introduced in the FogDedupe architecture. Yoosuf et al. [42] suggested the source-level deduplication in the fog node. However, the index tables were managed in the fog nodes themselves, and it has a high risk of unavailability. To overcome this issue, the proposed FogDedupe framework introduces a distributed index table (DIT), where the same copy of the index table is present in all the fog nodes of the cluster. Using distributed index table (DIT) in the fog layers efficiently overcomes the unavailability issue. Figure 2 depicts the workflow of the proposed distributed index table, and Algorithm 4 explains the process of dynamically updating the records in the DIT.

Input: Computation workload of each fog node
Output: Synchronized Distributed Index Table
Begin
1. Compute the workload of each fog node.
2. Pick the fog node which has the least computation overload in the last few hours
and make it a master fog node (MF).
3. The remaining nodes in the cluster are considered as slave nodes (SF)
3. MF node requests the SF nodes to send the current index table along with the time
stamp (T_s).
4. MF node consolidates the values received from the SF nodes and updates the
Global Distributed Index Table present in it.
5. After updating the Global DIT, MF sends an update alert (U_A) message to all the
SF nodes present in the cluster.
6. The SF downloads the recent update in the Global DIT and synchronizes with the
local DIT.
End

5.5. Multi-Key Homomorphic Encryption–Based Target Deduplication

After receiving the non-redundant tag () from fog nodes, the DO encrypts the data chunks using the additive homomorphic encryption method. Existing homomorphic encryption methods can perform computational operations on top of the ciphertext only if the data blocks are executed using the same public and private keys. It opens a path to information leakage and makes the ciphertext vulnerable to confirmation of file attacks (CFA). To overcome this issue, the proposed deduplication scheme allows the data owners to encrypt the data blocks using different ciphertexts.

In the key generation model, the data owner creates two vector keys ( and) in which is used as a private key and is used as an offset value to perform an additive homomorphic operation by the cloud service provider. After creating the keys, the data owner uses a one-time pad encryption method to encrypt the data blocks. To perform an additive operation on the ciphertext “modulo n” operation is used. However, the computation overhead of executing homomorphic encryption-based deduplication remains high in the cloud environment. Continuously performing target deduplication in cloud storage servers is impractical and may lead to an extensive workload for the cloud service provider. So, in the proposed target deduplication method, the CSP calculates the high-risk data blocks, i.e., the data chunks that have a high possibility to be redundant are calculated, and only for those data chunks, the target deduplication is performed. The high-risk data blocks are identified from the information received from the fog nodes. Algorithm 5 explains the proposed multi-key homomorphic encryption algorithm to encrypt the non-redundant data block.

Input: Input data blocks each with the size of 1024KB; Non-Redundant Tag (NR_T)
Output: Detecting the duplicate cipher chunks in the cloud storage.
Data Owner
Begin
1. The data owner receives the Non-Redundant Tag (NR_T) from the nearby fog node.
2. Data owners generate keys to encrypt the data chunk
a. Data owner generates the value ‘n’ using Pseudo-Random Number
Generator (PRNG), Here ‘n’ defines the message space and keyspace Z_n.
b. Data owner generates two random vectors ( and). Here, (a₀, a₁ …
a_x-1) is used to represent the input message (m) and (b₀, b₁ … b_x-1) is
used to represent the homomorphic output vector.
c. Data owners keep for decryption (i.e. Private Key) and sends to the
Cloud Service Provider to perform additive homomorphic encryption on
the ciphertext.
3. Encrypts the data chunks using the private keys ()
a. Generate a random number r ∈ Z_n.
b. To encrypt a single data chunk ‘m’, Enc (m, r) = (m + r) mod n is used
whereas to encrypt an array of input data chunks, Ciphertext () = Enc
() is used.
c. Ciphertext () is transferred to the server along with homomorphic vector
(b₀, b₁ … b_x-1).
End
Cloud Service Provider
Begin
1. CSP collects the index table information from the fog nodes.
2. Receives the information about the percentage of non-zero bit in the index table
for each data chunk from the fog nodes.
3. Receives the ciphertext from the data owner along with vector (b₀, b₁ … b_x-1) and stores it as, (cipher text, offset) pair (i.e.) (c_i, b_i).
4. From the information gathered from Step. 2 (CSP side), Cloud Service Provider
identifies the high risk (Possibility of redundancy) data chunks.
5. Additive Homomorphic operations are performed on high risk data chunks.
a. Consider any two key value pair (t₁ and t₂) with high risk where t₁ = (c₁,
o₁) and t_{2 =} (c₂, o₂), then the redundancy check can be performed using
additive homomorphic encryption by calculating, t₁ + t₁ = (c₁ + c₂; o₁ +
o₂) == t₂ + t₂ = (c₁ + c₂; o₁ + o₂) (this could be performed ‘n’ time for
more accurate results)
6. If the values are the same, then the data chunks are decided to be redundant data
chunks. So, the duplicate data block is removed.
End

The proposed deduplication scheme uses an additive homomorphic encryption-based algorithm to perform deduplication. It effectively identifies the redundant block in the cloud storage servers by comparing the corresponding ciphertext. However, executing an additive homomorphic operation on the ciphertext stored in the cloud puts an extra workload on the CSP.

6. Performance Evaluation

The prime objectives of the proposed FogDedupe frameworks are (1) reducing false-positive errors in the index table and (2) improving the security against confirmation of fiae Attacks (CFA).

6.1. False-Positive Error Rate

The false-positive error in the proposed fog-centric inline deduplication refers to a situation in which an element points to a particular location as that of another element in the index table. The probability of false-positive error in the proposed index table is

Here, represents the collision in the index table and is the factor of collision. The false-positive rate of the fog-centric inline deduplication highly depends on the number of collisions that happens in the index table. Since the proposed index table is capable of scaling its size, the probability of hash collision is always lesser than the standard Bloom filter-based index tables. Table 2 depicts the false-positive error rate in the distributed hash table.

The probability of false-positive errors ranges between 0.002 and 0.004. It is very low compared to the standard bloom filter. Table 3 shows the probability of the false-positive errors after scaling the size of the index table.

The probability of false-positive error after scaling the size of the index table is always lesser than the initial size of the index table. So, the proposed fog-centric inline deduplication can effectively be performed even if the velocity of the data increases.

6.2. Security Analysis of the Proposed Scheme

The existing deduplication algorithm uses the entire message digest of the data block to generate the encryption keys. It makes the existing schemes vulnerable to the confirmation of file attacks (CFA). To overcome this issue, instead of using the message digest of the data blocks to generate keys, the proposed scheme derives the partial hash values, i.e., bits of message digest/ partitions from the message digest are sent to the fog nodes. Sharing the partial hash values to the fog node allows them to perform source deduplication using the scalable index table and tightens the security. Even if the partial hash values get leaked from the fog node, no intruder can match the remaining hash bits. Let us consider a data block of size 1024 KB is hashed and produces 256 hash bits. Here, instead of sending the entire 256 bits to the fog node, the proposed deduplication method derives partial hash values () from 256 bits, i.e., of bits are shared to the fog nodes. From these partial hash bits, it is impossible to perform confirmation of file attacks. Moreover, the proposed system uses hash functions to derive the partial hash values. So it makes it more secured against hash collision attacks.

On the other hand, the proposed multi-key additive homomorphic encryption allows the CSP to execute computational functions on top of the ciphertext stored in the cloud storage servers. Though it increases the computational overhead, it also tightens the security of the data in the aspect of both internal and external attacks.

7. Implementation and Result Discussion

To assess the performance of the proposed FogDedupe framework, an open source Eucalyptus software is installed on an Intel Xeon E5 2620 server that has a processing speed of 2.1GHZ with 64GB RAM. Eucalyptus-based private cloud setup consists of cloud controller (CLC), cluster controller (CC), and walrus (W). The cloud controller is responsible for performing administrative operations of the CSP. The cluster controller controls the cluster nodes connected to the main cloud server. Two personal computers with Intel i5 -7^th gen processor and 8GB RAM are used to create the cluster nodes. The walrus represents the storage servers of the Eucalyptus private cloud. A total of 4 TB storage space with RAID 5 configuration is used as a storage server. The Eucalyptus open-source software is compatible with Amazon AWS and well-suitable for evaluating fog-based source-level deduplication. Also, the fog nodes are created between cloud storage servers and the DOs by installing FOG project v1.5.9 Intel i5 -7^th gen processor with 8GB RAM.

The data chunking process and the generation of partial hash values for the data chunks are carried out by the data owner. Operations like data chunking, key generation, encryption, and creation of partial hash values were written in the python programming language. An open-source mhealth (mobile health) dataset from UCI repository is used to assess the proposed fog-centric deduplication scheme. It comprises 172,824 IoT healthcare sensor values.

The prime objective of the proposed work is to reduce the communication overheads and improve storage efficiency. To assess the proposed scheme, the communication overhead and computation overhead on the fog nodes to perform inline deduplication and the computation overhead on the CSP to execute additive homomorphic operations on the ciphertext stored in the cloud are measured. Also, the redundancy elimination ratio of the proposed scheme is compared with BL-MLE, DupLESS, Youn et al. [18] deduplication schemes.

7.1. Communication Overhead

Two different scenarios are considered to measure the communication overhead for performing inline deduplication. (a) The data owner uploads the ciphertexts of the data chunks directly to the cloud, i.e., no fog layers are formed to perform inline deduplication. (b) The data owner verifies the redundancy of the data chunks in fog nodes first and then sends the ciphertext to the cloud.

Introducing a fog layer between DOs, the cloud effectively reduces the communication delay to a maximum of 60%, because the proposed FogDedupe source-level deduplication framework prohibits DOs to upload redundant ciphertext to the cloud storage servers directly.

The fog nodes are usually kept near to the DOs to quickly perform source-level deduplication, and the ciphertext of the DOs data is stored in the Amazon AWS-walrus storage. Table 4 shows the communication time required to access the fog node and the walrus storage. Results show that introducing fog nodes in the cloud services reduces the communication overhead by a maximum of 60%.

The communication overhead of the data owner transferring the to the fog nodes and uploading the ciphertext to the cloud storage is measured and shown in Figure 3.

7.2. Computation Overhead on the Fog Nodes

Fog nodes receive the hash value from the data owners and compute its corresponding locations in the distributed index table (DIT). In the proposed deduplication scheme, fog nodes do not generate any keys on behalf of the data owner; rather, it simply calculates the bit position from the partial hash values and changes the values from 0 to 1. However, the existing deduplication schemes use a separate key server on the third parties to create convergent keys to encrypt the chunks.

The computation overhead of the proposed deduplication scheme in the fog node is much lesser than the existing methods. Figure 4 compares the computation time required in the fog nodes to execute the proposed inline deduplication with existing key server deduplication schemes.

To evaluate the proposed scheme, 25,000 IoT healthcare IoT sensor values are randomly chosen. The data owner hashes each sensor value and creates . Later, they are communicated to the nearby fog node. The is stored in the DIT, and the fog node dynamically performs source-level deduplication. The deduplication ratio of the proposed source-level deduplication is calculated using the following formula:

To assess the performance of the proposed scheme, two different scenarios are considered. (i) The randomly selected 25,000 data chunks from the mhealth dataset are encrypted, and the ciphertext is uploaded to the walrus directly by the DO, i.e., no deduplication algorithm is executed. (ii) The randomly selected 25,000 sensor values are uploaded to the cloud storage after performing fog-centric source level deduplication.

The storage utilization of the cloud storage server is calculated for both scenarios. After implementing the fog-centric source-level deduplication, the storage efficiency is increased up to 30%. Figure 5 shows the improvements in the storage efficiency after implementing the deduplication scheme.

The redundancy elimination ratio is calculated using the following formula:

Figure 6, compares the ratio of duplicate data that are removed by BL-MLE, DupLESS, Zhen et al. 2017, Youn et al. 2019, and the proposed fog-centric inline deduplication. The redundancy elimination ratio of the proposed deduplication scheme ranges from 0.84 to 0.93. It reduces the storage overhead of the cloud by approximately 30% by maintaining the redundancy elimination ratio between 0.85 and 0.94.

8. Conclusion and Future Works

To reduce the network bandwidth wastage and to overcome the false-positive errors in the source-level deduplication, a FogDedupe framework is proposed. It performs both source-level deduplication and post-progress deduplication to improve the efficiency of the cloud service and storage servers. To perform source-level deduplication a fog layer, consisting of “n” number of fog nodes is introduced. Also, a distributed index table is created and managed on the fog layer. The fog nodes present on the same cluster use master-slave protocol to update the index values. The scalability feature in the proposed distributed index table effectively reduces the false positive errors in the source-level deduplication. Likewise, to perform target-level deduplication, a multi-key additive homomorphic method is introduced. Though the post-progress deduplication takes a slightly larger time to identify the duplicate blocks, it overcomes the security threats raised by the CFA attacks effectively. In the future, instead of using additive homomorphic operations, fully homomorphic operations are planned to be implemented on the cloud storage server to perform post-progress deduplication.

Data Availability

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

B. Alouffi, M. Hasnain, A. Alharbi, W. Alosaimi, H. Alyami, and M. Ayaz, “A systematic literature review on cloud computing security: threats and mitigation strategies,” IEEE Access, vol. 9, pp. 57792–57807, 2021.
View at: Publisher Site | Google Scholar
IDC Data Age, “Whitepaper,” 2025, Available from, Error! Hyperlink reference not valid [Accessed on July 2022].
View at: Google Scholar
“Newspaper Article,” Available from, https://waterfordtechnologies.com/banish-your-redundant-data-like-st-patrick-banished-snakes-out-of-ireland/ , [Accessed on July 2022].
View at: Google Scholar
J. Hur, D. Koo, Y. Shin, and K. Kang, “Secure data deduplication with dynamic ownership management in cloud storage,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 11, pp. 3113–3125, 2016.
View at: Publisher Site | Google Scholar
R. Patgiri, D. Dev, and A. Ahmed, “dMDS: uncover the hidden issues of metadata server design,” Progress in Intelligent Computing Techniques: Theory, Practice, and Applications, Springer, Singapore, pp. 531–541, 2018.
View at: Publisher Site | Google Scholar
N. Chhabraa and M. Balab, “An optimized data duplication strategy for cloud computing: Dedup with ABE and bloom filters,” International Journal of Future Generation Communication and Networking, vol. 13, no. 1, pp. 824–834, 2020.
View at: Google Scholar
J. Li, X. Chen, M. Li, J. Li, P. P. C. Lee, and W. Lou, “Secure deduplication with efficient and reliable convergent key management,” IEEE transactions on parallel and distributed systems, vol. 25, no. 6, pp. 1615–1625, 2013.
View at: Google Scholar
Y. Zhou, D. Feng, W. Xia et al., Eds.“SecDep: a user-aware efficient fine-grained secure deduplication scheme with multi-level key management,” in 2015 31st Symposium on Mass Storage Systems and Technologies (MSST), Y. Zhou, D. Feng, W. Xia et al., Eds., pp. 1–14, Santa Clara, CA, USA, 2015.
View at: Google Scholar
M. Liu, C. Yang, Q. Jiang, X. Chen, J. Ma, and J. Ren, Eds.“Updatable block-level deduplication with dynamic ownership management on encrypted data,” in 2018 IEEE International Conference on Communications (ICC), M. Liu, C. Yang, Q. Jiang, X. Chen, J. Ma, and J. Ren, Eds., pp. 1–7, Kansas City, MO, USA, 2018.
View at: Google Scholar
X. Liu, L. Tingting, X. He, X. Yang, and S. Niu, “Verifiable attribute-based keyword search over encrypted cloud data supporting data deduplication,” IEEE Access, vol. 8, pp. 52062–52074, 2020.
View at: Publisher Site | Google Scholar
W. Shen, S. Ye, and R. Hao, “Lightweight cloud storage auditing with deduplication supporting strong privacy protection,” IEEE Access, vol. 8, pp. 44359–44372, 2020.
View at: Publisher Site | Google Scholar
S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg, “Proofs of ownership in remote storage systems,” in In Proceedings of the 18th ACM conference on Computer and communications security, pp. 491–500, Chicago, IL, USA, 2011.
View at: Google Scholar
R. Di Pietro and A. Sorniotti, “Boosting efficiency and security in proof of ownership for deduplication,” in In Proceedings of the 7th ACM symposium on information, computer and communications security, pp. 81-82, Seoul, Korea, 2012.
View at: Publisher Site | Google Scholar
J. Blasco, R. Di Pietro, A. Orfila, and A. Sorniotti, “A tunable proof of ownership scheme for deduplication using bloom filters,” in 2014 IEEE Conference on Communications and Network Security, pp. 481–489, San Francisco, CA, USA, 2014.
View at: Publisher Site | Google Scholar
W. Zhong and Z. Liu, “Proof of cipher text ownership based on convergence encryption,” in In AIP Conference Proceedings, vol. 1864, no. 1, Melville, NY 1147-4300, USA, 2017.
View at: Publisher Site | Google Scholar
J. R. Douceur, A. Adya, W. J. Bolosky, P. Simon, and M. Theimer, “Reclaiming space from duplicate files in a serverless distributed file system,” in Proceedings 22nd International Conference on Distributed Computing Systems, pp. 617–624, Vienna, Austria, 2002.
View at: Publisher Site | Google Scholar
A. Agarwala, P. Singh, and P. K. Atrey, “Client side secure image deduplication using DICE protocol,” in 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 412–417, Miami, FL, USA, 2018.
View at: Google Scholar
T.-Y. Youn, N.-S. Jho, K. H. Rhee, and S. U. Shin, “Authorized client-side deduplication using CP-ABE in cloud storage,” Wireless Communications and Mobile Computing, vol. 2019, Article ID 7840917, 11 pages, 2019.
View at: Publisher Site | Google Scholar
M. S. Yoosuf and R. Anitha, “LDuAP: lightweight dual auditing protocol to verify data integrity in cloud storage servers,” Computing, vol. 13, no. 8, pp. 3787–3805, 2022.
View at: Publisher Site | Google Scholar
M. Bellare, S. Keelveedhi, and T. Ristenpart, “Message-locked encryption and secure deduplication,” in Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 296–312, Berlin, Heidelberg, 2013.
View at: Publisher Site | Google Scholar
S. Keelveedhi, M. Bellare, and T. Ristenpart, “DupLESS: Server-Aided Encryption for Deduplicated Storage,” in In 22nd USENIX security symposium (USENIX security 13), pp. 179–194, Washigton DC, 2013.
View at: Google Scholar
K. K. Chennam, R. Aluvalu, and S. Shitharth, “An authentication model with high security for cloud database,” Architectural Wireless Networks Solutions and Security Issues, Springer, Singapore, pp. 13–25, 2021.
View at: Publisher Site | Google Scholar
B. T. Devi, S. Shitharth, and M. A. Jabbar, “An appraisal over intrusion detection systems in cloud computing security attacks,” in 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 722–727, Bangalore, India, 2020.
View at: Publisher Site | Google Scholar
R. Chen, M. Yi, G. Yang, and F. Guo, “BL-MLE: block-level message-locked encryption for secure large file deduplication,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 12, pp. 2643–2652, 2015.
View at: Publisher Site | Google Scholar
T. R. Gadekallu, Q. V. Pham, D. C. Nguyen et al., “Blockchain for edge of things: applications, opportunities, and challenges,” IEEE Internet of Things Journal, vol. 9, no. 2, pp. 964–988, 2021.
View at: Google Scholar
B. Prabadevi, N. Deepa, Q.-V. Pham et al., “Toward blockchain for edge-of-things: a new paradigm, opportunities, and future directions,” IEEE Internet of Things Magazine, vol. 4, no. 2, pp. 102–108, 2021.
View at: Publisher Site | Google Scholar
M. Liyanage, Q. V. Pham, K. Dev et al., “A survey on Zero touch network and Service Management (ZSM) for 5G and beyond networks,” Journal of Network and Computer Applications, vol. 203, article 103362, 2022.
View at: Publisher Site | Google Scholar
H. Qi, Y. Han, X. Di, and F. Sun, “Secure data deduplication scheme based on distributed random key in integrated networks,” in In 2017 3rd IEEE International Conference on Computer and Communications (ICCC), pp. 1308–1312, Chengdu, China, 2017.
View at: Google Scholar
R. Miguel and K. M. M. Aung, “Hedup: secure deduplication with homomorphic encryption,” in 2015 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 215–223, Boston, MA, USA, 2015.
View at: Publisher Site | Google Scholar
L. Zhenhua, K. Yaqian, L. Chen, and F. Yaqing, “Hybrid cloud approach for block-level deduplication and searchable encryption in large universe,” The Journal of China Universities of Posts and Telecommunications, vol. 24, no. 5, pp. 23–34, 2017.
View at: Publisher Site | Google Scholar
T.-Y. Youn, K.-Y. Chang, K.-H. Rhee, and S. U. Shin, “Efficient client-side deduplication of encrypted data with public auditing in cloud storage,” Access, vol. 6, pp. 26578–26587, 2018.
View at: Publisher Site | Google Scholar
R. L. Rivest, L. Adleman, and M. L. Dertouzos, “On data banks and privacy homomorphisms,” Foundations of secure computation, vol. 4, no. 11, pp. 169–180, 1978.
View at: Google Scholar
R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key cryptosystems,” Communications of the ACM, vol. 21, no. 2, pp. 120–126, 1978.
View at: Publisher Site | Google Scholar
C. Gentry, A Fully Homomorphic Encryption Scheme[Ph.D. thesis], Stanford University, California, USA, 2009.
E. L. Cominetti and M. A. Simplicio, “Fast additive partially homomorphic encryption from the approximate common divisor problem,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2988–2998, 2020.
View at: Publisher Site | Google Scholar
E. J. Chou, A. Gururajan, K. Laine, N. K. Goel, A. Bertiger, and J. W. Stokes, “Privacy-preserving phishing web page classification via fully homomorphic encryption,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2792–2796, Barcelona, Spain, 2020.
View at: Publisher Site | Google Scholar
F. Turan, S. S. Roy, and I. Verbauwhede, “HEAWS: an accelerator for homomorphic encryption on the Amazon AWS FPGA,” IEEE Transactions on Computers, vol. 69, no. 8, pp. 1–1196, 2020.
View at: Publisher Site | Google Scholar
P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” in In International conference on the theory and applications of cryptographic techniques, pp. 223–238, Berlin, Heidelberg, 1999.
View at: Publisher Site | Google Scholar
T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469–472, 1985.
View at: Publisher Site | Google Scholar
B. C. Gajarla, A. V. Rebba, K. S. Kakathota, M. Kummari, and S. Shitharth, “Handling tactful data in cloud using PKG encryption technique,” in 4th Smart Cities Symposium (SCS 2021), pp. 338–343, Barcelona, Spain, 2021.
View at: Google Scholar
R. Aluvalu, V. Uma Maheswari, K. K. Chennam, and S. Shitharth, “Data security in cloud computing using Abe-based access control,” Architectural Wireless Networks Solutions and Security Issues, Springer, Singapore, pp. 47–61, 2021.
View at: Publisher Site | Google Scholar
M. S. Yoosuf and R. Anitha, “Low latency fog-centric deduplication approach to reduce IoT healthcare data redundancy,” Wireless Personal Communications, vol. 122, pp. 1–23, 2022.
View at: Publisher Site | Google Scholar

Copyright

Copyright © 2022 Mohamed Sirajudeen Yoosuf et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

PDF Download Citation

Download other formats

Order printed copies

Journal of Sensors

Advanced Machine Learning and Big Data Analytics with IoT Sensor Data

FogDedupe: A Fog-Centric Deduplication Approach Using Multi-Key Homomorphic Encryption Technique

Abstract

1. Introduction

1.1. Drawbacks of Existing Deduplication Schemes

1.2. Contributions

2. Related Works

2.1. Source Deduplication Technique

2.2. Post-Progress Deduplication

2.2.1. Homomorphic Deduplication

3. Preliminaries: Homomorphic Encryption

3.1. Additive Homomorphic Encryption

4. Problem Statement

5. FogDedupe Framework

5.1. Generating Partial Hash Values () for Data Blocks

5.2. Distributed Index Table

5.3. Scalability of the Index Table

5.4. Updating Distributed Index Table Using Master-Slave Protocol

5.5. Multi-Key Homomorphic Encryption–Based Target Deduplication

6. Performance Evaluation

6.1. False-Positive Error Rate

6.2. Security Analysis of the Proposed Scheme

7. Implementation and Result Discussion

7.1. Communication Overhead

7.2. Computation Overhead on the Fog Nodes

8. Conclusion and Future Works

Data Availability

Conflicts of Interest

References

Copyright