Privacy-Preserving KNN Classification Algorithm for Smart Grid

Song, Zhuhuan; Ren, Yanli; He, Gang

doi:https://doi.org/10.1155/2022/7333175

Security and Communication Networks

Special Issue

Deep Learning Security for Emerging 5G and Internet of Things Systems

Research Article | Open Access

Volume 2022 | Article ID 7333175 | https://doi.org/10.1155/2022/7333175

Zhuhuan Song,¹ Yanli Ren,¹,² and Gang He¹

Academic Editor: AnMin Fu

Received 30 Mar 2022

Accepted 26 Apr 2022

Published 11 May 2022

Abstract

With the development of the Internet of Things (IOT), outsourcing data and tasks to a cloud server has become a popular and economical option for small devices with limited capabilities. The k-nearest neighbors (KNN) classification algorithm has been commonly applied to medical image classification, anomaly detection, defective product identification, and so on. Previous privacy-preserving KNN algorithms are based on two cloud servers and have high computational and communication costs. In this paper, we design a privacy-preserving KNN classification (PPKC) algorithm with a single cloud server for smart grid. Specifically, each smart meter and control center encrypts its data with the Paillier cryptosystem, and the cloud server computes on the encrypted data. We prove that PPKC protects the privacy of both smart meters and control centers and that the classification results are also hidden from the server. Besides, both smart meters and control centers can stay offline after uploading their data. The experimental results demonstrate that the proposed PPKC algorithm is more efficient than the previous algorithms and obtains almost the same accuracy as the original KNN, which means the PPKC algorithm is better suited to small devices in IOT.

1. Introduction

In recent years, Internet of Things (IOT) has been widely used in medical devices, industrial detection, intelligent transportation, and so on. At the same time, more and more applications of machine learning have been applied to IOT, which can greatly facilitate our work and life [1].

As we know, a large amount of data is generated in IOT. If massive data is given to small terminals for storage or calculation, it increases the burden on these devices and reduces their efficiency [2]. In the worst case, this may even lead to calculation errors and the loss of stored data. Fortunately, a cloud server can be used to store and process the data. Generally speaking, the cloud server has huge computing resources and storage capacity [3]. Devices or users with insufficient computing power only need to spend a small amount of money to enjoy the benefits of the cloud platform, including powerful computing ability and massive storage resources. For example, in the medical industry, a cloud platform can help doctors assess a patient's condition quickly and thus save lives [4].

Training machine learning or deep learning models on a cloud platform can also bring great convenience to users with limited computing power [5]. However, users cannot directly outsource private data to the cloud server since the server is not trusted. Usually, the devices or users encrypt sensitive information before sending it to the cloud server. Unfortunately, computing on encrypted data may not yield the correct answer. In addition, for small devices, many encryption algorithms consume a lot of computing resources and time. Therefore, an efficient privacy-preserving algorithm for outsourced data is urgently needed.

At present, there are many machine learning algorithms, such as decision tree [6], k-means [7], support vector machine (SVM) [8], and so on. By using these algorithms, we can efficiently carry out classification tasks. The k-nearest neighbor (KNN) [9] classification algorithm is one of the most commonly used machine learning algorithms and has been widely applied to IOT, including smart grid and smart home. Typically, the KNN classification algorithm obtains high accuracy in medical image classification, anomaly detection, defective product identification, and so on. For example, a lot of electricity consumption data is generated in a smart grid, where the control centers collect the electricity consumption in a specific area and adjust the power generation to keep supply and demand in balance. However, KNN classification has a high computational complexity for big data, and resource-constrained terminals may need a long time to complete the algorithm. Outsourcing these tasks to the cloud server is a solution to this problem. In the past few years, many privacy-preserving KNN classification algorithms have been presented. However, they either have security risks or rely on several noncolluding cloud servers [10, 11], which is very difficult to realize. Therefore, we aim to propose an efficient and secure privacy-preserving KNN classification algorithm for outsourced data with only one cloud server.

1.1. Related Work

With the rapid development of machine learning, people pay more attention to its security risks. Rahulamathavan et al. [12] designed a secure SVM classification algorithm, which uses Paillier homomorphic encryption to protect the data. However, this scheme cannot protect the output result from the cloud server. Yuan and Tian [13] put forward a secure K-means clustering algorithm using MapReduce, which encrypts the data based on the learning with errors (LWE) problem, and the cloud server iterates on the encrypted data. However, the client must interact with the cloud server several times, which increases the communication costs of the client. Liu et al. [14] proposed a secure decision tree training and evaluation algorithm, which realizes data privacy protection by using public key encryption with distributed trapdoors and additive secret sharing. This protocol protects the secrets of users well, but it can only be used with the ID3 algorithm and is not applicable to other decision tree algorithms.

Many privacy-preserving KNN classification protocols have been studied in the past years. Samanthula et al. [10] presented the first privacy-preserving KNN classification protocol, which used the Paillier cryptosystem and two cloud servers to build secure building blocks, including secure Euclidean distance, secure multiplication, secure minimum value, and secure frequency, to hide the private data. However, the computational complexity is very high, and the protocol is infeasible for resource-limited users on large datasets. Rong et al. [15] presented a cooperative KNN protocol with the ElGamal cryptosystem [16]. However, the protocol has high computational complexity and uses two cloud servers. For multiple data owners, Li et al. [17] proposed a secure KNN classification protocol. However, this protocol cannot protect the outputs from the servers. Cheng et al. [18] put forward a secure KNN query protocol with multiple keys. Unfortunately, the protocol is based on multiple servers and cannot realize the privacy of data. Yang et al. [19] designed an efficient KNN scheme by using vector homomorphic encryption (VHE) [20], which achieved high efficiency. However, VHE was shown in [21] to have security risks. After that, Yang et al. [22] designed a secure VHE and applied it to KNN classification, but the query data cannot be protected from the server. Liu et al. [11] presented an efficient KNN classification protocol, which greatly improves the computational efficiency by using additive secret sharing. However, this protocol uses two noncolluding servers and needs multiple interactions between the servers.

In summary, there is no privacy-preserving KNN algorithm that can simultaneously realize the privacy of data, query, and classification results based on only one cloud server.

1.2. Our Contributions

In this paper, we propose a privacy-preserving KNN classification (PPKC) algorithm with only one cloud server for smart grid, which is more secure and efficient than the previous algorithms [10, 11]. Specifically, the contributions of this paper are summarized as follows.
(1) Based on the Paillier cryptosystem, we design a privacy-preserving KNN classification (PPKC) algorithm that uses only one cloud server. In [10, 11], the outsourcing algorithms of KNN are all based on two noncolluding servers, which requires multiple interactions between the servers and is hard to guarantee in practical applications.
(2) The proposed PPKC not only realizes the privacy of both control centers and smart meters but also protects the final outputs from the server. After the smart meters upload encrypted data to the cloud server, they can stay offline. Similarly, the control centers upload encrypted query data and then stay offline until the cloud server returns the encrypted results. Therefore, the proposed algorithm not only protects the sensitive information but also reduces the computational and communication costs of smart meters and control centers as much as possible.
(3) Experiments are conducted to analyze the performance of PPKC on four relevant datasets. The PPKC algorithm is the most efficient for the smart meters compared with the previous work. Besides, the accuracy of the proposed PPKC is nearly the same as that of the original KNN, which means PPKC can help control centers find the abnormal power consumption of smart meters without leaking privacy.

The rest of our paper is organized as follows. We briefly introduce some preliminaries in Section 2. Section 3 formalizes the concrete construction of PPKC algorithm. We give the security proofs, computational complexity, and efficiency comparisons in Section 4. Section 5 gives the performance evaluations of efficiency and accuracy. Finally, we conclude the paper in Section 6.

2. Preliminaries

In this section, we mainly introduce the original KNN algorithm, Paillier cryptosystem, and the system model, which are the basis of the PPKC algorithm.

2.1. K-Nearest Neighbors (KNN) Classification Algorithm

The KNN algorithm is a nonparametric classification or regression algorithm in the field of pattern recognition. In this subsection, we briefly introduce the KNN classification algorithm. Assume that there is a set of training data, each item of which has several attributes and a label, and a group of testing data which has attributes but no label. The aim of the KNN classification algorithm is to obtain the label of the testing data. The specific process is as follows.
(1) Prepare training data and testing data. There are $n$ groups of training data and each group has $m$ attributes $(x_{i1}, \ldots, x_{im})$ and a label $l_i$, where $1 \le i \le n$. Suppose there is a group of testing data $q = (q_1, \ldots, q_m)$.
(2) Compute the Euclidean distance $d_i = \sqrt{\sum_{j=1}^{m} (x_{ij} - q_j)^2}$ between each training data and the testing data.
(3) Use each distance $d_i$ and its corresponding label $l_i$ to generate an ordered collection $\{(d_1, l_1), \ldots, (d_n, l_n)\}$.
(4) Sort the collection of distances and labels according to the distance $d_i$.
(5) Select the first $k$ labels from the sorted collection.
(6) Count the frequency of the $k$ labels and set the label with the highest frequency as the final result of the testing data.
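The six steps above can be sketched in a few lines of Python. This is only an illustrative plaintext version: the function name `knn_classify`, the toy records, and the labels "normal" and "abnormal" are assumptions made for the example and do not come from the paper.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Return the majority label among the k training records nearest to query.

    train: list of (attributes, label) pairs; query: tuple of attributes.
    """
    # Steps 2-3: pair every Euclidean distance with its label.
    pairs = [(math.dist(attrs, query), label) for attrs, label in train]
    # Step 4: sort the collection by distance.
    pairs.sort(key=lambda pair: pair[0])
    # Step 5: keep the labels of the k nearest records.
    nearest = [label for _, label in pairs[:k]]
    # Step 6: the most frequent label is the result.
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical toy data: two attributes per record.
train = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "abnormal"), ((5.2, 4.9), "abnormal"),
]
print(knn_classify(train, (1.1, 1.0), k=3))
```

Sorting all $n$ distances costs $O(n \log n)$ per query, which is one reason the text later outsources this step to the cloud server.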

2.2. Paillier Cryptosystem

The Paillier cryptosystem [23] is a public key cryptosystem that realizes additive homomorphism and is mainly used in signal processing and data processing. Its security relies on the problem of deciding $N$-th order residue classes. We briefly introduce the Paillier cryptosystem as follows.
(1) Key generation: input a security parameter $\kappa$ and two big primes $p$ and $q$ with bit length $\kappa$. Calculate $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, where $\mathrm{lcm}$ means the least common multiple. Define $L(x) = (x-1)/N$. Next, choose a random generator $g \in \mathbb{Z}_{N^2}^{*}$ and calculate $\mu = (L(g^{\lambda} \bmod N^2))^{-1} \bmod N$. Finally, the public key is $pk = (N, g)$ and the private key is $sk = (\lambda, \mu)$.
(2) Encryption: for a plaintext $m \in \mathbb{Z}_N$, the ciphertext is computed with the public key $pk$ as $c = E_{pk}(m) = g^{m} \cdot r^{N} \bmod N^2$, where $r$ is a random number in $\mathbb{Z}_N^{*}$.
(3) Decryption: for a ciphertext $c$, the plaintext is computed with the private key $sk$ as $m = D_{sk}(c) = L(c^{\lambda} \bmod N^2) \cdot \mu \bmod N$.

In addition, the Paillier cryptosystem has the following properties:
(1) Homomorphic addition: $E_{pk}(m_1) \cdot E_{pk}(m_2) = E_{pk}(m_1 + m_2)$, where $m_1$ and $m_2$ are any two plaintexts and $pk$ is the public key.
(2) Scalar multiplicative homomorphism: $E_{pk}(m)^{a} = E_{pk}(a \cdot m)$, where $a$ is a scalar and $m$ is a plaintext.
(3) Semantic security: the Paillier cryptosystem has been proven semantically secure against chosen plaintext attack [23]. An attacker cannot learn any information about the plaintext from the ciphertext.
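The key generation, encryption, decryption, and the two homomorphic properties can be checked with a toy implementation. The sketch below assumes the common choice $g = N + 1$ and tiny fixed primes (1009 and 1013), so it is for illustration only and offers no security; the function names are hypothetical.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two given primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1                                           # assumed choice of generator
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)      # (L(g^lam mod n^2))^-1 mod n
    return (n, g), (lam, mu)

def encrypt(pk, m):
    n, g = pk
    r = secrets.randbelow(n - 1) + 1                    # random r in [1, n-1]
    while math.gcd(r, n) != 1:                          # r must be a unit mod n
        r = secrets.randbelow(n - 1) + 1
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n       # L(c^lam mod n^2) * mu mod n

pk, sk = keygen(1009, 1013)
n_squared = pk[0] ** 2
c1, c2 = encrypt(pk, 17), encrypt(pk, 25)

# Homomorphic addition: multiplying ciphertexts adds plaintexts.
print(decrypt(pk, sk, c1 * c2 % n_squared))
# Scalar multiplication: raising a ciphertext to a power scales the plaintext.
print(decrypt(pk, sk, pow(c1, 3, n_squared)))
```

Because a fresh random `r` is drawn on each call, encrypting the same plaintext twice almost always gives different ciphertexts, which is what the semantic security property relies on.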

In the proposed PPKC algorithm, smart meters and control centers first use the Paillier cryptosystem to encrypt sensitive data and then transmit the ciphertext to the cloud server. Notice that all data in the Paillier cryptosystem belong to $\mathbb{Z}_N$. We omit $\bmod\ N^2$ in the following content for simplicity.

2.3. System Model

The proposed PPKC can be used in the smart grid and the participants include smart meters, control centers, and the cloud server. Figure 1 demonstrates the system model of PPKC algorithm.

Assume that there are some smart meters in a city, which can separately collect the power consumption in a region of the city and send these collected data to the cloud server. At the same time, some control centers can collect and detect abnormal data in this city. After collecting some abnormal data, the control center encrypts and sends them to the cloud server. The cloud server uses KNN classification algorithm to calculate and obtain the label of abnormal data and send it back. The control center can then timely find out which region has abnormal power consumption according to the label and solve the problem in time. The detailed introduction of each participant is listed as follows.

2.3.1. Smart Meters

The smart meters are composed of various sensors which can collect the energy consumption information of electrical equipment. However, these sensors have limited storage and cannot store massive data. In the PPKC model, the smart meters send the collected data to the cloud server. Suppose there are smart meters , where . Each smart meter has a dataset , where and are separately the attributes and the label of the dataset. At the same time, each smart meter needs to generate a tuple of random numbers , which is used to encrypt their data. For saving storage, the smart meter uploads the ciphertext to the cloud server.

2.3.2. Control Centers

The control centers in the smart grid collect abnormal data to support power adjustment, anomaly alerts, or process optimization. In the PPKC model, the control centers submit some query data to obtain the labels of abnormal data. Suppose that there are control centers , where . Each control center has a query dataset and sends encrypted query data to the cloud server. Each control center broadcasts the public key to smart meters.

2.3.3. Cloud Server

The cloud server has powerful computational resources and storage capacity to implement the KNN classification algorithm. In the proposed PPKC algorithm, it performs computations on the encrypted data from smart meters and control centers and returns the encrypted results to the control centers. The cloud server broadcasts the public key to smart meters and control centers. We suppose that the cloud server is honest but curious, meaning that it correctly carries out every step of the algorithm but may try to learn the private data of the smart meters and control centers.

2.4. Security Model

IND-CPA Security. In a public key cryptosystem, the indistinguishability of ciphertexts against chosen plaintext attack (IND-CPA) [24] is defined by the following game between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$.
(1) Setup. The challenger $\mathcal{C}$ generates a public key $pk$ and a private key $sk$ based on the security parameter $\kappa$, publishes $pk$ to the adversary, and keeps $sk$ secret.
(2) Queries. The adversary $\mathcal{A}$ can make encryption queries to the challenger. The adversary sends a plaintext $m$ to the challenger. The challenger runs the encryption algorithm, generates the corresponding ciphertext $c$, and sends it to $\mathcal{A}$.
(3) Challenge. The adversary selects two plaintexts $m_0$, $m_1$ of the same length and sends them to $\mathcal{C}$. These two plaintexts $m_0$, $m_1$ are not queried in the Queries phase. The challenger randomly selects $b \in \{0, 1\}$ and transmits the ciphertext $c_b = E_{pk}(m_b)$ to $\mathcal{A}$.
(4) Guess. $\mathcal{A}$ sends a bit $b'$ as its guess. If $b' = b$ with a probability larger than 1/2, the adversary wins the game, and its advantage is defined as $\mathrm{Adv}_{\mathcal{A}} = \left| \Pr[b' = b] - 1/2 \right|$.

An encryption system is $(t, \epsilon)$-IND-CPA secure if all $t$-time IND-CPA adversaries have an advantage of at most $\epsilon$ in winning the game.
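To make the game concrete, the following sketch plays the Setup, Challenge, and Guess phases against a deliberately weak scheme and estimates the adversary's advantage empirically. Everything here is hypothetical and chosen for illustration: the harness `ind_cpa_round`, the tiny textbook-RSA parameters, and the adversary. The point is only that a deterministic public key scheme loses the game, because the adversary can re-encrypt one candidate plaintext and compare; a randomized scheme such as Paillier does not allow this comparison.

```python
import secrets

def ind_cpa_round(keygen, encrypt, choose, guess):
    """Play one round of the game; return True if the adversary guesses b."""
    pk, _sk = keygen()                    # Setup: adversary only ever sees pk
    m0, m1 = choose(pk)                   # Challenge: two equal-length plaintexts
    b = secrets.randbelow(2)              # challenger's hidden bit
    c = encrypt(pk, (m0, m1)[b])          # challenge ciphertext
    return guess(pk, m0, m1, c) == b      # Guess

def estimate_advantage(keygen, encrypt, choose, guess, rounds=200):
    wins = sum(ind_cpa_round(keygen, encrypt, choose, guess) for _ in range(rounds))
    return abs(wins / rounds - 0.5)       # empirical |Pr[b' = b] - 1/2|

# Hypothetical deterministic toy scheme: textbook RSA with a tiny modulus.
def rsa_keygen():
    return (3233, 17), None               # n = 61 * 53, e = 17; private key unused

def rsa_encrypt(pk, m):
    n, e = pk
    return pow(m, e, n)

def choose(pk):
    return 2, 3

def guess(pk, m0, m1, c):
    # With the public key, the adversary re-encrypts m0 and compares.
    return 0 if rsa_encrypt(pk, m0) == c else 1

advantage = estimate_advantage(rsa_keygen, rsa_encrypt, choose, guess)
print(advantage)
```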

3. The Proposed PPKC Algorithm

We introduce the privacy-preserving KNN classification algorithm in this section. Compared with [10, 11], the proposed PPKC algorithm is constructed based on single cloud server and realizes input privacy, query privacy, and output privacy simultaneously.

As shown in Section 2, there are three types of participants in the proposed PPKC algorithm: smart meters (SM), control centers (CC), and the cloud server (CS). All the data of smart meters and control centers belong to . The public keys and private keys of cloud server and control centers are denoted as , and , . In the proposed PPKC, the smart meters and control centers encrypt their data under these keys by using the Paillier cryptosystem.

3.1. The Concrete Construction

The proposed PPKC algorithm includes uploading encrypted data, uploading query data, executing KNN on the ciphertext, and decrypting the ciphertext.

3.1.1. Uploading Encrypted Data

Assume that there are smart meters. For a smart meter , it firstly chooses a tuple of random numbers and calculates . Next, it uses to encrypt the first data and uses to encrypt the last data and all the random numbers. Finally, it transmits these secure data to the cloud server. After this step, smart meters can stay offline.

3.1.2. Uploading Query Data

Assume the query data is . The control center downloads , , from the cloud server. It then decrypts and obtains random numbers , . Next, the control center calculates each random number minus all the query data and encrypts them with . Finally, the control center sends the ciphertext of , , to the cloud server. After that, the control center can stay offline.

3.1.3. Executing KNN on the Ciphertext

After receiving the encrypted data from the control center and smart meters, the cloud server computes the dot products based on and and obtains . Then, the cloud server decrypts and gets , . Next, the cloud server computes , which is the Euclidean distance between the data of smart meters and the query data, and each has a corresponding encrypted label . At this point, the cloud server has groups of . After that, the cloud server sorts and selects the smallest numbers in and their corresponding encrypted labels. Finally, these encrypted labels are transmitted to the control center .

3.1.4. Decrypting the Ciphertext

The control center decrypts these encrypted labels and calculates the frequency of each label. Finally, sets the label with the highest frequency as the result.

Please see Algorithm 1 for the details of the proposed PPKC algorithm.

	Suppose that there are smart meters and control centers. Each smart meter has , , where is the label and the attributes are . Each control center has a query data , . are public key and private key of the cloud server while are public key and private key of the control center .
	Smart Meter: Uploading Encrypted Data:
(1)	Generate a series of random numbers and calculate .
(2)	Use public key and to encrypt and , respectively.
(3)	Send these encrypted numbers, and , , to the cloud server.
	Control Center: Uploading Query Data:
(1)	Download , , from the cloud server.
(2)	Decrypt by using the private key and obtain these random numbers , .
(3)	Calculate .
(4)	Encrypt the above random numbers by using public key and get .
(5)	Send these encrypted random numbers to the cloud server.
	Cloud Server: Executing KNN on the Ciphertext:
(1)	Compute , , and then use to decrypt these encrypted numbers.
(2)	Calculate , , which is the Euclidean distance between the data of smart meters and the query data.
(3)	Calculate the encrypted label of each Euclidean distance . In detail, . Thus, there are groups Euclidean distance and its corresponding encrypted label .
(4)	Sort and select the smallest numbers in and their corresponding encrypted labels.
(5)	Send these encrypted labels , , to the control center .
	Control Center: Decrypting the Ciphertext:
(1)	Decrypt these encrypted labels with and get .
(2)	Calculate the frequency of each label.
(3)	Set the label with the highest frequency as the final result.
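The four phases can be traced end to end with toy numbers. The sketch below is one possible reading of the flow, not the authors' implementation, and it rests on several assumptions: the smart meter masks each attribute as $x - r$ (the exact masking formula is not recoverable here), the control center sends "random number minus query data" as $r - q$, labels are small integers encrypted directly under the control center's key, squared distances are used for ranking, and the Paillier keys use tiny fixed primes with $g = N + 1$. All function and variable names are hypothetical.

```python
import math
import secrets
from collections import Counter

def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow((pow(n + 1, lam, n * n) - 1) // n, -1, n)
    return n, (lam, mu)                  # public modulus n (g = n + 1), private key

def enc(n, m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m % n, n * n) * pow(r, n, n * n) % (n * n)

def dec(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def ppkc_sketch(meters, query, k):
    n_cs, sk_cs = keygen(1009, 1013)     # cloud server key pair (toy primes)
    n_cc, sk_cc = keygen(1019, 1021)     # control center key pair (toy primes)

    # Smart meters: mask attributes (ASSUMED form x - r) and upload ciphertexts.
    uploads = []
    for attrs, label in meters:
        rs = [secrets.randbelow(1000) for _ in attrs]
        uploads.append((
            [enc(n_cs, x - r) for x, r in zip(attrs, rs)],  # masked data, server key
            [enc(n_cc, r) for r in rs],                     # random numbers, center key
            enc(n_cc, label),                               # label, center key
        ))

    # Control center: decrypt each r and upload Enc(r - q) under the server key.
    queries = [[enc(n_cs, dec(n_cc, sk_cc, c_r) - q) for c_r, q in zip(c_rs, query)]
               for _, c_rs, _ in uploads]

    # Cloud server: homomorphic addition cancels r, leaving x - q after decryption.
    scored = []
    for (c_xs, _, c_label), c_qs in zip(uploads, queries):
        diffs = [dec(n_cs, sk_cs, cx * cq % (n_cs * n_cs)) for cx, cq in zip(c_xs, c_qs)]
        diffs = [d - n_cs if d > n_cs // 2 else d for d in diffs]   # signed residues
        scored.append((sum(d * d for d in diffs), c_label))        # squared distance
    scored.sort(key=lambda item: item[0])
    nearest = [c_label for _, c_label in scored[:k]]

    # Control center: decrypt the k labels and take the majority.
    labels = [dec(n_cc, sk_cc, c) for c in nearest]
    return Counter(labels).most_common(1)[0][0]

meters = [((10, 12), 0), ((11, 13), 0), ((50, 48), 1), ((52, 47), 1), ((9, 11), 0)]
print(ppkc_sketch(meters, (51, 49), k=3))
```

Ranking by squared distance gives the same order as ranking by Euclidean distance, so the square root is skipped in this sketch.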

3.2. Correctness

The correctness of PPKC relies on the properties of Paillier cryptosystem, including homomorphic addition and scalar multiplicative homomorphism.

After the smart meters upload encrypted data and the control center uploads query data, the cloud server executes KNN on the ciphertext. The cloud server calculates the dot products based on and . According to the homomorphic addition of the Paillier cryptosystem, we can easily get the following equation:

Besides, the cloud server needs to calculate the encryption labels. According to the homomorphic addition of Paillier cryptosystem, we can get the following equation:

Finally, the cloud server returns encrypted labels to the control center . If the cloud server follows every step of the proposed PPKC, the control center finally gets the correct label.

Therefore, we can prove that the result obtained by the control center is correct through the above analysis.

4. Analysis of the PPKC Algorithm

In this section, we analyze the data privacy of smart meters and control centers and prove that the data given back by the cloud server is secure. We assume that the cloud server is semi-honest.

Therefore, we should prove that the private information of both smart meters and control centers is all private for the cloud server. Moreover, the outputs should also be well protected from the server.

4.1. Security Analysis

Theorem 1. In the PPKC algorithm, the privacy of the smart meter is ensured for the server based on the Paillier cryptosystem.

Proof. The security proof is executed between an adversary and a challenger . At the beginning of the game, the challenger is given a ciphertext of the Paillier cryptosystem and must decide whether the corresponding plaintext is or . If the adversary can distinguish the ciphertexts of the smart meter with overwhelming probability, then the challenger can distinguish the ciphertexts of the Paillier cryptosystem with overwhelming probability.
(1) Setup. The public keys of smart meter, cloud server, and control centers are published to the adversary and the challenger .
(2) Queries. Adversary first executes encryption queries to the challenger. The adversary sends to the challenger. The challenger executes the PPKC algorithm, selects some random numbers and returns the corresponding ciphertexts: where are the public keys of cloud server and control center , respectively.
(3) Challenge. The adversary randomly picks two messages , and transmits them to . Notice that these two plaintexts are not queried in the Queries phase. The challenger randomly selects and returns the ciphertext to adversary : where , and are random numbers chosen by the challenger.
(4) Guess. sends a bit as its guess, and then the challenger returns . If the adversary wins the game with an advantage of , then the challenger can distinguish the ciphertext of the Paillier cryptosystem with an advantage of .

Therefore, the privacy of the smart meter is assured for the server based on the Paillier cryptosystem.

Theorem 2. In the PPKC algorithm, the privacy of the control center is also ensured for the server based on the Paillier cryptosystem.

Proof. As in Theorem 1, the challenger is first given a ciphertext of the Paillier cryptosystem and must decide whether the corresponding plaintext is or . If the adversary can distinguish the ciphertexts of the control center with overwhelming probability, then the challenger can distinguish the ciphertexts of the Paillier cryptosystem with overwhelming probability.
(1) Setup. The same as Theorem 1.
(2) Queries. The adversary can execute encryption queries to the challenger. The adversary sends to the challenger. The challenger executes the PPKC algorithm, selects some numbers , and returns the corresponding ciphertexts: where is the public key of cloud server.
(3) Challenge. The adversary randomly chooses two queries , and sends them to . Note that these two queries are not queried in the Queries phase. The challenger selects and returns the ciphertext to : where are random numbers chosen by the challenger.
(4) Guess. The same as Theorem 1.

Therefore, the privacy of the control center is ensured for the server based on the Paillier cryptosystem.

Theorem 3. In the PPKC algorithm, the result of query is protected from the server based on the Paillier cryptosystem.

Proof. As in Theorem 1, the challenger is first given a ciphertext of the Paillier cryptosystem and must decide whether the corresponding plaintext is or . If the adversary can distinguish the ciphertexts of the query result with overwhelming probability, then the challenger can distinguish the ciphertexts of the Paillier cryptosystem with overwhelming probability.
(1) Setup. The same as Theorem 1.
(2) Queries. Adversary can execute encryption queries to the challenger. The adversary sends the labels to the challenger. The challenger executes the PPKC algorithm and returns the encrypted labels: where is the public key of the control center and are random numbers chosen by the challenger.
(3) Challenge. The adversary chooses two tuples of labels , and sends them to . Notice that these two tuples of labels are not queried in the Queries phase. The challenger randomly selects and returns the ciphertext to : where and are random numbers chosen by the challenger.
(4) Guess. The same as Theorem 1.

Therefore, the result of the query is protected from the server based on the Paillier cryptosystem.

4.2. Computational Complexity

In Table 1, we give the final computation overheads of each participant, where , , separately mean the quantity of data records, smart meters, and control centers. As mentioned in Section 3, there are totally smart meters and each smart meter has data records including data attributes and one label. At the same time, there exist control centers and each control center has data records. We ignore the additions and multiplications in the following analysis since these operations have low computational costs and modular exponentiation (ME) and modular multiplication (MM) cost far more than them. For the Paillier cryptosystem, one encryption needs 2 ME and 1 MM, while one decryption needs 2 ME and 3 MM.

In the proposed PPKC, each smart meter needs to use to encrypt and use to encrypt . Thus, each smart meter performs encryptions and the total number of all smart meters is for encryption. After downloading the encrypted random numbers, the control center decrypts and then encrypts the query data by using random numbers. Thus, each control center performs decryptions and encryptions. In the proposed PPKC, the cloud server needs to compute the Euclidean distance. For a Euclidean distance, the cloud server needs to perform MM and decryptions. Thus, computing Euclidean distances requires MM and decryptions. In the Decryption operation, the control center needs to decrypt encrypted labels and calculate the label with the highest frequency. We ignore the operations in this paper because is a small number.

Therefore, all the smart meters need to perform ME and MM. Each control center needs to perform encryptions and decryptions. Thus, the control center needs to execute ME and MM. The cloud server needs to perform ME and MM.

4.3. Comparisons

Table 2 shows the comparison of the proposed PPKC and other privacy-preserving KNN algorithms, where , , separately mean the quantity of data owners, the parameter of KNN, and the length of a value. Samanthula et al. [10] put forward a privacy-preserving KNN classification protocol, which is the first one to achieve query privacy, data privacy, and output privacy. Unfortunately, this protocol has a high computational cost. Liu et al. [11] designed a secure KNN classification protocol with additive secret sharing, which protects all the data of each participant. Moreover, this protocol is much more efficient than [10]. However, these two protocols are constructed based on two noncolluding servers, which is difficult to realize in practice. For multiple data owners, Li et al. [17] proposed a secure KNN classification protocol. Unfortunately, this protocol cannot totally protect the data from the servers. Cheng et al. [18] designed a secure KNN query protocol by utilizing the distributed two trapdoors public-key cryptosystem [25]. However, this protocol uses two cloud servers and cannot achieve the privacy of data. Yang et al. [19] presented another secure KNN classification algorithm with vector homomorphic encryption, which is proved insecure in [21]. Thus, this protocol cannot protect any private information of data owners and query users. Yang et al. [22] used a secure VHE to design a privacy-preserving KNN classification algorithm, but the query data cannot be well protected.

Therefore, the proposed PPKC is the first algorithm which can protect the private information of smart meters, control centers, and the query results based on single cloud server.

5. Performance Evaluation

To observe the efficiency and accuracy of the PPKC algorithm, we execute the following two experiments. In the first experiment, we compare PPKC with the previous protocols to demonstrate its efficiency. Next, we compare the accuracy of the original KNN with that of PPKC to show that the accuracy does not decrease in the privacy-preserving setting.

All the experiments are implemented in Java and run on a MacBook Pro laptop equipped with 4 cores rated at 1.4 GHz and 16 GB of RAM.

5.1. Computational Overheads

In this subsection, we run experiments to show the effectiveness of PPKC. As shown in Table 2, the proposed PPKC and [10, 11] achieve data privacy, query privacy, and result privacy, but [17–19, 22] cannot realize these properties simultaneously; we thus only compare PPKC with these two protocols. Our experiments are performed on the Car Evaluation dataset from the University of California Irvine (UCI) repository [26], which is also used in [10, 11]. Specifically, this dataset contains 1728 records in total, each with 6 data attributes, which means , . Besides, these records are categorized into 4 classes, which means there are 4 different kinds of labels in this dataset. For a fair comparison, we compare the performance of our PPKC for different values of the KNN parameter $k$, which is chosen from 5 to 25. Moreover, we measure the time of control centers, smart meters, and the cloud server while executing PPKC.

Since the key length $K$ has an effect on the efficiency of PPKC, we separately select $K = 512$ and $K = 1024$ in the following experiments, which is the same as [10, 11]. Tables 3 and 4 present the computation cost of SM, CC, and CS in the PPKC algorithm for different $k$. We define $T$ as the total time of executing PPKC. $T_{SM}$, $T_{CC}$, and $T_{CS}$ separately denote the time of smart meters, control center, and cloud server during executing PPKC.

We discuss the running time of smart meters, control center, and cloud server in turn. As shown in Tables 3 and 4, we compare $T_{SM}$, $T_{CC}$, and $T_{CS}$. When $k = 5$ and $K = 512$, $T_{SM}$, $T_{CC}$, and $T_{CS}$ are 7.44 s, 4.508 s, and 6.778 s, respectively. When $k = 25$ and $K = 512$, $T_{SM}$ changes from 7.44 s to 7.453 s, $T_{CC}$ changes from 4.508 s to 4.98 s, and $T_{CS}$ changes from 6.778 s to 6.915 s. With the increase of $k$, $T_{SM}$, $T_{CC}$, and $T_{CS}$ are almost unchanged. Similarly, when $k = 5$ and $K = 1024$, $T_{SM}$, $T_{CC}$, and $T_{CS}$ are 40.918 s, 26.763 s, and 43.949 s, respectively. When $k = 25$ and $K = 1024$, $T_{SM}$, $T_{CC}$, and $T_{CS}$ are 41.72 s, 28.011 s, and 43.621 s, respectively. With the increase of $k$, $T_{SM}$, $T_{CC}$, and $T_{CS}$ do not obviously change. Thus, we can conclude that the increase of $k$ does not have a great impact on $T_{SM}$, $T_{CC}$, and $T_{CS}$.

Since the dataset contains 1728 data records, there are 1728 smart meters. The total time for all smart meters to execute PPKC is 7.453 s with a 512-bit key and 41.72 s with a 1024-bit key, so each smart meter needs only about 4.31 ms and 24.14 ms, respectively, to encrypt its data record, which is quite time-saving. As shown in Tables 3 and 4, the control center spends more time than each individual smart meter, since it needs to encrypt all the query data. This is acceptable for the control center: because PPKC uses only one cloud server, the control center needs to do more computation to protect the query.
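The per-meter figures follow from dividing the total smart meter time by the number of meters. A short Python check of that arithmetic is given below; the variable names are illustrative, and the pairing of 7.453 s with the 512-bit key and 41.72 s with the 1024-bit key is read from the order in which the text lists them.

```python
NUM_METERS = 1728  # one smart meter per record in the Car Evaluation dataset

# Total smart meter time in seconds, keyed by key length in bits.
total_sm_seconds = {512: 7.453, 1024: 41.72}

# Average encryption time per smart meter, in milliseconds.
per_meter_ms = {
    bits: round(seconds / NUM_METERS * 1000, 2)
    for bits, seconds in total_sm_seconds.items()
}
print(per_meter_ms)
```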

Next, we compare the efficiency of PPKC with that of Samanthula et al. [10] and Liu et al. [11] over the whole algorithm, as shown in Figures 2 and 3. When k varies from 5 to 25 with a 512-bit key, the total running time of PPKC varies from 18.726 s to 19.348 s. Similarly, with a 1024-bit key, the total running time of PPKC varies from 111.63 s to 113.352 s. By comparison, the running times of [10] and [11] are 10.01 min and 0.59 min, respectively, with a 512-bit key, and 67.97 min and 3.091 min, respectively, with a 1024-bit key. Therefore, the proposed PPKC is much more efficient than the previous privacy-preserving KNN protocols.
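The totals above can be cross-checked against the component times quoted earlier in this subsection. The Python sketch below sums them and derives a conservative speedup over [10, 11]. It assumes that the four groups of component times correspond to k = 5 and k = 25 with 512-bit and 1024-bit keys, in that order, and that the two pairs of figures for [10, 11] correspond to the 512-bit and 1024-bit settings; the names `ppkc`, `baseline_min`, and `min_speedup` are illustrative.

```python
# Component times in seconds: (smart meters, control center, cloud server).
# Keys are (key length in bits, k); this grouping is an assumption from context.
ppkc = {
    (512, 5): (7.44, 4.508, 6.778),
    (512, 25): (7.453, 4.98, 6.915),
    (1024, 5): (40.918, 26.763, 43.949),
    (1024, 25): (41.72, 28.011, 43.621),
}

# Total PPKC running time per setting, rounded to milliseconds.
totals = {key: round(sum(parts), 3) for key, parts in ppkc.items()}

# Running times quoted for the two-server protocols, in minutes.
baseline_min = {
    512: {"[10]": 10.01, "[11]": 0.59},
    1024: {"[10]": 67.97, "[11]": 3.091},
}


def min_speedup(bits, ref):
    """Baseline time divided by the slowest PPKC total at this key length."""
    slowest = max(t for (b, _), t in totals.items() if b == bits)
    return baseline_min[bits][ref] * 60 / slowest


for bits in (512, 1024):
    for ref in ("[10]", "[11]"):
        print(bits, ref, round(min_speedup(bits, ref), 2))
```

Using the slowest PPKC total at each key length keeps the ratio conservative regardless of which k the figures for [10, 11] were measured at.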

5.2. Classification Accuracy

In this subsection, we test the classification accuracy of KNN and PPKC on several datasets from the UCI repository [26].

For each dataset, Table 5 lists the number of data records, the number of attributes, the number of training records, and the number of query records. The query data are randomly selected from the dataset. Each query record therefore has its own known label, which can be used to judge whether the classification result of PPKC is correct.

We use 4 different datasets in total. The first is the Car Evaluation dataset. As described in the previous subsection, it has 1728 data records with 6 attributes each and 4 different labels: unacceptable, acceptable, good, and very good. Here, we set the first 1400 records as training data and the other 328 records as testing data.

The second dataset is the Mammographic Mass dataset. It has a total of 830 data records with 5 attributes and 2 different labels, benign and malignant. In this dataset, the first 600 data records are training data and the other 230 data records are testing data.

The third dataset is the Electrical Grid Stability dataset, which is widely used in smart grid research. Its attributes include power consumed, value for electricity producer, and so on. It has a total of 10000 data records with 11 attributes and 2 different labels, stable and unstable. In this dataset, the first 8000 data records are training data and the other 2000 data records are testing data.

The fourth dataset is the Letter Recognition dataset. It has a total of 20000 data records with 16 attributes. The labels are the 26 capital letters of the English alphabet. The first 16000 records are training data and the other 4000 records are testing data.

Across the datasets, we carry out 5 groups of experiments with different values of k. For each dataset, we set k to 1, 2, 3, 5, and 10.
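The plaintext baseline of this evaluation can be sketched as follows: split the records into the first part for training and the remainder for testing, classify each test record by majority vote among its k nearest training records, and report the fraction classified correctly. This is a minimal Python sketch of ordinary KNN only, not of PPKC. The squared Euclidean distance, the function names, and the synthetic two-cluster records are assumptions made for illustration and are not taken from the UCI datasets.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training records closest to the query."""
    nearest = sorted(
        train,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(records, n_train, k):
    """Train on the first n_train records and test on the remaining ones."""
    train, test = records[:n_train], records[n_train:]
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Hypothetical toy data: two well-separated clusters with alternating labels.
records = [
    ((i % 3, (i * 7) % 3), "A") if i % 2 == 0
    else ((10 + i % 3, 10 + (i * 7) % 3), "B")
    for i in range(40)
]

# Same values of k as in the experiments; first 30 records used for training.
results = {k: accuracy(records, 30, k) for k in (1, 2, 3, 5, 10)}
print(results)
```

Running the same loop once on plaintext records and once on the decrypted PPKC outputs would give a side-by-side accuracy comparison of the kind reported in Table 6.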

Table 6 gives the accuracy of the original KNN and PPKC for different k. In the first dataset, the accuracy of PPKC stays close to that of the original KNN for every tested k. In the other three datasets, there is likewise little difference between the accuracy of the original KNN and that of PPKC. Regardless of k, the accuracy of PPKC is almost the same as that of KNN. Besides, the proposed PPKC protects the data of the smart meters and control centers, preventing the cloud server from obtaining their private information. Moreover, the results on the Electrical Grid Stability dataset show that the proposed PPKC can be applied well to the smart grid. Thus, PPKC is secure and practical for the smart grid.

6. Conclusion

In this paper, we propose a privacy-preserving KNN classification algorithm for the smart grid that relies on a single cloud server. Based on the Paillier cryptosystem, the proposed PPKC algorithm protects the private data of smart meters and control centers as well as the classification results. Compared with the previous protocols, PPKC uses only one cloud server, which is more practical in real deployments. The experimental results demonstrate that PPKC is much more efficient than the previous protocols and that its classification accuracy is almost the same as that of the original KNN, which means PPKC is efficient and feasible for the smart grid. However, the proposed PPKC algorithm applies only to the KNN algorithm; in future work, we will turn to other privacy-preserving machine learning algorithms.

Data Availability

All data are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The work described in this paper was supported by the Natural Science Foundation of Shanghai (20ZR1419700 and 22ZR1481000) and Henan Key Laboratory of Network Cryptography Technology (LNCT2021-A13).

References

[1] B. Kuang, A. Fu, S. Yu, G. Yang, M. Su, and Y. Zhang, “Esdra: an efficient and secure distributed remote attestation scheme for iot swarms,” IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8372–8383, 2019.
[2] Y. Zhang, X. Xiao, L.-X. Yang, Y. Xiang, and S. Zhong, “Secure and efficient outsourcing of pca-based face recognition,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 1683–1695, 2020.
[3] C. Zhou, A. Fu, S. Yu, W. Yang, H. Wang, and Y. Zhang, “Privacy-preserving federated learning in fog computing,” IEEE Internet of Things Journal, vol. 7, no. 11, pp. 10782–10793, 2020.
[4] M. Kumar and S. Chand, “A secure and efficient cloud-centric internet-of-medical-things-enabled smart healthcare system with public verifiability,” IEEE Internet of Things Journal, vol. 7, no. 10, pp. 10650–10659, 2020.
[5] Y. Li, H. Li, G. Xu, T. Xiang, X. Huang, and R. Lu, “Toward secure and privacy-preserving distributed deep learning in fog-cloud computing,” IEEE Internet of Things Journal, vol. 7, no. 12, pp. 11460–11472, 2020.
[6] A. J. Myles, R. N. Feudale, Y. Liu, N. A. Woody, and S. D. Brown, “An introduction to decision tree modeling,” Journal of Chemometrics, vol. 18, no. 6, pp. 275–285, 2004.
[7] A. Likas, N. Vlassis, and J. J. Verbeek, “The global k-means clustering algorithm,” Pattern Recognition, vol. 36, no. 2, pp. 451–461, 2003.
[8] J. A. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
[9] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “Knn model-based approach in classification,” On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, vol. 2888, pp. 986–996, 2003.
[10] B. K. Samanthula, Y. Elmehdwi, and W. Jiang, “K-nearest neighbor classification over semantically secure encrypted relational data,” IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 5, pp. 1261–1273, 2015.
[11] L. Liu, J. Su, X. Liu et al., “Toward highly secure yet efficient knn classification scheme on outsourced cloud data,” IEEE Internet of Things Journal, vol. 6, no. 6, pp. 9841–9852, 2019.
[12] Y. Rahulamathavan, R. C.-W. Phan, S. Veluru, K. Cumanan, and M. Rajarajan, “Privacy-preserving multi-class support vector machine for outsourcing the data classification in cloud,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 5, pp. 467–479, 2014.
[13] J. Yuan and Y. Tian, “Practical privacy-preserving mapreduce based k-means clustering over large-scale dataset,” IEEE Transactions on Cloud Computing, vol. 7, no. 2, pp. 568–579, 2019.
[14] L. Liu, R. Chen, X. Liu, J. Su, and L. Qiao, “Towards practical privacy-preserving decision tree training and evaluation in the cloud,” IEEE Transactions on Information Forensics and Security, vol. 15, pp. 2914–2929, 2020.
[15] H. Rong, H.-M. Wang, J. Liu, and M. Xian, “Privacy-preserving k-nearest neighbor computation in multiple cloud environments,” IEEE Access, vol. 4, pp. 9589–9603, 2016.
[16] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms,” IEEE Transactions on Information Theory, vol. 31, no. 4, pp. 469–472, 1985.
[17] F. Li, R. Shin, and V. Paxson, “Exploring privacy preservation in outsourced k-nearest neighbors with multiple data owners,” Proceedings of the 2015 ACM Workshop on Cloud Computing Security Workshop - CCSW '15, pp. 53–64, 2015.
[18] K. Cheng, L. Wang, Y. Shen et al., “Secure k-nn query on encrypted cloud data with multiple keys,” IEEE Transactions on Big Data, vol. 7, no. 4, pp. 689–702, 2021.
[19] H. Yang, W. He, J. Li, and H. Li, “Efficient and secure knn classification over encrypted data using vector homomorphic encryption,” 2018 IEEE International Conference on Communications (ICC), pp. 1–7, 2018.
[20] H. Zhou and G. Wornell, “Efficient homomorphic encryption on integer vectors and its applications,” Proceedings of the Information Theory and Applications Workshop (ITA), pp. 1–9, 2014.
[21] S. Bogos, J. Gaspoz, and S. Vaudenay, “Cryptanalysis of a homomorphic encryption scheme,” Cryptography and Communications, vol. 10, no. 1, pp. 27–39, 2018.
[22] H. Yang, S. Liang, J. Ni, H. Li, and X. S. Shen, “Secure and efficient k nn classification for industrial internet of things,” IEEE Internet of Things Journal, vol. 7, no. 11, pp. 10945–10954, 2020.
[23] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes,” Advances in Cryptology - EUROCRYPT '99, pp. 223–238, 1999.
[24] S. Goldwasser and S. Micali, “Probabilistic encryption,” Journal of Computer and System Sciences, vol. 28, no. 2, pp. 270–299, 1984.
[25] X. Liu, R. H. Deng, K.-K. R. Choo, and J. Weng, “An efficient privacy-preserving outsourced calculation toolkit with multiple keys,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 11, pp. 2401–2414, 2016.
[26] M. Bohanec and B. Zupan, “The UCI KDD archive,” 1997, http://archive.ics.uci.edu/ml.

Copyright

Copyright © 2022 Zhuhuan Song et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
