Abstract
Currently, private data leakage and nonlinear classification are two challenges encountered in big data mining, and few studies address these issues in support vector machines (SVMs). In this paper, we propose a novel framework based on the concepts of differential privacy (DP) and kernel functions to solve both problems effectively. The framework can allocate privacy budgets and add artificial noise to different locations in an SVM simultaneously, which makes the perturbation process more flexible and fine-grained. Under this framework, we propose three algorithms, DP SVMs that perturb the training data set, perturb the kernel function, and utilize mixed perturbation (DPSVM-TDP, DPSVM-KFP, and DPSVM-MP, respectively), all of which realize accurate classification while ensuring that users' privacy is not violated. Moreover, we conduct a privacy analysis on these algorithms and prove that they all satisfy DP. Finally, we evaluate the algorithms experimentally from different perspectives and compare them with the DPSVM with dual-variable perturbation (DVP) algorithm (DPSVM-DVP) to determine the optimal perturbation method. The results show that DPSVM-KFP achieves the highest data utility and strictest privacy protection with the shortest running time.
1. Introduction
With the development of information technology, a tremendous amount of data has been collected, which makes the extraction of useful information an imperative and challenging issue. As an effective way to retrieve potentially valuable information hidden in massive data, big data classification [1] has penetrated many aspects of society, such as medical prediction, video semantic analysis [2], and human activity recognition [3, 4].
However, most of the existing classification algorithms still face two challenges [5, 6]. First, these algorithms generally put personal information at risk because an attacker may infer a user's private information from the classification results. At the same time, people are becoming increasingly concerned about the use of their sensitive data. To guarantee the fundamental right of individuals to privacy, appropriate measures to protect data against disclosure risks must be implemented before classification results are published. Unfortunately, traditional privacy protection technologies, including k-anonymity [7], l-diversity [8], and t-closeness [9], are all built on the premise that attackers possess no background knowledge. In fact, attackers can infer user privacy from published classification results by referring to additional user information, so these technologies cannot provide adequate security. Differential privacy (DP) [10], an emerging and provable privacy protection model, exhibits substantial advantages over previous privacy protection methods. DP is based on solid mathematical theory and quantifies privacy levels through privacy budgets, making privacy protection a comparable concept. Second, most classification algorithms do not perform well when dealing with nonlinearly separable data sets. Under such conditions, samples with different labels may become intermixed, making it difficult to directly identify a classification plane that distinguishes them in the sample space. The kernel method [11] is an excellent solution for nonlinear classification because it can project samples to a high-dimensional feature space, where these samples can be separated by a hyperplane.
As a predominant supervised learning model and data classification algorithm, support vector machines (SVMs) [12] are widely deployed in many applications. Recently, in order to prevent sensitive information leakage, several studies have been conducted on DP protection in SVM algorithms. These studies realize DP protection by adding artificial noise to SVMs. The effort in [13] obfuscates an input data set systematically through independently and identically distributed noise, thereby providing privacy protection for SVMs. Without disturbing the given data set, the research in [14, 15] perturbs the output of an SVM by adding random noise to the normal vector of the classification hyperplane. In addition, other researchers mainly focus on perturbing certain SVM parameters. For instance, the research in [16] perturbs the dual variables of the hyperplane, while the work of [17] perturbs the kernel function.
According to the above works, we find that although researchers have made many contributions to achieving DP protection for SVMs, some shortcomings remain. First, the DP protection approaches for SVMs lack a general theoretical framework. Second, most studies in this area focus only on privacy protection and ignore the fact that most data sets in real tasks are nonlinearly separable. Third, the work of [17] completes classification and privacy protection for nonlinearly separable data by adding noise to the kernel function, but that paper does not analyse the sensitivity of the kernel function or the scale of the added noise. In addition, it only considers the radial basis function (RBF) as the kernel function and does not take the linear kernel and sigmoid kernel into account.
To effectively address information leakage and nonlinear classification in SVMs, this paper proposes a novel framework based on the concepts of DP and the kernel method that perturbs training data sets and kernel functions by purposely adding noise. A literature review suggests that this is the first attempt to develop a general DP protection framework for SVMs. Under this framework, DPSVM algorithms that perturb the training data set, perturb the kernel function, and utilize mixed perturbation (DPSVM-TDP, DPSVM-KFP, and DPSVM-MP, respectively) are proposed, all of which can classify nonlinearly separable data sets and satisfy DP. Furthermore, in DPSVM-KFP, we consider perturbing different kernel functions and complete an adequate privacy analysis on them. The main contributions of this paper are summarized as follows:
(i) We propose a novel and general framework that organically combines DP, the kernel method, and an SVM. This framework can realize the classification of nonlinearly separable data sets while limiting the risk of privacy leakage by purposely perturbing the training data set and kernel function.
(ii) We propose three algorithms based on the developed framework that utilize the Laplace mechanism to implement DP protection for SVMs. The DPSVM-TDP and DPSVM-KFP algorithms add Laplace noise to the training data set and kernel function, respectively. The DPSVM-MP algorithm adopts a mixed perturbation method to disturb both the training data set and the kernel function. Moreover, these algorithms can be applied to the classification of nonlinearly separable data sets.
(iii) We compute the global sensitivities of both the training data set and the kernel function, which is not an easy task in DP since it requires the consideration of all possible data sets differing by one record. Then, we prove that these algorithms satisfy DP under the Laplace mechanism with noise levels determined by these sensitivities. In addition, we conduct extensive experiments on three University of California, Irvine (UCI) data sets to evaluate the performance of our algorithms and verify their efficiency via a comparison with the DPSVM algorithm with dual-variable perturbation (DPSVM-DVP).
The remainder of this paper is organized as follows. In Section 2, we review related work and its existing shortcomings. Section 3 introduces SVMs, the kernel method, and DP. Section 4 proposes a novel framework and three specific algorithms. In Section 5, we calculate the sensitivity levels of the algorithms and provide proofs of their DP. We provide the experimental results and related analysis in Section 6. Section 7 concludes this paper.
2. Related Work
In recent years, some researchers have begun to utilize DP to realize privacy protection for SVMs. The ways in which such studies implement privacy protection are mainly divided into input perturbation, output perturbation, and algorithm perturbation methods [18].
Input perturbation usually disturbs the training data set. There is no doubt that this method is relatively straightforward, but the conditions required for privacy protection are strict, and sometimes the classification effect is unsatisfactory. Liu [10] considers the problem of publicly releasing a data set for SVM classification without infringing on the privacy of data subjects: an input data set is systematically obfuscated through noise whose optimal distribution is determined by maximizing the weighted sum of privacy and utility metrics. Senekane [19] implements a privacy-preserving image classification scheme by adding noise to every pixel in an image; by controlling the scale of the noise, the scheme is made to meet the requirements of differential privacy. In contrast to input perturbation, output perturbation is the method most commonly employed by researchers to accomplish DP protection for SVMs. The key to this method is to perturb the classification hyperplane of the given SVM, thus achieving reliable data classification under the condition of privacy protection. For instance, Xu et al. [14] use the Laplace mechanism to add random noise to the classification hyperplane so that attackers cannot recover the data set through the model parameters. Cai et al. [15] consider the problem of the unbalanced distribution of customer data and implement a weighted DP-SVM algorithm by adding noise of different scales to the output. The last method is algorithm perturbation, which adds noise to an intermediate value. For example, Zhang et al. [16] propose the DPSVM-DVP algorithm, which does not directly perturb the normal vector of the classification hyperplane but rather perturbs the dual variables of the support vectors. The algorithm first solves the dual problem of an SVM by the sequential minimal optimization (SMO) method and records the difference between the estimated value and the real value of each support vector. Then, it calculates the ratio of each support vector to the sum of all support vectors and adds different levels of Laplace noise to the dual variables corresponding to these ratios. Wang and Li [17] propose a privacy-preserving SVM algorithm under DP for multiclassification. The algorithm disturbs the kernel function in three different ways: direct Laplace noise injection, Taylor formula replacement, and a combination of the previous two methods. The purpose of the algorithm is to protect small-sample data without interfering with the overall classification effect of the model.
In summary, such studies implement DP protection by adding noise to certain positions in SVMs. However, the existing studies fail to consider a general DP protection framework for SVMs, which forms the major task of our study. Compared with conventional approaches, our framework can allocate privacy budgets and add artificial noise to different locations of an SVM simultaneously. In other words, this general framework helps to choose the locations and scales of perturbation more freely and delicately, delivering a more accurate classification result while ensuring the security of individuals' privacy. Moreover, previous studies seldom take nonlinearly separable data sets into account, a shortcoming that is addressed in the framework of this study.
3. Preliminaries and Definitions
In this section, we introduce the related concepts of SVMs, the kernel method, and DP.
3.1. Support Vector Machines
SVM [20] is a binary classification model that defines a classifier maximizing the margin in the feature space. Specifically, for a data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, the number of samples is $n$, and each sample has $d$ features. In addition, $x_i$ and $y_i$ are the features and label of the $i$th sample, respectively, where $y_i \in \{-1, +1\}$. Then, the classification decision function can be defined as
$$f(x) = \operatorname{sign}\left(w^{\top} x + b\right), \tag{1}$$
where $w$ and $b$ are the normal vector and intercept of the classification hyperplane, respectively.
According to the idea of maximizing the margin of the classification hyperplane, the original problem can be transformed into a constrained SVM optimization problem, which is usually described as follows:
$$\min_{w, b} \ \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_{i}\left(w^{\top} x_{i} + b\right) \geq 1, \quad i = 1, 2, \ldots, n. \tag{2}$$
Notably, the objective function is convex. Therefore, the Lagrange multiplier method [21] can be used to solve the problem, yielding the following formula:
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^{2} + \sum_{i=1}^{n} \alpha_{i}\left(1 - y_{i}\left(w^{\top} x_{i} + b\right)\right), \tag{3}$$
where $\alpha_{i} \geq 0$ is the Lagrange multiplier.
Then, we can calculate the derivatives of $L$ with respect to $w$ and $b$ and set the derivative functions equal to 0:
$$w = \sum_{i=1}^{n} \alpha_{i} y_{i} x_{i}, \qquad \sum_{i=1}^{n} \alpha_{i} y_{i} = 0.$$
Substituting the result into (1), we can obtain the following formula:
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} x_{i}^{\top} x + b\right). \tag{4}$$
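As a concrete illustration of this formulation (our own example, not from the original text), a linear SVM can be fit with scikit-learn, and the learned normal vector $w$ and intercept $b$ of the hyperplane can be read off directly:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: n = 4 samples, d = 2 features, labels in {-1, +1}.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # a large C approximates the hard-margin problem above
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]  # normal vector and intercept of the hyperplane
print(w, b, clf.predict([[0.2, 0.1]]))  # f(x) = sign(w^T x + b)
```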
3.2. Kernel Method for SVMs
The kernel method [22] is a significant way to solve nonlinear classification issues, especially in SVMs, because it can project samples to a higher dimensional feature space in which the samples become separable. Therefore, if we assume that $\phi(x)$ is the feature vector obtained by projecting $x$, then according to (4), the model corresponding to the classification hyperplane can be expressed as
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} \phi(x_{i})^{\top} \phi(x) + b\right). \tag{5}$$
Next, we can assume that there is a function $K$ that equals the inner product of $\phi(x_{i})$ and $\phi(x_{j})$ in the feature space; that is,
$$K(x_{i}, x_{j}) = \phi(x_{i})^{\top} \phi(x_{j}). \tag{6}$$
The function $K$ is called the kernel function.
Furthermore, for SVMs, there are three commonly used kernel functions: linear kernel, Gaussian kernel, and sigmoid kernel functions [23, 24].
The expression of the linear kernel function is
$$K(x_{i}, x_{j}) = x_{i}^{\top} x_{j}. \tag{7}$$
The expression of the Gaussian kernel function is
$$K(x_{i}, x_{j}) = \exp\left(-\gamma\left\|x_{i} - x_{j}\right\|^{2}\right), \tag{8}$$
where $\gamma$ indicates the influence of a single sample on the entire classification hyperplane.
The expression of the sigmoid kernel function is
$$K(x_{i}, x_{j}) = \tanh\left(\gamma x_{i}^{\top} x_{j} + c\right), \tag{9}$$
where $\tanh$ is the hyperbolic tangent function, $\gamma$ indicates the influence of a single sample on the entire classification hyperplane, and $c$ is the independent term in the sigmoid kernel.
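For reference (an illustrative addition), the three kernels can be computed directly in NumPy; the values of $\gamma$ and $c$ below are arbitrary examples:

```python
import numpy as np

def linear_kernel(xi, xj):
    # K(xi, xj) = xi^T xj, formula (7)
    return np.dot(xi, xj)

def gaussian_kernel(xi, xj, gamma):
    # K(xi, xj) = exp(-gamma * ||xi - xj||^2), formula (8)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma, c):
    # K(xi, xj) = tanh(gamma * xi^T xj + c), formula (9)
    return np.tanh(gamma * np.dot(xi, xj) + c)

xi, xj = np.array([1.0, 0.5]), np.array([0.0, 1.0])
print(linear_kernel(xi, xj),
      gaussian_kernel(xi, xj, gamma=0.5),
      sigmoid_kernel(xi, xj, gamma=0.5, c=0.0))
```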
3.3. Differential Privacy
Definition 1 (DP [25]). A randomized mechanism $M$ is considered to satisfy $\epsilon$-DP if for any two neighboring data sets $D$ and $D'$ (with at most one different sample) and for all outputs $S \subseteq \operatorname{Range}(M)$,
$$\Pr[M(D) \in S] \leq e^{\epsilon} \cdot \Pr[M(D') \in S], \tag{10}$$
where $\epsilon$ is the privacy budget controlling the strength of the privacy protection. A lower $\epsilon$ ensures a greater privacy guarantee.
Definition 2 (Global Sensitivity [26]). For a function $f: D \rightarrow \mathbb{R}^{d}$ and any two neighboring data sets $D$ and $D'$, the global sensitivity of $f$ is defined as
$$\Delta f = \max_{D, D'}\left\|f(D) - f(D')\right\|_{1}. \tag{11}$$
In DP, the sensitivity can be understood as the maximum extent to which the output of $f$ can differ between two neighboring data sets.
Definition 3 (Laplace Mechanism [27]). Given a data set $D$, for a function $f$ with sensitivity $\Delta f$, the mechanism
$$M(D) = f(D) + \operatorname{Lap}\left(\frac{\Delta f}{\epsilon}\right) \tag{12}$$
provides $\epsilon$-DP. $\operatorname{Lap}(b)$ represents the noise sampled from the Laplace distribution with scale parameter $b = \Delta f / \epsilon$, and its probability density function is
$$p(x) = \frac{1}{2b} \exp\left(-\frac{|x|}{b}\right). \tag{13}$$
The Laplace mechanism is a common method for implementing DP protection. It is realized by adding random noise that satisfies the Laplace distribution, whose scale is determined by the global sensitivity and the privacy budget. Therefore, when Laplace noise is added, the result of a user query is not a fixed value but rather a random number whose probability density function obeys the Laplace distribution.
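Definition 3 translates directly into code; the following is a minimal sketch (our own, for a scalar query) of releasing a value under the Laplace mechanism:

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Release value + Lap(sensitivity / epsilon), as in formula (12)."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon  # noise scale b = delta_f / epsilon
    return value + rng.laplace(loc=0.0, scale=scale)

# Example: a counting query (sensitivity 1) released with privacy budget 0.1.
print(laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.1))
```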
Definition 4 (Composability of DP [28]). Suppose that the privacy budgets of algorithms $M_{1}, M_{2}, \ldots, M_{m}$ are $\epsilon_{1}, \epsilon_{2}, \ldots, \epsilon_{m}$; then, for the same data set $D$, the combined algorithm $M(M_{1}(D), M_{2}(D), \ldots, M_{m}(D))$ satisfies $\left(\sum_{i=1}^{m} \epsilon_{i}\right)$-DP.
This property indicates that an algorithm composed of a sequence of DP protection algorithms has a privacy budget that is equal to the sum of all individual privacy budgets.
4. Framework and Algorithms
In this section, we propose an SVM framework that satisfies DP protection for nonlinearly separable data. Moreover, we propose three algorithms based on the developed framework. The DPSVM-TDP and DPSVM-KFP algorithms perturb the training data set and kernel function, respectively. The DPSVM-MP algorithm uses a mixed perturbation method to disturb both the training data set and the kernel function. Table 1 shows the definitions of the notations that will be used in this paper.
4.1. Proposed Framework
Given that nonlinear classification and privacy leakage are two urgent challenges faced by SVMs, this study utilizes a kernel function and DP to address them by proposing a general framework, as shown in Figure 1.

First, an input data set is preprocessed to obtain the model's input, which is represented as $D = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{n}, y_{n})\}$; this means that there are $n$ samples in total, the feature vector of the $i$th sample is $x_{i}$, and its label is $y_{i}$. For most data sets, the samples may be nonlinearly separable. In other words, in the current dimension, it is impossible to determine a plane that distinguishes samples with different labels. Therefore, these samples must be projected into a higher dimensional space, where a separating hyperplane can be found. However, this process greatly increases the incurred computational costs, so the kernel method is adopted to simplify the calculation.
According to formula (6), the kernel function is equal to the inner product of $\phi(x_{i})$ and $\phi(x_{j})$ in a high-dimensional space. As a result, the kernel function between different features is calculated and used as the input for model training. Furthermore, to protect the data from being leaked, this study adds noise to the features of the training data set or to the kernel function. It is also necessary to select a reasonable noise-adding mechanism and noise scale parameters for the algorithm to meet the protection conditions of DP. Common noise-adding mechanisms include the Gaussian mechanism, the Laplace mechanism, the exponential mechanism, and so on. The selection of the noise scale parameters depends on the available privacy budget and the global sensitivity. Finally, the classification hyperplane is acquired after training the model.
Based on this general framework, we propose three specific algorithms by adjusting the artificial noise. The DPSVM-TDP algorithm disturbs the training data set, the DPSVM-KFP algorithm disturbs the kernel function, and the DPSVM-MP algorithm adds noise to both.
4.2. DPSVM-TDP Algorithm
The DPSVM-TDP algorithm is a method of input perturbation that realizes DP by adding noise to the training data set. In DPSVM-TDP, the data set is preprocessed mainly to normalize the features so that they lie in the interval $[0, 1]$. We assume that $x_{ij}$ is the $j$th feature of the $i$th sample in the data set $D$, and $x'_{ij}$ is the feature after normalization. The normalization formula can be expressed as follows:
$$x'_{ij} = \frac{x_{ij} - \min_{i}\left(x_{ij}\right)}{\max_{i}\left(x_{ij}\right) - \min_{i}\left(x_{ij}\right)}. \tag{14}$$
Subsequently, we adopt the Laplace mechanism to add artificial noise to the training data set. According to formula (12), the scale parameter of the noise is determined by both the global sensitivity of the training data set and the privacy budget. The process of adding artificial noise can be expressed by the following formula:
$$\tilde{D} = D' + \operatorname{Lap}\left(\frac{\Delta f_{D}}{\epsilon}\right), \tag{15}$$
where $D'$ is the normalized training data set and the scale parameter of the Laplace noise is $\Delta f_{D} / \epsilon$. $\Delta f_{D}$ is the global sensitivity of the training data set, and its calculation is shown in Section 5. $\tilde{D}$ is used to represent the training data set after adding noise.
Next, we project the features of the samples in $\tilde{D}$ to the high-dimensional space and use the kernel function to replace the inner product between the features in the high-dimensional space. Therefore, we can obtain the expression of the classification hyperplane as follows:
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} K\left(\tilde{x}_{i}, x\right) + b\right). \tag{16}$$
Finally, we train the model and input the test data set to obtain the classification accuracy.
The DPSVM-TDP procedure is described in Algorithm 1.
[Algorithm 1: DPSVM-TDP.]
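A minimal Python sketch of these steps (normalization, Laplace perturbation of the features, kernel SVM training), assuming scikit-learn; the helper name dpsvm_tdp and the RBF default are illustrative, and the sensitivity value follows Lemma 1 in Section 5:

```python
import numpy as np
from sklearn.svm import SVC

def dpsvm_tdp(X, y, epsilon, kernel="rbf", rng=None):
    rng = rng or np.random.default_rng()
    # Step 1: min-max normalization so that every feature lies in [0, 1] (formula (14)).
    X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    # Step 2: perturb the training features with Laplace noise; by Lemma 1,
    # the global sensitivity of the training data set is d, the number of features.
    d = X.shape[1]
    X_noisy = X_norm + rng.laplace(scale=d / epsilon, size=X_norm.shape)
    # Step 3: train a kernel SVM on the perturbed data.
    return SVC(kernel=kernel).fit(X_noisy, y)

# Usage (the test set must be normalized with the same min/max statistics):
#   model = dpsvm_tdp(X_train, y_train, epsilon=1.0)
#   model.score(X_test_norm, y_test)
```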
4.3. DPSVM-KFP Algorithm
In DPSVM-KFP, the method of preprocessing the data set is the same as that in DPSVM-TDP. In addition, we do not need to add noise to the training data set during the perturbation process. According to Section 3.2, the kernel function can be used to replace the inner product of two high-dimensional vectors in an SVM to solve the issue of nonlinear classification. On this basis, to implement DP, Laplace noise is added to the kernel function. The scale of the noise is determined by both the global sensitivity of the kernel function and the privacy budget, and we obtain the following formula:
$$\tilde{K}\left(x_{i}, x_{j}\right) = K\left(x_{i}, x_{j}\right) + \operatorname{Lap}\left(\frac{\Delta f_{K}}{\epsilon}\right), \tag{17}$$
where the scale parameter of the Laplace noise is $\Delta f_{K} / \epsilon$. $\Delta f_{K}$ is the global sensitivity of the kernel function, and its calculation is also shown in Section 5. $\tilde{K}$ is used to represent the kernel function after adding noise. Then, we can obtain the formula of the perturbed classification hyperplane, as shown below:
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} \tilde{K}\left(x_{i}, x\right) + b\right). \tag{18}$$
Finally, we train the model and obtain the classification accuracy on the test data set. The DPSVM-KFP procedure is described in Algorithm 2.
Notably, Algorithm 2 realizes DP protection by perturbing the kernel function. This method is classified as an algorithm perturbation method because it adds noise to a parameter of the given SVM.
[Algorithm 2: DPSVM-KFP.]
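Correspondingly, a minimal sketch of the kernel-perturbation idea (assuming the Gaussian kernel and scikit-learn's precomputed-kernel interface; the helper name dpsvm_kfp is illustrative, and the sensitivity follows Lemma 3 in Section 5):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def dpsvm_kfp(X, y, epsilon, gamma, rng=None):
    rng = rng or np.random.default_rng()
    # Step 1: compute the Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2).
    K = rbf_kernel(X, X, gamma=gamma)
    # Step 2: perturb the kernel values (formula (17)); by Lemma 3, the global
    # sensitivity of the Gaussian kernel is 1 - exp(-gamma * d) for features in [0, 1].
    d = X.shape[1]
    sensitivity = 1.0 - np.exp(-gamma * d)
    K_noisy = K + rng.laplace(scale=sensitivity / epsilon, size=K.shape)
    # Step 3: train an SVM directly on the perturbed kernel matrix.
    return SVC(kernel="precomputed").fit(K_noisy, y)

# Prediction uses the kernel between the test and training samples:
#   model = dpsvm_kfp(X_tr, y_tr, epsilon=1.0, gamma=1 / X_tr.shape[1])
#   model.predict(rbf_kernel(X_te, X_tr, gamma=1 / X_tr.shape[1]))
```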
4.4. DPSVM-MP Algorithm
In Sections 4.2 and 4.3, we perturb the training data set and the kernel function, respectively, to achieve DP protection. In this part, according to the composability of DP, we decide to allocate the available privacy budget to the training data set and the kernel function proportionally and realize DP protection by adding noise to both of them simultaneously.
The way in which we perturb the training data set is shown in
$$\tilde{D} = D' + \operatorname{Lap}\left(\frac{\Delta f_{D}}{\beta \epsilon}\right), \tag{19}$$
where the scale parameter of the Laplace noise is $\Delta f_{D} / (\beta \epsilon)$. $\beta \in (0, 1)$ is the proportional parameter, which denotes the proportion of the privacy budget that we allocate to the training data set.
Moreover, the way in which we perturb the kernel function is shown in
$$\tilde{K}\left(x_{i}, x_{j}\right) = K\left(x_{i}, x_{j}\right) + \operatorname{Lap}\left(\frac{\Delta f_{K}}{(1 - \beta)\epsilon}\right), \tag{20}$$
where the scale parameter of the Laplace noise is $\Delta f_{K} / ((1 - \beta)\epsilon)$ and $(1 - \beta)$ is the proportion of the privacy budget that we allocate to the kernel function. The remaining steps are similar to those in the DPSVM-TDP and DPSVM-KFP algorithms. The DPSVM-MP procedure is described in Algorithm 3.
In Algorithm 3, we assign a privacy budget of $\beta \epsilon$ to the training data set and a privacy budget of $(1 - \beta)\epsilon$ to the kernel function. According to Definition 4, we know that Algorithm 3 accomplishes $\epsilon$-DP in total. The specific proof is elaborated in Section 5.
[Algorithm 3: DPSVM-MP.]
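A corresponding sketch for the mixed perturbation (again an illustration under the same assumptions; X is assumed to be normalized to $[0, 1]$, and the Gaussian kernel is used):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def dpsvm_mp(X, y, epsilon, beta, gamma, rng=None):
    rng = rng or np.random.default_rng()
    d = X.shape[1]
    # A fraction beta of the budget perturbs the training features (formula (19)).
    X_noisy = X + rng.laplace(scale=d / (beta * epsilon), size=X.shape)
    # The remaining (1 - beta) fraction perturbs the kernel matrix (formula (20)).
    K = rbf_kernel(X_noisy, X_noisy, gamma=gamma)
    sens_k = 1.0 - np.exp(-gamma * d)
    K_noisy = K + rng.laplace(scale=sens_k / ((1 - beta) * epsilon), size=K.shape)
    clf = SVC(kernel="precomputed").fit(K_noisy, y)
    return clf, X_noisy  # X_noisy is needed to build the test-time kernel
```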
5. Privacy Analysis
In this section, we conduct a privacy analysis on the above three algorithms. We first calculate the global sensitivity levels of the training data set and the kernel function and then prove that these algorithms all satisfy DP.
5.1. Global Sensitivity of the Training Data Set
According to the Laplace mechanism of DP, the scale of the added noise is related to the privacy budget and global sensitivity. Since the privacy budget is adjustable, we also need to calculate the global sensitivity to determine the amount of noise added. From Section 4.1, we know that there are two disturbed objects: the training data set and the kernel function, so we should calculate their global sensitivity levels.
Lemma 1. The global sensitivity of the training data set equals $d$.

Proof. Suppose there are two adjacent data sets $D$ and $D'$, and the $k$th record is different. If the global sensitivity of the training data set is denoted as $\Delta f_{D}$, according to the definition of global sensitivity,
$$\Delta f_{D} = \max_{D, D'}\left\|D - D'\right\|_{1}.$$
For the matrices $D$ and $D'$, only the $k$th rows differ, so $\Delta f_{D}$ can be expressed as
$$\Delta f_{D} = \max \sum_{j=1}^{d}\left|x_{kj} - x'_{kj}\right|.$$
Because we normalize the data set, the values of all features belong to the interval $[0, 1]$. Therefore, we obtain the maximum value of $\Delta f_{D}$ when every feature of $x_{k}$ takes the value 1 and every feature of $x'_{k}$ takes the value 0 (or vice versa); that is, $\Delta f_{D} = d$.
5.2. Global Sensitivity of the Kernel Function
Regarding the kernel function, we have three different options, namely, a linear kernel, Gaussian kernel, and sigmoid kernel. According to Section 3.2, we know that these kernel functions are calculated in different ways. Therefore, their global sensitivities are also different.
Suppose there are adjacent data sets $D$ and $D'$, and the $k$th record is different. $K(x_{i}, x_{j})$ is the kernel function of any two records in the data set $D$, while $K(x'_{i}, x'_{j})$ is the kernel function of any two records in the data set $D'$. If we set the sensitivity of the kernel function as $\Delta f_{K}$, according to the definition of sensitivity,
$$\Delta f_{K} = \max\left|K\left(x_{i}, x_{j}\right) - K\left(x'_{i}, x'_{j}\right)\right|.$$
Lemma 2. The global sensitivity of the linear kernel function equals $d$.

Proof. According to Section 3.2, the linear kernel function can be expressed as
$$K\left(x_{i}, x_{j}\right) = x_{i}^{\top} x_{j}.$$
Thus, we can obtain the sensitivity of the linear kernel function:
$$\Delta f_{K} = \max\left|x_{i}^{\top} x_{j} - x'^{\top}_{i} x'_{j}\right|.$$
Because the data set has been normalized, all the features of any sample belong to the interval $[0, 1]$. Therefore, when all the features in $x_{i}$ and $x_{j}$ take the value 1 and all the features in $x'_{i}$ and $x'_{j}$ take the value 0, $\Delta f_{K}$ is maximized. Thus, the sensitivity of the linear kernel function is $\Delta f_{K} = d$.
Lemma 3. The global sensitivity of the Gaussian kernel function equals $1 - e^{-\gamma d}$.

Proof. According to Section 3.2, the Gaussian kernel function can be expressed as
$$K\left(x_{i}, x_{j}\right) = \exp\left(-\gamma\left\|x_{i} - x_{j}\right\|^{2}\right).$$
Thus, we can obtain the global sensitivity of the Gaussian kernel function:
$$\Delta f_{K} = \max\left|\exp\left(-\gamma\left\|x_{i} - x_{j}\right\|^{2}\right) - \exp\left(-\gamma\left\|x'_{i} - x'_{j}\right\|^{2}\right)\right|.$$
Because the data set has been normalized, all the features of any sample belong to the interval $[0, 1]$, so $\left\|x_{i} - x_{j}\right\|^{2} \in [0, d]$. When all the features in $x_{i}$ and $x_{j}$ have the same value, $K(x_{i}, x_{j}) = 1$ is maximized. When all the features in $x'_{i}$ take the value 1 and all the features in $x'_{j}$ take the value 0, $K(x'_{i}, x'_{j}) = e^{-\gamma d}$ is minimized. Thus, the sensitivity of the Gaussian kernel function is $\Delta f_{K} = 1 - e^{-\gamma d}$.
Lemma 4. The global sensitivity of the sigmoid kernel function equals $\tanh(\gamma d + c) - \tanh(c)$.

Proof. According to Section 3.2, the sigmoid kernel function can be expressed as
$$K\left(x_{i}, x_{j}\right) = \tanh\left(\gamma x_{i}^{\top} x_{j} + c\right).$$
Thus, we can obtain the sensitivity of the sigmoid kernel function:
$$\Delta f_{K} = \max\left|\tanh\left(\gamma x_{i}^{\top} x_{j} + c\right) - \tanh\left(\gamma x'^{\top}_{i} x'_{j} + c\right)\right|.$$
The hyperbolic tangent function can be transformed as follows:
$$\tanh(z) = 1 - \frac{2}{e^{2z} + 1}.$$
Therefore, we can transform $K$ according to this identity:
$$K\left(x_{i}, x_{j}\right) = 1 - \frac{2}{e^{2\left(\gamma x_{i}^{\top} x_{j} + c\right)} + 1}.$$
Then, let $t = x_{i}^{\top} x_{j}$ and $g(t) = 2 / \left(e^{2(\gamma t + c)} + 1\right)$, so that $K$ can be expressed as $K = 1 - g(t)$. The function $g(t)$ decreases monotonically when $\gamma > 0$. Moreover, since the data set has been normalized, $t \in [0, d]$, so $K$ is maximized when $t = d$ (where $K = \tanh(\gamma d + c)$) and minimized when $t = 0$ (where $K = \tanh(c)$). Thus, the sensitivity of the sigmoid kernel function is $\Delta f_{K} = \tanh(\gamma d + c) - \tanh(c)$.
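As an informal numerical check of Lemmas 2-4 (our own addition, not part of the formal analysis), one can sample random record pairs in $[0, 1]^{d}$ and verify that the observed kernel differences never exceed the derived bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma, c = 16, 1 / 16, 0.0
bounds = {"linear": d,
          "gaussian": 1 - np.exp(-gamma * d),
          "sigmoid": np.tanh(gamma * d + c) - np.tanh(c)}

gauss = lambda a, b: np.exp(-gamma * np.sum((a - b) ** 2))
sigm = lambda a, b: np.tanh(gamma * (a @ b) + c)

worst = {k: 0.0 for k in bounds}
for _ in range(100_000):
    xi, xj, xi2, xj2 = rng.random((4, d))  # two arbitrary record pairs in [0, 1]^d
    worst["linear"] = max(worst["linear"], abs(xi @ xj - xi2 @ xj2))
    worst["gaussian"] = max(worst["gaussian"], abs(gauss(xi, xj) - gauss(xi2, xj2)))
    worst["sigmoid"] = max(worst["sigmoid"], abs(sigm(xi, xj) - sigm(xi2, xj2)))

for k in bounds:
    print(f"{k}: observed {worst[k]:.4f} <= bound {bounds[k]:.4f}")
```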
5.3. Differential Privacy Proof of Algorithms
Theorem 1. The DPSVM-TDP algorithm satisfies $\epsilon$-DP.

Proof. Suppose that there are two adjacent data sets $D$ and $D'$ whose $k$th records are different. $\operatorname{Lap}(\Delta f_{D} / \epsilon)$ is the Laplace noise added to the training data sets $D$ and $D'$, and the scale parameter of the noise is $b = \Delta f_{D} / \epsilon$. Let $M(D) = D + \operatorname{Lap}(\Delta f_{D} / \epsilon)$ denote the result of adding noise to $D$ (and likewise for $D'$). For any output $t$,
$$\frac{\Pr[M(D) = t]}{\Pr[M(D') = t]} = \frac{\exp\left(-\left\|t - D\right\|_{1} / b\right)}{\exp\left(-\left\|t - D'\right\|_{1} / b\right)} \leq \exp\left(\frac{\left\|D - D'\right\|_{1}}{b}\right) \leq \exp\left(\frac{\Delta f_{D} \cdot \epsilon}{\Delta f_{D}}\right) = e^{\epsilon}.$$
Therefore, the training data set after adding Laplace noise satisfies $\epsilon$-DP. In addition, owing to the post-processing property of DP, our subsequent operations on the perturbed training data set do not affect its privacy protection. As a result, the DPSVM-TDP algorithm satisfies $\epsilon$-DP.
Theorem 2. The DPSVM-KFP algorithm satisfies $\epsilon$-DP.

Proof. Suppose that there are two adjacent data sets $D$ and $D'$ whose $k$th records are different. $K(x_{i}, x_{j})$ is the kernel function of any two records in the data set $D$, while $K(x'_{i}, x'_{j})$ is the kernel function of any two records in the data set $D'$. $\operatorname{Lap}(\Delta f_{K} / \epsilon)$ is the Laplace noise added to the kernel functions, and the scale parameter of the noise is $b = \Delta f_{K} / \epsilon$. Let $M(D) = K(x_{i}, x_{j}) + \operatorname{Lap}(\Delta f_{K} / \epsilon)$ denote the result of adding noise. For any output $t$,
$$\frac{\Pr[M(D) = t]}{\Pr[M(D') = t]} \leq \exp\left(\frac{\left|K\left(x_{i}, x_{j}\right) - K\left(x'_{i}, x'_{j}\right)\right|}{b}\right) \leq \exp\left(\frac{\Delta f_{K} \cdot \epsilon}{\Delta f_{K}}\right) = e^{\epsilon}.$$
Therefore, the kernel function after adding Laplace noise satisfies $\epsilon$-DP. In addition, owing to the post-processing property of DP, our subsequent operations on the kernel function do not affect its privacy protection. As a result, the DPSVM-KFP algorithm satisfies $\epsilon$-DP.
Theorem 3. The DPSVM-MP algorithm satisfies $\epsilon$-DP.

Proof. Suppose there is a data set $D$, and $K$ is the kernel function. $M_{1}$ is a function that perturbs the training data set, and its noise scale parameter is $\Delta f_{D} / (\beta \epsilon)$. $M_{2}$ is a function that perturbs the kernel function, and its noise scale parameter is $\Delta f_{K} / ((1 - \beta)\epsilon)$.
According to Theorem 1, we know that $M_{1}$ satisfies $\beta \epsilon$-DP. In addition, in line with Theorem 2, $M_{2}$ satisfies $(1 - \beta)\epsilon$-DP.
As a result, according to Definition 4, the privacy budget of the DPSVM-MP algorithm is the sum $\beta \epsilon + (1 - \beta)\epsilon = \epsilon$. Therefore, the DPSVM-MP algorithm satisfies $\epsilon$-DP.
6. Experimental Results and Analysis
In this section, we evaluate the performance of the DPSVM-TDP, DPSVM-KFP, and DPSVM-MP algorithms on three UCI data sets.
First, unlike DPSVM-TDP and DPSVM-KFP, DPSVM-MP contains a parameter $\beta$, which may affect the performance of this algorithm. Therefore, we conduct experiments to analyse the classification accuracy of DPSVM-MP by varying the values of $\beta$ and the privacy budget $\epsilon$. Second, according to the results of the experiment in Section 6.1, we determine the value of $\beta$ that maximizes the classification accuracy of DPSVM-MP and then conduct experiments to compare this algorithm with DPSVM-TDP and DPSVM-KFP. Third, to characterize the performance of our algorithms, we compare the algorithm that performs best in the experiment in Section 6.2 with the DPSVM-DVP algorithm [16] in terms of classification accuracy and running time. Three UCI data sets are used in our experiments: DryBean, Sensorless, and Covtype. The data set information is shown in Table 2.
6.1. Performance Evaluation of the DPSVM-MP Algorithm
In the DPSVM-MP algorithm, we allocate the privacy budget to the training data set and the kernel function proportionally and implement DP protection by adding noise to both simultaneously. According to Section 4.4, the privacy budget allocated to the training data set is $\beta \epsilon$, and the privacy budget allocated to the kernel function is $(1 - \beta)\epsilon$. To analyse the performance of the DPSVM-MP algorithm when $\beta$ takes different values, we design the first experiment.
This experiment is based on the DryBean data set, which has a total of 13,611 records, each with 16 features and one label. Moreover, there are two parameters ($\gamma$ and $c$) in the Gaussian kernel and sigmoid kernel. In scikit-learn, $\gamma$ and $c$ default to $1/d$ and 0, respectively, where $d$ is the number of features in the data set. Therefore, in this experiment, we also set $\gamma$ and $c$ to $1/d$ and 0, respectively.
When the parameters $\gamma$ and $c$ are fixed, we can observe how the classification accuracy changes with different proportional parameters $\beta$ by adjusting the value of the privacy budget. The results are shown in Figures 2 and 3. Notably, regardless of which kernel function is adopted, the classification accuracy of the DPSVM-MP algorithm is higher when the parameter $\beta$ is closer to 0.5. In other words, when the privacy budget is relatively evenly distributed between the training data set and the kernel function, the availability of the classification results is better. In addition, we find that the classification accuracy of the DPSVM-MP algorithm with the sigmoid kernel function is the best, while the accuracy achieved when using the linear kernel is relatively low.
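A sketch of how such a sweep over $\beta$ and $\epsilon$ might be scripted (illustrative only; dpsvm_mp is the sketch from Section 4.4, and the exact experimental code is not reproduced here):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

def sweep(X, y, betas=(0.1, 0.3, 0.5, 0.7, 0.9), epsilons=(0.1, 0.5, 1.0)):
    # X is assumed to be min-max normalized to [0, 1].
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    gamma = 1 / X.shape[1]  # the 1/d default discussed above
    results = {}
    for beta in betas:
        for eps in epsilons:
            model, X_tr_noisy = dpsvm_mp(X_tr, y_tr, epsilon=eps, beta=beta, gamma=gamma)
            # Prediction uses the kernel between test samples and the (noisy) training samples.
            K_test = rbf_kernel(X_te, X_tr_noisy, gamma=gamma)
            results[(beta, eps)] = model.score(K_test, y_te)
    return results
```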


6.2. Comparison of the DPSVM-TDP, DPSVM-KFP, and DPSVM-MP Algorithms
In this section, to verify which perturbation algorithm is most efficient, we compare the classification accuracies of the DPSVM-TDP, DPSVM-KFP, and DPSVM-MP algorithms on the Sensorless data set. According to the previous experiment, we know that the performance of the DPSVM-MP algorithm is best when the proportional parameter $\beta = 0.5$. Therefore, in the second experiment, the proportional parameter is set to 0.5. In addition, the two parameters $\gamma$ and $c$ in the Gaussian kernel and sigmoid kernel are still set to $1/d$ and 0, respectively.
The results of the experiment are shown in Figure 4. First, we find that the performance of the DPSVM-KFP algorithm is generally higher than that of DPSVM-TDP. This outcome is caused by the following reason. In DP, if the privacy budget is fixed, the scale of the noise is proportional to the sensitivity of the parameters to be disturbed. In other words, if the sensitivity increases, the noise scale is enhanced, and the classification accuracy worsens. DPSVM-TDP, as an input perturbation method, adds noise to the training data. According to Sections 5.1 and 5.2, the sensitivity of the training data set is $d$. In contrast, DPSVM-KFP, as an algorithm perturbation method, adds noise to the kernel function. If we choose a linear kernel, the sensitivity is also $d$. In this case, the scale of the added noise is the same, but due to the different perturbation locations, there is a slight gap between the accuracies of the DPSVM-TDP and DPSVM-KFP algorithms. For the Gaussian kernel and sigmoid kernel, the sensitivities of the two corresponding kernel functions are $1 - e^{-\gamma d}$ and $\tanh(\gamma d + c) - \tanh(c)$, respectively, which are much smaller than $d$. There is no doubt that in these two cases, the DPSVM-KFP algorithm performs much better than the DPSVM-TDP algorithm, especially when the privacy budget is relatively low.
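To make this gap concrete (our own back-of-the-envelope numbers): the Sensorless data set has $d = 48$ features, so with the $1/d$ default for $\gamma$ and $c = 0$, the three sensitivities evaluate as follows:

```python
import numpy as np

d, gamma, c = 48, 1 / 48, 0.0
print("linear:  ", d)                                    # 48
print("gaussian:", 1 - np.exp(-gamma * d))               # 1 - e^{-1} ~ 0.632
print("sigmoid: ", np.tanh(gamma * d + c) - np.tanh(c))  # tanh(1) ~ 0.762
```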

[Figure 4: classification accuracy comparison; panels (a), (b), and (c).]
In addition, the classification accuracy of DPSVM-MP is always lower than those of DPSVM-KFP and DPSVM-TDP. In DPSVM-MP, we evenly allocate the privacy budget to the training data set and the kernel function and disturb both of them. Consequently, whether for the training data set or the kernel function, the allocated privacy budget is only half of that in DPSVM-TDP and DPSVM-KFP, and the scale of the noise added in DPSVM-MP is twice that of DPSVM-KFP and DPSVM-TDP. For instance, in DPSVM-TDP, we add noise with a scale parameter of $\Delta f_{D} / \epsilon$ to the training data set. However, in DPSVM-MP, we not only have to add noise with a scale parameter of $2\Delta f_{D} / \epsilon$ to the training data set but also add noise with a scale parameter of $2\Delta f_{K} / \epsilon$ to the kernel function. In addition, since $\epsilon / 2$ is smaller than $\epsilon$, the noise added in DPSVM-MP is greater. Therefore, the performance of DPSVM-MP, as shown in the experimental results, is worse than that of DPSVM-TDP and DPSVM-KFP.
In summary, to achieve DP protection for an SVM, the kernel function perturbation method has the best usability, perturbing the training data set is the second-best approach, and the method of mixed perturbation has the worst effect.
6.3. Performance Evaluation of the DPSVM-KFP and DPSVM-DVP Algorithms
In the previous experiment, we verified that the classification accuracy of DPSVM-KFP is superior to that of DPSVM-TDP and DPSVM-MP. To further evaluate the performance of DPSVM-KFP, in this section, we compare it with an SVM and DPSVM-DVP [16] on the Covtype data set.
DPSVM-DVP is an algorithm perturbation method that does not directly perturb the normal vector of the classification hyperplane but rather perturbs the dual variable of the support vector. The algorithm first solves the dual problem of SVMs by the SMO method, and the difference between the estimated value and real value of each support vector is recorded. Then, DPSVM-DVP calculates the ratio of each support vector to the sum of all support vectors and adds different levels of Laplace noise to the dual variables corresponding to these ratios.
Notably, both DPSVM-KFP and DPSVM-DVP are algorithm perturbation methods, but the disturbed parameters are quite different. The former disturbs the kernel function $K$, while the latter disturbs the dual variables $\alpha$. In the following experiment, for these two algorithms, we choose the sigmoid kernel function. In addition, the parameters $\gamma$ and $c$ are still set to $1/d$ and 0, respectively.
First, we analyse the relationship between the classification accuracy and the privacy budget. We fix the number of samples to 5000 here. The privacy budget $\epsilon$ is set to 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, and 1 in succession. For each privacy budget, we conduct three experiments by using the SVM, DPSVM-KFP, and DPSVM-DVP algorithms. The results of the experiment are shown in Figure 5. Notably, since no noise is added by the SVM algorithm, it has the highest classification accuracy and is not affected by the privacy budget. In addition, when the privacy budget is greater than 0.5, the classification accuracy of the DPSVM-DVP algorithm is slightly higher than that of the DPSVM-KFP algorithm, both of which are above 0.8. However, if $\epsilon < 0.5$, the usability of DPSVM-KFP is significantly higher than that of DPSVM-DVP. Finally, when the privacy budget is less than 0.001, the classification accuracies of the two algorithms both tend to 0.5.

As a result, in terms of usability, the classification accuracy of DPSVM-KFP remains more stable than that of DPSVM-DVP as the privacy budget decreases. In addition, we find that DPSVM-KFP is able to withstand more noise while maintaining acceptable classification accuracy.
Second, we analyse the relationship between the classification accuracy and the sample size. The privacy budget is fixed to 1. The number of samples is set to 100, 500, 1000, 5000, 10000, 20000, and 40000 in succession. The results of the experiment are shown in Figure 6. As the number of samples increases, the classification accuracies of DPSVM-KFP and DPSVM-DVP exhibit upward trends. However, in general, DPSVM-KFP performs better at both low and high data volumes.

Third, in addition to algorithm availability, running time is also an important criterion for measuring the pros and cons of an algorithm. Therefore, we fix the privacy budget to 1. The number of samples is set to 500, 1000, 5000, and 10000 in succession. Then, we observe the running times of the SVM, DPSVM-KFP, and DPSVM-DVP.
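A simple way to collect such timings (illustrative; any of the training routines sketched earlier can be passed in):

```python
import time

def timed_fit(train_fn, *args, **kwargs):
    """Run a training routine and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = train_fn(*args, **kwargs)
    return result, time.perf_counter() - start

# e.g., _, secs = timed_fit(dpsvm_kfp, X_tr, y_tr, epsilon=1.0, gamma=1 / X_tr.shape[1])
```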
The results of the experiment are shown in Figure 7. It can be seen from the figure that the running times of the SVM and DPSVM-KFP are close and shorter than that of DPSVM-DVP. According to Section 4.2, in DPSVM-KFP, we only add noise to the calculated value of the kernel function. Therefore, the running time of DPSVM-KFP is close to that of the SVM. In contrast, we need to calculate the privacy budget allocated to each dual variable in the DPSVM-DVP algorithm, which undoubtedly requires a longer running time. Especially when dealing with large-scale data sets, our algorithm can save more running time. For example, when the number of samples is equal to 10,000, algorithm DPSVM-DVP takes 93.6 minutes, while our algorithm only takes 72.9 minutes, saving nearly 22% of the running time.

7. Conclusions
Currently, nonlinear classification and private data leakage are two major challenges in big data mining, and few studies address these issues in SVMs.
To solve these problems, we propose a universal framework and three specific algorithms in this paper. Moreover, we conduct a privacy analysis by calculating the sensitivities of the DPSVM-TDP, DPSVM-KFP, and DPSVM-MP algorithms. Then, we prove that all three algorithms satisfy DP. Last, we complete three experiments and obtain the following results. (1) When we disturb the training data set and the kernel function simultaneously, the more evenly the privacy budget is allocated, the better the performance of the DPSVM-MP algorithm. (2) Perturbing the kernel function is more efficient than perturbing the training data set or using mixed perturbation, and its classification accuracy is higher, especially for the sigmoid kernel. (3) Compared with DPSVM-DVP, DPSVM-KFP has better classification accuracy and stability and a shorter running time.
In addition, there are some shortcomings in our research. For example, the linear kernel, Gaussian kernel, and sigmoid kernel are suitable for different types of data sets with SVMs. On the Sensorless data set, the sigmoid kernel is able to endure a lower privacy budget than the linear kernel and the Gaussian kernel. However, a single data set is insufficient for comparing the fitness of these three kernel functions under DP. As a result, in the future, we will add Laplace noise to these kernel functions on different types of data sets and compare their classification accuracies.
Data Availability
The DryBean data set used to support the findings of this study is available at https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset/. The Sensorless data set is available at https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis/. The Covtype data set is available at https://archive.ics.uci.edu/ml/machine-learning-databases/covtype/.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61872197 and 61972209), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX180891), the Natural Science Foundation of Jiangsu Province (BK20161516 and BK20160916), and the Postdoctoral Science Foundation Project of China (2016M601859).