D2D Big Data Privacy-Preserving Framework Based on (a, k)-Anonymity Model
Algorithm 1
Multisensitive value (a, k)-anonymity algorithm based on MapReduce.
Input: Data table DT, anonymity constraint (k), specified sensitive value (s), frequency constraint (a).
Output: Anonymized table that satisfies multisensitive value (a, k)-anonymity.
(1)
Obtain global files from the Distributed File System (DFS)
(2)
Eq-id = findFinestEQ() and |EC| ≥ k. // Find the identifier of the optimal EC; the EC must contain at least k data records.
(3)
Output-V = Ø. //V is a list of values.
(4)
Add key-value pairs ((dim, 1) (s, 1)) for each dimension in .
(5)
Output (k, V ((dim, 1) (s, 1))).
(6)
If , superimpose the frequency of dim. // is the first dimension traversed.
(7)
Otherwise, iterate again until the next and dim are found to have the same value.
(8)
If , superimpose the frequency of s. // is the first SA traversed.
(9)
Otherwise, traverse again until the next and s are found to have the same value.
(10)
Here s contains all the values of SA, and the frequency of occurrence of each SA value is superimposed: (s1, 1) (s2, 1) (s3, 1), …, (sn, 1).
(11)
. // Constrain the SA so that it satisfies the (a, k)-anonymity, refers to the dimension frequency containing SA, and refers to the frequency of the dimension.
(12)
For (i in [1, …, dimension]). // i indicates the number of dimensions.
(13)
c-dim = findDimToCut (V, ).
(14)
For p in [q, …, 2]. // p is the number of ECs, and q is the maximum number of ECs that can be split off.
(15)
If there is no violation of the (a, k)-anonymity during the sharing process, continue cutting, .
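The counting and cutting steps of Algorithm 1 can be illustrated with a minimal single-process Python sketch. It is not the authors' implementation: it assumes the standard reading of (a, k)-anonymity (each EC holds at least k records, and no checked sensitive value exceeds relative frequency a within the EC), it simulates the map and reduce phases in memory rather than on a MapReduce cluster, and it cuts at the median of one dimension. The names map_record, reduce_counts, satisfies_ak, and try_cut are hypothetical and do not appear in the paper.

```python
from collections import Counter

# A record is (quasi-identifier values, sensitive value); this layout is an assumption.

def map_record(eq_id, record):
    """Map phase: emit ((eq_id, item), 1) for each dimension value and for s."""
    quasi, s = record
    for i, value in enumerate(quasi):
        yield (eq_id, ("dim", i, value)), 1
    yield (eq_id, ("s", s)), 1

def reduce_counts(pairs):
    """Reduce phase: superimpose (sum) the frequencies of identical keys."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return counts

def satisfies_ak(records, a, k, sensitive=None):
    """Check one EC: at least k records, and each checked sensitive value
    has relative frequency <= a. If `sensitive` is None, check all values."""
    n = len(records)
    if n < k:
        return False
    freq = Counter(s for _, s in records)
    values = freq if sensitive is None else sensitive
    return all(freq[v] / n <= a for v in values)

def try_cut(records, dim, a, k):
    """Cut an EC at the median of dimension `dim`; keep the cut only if
    both halves still satisfy (a, k)-anonymity, otherwise return None."""
    ordered = sorted(records, key=lambda r: r[0][dim])
    mid = len(ordered) // 2
    left, right = ordered[:mid], ordered[mid:]
    if satisfies_ak(left, a, k) and satisfies_ak(right, a, k):
        return left, right
    return None

# Illustrative data (made up for this sketch): (age, region) -> diagnosis.
records = [
    ((25, "A"), "flu"),
    ((27, "A"), "cold"),
    ((41, "B"), "flu"),
    ((45, "B"), "cold"),
]
pairs = [p for r in records for p in map_record("eq0", r)]
counts = reduce_counts(pairs)
halves = try_cut(records, dim=0, a=0.5, k=2)
```

With this data, cutting on age yields two ECs of two records each, and both satisfy the constraint for a = 0.5 and k = 2. Raising k to 3 makes try_cut return None, so the EC would be left uncut.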