Research Article

A Data Leakage Prevention Method Based on the Reduction of Confidential and Context Terms for Smart Mobile Devices

Algorithm 1: CBDLP.
Input: - Confidential documents set
- Non-confidential documents set
- The minimum similarity threshold
Output: - The set of clusters, each with the centroid and corresponding graph
- The set of confidential terms in clusters
- The set of context terms
(1)
(2) % The result of clustering is saved in
(3) Initialize % The scores of confidential terms are saved in
(4) Initialize % The context terms set of each confidential term is saved in
(5) for (each in )
(6) Calculate the similarity between and the other clusters
(7) Create language model for , and calculate the scores for each confidential term
(8) Initialize the threshold of cluster similarity
(9) while ()
(10) All clusters whose similarity to >
(11) Create language model for the documents of
(12) Based on new language model, Update the scores of confidential terms
(13) for (each confidential term ct in )
(14) Detect the occurrence of ct in
(15) For each context term of ct, calculate the probability that both ct and the context term appear in confidential documents.
(16) For each context term of ct, calculate the probability that both ct and the context term appear in non-confidential documents.
(17) Calculate the value of for each confidential term
(18) Detect all clusters whose similarity is greater than the threshold, and detect the occurrences of all terms in the clusters.
(19) Update the probability of the context terms that appear in the scopes of different confidential terms
(20)
(21) Reduce the value of
(22)
(23)
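
The term-scoring and context-term steps of Algorithm 1 (steps 7 and 13 to 16) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the whitespace tokenizer, the add-one smoothed unigram language model, the fixed token window around ct, and the per-document definition of co-occurrence probability are all assumptions made for illustration, since the listing does not specify them.

```python
from collections import Counter


def tokenize(doc):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return doc.lower().split()


def term_scores(conf_docs, nonconf_docs, smoothing=1.0):
    """Score each term seen in confidential documents by the ratio of its
    smoothed unigram probability in confidential vs. non-confidential text.
    The ratio form and the smoothing are assumptions, not taken from the paper."""
    conf_counts = Counter(t for d in conf_docs for t in tokenize(d))
    non_counts = Counter(t for d in nonconf_docs for t in tokenize(d))
    vocab_size = len(set(conf_counts) | set(non_counts))
    conf_total = sum(conf_counts.values()) + smoothing * vocab_size
    non_total = sum(non_counts.values()) + smoothing * vocab_size
    scores = {}
    for term, count in conf_counts.items():
        p_conf = (count + smoothing) / conf_total
        p_non = (non_counts[term] + smoothing) / non_total
        scores[term] = p_conf / p_non
    return scores


def context_terms(tokens, ct, window):
    """Collect the terms that occur within `window` tokens of any occurrence of ct."""
    found = set()
    for i, tok in enumerate(tokens):
        if tok == ct:
            found.update(tokens[max(0, i - window):i])
            found.update(tokens[i + 1:i + 1 + window])
    found.discard(ct)
    return found


def cooccurrence_probability(docs, ct, context_term, window):
    """Among documents containing ct, return the fraction in which
    context_term appears within the window around ct (0.0 if ct never occurs).
    This per-document estimate is an assumed definition."""
    with_ct = 0
    with_both = 0
    for doc in docs:
        tokens = tokenize(doc)
        if ct in tokens:
            with_ct += 1
            if context_term in context_terms(tokens, ct, window):
                with_both += 1
    return with_both / with_ct if with_ct else 0.0


if __name__ == "__main__":
    # Toy corpora, invented purely for illustration.
    confidential = ["merger budget secret plan", "secret merger with acme"]
    non_confidential = ["lunch menu budget", "merger news in the press"]
    print(term_scores(confidential, non_confidential))
    print(cooccurrence_probability(confidential, "merger", "secret", 2))
    print(cooccurrence_probability(non_confidential, "merger", "secret", 2))
```

On the toy corpora, "secret" appears near "merger" in every confidential document that contains "merger" and in none of the non-confidential ones. This gap between the two probabilities is the kind of signal that steps 15 and 16 compute for each context term.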