Research Article

ASCF: Optimization of the Apriori Algorithm Using Spark-Based Cuckoo Filter Structure

Algorithm 2: ASCF phase two.
INPUT: R = Transactions (lists of items resulting from phase 1), min_sup = Minimum Support, and Frequent Items = RDD of all the frequent items that are stored in the Cuckoo filter.
OUTPUT: K-F-Itemsets = RDD of all the k-frequent itemsets.
Foreach transaction T in R
 filter (T: len(T) ≥ k)
End Foreach
mapPartitions (delete non-frequent items)
Do while k ≥ 2
 Foreach transaction T in Pruned transactions
  filter (T: len(T) ≥ k)
 End Foreach
 mapPartitions (MakePairOfkItems)
 Foreach list in lists of pairs of k items
  flatMap (list: pair of k items)
  Foreach pair in pairs of k items
   map (pair: (Pair, 1))
  End Foreach
 End Foreach
 reduceByKey (Pair, Count)
 End reduceByKey
 K-F-Itemsets = filter (findFrequent (min_sup))
 IF K-F-Itemsets.count() > 1 then
  Foreach itemset in K-F-Itemsets
   F = flatMap (itemset: get Items).distinct()
  End Foreach
  Differ = Frequent Items.subtract (F)
  IF Differ != [ ]
   Cf.delete (Differ)
   k = k + 1
   Frequent Items = F
   Foreach transaction T in Pruned transactions
    filter (T: len(T) ≥ k)
   End Foreach
   mapPartitions (delete non-frequent items)
  Else k = k + 1
 Else k = 1
End
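
The control flow of Algorithm 2 can be traced on a single machine with the minimal Python sketch below. It is not the authors' Spark implementation: the RDD operations are replaced by ordinary list and Counter operations, the Cuckoo filter is replaced by an exact Python set (so there are no false positives), min_sup is assumed to be an absolute count, and the name ascf_phase_two and the returned dictionary layout are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def ascf_phase_two(transactions, frequent_items, min_sup, k=2):
    """Return {k: {itemset: count}} for each level k >= 2 with frequent itemsets."""
    # Assumption: an exact set stands in for the Cuckoo filter (Cf).
    cf = set(frequent_items)
    results = {}

    def prune(rows, size):
        # Delete non-frequent items, then drop transactions shorter than `size`.
        kept = (tuple(sorted(i for i in set(t) if i in cf)) for t in rows)
        return [t for t in kept if len(t) >= size]

    pruned = prune(transactions, k)
    while k >= 2:
        # filter (T: len(T) >= k)
        pruned = [t for t in pruned if len(t) >= k]
        # MakePairOfkItems, map to (itemset, 1), and reduceByKey in one step.
        counts = Counter(c for t in pruned for c in combinations(t, k))
        # findFrequent(min_sup)
        k_frequent = {c: n for c, n in counts.items() if n >= min_sup}
        if k_frequent:
            results[k] = k_frequent
        if len(k_frequent) > 1:
            # F: distinct items that still occur in some frequent k-itemset.
            f = {item for itemset in k_frequent for item in itemset}
            differ = cf - f
            k += 1
            if differ:
                cf -= differ  # Cf.delete(Differ); cf now equals F
                pruned = prune(pruned, k)
        else:
            k = 1  # at most one frequent k-itemset remains, so the loop ends
    return results


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b", "c"], ["a", "b", "d"], ["b", "c"], ["e"]]
    print(ascf_phase_two(data, {"a", "b", "c"}, min_sup=2))
```

Because each k-itemset is counted directly from the pruned transactions, deleting items from the filter only shrinks the transactions that later iterations have to expand; it does not change which itemsets are reported as frequent.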