Research Article

An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering

Algorithm 3

MapReduce-based document similarity calculation.
Document similarity calculation
Input: <m, list(d)>
Output: <pair of d, similarity>
Notation: Write (k, v) outputs <k, v>
Class mapper
 Method map (heading, list(d))
  m ← heading
  For each d1 ∈ D
   r ← Sim (m, d1)
   For each d2 in list(d)
    s ← string (d1 + "&" + d2)
    Write (s, r)
   End for
  End for
Class reducer
 Method reduce (s, list(r))
  sum ← 0, count ← 0
  For each r in list(r)
   sum ← sum + r
   count ← count + 1
  End for
  Write (s, sum/count)
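The mapper and reducer above can be simulated on a single machine to trace the data flow. The following Python sketch is illustrative only and rests on several assumptions. The paper's ontology-based Sim(m, d) is not reproduced here, so `toy_sim` is a hypothetical lookup table with made-up scores. D is assumed to be the full document collection. The shuffle step that a MapReduce framework performs between the map and reduce phases is emulated with an in-memory dictionary. All function and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for the paper's ontology-based Sim(m, d); scores are made up.
TOY_SCORES: Dict[Tuple[str, str], float] = {
    ("Neoplasms", "d1"): 0.9,
    ("Neoplasms", "d2"): 0.6,
    ("Diabetes Mellitus", "d1"): 0.2,
    ("Diabetes Mellitus", "d2"): 1.0,
}


def toy_sim(heading: str, doc: str) -> float:
    """Hypothetical Sim(m, d): look up a fixed score, defaulting to 0.0."""
    return TOY_SCORES.get((heading, doc), 0.0)


def mapper(heading: str, docs_with_heading: List[str], collection: Iterable[str],
           sim: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Map one input record <m, list(d)> to (d1&d2, r) pairs."""
    m = heading
    emitted = []
    for d1 in collection:            # assumption: D is the whole collection
        r = sim(m, d1)               # similarity of heading m to document d1
        for d2 in docs_with_heading:
            s = d1 + "&" + d2        # composite key for the ordered document pair
            emitted.append((s, r))
    return emitted


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(s: str, values: List[float]) -> Tuple[str, float]:
    """Average all similarity values emitted for one document pair."""
    total, count = 0.0, 0
    for r in values:
        total += r
        count += 1
    return s, total / count


def run_job(index: Dict[str, List[str]], collection: List[str],
            sim: Callable[[str, str], float]) -> Dict[str, float]:
    """Run map, shuffle and reduce in memory over a heading -> documents index."""
    emitted: List[Tuple[str, float]] = []
    for heading, docs in index.items():
        emitted.extend(mapper(heading, docs, collection, sim))
    return dict(reducer(s, rs) for s, rs in shuffle(emitted).items())


# Hypothetical input: each heading m with the documents list(d) indexed under it.
INDEX = {"Neoplasms": ["d1", "d2"], "Diabetes Mellitus": ["d2"]}
COLLECTION = ["d1", "d2"]

if __name__ == "__main__":
    for pair, score in run_job(INDEX, COLLECTION, toy_sim).items():
        print(pair, round(score, 3))
```

Because the reducer averages every r emitted under the same key, the score for a key d1&d2 is the mean of Sim(m, d1) over the headings m whose document list contains d2. With the made-up scores above, d1&d2 averages 0.9 and 0.2 to give 0.55. The keys are ordered, so d1&d2 and d2&d1 are reduced separately and need not agree.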