Research Article
An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering
Algorithm 3: MapReduce-based document similarity calculation.
Document similarity calculation
Input: <m, list(d)>
Output: <pair of d, similarity>
Notation: Write(k, v) outputs <k, v>

Class Mapper
    Method map(heading, list(d))
        m ← heading
        for each d1 ∈ D
            r ← Sim(m, d1)
            for each d2 ∈ list(d)
                s ← string(d1 + "&" + d2)
                Write(s, r)
            end for
        end for

Class Reducer
    Method reduce(s, list(r))
        sum ← 0, count ← 0
        for each r ∈ list(r)
            sum ← sum + r
            count ← count + 1
        end for
        Write(s, sum/count)
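To make the data flow of the mapper, the shuffle, and the reducer concrete, the pseudocode can be simulated on a single machine. The following is a minimal in-memory Python sketch, not the authors' Hadoop implementation. It assumes that D is the whole document collection (the section does not define D), that the input pairs <m, list(d)> come from an index that maps each heading m to the documents annotated with it, and that Sim is passed in by the caller. The names `mapper`, `reducer`, `run_job`, and the toy similarity `toy_sim` (1.0 if the document carries the heading, else 0.0) are hypothetical stand-ins for the paper's ontology-based measure.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Sim = Callable[[str, str], float]  # Sim(heading, document) -> similarity score


def mapper(heading: str, docs_with_heading: List[str],
           all_docs: Iterable[str], sim: Sim) -> Iterator[Tuple[str, float]]:
    """Map step: for each d1 in D, emit ("d1&d2", Sim(m, d1)) for every d2 in list(d)."""
    m = heading
    for d1 in all_docs:
        r = sim(m, d1)
        for d2 in docs_with_heading:
            yield f"{d1}&{d2}", r  # the pair key lets the shuffle group values by (d1, d2)


def reducer(pair: str, values: List[float]) -> Tuple[str, float]:
    """Reduce step: average all similarity values emitted for one document pair."""
    total, count = 0.0, 0
    for r in values:
        total += r
        count += 1
    return pair, total / count


def run_job(index: Dict[str, List[str]], all_docs: List[str], sim: Sim) -> Dict[str, float]:
    """Run map, an in-memory shuffle (group by key), then reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for heading, docs in index.items():
        for key, value in mapper(heading, docs, all_docs, sim):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical toy data: document -> headings it is annotated with.
doc_headings = {"d1": {"Neoplasms"}, "d2": {"Neoplasms", "Diabetes Mellitus"}}
index = {"Neoplasms": ["d1", "d2"], "Diabetes Mellitus": ["d2"]}


def toy_sim(m: str, d: str) -> float:
    # Placeholder for the ontology-based Sim(m, d): 1.0 if d carries heading m.
    return 1.0 if m in doc_headings[d] else 0.0


result = run_job(index, ["d1", "d2"], toy_sim)
print(result)  # {'d1&d1': 1.0, 'd1&d2': 0.5, 'd2&d1': 1.0, 'd2&d2': 1.0}
```

In this toy run, the key "d1&d2" collects one value per heading of d2, namely Sim(Neoplasms, d1) = 1.0 and Sim(Diabetes Mellitus, d1) = 0.0, so the reducer reports their mean, 0.5. The score is therefore not necessarily symmetric ("d2&d1" is 1.0). In a real MapReduce job the grouping done here by the `defaultdict` is performed by the framework's shuffle phase.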