Research Article
An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering
Algorithm 3
Algorithm of MapReduce-based document similarity calculation.
Document similarity calculation
Input: <m, list(d)>
Output: <pair of d, similarity>
Notation: Write (k, v) outputs <k, v>

Class mapper
    Method map (heading, list(d))
        m ← heading
        For each d1 ∈ D
            r ← Sim (m, d1)
            For each d2 in list(d)
                s ← string (d1 + " &" + d2)
                Write (s, r)
            End for
        End for

Class reducer
    Method reduce (s, list(r))
        sum ← 0, count ← 0
        For each r in list(r)
            sum ← sum + r
            count ← count + 1
        End for
        Write (s, sum/count)
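The map and reduce steps of Algorithm 3 can be illustrated with a small in-memory sketch in Python. It is not the authors' implementation: `sim` is a hypothetical stand-in (Jaccard token overlap) for the ontology network-based measure, the set D is assumed to be the same `list(d)` passed to the mapper, documents are assumed to be `(id, text)` pairs, and `run_job` merely imitates the shuffle step that a MapReduce framework would perform.

```python
from collections import defaultdict


def sim(heading, text):
    """Hypothetical placeholder for Sim(m, d): Jaccard overlap of tokens.

    The article's measure is ontology-based; this stand-in only keeps the
    sketch runnable.
    """
    a = set(heading.lower().split())
    b = set(text.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_record(heading, docs):
    """Mapper: emit (pair key, r) for every ordered pair (d1, d2).

    docs is a list of (doc_id, text) tuples (an assumed representation).
    As in the pseudocode, r depends only on the heading and d1.
    """
    for d1_id, d1_text in docs:
        r = sim(heading, d1_text)
        for d2_id, _ in docs:
            yield f"{d1_id} & {d2_id}", r


def reduce_key(key, values):
    """Reducer: average all similarity values collected for one pair key."""
    total, count = 0.0, 0
    for r in values:
        total += r
        count += 1
    return key, total / count


def run_job(records):
    """Imitate map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for heading, docs in records:
        for key, r in map_record(heading, docs):
            grouped[key].append(r)
    return dict(reduce_key(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = [("a", "gene expression in cells"), ("b", "protein folding")]
    records = [("gene expression", docs), ("protein folding", docs)]
    for pair, score in sorted(run_job(records).items()):
        print(pair, score)
```

Because the emitted value is computed from `d1` alone, the keys `a & b` and `b & a` can receive different averages in this sketch; whether the pair score is meant to be symmetric depends on how D and `Sim` are defined in the full method.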