Research Article

Matching Large Scale Ontologies Based on Filter and Verification

Algorithm 1

Pseudocode description of the framework.
 Inputs: Source ontology (SO) and Target ontology (TO)
 Outputs: Alignments (A)
 Variables: i, k
 hashmap<(length, prefix), ID> //saves the length and prefix of entities in SO and TO
 typicalList<IDeso, IDeto> //saves typical entities in SO and TO
 blockSetSO //saves the set of blocks of SO
 blockSetTO //saves the set of blocks of TO
 subOntoSetSO //saves the set of sub-ontologies of SO
 subOntoSetTO //saves the set of sub-ontologies of TO
(1)//Step 1: generate typical entities
(2)for entity ei in SO, TO:
(3) length = number of letters in the label of ei
(4) prefix = prefix of the label of ei
(5) hashmap.put((length, prefix), IDei)
(6)end for
(7)for every key (length, prefix) in hashmap:
(8) for every pair (IDei, IDej) in values (IDe1, IDe2, …, IDen):
(9)  if simT(IDei, IDej) > threshold and IDei ∈ SO and IDej ∈ TO: //simT is given by formula (2)
(10)   typicalList.add(IDei, IDej)
(11)  end if
(12) end for
(13)end for
(14)//Step 2: clustering typical entities
(15)for every pair (IDeso, IDeto) in typicalList<IDeso, IDeto>:
(16) setSO.add(IDeso), setTO.add(IDeto)
(17) blockSetSO = partition(setSO), blockSetTO = partition(setTO) //partition is described in [15]
(18)end for
(19)//Step 3: partitioning/merging blocks
(20)for every entity set Si in blockSetSO:
(21) for every entity IDeso[k] in Si:
(22)  tempEntity = typicalList.get(indexOf(IDeso[k])).get(1)
(23)  if k = 0:
(24)   //correSet records the corresponding setTO in blockSetTO
(25)   correSet = setTO which contains tempEntity in blockSetTO
(26)  else if tempEntity is not in correSet:
(27)   //sourceSet records the corresponding setSO in blockSetSO
(28)   sourceSet = setSO which contains tempEntity in blockSetSO
(29)   correSet.add(tempEntity)
(30)   sourceSet.delete(tempEntity)
(31)  end if
(32) end for
(33)end for
(34)//Step 4: extracting sub-ontologies
(35)for every entity set Si in blockSetSO, blockSetTO:
(36) do:
(37)  n = Si.size
(38)  tempSet = ∅
(39)  //rH, rC, and ek are given by formulas (3)∼(8)
(40)  for every entity IDeso[k] in Si:
(41)   candidateEntity = IDeso[k].get(rdfs:subClassOf)
(42)   if rH(candidateEntity, IDeso) > threshold:
(43)    tempSet.add(candidateEntity)
(44)   candidateEntity = IDeso[k].get(rdfs:hasSomeValueFrom)
(45)   if rC(candidateEntity, IDeso) > threshold:
(46)    tempSet.add(candidateEntity)
(47)   update extension factor ek
(48)  end for
(49)  Si.add(tempSet)
(50) while (Si.size != n)
(51)end for
(52)subOntoSetSO = blockSetSO
(53)subOntoSetTO = blockSetTO
(54)//Step 5: matching sub-ontologies
(55)for every pair of sub-ontologies (subSO[i], subTO[i]) in subOntoSetSO, subOntoSetTO:
(56) //matchStruc uses V-DOC [18].
(57) structureAlignment = matchStruc(subSO[i], subTO[i])
(58) //matchSema uses GMO [19].
(59) semanticAlignment = matchSema(subSO[i], subTO[i], structureAlignment)
(60) A.add(structureAlignment), A.add(semanticAlignment)
(61)end for
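
To make the filter idea in Step 1 and the fixpoint loop in Step 4 more concrete, the following is a minimal Python sketch, not the authors' implementation. Several pieces are assumptions made only for illustration: the prefix length `PREFIX_LEN`, the use of `difflib.SequenceMatcher` as a stand-in for simT (formula (2) is not reproduced here), the single `relatedness` callback that replaces both rH and rC, and the omission of the extension factor ek. The function names `block_key`, `typical_pairs`, and `extend_block` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import product

PREFIX_LEN = 3  # assumed value; the pseudocode does not fix a prefix length


def block_key(label):
    """Key from lines (3)-(4): (number of letters, prefix of the label)."""
    letters = [c for c in label.lower() if c.isalpha()]
    return len(letters), "".join(letters[:PREFIX_LEN])


def sim_t(a, b):
    """Stand-in similarity; the paper's simT (formula (2)) may differ."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def typical_pairs(so_labels, to_labels, threshold=0.9):
    """Step 1 sketch. Inputs map entity ID -> label; returns (IDeso, IDeto) pairs."""
    buckets = defaultdict(lambda: ([], []))  # key -> (SO ids, TO ids)
    for side, labels in enumerate((so_labels, to_labels)):
        for ent_id, label in labels.items():
            buckets[block_key(label)][side].append(ent_id)
    pairs = []
    for so_ids, to_ids in buckets.values():
        # Filter: only entities that share a (length, prefix) key are compared.
        for s, t in product(so_ids, to_ids):
            if sim_t(so_labels[s], to_labels[t]) > threshold:
                pairs.append((s, t))
    return pairs


def extend_block(block, neighbours, relatedness, threshold):
    """Step 4 sketch: add related neighbours until the block stops growing.

    `neighbours` maps an entity to its candidate entities (e.g. superclasses);
    `relatedness(candidate, entity)` is an assumed replacement for rH / rC.
    """
    block = set(block)
    while True:
        size_before = len(block)
        block |= {c for e in block for c in neighbours.get(e, ())
                  if relatedness(c, e) > threshold}
        if len(block) == size_before:  # mirrors "while (Si.size != n)"
            return block


if __name__ == "__main__":
    so = {"s1": "Heart", "s2": "Lung"}
    to = {"t1": "heart", "t2": "Liver", "t3": "Hearth"}
    print(typical_pairs(so, to))
```

Because comparisons happen only inside a bucket, the number of simT evaluations depends on bucket sizes rather than on the product of the two ontology sizes. The cost is that labels differing in length or in their first letters never meet, which is acceptable here only because Step 1 seeks a small set of high-confidence typical entities rather than the full alignment.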