| MapReduce-based LOF outlier detection |
|---|
| **Input:** X: input data set; k: number of nearest neighbors; θ: threshold for LOF; N: number of data blocks |
| **Output:** data set of the points whose LOF exceeds θ |
| Initialize a Hadoop Job |
| Set the TaskMapReduce class |
| Logically divide X into N data blocks |
| In the TaskMapReduce of each data block: |
| **FirstMapper** |
| input: the m data points of the block |
| output <key, value> = < di, [(ok, dis(di, ok))], k-dis(di) > |
| for each data di, i = 1, 2, ..., m do |
| Calculate disij = distance(di, dj), j = 1, ..., m |
| Sort disij of di |
| for each disij of di do |
| if … & … |
| add (dj, disij) to the k-distinct-neighbor record of di |
| end |
| Calculate the k-distinct-distance record k-dis(di) |
| end |
| **FirstReducer** |
| input <key, value> = < di, [(ok, dis(di, ok))], k-dis(di) > |
| output <key, value> = < di, [(ok, dis(di, ok))], k-dis(di) > |
| **SecondMapper** |
| input <key, value> = < di, [(ok, dis(di, ok))], k-dis(di) > |
| output <key, value> = < di, [(ok, reach-dis(di, ok))] > |
| for ok ∈ k-distinct-neighbor do |
| if k-dis(di) < dis(di, ok) |
| reach-dis(di, ok) = dis(di, ok) |
| else reach-dis(di, ok) = k-dis(di) |
| end |
| **SecondReducer** |
| input <key, value> = < di, (ok, reach-dis(di, ok)) > |
| output <key, value> = < di, lrd(di) > |
| for value do |
| lrd(di) = count(di) / Σ reach-dis(di, ok), where the sum runs over ok ∈ k-distinct-neighbor(di) and count(di) is the number of such ok |
| end |
| **ThirdMapper** |
| input <key, value> = < di, lrd(di) > |
| output <key, value> = < di (lof(di) > θ), lof(di) > |
| for ok ∈ k-distinct-neighbor do |
| lof(di) = [Σ lrd(ok) / lrd(di)] / count(di), where the sum runs over ok ∈ k-distinct-neighbor(di) |
| end |
| if lof(di) > θ |
| output < di, lof(di) > |
| **ThirdReducer** |
| input <key, value> = < di (lof(di) > θ), lof(di) > |
| output <key, value> = < di, lof(di) > |
| for value do |
| Sort the points by lof(di) and record them |
| end |
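The partitioning step and the FirstMapper can be sketched in plain Python that runs locally instead of on Hadoop. This is a minimal sketch under stated assumptions: the helper names `split_into_blocks` and `first_mapper` are hypothetical, blocks are taken to be contiguous index ranges, and because the neighbor condition of the pseudocode is not recoverable, "k-distinct" is read as follows: k-dis(di) is the distance to the k-th neighbor whose coordinates differ from those of di, and the k-distinct-neighbor record holds every point of the block no farther than k-dis(di). Following the loop over j = 1, ..., m, distances are computed only within one block.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# One record per point: ([(neighbor index o, dis(d, o)), ...], k-dis(d))
Record = Tuple[List[Tuple[int, float]], float]


def split_into_blocks(n_points: int, N: int) -> List[List[int]]:
    """Logically divide point indices 0..n_points-1 into N contiguous blocks (assumed layout)."""
    size, extra = divmod(n_points, N)
    blocks, start = [], 0
    for b in range(N):
        end = start + size + (1 if b < extra else 0)
        blocks.append(list(range(start, end)))
        start = end
    return blocks


def first_mapper(block: List[int], X: Sequence[Point], k: int) -> Dict[int, Record]:
    """Build the k-distinct-neighbor record and k-dis for every point of one block.

    Distances are computed only between points of the same block.
    The "k-distinct" reading (k-th neighbor with different coordinates) is an assumption.
    """
    records: Dict[int, Record] = {}
    for i in block:
        # dis_ij to every other point of the block, sorted ascending (ties broken by index)
        dists = sorted((math.dist(X[i], X[j]), j) for j in block if j != i)
        seen = set()  # distinct coordinates met so far, excluding X[i] itself
        k_dis = None
        for d, j in dists:
            if tuple(X[j]) != tuple(X[i]):
                seen.add(tuple(X[j]))
                if len(seen) == k:
                    k_dis = d  # distance to the k-th distinct neighbor
                    break
        if k_dis is None:
            raise ValueError(f"point {i}: fewer than {k} distinct neighbors in its block")
        # Neighborhood: every point within k_dis (ties and duplicates included)
        neighbors = [(j, d) for d, j in dists if d <= k_dis]
        records[i] = (neighbors, k_dis)
    return records


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
    block = split_into_blocks(len(X), N=1)[0]
    records = first_mapper(block, X, k=2)
    print(records[4])  # prints ([(3, 7.0), (2, 8.0)], 8.0)
```

Reading k-dis(di) this way keeps it strictly positive even when a block contains exact duplicates of di, which avoids a zero denominator when lrd(di) is computed later.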
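SecondMapper and SecondReducer fit the same local style, with a dictionary standing in for Hadoop's shuffle. The function names are hypothetical, and one choice departs from the pseudocode deliberately: the sketch uses the usual LOF reachability distance, reach-dis(di, ok) = max(k-dis(ok), dis(di, ok)), which takes the neighbor's k-distance. With k-dis(di) in its place, every reach-dis(di, ok) would equal k-dis(di), because each ok already lies within k-dis(di) of di. Reading k-dis(ok) requires the records of other keys; here a plain dict provides it, whereas a Hadoop job would presumably need an additional join. The sample records correspond to the five 1-D points 0, 1, 2, 3, 10 with k = 2.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Output of the first stage, per point i: ([(neighbor o, dis(i, o)), ...], k-dis(i))
Record = Tuple[List[Tuple[int, float]], float]


def second_mapper(
    i: int, record: Record, k_dis: Dict[int, float]
) -> Iterator[Tuple[int, Tuple[int, float]]]:
    """Emit <i, (o, reach-dis(i, o))> for every neighbor o of point i."""
    neighbors, _ = record
    for o, d in neighbors:
        # Usual LOF reachability distance: max(k-dis(o), dis(i, o))
        yield i, (o, max(k_dis[o], d))


def second_reducer(i: int, values: List[Tuple[int, float]]) -> float:
    """lrd(i) = number of neighbors / sum of their reach-dis values."""
    total = sum(reach for _, reach in values)
    return len(values) / total


def run_second_stage(records: Dict[int, Record]) -> Dict[int, float]:
    """Run map, shuffle (group by key) and reduce locally; returns lrd per point."""
    k_dis = {i: rec[1] for i, rec in records.items()}  # stands in for a join on k-dis(o)
    grouped: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for i, rec in records.items():
        for key, value in second_mapper(i, rec, k_dis):
            grouped[key].append(value)
    return {i: second_reducer(i, values) for i, values in grouped.items()}


if __name__ == "__main__":
    records = {
        0: ([(1, 1.0), (2, 2.0)], 2.0),
        1: ([(0, 1.0), (2, 1.0)], 1.0),
        2: ([(1, 1.0), (3, 1.0)], 1.0),
        3: ([(2, 1.0), (1, 2.0)], 2.0),
        4: ([(3, 7.0), (2, 8.0)], 8.0),
    }
    lrd = run_second_stage(records)
    print({i: round(v, 4) for i, v in lrd.items()})
    # prints {0: 0.6667, 1: 0.6667, 2: 0.6667, 3: 0.6667, 4: 0.1333}
```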
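The ThirdMapper and ThirdReducer amount to averaging the ratio lrd(ok) / lrd(di) over the neighbors of di, keeping di when that average exceeds θ, and sorting what remains. The sketch assumes that the lrd values of all neighbors are available in one dict and that the final sort is by descending LOF, so the strongest outlier comes first; the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple


def third_mapper(
    i: int, neighbors: List[int], lrd: Dict[int, float], theta: float
) -> Iterator[Tuple[int, float]]:
    """Emit <i, lof(i)> only when lof(i) > theta."""
    # lof(i): average of lrd(o) / lrd(i) over the neighbors o of i
    lof = sum(lrd[o] for o in neighbors) / (len(neighbors) * lrd[i])
    if lof > theta:
        yield i, lof


def third_reducer(pairs: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Sort the detected outliers by LOF, largest first (assumed order)."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def run_third_stage(
    neighbors: Dict[int, List[int]], lrd: Dict[int, float], theta: float
) -> List[Tuple[int, float]]:
    """Apply the mapper to every point, then pass all emitted pairs to the reducer."""
    emitted = [pair for i in neighbors for pair in third_mapper(i, neighbors[i], lrd, theta)]
    return third_reducer(emitted)


if __name__ == "__main__":
    neighbors = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2, 1], 4: [3, 2]}
    lrd = {0: 2 / 3, 1: 2 / 3, 2: 2 / 3, 3: 2 / 3, 4: 2 / 15}
    outliers = run_third_stage(neighbors, lrd, theta=1.5)
    print([(i, round(v, 4)) for i, v in outliers])  # prints [(4, 5.0)]
```

With the lrd values of the five points 0, 1, 2, 3, 10 (k = 2) and θ = 1.5, only point 10 (index 4) is reported, with an LOF of about 5; the four clustered points each have an LOF of exactly 1.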