Input: A, b, x, N, x_norm, r_norm |
Output: x |
(1) | fork (A, b, x, N, x_norm, r_norm) |
(2) | Th_id, Tt_th {Thread id and total number of threads, assigned by the OS} |
(3) | St_RowID ← Ed_RowID ← 0 {Start and end row ids} |
(4) | if Th_id < N mod Tt_th then |
(5) | St_RowID ← Th_id ∗ N/Tt_th + Th_id |
(6) | Ed_RowID ← St_RowID+N/Tt_th+1 |
(7) | else |
(8) | St_RowID ← Th_id ∗ N/Tt_th + N mod Tt_th |
(9) | Ed_RowID ← St_RowID+N/Tt_th |
(10) | end if |
(11) | r {Local Residual vector for allocated nodes} |
(12) | error ← 1 |
(13) | while error > Threshold do |
(14) | r_tmp ← x_tmp ← 0 {Local Norms for thread} |
(15) | foreach i ← St_RowID, Ed_RowID do |
(16) | repeat |
(17) | r[i-St_RowID] ← r[i-St_RowID] − A[i,j] ∗ x[j] {r[i-St_RowID] starts at b[i]} |
(18) | until every entry j of row i of A has been processed |
(19) | r_tmp ← r_tmp + r[i-St_RowID]² |
(20) | end for |
(21) | x_norm ← r_norm ← 0 {Global Norms} |
(22) | barrier |
(23) | foreach i ← St_RowID, Ed_RowID do |
(24) | x[i] ← x[i] + r[i-St_RowID]/A[i,i] |
(25) | x_tmp ← x_tmp + x[i]² |
(26) | end for |
(27) | lock |
(28) | r_norm ← r_norm + r_tmp |
(29) | x_norm ← x_norm + x_tmp |
(30) | unlock |
(31) | barrier |
(32) | error ← |
(33) | end while |
(34) | join |
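
The listing above can be sketched in Python as a minimal, non-authoritative illustration of the fork/barrier/lock/join structure. Several points are assumptions rather than part of the listing: the names `row_range` and `parallel_jacobi` are hypothetical, `A` is taken to be a dense list of rows with a nonzero diagonal, the `max_iter` guard is added for safety, and the stopping quantity is assumed to be `sqrt(r_norm / x_norm)` because the listing does not state the expression for `error`. The sketch also lets a single thread reset the global norms between two barriers, a slightly more conservative variant than resetting them in every thread.

```python
import math
import threading


def row_range(th_id, n, tt_th):
    """Half-open row range [start, end) owned by thread th_id."""
    base, extra = divmod(n, tt_th)
    if th_id < extra:
        # The first `extra` threads each take one additional row.
        start = th_id * base + th_id
        end = start + base + 1
    else:
        start = th_id * base + extra
        end = start + base
    return start, end


def parallel_jacobi(A, b, tt_th=4, threshold=1e-10, max_iter=10_000):
    """Solve A x = b with Jacobi sweeps split row-wise across tt_th threads."""
    n = len(b)
    x = [0.0] * n
    shared = {"r_norm": 0.0, "x_norm": 0.0}  # global norms
    lock = threading.Lock()
    barrier = threading.Barrier(tt_th)

    def worker(th_id):
        start, end = row_range(th_id, n, tt_th)
        r = [0.0] * (end - start)  # local residual for the owned rows
        error, it = 1.0, 0
        while error > threshold and it < max_iter:
            r_tmp = x_tmp = 0.0  # local norms for this thread
            for i in range(start, end):
                r[i - start] = b[i] - sum(A[i][j] * x[j] for j in range(n))
                r_tmp += r[i - start] ** 2
            barrier.wait()  # all threads have finished reading the old x
            if th_id == 0:
                shared["r_norm"] = shared["x_norm"] = 0.0
            barrier.wait()  # reset is visible before anyone accumulates
            for i in range(start, end):
                x[i] += r[i - start] / A[i][i]
                x_tmp += x[i] ** 2
            with lock:
                shared["r_norm"] += r_tmp
                shared["x_norm"] += x_tmp
            barrier.wait()  # global norms are complete
            # Assumed stopping quantity; the listing leaves it unspecified.
            r_norm, x_norm = shared["r_norm"], shared["x_norm"]
            error = math.sqrt(r_norm / x_norm) if x_norm > 0 else math.sqrt(r_norm)
            it += 1

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(tt_th)]
    for t in threads:
        t.start()  # fork
    for t in threads:
        t.join()
    return x
```

Every thread reads the same global norms after the last barrier, so all threads compute the same `error` and leave the loop in the same iteration. In CPython the global interpreter lock prevents a real speed-up, so this sketch illustrates the synchronization pattern rather than the performance.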