Research Article
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies
Algorithm 5
CUDA multi-GPU code of the NLM algorithm with the partial unrolling strategy.
| (1) /* "my_in_img" and "my_out_img" are, respectively, the sections of the images "in_img" and "out_img" split among the "n_gpus" GPUs. */ |
| (2) int const i_1 = threadIdx.x + blockDim.x*blockIdx.x; |
| (3) int const i_2 = threadIdx.y + blockDim.y*blockIdx.y; |
| (4) /* local statements */ |
| (5) if ((i_1 >= 0) && (i_1 < X_Dim) && (i_2 >= 0) && (i_2 < Y_Dim)) { |
| (6) for (i_3 = 0; i_3 < Z_Dim/n_gpus; i_3++) { |
| (7) /* compute my_out_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus] using my_in_img[i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim/n_gpus] */ |
| (8) } } |
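The listing above leaves the host side and the per-voxel computation implicit. The following is a minimal sketch of how the pieces might fit together, not the authors' implementation. Several things in it are assumptions: the helper names `process_voxel` and `run_multi_gpu` are hypothetical, the volume is assumed to be stored with `i_1` varying fastest so that each GPU receives a contiguous slab of `Z_Dim/n_gpus` slices, the slab-local index is taken to be `i_1 + i_2*X_Dim + i_3*X_Dim*Y_Dim`, and the NLM computation is replaced by a plain copy so that the partitioning logic can be checked in isolation.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-voxel NLM computation. A real filter
// would average voxels in a search window, weighted by patch similarity.
__device__ float process_voxel(const float* my_in_img, int idx) {
    return my_in_img[idx];
}

// Partial unrolling: one thread per (i_1, i_2) column; the z loop stays
// inside the kernel and covers only this GPU's slab of my_Z_Dim slices.
__global__ void nlm_partial_unroll(const float* my_in_img, float* my_out_img,
                                   int X_Dim, int Y_Dim, int my_Z_Dim) {
    int const i_1 = threadIdx.x + blockDim.x * blockIdx.x;
    int const i_2 = threadIdx.y + blockDim.y * blockIdx.y;
    if (i_1 < X_Dim && i_2 < Y_Dim) {
        for (int i_3 = 0; i_3 < my_Z_Dim; ++i_3) {
            int const idx = i_1 + i_2 * X_Dim + i_3 * X_Dim * Y_Dim;
            my_out_img[idx] = process_voxel(my_in_img, idx);
        }
    }
}

// Host driver (hypothetical). Returns false if no CUDA device is usable.
// For brevity the GPUs are driven one after another; a real multi-GPU code
// would use one host thread or stream per device so that they overlap.
bool run_multi_gpu(const std::vector<float>& in_img,
                   std::vector<float>& out_img,
                   int X_Dim, int Y_Dim, int Z_Dim) {
    assert(in_img.size() == static_cast<std::size_t>(X_Dim) * Y_Dim * Z_Dim);
    assert(out_img.size() == in_img.size());

    int n_gpus = 0;
    if (cudaGetDeviceCount(&n_gpus) != cudaSuccess || n_gpus < 1) return false;
    // Assumption: use the largest device count that divides Z_Dim evenly.
    while (Z_Dim % n_gpus != 0) --n_gpus;

    int const my_Z_Dim = Z_Dim / n_gpus;
    std::size_t const slab = static_cast<std::size_t>(X_Dim) * Y_Dim * my_Z_Dim;
    std::size_t const bytes = slab * sizeof(float);

    dim3 const block(16, 16);
    dim3 const grid((X_Dim + block.x - 1) / block.x,
                    (Y_Dim + block.y - 1) / block.y);

    for (int g = 0; g < n_gpus; ++g) {
        float* d_in = nullptr;
        float* d_out = nullptr;
        if (cudaSetDevice(g) != cudaSuccess) return false;
        if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
        if (cudaMalloc(&d_out, bytes) != cudaSuccess) {
            cudaFree(d_in);
            return false;
        }
        // Copy this GPU's slab in, process it, and copy the result back.
        cudaMemcpy(d_in, in_img.data() + g * slab, bytes,
                   cudaMemcpyHostToDevice);
        nlm_partial_unroll<<<grid, block>>>(d_in, d_out,
                                            X_Dim, Y_Dim, my_Z_Dim);
        cudaError_t const err = cudaMemcpy(out_img.data() + g * slab, d_out,
                                           bytes, cudaMemcpyDeviceToHost);
        cudaFree(d_in);
        cudaFree(d_out);
        if (err != cudaSuccess) return false;
    }
    return true;
}
```

Because the stand-in only copies voxels, the slabs here are independent. An actual NLM filter reads a neighbourhood around each voxel, so each slab would presumably also need a few extra boundary slices from its neighbours; that exchange is omitted from this sketch.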