Research Article
Facile Conversion and Optimization of Structured Illumination Image Reconstruction Code into the GPU Environment
Figure 4
Inline code. Calling a function requires additional memory accesses to save the machine state on the stack, so frequent function calls cause frequent memory accesses and degrade performance. The forward_diff() and backward_diff() functions are called 24 times in each iteration of the loop, which incurs significant overhead. This figure shows how the function bodies can be used directly in place of the function calls, i.e., inline code. Note that the output of the diff() function is one element shorter than its input along the dimension on which it acts. To preserve the matrix size and avoid creating temporary variables, we use two preallocated matrices (temp1 and temp2) and store the reduced matrix (the output of diff()) into them using array indexing. Because the original Hessian-SIM code fills the reduced dimension with zeros, we set temp1 and temp2 to zeros before calling diff(). We zero these matrices by element-wise multiplication (.*), as we found that it offers better performance than other known approaches.
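The inlining step can be illustrated with a minimal NumPy analogue (the original code is MATLAB). The bodies of forward_diff() and backward_diff() below are assumptions: a first-order difference along one axis, zero-padded at the end (forward) or at the start (backward) so the shape is preserved; the real Hessian-SIM functions may differ. dxx_called() and dxx_inlined() are hypothetical names. The first composes the two function calls; the second inlines their bodies, writing the np.diff output into the preallocated buffers temp1 and temp2 through slicing after zeroing them with an in-place multiply that mirrors the .* trick.

```python
import numpy as np


def forward_diff(data, step, axis):
    """Hypothetical stand-in for Hessian-SIM's forward_diff():
    first-order forward difference along `axis`, zero-padded at the end."""
    out = np.zeros_like(data)
    idx = [slice(None)] * data.ndim
    idx[axis] = slice(0, data.shape[axis] - 1)
    out[tuple(idx)] = np.diff(data, axis=axis) / step
    return out


def backward_diff(data, step, axis):
    """Hypothetical stand-in for backward_diff(): first-order backward
    difference along `axis`, zero-padded at the start."""
    out = np.zeros_like(data)
    idx = [slice(None)] * data.ndim
    idx[axis] = slice(1, data.shape[axis])
    out[tuple(idx)] = np.diff(data, axis=axis) / step
    return out


def dxx_called(x):
    # Second difference along axis 0 via two function calls;
    # each call allocates a new zero-padded output array.
    return backward_diff(forward_diff(x, 1.0, 0), 1.0, 0)


def dxx_inlined(x, temp1, temp2):
    # Same computation for a 2-D array with the function bodies inlined (step = 1).
    # temp1 and temp2 are preallocated buffers with the same shape as x.
    temp1 *= 0                             # zero via element-wise multiply (cf. MATLAB .*)
    temp2 *= 0
    temp1[:-1, :] = np.diff(x, axis=0)     # forward difference; last row stays 0
    temp2[1:, :] = np.diff(temp1, axis=0)  # backward difference; first row stays 0
    return temp2                           # aliases the temp2 buffer
```

Because dxx_inlined() returns the temp2 buffer itself, copy the result if it must survive the next call. Multiplying by zero turns NaN or Inf entries into NaN rather than 0, so the buffers should be allocated with np.zeros_like rather than np.empty_like. np.diff still returns a temporary array here; the buffers only avoid allocating the zero-padded outputs. In NumPy, temp1.fill(0) is the more common zeroing idiom; the in-place multiply is used only to mirror the caption, and no performance claim is made for NumPy.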