Research Article

Facile Conversion and Optimization of Structured Illumination Image Reconstruction Code into the GPU Environment

Figure 4

Inline code. Calling a function requires additional memory accesses to save the machine state in a region of memory called the stack. Frequent function calls therefore cause frequent memory accesses, which degrades performance. The forward_diff() and backward_diff() functions are called 24 times in each iteration of the loop, which incurs significant performance overhead. This figure shows that the function body can be used directly in place of the function call, i.e., as inline code. Note that the output of diff() is smaller than its input along the dimension on which diff() operates. To preserve the matrix size and avoid creating temporary variables, we use two preallocated matrices (temp1 and temp2) and store the reduced matrix (the output of diff()) in them using array indexing. We set the preallocated matrices (temp1 and temp2) to zeros before calling diff(), because the original Hessian-SIM code fills the reduced dimension with zeros. Note that we use element-wise multiplication (.*) to set a matrix to zeros, as we found that it offers better performance than other known approaches.
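The inlining idea described in this caption can be sketched in MATLAB as follows. This is a minimal illustration under stated assumptions, not the Hessian-SIM source: the input x, the helper fdiff_padded, the first-order difference along rows, and the choice to leave the last row as the zero padding are all hypothetical. Only diff(), the .* operator, and the buffer name temp1 come from the caption.

```matlab
% Hypothetical sketch: x and fdiff_padded are assumed names, and the
% position of the zero padding (last row) is an assumption.
x = magic(4);                        % small 4-by-4 test input
temp1 = zeros(size(x));              % buffer preallocated once, outside any loop

% (a) Function-call version: every call allocates a fresh padded output.
y_call = fdiff_padded(x);

% (b) Inlined version: reset the buffer, then store the diff() output
%     in it by array indexing so the matrix size is preserved.
temp1 = temp1 .* 0;                  % zero the buffer by element-wise multiplication
temp1(1:end-1, :) = diff(x, 1, 1);   % diff() output has one row fewer; last row stays 0
y_inline = temp1;

assert(isequal(y_call, y_inline));   % both versions produce the same matrix

function y = fdiff_padded(x)
    % First-order difference along rows, zero-padded back to size(x).
    y = zeros(size(x));
    y(1:end-1, :) = diff(x, 1, 1);
end
```

One caveat with zeroing by multiplication: it only yields zeros when the buffer holds finite values, since NaN .* 0 and Inf .* 0 both evaluate to NaN in IEEE arithmetic. Whether it is faster than the alternatives on a given machine can be checked with MATLAB's timeit function rather than assumed.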