Research Article

Facile Conversion and Optimization of Structured Illumination Image Reconstruction Code into the GPU Environment

Figure 8

Impact of improved hardware and code on algorithm execution performance. We first measured the elapsed time of the vanilla code on a given image on each machine as a performance baseline. We then applied each approach independently and measured the elapsed time to show the performance improvement attributable to that approach. The CPU-Single-core-All case applies all approaches on a single CPU core. The CPU-Multicores-Process and CPU-Multicores-Threads cases show the elapsed time when each task is executed on a different core without applying the other approaches (red bar number 1). The CPU-Multicores-All case applies all approaches, including multicore execution in which six tasks run on different cores, and shows the best performance without exploiting the GPU (red bar number 2). The GPU-gpuArray case shows the elapsed time when the GPU is used only through the gpuArray() function, without applying the other approaches. This case shows that the performance improvement is limited even with the GPU if the code is written inefficiently. The GPU-All (Script-dup), GPU-All (Script-non-dup), and GPU-All (Func-dup) cases show the benefits of avoiding duplicated operations and of using functions instead of scripts. While the performance improvement from these approaches was marginal in CPU-only code, they affect the overall execution time significantly in GPU-optimized code when the execution time is less than a second. The GPU-All case shows the elapsed time with all the approaches introduced in this work, which is the best performance we could achieve (bottom red bar in each panel).
(a) Intel Core i7-8750H, 32 GB DDR4 RAM
(b) Intel Core i9-12900K, 64 GB DDR5 RAM
(c) AMD Ryzen Threadripper 3990X, 128 GB DDR4 RAM