Research Article
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Figure 4
Coalesced access pattern for an element-wise operation on a vector, followed by a parallel reduction within each thread group in shared memory.
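
The pattern named in the caption can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions, not the article's GPU kernel: the element-wise operation is assumed to be a product of two vectors, and the function name `group_reduce`, the parameter `group_size`, and the list `shared` (standing in for per-group shared memory) are hypothetical names introduced here. Thread `tid` of a group reads element `base + tid`, so neighbouring threads touch neighbouring addresses, which is what allows the accesses to coalesce on a GPU. The group then sums its values with a tree reduction that halves the number of active threads at each step.

```python
def group_reduce(a, b, group_size=8):
    """Emulate an element-wise product followed by a per-group tree reduction.

    Assumptions (not taken from the article): the element-wise operation is
    a product, and group_size is a power of two.
    """
    if group_size <= 0 or group_size & (group_size - 1) != 0:
        raise ValueError("group_size must be a power of two")
    if len(a) != len(b):
        raise ValueError("input vectors must have the same length")

    n = len(a)
    partial_sums = []
    for base in range(0, n, group_size):
        # Stand-in for the shared memory of one thread group.
        shared = [0.0] * group_size

        # Element-wise step: thread tid reads index base + tid, so
        # consecutive threads read consecutive addresses (coalesced).
        for tid in range(group_size):
            i = base + tid
            shared[tid] = a[i] * b[i] if i < n else 0.0

        # Tree reduction: at each step the lower half of the active
        # threads adds in the value held by the upper half. On a GPU a
        # barrier would separate the steps.
        stride = group_size // 2
        while stride > 0:
            for tid in range(stride):
                shared[tid] += shared[tid + stride]
            stride //= 2

        # Thread 0 holds the sum for this group.
        partial_sums.append(shared[0])
    return partial_sums


if __name__ == "__main__":
    values = [float(v) for v in range(1, 17)]
    ones = [1.0] * 16
    print(group_reduce(values, ones, group_size=8))  # [36.0, 100.0]
```

Each group yields one partial sum. Combining the partial sums, either on the host or in a second reduction pass, gives the final result. Padding out-of-range elements with zero keeps the reduction correct when the vector length is not a multiple of the group size.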