Research Article
High Performance Implementation of 3D Convolutional Neural Networks on a GPU
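Algorithm 2: Convolutional layer implemented with WMFA $F(m \times m \times m, r \times r \times r)$.

$P$ is the number of image tiles.
$\alpha = m + r - 1$ is the input tile size.
Neighbouring tiles overlap by $r - 1$.
$d_{c,b} \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$ is input tile $b$ in channel $c$.
$g_{k,c} \in \mathbb{R}^{r \times r \times r}$ is filter $k$ in channel $c$.
$Y_{k,b} \in \mathbb{R}^{m \times m \times m}$ is output tile $b$ in filter $k$.

for $k = 0$ to $K$ do
    for $c = 0$ to $C$ do
        Transform $g_{k,c}$ to $u \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$
        Scatter $u$ to matrices $U$: $U^{(\xi,\nu,\mu)}_{k,c} = u_{\xi,\nu,\mu}$
    end for
end for
for $b = 0$ to $P$ do
    for $c = 0$ to $C$ do
        Transform $d_{c,b}$ to $v \in \mathbb{R}^{\alpha \times \alpha \times \alpha}$
        Scatter $v$ to matrices $V$: $V^{(\xi,\nu,\mu)}_{c,b} = v_{\xi,\nu,\mu}$
    end for
end for
for $\xi = 0$ to $\alpha$ do
    for $\nu = 0$ to $\alpha$ do
        for $\mu = 0$ to $\alpha$ do
            $M^{(\xi,\nu,\mu)} = U^{(\xi,\nu,\mu)} V^{(\xi,\nu,\mu)}$
        end for
    end for
end for
for $k = 0$ to $K$ do
    for $b = 0$ to $P$ do
        Gather $m$ from matrices $M$: $m_{\xi,\nu,\mu} = M^{(\xi,\nu,\mu)}_{k,b}$
        Inverse-transform $m$ to $Y_{k,b}$
    end for
end for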
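The four phases of Algorithm 2 (filter transform with scatter, input-tile transform with scatter, one matrix multiplication per transformed position, and gather with inverse transform) can be sketched on the CPU with NumPy. This is a minimal illustration rather than the GPU implementation described in the article. It assumes the specific case $F(2 \times 2 \times 2, 3 \times 3 \times 3)$, uses the standard one-dimensional $F(2, 3)$ transform matrices applied along each of the three axes, and assumes every output dimension is a multiple of $m = 2$. The function names `transform3d`, `winograd_conv3d`, and `direct_conv3d` are hypothetical and chosen only for this example.

```python
import numpy as np

# Standard 1D F(2, 3) Winograd transform matrices (assumed case: m = 2, r = 3).
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]], dtype=np.float64)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]], dtype=np.float64)
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=np.float64)

M_OUT, R = 2, 3
ALPHA = M_OUT + R - 1  # input tile size (4)


def transform3d(T, x):
    """Apply the matrix T along each of the last three axes of x."""
    return np.einsum('ai,bj,ck,...ijk->...abc', T, T, T, x)


def winograd_conv3d(x, w):
    """Valid 3D correlation. x: (N, C, D, H, W), w: (K, C, 3, 3, 3)."""
    N, C, D, H, W = x.shape
    K = w.shape[0]
    oD, oH, oW = D - R + 1, H - R + 1, W - R + 1
    assert oD % M_OUT == 0 and oH % M_OUT == 0 and oW % M_OUT == 0
    tD, tH, tW = oD // M_OUT, oH // M_OUT, oW // M_OUT
    P = N * tD * tH * tW  # number of image tiles

    # Phase 1: transform filters, scatter into ALPHA^3 matrices of shape (K, C).
    U = transform3d(G, w).transpose(2, 3, 4, 0, 1)

    # Phase 2: cut overlapping tiles (stride m, overlap r - 1), transform,
    # scatter into ALPHA^3 matrices of shape (C, P).
    tiles = np.empty((C, N, tD, tH, tW, ALPHA, ALPHA, ALPHA))
    for i in range(tD):
        for j in range(tH):
            for l in range(tW):
                z, y, s = i * M_OUT, j * M_OUT, l * M_OUT
                block = x[:, :, z:z + ALPHA, y:y + ALPHA, s:s + ALPHA]
                tiles[:, :, i, j, l] = block.transpose(1, 0, 2, 3, 4)
    V = transform3d(BT, tiles.reshape(C, P, ALPHA, ALPHA, ALPHA)).transpose(2, 3, 4, 0, 1)

    # Phase 3: one (K, C) x (C, P) matrix product per transformed position.
    M = U @ V  # shape (ALPHA, ALPHA, ALPHA, K, P)

    # Phase 4: gather each tile, inverse-transform, and reassemble the output.
    Y = transform3d(AT, M.transpose(3, 4, 0, 1, 2))  # (K, P, m, m, m)
    Y = Y.reshape(K, N, tD, tH, tW, M_OUT, M_OUT, M_OUT)
    return Y.transpose(1, 0, 2, 5, 3, 6, 4, 7).reshape(N, K, oD, oH, oW)


def direct_conv3d(x, w):
    """Reference implementation: direct valid 3D correlation."""
    N, C, D, H, W = x.shape
    K = w.shape[0]
    oD, oH, oW = D - R + 1, H - R + 1, W - R + 1
    out = np.zeros((N, K, oD, oH, oW))
    for dz in range(R):
        for dy in range(R):
            for dx in range(R):
                patch = x[:, :, dz:dz + oD, dy:dy + oH, dx:dx + oW]
                out += np.einsum('ncdhw,kc->nkdhw', patch, w[:, :, dz, dy, dx])
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 6, 4, 8))
w = rng.standard_normal((4, 3, 3, 3, 3))
print(np.max(np.abs(winograd_conv3d(x, w) - direct_conv3d(x, w))))
```

The batched product in phase 3 is the step that maps naturally onto batched matrix multiplication on a GPU. The tile extraction loop is written for clarity, not speed.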