Research Article

A Real-Time and Long-Term Face Tracking Method Using Convolutional Neural Network and Optical Flow in IoT-Based Multimedia Communication Systems

Table 1

Architectures of the cascade convolutional networks. Each layer is specified by its kernel size, stride, and padding number.

Stage 1
Input: Color-scale image

| Layer | Outputs | Kernel size | Stride | Padding |
| --- | --- | --- | --- | --- |
| Convolution | 10 | | 1 | 1 |
| MaxPooling | - | | 2 | |
| Convolution | 16 | | 1 | 1 |
| Convolution | 32 | | 1 | 1 |
| Convolution | 6 | | 1 | 1 |

Stage 2
Input: Color-scale image

| Layer | Outputs | Kernel size | Stride | Padding |
| --- | --- | --- | --- | --- |
| Convolution | 28 | | 1 | 1 |
| MaxPooling | - | | 2 | |
| Convolution | 48 | | 1 | 1 |
| MaxPooling | - | | 2 | |
| Convolution | 64 | | 1 | 1 |
| Dense | 128 | - | - | - |
| Dense | 6 | - | - | - |

Stage 3
Input: Color-scale image

| Layer | Outputs | Kernel size | Stride | Padding |
| --- | --- | --- | --- | --- |
| Convolution | 32 | | 1 | 1 |
| MaxPooling | - | | 2 | |
| Convolution | 64 | | 1 | 1 |
| MaxPooling | - | | 2 | |
| Convolution | 128 | | 1 | 1 |
| Dense | 256 | - | - | - |
| Dense | 6 | - | - | - |
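
To illustrate how kernel size, stride, and padding determine the feature-map size at each layer, the following minimal sketch traces the Stage 1 layers using the standard relation out = floor((in + 2 * padding - kernel) / stride) + 1. The output counts, strides, and convolution padding follow Table 1. The kernel sizes, the input resolution, the three input channels, and the zero padding for pooling are not recoverable from the table; the values used here (`CONV_KERNEL`, `POOL_KERNEL`, `INPUT_SIZE`, `INPUT_CHANNELS`) are hypothetical placeholders chosen only for illustration, and the helper names are likewise assumptions.

```python
from typing import List, Optional, Tuple


def output_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial size after a convolution or pooling layer (floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical values, NOT taken from Table 1 (the table does not give them).
CONV_KERNEL = 3      # assumed convolution kernel size
POOL_KERNEL = 2      # assumed pooling kernel size
INPUT_SIZE = 12      # assumed input height and width
INPUT_CHANNELS = 3   # assumed three-channel color input

# (layer type, output channels, kernel, stride, padding)
# Outputs, strides, and convolution padding follow Table 1, Stage 1.
# Pooling padding is assumed to be 0 because the table does not list it.
Layer = Tuple[str, Optional[int], int, int, int]
STAGE1: List[Layer] = [
    ("conv", 10, CONV_KERNEL, 1, 1),
    ("pool", None, POOL_KERNEL, 2, 0),
    ("conv", 16, CONV_KERNEL, 1, 1),
    ("conv", 32, CONV_KERNEL, 1, 1),
    ("conv", 6, CONV_KERNEL, 1, 1),
]


def trace_shapes(size: int, channels: int, layers: List[Layer]) -> List[Tuple[int, int]]:
    """Return (channels, spatial size) for the input and after every layer."""
    shapes = [(channels, size)]
    for kind, outputs, kernel, stride, padding in layers:
        size = output_size(size, kernel, stride, padding)
        if kind == "conv" and outputs is not None:
            channels = outputs  # pooling leaves the channel count unchanged
        shapes.append((channels, size))
    return shapes


if __name__ == "__main__":
    for channels, size in trace_shapes(INPUT_SIZE, INPUT_CHANNELS, STAGE1):
        print(f"{channels} x {size} x {size}")
```

Under these assumed values, a stride-1 convolution with padding 1 preserves the spatial size, so only the stride-2 pooling layer reduces resolution. With different kernel sizes the traced shapes would change accordingly.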