Dynamic Warping Network for Semantic Video Segmentation

<div>Qualitative results from the Cityscapes dataset. Baseline method: training the model on single frames and inferring the segmentation maps on single frames. Warping-based method: adopting the original warping operation implemented with standard bilinear interpolation to propagate and fuse the features brings a slight improvement. Our method: utilizing the flow-guided convolution to adaptively warp the interframe features and introducing temporal consistency loss to explicitly supervise the warped features instead of simple feature propagation and fusion, hence producing more accurate segmentation results.</div>

Complexity

Dynamic Warping Network for Semantic Video Segmentation

Figure 1