Research Article

A Symmetric Fusion Learning Model for Detecting Visual Relations and Scene Parsing

Figure 2

An overview of our proposed approach. We train on image annotations from the visual relationship dataset and on word vectors pretrained on the large-scale Common Crawl corpus. We predict visual relationship triplets by matching the embeddings produced by the visual and semantic modules. The symmetric learning module then performs reverse supervision and correction, alleviating the effect of noisy labels.
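The caption does not spell out how the two embeddings are matched. A minimal PyTorch sketch of one plausible realization is shown below; the class name, projection layers, feature dimensions, and cosine-similarity scoring are all assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn.functional as F


class FusionMatcher(torch.nn.Module):
    """Hypothetical sketch: project visual features and triplet word
    vectors into a shared space and score them by cosine similarity."""

    def __init__(self, visual_dim=2048, word_dim=300, embed_dim=512):
        super().__init__()
        # Visual module: maps region features into the joint space.
        self.visual_proj = torch.nn.Linear(visual_dim, embed_dim)
        # Semantic module: maps concatenated (subject, predicate, object)
        # word vectors into the same joint space.
        self.semantic_proj = torch.nn.Linear(3 * word_dim, embed_dim)

    def forward(self, visual_feats, triplet_vecs):
        v = F.normalize(self.visual_proj(visual_feats), dim=-1)
        s = F.normalize(self.semantic_proj(triplet_vecs), dim=-1)
        # Higher cosine similarity -> more plausible triplet for the region.
        return (v * s).sum(dim=-1)


# Usage with random stand-in tensors: a batch of 4 region features and
# 4 candidate triplets, each triplet built from three 300-d word vectors.
matcher = FusionMatcher()
scores = matcher(torch.randn(4, 2048), torch.randn(4, 900))
```

Under these assumptions, the highest-scoring triplet for a region would be taken as the predicted visual relationship, with the symmetric learning module providing a corrective signal in the reverse direction.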