Research Article
Multiple Context Learning Networks for Visual Question Answering
Figure 1
Diagram of the proposed context learning method, which learns multiple contexts simultaneously within a uniform context learning framework. In the MCL layer, VCL and TCL model the intra-modal contexts, while VTCL models the visual-textual context. Given the image features V and the question feature T, the layer outputs the context-enriched features V′ and T′, respectively.
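The data flow in Figure 1 can be illustrated with a minimal sketch. The caption does not specify how VCL, TCL, and VTCL are computed, so the sketch makes several assumptions: each context learner is stood in for by plain scaled dot-product attention without learned projections, the three contexts are computed in parallel from the same inputs and added to them residually, and T is taken to be a sequence of per-word features. The names `attention_context` and `mcl_layer`, and all feature dimensions, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_context(queries, memory):
    # ASSUMPTION: a context learner is approximated by scaled dot-product
    # attention; the paper's actual formulation may differ.
    d = queries.shape[-1]
    weights = softmax(queries @ memory.T / np.sqrt(d), axis=-1)
    return weights @ memory  # one context vector per query row


def mcl_layer(V, T):
    # Hypothetical stand-in for one MCL layer.
    # V: (num_regions, d) image features; T: (num_words, d) question features.
    vcl = attention_context(V, V)       # intra-modal visual context (VCL)
    tcl = attention_context(T, T)       # intra-modal textual context (TCL)
    vtcl_v = attention_context(V, T)    # visual-textual context for V (VTCL)
    vtcl_t = attention_context(T, V)    # visual-textual context for T (VTCL)
    # ASSUMPTION: contexts are fused by residual addition.
    V_prime = V + vcl + vtcl_v
    T_prime = T + tcl + vtcl_t
    return V_prime, T_prime


# Illustrative sizes only: 36 image regions, 14 question words, 64 dimensions.
rng = np.random.default_rng(0)
V = rng.normal(size=(36, 64))
T = rng.normal(size=(14, 64))
V_prime, T_prime = mcl_layer(V, T)
```

In this sketch V′ and T′ keep the shapes of V and T, so layers of this form could be stacked; whether the proposed network stacks MCL layers is not stated in the caption.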