Mobile Information Systems

Research Article

RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer

Figure 2

Pretraining strategy of the RDMMFET model. The strategy consists of three parts: a masked language model (a), a masked image model (b), and a multimodality fusion task (c).
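
The masking step shared by parts (a) and (b) can be illustrated with a minimal sketch. It is not taken from the RDMMFET implementation. The function name `mask_sequence`, the `[MASK]` symbol, and the 15% masking ratio are all assumptions chosen for illustration; the caption does not specify any of them. The idea shown is only the general one: hide a random subset of inputs and keep the originals as prediction targets.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder symbol; not named in the source


def mask_sequence(items, mask_ratio=0.15, seed=0):
    """Replace a random subset of `items` with MASK.

    Returns the masked copy and a dict mapping each masked position
    to the original item, which serves as the prediction target.
    The 0.15 default ratio is an assumption, not a value from the paper.
    """
    if not items:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one item, and never more than the sequence holds.
    n_mask = min(len(items), max(1, round(len(items) * mask_ratio)))
    positions = sorted(rng.sample(range(len(items)), n_mask))
    masked = list(items)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


if __name__ == "__main__":
    tokens = "a dog runs across the green field".split()
    masked, targets = mask_sequence(tokens)
    print(masked)
    print(targets)
```

The same function could be applied to a list of image patch identifiers to mimic part (b), again as an assumption about how patches are represented. Part (c) is not sketched here because the caption gives no detail on how the fusion task is defined.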