Research Article
RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer
Figure 2
Pretraining strategy of the RDMMFET model. The strategy consists of three tasks: the masked language model task (a), the masked image model task (b), and the multimodality fusion task (c).
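
As a rough illustration of the masking idea behind part (a), the following minimal sketch replaces a random subset of text tokens with a placeholder and records the original tokens as prediction targets. The function name `mask_tokens`, the `[MASK]` placeholder, and the 15% masking rate are assumptions borrowed from common BERT-style practice; the caption does not state which values RDMMFET uses. A masked image task such as part (b) is often built the same way over image patches or region features, but that is likewise an assumption here.

```python
import random

MASK_TOKEN = "[MASK]"  # assumed placeholder token, not confirmed by the source


def mask_tokens(tokens, mask_rate=0.15, seed=None):
    """Mask a random subset of tokens.

    Returns (masked_tokens, labels). labels[i] holds the original token at
    each masked position and None everywhere else, so a model would only be
    scored on the positions it has to reconstruct.
    """
    rng = random.Random(seed)
    # Mask at least one token for any non-empty input (an illustrative choice).
    n_mask = max(1, round(len(tokens) * mask_rate)) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    sentence = "a dog runs across the green field".split()
    masked, labels = mask_tokens(sentence, seed=0)
    print(masked)
    print(labels)
```

The labels list keeps the sequence length unchanged, which makes it straightforward to compute a reconstruction loss only at masked positions.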