Multisemantic Level Patch Merger Vision Transformer for Diagnosis of Pneumonia

<div>Overall pipeline for building and using the MP-ViT model. After image enhancement and layer normalization, the raw images are fed to the Patch Fuser, and the fused features are obtained after processing by the model. These fused features are trained together with smoothed labels to build the MP-ViT model, which is then used for prediction.</div>
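
The caption mentions training with smoothed labels but does not say how they are computed. As an illustration only, the sketch below uses the common uniform label-smoothing form, in which the true class receives 1 - epsilon + epsilon/K and every other class receives epsilon/K for K classes. The function name `smooth_labels`, the value epsilon = 0.1, and the two-class setup are assumptions made for this example and are not taken from the paper.

```python
def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a uniformly smoothed one-hot target as a list of floats.

    Assumed form (not confirmed by the source): each class gets
    epsilon / num_classes, and the true class additionally gets 1 - epsilon.
    """
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index must be in [0, num_classes)")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")

    off_value = epsilon / num_classes       # mass spread over every class
    target = [off_value] * num_classes
    target[class_index] += 1.0 - epsilon    # remaining mass on the true class
    return target


# Hypothetical two-class case (e.g. normal vs. pneumonia), true class = 1.
example_target = smooth_labels(class_index=1, num_classes=2, epsilon=0.1)
print(example_target)  # roughly [0.05, 0.95]
```

The smoothed vector still sums to 1, so it can stand in for the one-hot target in a cross-entropy loss. Setting `epsilon=0.0` recovers the ordinary one-hot label.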

Computational and Mathematical Methods in Medicine

Figure 1: Multisemantic Level Patch Merger Vision Transformer for Diagnosis of Pneumonia