Research Article

PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric

Table 1: Notation list of the proposed method.

Symbol	Description

Input image
Textual description of image
Image augmentation
Image encoder
Cross-modal encoder
Number of cross-modal embeddings
Multimodal projector module
Predictor module
Fused feature
Metric embedding
Vector output from predictor
Stop-gradient operation