Research Article

PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric

Algorithm 1

PTF-SimCM’s main learning algorithm.
Input:
set of images with text descriptions, and distributions of image transformations;
initial parameters, encoder, multimodal projector, predictor;
cross-modal encoder;
the number of cross-modal embeddings;
optimizer, which updates the parameters using the loss gradient;
total number of optimization steps and batch size;
learning rate schedule;
(1) for each optimization step, up to the total number of steps, do
(2)   //sample a batch of N image-text pairs
(3)  for each image-text pair in the batch do
(4)    //sample image transformations
(5)   
(6)   for each of the cross-modal embeddings do
(7)    
(8)    
(9)    
(10)   end
(11)    //compute the total loss
(12)  end
(13)   //compute the total loss gradient
(14)   //update parameters
(15)end
(16)return
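
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the symbols and update rules of the original listing are not reproduced here, so every concrete choice below is an assumption made only to obtain something runnable. In particular, `encode` (an element-wise scaling standing in for the encoder and multimodal projector), `augment` (additive jitter standing in for the sampled image transformations), the additive fusion of image and text vectors, the negative cosine similarity loss, and the finite-difference gradient are all hypothetical placeholders. Only the loop structure (optimization steps, batch of image-text pairs, inner loop over the cross-modal embeddings, loss, gradient, parameter update) follows the listing.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity with a small constant to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def encode(theta, x):
    # Hypothetical stand-in for encoder + projector: element-wise scaling.
    return [w * xi for w, xi in zip(theta, x)]


def augment(x, rng):
    # Hypothetical image transformation: small additive Gaussian jitter.
    return [xi + rng.gauss(0.0, 0.05) for xi in x]


def pair_loss(theta, view1, view2, text_embeddings):
    """Loss for one image-text pair, averaged over its cross-modal embeddings."""
    z1, z2 = encode(theta, view1), encode(theta, view2)
    total = 0.0
    for e in text_embeddings:  # inner loop over the cross-modal embeddings
        fused1 = [a + b for a, b in zip(z1, e)]  # assumed additive fusion
        fused2 = [a + b for a, b in zip(z2, e)]
        total += -cosine(fused1, fused2)  # assumed negative-cosine loss
    return total / len(text_embeddings)


def batch_loss(theta, batch):
    """Mean loss over a batch of (view1, view2, text_embeddings) triples."""
    return sum(pair_loss(theta, v1, v2, t) for v1, v2, t in batch) / len(batch)


def numerical_grad(f, theta, eps=1e-5):
    """Central finite-difference gradient; a placeholder for backpropagation."""
    grad = []
    for i in range(len(theta)):
        up, down = theta[:], theta[:]
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def train(dataset, theta, steps, batch_size, lr_schedule, seed=0):
    """Run the outer loop: sample, augment, compute loss, update parameters."""
    rng = random.Random(seed)
    theta = list(theta)
    history = []
    for t in range(steps):
        # Sample a batch of image-text pairs and two transformed views each.
        samples = rng.sample(dataset, batch_size)
        batch = [(augment(img, rng), augment(img, rng), texts)
                 for img, texts in samples]

        def loss_fn(params):
            return batch_loss(params, batch)

        history.append(loss_fn(theta))          # total loss for this step
        grad = numerical_grad(loss_fn, theta)   # total loss gradient
        lr = lr_schedule(t)
        theta = [w - lr * g for w, g in zip(theta, grad)]  # plain SGD update
    return theta, history


if __name__ == "__main__":
    # Toy data: each item is (image vector, list of text embedding vectors).
    toy = [
        ([1.0, 2.0, 3.0], [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1]]),
        ([0.5, -1.0, 2.0], [[0.2, 0.1, 0.0]]),
        ([2.0, 0.5, -0.5], [[0.0, 0.1, 0.1], [0.3, 0.0, 0.0]]),
    ]
    params, losses = train(toy, [1.0, 1.0, 1.0], steps=5, batch_size=2,
                           lr_schedule=lambda t: 0.1)
    print(params, losses)
```

In practice the finite-difference gradient would be replaced by automatic differentiation and the toy vectors by real image and text features. The sketch keeps them only so that the loop can run without external dependencies.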