Complexity

Research Article

PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric

PTF-SimCM’s main learning algorithm.

	Input:
	set of images with text descriptions, and distributions of image transformations;
	initial parameters, encoder, multimodal projector, predictor;
	cross-modal encoder;
	the number of cross-modal embeddings;
	optimizer, which updates the parameters using the loss gradient;
	total number of optimization steps and batch size;
	learning rate schedule;
(1)	for each optimization step do
(2)	//sample a batch of N image-text pairs
(3)	for each image-text pair in the batch do
(4)	//sample image transformations
(5)
(6)	for each cross-modal embedding do
(7)
(8)
(9)
(10)	end
(11)	//compute the total loss
(12)	end
(13)	//compute the total loss gradient
(14)	//update parameters
(15)	end
(16)	return
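
The control flow of the listing above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-step update equations are not reproduced here, so `pair_loss_and_grad`, `sample_transform`, and `lr_schedule` are hypothetical callables, parameters are a plain list of floats, and the optimizer is assumed to be plain SGD on the batch-averaged gradient. The toy loss at the bottom is a simple regression stand-in used only to exercise the loop; it is not the model's contrastive objective.

```python
import random


def train(dataset, params, pair_loss_and_grad, sample_transform,
          lr_schedule, num_steps, batch_size, num_embeddings, seed=0):
    """Control-flow skeleton only; every callable is a hypothetical stand-in."""
    rng = random.Random(seed)
    last_loss = None
    for step in range(num_steps):
        # Sample a batch of N image-text pairs.
        batch = rng.sample(dataset, batch_size)
        total_loss = 0.0
        total_grad = [0.0] * len(params)
        for image, text in batch:
            # Sample two image transformations and build two views.
            t1, t2 = sample_transform(rng), sample_transform(rng)
            view1, view2 = t1(image), t2(image)
            for k in range(num_embeddings):
                # Placeholder for the per-embedding steps of the listing.
                loss, grad = pair_loss_and_grad(params, view1, view2, text, k)
                total_loss += loss
                total_grad = [g + dg for g, dg in zip(total_grad, grad)]
        # Average over the batch and embeddings, then take one SGD step
        # (assumption: the listing's optimizer is left unspecified here).
        scale = 1.0 / (batch_size * num_embeddings)
        lr = lr_schedule(step)
        params = [p - lr * scale * g for p, g in zip(params, total_grad)]
        last_loss = total_loss * scale
    return params, last_loss


# Toy stand-ins (hypothetical): scalar "images" and "texts", one weight.
def toy_transform(rng):
    factor = 1.0 + rng.uniform(-0.01, 0.01)  # small multiplicative jitter
    return lambda x: factor * x


def toy_loss_and_grad(params, view1, view2, text, k):
    w = params[0]
    r1, r2 = w * view1 - text, w * view2 - text
    return 0.5 * (r1 * r1 + r2 * r2), [r1 * view1 + r2 * view2]


toy_data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
final_params, final_loss = train(
    toy_data, [0.0], toy_loss_and_grad, toy_transform,
    lr_schedule=lambda step: 0.02,
    num_steps=200, batch_size=2, num_embeddings=3,
)
```

In the toy run the single weight settles near 2, since each "text" is twice its "image". Passing the loss, transformation sampler, and schedule as arguments keeps the skeleton independent of any particular encoder, projector, or predictor.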