Research Article
PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric
Algorithm 1
PTF-SimCM’s main learning algorithm.
| Input: | |
| --- | --- |
| | set of images with text descriptions, and distributions of transformations; |
| | initial parameters, encoder, multimodal projector, and predictor; |
| | cross-modal encoder; |
| | number of cross-modal embeddings; |
| | optimizer, which updates the parameters using the loss gradient; |
| | total number of optimization steps, and batch size; |
| | learning-rate schedule; |
| (1) | for to do |
| (2) | //sample a batch of N image-text pairs |
| (3) | for do |
| (4) | //sample image transformations |
| (5) | |
| (6) | for to do |
| (7) | |
| (8) | |
| (9) | |
| (10) | end |
| (11) | //compute the total loss |
| (12) | end |
| (13) | //compute the total loss gradient |
| (14) | //update parameters |
| (15) | end |
| (16) | return |
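The nesting of Algorithm 1 can be illustrated with a small, runnable Python sketch. Because the extracted listing has lost its equations, everything inside the loops is an assumption made only to get an executable example, not PTF-SimCM's actual method. In the sketch, a single scalar weight `theta` stands in for the encoder, multimodal projector, and predictor. The "image transformations" are additive Gaussian noise that produces two views. Each image's text side is a list of M fixed target embeddings. The loss is a squared error whose gradient is written out by hand. The cosine learning-rate schedule and the names `train` and `lr_schedule` are hypothetical. Only the loop structure (steps (1) to (16)) mirrors the listing.

```python
import math
import random
from typing import List, Tuple

# Hypothetical toy pair: (scalar "image" feature x, list of M text-side target embeddings).
Pair = Tuple[float, List[float]]


def lr_schedule(step: int, total_steps: int, base_lr: float) -> float:
    """Assumed cosine-decay schedule; the listing does not specify the schedule."""
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * step / total_steps))


def train(data: List[Pair], theta: float, total_steps: int, batch_size: int,
          base_lr: float, noise: float = 0.1, seed: int = 0) -> Tuple[float, List[float]]:
    """Toy training loop with the same nesting as Algorithm 1.

    `theta` is a hypothetical scalar standing in for all learnable parameters.
    """
    rng = random.Random(seed)
    losses = []
    for k in range(total_steps):                      # (1) loop over optimization steps
        batch = rng.sample(data, batch_size)          # (2) sample a batch of N image-text pairs
        total_loss, total_grad = 0.0, 0.0
        for x, text_embs in batch:                    # (3) loop over the pairs in the batch
            # (4) assumed transformations: additive noise gives two views of the image
            views = [x + rng.gauss(0.0, noise), x + rng.gauss(0.0, noise)]
            sample_loss, sample_grad = 0.0, 0.0
            for e in text_embs:                       # (6) loop over the M cross-modal embeddings
                for v in views:
                    err = theta * v - e               # toy image output minus text target
                    sample_loss += err * err
                    sample_grad += 2.0 * err * v      # derivative of err**2 w.r.t. theta
            count = len(text_embs) * len(views)
            total_loss += sample_loss / count         # (11) accumulate the loss of this pair
            total_grad += sample_grad / count
        total_loss /= batch_size                      # average loss over the batch
        total_grad /= batch_size                      # (13) total loss gradient
        theta -= lr_schedule(k, total_steps, base_lr) * total_grad  # (14) plain SGD update
        losses.append(total_loss)
    return theta, losses                              # (16) return the trained parameter


if __name__ == "__main__":
    # Toy data: the text targets are 2 * x, so theta should move toward 2.
    data = [(float(x), [2.0 * x] * 3) for x in range(1, 11)]
    theta, losses = train(data, theta=0.0, total_steps=200, batch_size=4, base_lr=0.005)
    print(f"theta={theta:.3f}, first loss={losses[0]:.2f}, last loss={losses[-1]:.4f}")
```

Here plain gradient descent stands in for the generic optimizer in the listing. In a real implementation, the hand-written gradient would be replaced by automatic differentiation over the actual encoder, projector, and predictor.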