Research Article
PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric
Algorithm 1
PTF-SimCM’s main learning algorithm.
Input:
    set of images with descriptions and distributions of transformations;
    initial parameters, encoder, multimodal projector, predictor;
    cross-modal encoder;
    the number of cross-modal embeddings;
    optimizer, which updates the parameters using the loss gradient;
    total number of optimization steps and batch size;
    learning rate schedule.
(1)  for each optimization step do
(2)      //sample a batch of N image-text pairs
(3)      for each image-text pair in the batch do
(4)          //sample image transformations
(5)          [expression not recoverable]
(6)          for each cross-modal embedding do
(7)              [expression not recoverable]
(8)              [expression not recoverable]
(9)              [expression not recoverable]
(10)         end
(11)         //compute the total loss
(12)     end
(13)     //compute the total loss gradient
(14)     //update parameters
(15) end
(16) return [expression not recoverable]
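The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It reproduces only the loop structure (batch sampling, per-pair transformations, an inner loop over the cross-modal embeddings, loss accumulation, and a parameter update) and is not the authors' implementation. All names here (`train`, `sample_transform`, `loss_and_grad`, `lr_schedule`) are hypothetical. The sketch assumes two transformations are sampled per image, uses plain gradient descent in place of the unspecified optimizer, and substitutes a toy scalar loss for the actual PTF-SimCM objective, which is not recoverable from the listing.

```python
import random


def train(dataset, params, sample_transform, loss_and_grad,
          num_steps, batch_size, num_embeddings, lr_schedule, seed=0):
    """Loop skeleton only; every callable is a hypothetical stand-in."""
    rng = random.Random(seed)
    history = []
    for step in range(num_steps):                    # outer loop over optimization steps
        batch = rng.sample(dataset, batch_size)      # sample a batch of image-text pairs
        total_loss = 0.0
        total_grad = [0.0] * len(params)
        for image, text in batch:                    # loop over pairs in the batch
            # Assumption: two views per image, one per sampled transformation.
            view1 = sample_transform(rng)(image)
            view2 = sample_transform(rng)(image)
            for k in range(num_embeddings):          # loop over cross-modal embeddings
                loss, grad = loss_and_grad(params, view1, view2, text, k)
                total_loss += loss                   # accumulate the total loss
                total_grad = [g + d for g, d in zip(total_grad, grad)]
        # Plain gradient descent stands in for the unspecified optimizer.
        scale = lr_schedule(step) / (batch_size * num_embeddings)
        params = [p - scale * g for p, g in zip(params, total_grad)]
        history.append(total_loss / (batch_size * num_embeddings))
    return params, history


# Toy stand-ins so the skeleton runs end to end (not the paper's model).
def toy_sample_transform(rng):
    factor = 1.0 + rng.uniform(-0.02, 0.02)          # small random rescaling
    return lambda x: x * factor


def toy_loss_and_grad(params, view1, view2, text, k):
    # Squared error between a scaled view and the other view; text and k are unused.
    residual = params[0] * view1 - view2
    return residual * residual, [2.0 * residual * view1]


dataset = [(i / 4.0, "caption %d" % i) for i in range(1, 9)]
final_params, history = train(
    dataset, [0.0], toy_sample_transform, toy_loss_and_grad,
    num_steps=100, batch_size=4, num_embeddings=2,
    lr_schedule=lambda step: 0.1,
)
```

In the toy setup the single parameter should drift toward 1, since the two views of each scalar "image" differ only by a small rescaling. In a real implementation, `loss_and_grad` would be replaced by a forward pass through the encoder, multimodal projector, predictor, and cross-modal encoder, with gradients obtained from an automatic differentiation framework.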