Research Article
PTF-SimCM: A Simple Contrastive Model with Polysemous Text Fusion for Visual Similarity Metric
Table 2
Hyperparameters set in experiments.
| Parameters | Description | Performance comparison | Ablation study |
| --- | --- | --- | --- |
| | Initial learning rate | 0.05 | 0.05 |
| | Weight decay | 1e-4 | 1e-4 |
| | Momentum | 0.9 | 0.9 |
| | Dimension of image view features | 2048 | 2048 |
| | Dimension of cross-modal embedding | 1024 | {512, 1024, 2048} |
| | Dimension of metric embedding | {64, 128} | {64, 128, 256, 512} |
| | Number of cross-modal embeddings | 2 | {1, 2, 3, 4} |
| | Layers of multimodal projector | 4 | 4 |
| | Layers of predictor | 2 | 2 |
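For readers who want to reproduce these settings, the values in Table 2 can be written down as a small configuration object. The following is a minimal sketch, not the authors' code: the class name `HyperParams`, all field names, and the helper functions are hypothetical, and only the numeric values are taken from the table. The table also does not say whether the ablation varies one setting at a time or all of them jointly; the full Cartesian product used here is an assumption made purely for illustration.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class HyperParams:
    """Hypothetical container; field names are illustrative, values are from Table 2."""
    learning_rate: float = 0.05           # initial learning rate
    weight_decay: float = 1e-4
    momentum: float = 0.9
    image_feature_dim: int = 2048         # dimension of image view features
    cross_modal_dim: int = 1024           # dimension of cross-modal embedding
    metric_dim: int = 64                  # dimension of metric embedding
    num_cross_modal_embeddings: int = 2   # number of cross-modal embeddings
    projector_layers: int = 4             # layers of multimodal projector
    predictor_layers: int = 2             # layers of predictor


# Performance comparison column: only the metric embedding dimension varies.
PERFORMANCE_METRIC_DIMS = (64, 128)

# Ablation study column: the three settings listed with value sets.
ABLATION_GRID = {
    "cross_modal_dim": (512, 1024, 2048),
    "metric_dim": (64, 128, 256, 512),
    "num_cross_modal_embeddings": (1, 2, 3, 4),
}


def performance_configs():
    """One configuration per metric embedding dimension in the comparison column."""
    base = HyperParams()
    return [replace(base, metric_dim=d) for d in PERFORMANCE_METRIC_DIMS]


def ablation_configs():
    """All combinations of the ablation value sets (joint sweep is an assumption)."""
    base = HyperParams()
    keys = list(ABLATION_GRID)
    combos = product(*(ABLATION_GRID[k] for k in keys))
    return [replace(base, **dict(zip(keys, values))) for values in combos]


if __name__ == "__main__":
    for cfg in performance_configs():
        print(cfg)
    print(len(ablation_configs()), "ablation configurations")
```

Keeping the fixed values as dataclass defaults and the swept values in a separate mapping makes it easy to see which settings are held constant across both experiment types (learning rate, weight decay, momentum, image feature dimension, and the two layer counts).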