Research Article
CCAH: A CLIP-Based Cycle Alignment Hashing Method for Unsupervised Vision-Text Retrieval
Figure 2
The overall architecture of our model is shown in the figure above, with the orange region denoting the image modality and the green region denoting the text modality. We construct similarity matrices both within and across modalities, and the generated hash matrices are additionally aligned between modalities. This design enforces semantic alignment within each modality, alignment between hash codes and features across modalities, and alignment between the two hash matrices.
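To make the three alignment objectives in the caption concrete, the following is a minimal sketch, assuming CLIP image/text features `F_img`, `F_txt` and relaxed hash codes `H_img`, `H_txt`; the tensor names and the simple MSE-style losses are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarities between two batches of vectors."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    return a @ b.t()

def alignment_losses(F_img, F_txt, H_img, H_txt):
    # Intra-modal similarity matrices built from CLIP features.
    S_ii = cosine_similarity_matrix(F_img, F_img)
    S_tt = cosine_similarity_matrix(F_txt, F_txt)
    # Cross-modal (image-text) similarity matrix.
    S_it = cosine_similarity_matrix(F_img, F_txt)

    # The same similarity structures computed from the hash codes.
    S_hii = cosine_similarity_matrix(H_img, H_img)
    S_htt = cosine_similarity_matrix(H_txt, H_txt)
    S_hit = cosine_similarity_matrix(H_img, H_txt)

    # (1) Semantic alignment within each modality: hash-code similarities
    #     should reproduce the feature similarities.
    loss_intra = F.mse_loss(S_hii, S_ii) + F.mse_loss(S_htt, S_tt)
    # (2) Alignment between hash codes and features across modalities.
    loss_cross = F.mse_loss(S_hit, S_it)
    # (3) Hash-matrix-to-hash-matrix alignment between the two modalities.
    loss_hash = F.mse_loss(H_img, H_txt)
    return loss_intra + loss_cross + loss_hash

if __name__ == "__main__":
    B, D, K = 8, 512, 64  # batch size, CLIP feature dim, hash code length
    F_img, F_txt = torch.randn(B, D), torch.randn(B, D)
    H_img = torch.tanh(torch.randn(B, K))  # tanh as a continuous relaxation of binary codes
    H_txt = torch.tanh(torch.randn(B, K))
    print(alignment_losses(F_img, F_txt, H_img, H_txt).item())
```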