Research Article

CCAH: A CLIP-Based Cycle Alignment Hashing Method for Unsupervised Vision-Text Retrieval

Figure 1

Because images carry richer higher-order semantic information than text, text-to-image retrieval tends to attend only to the image regions that are consistent with the text representation. As a result, part of the visual-modality semantics is lost and the accuracy of text-to-image retrieval is reduced.