Research Article
CCAH: A CLIP-Based Cycle Alignment Hashing Method for Unsupervised Vision-Text Retrieval
Figure 4
We use the text features to construct an adjacency matrix. The text features are then attention-weighted and summed according to this adjacency matrix, yielding new text representations that aggregate information from each node's neighbors.
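The aggregation described in this caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the caption does not specify how the adjacency matrix or the attention weights are computed, so the sketch assumes cosine similarity for the adjacency matrix and a row-wise softmax over it for the attention weights. The function name `aggregate_text_features` and the `temperature` parameter are hypothetical and introduced only for illustration.

```python
import numpy as np


def aggregate_text_features(text_feats, temperature=1.0):
    """Aggregate neighbor information into each text feature.

    Assumptions (not taken from the paper): the adjacency matrix is the
    cosine similarity between text features, and the attention weights
    are a row-wise softmax over that adjacency matrix.
    """
    # L2-normalise each feature vector so the dot product is cosine similarity.
    norms = np.linalg.norm(text_feats, axis=1, keepdims=True)
    normed = text_feats / np.clip(norms, 1e-12, None)

    # Adjacency matrix: pairwise similarity between all text nodes.
    adjacency = normed @ normed.T

    # Attention weights: numerically stable softmax over each row.
    logits = adjacency / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # New representation: attention-weighted sum over all nodes.
    aggregated = weights @ text_feats
    return adjacency, weights, aggregated


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 8))  # 5 text samples, 8-dimensional features
    adj, attn, new_feats = aggregate_text_features(feats)
    print(adj.shape, attn.shape, new_feats.shape)
```

Under these assumptions, each row of the attention matrix sums to one, so every new text feature is a convex combination of the original features, weighted toward the most similar neighbors. A lower `temperature` concentrates the weights on the closest neighbors.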