CCAH: A CLIP-Based Cycle Alignment Hashing Method for Unsupervised Vision-Text Retrieval

<div>t-SNE visualization of the Flickr-25K data. (a) Original image features. (b) Encoded image feature distribution. (c) Original text features. (d) Encoded text feature distribution. Circles (○) and stars (<span class="nowrap">✻</span>) denote text and image samples, respectively, and different colors denote different semantic categories.</div>

International Journal of Intelligent Systems


Figure 5
