| Reference | SFI: the original representations of the individual modalities are each mapped into a common feature space to generate single features per modality | JFI: joint features are generated for the modalities from their original representations |
| --- | --- | --- |
| Rasiwasia et al. [2] | (1) Text and images are mapped into a correlative space using canonical correlation analysis (see the CCA sketch below). (2) Text and images are mapped into a semantic space by constructing vectors of posterior probabilities between the text (or images) and the document class labels. | — |
| Ngiam et al. [6] | The original representations of audio and video are fed into a bimodal autoencoder simultaneously to generate single deep features for both modalities (see the autoencoder sketch below). | — |
| Srivastava and Salakhutdinov [7] | — | Joint features for text and images are learned with a deep Boltzmann machine. |
| Jia et al. [8] | — | Joint features for two image modalities are generated with double broad learning and canonical correlation analysis. |
| Xiong et al. [11] | The original representations of images and audio are mapped into a semantic topic space using probabilistic latent semantic analysis (PLSA). | — |
| Tang et al. [12] | — | The original representations of signals and images are concatenated directly to form joint features for a support vector machine (see the concatenation sketch below). |
| Lin et al. [13] | Attention-based networks are used to generate single features for text and images. | The single features of text and images are fused with a tensor fusion method to generate joint features (see the tensor fusion sketch below). |
| Our method | (1) The single features of images are the image class labels generated by a convolutional neural network. (2) The single features of text are keyword vectors generated by a keyword extraction method. | (1) The single features of images and text are fused into a fusion matrix whose elements are "image class label - keyword" pairs. (2) The fusion matrix is transformed into a word embedding matrix using Word2Vec (see the pipeline sketch below). |
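For the first row, a minimal sketch of mapping text and images into a shared correlative space with canonical correlation analysis, in the spirit of Rasiwasia et al. [2]. The feature dimensions and random data are placeholder assumptions, and scikit-learn's `CCA` stands in for the original implementation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((100, 50))    # placeholder text features per document
image_feats = rng.standard_normal((100, 128))  # placeholder visual features per image

# Fit CCA so that projections of the two modalities are maximally correlated.
cca = CCA(n_components=10)
cca.fit(text_feats, image_feats)

# Both modalities now live in the same 10-dimensional correlative space.
text_proj, image_proj = cca.transform(text_feats, image_feats)
print(text_proj.shape, image_proj.shape)  # (100, 10) (100, 10)
```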
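For Ngiam et al. [6], a minimal PyTorch sketch of a bimodal autoencoder: audio and video inputs are encoded into one shared hidden representation, which serves as the single deep feature for both modalities. The layer sizes, activation, and loss here are illustrative assumptions, not the original architecture.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Encodes concatenated audio and video inputs into one shared
    representation and decodes it back into both modalities."""
    def __init__(self, audio_dim=100, video_dim=300, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(audio_dim + video_dim, hidden_dim),
            nn.ReLU(),
        )
        self.audio_decoder = nn.Linear(hidden_dim, audio_dim)
        self.video_decoder = nn.Linear(hidden_dim, video_dim)

    def forward(self, audio, video):
        shared = self.encoder(torch.cat([audio, video], dim=1))
        return self.audio_decoder(shared), self.video_decoder(shared), shared

model = BimodalAutoencoder()
audio, video = torch.randn(8, 100), torch.randn(8, 300)
audio_rec, video_rec, shared = model(audio, video)  # `shared` is the single deep feature

# Reconstruction loss over both modalities drives the shared representation.
loss = nn.functional.mse_loss(audio_rec, audio) + nn.functional.mse_loss(video_rec, video)
loss.backward()
```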
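For Tang et al. [12], the joint feature is a direct concatenation of the two original representations, which is then fed to a support vector machine. The dimensions, random data, and SVM settings below are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
signal_feats = rng.standard_normal((200, 40))  # placeholder signal features
image_feats = rng.standard_normal((200, 60))   # placeholder image features
labels = rng.integers(0, 2, size=200)          # placeholder binary labels

# The joint feature is simply the two feature vectors stacked side by side.
joint_feats = np.concatenate([signal_feats, image_feats], axis=1)  # shape (200, 100)

clf = SVC(kernel="rbf")
clf.fit(joint_feats, labels)
print(clf.score(joint_feats, labels))
```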
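For Lin et al. [13], the sketch below uses the common outer-product formulation of tensor fusion, in which each single-feature vector is padded with a constant 1 so the result retains unimodal features alongside bimodal interactions; whether this matches the paper's exact fusion operator is an assumption, and the dimensions are illustrative.

```python
import torch

text_feat = torch.randn(8, 32)   # placeholder single features for text
image_feat = torch.randn(8, 48)  # placeholder single features for images

# Append a constant 1 to each vector so the outer product also keeps
# the original unimodal features.
ones = torch.ones(8, 1)
text_aug = torch.cat([text_feat, ones], dim=1)    # (8, 33)
image_aug = torch.cat([image_feat, ones], dim=1)  # (8, 49)

# Batched outer product gives the joint interaction tensor, flattened
# into one joint feature vector per sample.
joint = torch.einsum("bi,bj->bij", text_aug, image_aug).reshape(8, -1)
print(joint.shape)  # torch.Size([8, 1617])
```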
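Finally, a hedged sketch of our method's pipeline: CNN-predicted image class labels and extracted text keywords are paired into a fusion matrix of "image class label - keyword" entries, which Word2Vec then turns into a word embedding matrix. The labels, keywords, and Word2Vec settings (`vector_size`, `min_count`) are hypothetical stand-ins, not the exact configuration used in this paper.

```python
import numpy as np
from gensim.models import Word2Vec

# Assumed upstream outputs: one CNN-predicted class label per image and
# an extracted keyword vector for the accompanying text (all values
# here are hypothetical placeholders).
image_labels = ["cat", "dog", "cat"]
text_keywords = [["pet", "furry"], ["bark", "pet"], ["whiskers", "furry"]]

# Fusion matrix: each row pairs an image class label with that sample's keywords.
fusion_matrix = [[label] + keywords for label, keywords in zip(image_labels, text_keywords)]

# Word2Vec treats each fused row as a sentence; its vectors form the
# word embedding matrix over all fused tokens.
w2v = Word2Vec(sentences=fusion_matrix, vector_size=16, min_count=1, seed=0)
embedding_matrix = np.stack([w2v.wv[tok] for row in fusion_matrix for tok in row])
print(embedding_matrix.shape)  # (9, 16): one 16-d embedding per fused token
```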