Research Article

Beyond Words: An Intelligent Human-Machine Dialogue System with Multimodal Generation and Emotional Comprehension

Table 2

Manual annotation results of different methods (lower values indicate better models).

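| Ground truth vs. | Appropriateness (%) | Informativeness (%) | Emotional (%) |
|------------------|---------------------|---------------------|---------------|
| Text-based       | 57.3                | 69.3                | 59.8          |
| Emotion-based    | 54.2                | 62.7                | 54.2          |
| Image-Chat       | 58.2                | 55.1                | 52.8          |
| Ours-Emotion     | 63.2                | 61.8                | 52.1          |
| Ours-Visual      | 60.4                | 60.5                | 56.5          |
| Ours             | 54.3                | 53.4                | 46.7          |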

Bold values indicate the best results.
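The bold markup did not survive extraction. The Python sketch below recomputes the lowest (best) value in each column from the figures transcribed above. The names TABLE_2 and best_per_metric are hypothetical and used only for illustration. The sketch ranks methods purely by the numbers shown, so it may not match whatever the authors originally bolded.

```python
# Values transcribed from Table 2 (percentages; lower is better).
TABLE_2 = {
    "Text-based":    {"Appropriateness": 57.3, "Informativeness": 69.3, "Emotional": 59.8},
    "Emotion-based": {"Appropriateness": 54.2, "Informativeness": 62.7, "Emotional": 54.2},
    "Image-Chat":    {"Appropriateness": 58.2, "Informativeness": 55.1, "Emotional": 52.8},
    "Ours-Emotion":  {"Appropriateness": 63.2, "Informativeness": 61.8, "Emotional": 52.1},
    "Ours-Visual":   {"Appropriateness": 60.4, "Informativeness": 60.5, "Emotional": 56.5},
    "Ours":          {"Appropriateness": 54.3, "Informativeness": 53.4, "Emotional": 46.7},
}


def best_per_metric(table):
    """Return {metric: (method, value)} holding the lowest value in each column.

    Hypothetical helper: on a tie, the method listed first wins.
    """
    metrics = next(iter(table.values())).keys()
    return {
        metric: min(
            ((name, row[metric]) for name, row in table.items()),
            key=lambda pair: pair[1],
        )
        for metric in metrics
    }


if __name__ == "__main__":
    for metric, (method, value) in best_per_metric(TABLE_2).items():
        print(f"{metric}: {method} ({value}%)")
```

On these figures, the full model ("Ours") has the lowest Informativeness and Emotional values. For Appropriateness, Emotion-based (54.2) is 0.1 points lower than Ours (54.3).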