| Evaluation type | Metric | # of studies | Studies |
|---|---|---|---|
| Evaluation of generated responses | BLEU | 24 | [17, 39, 47, 49–53, 55, 57–59, 64, 65, 68, 69, 72–76, 79, 83] |
| | Perplexity | 20 | [17, 36, 48, 49, 51, 52, 54, 55, 60–67, 72, 73, 75, 79] |
| | Distinct-1, Distinct-2 | 12 | [36, 50, 53, 56–59, 64, 67, 74, 76, 80] |
| | ROUGE | 5 | [39, 52, 54, 59, 66] |
| | METEOR | 4 | [39, 48, 52, 66] |
| Evaluation of emotions | F1 | 5 | [47, 60, 71, 78, 80] |
| | Precision | 9 | [17, 47, 48, 60, 67, 69, 71, 78, 84] |
| | Recall | 8 | [47, 48, 60, 67, 69, 71, 78, 84] |
| | Accuracy | 22 | [8, 17, 48–52, 55, 57, 58, 61, 66, 68, 71–73, 75, 76, 78, 80, 81, 84] |