Research Article

Learning from Demonstrations and Human Evaluative Feedbacks: Handling Sparsity and Imperfection Using Inverse Reinforcement Learning Approach

Figure 4

(a) The effect of nonoptimality in demonstrations. Experiment setting: initial nonoptimal demonstration “A5” with 60% optimality and 100 demonstrations in the first stage (see Figure 3). (b) The case where all demonstrations in the first stage are optimal but sparse. Experiment setting: initial sparse demonstration “B2” and 20 demonstrations in the first stage (see Figure 3). Two kinds of data are provided during the experiment: evaluative feedbacks related to the policy combination (first horizontal axis), and state-action pairs from extra optimal demonstrations (second horizontal axis).