Unbiased Model-Agnostic Metalearning Algorithm for Learning Target-Driven Visual Navigation Policy

<div>An example of exploration processes with the two kinds of termination condition. The episode of navigating to the bed ends when the agent clearly observes the target from the nearest viewpoint. In the other exploration process, the agent fails to gain a full view of the television, so the training episode is terminated at step 10,000.</div>
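
The two termination conditions in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. The names `StepInfo`, `target_visible`, `at_nearest_viewpoint`, and `episode_status` are hypothetical, and the sketch assumes steps are counted from 1, so the cap is reached when the count hits 10,000.

```python
from dataclasses import dataclass

MAX_STEPS = 10_000  # step cap stated in the caption


@dataclass
class StepInfo:
    """Hypothetical per-step observation summary (not from the paper)."""
    target_visible: bool        # assumed flag: the target is clearly in view
    at_nearest_viewpoint: bool  # assumed flag: agent stands at the closest viewpoint


def episode_status(step: int, info: StepInfo, max_steps: int = MAX_STEPS) -> str:
    """Classify an episode as 'success', 'timeout', or 'running'.

    `step` is assumed to be the 1-indexed number of steps taken so far.
    """
    # Condition 1: the target is clearly observed from the nearest viewpoint.
    if info.target_visible and info.at_nearest_viewpoint:
        return "success"
    # Condition 2: the step budget is exhausted without a clear view of the target.
    if step >= max_steps:
        return "timeout"
    return "running"
```

In this sketch the success check comes first, so an episode that reaches the target exactly on the final step still counts as a success. That ordering is a design choice of the example, not something the caption states.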

Computational Intelligence and Neuroscience


Figure 2
